Debugging and patching Clojure code running in production using socket server and REPL

Common Lisp is well known for its interactivity. The story of NASA engineers who used a remote REPL to patch Lisp code running 60 million miles away inside an interplanetary spacecraft is well known too. You can debug and patch Clojure code running in production with a REPL as well. This is where the Clojure socket server comes in handy.

Practical example

Let’s interact with a live web service written in Clojure and try to change the running program’s state using a REPL and Clojure’s socket server. I assume you have Java 17 or later (earlier versions might work but are not tested), Clojure 1.11 or later, git, and curl installed on your computer to work through this tutorial.

Get sample code

Get the dienstplan app (it’s open source):

$ git clone git@github.com:pilosus/dienstplan.git
$ cd dienstplan
$ git checkout ba1aa1f846bff8f690e14a6fbb8989ae73a10b53

Compile the code

Compile an uberjar. It produces app.jar in your working directory:

$ clojure -T:build uberjar :uber-file '"app.jar"'

Run the app with socket server

Let’s run the app with the Java option string -Dclojure.server.repl="{:port 5555 :accept clojure.core.server/repl}". It starts a socket server with a REPL on localhost, listening on port 5555. The app also needs the following environment variables to start:

$ APP__DEBUG=false \
SLACK__TOKEN="xoxb-Your-Bot-User-OAuth-Token" \
SLACK__SIGN="Your-Signing-Secret" \
ALERTS__SENTRY_DSN="https://public:private@localhost/1" \
SERVER__PORT=8080 \
SERVER__LOGLEVEL=INFO \
DB__SERVER_NAME=localhost \
DB__PORT_NUMBER=15432 \
DB__DATABASE_NAME="dienstplan" \
DB__USERNAME="dienstplan" \
DB__PASSWORD="dienstplan" \
java -Dclojure.server.repl="{:port 5555 :accept clojure.core.server/repl}" -jar app.jar
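
The same server can also be started from code, which may help if you can’t change the JVM options. Here is a minimal sketch, assuming the clojure.core.server API that ships with Clojure; the server name "prod-repl" and the helpers start-repl! and stop-repl! are made up for illustration and are not part of dienstplan:

```clojure
(require '[clojure.core.server :as server])

;; Same settings as the -Dclojure.server.repl option string, plus a name.
;; "prod-repl" is a hypothetical name chosen for this sketch.
(def repl-opts
  {:name   "prod-repl"
   :port   5555
   :accept 'clojure.core.server/repl})

(defn start-repl!
  "Starts the socket REPL and returns the underlying java.net.ServerSocket."
  [opts]
  (server/start-server opts))

(defn stop-repl!
  "Stops the server registered under (:name opts). Returns true if it was running."
  [opts]
  (server/stop-server (:name opts)))
```

Calling (start-repl! repl-opts) somewhere in the app’s startup code should then be roughly equivalent to passing the Java option.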

Check if app is running properly

Let’s check if the app is working properly by making an HTTP GET request to its healthcheck API endpoint:

$ curl http://localhost:8080/api/healthcheck
{"status":"ok"}

$ curl -SsL -I http://localhost:8080/api/healthcheck | head -n 1
HTTP/1.1 200 OK

Connect to REPL

Now, let’s connect to the REPL socket server running alongside the app. Remember, it’s a plain REPL that reads and prints text over a TCP connection, so Emacs CIDER’s nREPL client, which speaks a more complex message-based tooling protocol, won’t be able to connect to it! We can use simpler tools like telnet or netcat (nc) instead. For a better line-editing experience, such as moving the cursor back and forth, using backspace, and history support, you can wrap the client in rlwrap (readline wrapper), a CLI tool built on the readline library:

$ rlwrap nc localhost 5555
user=>
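
Since the socket REPL exchanges plain text over a socket, any program can act as a client. Here is a rough sketch of a one-shot client in Clojure, assuming a socket REPL is already listening on the given host and port. The helper eval-remote is hypothetical, and :repl/quit is the keyword the socket REPL treats as a request to end the session:

```clojure
(require '[clojure.java.io :as io])
(import '(java.net Socket))

(defn eval-remote
  "Sends one form (as a string) to a socket REPL, asks the REPL to quit,
  and returns everything the server printed, prompts included."
  [host port form-str]
  (with-open [sock (Socket. ^String host (int port))]
    (let [out (io/writer sock)]
      ;; The form is evaluated first, then :repl/quit ends the session,
      ;; so the server closes the connection and the read below terminates.
      (.write out (str form-str "\n:repl/quit\n"))
      (.flush out)
      (slurp (io/reader sock)))))
```

For example, (eval-remote "localhost" 5555 "(+ 1 2)") should return a string that contains the REPL prompt and the printed result.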

Access state of the running app

Now let’s tinker with our running app’s state from within the REPL! First, let’s take a look at the config initialized with the environment variables a few steps above. To do that, we need to load the app’s namespace dienstplan.config:

user=> (require '[dienstplan.config :as config])
nil

user=> config/config
{:application {:name "dienstplan", :version "latest", :env "production", :debug false}, :server {:port 8080, :loglevel "INFO", :access-log true, :block-thread true}, :slack {:token "xoxb-Your-Bot-User-OAuth-Token", :sign "Your-Signing-Secret"}, :alerts {:sentry "https://public:private@localhost/1"}, :db {:dbtype "postgres", :password "dienstplan", :minimumIdle 20, :username "dienstplan", :maxLifetime 1800000, :port 15432, :dbname "dienstplan", :connectionTimeout 10000, :host "localhost", :keepaliveTime 0, :maximumPoolSize 20}}
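
The config is an ordinary nested map, so the usual map functions work on it. Note that printing it whole also prints the secrets into your terminal. Below is a small sketch that reads a single value and masks sensitive ones before printing. It uses a trimmed copy of the map above so that it runs on its own; the names sample-config, secret-paths and redact are assumptions for illustration, not part of dienstplan:

```clojure
;; Trimmed copy of the config printed above.
(def sample-config
  {:server {:port 8080 :loglevel "INFO"}
   :slack  {:token "xoxb-Your-Bot-User-OAuth-Token" :sign "Your-Signing-Secret"}
   :db     {:host "localhost" :port 15432 :password "dienstplan"}})

;; Paths that should never be printed as-is (an assumed list).
(def secret-paths
  [[:slack :token] [:slack :sign] [:db :password]])

(defn redact
  "Replaces the value at each path with a placeholder, skipping absent paths."
  [m paths]
  (reduce (fn [acc path]
            (if (some? (get-in acc path))
              (assoc-in acc path "<redacted>")
              acc))
          m
          paths))

;; Read one nested value instead of dumping the whole map.
(def db-port (get-in sample-config [:db :port]))

;; A copy that is safer to print; the original map is left unchanged.
(def printable-config (redact sample-config secret-paths))
```

In the live REPL you would pass config/config instead of sample-config.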

Patch the code of the running app

And now for something completely different! Let’s override some code in the running app! Let’s replace the API handler for /api/healthcheck that we requested above with this:

(defmethod multi-handler :healthcheck
  [_]
  (log/warn
   "If I were to suggest that between the Earth and Mars there is a
   china teapot revolving about the sun in an elliptical orbit...")
  {:status 418
   :body {:status "I'm a teapot"}})

To do it, let’s use the in-ns function to switch the current REPL’s namespace to dienstplan.endpoints, where we want to override the method:

user=> (in-ns 'dienstplan.endpoints)
#object[clojure.lang.Namespace 0x50c83f27 "dienstplan.endpoints"]

dienstplan.endpoints=> (defmethod multi-handler :healthcheck
[_]
(log/warn
 "If I were to suggest that between the Earth and Mars there is a
china teapot revolving about the sun in an elliptical orbit...")
{:status 418
 :body {:status "I'm a teapot"}})
#object[clojure.lang.MultiFn 0x5d8ea5bb "clojure.lang.MultiFn@5d8ea5bb"]
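
Why does this take effect without a restart? A multimethod keeps its implementations in a table that is consulted on every call, and defmethod simply replaces one entry in it. Here is a toy sketch of the idea. The multimethod handle, its dispatch on :route and the function serve are hypothetical stand-ins, not dienstplan’s actual code:

```clojure
;; Hypothetical stand-in for the app's multimethod, dispatching on :route.
(defmulti handle :route)

(defmethod handle :healthcheck [_]
  {:status 200 :body {:status "ok"}})

;; A caller that was defined before the patch and is never redefined.
(defn serve [request]
  (handle request))

(def before (serve {:route :healthcheck}))

;; The "live patch": same dispatch value, new implementation.
(defmethod handle :healthcheck [_]
  {:status 418 :body {:status "I'm a teapot"}})

;; The unchanged caller now reaches the new implementation.
(def after (serve {:route :healthcheck}))
```

Here before holds the 200 response and after holds the 418 one, even though serve itself was never touched.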

Check if changes applied

Let’s make another HTTP GET request to the /api/healthcheck handler:

$ curl http://localhost:8080/api/healthcheck
{"status":"I'm a teapot"}

$ curl -SsL -I http://localhost:8080/api/healthcheck | head -n 1
HTTP/1.1 418 I'm a Teapot

The response body has changed, and so has the HTTP status code. In the app logs we also see:

2023-07-07 18:52:00,665 [qtp1116648405-23] WARN  dienstplan.endpoints - If I were to suggest that between the Earth and Mars there is a
  china teapot revolving about the sun in an elliptical orbit...

The method has been successfully overridden!

Revert changes by restarting the process

Tinkering with the running code doesn’t change the code itself though. Restart the app process and fire another HTTP request to make sure all is back to normal:

$ curl http://localhost:8080/api/healthcheck
{"status":"ok"}
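
If a restart is too expensive, one possible alternative is to save the original implementation before patching and put it back afterwards. Below is a sketch under that assumption, using get-method from clojure.core and the addMethod method of clojure.lang.MultiFn (the same method defmethod expands to). The multimethod respond-to is a hypothetical stand-in for the app’s handler:

```clojure
;; Hypothetical stand-in for the app's multimethod.
(defmulti respond-to :route)

(defmethod respond-to :healthcheck [_]
  {:status 200})

;; 1. Save the current implementation before touching anything.
(def original-impl (get-method respond-to :healthcheck))

;; 2. Apply the live patch.
(defmethod respond-to :healthcheck [_]
  {:status 418})

(def patched-status (:status (respond-to {:route :healthcheck})))

;; 3. Put the saved implementation back without restarting the process.
(.addMethod ^clojure.lang.MultiFn respond-to :healthcheck original-impl)

(def restored-status (:status (respond-to {:route :healthcheck})))
```

This only restores the code. Any state the patched handler changed while it was live stays changed.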

Conclusion

A socket server with a REPL running alongside the main app is a nice way to use the various debugging tools and techniques that Clojure and third-party libraries provide. Code override may not work if the code is compiled ahead of time with direct linking enabled, because calls then bypass the vars you redefine (double-check your project settings, especially if you build the uberjar with lein). Otherwise you can debug the running code without touching the source, right in the REPL, and revert the changes by simply restarting the app’s process.
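
Whether a redefinition is picked up depends on how the caller reaches the function. Here is a small sketch, assuming the default compiler settings with direct linking turned off; all names in it are made up for illustration:

```clojure
(defn greeting [] "ok")

;; Calls greeting through its var, so it sees later redefinitions.
(defn respond [] (greeting))

;; Captures the function object itself at definition time.
(def captured greeting)

(def before [(respond) (captured)])

;; Redefine the function, as you would from the socket REPL.
(defn greeting [] "I'm a teapot")

;; respond picks up the new definition; captured still holds the old function.
(def after [(respond) (captured)])
```

The second case is worth keeping in mind when a handler function has been passed by value to a web server or stored in a data structure at startup: redefining the var alone won’t reach that copy.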

Precautions

Running a socket server allows an intruder to connect to it over the network if the exposed port isn’t protected with a firewall, VPN and/or another access control tool. It may pose huge security risks to your app in a production environment.
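
One cheap safeguard is to make sure the server listens on the loopback interface only, so it can’t be reached from other machines directly. Here is a sketch using the :address option of clojure.core.server/start-server; the helpers start-local-repl! and loopback-only? are hypothetical names for this example:

```clojure
(require '[clojure.core.server :as server])

(defn start-local-repl!
  "Starts a socket REPL bound explicitly to 127.0.0.1 and returns its ServerSocket."
  [server-name port]
  (server/start-server {:name    server-name
                        :address "127.0.0.1"
                        :port    port
                        :accept  'clojure.core.server/repl}))

(defn loopback-only?
  "True if the server socket is bound to a loopback address."
  [^java.net.ServerSocket socket]
  (.isLoopbackAddress (.getInetAddress socket)))
```

A check like (loopback-only? (start-local-repl! "local-repl" 5555)) could run at startup to fail fast on a misconfigured address. Loopback binding is not access control on its own: anyone who can log in to the host can still connect.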

Patching code directly in the running app may be the only available solution when your production environment is millions of miles away and costs $150M. While the patches themselves are easily reverted by restarting the process, your app’s state may be damaged irreversibly.

Touching code in production is a great power, but it is also a huge responsibility. Oftentimes the risks of live patching outweigh the extra time needed to prepare and deploy a hotfix. You should know what you are doing.