D H Write — https://www.flickr.com/photos/59521130@N00/5916636166

Macros, Meta, and Logging

Howard M. Lewis Ship

--

Through my years stewarding Apache Tapestry, I always had an emphasis on descriptive exception and logging messages. Exceptions and logging went hand in hand: logging to identify the overall state of the system, and exceptions to provide detail when things went wrong … and to provide guidance and hints to the developer trying to fix the problem.

The major part of Tapestry’s exception and logging approach concerns constructing those detailed exception and logging messages. For example, rather than throw a NullPointerException after an expected value is missing from a HashMap, Tapestry would detect the missing key and throw an exception whose message and extra properties would identify what was being looked up, as well as what possible values could be selected. Ask for a component and munge the name and the exception message clearly states “Component TaxtField not found.” along with a list of all the actual component types.

I still stand by that level of sanity checking, despite the minute runtime cost. I’ll trade a few nanoseconds of processing time (even, if necessary, on every request!) to save a developer minutes or hours.

Likewise, where logging takes place, I’m used to creating log messages as readable English.

But a discussion some ways back with Stuart Halloway has led me in a different direction. Stuart champions the idea of “logs as data streams” rather than logs as textual message streams.

You can see this in the io.pedestal/log library. Logging output inside the Pedestal framework is always data. Each log event is a map of keys and values. Here’s an example from Pedestal’s io.pedestal.interceptor.chain namespace:

(log/debug :in 'leave-all :execution-id (::execution-id context))

At runtime we see messages such as the following in the console:

103741.182 DEBUG io.pedestal.interceptor.chain - {:in leave-all, :execution-id 9, :line 241}

That’s what io.pedestal.log does: it packages up those key/value pairs, adds a :line key, and writes the resulting map into the log as a string. Since it is in EDN format, it can be readily parsed and interpreted.

Now, perhaps the Pedestal team could have logged this more conventionally, with a message like "Executing leave-all stage of interceptor execution #9." But my evolved mindset leads me to the conclusion that such Human-readable text is too verbose, too distracting, insufficiently uniform, and ultimately less useful — as I do more work with log aggregation tools such as Splunk, I like having my data as data that I can easily filter and manipulate.

Back to the :line key. This turns out to be very useful; having a line number (and a namespace) for every logged statement makes working backwards from logging output to code considerably faster and less frustrating. It’s one of those little features that you wouldn’t know to demand ahead of time, but realize is all but essential once you have it.

The logging macros can identify the line number very cheaply, as every form read by the Clojure reader is provided with metadata that describes where the form came from, in terms of a line, column, and file.

By contrast, for Logback to figure out the line number requires some odd JVM tricks and is expensive enough that they only recommend including the line number in the logged output where “execution speed is not an issue”.

But there’s an achilles’ heel inside log/debug: it only expects to be used from inside normal code. It doesn’t do quite the right thing inside a macro.

I found this out the hard way. For context, I’m building some code around Apache ZooKeeper and zookeeper-clj, and I’m writing tons of callbacks.

I’ve found that understanding and debugging a ZooKeeper application involves a lot of logging; most of the logging occurs inside the asynchronous callback functions matched to each zookeeper-clj API call.

Each of the callbacks follows a particular pattern:

  • log/trace the callback result when invoked
  • Dispatch, using clojure.core/case, based on the ZooKeeper result code (with values such as :ok or :connectionloss)
  • log/error if no match

For example:

(let [callback
(fn [result]
(let [{:keys [return-code context stat] :as result'}
(normalize-result result)]
(log/trace :event :on-task-data :result result')
(case return-code

:ok
(swap! *state assoc-in [:tasks context :created-at]
(:ctime stat))

(log/error :event :on-task-data :result result'))))])

That’s not just a lot of boilerplate, it’s a lot of tedious, repetitive boilerplate. The ceremony is completely overloading the essence, and I have at least a dozen of these callback functions.

I typically follow the first rule of writing macros: don’t write macros, but this feels like an exception to that rule. My goal is a macro that lets me write the above, more succinctly:

(let [callback
(result-callback :on-task-data {:keys [context stat]}

:ok
(swap! *state assoc-in [:tasks context :created-at]
(:ctime stat)))])

I think the above form really lets us focus on the essence.

Here’s my first pass at writing such a macro:

(defn ^:private as-keyword
[rc]
(-> rc
KeeperException$Code/get
str
str/lower-case
keyword))

(defn normalize-result
[result]
(-> result
(update :return-code as-keyword)
(cond-> (-> result :data some?) (update :data to-string))))
(defmacro result-callback
[tag sym & clauses]
`(fn [result#]
(let [normalized# (normalize-result result#)
~sym normalized#]
(log/trace :event ~tag :result normalized#)
(case (:return-code normalized#)

~@clauses

;; Otherwise:
(log/error :event ~tag :result normalized#)))
nil))

So with result-callback, the first argument, tag, becomes the :event key in the calls to log/trace and log/error. The sym is the local variable to be assigned the normalized result (it can be a binding form), and the clauses are pairs of return codes and matching expressions.

Let’s see how well this works:

(def test-callback (result-callback :test result
:ok (log/info :result result)))
=> #'user/test-callback
(test-callback {:return-code KeeperException$Code/Ok})
=> nil
132001.111 TRACE user - {:event :test, :result {:return-code :ok}, :line nil}
132001.111 INFO user - {:result {:return-code :ok}, :line 2}

result-callback is mostly working correctly: the callback is logging the result on entry, and dispatching based on the keyword version of the Ok return code. The line number for the code inside the callback is correct but the line number for the generated trace call is nil. Where did it go?

Well, the short answer is that macros consume and discard metadata on the forms they transform. The code created by the backtick form doesn’t include any metadata at all.

Here’s the trick: we’re going to get the metadata of the form (the invocation of result-callback) and graft it onto the the two logging calls. This requires just a little bit of gymnastics:

(defmacro result-callback
[tag sym & clauses]
(let [m (meta &form)
normalized (gensym "normalized")]
`(fn [result#]
(let [~normalized (normalize-result result#)
~sym ~normalized]
~(with-meta
`(log/trace :event ~tag :result ~normalized) m)
(case (:return-code ~normalized)

~@clauses

;; Otherwise:
~(with-meta
`(log/error :event ~tag :result ~normalized) m)))
nil)))

The &form is a special variable inside macros: it is the form being expanded, the call to result-callback itself. That’s a good enough hook for the :line metadata. Wrapping the calls to log/trace and log/error with with-meta is the graft mentioned above; now when those macros recursively expand, they have a :line to report:

(def test-callback (result-callback :test result
:ok (log/info :result result)))
=> #'user/test-callback
(test-callback {:return-code KeeperException$Code/Ok})
=> nil
133922.258 TRACE user - {:event :test, :result {:return-code :ok}, :line 1}
133922.259 INFO user - {:result {:return-code :ok}, :line 2}

There we go, the log/trace came up with :line 1, which makes sense for code entered directly into the REPL.

Notice that the call to with-meta is inside a ~ form: it executes at macro expansion time, not at code execution time. Just using (with-meta (log/trace ...)) would not work, since it would expand to code that invokes log/trace, then applies metadata to its return value — we need the meta data on the log/trace form before the log/tracemacro expands. Having to deal with those kinds of distinctions is the reason for the first rule of macros!

In any case, we can now have our cake and eat it too: we get the consistent logging and other behavior via the macro, but we get accurate line number reporting as well. I would expect to use this technique elsewhere under the same conditions: a mix of logging combined with verbose and repetitive boilerplate.

With Clojure there’s always a way.

--

--

Howard M. Lewis Ship

Clojure developer at Walmart labs; Apache Tapestry is my past, and my future is not yet written.