Safely dealing with scientific units of variables at compile time (a gentle introduction to Compile-Time Computing — part 3)

There is constant debate whether to include units in variable and field names such as in “length-in-meters”.

We can do better, much better. What we want is a method where numbers coming in from source code (literals) and from files or streams are all having their unit declared, and the system converts them if and only if it is necessary to do so.

So what we are *not* doing is dragging the unit around at runtime like this:

struct double-ish {
double number;
enum unit;
}

… and then check and if necessary convert the unit on every operation at runtime.

And we want *incremental* development. We want something useful *today*, without being forced to implement ninja subsystem before we can write our production code.

For background you might want to consult the first two parts of my series on compile-time computing:


First order of business for scientific units:

  • write enough code so that when you enter literals into source code you can write down the unit *today*, although you don’t have time to do something smart about the unit today. But every time you write something in the source code you want to put the unit *today*. So that later you can do something smart about the unit *mechanism* without having to edit *all* of your previously entered numbers.

Looks like this-ish:

(defconstant +c+ (unit 299792458 m/s))
(defconstant +h+ (unit 6.62607015e-34 Js))
(defconstant +e+ (unit 2.7182882845 :none))
(defconstant +up-quark-mass+ (unit 2.3 MeV/c^2 +0.7 -0.5))
[...]

That looks like a good way to write things down *today*. As I look up all this stuff in Wikipedia I also write down additional metadata, in this case the variance. Again, it is not that I can do something about the variance today. But the data is right there and I want to enter it *today*.

So, to get things rolling today, we do the simplest implementation that allows us to do the above:

(defmacro unit (value unit &rest more)
(declare (ignore unit more))
value)

That does the following:

  • requires that a unit is given
  • prepares for additional metadata to come in, which we will do something about in a future version of the macro. Right now we just accept more arguments and throw them away

That macro makes the defconstants above work. We can start entering all numbers with units today.


Before I go any further, let me point you to the runnable code file: https://www.cons.org/cracauer/ccunits.lisp

It is hard to keep things up-to-date in the medium doc via copy-n-paste. Please refer to that lisp file for actually running examples.


Next step in simplicity is to just force everything to be SI units:

(defmacro unit (value unit &rest more)
(declare (ignore more))
(case unit
(m/s)
(Js)
(:none)
(t (error "unknown unit ~a" unit)))
value)

Make sure you understand this runs at compile time. You can use it to define those *constants*, and they will be literal constants as far as the compiler is concerned. We do the above processing at compile-time, which means it happens before the compiler compiles assembly code for any functions using these constants. Nothing at runtime. At runtime you just have the number. You just reject units you don’t know.

Writing that code took me less than 20 seconds. Given Lisp’s turnaround time I can test this code by compiling both the macro and the constant declarations in less than 1 second.

OK, so that eats up these constants:

(defconstant +c+ (unit 299792458 m/s))
(defconstant +h+ (unit 6.62607015e-34 Js))
(defconstant +e+ (unit 2.7182882845 :none))

But it doesn’t like this one, which isn’t exactly SI:

(defconstant +up-quark-mass+ (unit 2.3 MeV/c^2 +0.7 -0.5))

The error message is very clear, and it appears at *compile* time:

unknown unit MEV/C^2
[Condition of type SIMPLE-ERROR]

So what do we do? Do we just add MeV/c² into the macro? No, we had decided (for now) that our system works internally with SI units, so it is time to also teach this thing how to convert. We also change the way that the value is returned to be a bit more functional:

(defmacro unit (value unit &rest more)
(declare (ignore more))
(case unit
(m/s value)
(Js value)
(:none value)
(MeV/c^2 (* value 1.78266191e-30))
(t (error "unknown unit ~a" unit))))

That makes it process the quark mass constant. The result looks correct:

Yes, Master? CL-USER> +up-quark-mass+
4.100122e-30

Again, this is at compile time. Let’s check that. Make sure that the compiled code has no trace of unit checking:

(defun testfun ()
(+ +up-quark-mass+ +c+))
;; compile in SBCL, then disassemble:
Yes, Master? CL-USER> (disassemble 'testfun)

; disassembly for TESTFUN
; Size: 13 bytes. Origin: #x52E3B9B6
; B6: 488B15B3FFFFFF MOV RDX, [RIP-77] ; no-arg-parsing entry point
; 2.9979245e8
; BD: 488BE5 MOV RSP, RBP
; C0: F8 CLC
; C1: 5D POP RBP
; C2: C3 RET
NIL
Yes, Master? CL-USER>

Oh lookie. Not only did it only leave the numbers at run time, the compiler even recognized that there is arithmetic on constants and folded it at compile time. This is called constant folding. Constant folding is generally available in C++ and friends, too. The problem is that you need a good way to come up with all the constants in the first place, without entering the same data repeatedly, and while entering them with unit checking.

How long did it take me to write that code? Well the biggest step was entering the units into Google to find out what the conversion constant is. Luckily the Google search engine is sophisticated enough that it knows about units as exotic as MeV/c². So a total of less than a minute. Testing my new code required compiling it and running it. Thanks to the usual Lisp turnaround that took less than a second. For more see here: https://hackernoon.com/software-development-at-1-hz-5530bb58fc0e

OK, so that is useful-ish. You can just add new units that you need into the macro when new ones come up as you enter values. Overhead is limited. You never have to sit down for an evening and expand that macro for extended periods of time.

There are two improvements to do here:

  • instead of putting every combined unit (such as m/s) into the macro you can, at compile time, walk the unit declaration as a string and resolve it into the fundamental, non-combined units (such as parsing it out into meters, the / operator and seconds). That’s like actual work writing an evaluator. It would add safety since it reduces the numbers of conversion constants you have to enter into the macro. I want to do something else first though.
  • SI units kinda suck. Not as hard as imperial units but still. If I’m messing around with Quarks anyway I can use Planck units internally. They have a number of advantages, including that you don’t need to enter all those floating point values into variables with fixed precision. Planck units are robust against “changes” in e.g. c, the speed of light. In Planck units c is a fundamental building block with a value of 1.0. Nifty, eh?

OK, so Planck units it is.

Now, do we have to re-enter our constants? If we had written 300 values all over the source code and changed our mind about what the internal units of our overall system are, do we have to change those 300 pieces in the source code?

No, silly.

This whole theater is about *not* having to do this. As long as all such source code places with literals have the proper (unit …) declaration we only have to change the macro. After recompiling the entire system with the changed macro *all* internal state will be Planck units — even though *no* constants have been entered as Planck units in the first place.

That is why you want compile time computing. Regardless of what units you personally prefer, you should always enter numbers in the unit you find them in. If wikipedia gives you h in Joule*seconds you want to enter it into your source code as Joule*seconds. You do not want to mess around with a pocket calculator when entering those numbers. The error rate can only go up.

OK, so here goes:

(defmacro unittmp (value unit)
;; helper to work around recursive
;; macro call. See unit2 below
;; for proper fix
(ecase unit
(kg (/ value 2.176470e-8))))
(defmacro unit (value unit &rest more)
(declare (ignore more))
(case unit
(m/s (/ value 2.99792458e+8))
(Js (/ value 1.054571800e-34))
(m (/ value 1.616229e-35))
(s (/ value 5.39116e-44))
(kg (/ value 2.176470e-8))
(:none value)
;; this gives an error:
(MeV/c^2 (unittmp (* value 1.78266191e-30) kg))

(EDITED: it has been pointed out to me that the recursive macro call I had there there doesn’t actually work in a straight compilation of the file. See below for a proper fix, which is to move the actual work into a function running at compile-time, “unit2”. The above still serves to illustrate the points about evaluation of macro argument right here)

So far so good, but this one chokes, and we are now getting into the domain of actually having to learn how Lisp macros work. The following error will look annoying to deal with, but in the next section of this document I will show how the properly causing this error can be used for further neat tricks.

OK, so the error is this (at compile time):

The value                                                                                                                                                                                                    
(* VALUE 1.7826618e-30)
is not of type
NUMBER
when binding SB-KERNEL::X

So what’s that? If you read the error message carefully you will realize that the expression (* value 1.7) has not been resolved to a number at the time that we use it in the (now recursive) macro. Evaluation here has been delayed. But this doesn’t work. The reason why this doesn’t work is:

  • we asked the compiler to do the actual arithmetic at compile time
  • but what we are passing to the arithmetic operator is one number, and a piece of code. Since we asked to do the math right now it can’t do it.

This is not the place to teach all about macro argument evaluation. It can be fixed for now with this:

(MeV/c^2 (* value (unittmp 1.78266191e-30 kg)))

Arithmetic with unit checking at compile time.

Let us recap that annoying error needing a workaround, and why I think the mechanism behind the error is a good thing:

This bombs:

(MeV/c^2 (unittmp (* value 1.78266191e-30) kg))

because (unit …) is a macro call, and (MeV/c² (unit (* value 1.78266191e-30) kg)) is not evaluated, so the math operator (* …), which we invoke at compile time, cannot work. It passes that entire code fragment. You can walk that code fragment.

OK, let’s play with this to understand:

;; to be defined
(defmacro unit-test ...)
(defun testfun2 ()
(unit-test 1 kg))

The macro unit-test will be invoked at compile time. That means when you compile the function testfun2, actions in the macro will be carried out. We will use this to debug using the only true debug facility that exists in the world — print statements. Or in the case of Lisp, format statements.

(defmacro unit-test (value unit &rest more)
(declare (ignore more))
(format t "~%value is '~a'~%" value)
(format t "unit is '~a'~%" unit)
value)
(defun testfun2 ()
(unit-test 1 kg))

Gives you, at compile time, into the same stdout as the compiler:

value is '1'                                                                    
unit is 'KG'

Before I go any further, I want to make sure you get this.

YOU HAVE THE ENTIRE LANGUAGE AT YOUR DISPOSAL AT COMPILE TIME!!!

The entire thing. You can use printf/format statements at compile time to print random data structures that float around in your half-compiled code to debug them. Seen such a facility with templates lately?

Anyway, so let’s see what happens with that evaluation thing:

(defmacro unit-test (value unit &rest more)
(declare (ignore more))
(format t "~%value is '~a'~%" value)
(format t "unit is '~a'~%" unit)
value)
(defun testfun3 ()
(unit-test (* 1 1) kg))

Output is:

; compiling (DEFUN TESTFUN3 ...)                                                
value is '(* 1 1)'
unit is 'KG'

Oops. There you have it. (* 1 1) is passed not as an evaluated number, it is passed as a code fragment. This is awesome. I mean not always, e.g. not at Friday night when this happens in deeply nested macros and you have to debug it.

But it exposes a very powerful mechanism:

(defmacro unit-test2 (value unit &rest more)
(declare (ignore more))
(format t "~%value is '~a'~%" value)
(format t "unit is '~a'~%" unit)
(when (listp value)
(dolist (element value)
(format t "list element is '~a'~%" element)))
value)
(defun testfun3 ()
(unit-test2 (* 1 1) kg))

Output is:

value is '(* 1 1)'                                                              
unit is 'KG'
list element is '*'
list element is '1'
list element is '1'

Woah. We can walk this code. We do not only have Turing-complete code walking, we have code walking that can use THE ENTIRE LANGUAGE at compile time.

(defmacro unit-test3 (value unit &rest more)
(declare (ignore more))
(format t "~%value is '~a'~%" value)
(format t "unit is '~a'~%" unit)
(when (listp value)
(dolist (element value)
(if (and (numberp element) (= element 42))
(format t "Looks like the answer to everything~%")
(format t "list element is '~a'~%" element))))
value)
(defun testfun3 ()
(unit-test3 (* 42 1) kg))

Output:

; compiling (DEFUN TESTFUN3 ...)
value is '(* 42 1)'
unit is 'KG'
list element is '*'
Looks like the answer to everything
list element is '1'

See, we can do whatever we want.

It isn’t limited to inspecting the code. Macros are there to make new code. So let’s try this.

(defmacro unit-test4 (value unit &rest more)
(declare (ignore more))
(when (listp value)
(dolist (element value)
(when (and (numberp element) (= element 42))
(return-from unit-test4 `(progn
(dotimes (i 4)
(format t "hello, world~%"))
,value)))))
value)
(defun testfun4a ()
(unit-test4 (* 41 1) kg))
(defun testfun4b ()
(unit-test4 (* 42 1) kg))

Running it:

Yes, Master? CL-USER> (testfun4a)
41
Yes, Master? CL-USER> (testfun4b)
hello, world
hello, world
hello, world
hello, world
42
Yes, Master? CL-USER>

Woah. We actually inserted new code into the function. Can we see what is going on? Sure:

Yes, Master? CL-USER> (macroexpand '(unit-test4 (* 41 1) kg))
(* 41 1)
T
Yes, Master? CL-USER> (macroexpand '(unit-test4 (* 42 1) kg))
(PROGN (DOTIMES (I 4) (FORMAT T "hello, world~%")) (* 42 1))
T
Yes, Master? CL-USER>

That is the most basic macroexpansion debugging there is. There are much more sophisticated macroexpand debugging facilities in the IDE’s, e.g. SLIME. Finely controlled evaluation until you can figure out what’s going on.

Now, how do we use this to our advantage?

Well, we can do arithmetic with unit checking at compile time.

(defmacro plus-with-units (val1 val2)
;; fancy code here
(+ val1 val2))
;; this should work
(defun testfun5a ()
(plus-with-units (unit 5 m/s) (unit 6 m/s)))
;; this should *not* work
(defun testfun5b ()
(plus-with-units (unit 5 m/s) (unit 6 m)))
;; this can be made to work later
(defun testfun5c ()
(plus-with-units (unit 5 m/s) (unit 6 km/h)))

OK, so what is the objective here?

  • if the units are available, they should be checked. In the first version for being equal, in the more fancy version for being compatible. Either way we want to catch errors.
  • we don’t want to spend an entire night implementing this.
  • the check should happen at compile time. The compiled code should have nothing except one compiled out number in Planck units.
(defmacro plus-with-units (val1 val2)
(let (firstunit)
(dolist (thing (list val1 val2))
(when (listp thing)
(if (not firstunit)
(setf firstunit (third thing))
(unless (equal firstunit (third thing))
;; print a clear error message. Not something people
;; need to copy into a web page to translate to human
(error "Incompatible units: ~a ~a~%"
firstunit (third thing)))))))
;; delay evaluation
`(+ ,val1 ,val2))
;; works:
(defun testfun5a ()
(plus-with-units (unit 5 m/s) (unit 6 m/s)))
;; error:
(defun testfun5b ()
(plus-with-units (unit 5 m/s) (unit 6 m)))

The error as displayed for the second test at compile-time is:

crachem.lisp:209:3:                                                                                                                                                                                               
error:
during macroexpansion of (PLUS-WITH-UNITS (UNIT 5 M/S) (UNIT 6 M)). Use
*BREAK-ON-SIGNALS* to intercept.

Incompatible units: M/S M
Compilation failed.

Allright. Looks useful.

So we have the full language at our disposal, at compile-time, with printf/format and all. And we can use that do give useful error messages at compile time. Just like in real code. It is really bad when your language forces you to use a different, crippled language at compile time.

I want to wind down this post here. As you can see, the (plus-with-units …) macro is not sophisticated, it would at least have to check that the first list element is indeed “unit”. Actually, no. It should integrate with the (unit…) macro.

The way to do this further is that you change the (unit …) macro to allow the programmer to use it to learn more about the macro call. Right now you only get the converted number out of the (unit …) call. The (unit …) call knows which unit was used, but you cannot ask it to give it to you. While we are at it, the (unit …) macro could also tell us what kind of unit that is (speed, weight etc).

We use multiple-value returns for this. A function in Common Lisp can return more than one value, and unless you deliberately capture them all but the first are ignored. We also want to convert the bulk of this macro into a function, because that is easier to debug. Did I mention you can define functions and use them at compile time, from macros?

Example implementation:

;; this unit knower returns three values:
;; - the converted value
;; - the unit
;; - what kind of unit is it?
(eval-when (:compile-toplevel)
(defun unit2-helper (value unit)
(let* (whatkind
(newvalue
(case unit
(m/s (setf whatkind 'speed) (/ value 2.99792458e+8))
(Js (setf whatkind 'energy-time) (/ value 1.054571800e-34))
(m (setf whatkind 'length) (/ value 1.616229e-35))
(s (setf whatkind 'time) (/ value 5.39116e-44))
(kg (setf whatkind 'mass) (/ value 2.176470e-8))
(:none (setf whatkind 'none) value)
(MeV/c^2 (setf whatkind 'mass)
(* value (unit2-helper 1.78266191e-30 'kg)))
(t (error "unknown unit ~a" unit)))))
(values newvalue unit whatkind))))
;; this is the dumb frontend you call from regular code
(defmacro unit2 (value unit &rest more)
(declare (ignore more))
(unit2-helper value unit))

(As you can guess, Common Lisp is hygienic. defun results are supposed to be available at run time. They are not supposed to pollute the compile-time environment with their symbols that are intended for runtime. Lisp gives us a way change that, with a “(eval-when …)” statement. That is actually required when you use a properly hygienic Lisp implementation.)

The basic macro (unit2…) behaves like (unit…) did before. That is what you use when writing ordinary code. But the new toy gives us the ability to know more about the (unit2 …) calls coming in. We can use that to improve things tremendously.

(defmacro plus-with-units2 (val1 val2)
(let (firstkind)
(dolist (thing (list val1 val2))
(when (and (listp thing) (equal (first thing) 'UNIT2))
(multiple-value-bind (newvalue unit whatkind)
(unit2-helper (second thing) (third thing))
(print whatkind)
(if (not firstkind)
(setf firstkind whatkind)
(unless (equal firstkind whatkind)
;; print a clear error message. Not something people
;; need to copy into a web page to translate to human
(error "Incompatible units: ~a ~a~%"
firstkind whatkind)))))))
;; delay evaluation until later in compilation
`(+ ,val1 ,val2))
;; this now works, the code recognizes that kg and MeV/c^2
;; are both units of the same kind - mass
(defun testfun6 ()
(plus-with-units2 (unit2 5 kg) (unit2 6 MeV/c^2)))

Make sure that everything happens at compile time:

Yes, Master? CL-USER> (disassemble 'testfun6)
; disassembly for TESTFUN6
; Size: 13 bytes. Origin: #x52E3B9B6
; B6: 488B15B3FFFFFF MOV RDX, [RIP-77]; no-arg-parsing entry point
; 2.297298e8
; BD: 488BE5 MOV RSP, RBP
; C0: F8 CLC
; C1: 5D POP RBP
; C2: C3 RET
NIL
Yes, Master? CL-USER>

Looking good.


Further work:

How do you do this with values that come in via files? Easy, you wrap a macros around the line-by-line file readers that declares which field has which unit. The unit is then checked once when the file is opened, but since it knows that the code block iterating over the lines doesn’t change units the check doesn’t have to happen again. The macro would insert the conversion math into the code of the line-iterating code body. So you can do something like:

(with-unit-file-field ("foo.txt" ((mass :column 1 :unit kg))
(+ *blah* mass)) ; variable mass is converted to Planck unit
;; if you are willing to walk the body of code:
(with-unit-file-field ("foo.txt" ((mass :column 1 :unit kg))
(+ *blah* (* (unit 8 kg) mass)))
;; that would throw an error if mass wasn't a mass unit

That evaluator to do algebra on the text of the units so that you can convert them automatically after having defined only the most basic units would be nifty to have. I want this to be a more generic algebra converter, though. So, for example I don’t want to resolve mathematical functions by several variables and then enter those functions into the source code. I want to enter the function once, then tell a macro to generate a couple of function resolved by various variables. My Scheimpflug code for shift-tilt photography isn’t writing itself — but it should, and it could, in Lisp.

If you are curious what we are doing with Lisp, here is the latest demo (chemistry):