Static type checking in the programmable programming language (Lisp)

Martin Cracauer
11 min readMar 30, 2019

--

There is a misconception that Lisp is a dynamically typed language and doesn’t offer compile-time type checking. Lisp being the programmable programming language that is of course not true.

(defun meh (p1)
(declare (fixnum p1))
(+ p1 3))
(defun meh2 (p1)
(declare (fixnum p1))
(+ p1 "8"))

Gets you this:

2 compiler notes:typecheck-demo.lisp:7:3:                                                        
note: deleting unreachable code
warning:
Constant "8" conflicts with its asserted type NUMBER.
See also:
SBCL Manual, Handling of Types [:node]
Compilation failed.

You can declare pretty much all the things that you would in regular statically typed languages. Function interface:

(defun meh3a (p1)
(+ p1 3))
(declaim (ftype (function (fixnum) t) meh3a))
(defun meh3b ()
(meh3a "moo"))
==>
2 compiler notes:
typecheck-demo.lisp:13:3:
note: deleting unreachable code
warning:
Constant "moo" conflicts with its asserted type FIXNUM.
See also:
SBCL Manual, Handling of Types [:node]
Compilation failed.

Or you can pull the declaration in front so that the definition of first function is already checked against the type declaration when the function is defined:

(declaim (ftype (function (string) t) meh4a))
(defun meh4a (p1)
(+ p1 3))
(defun meh4b ()
(meh4a "moo"))
2 compiler notes:
typecheck-demo.lisp:18:3:
note: deleting unreachable code
warning:
Derived type of P1 is
(VALUES STRING &OPTIONAL),
conflicting with its asserted type
NUMBER.
See also:
SBCL Manual, Handling of Types [:node]
Compilation failed.
;; note that this is the first function failing
;; the second one compiles fine.

Now, obviously this isn’t the most convenient syntax. But that is easy to fix in a programmable programming language that has compile-time computing.

So what we want for this demo is the ability to put the types of the arguments directly into the argument list. A function definition would look like this:

(defunt meh5c ((int p1) (int p2))
(+ p1 p2))
(meh5c 1 2) ; ==> 3

So we have a new name instead of defun and we want to write one piece of compile time computation that takes care of both the actual defun and declaring the types. After we write that macro we just want to write functions in the new syntax.

Here is a macro that does that:

(defmacro defunt (name (&rest args) &body body)
"defun with optional type declarations"
`(progn
(declaim (ftype
(function
,(let (declares)
(dolist (arg args)
(push
(if (listp arg)
(if (equalp (string (first arg)) "int")
'fixnum
(first arg))
t)
declares))
declares)
t) ,name))
(defun ,name
,(loop for arg in args
collect
(if (listp arg)
(second arg)
arg))
,@body)))

I’ll explain a bit more in a bit, let’s just test this thing:

;; simple use with no type declaration
(defunt meh5a (p1 p2)
(+ p1 p2))
;; one declared type, not the other
(defunt meh5b ((int p1) p2)
(+ p1 p2))
;; make sure this works
(defun meh5btest (p1)
(+ p1 "8"))
(defunt meh5c ((int p1) (int p2))
(+ p1 p2))

Compiling meh5test results, as expected, in

typecheck-demo.lisp:81:3:                                                       
note: deleting unreachable code
warning:
Constant "8" conflicts with its asserted type NUMBER.
See also:
SBCL Manual, Handling of Types [:node]
Compilation failed.

Okay, so you wrote that macro once and from then on you can just use it. The macro looks a bit hard to read for programmers who are not yet used to Lisp. I can tell you that I wrote that macro, instead this entire post, while waiting on a flight going from the AmberMD dev meeting to the European Lisp Symposium.

It’s quick. And there are competent debugging facilities. I will demonstrate the debugging later.

Being able to debug compile time computing is critical. Can you imagine single-stepping through a C preprocessor macro, or through C++ template evaluation? Or use debugging printf (otherwise known as the only real debugger) in C or C++ at compile time?

Here is the same macro with some comments:

(defmacro defunt (name (&rest args) &body body)
"defun with optional type declarations"
;; backtick enters "code echoing", do not
;; evalluate when macro is run
`(progn
;; this macro emits two definitions to
;; the compiler:
;; 1 - the declaim type declarations
;; 2 - the defun
(declaim (ftype
(function
;; comma inside backtick means
;; "execute when macro is run"
,(let (declares)
(dolist (arg args)
(push
(if (listp arg)
;; also translate for C people, but case
;; independent
(if (equalp (string (first arg)) "int")
'fixnum
(first arg))
t)
declares))
;; return this list, which is integrated
;; into the code. You see that is why we
;; have so many parenthesis. Because what
;; is a passive data list here turns into
;; code, without any reformatting.
;; Read below for a step-by-step explanation
;; of what is going on
declares)
t) ,name))
(defun ,name
;; use the loop macro here for the same purpose we
;; manually collected the args with dolist and
;; push above
,(loop for arg in args
collect
(if (listp arg)
(second arg)
arg))
,@body)))
;; the result of the above macro is then fed into the compiler.
;; So we use three levels of evaluation time here:
;; 1 - when the macro is run
;; 2 - the output of the macro call, which is fed into the compiler
;; 3 - run time, when you actually call the resulting function

The concept behind this is:

  • a single macro call can issue several new statements. We use this to emit both the declare and the defun, from a single user-issued definition
  • inside the macro you can control which code runs at macro call time and what gets emitted to the compiler. Keep in mind that you have the full language at your disposal at both times. One reason why these macros can be a bit hard to read is that you explicitly need to change between the different evaluation times, and the code has the same syntax. A C preprocessor macro or a C++ template use different languages at compile and run time, so it is a bit more clear what is evaluated when. Of course you cannot use your usual library at compile time like you can in Lisp
  • unless you do say otherwise the code inside the macro runs at macro expansion time. The return value is what is fed into the compiler. The return value much be a list, a list of Lisp language statements. That is why in Lisp you have to use list syntax for code. Got that? It is important. You construct this nested-parenthesis thing that is a list at compile time, and you feed it into the compiler, so the list becomes code.
  • a tick (‘) or a backtick (`) leaves things as lists and does not evaluate it at macro call time. That is how you return a list (which is data, not evaluated) from the macro (which is then fed into the compiler).
  • A comma (,) can be used inside a backticked (`) block to switch evaluation time back to macro call time. Your comma block also returns lists, and they become integrated into the backticked block.
  • The ,@ construct removes one list nesting, so it turns ((foo bar)) into (foo bar). You often need that when you return collections of items from the macro’s own variables or comma block. This is getting into the depths of macros and some obscure syntax, but it is not particularly hard to get right given the debug tools.

So, if you want to write such macros:

  • get an editor that has auto-indentation for Lisp code. It also needs to show you matching open parenthesis when you type a closing parenthesis. No Lisp programmer positions those parenthesis by hand. This is important. Trying to count these things by hand is maddening.
  • have some books ready. Paul Grapham’s “On Lisp” is a free book that explains Lisp macros really well. Going along with this blog post alone is probably tough

Debugging

Let us focus on the hack we introduced into the macro to please C programmers by accepting “int” as a type specifier. Those hacks are always the beginning of interesting debugging sessions.

(defmacro defunt (name (&rest args) &body body)
"defun with optional type declarations"
`(progn
(declaim (ftype
(function
,(let (declares)
(dolist (arg args)
(push
(if (listp arg)
;; <== HACK HERE, with typo
(if (equalp (string (first arg)) "intt")
'fixnum
(first arg))
t)
declares))
declares)
t) ,name))
...)

As you can see, comparing the user string involved a careful choice of conversions and the right comparison operator. Let’s say you have a problem in that the comparison never matches and you want to debug.

First thing is macroexpand:

(macroexpand '(defunt meh5 ((int p1) p2) (+ p1 p2)))
==>
(PROGN
(DECLAIM (FTYPE (FUNCTION (T INT) T) MEH5))
(DEFUN MEH5 (P1 P2) (+ P1 P2)))

OK, so that doesn’t tell us much. It says that the “int” wasn’t recognized, but we don’t know why.

Now, since we have the entire language at our disposal at compile time we can use debugging print:

(defmacro defunt (name (&rest args) &body body)
"defun with optional type declarations"
`(progn
(declaim (ftype
(function
,(let (declares)
(dolist (arg args)
(push
(if (listp arg)
;; <== HACK HERE, with typo
(if (equalp (string (print (first arg))) "intt")
'fixnum
(first arg))
t)
declares))
declares)
t) ,name))
...)

So we just put a (print …) around the value we want to know about. It will be printed at macroexpand time. (print …) returns its own argument, so you can just insert it into the program flow.

More extensive:

(format t "~%debug args: ~a ~a" arg (type-of arg))
(if (listp arg)
;; <== HACK HERE, with typo
;;(if (equalp (string (print (first arg))) "intt")

That’s using an actual printf statement and gives you both the value and the type of the argument.

(macroexpand '(defunt meh5 ((int p1) p2) (+ p1 p2)))
==>
debug args: (INT P1) CONS
debug args: P2 SYMBOL
;; expansion follows

So you can annotate the flow of the macro as you see fit. You have the full language.

This printing at macro call time does not influence the result in any way. Whether you have debug statement in the macro or not, the result is properly fed into the compiler, while the debug print statements just go to stdout or stderr or debugerr or whatever you specified.

In fact your can also open a file at compile time and put all your print statements from macroexpansion into it, for later reference.

(defvar *expandlog* nil)
(setf *expandlog*
(open "expand.log" :direction :output
:if-does-not-exist :create
:if-exists :append))
(defmacro defunt (name (&rest args) &body body)
"defun with optional type declarations"
`(progn
(declaim (ftype
(function
,(let (declares)
(dolist (arg args)
(format *expandlog*
"~%debug args: ~a ~a"
arg (type-of arg))
(push
(if (listp arg)

You can do that a million different ways. For example if you only want the log around some macro invocations:

(defvar *expandlog* nil)
(defmacro defunt (name (&rest args) &body body)
;; as above)
;; use macro without logfile printing
(defunt2 meh6a (p1 p2)
(+ p1 p2))
;; use macro with logfile printing
(setf *expandlog*
(open "expand.log" :direction :output
:if-does-not-exist :create
:if-exists :append))
(defunt2 meh6b ((int p1) p2)
(+ p1 p2)))
(close *expandlog*)
;; important. (format nil ...) works fine, print nothing
;; (format closedfd ...) does not work
(setf *expandlog* nil)

file “expand.log” contains:

debug args: (INT P1) CONS
debug args: P2 SYMBOL

Which is only the output from the second call to meh6b, which is what we want.

It would be better style to put a (if *expandlog* (format *expandlog*…) in the defunt2 macro. (format nil …) works, but does the formatting and then sends it to /dev/null. The formatting can cost time. Keep in mind that this time is taken not at runtime, but at macroexpand time. The user wouldn’t suffer from the slowdown, just the lazy programmer.

Since (format meh …) does not work when meh is neither an open, working file descriptor nor nil nor t, you should guard this in a block that ensures that some statements happen even if nonlocal exit happens. This is equivalent to :finally clauses in other languages exception handling:

(unwind-protect
(progn
;; protected body
(setf *expandlog*
(open "expand.log" :direction :output
:if-does-not-exist :create
:if-exists :append))
(defunt2 meh6b ((int p1) p2)
(+ p1 p2)))
(progn
;; these statements will be executed no matter what.
;; If the protected body has a nonlocal exit (throws
;; an exception through here, or if you interrupt
;; it with Control-C then these statements are still
;; executed like in a "finally" clause
(close *expandlog*)
(setf *expandlog* nil)

So, yes, you have the full language at compile time, even exception handling at compile time.

A quick look at performance

(declaim (ftype (function (fixnum fixnum) fixnum) moo1)
(inline moo1))
(defun moo1 (p1 p2)
(+ p1 p2))
(defun caller1 ()
(let ((n 42)
(new (+ (moo1 1
;; disable compiler optimization
(the fixnum
(parse-integer "2"))))))
(declare (fixnum n new))
(if (= new 45)
(print 'yes)
(print 'no))))

Disassemble it:

Yes, Master? CL-USER> (disassemble 'caller1)

; disassembly for CALLER1
; Size: 78 bytes. Origin: #x52E3FF56
; 56: 4883EC10 SUB RSP, 16 ; no-arg-parsing entry point
; 5A: 488B158FFFFFFF MOV RDX, [RIP-113] ; "2"
; 61: B902000000 MOV ECX, 2
; 66: 48892C24 MOV [RSP], RBP
; 6A: 488BEC MOV RBP, RSP
; 6D: E8065BE6FF CALL #x52CA5A78 ; #<FDEFN PARSE-INTEGER>
; 72: 4883C202 ADD RDX, 2
; 76: 4883FA5A CMP RDX, 90
; 7A: 7514 JNE L0
; 7C: 488B157DFFFFFF MOV RDX, [RIP-131] ; 'YES
; 83: B902000000 MOV ECX, 2
; 88: FF7508 PUSH QWORD PTR [RBP+8]
; 8B: E94831EBFF JMP #x52CF30D8 ; #<FDEFN PRINT>
; 90: L0: 488B1579FFFFFF MOV RDX, [RIP-135] ; 'NO
; 97: B902000000 MOV ECX, 2
; 9C: FF7508 PUSH QWORD PTR [RBP+8]
; 9F: E93431EBFF JMP #x52CF30D8 ; #<FDEFN PRINT>
NIL
Yes, Master? CL-USER>

About what we expected. Ignore the function calls to convert the integer and look at the actual arithmetic. Looks good.

Note that although SBCL normally uses tag bits to identify integers it has omitted type tagging in this situation since none of the integers involved escape this function. So they are raw machine works, no tags. If they came in as parameters or going out as return values they would have to be tagged.

This can be very important, and I mean business building imports, when your Lisp code wants to access C data in a memory mapped region. Say:

struct foo {
int bar;
int baz;
} array[1024];

SBCL and Clasp can access those slots in their structure instances inside that array via a simple pointer deference and then using the machine word. There is no conversion, and — more importantly- there is no need to create any sort of “foreign object” wrapper that would have to go through memory allocation and whatnot. It is very rare for languages to be able to access raw data in memory regions this way. In most cases that plan to “write the low level parts in C and the higher level algorithms in <foolang>” breaks down because of the need to convert the C data, in most cases involving something that cause memory allocation. That’s very slow. A malloc call, or a GC equivalent alloc and dealloc is incredibly much more expensive than a pointer deference and arithmetic on a raw word.

You cannot make those mixed language systems the way that ITA did with QPX (C data, Lisp algorithm) without overhead-free raw word access. Even having to go through a function call (as opposed to a simple dereference) puts you back by an order of magnitude.

--

--

Martin Cracauer

Writing about programmer productivity, teambuilding and enabling a wide variety of different engineering personalities. CTO at www.thirdlaw.tech #lisp #freebsd