Code is Data (reasoning)
Roman Liutikov

Some comments.

Lisp source code is not really an Abstract Syntax Tree, since it does not represent much information about the syntactic categories of the Lisp code. Lisp code is represented as s-expressions, but the actual Lisp syntax is defined on top of s-expressions. Example: names are represented as symbols, but the s-expressions do not record whether these symbols denote functions, macros, special operators, variables, classes, tags, restarts, … or just data. Nor do they record what the sub-expressions of a Lisp form actually mean: are they function calls, parameter lists, forms which are interpreted by a macro, …? Nor do they record whether the code is syntactically correct: not all s-expressions are valid Lisp code.

All this information needs to be (re-)constructed by an interpreter, a compiler or a tool like a code walker. A code walker is a tool which understands the actual Lisp syntax, including macros, and which can, for example, compute the various syntactic categories for the source code. Example: is that symbol in the code used as a macro?

Usually some of this information would be part of an AST, which is the output of a parser and thus (usually) represents syntactically correct code, parsed with knowledge of the programming language’s syntax and annotated with categories; see your JavaScript example above. The AST would tell me the control structure and its parts, it would tell me the variables, etc. The AST might not tell me directly where a variable has been defined, but it would tell me directly that it is used as a variable.
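To make the code walker idea concrete, here is a minimal sketch in Python, with s-expressions modeled as nested lists and symbols as strings. The supported subset (quote, let, function calls), the MACROS set and the category names are all assumptions made for illustration; a real walker would consult the Lisp environment and expand macros. The form corresponds to (let ((list 1)) (list list '(list x))), where the one symbol list turns up as a binding, a function, a variable and plain data.

```python
# Hypothetical: names a real walker would look up in the macro environment.
MACROS = {"when"}

def walk(form, bound=frozenset()):
    """Yield (symbol, category) pairs for a toy Lisp subset."""
    if isinstance(form, str):
        if form.isdigit():
            yield (form, "literal")
        else:
            yield (form, "variable" if form in bound else "free variable")
        return
    head, *args = form
    if head == "quote":
        # The argument of quote is data, not code, so it is not walked.
        yield (head, "special operator")
        return
    if head == "let":
        yield (head, "special operator")
        bindings, *body = args
        names = {name for name, _ in bindings}
        for name, init in bindings:
            yield (name, "binding")
            yield from walk(init, bound)          # init forms see the outer scope
        for sub in body:
            yield from walk(sub, bound | names)   # body sees the new bindings
        return
    if head in MACROS:
        # Only the macro itself knows what its sub-forms mean; this sketch stops here.
        yield (head, "macro")
        return
    yield (head, "function")
    for arg in args:
        yield from walk(arg, bound)

form = ["let", [["list", "1"]], ["list", "list", ["quote", ["list", "x"]]]]
categories = list(walk(form))
```

None of the categories in the result are present in the nested list itself; the walker has to know the syntax of let and quote to compute them.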

In Lisp with macros, this ‘programming language grammar’, which a parser would use, is not fixed and can be incrementally extended by the programmer, in some implementations even at runtime. This makes parsing even more challenging.
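A rough sketch of what an extensible grammar means in practice, again in Python with nested lists standing in for s-expressions. The registry MACROS, the helper defmacro and the unless macro are hypothetical names for this illustration, and the expander is deliberately naive: it recurses into every sub-form, which a real expander must not do (for example inside quote).

```python
MACROS = {}  # hypothetical registry: macro name -> expander function

def defmacro(name, expander):
    """Register a new macro; this can happen at any time, even while running."""
    MACROS[name] = expander

def macroexpand(form):
    """Expand the head while it names a macro, then expand the sub-forms."""
    while (isinstance(form, list) and form
           and isinstance(form[0], str) and form[0] in MACROS):
        form = MACROS[form[0]](*form[1:])
    if isinstance(form, list):
        return [macroexpand(sub) for sub in form]
    return form

form = ["unless", "done", ["launch"]]

# Before the definition, the form just looks like a call to a function named unless.
before = macroexpand(form)

defmacro("unless", lambda test, *body: ["if", test, "nil", ["progn", *body]])

# After the definition, the very same data is a different piece of syntax.
after = macroexpand(form)
```

The same s-expression has two different readings depending on when it is processed, which is why a fixed, ahead-of-time grammar cannot describe it.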

Note also that there are Lisp variants which provide macros but don’t use s-expressions in the surface syntax. It’s only necessary to have a transformation from the surface syntax to some kind of convenient data representation.

Macros are not necessarily run at compile-time. A Lisp interpreter usually does the macro expansion at runtime, possibly every time the form is executed.
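As an illustration of expansion at runtime, here is a toy evaluator in Python that expands a macro call each time it meets one and keeps no cache. The double macro, the stats counter and the single supported operator + are assumptions for the sketch; it does not model any particular Lisp interpreter.

```python
stats = {"expansions": 0}  # counts how often the macro function runs

def expand_double(arg):
    """Hypothetical macro: (double x) expands to (+ x x)."""
    stats["expansions"] += 1
    return ["+", arg, arg]

MACROS = {"double": expand_double}

def evaluate(form, env):
    """Evaluate a toy form directly from its nested-list representation."""
    if isinstance(form, int):
        return form
    if isinstance(form, str):
        return env[form]
    head, *args = form
    if head in MACROS:
        # Expansion happens here, at run time, on every execution of the form.
        return evaluate(MACROS[head](*args), env)
    if head == "+":
        return sum(evaluate(arg, env) for arg in args)
    raise ValueError(f"unknown operator: {head}")

body = ["double", "x"]
results = [evaluate(body, {"x": n}) for n in range(3)]
```

Running the same form three times calls the macro function three times. An interpreter could cache the expansion, but nothing forces it to.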

Nor is the base language of Lisp (without macros) just ten forms, with everything else being a macro. Both Scheme and Common Lisp have more special operators. Common Lisp has 25, which is still not many, and it has no user-accessible way to define new special operators. Note that many of the language macros are not simply reducible to just those special operators: they may use internal special operators and internal functions. Both Scheme and Common Lisp also have zillions of functions which are not macros and are not based on macros. Many of those functions are primitives in the language: they need to be implemented in the runtime somehow, or are, for example, provided as processor instructions of the underlying machine.

Code as data for Lisp basically means:

  • there are rich data structures which represent source code internally (as data) and externally (as text): numbers, symbols, nested lists, vectors, characters, strings.
  • the execution engine uses these data structures to process source code, both in interpreters and in compilers.
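The two points above can be sketched with a tiny reader and printer in Python. This is a simplification under stated assumptions: only lists, integers and symbols are handled (no strings, characters or vectors), symbols are modeled as Python strings, and the input is assumed to be well-formed.

```python
def tokenize(text):
    """Split s-expression text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Turn a token list into nested Python lists (the internal representation)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return items
    try:
        return int(token)
    except ValueError:
        return token  # anything else is treated as a symbol

def write(form):
    """Turn the internal representation back into text (the external one)."""
    if isinstance(form, list):
        return "(" + " ".join(write(sub) for sub in form) + ")"
    return str(form)

text = "(defun square (x) (* x x))"
form = read(tokenize(text))

# The source is now ordinary data and can be edited with ordinary list operations.
renamed = [form[0], "sq"] + form[2:]
```

Here form is an ordinary list, ["defun", "square", ["x"], ["*", "x", "x"]], and write(form) gives back the original text.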

Still, there are differences between an interpreter and a compiler: an interpreter-based Lisp implementation has much more access to the s-expression representation of the source code and can thus much more easily support introspective and meta-programming features, including self-modifying code. The interpreter executes Lisp code directly, without compiling it.

Macros as used in Lisp are usually defined in such a way that they can be executed by both an interpreter and a compiler. That’s why, in Lisp, they have replaced other forms of code transformation which have more problems with compilation to efficient code. See: http://www.nhplace.com/kent/Papers/Special-Forms.html
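A small Python sketch of that property: one macro function, written once as a list-to-list transformation, is used unchanged by a toy interpreter and a toy compiler. The square macro, the PRIMITIVES table and the choice of Python closures as the compilation target are assumptions for illustration only.

```python
import operator
from functools import reduce

PRIMITIVES = {"+": operator.add, "*": operator.mul}

# Hypothetical macro: (square x) expands to (* x x). It only transforms data,
# so it does not care who calls it.
MACROS = {"square": lambda arg: ["*", arg, arg]}

def interpret(form, env):
    """Walk the nested lists directly, expanding macros as they are met."""
    if isinstance(form, int):
        return form
    if isinstance(form, str):
        return env[form]
    head, *args = form
    if head in MACROS:
        return interpret(MACROS[head](*args), env)
    if head in PRIMITIVES:
        return reduce(PRIMITIVES[head], (interpret(arg, env) for arg in args))
    raise ValueError(f"unknown operator: {head}")

def compile_form(form):
    """Translate a form into a Python closure, expanding macros once, up front."""
    if isinstance(form, int):
        return lambda env: form
    if isinstance(form, str):
        return lambda env: env[form]
    head, *args = form
    if head in MACROS:
        return compile_form(MACROS[head](*args))
    if head in PRIMITIVES:
        op = PRIMITIVES[head]
        compiled = [compile_form(arg) for arg in args]
        return lambda env: reduce(op, (run(env) for run in compiled))
    raise ValueError(f"unknown operator: {head}")

form = ["+", ["square", "x"], 1]
interpreted = interpret(form, {"x": 4})
compiled = compile_form(form)({"x": 4})
```

Both paths give the same result, and in the compiled path no macro code is left to run once the closure has been built.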