Language Wheel — language engineering for everyone

Shouldn’t prototyping language constructs be as easy as choosing colors on a color wheel?

After teaching a university course on domain-specific languages (http://dsl-course.org), I found myself puzzled by two questions:

  • how to get more people interested in the rapidly developing field of DSLs and language-oriented programming, and
  • how to better explain all the intricacies of language implementation?

Let’s focus on the second point. Who would have fun implementing a grammar for expressions, with all the left-factoring and operator precedence, not to mention the type system? It seems much more rewarding to focus on the constructs that constitute your language, that is, the ones that make your language specific. After all, expressions are present in many languages and one would usually expect them to be the same everywhere. (An exception here is MUMPS, where there is no priority of operations, but that’s really an exception.) One could argue that the language of expressions could be embedded into another language (for example, JetBrains MPS enables this), thus removing the need to re-implement expressions over and over again. That’s certainly true, but such a solution would obscure the very idea of how expressions are implemented; after all, that is what I am focusing on in my course: students have to be able to implement languages from scratch.
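The MUMPS remark can be made concrete with a small sketch. The operator table and the evaluate function below are hypothetical names introduced only for illustration; the code evaluates the same flat expression once with the usual priorities and once strictly left to right.

```python
import operator

# Hypothetical operator table: symbol -> (priority, function).
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
}

def evaluate(tokens, use_priority=True):
    """Evaluate a flat token list such as [2, "+", 3, "*", 4]."""
    def parse(pos, min_priority):
        left = tokens[pos]
        pos += 1
        while pos < len(tokens):
            priority, fn = OPS[tokens[pos]]
            if not use_priority:
                priority = 1  # every operator binds equally, as in MUMPS
            if priority < min_priority:
                break
            # Left-associative: the right operand may only contain
            # operators that bind more tightly than the current one.
            right, pos = parse(pos + 1, priority + 1)
            left = fn(left, right)
        return left, pos

    value, _ = parse(0, 1)
    return value

print(evaluate([2, "+", 3, "*", 4]))                      # 14
print(evaluate([2, "+", 3, "*", 4], use_priority=False))  # 20
```

Even this toy version needs a recursive helper and a priority table, which is the kind of detail that distracts from the constructs specific to your own language.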

Scratch — a LEGO of programming

Now about the first question I asked in the beginning. Let’s rephrase it from “how to get people more interested in language-oriented programming” to “how to make people less scared of it”, as a first iteration. Many still hold the old view that designing and implementing a new language requires a lot of knowledge about parsing and LALR(1) grammars and at least a doctoral degree :-) The story was similar with programming when it first appeared, wasn’t it?

Pupils doing programming at an Hour of Code event organized by Turku Centre for Computer Science (Finland).

A lot of effort is made nowadays to simplify programming for lay programmers, trying to make programming intentional and “codeless”. Think about Scratch, where building a program resembles playing with a LEGO construction set.

Scratch is an example of a projectional editor.

But what about language engineers themselves? Is there a Scratch-like tool for them? Could a new language be built in a LEGO-way?

I recently attended LangDev’18 in Amsterdam, where R. Willems talked about Miksilo, a language workbench that lets you create languages by mixing existing languages and building on top of them. I liked the idea very much, but using Miksilo still requires quite a bit of previous experience with the technologies around language construction.

Excel — programming where code is hidden

Everyone has had some experience with spreadsheets. You have a bunch of cells, you enter some fixed values into some of them, and for some cells you define a formula that can depend on the values of other cells in the spreadsheet. The “trick” is that the cells with formulas still look like “normal” cells, and the “code” (that is, the formula you defined for a cell) is only visible when you inspect the cell. This is M. Fowler’s famous example of illustrative programming: you see the immediate result of the computation, and not the program (which is “abstract”).

Cell F5: its illustration (value 5) and its program (=F3+F4).
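The split between a cell's illustration and its program can be sketched in a few lines. This is only a toy model: the Sheet class is hypothetical, it supports nothing but sums of cell references, and the values 2 and 3 for F3 and F4 are assumptions chosen so that F5 shows 5.

```python
import re

class Sheet:
    def __init__(self):
        # cell name -> number, or a formula string starting with "="
        self.cells = {}

    def program(self, name):
        """What you see when you inspect the cell."""
        return self.cells[name]

    def value(self, name):
        """What the cell displays: the illustration."""
        content = self.cells[name]
        if isinstance(content, str) and content.startswith("="):
            # Only sums of cell references are supported in this sketch.
            refs = re.findall(r"[A-Z]+[0-9]+", content)
            return sum(self.value(ref) for ref in refs)
        return content

sheet = Sheet()
sheet.cells.update({"F3": 2, "F4": 3, "F5": "=F3+F4"})

print(sheet.value("F5"))    # 5
print(sheet.program("F5"))  # =F3+F4
```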

Now think about defining concrete syntax for a certain language construct, say, an if statement. In BNF notation:

IfStatement ::=
"if" "(" BooleanExpression ")"
StatementsBlock
"else"
StatementsBlock

Pretty abstract, right? Compare now with an illustrative definition of if:

IfStatement ::=
"if" "(" 2 > 5 ")"
"{"
// statement 1
// ...
// statement N
"}"
"else"
"{"
// statement 1
// ...
// statement N
"}"

Isn’t this exactly how you would first think about an if statement, by thinking of an example of one? Only then do you lift the abstraction level to BooleanExpression and StatementsBlock in BNF (in fact, not even BooleanExpression but just Expression; you’ll have to check the types later, but that’s a separate story). That’s the moment when you introduce meta.

Let’s now do some pretty-printing and remove all the quotation marks around keywords. Voilà: what we get is exactly what we expect users of our language to write, an if statement. We are now on the same meta-level as our users. Could this be called model-free modeling, illustrative modeling or non-meta meta? Can we manage at all without meta? Can we hide it (foremost, from ourselves) as we hide formulas in Excel? And even if we can, should we? Does it make sense? These are the big questions.
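The pretty-printing step is mechanical enough to sketch. The snippet below is an illustration only: it takes a shortened copy of the illustrative if definition and strips the quotation marks, leaving text a user of the language could have typed. The names ILLUSTRATIVE_IF and pretty_print are hypothetical.

```python
import re

# A shortened copy of the illustrative definition from the text.
ILLUSTRATIVE_IF = '''"if" "(" 2 > 5 ")"
"{"
// statement 1
"}"
"else"
"{"
// statement 1
"}"'''

def pretty_print(definition):
    """Drop the quotation marks around keywords and punctuation."""
    return re.sub(r'"([^"]*)"', r"\1", definition)

print(pretty_print(ILLUSTRATIVE_IF))
```

The first printed line is `if ( 2 > 5 )`, followed by the braces and comments exactly as written in the definition.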


Introducing Language Wheel

I will now describe my ideas for a language workbench that would allow one to define new languages by focusing on the constructs a language has. After all, this is exactly what you intend to do (recall intentional programming, declarative programming and illustrative programming all at once).

A wheel. Doesn’t have anything to do with the content of this post :-)

A new language workbench?

What’s the first question when thinking about creating a new language workbench in the era of web/cloud applications? ;-) Should it be web-based or not? My answer to this question is “Yes!” Think about Skype: do you want to bother installing and updating it, or would you prefer a browser version? Why not define new languages on an iPad?

The second question is: what’s your target audience? I am focusing here on developers without any previous experience with language engineering, and I don’t expect them to have dependent types or covariance in their languages. Focusing on beginners is my deliberate choice.

Finally, the third question is: what about bootstrapping? Would you define your language workbench in your language workbench? I decided to develop what I call a front-end language workbench: it will only be used to define a language, and then a semantically equivalent definition of the language (with validations, type system, code generation) will be generated for one of the major existing workbenches (Xtext, JetBrains MPS, Spoofax). It is also my deliberate choice to “pass” a language into the hands of an established industry-level workbench.


Language Wheel is a work in progress, and I will include several screenshots of the current early-prototype implementation, which I built using Vaadin Framework. A lot of the functionality discussed below is not implemented yet, and many things are omitted. My goal is mainly to present the idea behind Language Wheel and get feedback on it.


What does a language have?

Again, my focus is on beginners. The languages that I expect users of Language Wheel to create are, in a way, simple. If we allow some over-simplification, what does any language have?

  • constructs (if statement, for loop statement, variable declaration, …)
  • type system (do you want to distinguish between integer and float, or is having just a number fine? or are you fine without types at all? that’s also a valid type system!)
  • flavor (does your language look like Java/Pascal/COBOL?)

Flavor

Let’s start with flavor, a characteristic of a language that is often overlooked. (I use the term “characteristic” rather than “property” or “feature” because the latter will have a certain semantics w.r.t. concepts in a language.) If you want your language to look like Java, wouldn’t it be nice to have the concrete syntax of all constructs of your language converted to a Java-like representation? Language flavor could affect all aspects of a language, including what kind of type system it has, what an expression is and how it should behave (recall the MUMPS example I mentioned in the beginning). The notion of language flavor was the motivation for the name Language Wheel: you first choose the flavor of the language.

A (very) informal classification of language syntaxes I’ve done.

Concepts

Now about constructs. Are expressions and identifiers constructs of a language? Technically, they are, but the term “construct”, in my opinion, refers more to things like an if statement. That’s why I suggest we forget about constructs of a language and talk about concepts of a language. Moreover, this is exactly the terminology used in JetBrains MPS, and I decided to adopt it in Language Wheel.

There are three kinds of concepts in Language Wheel:

  • built-in concepts (there are only 3 of them: identifier, expression, statement)
  • predefined concepts (“standard” language concepts as, for example, in here and here)
  • custom concepts (concepts defined by the user)

Adding a predefined concept to a language in Language Wheel.

Built-in concepts are really special and we will return to them a bit later. Let’s now focus on custom concepts. (Predefined concepts can be considered a library of custom concepts shipped with Language Wheel.)

A concept encapsulates both abstract and concrete syntax, much like a rule in Xtext does:

Entity :
    'entity' name = ID '{'
        fields += Field*
    '}'
;

At the level of concrete syntax, this Xtext rule defines that an entity declaration should start with the keyword entity, followed by an identifier, a curly bracket and so on.

At the level of abstract syntax, this rule specifies that an entity has two features: a name and a (list of) fields. If we abstract a bit further, we can skip the detail that fields is a list, and just talk about the feature fields, independently of any implementation.
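To make the abstract-syntax half of the Entity rule tangible, here is a minimal sketch of the two features as plain data classes. The Field class with a single name attribute is an assumption, since the Field rule itself is not shown, and to_text is a hypothetical helper that puts the concrete syntax back around the features.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Field:
    # Assumed shape: the Field rule is not given in the text.
    name: str

@dataclass
class Entity:
    name: str                                          # feature "name"
    fields: List[Field] = field(default_factory=list)  # feature "fields"

def to_text(entity):
    """Wrap the features in the concrete syntax of the Xtext rule."""
    body = "".join(f"  {f.name}\n" for f in entity.fields)
    return f"entity {entity.name} {{\n{body}}}"

person = Entity(name="Person", fields=[Field("firstName"), Field("age")])
print(to_text(person))
```

The data classes know nothing about the keyword entity or the curly brackets; those live only in to_text, which is the separation the Xtext rule expresses in a single place.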

Definition of concept Entity in Language Wheel.

In Language Wheel, each concept of a language specifies:

  • features
  • projection
  • translation to target language

In terms of user interface, a concept is represented as a card. The features area is at the top of the card, and the rest of the area is split horizontally between projection and translation. In the screenshots in this post, the translation area is missing.

Each feature has a name and a type, for example, condition: expression, or true-branch-statements: statement. The cardinality of a feature is not specified here: it is left to the projection. This is in contrast to, for example, Xtext and JetBrains MPS.

When a feature is created in Language Wheel, its cardinality is not specified.
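A concept card as described above could be modelled roughly as follows. This is a sketch under assumptions: the class and attribute names are mine, not taken from the Language Wheel prototype, and the projection is simplified to lines of plain strings.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Feature:
    name: str
    type: str  # "identifier", "expression", "statement", or a concept name
    # Deliberately no cardinality here: it is left to the projection.

@dataclass
class Concept:
    name: str
    features: List[Feature] = field(default_factory=list)
    projection: List[List[str]] = field(default_factory=list)  # lines of elements
    translation: str = ""  # empty, like the missing area in the screenshots

if_statement = Concept(
    name="IfStatement",
    features=[
        Feature("condition", "expression"),
        Feature("true-branch-statements", "statement"),
    ],
)

for feature in if_statement.features:
    print(f"{feature.name}: {feature.type}")
```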

Projection

Projection plays a central role in defining a concept in Language Wheel. First of all, it defines concrete syntax and does so in an illustrative way. This is how a variable declaration could be illustratively defined in Language Wheel:

var x;

But that’s a sample of a variable declaration! Yes, and that’s exactly how it is supposed to be. The user does not see an abstract-ish definition of concrete syntax —

"var" ident ";"

— but rather is exposed to a “sample calculation” (recall now Excel); in this case, ident is illustrated as x.

A projection can contain several lines, and each line contains several words, or elements: either a keyword, an indent, or one of the features of the concept. When a feature F of type C is added to a projection, it is shown as a sample of concept C.

Projectional definition of illustrative syntax in Language Wheel.

On the implementation level, this sample is:

  • for identifiers, a string automatically generated from the regular expression that defines an identifier;
  • for expressions, an expression automatically generated from the implicit grammar that is present when a concept expression is defined (see below);
  • for custom concepts, it could be a comment:

/* for loop statement */

Sample of a concept in Language Wheel.
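How such samples might be produced can be sketched as follows. This is not how the prototype does it: instead of truly generating a string from the regular expression, the sketch simply tries short candidate strings until one matches, and it leaves out expressions altogether. The function names and the default identifier pattern are assumptions.

```python
import re
import string
from itertools import product

def sample_identifier(pattern, max_length=3):
    """Return the first short string that fully matches the pattern, or None."""
    # "xyz" comes first so that typical patterns yield the familiar sample "x".
    alphabet = "xyz" + string.ascii_lowercase + string.ascii_uppercase + "_" + string.digits
    regex = re.compile(pattern)
    for length in range(1, max_length + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if regex.fullmatch(candidate):
                return candidate
    return None

def sample_for(feature_type, identifier_pattern=r"[a-z][a-z0-9]*"):
    """Sample shown in a projection for a feature of the given type."""
    if feature_type == "identifier":
        return sample_identifier(identifier_pattern)
    # Custom concepts are illustrated by a comment naming the concept.
    return f"/* {feature_type} */"

print(sample_for("identifier"))          # x
print(sample_for("for loop statement"))  # /* for loop statement */
```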

I think the phrase “abstract-ish concrete syntax” describes what is known as “concrete syntax” and it brings to the surface the oxymoron we implicitly mean by talking about concrete syntax. Should we thus talk about very concrete syntax? Or illustrative syntax?

When illustrative syntax is defined in the way described, it’s easy to generate a formatter in Xtext and a cell projection in MPS.

Let’s return to the notion of language flavor I discussed before. After we have specified abstract syntax of a concept, wouldn’t it be handy to have several suggestions for concrete syntax?

“Automatic definition” of concrete syntax in Language Wheel.

An interesting idea is to use machine learning to suggest an appropriate concrete syntax based on the concrete syntax of the concepts defined so far. Another idea is to use machine learning to guess which concept is being defined, based on the list of its features.

Besides defining illustrative syntax (let me stick to this term to clearly distinguish it from concrete syntax in its usual sense), a projection in Language Wheel allows one to define “constraints” on features, mainly cardinality.

Cardinality of feature fields is specified in projection.
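Since a projection carries keywords, features and their cardinality, it holds what is needed to emit an Xtext parser rule. The sketch below regenerates the Entity rule shown earlier from a projection. The Keyword and FeatureRef classes and the to_xtext_rule function are hypothetical names, not part of Language Wheel or Xtext.

```python
from dataclasses import dataclass

@dataclass
class Keyword:
    text: str

@dataclass
class FeatureRef:
    name: str
    rule: str           # Xtext rule or terminal the feature refers to
    many: bool = False  # cardinality, set in the projection

def to_xtext_rule(concept_name, lines):
    """Render projection lines as the text of an Xtext parser rule."""
    body = []
    for line in lines:
        parts = []
        for element in line:
            if isinstance(element, Keyword):
                parts.append(f"'{element.text}'")
            elif element.many:
                parts.append(f"{element.name} += {element.rule}*")
            else:
                parts.append(f"{element.name} = {element.rule}")
        body.append("    " + " ".join(parts))
    return f"{concept_name} :\n" + "\n".join(body) + "\n;"

entity_projection = [
    [Keyword("entity"), FeatureRef("name", "ID"), Keyword("{")],
    [FeatureRef("fields", "Field", many=True)],
    [Keyword("}")],
]

print(to_xtext_rule("Entity", entity_projection))
```

Only the two assignment forms used in the Entity rule are covered here; optional features, alternatives and cross-references would need more cases.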

You might now be surprised: parts of the type system of a language are also defined in Language Wheel projections. Before going into details, let’s talk about how a type system is defined for a language in Language Wheel.

Type system

A language defined in Language Wheel can have several type systems, but only one of them can be active at a time. In Language Wheel, the definition of a type system is shared between:

  • type system card
  • concept expression
  • projections of predefined and custom concepts

The type system card contains a list of the types that are present in a language.

Type system card in Language Wheel.

Each type has a presentation (visible name) and meaning (one of the predefined ones).

Adding a new type to type system in Language Wheel.

Typing of identifiers is done every time a feature of type identifier is used in a projection. For example, in an illustrative syntax definition of an integer variable declaration,

int x;

where x is the sample shown for the feature name:identifier, it is possible to assign type number to feature name.

Example of how identifier is typed in projection in Language Wheel (see first line of projection, x after for has type variant).

The type of an identifier can either be inferred or be checked for compatibility with the type of another feature in the concept (for example, the types of the identifier and the expression in an assignment statement should be compatible).

Defining a type for an identifier in a projection in Language Wheel.

Before going into details on how expressions are typed, let’s have a look at built-in concepts in Language Wheel.

Identifiers

To define the concept identifier, the user has to specify a regular expression that matches valid identifiers. It is also possible to enforce uniqueness of identifiers when they are declared (of course, scoping rules should also be specified somehow).

Built-in concept identifier.

When a feature of some concept is of type identifier, the projection allows one to specify whether this identifier is introduced or refers to an already existing identifier. This corresponds to cross-references in Xtext and concept references in MPS.

Defining cross-references in projection.
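The three ingredients (a regular expression, optional uniqueness, and the distinction between introducing and referring) fit into a small sketch. It assumes a single flat scope, because scoping rules are left open above; the Scope class and its error messages are hypothetical.

```python
import re

class Scope:
    """A single flat scope; real scoping rules are out of scope here."""

    def __init__(self, identifier_pattern, unique=True):
        self.regex = re.compile(identifier_pattern)
        self.unique = unique
        self.declared = set()

    def declare(self, name):
        """Introduce an identifier; return an error message or None."""
        if not self.regex.fullmatch(name):
            return f"'{name}' is not a valid identifier"
        if self.unique and name in self.declared:
            return f"'{name}' is already declared"
        self.declared.add(name)
        return None

    def refer(self, name):
        """Refer to an existing identifier; return an error message or None."""
        if name not in self.declared:
            return f"'{name}' is not declared"
        return None

scope = Scope(r"[a-z][a-z0-9]*")
print(scope.declare("x"))  # None
print(scope.declare("x"))  # 'x' is already declared
print(scope.refer("y"))    # 'y' is not declared
```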

Expressions

To define what an expression is, the user has to define:

  • primary (primitive) expressions;
  • operations that can be used in expressions, with their priorities and associativity;
  • typing templates (number+number=number, number+text=text);
  • whether expressions can contain identifiers;
  • whether expressions can be nested;
  • whether priority of operations should be taken into account.

This specification will be later used to define a correct implementation (for example, a grammar in Xtext) when language definition is converted to a definition in one of the existing language workbenches.
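As an illustration of what such a conversion could produce, the sketch below turns a list of priority levels into Xtext rules that follow the usual left-factored pattern with assigned actions. It is a guess at one possible output, not the generator of Language Wheel: the function name and the rule names are assumptions, only left-associative binary operations are handled, and the Primary rule is expected to be defined elsewhere.

```python
def expression_rules(levels, primary="Primary"):
    """levels: list of (rule_name, [operator symbols]), loosest binding first."""
    rules = []
    for index, (name, symbols) in enumerate(levels):
        # Each level delegates to the next, more tightly binding level.
        tighter = levels[index + 1][0] if index + 1 < len(levels) else primary
        alternatives = " | ".join(f"'{symbol}'" for symbol in symbols)
        rules.append(
            f"{name} returns Expression :\n"
            f"    {tighter} ({{{name}.left=current}} op=({alternatives}) right={tighter})*\n"
            f";"
        )
    return "\n".join(rules)

print(expression_rules([
    ("Addition", ["+", "-"]),
    ("Multiplication", ["*", "/"]),
]))
```

Switching off the priority of operations would amount to passing a single level that contains all the operators.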

How primary expressions are defined in Language Wheel.

To define a primary expression, the user has to define:

  • its type (one of the types previously defined in the type system card);
  • the name of the primitive expression (will be used when converting to grammar or MPS concepts);
  • its sample, to enable illustrative definition of syntax.

Defining a primitive expression in Language Wheel.

An operation has a presentation, a meaning (chosen from a built-in list of possible meanings), and a priority.

Operations in expression in Language Wheel.

After the list of types in a language and list of operations in expressions are defined, it is possible to define typing templates.

Typing templates in Language Wheel. In the second line (number+text=text), the fact that number has to be coerced to text is not specified. I would appreciate feedback on how to specify it here in a nice way.

Each typing template is defined in a straightforward way: the user needs to specify the types of the left and right operands, the operation itself and the resulting type.
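Typing templates of this shape map naturally onto a lookup table. The sketch below encodes the two templates mentioned earlier (number+number=number, number+text=text); the names TYPING_TEMPLATES and result_type are hypothetical, and coercion is not modelled, just as it is not specified in the screenshot.

```python
# (left operand type, operation, right operand type) -> resulting type
TYPING_TEMPLATES = {
    ("number", "+", "number"): "number",
    ("number", "+", "text"): "text",
}

def result_type(left, operation, right):
    """Return the resulting type, or None when no template applies."""
    return TYPING_TEMPLATES.get((left, operation, right))

print(result_type("number", "+", "number"))  # number
print(result_type("number", "+", "text"))    # text
print(result_type("text", "+", "number"))    # None
```

A missing template, such as text+number here, is what a generated type checker would report as an error.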

Translation (code generation)

Translation to the target language will be defined using a projectional editor similar to the one for the illustrative syntax of concepts. This editor would most likely have only two kinds of elements: a string constant and a feature of a concept. This very much resembles template expressions in Xtend. It should also be possible to apply some operations to features (similarly to node macros in JetBrains MPS):

  • copy feature value verbatim as given in the source language;
  • translate feature into target language using the translation definition as defined.
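A translation definition built from string constants and features, with the two operations listed above, could look roughly like this. Everything here is an assumption made for illustration: the element classes, the dictionary-based nodes, and the VarDecl and Number concepts with their target text.

```python
from dataclasses import dataclass

@dataclass
class Text:
    value: str    # string constant emitted as is

@dataclass
class Copy:
    feature: str  # copy the feature value verbatim from the source

@dataclass
class Translate:
    feature: str  # translate the feature value using its own definition

def translate(node, templates):
    """node: {"concept": name, feature: value, ...}; templates: concept -> elements."""
    out = []
    for element in templates[node["concept"]]:
        if isinstance(element, Text):
            out.append(element.value)
        elif isinstance(element, Copy):
            out.append(str(node[element.feature]))
        else:
            out.append(translate(node[element.feature], templates))
    return "".join(out)

templates = {
    "VarDecl": [Text("let "), Copy("name"), Text(" = "), Translate("init"), Text(";")],
    "Number": [Copy("value")],
}
node = {"concept": "VarDecl", "name": "x", "init": {"concept": "Number", "value": 5}}

print(translate(node, templates))  # let x = 5;
```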

Some examples of what a translation definition could look like in Language Wheel are given in the presentation at the bottom of this post.

A very interesting idea is to use machine learning to define a translation from the source language into the target language. Would language workbenches of the future be like this: given sample code in the language a user wants to define, and given the corresponding sample code the user wants to get, everything else is machine-learnt? :-) At least, it is already possible to automate formatting.


In a nutshell

Here is a three-slide presentation about Language Wheel:

  • Language Wheel is a “front-end” language workbench: after a language is defined, this definition will be converted to a definition in an existing language workbench, such as Xtext, MPS, or Spoofax.
  • Language Wheel is aimed at beginners. That’s why issues of, for example, language modularity or language evolution, are not taken into account.
  • A language is defined as a set of concepts, and each concept is represented as a card with abstract syntax, illustrative syntax (a.k.a. concrete syntax) and translation to target language.
  • Illustrative syntax is defined in a projectional editor. Besides textual representation of syntax, projection also encapsulates constraints on abstract syntax, and parts of the type system of a language.
  • Language Wheel is web-based and its current prototype implementation uses Vaadin Framework.

I am looking forward to your comments and suggestions!