Language Wheel — language engineering for everyone

Mikhail Barash
Mar 16, 2018 · 13 min read
Shouldn’t prototyping language constructs be as easy as choosing colors on a color wheel?

After teaching a university course on domain-specific languages (http://dsl-course.org), I found myself puzzled by two questions:

  • how to get more people interested in the rapidly developing field of DSLs and language-oriented programming, and
  • how to better explain all the intricacies of language implementation?

Let’s focus on the second point. Who would have fun implementing a grammar for expressions, with all the left-factoring and operator precedence, not to mention the type system? It seems much more rewarding to focus on the constructs that make your language specific. After all, expressions are present in many languages and one would usually expect them to be the same everywhere. (An exception here is MUMPS, where there is no priority of operations, but that’s really an exception.) One could argue that the language of expressions could be embedded into another language (for example, JetBrains MPS enables this), thus removing the need to re-implement expressions over and over again. That’s certainly true, but such a solution would hide the very idea of how expressions are implemented; after all, that’s what I am focusing on in my course: students have to be able to implement languages from scratch.

Scratch — a LEGO of programming

Pupils doing programming at an Hour of Code event organized by Turku Centre for Computer Science (Finland).

A lot of effort is made nowadays to simplify programming for lay programmers, trying to make programming intentional and “codeless”. Think about Scratch, where building a program resembles playing with a LEGO construction set.

Scratch is an example of a projectional editor.

But what about language engineers themselves? Is there a Scratch-like tool for them? Could a new language be built in a LEGO-way?

I recently attended LangDev’18 in Amsterdam, where R. Willems talked about Miksilo, a language workbench that lets you create languages by mixing existing languages and building on top of them. I liked the idea very much, but using Miksilo still requires quite a bit of previous experience with technologies around language construction.

Excel — programming where code is hidden

Cell F5: its illustration (value 5) and its program (=F3+F4).

Now think about defining concrete syntax for a certain language construct, say, an if statement. In BNF notation:

IfStatement ::=
   "if" "(" BooleanExpression ")"
      StatementsBlock
   "else"
      StatementsBlock

Pretty abstract, right? Compare now with an illustrative definition of if:

IfStatement ::=
   "if" "(" 2 > 5 ")"
      "{"
          // statement 1
          // ...
          // statement N
       "}"
    "else"
       "{"
           // statement 1
           // ...
           // statement N
       "}"

Isn’t this exactly how you would first think about an if statement: by thinking of an example of one? Only then do you lift the abstraction level to BooleanExpression and StatementsBlock in BNF (in fact, not even BooleanExpression but just Expression; you’ll have to check the types later, but that’s a separate story). That’s the moment when you introduce meta.

Let’s now do some pretty-printing and remove all quotation marks around keywords and, voilà, what we get is what we expect users of our language to write: an if statement. We are now on the same meta-level as our users. Could this be called model-free modeling, illustrative modeling or non-meta meta? Can we manage at all without meta? Can we hide it (foremost, from ourselves) as we hide formulas in Excel? And even if we can, should we? Does it make sense? These are the big questions.
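The pretty-printing step can be sketched in a few lines of Python. This is only an illustration and not part of Language Wheel: the function name pretty_print and the rule for gluing brackets to their content are my assumptions here.

```python
import re

# The illustrative definition of `if` from above, shortened to one statement per block.
ILLUSTRATIVE_IF = '''"if" "(" 2 > 5 ")"
   "{"
       // statement 1
   "}"
"else"
   "{"
       // statement 1
   "}"'''


def pretty_print(definition: str) -> str:
    """Drop the double quotes around keywords, keeping each line's indentation."""
    result = []
    for line in definition.splitlines():
        indent = line[: len(line) - len(line.lstrip())]
        # "if" -> if, "(" -> (, and so on; unquoted samples are left untouched.
        text = re.sub(r'"([^"]*)"', r"\1", line.strip())
        # Assumption: round brackets hug their content, as a user would type them.
        text = text.replace("( ", "(").replace(" )", ")")
        result.append(indent + text)
    return "\n".join(result)


print(pretty_print(ILLUSTRATIVE_IF))
```

The first printed line is the condition exactly as a user of the language would write it, with no quotes left to remind anyone of the grammar.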


Introducing Language Wheel

A wheel. Doesn’t have anything to do with the content of this post :-)

A new language workbench?

The second question is: what’s your target audience? I am focusing here on developers without any previous experience with language engineering and I don’t expect them to have dependent types or covariance in their languages. That’s my deliberate choice to focus on beginners.

Finally, the third question is: what about bootstrapping? Would you define your language workbench in your language workbench? I decided to develop what I call a front-end language workbench: it will only be used to define a language, and then a semantically equivalent definition of the language (with validations, type system, code generation) will be generated for one of the major existing workbenches (Xtext, JetBrains MPS, Spoofax). It’s also my deliberate choice to “pass” a language into the hands of an established industry-level workbench.


Language Wheel is a work in progress and I will include several screenshots of the current early-prototype implementation that I have done using Vaadin Framework. A lot of the functionality discussed below is not implemented yet, and many things are omitted. My goal is mainly to present the idea behind Language Wheel and get feedback on it.


What does a language have?

  • constructs (if statement, for loop statement, variable declaration, …)
  • type system (do you want to distinguish between integer and float, or having just a number is fine? or are you just fine without types at all? — that’s also a valid type system!)
  • flavor (does your language look like Java/Pascal/COBOL?)

Flavor

A (very) informal classification of language syntaxes I’ve done.

Concepts

There are three kinds of concepts in Language Wheel:

  • built-in concepts (there are only 3 of them: identifier, expression, statement)
  • predefined concepts (“standard” language concepts as, for example, in here and here)
  • custom concepts (concepts defined by the user)

Adding a predefined concept to a language in Language Wheel.

Built-in concepts are really special and we will return to them a bit later. Let’s now focus on custom concepts. (Predefined concepts can be considered a library of custom concepts shipped with Language Wheel.)

A concept encapsulates both abstract and concrete syntax, much like a rule in Xtext does:

Entity :
   'entity' name = ID '{'
      fields += Field*
   '}'
;

At the level of concrete syntax, this Xtext rule defines that an entity declaration should start with the keyword entity, followed by an identifier, an opening curly bracket, and so on.

At the level of abstract syntax, this rule specifies that an entity has two features: a name and a (list of) fields. If we abstract a bit further, we can skip the detail that fields is a list, and just talk about the feature fields, independently of any implementation.
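To make the two levels tangible, here is a minimal Python sketch of what the Entity rule carries. The abstract syntax becomes two small data classes, and the concrete syntax becomes a function that prints them. The class Field with a single name, and the layout of one field per line, are my assumptions: the Xtext rule above does not show how Field is defined.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    # Hypothetical: the rule for Field is not shown, so a field is just a name here.
    name: str


@dataclass(frozen=True)
class Entity:
    # Abstract syntax: the two features of the concept.
    name: str
    fields: List[Field]


def to_concrete(entity: Entity) -> str:
    """Concrete syntax: keyword, identifier, curly brackets, one field per line."""
    lines = [f"entity {entity.name} {{"]
    lines += [f"   {f.name}" for f in entity.fields]
    lines.append("}")
    return "\n".join(lines)


person = Entity("Person", [Field("name"), Field("age")])
print(to_concrete(person))
```

The data classes know nothing about keywords or brackets, and the printing function knows nothing beyond the two features. A concept in Language Wheel keeps both in one place.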

Definition of concept Entity in Language Wheel.

In Language Wheel, each concept of a language specifies:

  • features
  • projection
  • translation to target language

In terms of user interface, a concept is represented as a card. The features area is at the top of the card, and the rest of the card is split horizontally between projection and translation. In the screenshots in this post, the translation area is missing.

Each feature has a name and a type, for example, condition: expression, or true-branch-statements: statement. The cardinality of a feature is not specified here: it is left to the projection. This is in contrast to, for example, Xtext and JetBrains MPS.

When a feature is created in Language Wheel, its cardinality is not specified.
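A possible data model for a concept card, sketched in Python. The class names and the default cardinality "1" are my assumptions and not how the prototype is implemented; the point is only that a feature carries a name and a type, while cardinality is stored separately and filled in by the projection.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Feature:
    name: str
    type: str  # "identifier", "expression", "statement", or the name of a concept


@dataclass
class Concept:
    name: str
    features: List[Feature]
    # Filled in later by the projection, keyed by feature name.
    cardinality: Dict[str, str] = field(default_factory=dict)

    def cardinality_of(self, feature_name: str) -> str:
        # Assumption: a feature the projection says nothing about occurs exactly once.
        return self.cardinality.get(feature_name, "1")


if_statement = Concept(
    name="IfStatement",
    features=[
        Feature("condition", "expression"),
        Feature("true-branch-statements", "statement"),
    ],
)

# The projection, not the feature list, decides that a branch holds many statements.
if_statement.cardinality["true-branch-statements"] = "*"
```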

Projection

var x;

But that’s a sample of a variable declaration! Yes, and that’s exactly how it is supposed to be. The user does not see an abstract-ish definition of concrete syntax —

"var" ident ";"

— but rather is exposed to a “sample calculation” (recall now Excel); in this case, ident is illustrated as x.

A projection can contain several lines, and each line contains several words, or elements: either a keyword, an indent, or one of the features of the concept. When a feature F of type C is added to a projection, it’s shown as a sample of concept C.

Projectional definition of illustrative syntax in Language Wheel.
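Here is a rough Python sketch of how a projection line could be turned into its illustrative form. The element classes, the fixed samples ("x" for identifiers, "2 > 5" for expressions) and the spacing rule are assumptions made for this example; in Language Wheel the samples are meant to be generated.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Keyword:
    text: str


@dataclass(frozen=True)
class Indent:
    width: int = 3


@dataclass(frozen=True)
class FeatureRef:
    name: str


Element = Union[Keyword, Indent, FeatureRef]

# Assumed fixed samples for two built-in concepts.
SAMPLES = {"identifier": "x", "expression": "2 > 5"}


def sample_for(type_name: str) -> str:
    # Custom concepts are shown as a comment naming the concept.
    return SAMPLES.get(type_name, f"/* {type_name} */")


def render_line(line: List[Element], feature_types: Dict[str, str]) -> str:
    out, prev = "", None
    for element in line:
        if isinstance(element, Indent):
            out += " " * element.width
            continue
        if isinstance(element, Keyword):
            word = element.text
        else:
            word = sample_for(feature_types[element.name])
        # Assumed spacing rule: no space before ; ) , and none after (
        if prev is not None and word not in {";", ")", ","} and prev != "(":
            out += " "
        out += word
        prev = word
    return out


declaration = [Keyword("var"), FeatureRef("name"), Keyword(";")]
print(render_line(declaration, {"name": "identifier"}))
```

The language designer only ever sees the rendered line, while the list of elements behind it plays the role of the formula hidden in an Excel cell.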

On the implementation level, this sample is:

  • for identifiers, a string automatically generated from the regular expression that defines an identifier;
  • for expressions, an expression automatically generated from the implicit grammar that is present when a concept expression is defined (see below);
  • for custom concepts, it could be a comment:
/* for loop statement */

Sample of a concept in Language Wheel.

I think the phrase “abstract-ish concrete syntax” describes what is known as “concrete syntax” and it brings to the surface the oxymoron we implicitly mean by talking about concrete syntax. Should we thus talk about very concrete syntax? Or illustrative syntax?

When illustrative syntax is defined in the way described, it’s easy to generate a formatter in Xtext and a cell projection in MPS.

Let’s return to the notion of language flavor I discussed before. After we have specified abstract syntax of a concept, wouldn’t it be handy to have several suggestions for concrete syntax?

“Automatic definition” of concrete syntax in Language Wheel.

An interesting idea is to use machine learning to define appropriate concrete syntax based on the concrete syntax of concepts defined so far. Another idea is to use machine learning to get the concept that is being defined, based on the list of its features.

Besides defining illustrative syntax (let me stick to this term to clearly distinguish it from concrete syntax in its usual sense), a projection in Language Wheel also lets you define “constraints” on features, mainly cardinality.

Cardinality of feature fields is specified in projection.

You might be now surprised: parts of the type system of a language are also defined in Language Wheel projections. Before going into details, let’s talk about how a type system is defined for a language in Language Wheel.

Type system

  • type system card
  • concept expression
  • projections of predefined and custom concepts

Type system card contains a list of types that are present in a language.

Type system card in Language Wheel.

Each type has a presentation (visible name) and meaning (one of the predefined ones).

Adding a new type to type system in Language Wheel.

Typing of identifiers is done every time a feature of type identifier is used in a projection. For example, in an illustrative syntax definition of an integer variable declaration:

    name:identifier
int x;

it is possible to assign type number to feature name.

Example of how identifier is typed in projection in Language Wheel (see first line of projection, x after for has type variant).

For an identifier, its type can either be inferred or checked for compatibility with the type of another feature in a concept (for example, the types of the identifier and the expression in an assignment statement should be compatible).
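The two options, inferring or checking, can be sketched in Python for the assignment example. Treating "compatible" as "equal" is a simplifying assumption of this sketch, and the function name check_assignment is hypothetical.

```python
from typing import Dict, Optional


def check_assignment(env: Dict[str, Optional[str]], name: str, expr_type: str) -> str:
    """Return the type of `name` after the assignment `name = <expression>`."""
    declared = env.get(name)
    if declared is None:
        # No type known yet: infer it from the expression.
        env[name] = expr_type
        return expr_type
    if declared != expr_type:
        # Assumption: two types are compatible only when they are the same.
        raise TypeError(f"cannot assign {expr_type} to {name} of type {declared}")
    return declared


env: Dict[str, Optional[str]] = {"x": "number", "y": None}
check_assignment(env, "x", "number")  # checked: number is compatible with number
check_assignment(env, "y", "text")    # inferred: y is now of type text
```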

Defining a type for an identifier in a projection in Language Wheel.

Before going into details on how expressions are typed, let’s have a look at built-in concepts in Language Wheel.

Identifiers

Built-in concept identifier.

When a feature of some concept is of type identifier, the projection lets you specify whether this identifier is introduced or refers to an already existing identifier. This corresponds to cross-references in Xtext and concept references in MPS.

Defining cross-references in projection.
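What this distinction buys can be shown with a small Python sketch that reports references to identifiers that were never introduced. A single flat scope and declaration before use are assumptions of the sketch; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class IdentifierUse:
    name: str
    introduces: bool  # True: the identifier is introduced here; False: it refers to an existing one


def unresolved(uses: List[IdentifierUse]) -> List[str]:
    """Names that are referred to before (or without) being introduced."""
    declared: Set[str] = set()
    missing: List[str] = []
    for use in uses:
        if use.introduces:
            declared.add(use.name)
        elif use.name not in declared:
            missing.append(use.name)
    return missing


program = [
    IdentifierUse("x", introduces=True),   # var x;
    IdentifierUse("x", introduces=False),  # x = 1;
    IdentifierUse("y", introduces=False),  # y = 2;  (y was never introduced)
]
print(unresolved(program))
```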

Expressions

  • primary (primitive) expressions;
  • operations that can be used in expressions, with their priorities and associativity;
  • typing templates (number+number=number, number+text=text);
  • whether expressions can contain identifiers;
  • whether expressions can be nested;
  • whether priority of operations should be taken into account.

This specification will be later used to define a correct implementation (for example, a grammar in Xtext) when language definition is converted to a definition in one of the existing language workbenches.
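As an illustration of such a conversion, the Python sketch below turns a list of operations with priorities into layered grammar rules, one rule per priority level. It emits plain EBNF rather than actual Xtext syntax, treats every operation as binary and left-associative, and assumes that a higher number binds tighter. The rule names Level1, Level2 and Primary are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Operation:
    symbol: str
    priority: int  # assumption: a higher number binds tighter


def layered_grammar(ops: List[Operation], primary: str = "Primary") -> List[str]:
    """One rule per priority level; each level is built from the next tighter one."""
    levels = sorted({op.priority for op in ops})
    rules = []
    for i, level in enumerate(levels):
        symbols = [op.symbol for op in ops if op.priority == level]
        operand = f"Level{levels[i + 1]}" if i + 1 < len(levels) else primary
        alternatives = " | ".join(f'"{s}"' for s in symbols)
        # Repetition instead of left recursion keeps the rule usable in top-down parsers.
        rules.append(f"Level{level} ::= {operand} (({alternatives}) {operand})*")
    return rules


operations = [
    Operation("+", 1),
    Operation("-", 1),
    Operation("*", 2),
    Operation("/", 2),
]
for rule in layered_grammar(operations):
    print(rule)
```

This is the tedious part of writing an expression grammar by hand, and it is fully determined by the table of operations.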

How primary expressions are defined in Language Wheel.

To define a primary expression, the user has to define:

  • its type (one of the types previously defined in the type system card);
  • the name of the primitive expression (will be used when converting to grammar or MPS concepts);
  • its sample, to enable illustrative definition of syntax.

Defining a primitive expression in Language Wheel.

An operation has a presentation, a meaning (chosen from a built-in list of possible meanings), and a priority.

Operations in expression in Language Wheel.

Once the list of types in a language and the list of operations in expressions are defined, it is possible to define typing templates.

Typing templates in Language Wheel. In the second line (number+text=text), the fact that number has to be coerced to text is not specified. I would appreciate feedback on how to specify it here in a nice way.

Each typing template is defined in a straightforward way: the user needs to specify the types of the left and right operands, the operation itself, and the resulting type.
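In code, a set of typing templates is little more than a lookup table. The Python sketch below uses the two templates mentioned earlier (number+number=number, number+text=text); the function name result_type and the decision to raise an error for a missing template are assumptions of the sketch.

```python
from typing import Dict, Tuple

# (left operand type, operation, right operand type) -> resulting type
TypingTemplates = Dict[Tuple[str, str, str], str]

TEMPLATES: TypingTemplates = {
    ("number", "+", "number"): "number",
    ("number", "+", "text"): "text",
}


def result_type(left: str, op: str, right: str,
                templates: TypingTemplates = TEMPLATES) -> str:
    """Look up the resulting type of `left op right`."""
    try:
        return templates[(left, op, right)]
    except KeyError:
        # Assumption: a combination without a template is a type error.
        raise TypeError(f"no typing template for {left} {op} {right}") from None


print(result_type("number", "+", "text"))
```

Note that the table is not symmetric: text+number has no template until the user adds one.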

Translation (code generation)

  • copy feature value verbatim as given in the source language;
  • translate feature into target language using the translation definition as defined.

Some examples of what a translation definition could look like in Language Wheel are given in the presentation at the bottom of this post.
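The two options for a feature, copying it verbatim or translating it, can be sketched in Python as follows. The template syntax with curly braces, the function name translate_concept and the toy target language are all assumptions made for this sketch and do not show how translation is defined in Language Wheel.

```python
from typing import Callable, Dict


def translate_concept(
    template: str,
    features: Dict[str, str],
    translators: Dict[str, Callable[[str], str]],
) -> str:
    """Fill a target-language template with the feature values of one concept."""
    values = {}
    for name, source_text in features.items():
        translate = translators.get(name)
        # No translator registered: copy the feature value verbatim.
        values[name] = translate(source_text) if translate else source_text
    return template.format(**values)


# Source: var x = 2 ^ 3;   Hypothetical target: let x = 2 ** 3;
target = translate_concept(
    template="let {name} = {value};",
    features={"name": "x", "value": "2 ^ 3"},
    translators={"value": lambda text: text.replace("^", "**")},
)
print(target)
```

Here name is copied as given in the source language, while value goes through its own translation first.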

A very interesting idea is to use machine learning to define a translation from the source language into the target language. Would language workbenches of the future be like this: given sample code in the language a user wants to define, and given the corresponding sample code the user wants to get, everything else is machine learnt? :-) At least, it is already possible to automate formatting.


In a nutshell

  • Language Wheel is a “front-end” language workbench: after a language is defined, this definition will be converted to a definition in an existing language workbench, such as Xtext, MPS, or Spoofax.
  • Language Wheel is aimed at beginners. That’s why issues of, for example, language modularity or language evolution, are not taken into account.
  • A language is defined as a set of concepts, and each concept is represented as a card with abstract syntax, illustrative syntax (a.k.a. concrete syntax) and translation to target language.
  • Illustrative syntax is defined in a projectional editor. Besides textual representation of syntax, projection also encapsulates constraints on abstract syntax, and parts of the type system of a language.
  • Language Wheel is web-based and its current prototype implementation uses Vaadin Framework.

I am looking forward to your comments and suggestions!
