Looking at code through the prism of JetBrains MPS
I am starting a series of posts explaining JetBrains MPS, a powerful tool to define and implement domain-specific languages and IDEs for them. I am trying to find a way to explain MPS to those who do not have any idea about abstract syntax trees, metaprogramming, model-to-model transformations, and other “scary stuff”.
I will start “from the middle”: not with what a domain-specific language is, not even with what parsing is. Instead, I will focus on several small pieces of code. This first post is about how to look at them through the eyes of a language developer or, as I would like to put it metaphorically, through the prism of MPS.
Disclaimer
This is by no means a tutorial on JetBrains MPS. Instead, this is an explanation of some of the ideas that are behind it. Many points in this post have been (over)simplified for greater clarity. I refer you to a paper “Language Oriented Programming: The Next Programming Paradigm” by Sergey Dmitriev and a post “A Language Workbench in Action — MPS” by Martin Fowler for more details.
What is JetBrains MPS?
MPS is a language workbench, which, according to Wikipedia, is
a software development tool designed to define, reuse and compose domain-specific languages together with their integrated development environment
Let’s have a look at Scratch, a visual programming language, where the code is not represented as text, but rather as graphical “blocks” with placeholders.
Scratch has blocks for representing conditional statements, loops, events (along with many more other kinds of blocks), and is essentially a full-blown programming language.
Now, what if we want to create our “own Scratch”, with custom kinds of blocks? For example, we might want to define a block for a SELECT ... FROM statement as in SQL, and so on. This is what Blockly allows us to do.
Here is the point:
- Scratch is a visual programming language with an IDE
- Blockly is also a visual language with an IDE, but it is also a meta-tool: it allows us to define visual languages and IDEs for them
Now we can think of JetBrains MPS as a very powerful Blockly :-)
Similarly to Scratch and Blockly, in MPS code is not text, but it may look and feel like text. More on that below.
Remark
If you wonder why one should even bother with non-textual representation of code, the short answer is: to be able to combine languages with “conflicting” syntaxes. Imagine two semantically different blocks in Blockly that happen to look exactly the same. It surely would not be a problem for Blockly to understand what is what — after all, these two blocks have two different spots on the palette and the IDE knows which one the user selected when “drawing” the code. The same principle applies in MPS.
Let’s forget for a while about fancy visual stuff, and focus on good old Java.
Structure & Editor
Here is a simple if statement in Java:
if (x > y * 2 + 100 && ! myList.contains(200) || this.isVisible()) {
    myList.add(x);
    myList.add(y);
}
else {
    System.out.println("Condition not met!");
}
When you see it, what do you think about? I bet you try to get what it does, that is, you focus on the semantics. Indeed, if checks whether x is greater than something and that myList doesn’t contain something or this is visible. Doesn’t make much sense, I know.
But what if we look at this snippet in a bit more abstract way? First of all, this is an if statement.
And what does an if statement “contain”?
It contains
- a condition:
x > y * 2 + 100 && ! myList.contains(200) || this.isVisible()
- statements that are executed if the condition is met:
myList.add(x);
myList.add(y);
- statements that are executed if the condition is not met:
System.out.println("Condition not met!");
Let’s visualize this.
Statements that are executed if the condition is met are denoted with true_block, and statements that are executed if the condition is not met are denoted with false_block.
What could be, so to say, the “class” of each of these elements? Well, condition is an expression, true_block will contain statements, and so will false_block.
Every if statement always has exactly one condition, and may or may not have statements in either true_block or false_block.
Voilà! Now we’ve got the structure of an if statement. This is also called abstract syntax.
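For readers who think more easily in code, the structure above can be sketched in plain Java. This is only an illustration under my own assumptions: the names IfStructureSketch, IfStatement, Expression and Statement are made up for this post, records require Java 16 or later, and MPS has its own notation for declaring concepts rather than Java classes like these.

```java
import java.util.List;

// Hypothetical names for illustration only; this is not how MPS stores concepts.
final class IfStructureSketch {
    // Stand-ins for the "classes" of the elements; here they only wrap source text.
    record Expression(String text) {}
    record Statement(String text) {}

    // Exactly one condition; each block holds zero or more statements.
    record IfStatement(Expression condition,
                       List<Statement> trueBlock,
                       List<Statement> falseBlock) {
        IfStatement {
            if (condition == null) {
                throw new IllegalArgumentException("an if statement needs a condition");
            }
            trueBlock = List.copyOf(trueBlock);   // defensive, immutable copies
            falseBlock = List.copyOf(falseBlock);
        }
    }

    public static void main(String[] args) {
        IfStatement example = new IfStatement(
                new Expression("x > y * 2 + 100 && !myList.contains(200) || this.isVisible()"),
                List.of(new Statement("myList.add(x);"), new Statement("myList.add(y);")),
                List.of(new Statement("System.out.println(\"Condition not met!\");")));
        System.out.println(example.trueBlock().size() + " statement(s) in true_block");
    }
}
```

Note that nothing in this sketch says anything about keywords, parentheses or braces: it describes only what an if statement contains.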
A thing like “if statement” is called a concept. Examples of other concepts are “while statement”, “variable declaration”, “expression”, and so on. It is, if you wish, a kind/sort/type of a block in Blockly.
After we have fixed the structure of our concept “if statement”, we can start thinking about what it may look like in code. We can freely choose this representation (called concrete syntax), be it Java-like syntax with curly braces, or Pascal-like with begin and end. It actually doesn’t matter, because in MPS it’s just pretty printing of the structure.
Still, let’s go for Java syntax.
Let’s imagine for a moment that we have a table, and each “word” in this Java representation of if is located in its own cell.
We can now distinguish between four kinds of cells:
- cells that contain keywords (if, else)
- cells that contain special symbols ((, ), {, }, and whitespace)
- cells that represent indentations
- cells that contain elements from the structure of the concept
Cells with keywords and special symbols are “frozen”: it should not be possible to change them. Indeed, the keyword if should always be spelled like if in Java; the condition should be surrounded by parentheses, and so on.
The only things that a programmer should be able to modify are the three placeholders for condition, true_block, and false_block.
If we stick to these rules, the code will always be syntactically correct. This is very similar to Scratch.
That is essentially how projectional editing in MPS works. Code is not represented as text, but it is rather a semi-graphical pretty printing of a structure. This pretty printing is editable though. You might think of it as “less visual” Scratch.
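The idea of frozen and editable cells can be imitated in a few lines of Java. This is a rough sketch under my own assumptions, not the way the MPS editor is implemented: the names CellSketch, Cell, ifCells and edit are invented for this post, and indentation cells are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a projection as a row of cells; not MPS API.
final class CellSketch {
    // A cell is either frozen (keyword or symbol) or an editable placeholder.
    record Cell(String text, boolean editable) {}

    // Lays out the cells of a Java-like projection for an if statement.
    static List<Cell> ifCells(String condition, String trueBlock, String falseBlock) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("if", false));
        cells.add(new Cell("(", false));
        cells.add(new Cell(condition, true));
        cells.add(new Cell(")", false));
        cells.add(new Cell("{", false));
        cells.add(new Cell(trueBlock, true));
        cells.add(new Cell("}", false));
        cells.add(new Cell("else", false));
        cells.add(new Cell("{", false));
        cells.add(new Cell(falseBlock, true));
        cells.add(new Cell("}", false));
        return cells;
    }

    // Only editable cells accept new content; frozen cells reject the edit.
    static List<Cell> edit(List<Cell> cells, int index, String newText) {
        Cell target = cells.get(index);
        if (!target.editable()) {
            throw new IllegalStateException("cell '" + target.text() + "' is frozen");
        }
        List<Cell> copy = new ArrayList<>(cells);
        copy.set(index, new Cell(newText, true));
        return copy;
    }

    public static void main(String[] args) {
        List<Cell> cells = ifCells("x > 0", "a();", "b();");
        // Index 2 is the condition placeholder, so this edit is accepted.
        System.out.println(edit(cells, 2, "y < 1").get(2).text());
    }
}
```

Because the keywords and symbols can never be changed, whatever the programmer does to the three placeholders, the surrounding syntax stays intact.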
Multiple projections
What if someone prefers Visual Basic-like syntax for an if statement? No problem! In MPS, we can define multiple syntaxes (called projections) for a single concept.
Moreover, we can switch between different projections when we edit the code.
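In ordinary Java, two projections of one structure could be imitated by two functions that print the same object differently. The sketch below makes several assumptions of my own: the names ProjectionSketch, javaLike and basicLike are invented, statements are kept as opaque strings without terminators, and the result is a read-only string, whereas a real MPS projection stays editable.

```java
import java.util.List;

// Hypothetical sketch: one structure, two ways of showing it. Not MPS API.
final class ProjectionSketch {
    record IfStatement(String condition, List<String> trueBlock, List<String> falseBlock) {}

    // Java-like projection: parentheses, curly braces, semicolons.
    static String javaLike(IfStatement s) {
        return "if (" + s.condition() + ") {\n"
                + lines(s.trueBlock(), ";")
                + "}\nelse {\n"
                + lines(s.falseBlock(), ";")
                + "}";
    }

    // Visual Basic-like projection: If ... Then ... Else ... End If.
    static String basicLike(IfStatement s) {
        return "If " + s.condition() + " Then\n"
                + lines(s.trueBlock(), "")
                + "Else\n"
                + lines(s.falseBlock(), "")
                + "End If";
    }

    // Indents each statement and appends the projection-specific terminator.
    private static String lines(List<String> statements, String terminator) {
        StringBuilder out = new StringBuilder();
        for (String statement : statements) {
            out.append("    ").append(statement).append(terminator).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        IfStatement s = new IfStatement("x > 0", List.of("a()"), List.of("b()"));
        System.out.println(javaLike(s));
        System.out.println(basicLike(s));
    }
}
```

The IfStatement object is never touched by either function, which is the whole point: the syntax is a view, and switching views does not change the code.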
Editing the structure
All the keywords, parentheses, curly braces and indentations do not matter when a programmer edits code. This is very similar to Scratch. What matters is the elements that we described in the structure of the concept: condition, true_block and false_block.
So far, we have been looking at the same concept “if statement” from two “angles”: one is its structure and the other one is its representation. These are aspects of a language, and in MPS these two particular aspects are called (no surprise) Structure and Editor.
In the remainder of this post I will talk about more aspects.
Intentions Aspect
We start with editing experience: it’s crucial for a user of a language, especially when the language is not textual.
The Intentions aspect allows us to define quick fixes, or intentions, that are shown in the IDE.
For our concept “if statement”, we could define intentions for:
- negating the condition (for example, from x==3 to x!=3, or from x>10 to !(x>10), and so on)
- removing the else clause with false_block (this will also remove the braces)
- exchanging true_block and false_block
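Each of these intentions is, in essence, a small function from one structure to another. Here is a hedged Java sketch; the names IntentionSketch, negateCondition, swapBlocks and removeElse are mine, the condition is kept as plain text, and the negation always wraps the condition in !( ) instead of picking the nicer form x!=3.

```java
import java.util.List;

// Hypothetical sketch of intentions as structure-to-structure functions; not MPS API.
final class IntentionSketch {
    record IfStatement(String condition, List<String> trueBlock, List<String> falseBlock) {}

    // Intention: negate the condition by wrapping it, as in x>10 becoming !(x>10).
    static IfStatement negateCondition(IfStatement s) {
        return new IfStatement("!(" + s.condition() + ")", s.trueBlock(), s.falseBlock());
    }

    // Intention: exchange true_block and false_block.
    static IfStatement swapBlocks(IfStatement s) {
        return new IfStatement(s.condition(), s.falseBlock(), s.trueBlock());
    }

    // Intention: remove the else clause together with its statements.
    static IfStatement removeElse(IfStatement s) {
        return new IfStatement(s.condition(), s.trueBlock(), List.of());
    }

    public static void main(String[] args) {
        IfStatement s = new IfStatement("x>10", List.of("a();"), List.of("b();"));
        System.out.println(negateCondition(s).condition());
        System.out.println(swapBlocks(s).trueBlock());
    }
}
```

None of these functions deals with braces or keywords; once the structure changes, the projection follows by itself.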
Actions Aspect
Let’s have a look at Scratch again. Suppose, when “drawing” the code, we chose to have an if statement without the else clause. How do we add an else clause later?
In JetBrains MPS, this is one thing that is fixed via the Actions aspect of a language. It lets us specify which editing actions change the structure of a concept and consequently its projection.
For example, when we want to add an else clause to our if statement, we just type the keyword else after the closing curly brace of true_block. Remember: code in MPS is not text, it’s a semi-graphical projection with a text-like editing experience, so without such an action “just typing else” wouldn’t be a valid thing to do.
This could correspond to something like this in Scratch: imagine the keyword else being available on a palette, you move it to the bottom of the existing if-then block and the block gets transformed into an if-then-else block. (Again, you cannot “just type else” in this situation in Scratch, and the same applies to MPS!)
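To illustrate what such an editing action does, here is a small Java sketch. It rests on my own assumptions: the handler name onTypedAfterTrueBlock is invented, and an absent else clause is modelled with an empty Optional so that it can be told apart from an else clause with no statements in it. Actual MPS actions are defined in the Actions aspect, not like this.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of an editing action that changes the structure; not MPS API.
final class ActionSketch {
    // falseBlock is Optional.empty() when there is no else clause at all.
    record IfStatement(String condition,
                       List<String> trueBlock,
                       Optional<List<String>> falseBlock) {}

    // Invented handler for text typed right after the closing brace of true_block.
    static IfStatement onTypedAfterTrueBlock(IfStatement s, String typed) {
        if (typed.equals("else") && s.falseBlock().isEmpty()) {
            // The structure changes: an empty else clause is attached.
            return new IfStatement(s.condition(), s.trueBlock(), Optional.of(List.<String>of()));
        }
        return s; // any other input leaves the structure untouched
    }

    public static void main(String[] args) {
        IfStatement s = new IfStatement("x > 0", List.of("a();"), Optional.empty());
        System.out.println(onTypedAfterTrueBlock(s, "else").falseBlock().isPresent());
    }
}
```

The typed word never ends up in the code as characters. It only triggers a change of structure, and the else keyword then appears because the projection of the new structure contains it.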
Behavior
Let’s repeat the exercise of thinking about the structure of a concept. This time, our concept will be “variable declaration”. For the sake of simplicity, we only consider integer variable declarations.
So, a variable declaration has:
- name of the declared variable
- possibly its initial value (which can be also an arithmetic expression, of course)
One small detail here: element name is of primitive (built-in) type, whereas init_value is an instance of concept expression that should be defined by us. Let’s keep “primitive” elements to the left, and “our” elements to the right.
In fact, name is a property of a concept, and init_value is a child of a concept. Here is the difference between them: in a sense, the initializing expression is “embedded” into the declaration of a variable; it’s a different concept (that is, concept “expression”) that we “invoke” from the definition of concept “variable declaration”. On the other hand, the name of the variable is something that the variable “possesses”. Hence the terms children and properties, respectively.
For children, we can specify their cardinality. In this case, init_value can either be present or absent, so its cardinality is 0..1. Properties are always present, so it makes no sense to specify cardinality for them. We will not go into more details here, though.
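In Java terms, the difference between a property and a child with cardinality 0..1 could look roughly like this. The class and record names are my own, and using Optional to express 0..1 is just one possible way to model it.

```java
import java.util.Optional;

// Hypothetical sketch of a concept with one property and one optional child; not MPS API.
final class VariableStructureSketch {
    // A separate concept that the declaration "invokes" as a child.
    record Expression(String text) {}

    // name is a property (a primitive value that is always there);
    // initValue is a child with cardinality 0..1.
    record VariableDeclaration(String name, Optional<Expression> initValue) {}

    public static void main(String[] args) {
        VariableDeclaration withValue =
                new VariableDeclaration("x", Optional.of(new Expression("10 * 20")));
        VariableDeclaration withoutValue =
                new VariableDeclaration("y", Optional.empty());
        System.out.println(withValue.initValue().isPresent());    // child present
        System.out.println(withoutValue.initValue().isPresent()); // child absent
    }
}
```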
Let’s now define the syntax of integer variable declaration. We will deviate a bit from the conventional syntax and go with this:
number x equals 10 * 20
Here is what the projection with cells (as explained above) will look like.
When a programmer types number in the code, a fragment with two placeholders is added.
number _____ equals ______
Again, this is similar to Scratch, with the only difference being that in Scratch you would “draw” a variable declaration, but in MPS you still type it or, more precisely, its leading keyword (in our case, number; this leading keyword is called the alias of a concept in MPS).
The Behavior aspect allows you to “customize” the structure of a concept. For example, we might want the placeholder name to be pre-filled with abc whenever a variable declaration is added to the code.
Now when a programmer types number, they will get
number abc equals ______
with only one placeholder (for the initial value) left to be filled in.
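A Java analogy for this kind of behavior is a factory method that decides what a freshly created node looks like. The names BehaviorSketch, createDefault and project are invented for this post, and the initial value is kept as plain text to stay short.

```java
import java.util.Optional;

// Hypothetical sketch of a "what happens on creation" behavior; not MPS API.
final class BehaviorSketch {
    record VariableDeclaration(String name, Optional<String> initValue) {
        // A fresh declaration starts with its name pre-filled with abc.
        static VariableDeclaration createDefault() {
            return new VariableDeclaration("abc", Optional.empty());
        }
    }

    // A simple text projection; an absent child is shown as a placeholder.
    static String project(VariableDeclaration d) {
        return "number " + d.name() + " equals " + d.initValue().orElse("______");
    }

    public static void main(String[] args) {
        System.out.println(project(VariableDeclaration.createDefault()));
        // prints: number abc equals ______
    }
}
```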
Now we are smoothly approaching what I called “scary stuff” at the beginning of this post.
TextGen
So, suppose we have defined all the concepts that we wanted to have in our language. Now we want to transform
number carPower equals 10 * 20
written in our language into, say, Java:
private int carPower = 10 * 20;
There are two ways to make this happen in MPS. The first one is to generate a string that will contain the desired output.
This is essentially a string interpolation (or, a template expression, if you wish): we have some “fixed” parts (for example, private int) and varying parts (the value of property name, and the value of init_value).
Trick about init_value
If the syntax of expressions in our language differs from Java, for example, if in our language instead of 30+20-50 we could write 30 plus 20 minus 50, then we would also need to generate the code for (that is: “to translate” / ”to compile”) init_value. We are not going into details on that in this post.
Anyway, now we have generated the desired string
private int carPower = 10 * 20;
Is this Java code? The answer is both “Yes” and “No”.
That string looks like Java code, but nothing could have prevented us from generating gibberish like this:
private variable of type integer carPower := 10 * 20 ;;;
In other words, when we generate code, we do not check whether what we generate makes sense or not: at the end of the day, it’s just a string.
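The same situation is easy to reproduce in plain Java. In the sketch below (the names TextGenSketch, toJava and toGibberish are mine, and init_value is assumed to be already written in Java syntax), both methods compile equally well, because to the compiler each of them just builds a String.

```java
// Hypothetical sketch of text generation by string concatenation; not MPS API.
final class TextGenSketch {
    record VariableDeclaration(String name, String initValue) {}

    // Fixed parts are literals; varying parts come from the node.
    static String toJava(VariableDeclaration d) {
        return "private int " + d.name() + " = " + d.initValue() + ";";
    }

    // Compiles just as happily, although its output is not Java at all.
    static String toGibberish(VariableDeclaration d) {
        return "private variable of type integer " + d.name() + " := " + d.initValue() + " ;;;";
    }

    public static void main(String[] args) {
        VariableDeclaration d = new VariableDeclaration("carPower", "10 * 20");
        System.out.println(toJava(d));
        System.out.println(toGibberish(d));
    }
}
```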
Fortunately, MPS has a way to overcome this issue.
Generator
In the second approach to code generation, we specify a correct example of the desired code.
Yes, we do write x and 0 in the generated code, as if they were hard-coded, and we completely ignore name and init_value at this point. This is now a piece of Java code (not just text that looks like code!), and MPS can ensure it is syntactically valid. We couldn’t have skipped or misspelled something: MPS would have complained and not let us proceed.
Now we can “bind” (or: “link”) the value of name to x and the value of init_value to 0.
Again, the trick about init_value applies here (see above).
What we have just seen is an example of a model-to-model transformation. We are not going into more details at this point.
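A very rough Java analogy, under my own assumptions: the target is not a string but a structured object, the template is a valid instance of that object (private int x = 0;), and generation replaces only the bound parts. All names here (GeneratorSketch, JavaField, TEMPLATE, generate, print) are invented, and the initializer is kept as text for brevity, whereas in a real model it would itself be an expression node.

```java
import java.util.Locale;

// Hypothetical sketch of a model-to-model transformation; not MPS API.
final class GeneratorSketch {
    // Source model: a node of our own language.
    record VariableDeclaration(String name, String initValue) {}

    // Target model: a structured Java field, not a string.
    enum Visibility { PRIVATE, PUBLIC }
    enum JavaType { INT }
    record JavaField(Visibility visibility, JavaType type, String name, String initializer) {}

    // The "correct example" of the desired code: private int x = 0;
    static final JavaField TEMPLATE =
            new JavaField(Visibility.PRIVATE, JavaType.INT, "x", "0");

    // Bind name to x and init_value to 0; everything else comes from the template.
    static JavaField generate(VariableDeclaration d) {
        return new JavaField(TEMPLATE.visibility(), TEMPLATE.type(), d.name(), d.initValue());
    }

    // Text is produced only at the very end, from an already well-formed node.
    static String print(JavaField f) {
        return f.visibility().name().toLowerCase(Locale.ROOT) + " "
                + f.type().name().toLowerCase(Locale.ROOT) + " "
                + f.name() + " = " + f.initializer() + ";";
    }

    public static void main(String[] args) {
        JavaField field = generate(new VariableDeclaration("carPower", "10 * 20"));
        System.out.println(print(field));
    }
}
```

With this shape there is simply no way to produce a field declaration with a misspelled keyword or a stray `;;;`, because keywords and punctuation are not something the transformation gets to choose.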
Types & scopes
And let’s again repeat the (already familiar) exercise: think about the structure of concept “assignment statement”.
It contains:
- left part (that is, the value being assigned to)
- right part (the value that is being assigned)
Left part is a variable, and right part is an expression.
The TypeSystem aspect allows us to specify that the types of left and right should be the same.
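As a plain Java illustration of such a rule, here is a tiny check that the declared type of the left part equals the type of the right part. Everything in it is hypothetical: the names TypeCheckSketch and isWellTyped are mine, and the TEXT type is added only so that a mismatch is possible, since the language in this post has integers only.

```java
import java.util.Map;

// Hypothetical sketch of a typing rule for assignments; not MPS API.
final class TypeCheckSketch {
    // TEXT exists only to make a type mismatch possible in this example.
    enum Type { NUMBER, TEXT }

    // The right part is reduced to its (already computed) type.
    record Assignment(String variable, Type rightType) {}

    // The rule: the declared type of the left part must equal the type of the right part.
    static boolean isWellTyped(Assignment a, Map<String, Type> declaredTypes) {
        Type leftType = declaredTypes.get(a.variable());
        return leftType != null && leftType == a.rightType();
    }

    public static void main(String[] args) {
        Map<String, Type> declared = Map.of("carPower", Type.NUMBER);
        System.out.println(isWellTyped(new Assignment("carPower", Type.NUMBER), declared));
        System.out.println(isWellTyped(new Assignment("carPower", Type.TEXT), declared));
    }
}
```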
Finally, there is the Constraints aspect. This aspect enables even more fine-tuning of the structure of a concept. For example, it is possible to specify the scopes of visibility: in our example, we might want the variable in the left-hand side part of the assignment statement to be either a global variable or a local variable (but not a field of a class, for example, assuming we had classes in our language).
When a programmer types an assignment statement, the autocomplete for the left-hand side part will only suggest local and global variables.
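The effect on autocomplete can be pictured as a filter over the variables visible at that point. A minimal Java sketch, with invented names (ScopeSketch, Kind, completionsForLeftSide) and under the assumption that each variable already knows whether it is local, global or a field:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a scope constraint driving autocomplete; not MPS API.
final class ScopeSketch {
    enum Kind { LOCAL, GLOBAL, FIELD }
    record Variable(String name, Kind kind) {}

    // Only local and global variables may appear on the left-hand side.
    static List<String> completionsForLeftSide(List<Variable> visible) {
        return visible.stream()
                .filter(v -> v.kind() == Kind.LOCAL || v.kind() == Kind.GLOBAL)
                .map(Variable::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Variable> visible = List.of(
                new Variable("x", Kind.LOCAL),
                new Variable("size", Kind.FIELD),
                new Variable("total", Kind.GLOBAL));
        System.out.println(completionsForLeftSide(visible)); // fields are filtered out
    }
}
```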
Other aspects
In this post we do not talk about several other aspects that MPS allows us to specify for a language, for example, the DataFlow aspect. Mind you, it is even possible to define custom language aspects in MPS!
Wrapping up
Here is a diagram with the aspects covered in this post.
There are three main “pillars”:
- Structure (which is in the center of everything for the language developer) supplemented by Behavior, Constraints and TypeSystem
- Editor (which is in the center of everything for the user of a language) connected to editing Actions and user-friendly Intentions
- code generation aspects: TextGen and Generator, two very different aspects that share the same goal, are the center of everything “for the computer”
Acknowledgements
I am grateful to Mikko Pitkänen (University of Turku, Finland) for discussions on aspects of languages in JetBrains MPS.