A tale about domain-specific languages

Mikhail Barash
Sep 14, 2018


Disclaimer: The events, characters and firms described are fictitious. Any similarity to actual persons, living or dead, or to actual firms, is purely coincidental.

TL;DR: A story about different tools for DSLs: napkin → Java → Word → Excel → HTML → XML → YAML → Clojure → JetBrains MPS.

1997

Here is John. He has just founded a publishing company.

John

So far he is the only employee at his company, and he intends to do everything himself.

Recently, John got his first client: an old friend of his, William Smith, who needs to print a local newspaper every week.

PDF is still a fairly new format, and John explains to William how great it is. William seems happy that he can now produce high-quality material for his neighborhood.

While John is having dinner with his wife to celebrate the business's first client, William calls John to remind him about the page margins of his newspaper. As John doesn't have a piece of paper at hand, he uses a napkin to write down what William tells him.

Scan of that very napkin.

At the same time, John gets a pager message from his son, Andrew, asking him to buy some milk. When John comes home, he writes “covers as usual” on the napkin, referring to whatever “usual” means in his mind. The next day, John checks the margins of all pages in William Smith’s newspaper and sends the file to the printer.

1999

A couple of years went by, and John realizes that, with the number of orders he now has, he can no longer manage everything by himself. He hires a professional typesetter and a secretary who takes the orders.

They establish a particular process: when a new order comes in by phone, the secretary writes down the details and sends an e-mail message to the typesetter, who then does the necessary processing.

Hi,

We’ve just got a new order, diary No: 1999/3456/32-A. It’s books, covers 250g/m2.

Left: 30 mm

Top: 15 mm

Right: 20 mm

Bottom: 1 in

Thanks,

This is free-form English text, though with some subtle structure. It definitely looks better than John’s original napkin.

2000

As John’s business keeps growing, e-mail is no longer the best way to communicate. John decides to use Microsoft Word to bring more structure to page margin specifications.

Page margin specification in Microsoft Word. Note the “error messages” for misspelled words.

John is proactive and believes that automation could help his business a lot. He has recently hired Jane, a Java developer, who now expresses page margin requirements in Java.

Jane

She has found a Java library that validates page margins in PDF files. The catch is that using this library requires writing lots of boilerplate code.

Before joining John’s company, Jane developed software for accountants. She thinks that Microsoft Excel (or another spreadsheet application) could be used to structure the numerous PDF requirements of John’s clients.

Here is how Jane works now: a typesetting specialist prepares a Microsoft Word document with the specifications, scans it, and sends it to Jane. She then writes Java code based on those requirements. Often the scans contain typos and corrections made with a red ballpoint pen.

Boilerplate Java code that Jane has to write.

There is a slight problem with using Microsoft Word for specifying page margins: it highlights misspelled words perfectly, but gives no feedback on real problems, such as non-numeric margin values (nothing prevents a user from typing gibberish into the “field” for the left margin), negative page numbers, and so on.

From Word to Excel

Jane has just had a brilliant idea. What if typesetting specialists used a Microsoft Excel workbook with a very particular layout of cells, so that Jane could analyze the file programmatically and generate the boilerplate Java code? Jane is very excited about the idea and happy to implement it: she has had lots of Visual Basic for Applications training in the past.

Page margins specified in Microsoft Excel. Note how error messages are implemented in an ad hoc way.

Jane has essentially developed a domain-specific notation for the kind of problems John’s company deals with. Jane’s VBA script starts processing at the cell that contains the page specification. Six cells to its right, on the same row, is the value of the left margin. Six cells to the right and two rows below is the value of the top margin. And so on. Quick and dirty.
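Reading values at fixed offsets from an anchor cell can be sketched in a few lines. This is a minimal sketch in Java (not Jane's VBA), assuming the sheet has already been exported to a `String[][]` grid; the class name `FixedLayoutReader` is hypothetical, and only the two offsets stated above are implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a fixed-offset spreadsheet layout (hypothetical names).
// Assumes the sheet was exported to a row-major String[][] grid.
final class FixedLayoutReader {

    // Reads one requirement whose page specification sits at (row, col).
    static Map<String, String> read(String[][] grid, int row, int col) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("pages", cell(grid, row, col));       // the anchor cell itself
        values.put("left", cell(grid, row, col + 6));    // six to the right, same row
        values.put("top", cell(grid, row + 2, col + 6)); // six to the right, two rows down
        return values;
    }

    // Returns the trimmed cell text, or "" for empty or out-of-range cells.
    private static String cell(String[][] grid, int row, int col) {
        if (row < 0 || row >= grid.length || col < 0 || col >= grid[row].length) {
            return "";
        }
        String value = grid[row][col];
        return value == null ? "" : value.trim();
    }
}
```

The fragility is what makes it "quick and dirty": insert a single column into the sheet and every offset silently points at the wrong cell.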

What is important about the Excel approach is that Jane can now validate the values entered by users: for example, when an interval is specified, its lower bound must be strictly less than its upper bound, and so on. A small red exclamation mark conveniently appears near “problematic regions” of a spreadsheet. It becomes harder and harder for a typesetting specialist to define something incorrectly.
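The kind of checks described above can be sketched as a small validator. The rules (page bounds must be positive integers, the lower bound strictly less than the upper bound, the margin a non-negative number) follow the text; the class name, method signature, and message wording are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validator for one margin requirement; it returns
// domain-level messages instead of failing on the first problem.
final class MarginValidator {

    static List<String> validate(String fromPage, String toPage, String leftMargin) {
        List<String> errors = new ArrayList<>();
        Integer from = parsePage(fromPage, "First page", errors);
        Integer to = parsePage(toPage, "Last page", errors);
        // The interval check only makes sense when both bounds parsed correctly.
        if (from != null && to != null && from >= to) {
            errors.add("Wrong page interval: " + from + " must be less than " + to);
        }
        try {
            double left = Double.parseDouble(leftMargin.trim());
            if (left < 0) {
                errors.add("Left margin must not be negative: " + leftMargin);
            }
        } catch (NumberFormatException e) {
            errors.add("Left margin is not a number: " + leftMargin);
        }
        return errors;
    }

    // Parses a page number; records a message and returns null if invalid.
    private static Integer parsePage(String text, String what, List<String> errors) {
        try {
            int value = Integer.parseInt(text.trim());
            if (value <= 0) {
                errors.add(what + " must be positive: " + text);
                return null;
            }
            return value;
        } catch (NumberFormatException e) {
            errors.add(what + " is not a number: " + text);
            return null;
        }
    }
}
```

Collecting all messages, rather than stopping at the first one, mirrors the spreadsheet behavior: every problematic cell gets its own red mark.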

At the same time, John’s son, Andrew, got interested in computers and convinced John to buy him a book about web programming. Andrew quickly implemented a simple web page for his father’s business.

Andrew took a different approach from Jane’s. While Jane allocated designated cells for page numbers and for the margin requirements of those pages, Andrew simply suggested entering page numbers in the Microsoft Word format (for example, 1;2;3-15;20-).
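Andrew's notation is compact, but it needs a parser. A minimal sketch, assuming `;` separates items, `a-b` is an inclusive range, and a trailing `-` (as in `20-`) means "from this page to the end of the document"; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for page lists such as "1;2;3-15;20-".
final class PageList {

    // One parsed item; last == Integer.MAX_VALUE encodes an open-ended range.
    static final class Range {
        final int first;
        final int last;

        Range(int first, int last) { this.first = first; this.last = last; }

        boolean contains(int page) { return page >= first && page <= last; }

        @Override public String toString() {
            if (last == Integer.MAX_VALUE) return first + "-";
            return first == last ? String.valueOf(first) : first + "-" + last;
        }
    }

    static List<Range> parse(String text) {
        List<Range> ranges = new ArrayList<>();
        for (String item : text.split(";")) {
            String s = item.trim();
            if (s.isEmpty()) continue; // tolerate "1;;2" and a trailing ';'
            int dash = s.indexOf('-');
            if (dash < 0) {
                int page = Integer.parseInt(s);
                ranges.add(new Range(page, page));
            } else {
                int first = Integer.parseInt(s.substring(0, dash).trim());
                String rest = s.substring(dash + 1).trim();
                int last = rest.isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(rest);
                if (first > last) {
                    throw new IllegalArgumentException("Wrong page interval: " + s);
                }
                ranges.add(new Range(first, last));
            }
        }
        return ranges;
    }

    // True if any parsed item covers the given page.
    static boolean contains(List<Range> ranges, int page) {
        for (Range r : ranges) {
            if (r.contains(page)) return true;
        }
        return false;
    }
}
```

For example, `PageList.parse("1;2;3-15;20-")` yields four items; page 10 is covered, page 17 is not, and page 500 falls into the open-ended `20-` range.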

Either way, Andrew’s web page also generated the Java boilerplate code. John had never been happier.

However, this approach was never adopted at John’s company: when John talked with his typesetting specialists, they all said they liked being able to simply copy and paste Excel cells. In Andrew’s solution, one would have to click the Add… button and could not easily copy the values of different margins.

2003

Jane decides to try XML technologies: after all, XML is quite popular. Specification documents now become XML documents!

Error messages are a bit cryptic for a typesetting specialist, but still possible to grasp.

Documents are validated against an XML Schema, and suitable tools can show error messages right where the problems are. Typesetting specialists now have a hard time defining something incorrectly! If they make a mistake, the XML document will not validate, and they will have to fix it.
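The JDK ships a standard XML Schema validator in `javax.xml.validation`. The sketch below runs it against a made-up one-element schema (a `margin` element with `side`, `value`, and `unit` attributes); the schema and class name are assumptions for illustration, not Jane's actual format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class MarginSchemaCheck {
    // Hypothetical schema: <margin side="..." value="..." unit="mm|in"/>,
    // where value must be a non-negative decimal.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='margin'><xs:complexType>"
      + "<xs:attribute name='side' type='xs:string' use='required'/>"
      + "<xs:attribute name='value' use='required'><xs:simpleType>"
      + "<xs:restriction base='xs:decimal'><xs:minInclusive value='0'/></xs:restriction>"
      + "</xs:simpleType></xs:attribute>"
      + "<xs:attribute name='unit' use='required'><xs:simpleType>"
      + "<xs:restriction base='xs:string'>"
      + "<xs:enumeration value='mm'/><xs:enumeration value='in'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      + "</xs:complexType></xs:element>"
      + "</xs:schema>";

    // Compiles the schema; a broken schema is a programming error, not user input.
    private static Schema schema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Invalid schema", e);
        }
    }

    // Returns null if the document is valid, otherwise the validator's own message.
    static String check(String xml) {
        try {
            schema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned message is the validator's generic text about facets and types, which is exactly the kind of cryptic error a typesetting specialist has to decipher.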

Jane has implemented an XSL transformation that generates the good old Java boilerplate code from the XML document.

Original XML document, XSL transformation, and output of the transformation.
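An XSL step like Jane's can be run with the JDK's built-in `javax.xml.transform` API. In this sketch, both the input format (a `requirements` element containing `margin` elements) and the generated calls on a `checker` object are hypothetical stand-ins for the boilerplate of the unnamed validation library.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Generates (hypothetical) Java boilerplate from an XML specification via XSLT.
final class MarginCodeGen {
    // Emits one checker.requireMargin(...) line per <margin> element.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/requirements'>"
      + "<xsl:for-each select='margin'>"
      + "<xsl:text>checker.requireMargin(\"</xsl:text>"
      + "<xsl:value-of select='@side'/>"
      + "<xsl:text>\", </xsl:text>"
      + "<xsl:value-of select='@value'/>"
      + "<xsl:text>, \"</xsl:text>"
      + "<xsl:value-of select='@unit'/>"
      + "<xsl:text>\");&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String generate(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Fed `<requirements><margin side='left' value='30' unit='mm'/></requirements>`, it produces the line `checker.requireMargin("left", 30, "mm");`. The stylesheet, not Jane, now writes the boilerplate.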

However, typesetting specialists now have a hard time not only making mistakes but also editing: they struggle a lot with the editing experience, resist learning XML editing tools, and are perplexed by the syntactic noise of XML.

2004

Jane was upset that she hadn’t managed to convince the typesetting specialists to use XML the year before. Maybe YAML could be an option? “There is definitely much less syntactic noise,” Jane thought.

```yaml
document:
  description: "User manuals"
  handler: "John Smith"
  requirements:
    - requirement:
        description: "cover page"
        for pages:
          - page: 1
          - page: 2
        left:
          value: 20
          unit: "mm"
        right:
          value: 20
          unit: "mm"
```

However, as expected, typesetting specialists kept forgetting the indentation and the list markers, and were never sure when to quote strings. “Well, maybe Excel is not that bad, after all,” they concluded together with John.

2007

John’s business is growing incredibly well: in addition to being a publishing house, it is now also a media agency. Jane has developed a lot of software for the company.

But page margin requirements were still specified in Excel: everyone had got used to it and was unwilling to change anything. However, Jane, creative as she is, heard about a new programming language, Clojure, and about domain-specific languages. She immediately became enthusiastic: she understood well that the good old Excel was no stranger to the phrase “domain-specific”.

Jane searched a lot for “domain-specific language”. She found out that there are two kinds: internal and external.

John in 2007.

Internal languages, as people say, are very easy to build and maintain. Essentially, you use an existing programming language in a very particular way so that it takes on a “convenient” syntax. What “convenient” means here depends on who is using the language. Jane wanted to make a language that her typesetter colleagues would be able to use by themselves. She wanted to make a language for non-programmers.
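An internal DSL is ordinary host-language code arranged to read like a specification. Jane used Clojure; the sketch below shows the same idea in Java with method chaining (a "fluent interface"). All names, including the `Spec` class and its methods, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Internal DSL sketch: plain Java method chaining arranged to read like a spec.
final class Spec {
    enum Unit { MM, IN } // a misspelled unit such as Unit.MN does not compile

    private final List<String> lines = new ArrayList<>();
    private String pages = "";

    static Spec document() { return new Spec(); }

    // Sets the pages that subsequent margin calls refer to.
    Spec pages(String pages) { this.pages = pages; return this; }

    Spec left(int value, Unit unit)   { return margin("left", value, unit); }
    Spec top(int value, Unit unit)    { return margin("top", value, unit); }
    Spec right(int value, Unit unit)  { return margin("right", value, unit); }
    Spec bottom(int value, Unit unit) { return margin("bottom", value, unit); }

    // Records one margin requirement in a neutral, readable form.
    private Spec margin(String side, int value, Unit unit) {
        lines.add("pages " + pages + ": " + side + " " + value + " "
                + unit.name().toLowerCase(Locale.ROOT));
        return this;
    }

    List<String> lines() { return Collections.unmodifiableList(lines); }
}
```

A specification then reads `Spec.document().pages("1;2").left(30, Spec.Unit.MM).top(15, Spec.Unit.MM)`. The host compiler checks units for free, but the dots, parentheses, and quotes are syntactic noise an internal DSL cannot get rid of.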

This idea turned Jane toward external domain-specific languages, which would give her absolute freedom of syntax. She could make it look like English! “…And then our typesetters wouldn’t even realize that they are in fact programmers,” she told John.

A typesetter would be now a programmer!

There were a couple of issues with this idea, however. Implementing an external language with custom syntax requires writing a parser. Jane didn’t know how to do that, and learning it now would take too much time away from other ongoing projects at the company. A final argument against an external language was a claim she had read somewhere on the web, something along the lines of “a language without an IDE is not a good language”. And Jane realized it was true: if she implemented an external language with her dream syntax, would the company’s typesetters use Notepad to write the page margin requirements? “Definitely not a good idea: this would be a mess,” she thought.

So she decided to go for an internal domain-specific language. This is how she imagined the process: a typesetter would write code in Clojure and “compile” it into Java boilerplate, the same way it had been done at the company for many years. Behind the scenes, every function simply returned a string containing a piece of the corresponding Java code.

Right margin is specified twice. 🙄

Error messages are still cryptic for typesetting specialists… But typesetting specialists will get used to them, right?
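The "every function returns a string of Java code" technique can be sketched as follows. It is written in Java for consistency with the other examples rather than in Jane's Clojure, and the generated `checker.forPages`/`checker.requireMargin` calls are hypothetical.

```java
// "Every function returns a string of Java code", sketched in Java.
final class StringDsl {

    // Produces one line of (hypothetical) Java boilerplate for a margin.
    static String margin(String side, int value, String unit) {
        return "checker.requireMargin(\"" + side + "\", " + value + ", \"" + unit + "\");";
    }

    // Glues the margin lines under a page selection; no checks of any kind.
    static String requirement(String pages, String... margins) {
        StringBuilder code = new StringBuilder();
        code.append("checker.forPages(\"").append(pages).append("\");\n");
        for (String line : margins) {
            code.append(line).append('\n');
        }
        return code.toString();
    }
}
```

Calling `requirement("1;2", margin("left", 20, "mm"), margin("right", 20, "mm"), margin("right", 25, "mm"))` happily emits both right margins: at this level everything is just a string, so nothing can notice the duplicate. Any error surfaces only later, in the generated Java, far from the typesetter's code.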

2010s

Everything has been working well for John’s company so far: more than 20 years already. That’s definitely a success story.

Every time a new typesetter is hired at John’s company, they have to learn, among other things, the very same Clojure internal domain-specific language that Jane developed a decade ago. “A small price to pay for a stable, well-tuned process,” John always says.

But Jane, with her craving for new technologies, knew at a subconscious level that there had to be a better way: one without parsing, and without the need to implement an IDE for a domain-specific language from scratch.

Jane has used products from various vendors; in particular, she has used IntelliJ IDEA.

Somehow Jane had never paid attention to the “Languages” tab on the JetBrains website, until one day she saw “MPS: Create your own domain-specific language”. That was exactly what she had been thinking about for quite some time, and she decided to give it a try.

Era of JetBrains MPS

MPS stands for Meta Programming System. Well, that is essentially what Jane has been doing, with different approaches, for many years! All her solutions, in one way or another, generated Java boilerplate code so that she wouldn’t have to write it herself. This is metaprogramming.

MPS is an example of a language workbench: a fancy name for tools that let you define new languages and create IDEs for them. What if you could get an IDEA-level custom IDE for your language without needing to write a parser?

Languages implemented with MPS can feature rich (non-textual) notations, and the automatically generated IDEs provide custom autocompletion (in our example: measurement units for page margins), intentions (known as quick fixes in the Eclipse world), and lots of other features. All of these are domain-specific, that is, tailored to a particular language. And error messages are now domain-specific too, communicating exactly what the issue with the code is. It’s “Wrong page interval”, not “Assert failed”.

Editing experience of a page margin specification document in JetBrains MPS.

To allow this syntactic freedom, MPS uses a projectional, or structural, editor. The user edits not textual code but the abstract syntax tree itself, and these trees can be projected into text, tables, and graphics. Even a textual projection is only a view for the programmer, so no parsing at all is required when a projectional editor is used.
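The core idea of projectional editing can be illustrated with a toy model: the document is a tree of objects, edits change that tree directly, and every view is computed from it. This is a conceptual sketch only; it says nothing about how MPS is actually implemented, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Toy model of projectional editing: the tree is the source of truth,
// and the text and table views are computed from it, never parsed back.
final class Projection {

    // AST node for one margin; editing means changing this object directly.
    static final class Margin {
        final String side;
        int value;
        final String unit;

        Margin(String side, int value, String unit) {
            this.side = side;
            this.value = value;
            this.unit = unit;
        }
    }

    // AST node for a requirement: a page selection plus its margins.
    static final class Requirement {
        final String pages;
        final List<Margin> margins = new ArrayList<>();

        Requirement(String pages) { this.pages = pages; }
    }

    // Textual projection: reads like a sentence.
    static String asText(Requirement r) {
        StringJoiner text = new StringJoiner(", ", "for pages " + r.pages + ": ", "");
        for (Margin m : r.margins) {
            text.add(m.side + " " + m.value + " " + m.unit);
        }
        return text.toString();
    }

    // Tabular projection of the very same tree.
    static String asTable(Requirement r) {
        StringBuilder table = new StringBuilder();
        table.append(String.format("%-8s%-6s%s", "side", "value", "unit")).append('\n');
        for (Margin m : r.margins) {
            table.append(String.format("%-8s%-6s%s", m.side, m.value, m.unit)).append('\n');
        }
        return table.toString();
    }
}
```

Changing a margin's `value` from 20 to 25 updates both projections at once, and at no point is any text parsed back into the tree.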

Interested in how that works? Read more!

Slides

Here are the slides of my talk at the Small FP 2018 conference, held in Helsinki, Finland, in September 2018. The talk featured this tale.
