Learning from the Prototype

Darius
Published in Sempiler
5 min read · Mar 16, 2019

Last year a rough Sempiler prototype was available as a Visual Studio Code plugin.

Whilst related by lineage, the current version of Sempiler is a complete rewrite, and departs from its predecessor in a number of notable ways.

Engine Implementation Language

The prototype was written in TypeScript, whereas the new engine is written in C#.

The motivation was speed: I wanted the compiler to finish as quickly as possible (even more so from a cold start) to keep the developer's feedback loop fast. This matters all the more because, in a common setup for the tool, diagnostics are obtained by feeding the Sempiled output to another compiler in the backend (e.g. javac), which has its own inherent cost.

Engine Pipeline

The prototype was not tied to a particular source syntax or target domain, but that was about the extent of its flexibility.

Now Sempiler is split into four completely configurable phases (parsing, transformation, emission and consumption). This is essential to the goal of democratising compilation, i.e. giving the developer complete control over the transformations made to the source code.
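One way to picture four configurable phases is as a chain of pluggable stages. The sketch below is a minimal TypeScript illustration under assumed names (`Parser`, `Transformer`, `Emitter`, `Consumer`, `runPipeline` are all hypothetical); it is not Sempiler's actual API, which lives in the C# engine.

```typescript
// Hypothetical shapes for the four phases; not Sempiler's real interfaces.
interface SourceFile { path: string; text: string; }
interface AstNode { kind: string; value?: string; children: AstNode[]; }
interface Artifact { path: string; text: string; }
interface Diagnostic { message: string; }

interface Parser { parse(file: SourceFile): AstNode; }
interface Transformer { transform(ast: AstNode): AstNode; }
interface Emitter { emit(ast: AstNode): Artifact[]; }
interface Consumer { consume(artifacts: Artifact[]): Diagnostic[]; }

interface PipelineConfig {
  parser: Parser;
  transformers: Transformer[]; // applied in order, so the developer controls every rewrite
  emitter: Emitter;
  consumer: Consumer;
}

function runPipeline(
  config: PipelineConfig,
  file: SourceFile
): { artifacts: Artifact[]; diagnostics: Diagnostic[] } {
  // 1. Parsing: source text -> tree
  let ast = config.parser.parse(file);
  // 2. Transformation: each configured pass rewrites the tree
  for (const t of config.transformers) {
    ast = t.transform(ast);
  }
  // 3. Emission: tree -> target-domain source files
  const artifacts = config.emitter.emit(ast);
  // 4. Consumption: hand the output to a backend tool (e.g. javac) for diagnostics
  const diagnostics = config.consumer.consume(artifacts);
  return { artifacts, diagnostics };
}

// Toy plug-ins: wrap the text in one node, uppercase it, emit one file, report nothing.
const toyConfig: PipelineConfig = {
  parser: { parse: (f) => ({ kind: "Text", value: f.text, children: [] }) },
  transformers: [{ transform: (n) => ({ ...n, value: (n.value ?? "").toUpperCase() }) }],
  emitter: { emit: (n) => [{ path: "Out.java", text: n.value ?? "" }] },
  consumer: { consume: () => [] },
};

const pipelineResult = runPipeline(toyConfig, { path: "main.ts", text: "hello" });
```

Swapping any one plug-in (say, a different emitter for another target domain) leaves the other three phases untouched, which is the property the configurability is after.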

Parsing

The prototype parsed TypeScript or JavaScript, but did so by wrapping tsc.

Essentially, tsc would first build its own tree, which Sempiler would then walk, converting each node to the equivalent abstract semantic tree node in Sempiler land.

There is an obvious performance hit to doing this conversion, particularly on complex source trees.

Moreover, tsc validates that source tree against the domain rules for the web or Node.js, whereas one of the fundamental concepts of Sempiler is to decouple syntax from semantics (the flavour of validation rules you apply to a piece of code).

So the prototype had to detect and suppress the semantic errors reported by tsc in a very hacky way, wasting the developer's time in the process.

(side note: I alluded to this problem in the Q&A part of the Sempiler FullStackCon 2018 talk)

The new compiler engine has a bespoke TypeScript parser module that can be plugged into the front end.

Its job is just to parse the syntax constructs and create an abstract semantic tree directly from the source text.

It makes no attempt to reason about the intent behind the code or validate it against any particular domain.

This makes it much more performant, and the resulting abstract semantic tree can subsequently be validated against the semantics of any domain (e.g. iOS/Swift).
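To make "syntax only" concrete, here is a toy recursive-descent parser for call expressions. It is a minimal sketch, not Sempiler's TypeScript parser: the node shapes and the `tokenize`/`parse` names are assumptions. Note that it happily builds a node for `fetch(...)` without knowing whether `fetch` exists anywhere; that question is left to later, domain-specific phases.

```typescript
// Toy abstract semantic tree: purely syntactic, no symbol or type information.
type AstNode =
  | { kind: "Identifier"; name: string }
  | { kind: "StringLiteral"; value: string }
  | { kind: "Call"; callee: AstNode; args: AstNode[] };

// Splits source text into identifiers, double-quoted strings and ( ) , punctuation.
function tokenize(src: string): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (/\s/.test(ch)) { i++; continue; }
    if ("(),".includes(ch)) { tokens.push(ch); i++; continue; }
    if (ch === '"') {
      const end = src.indexOf('"', i + 1);
      if (end < 0) throw new SyntaxError("unterminated string");
      tokens.push(src.slice(i, end + 1));
      i = end + 1;
      continue;
    }
    const m = /^[A-Za-z_$][\w$]*/.exec(src.slice(i));
    if (!m) throw new SyntaxError(`unexpected character '${ch}' at ${i}`);
    tokens.push(m[0]);
    i += m[0].length;
  }
  return tokens;
}

// Grammar:
//   expression := primary ( "(" [ expression ( "," expression )* ] ")" )*
//   primary    := identifier | string
function parse(src: string): AstNode {
  const tokens = tokenize(src);
  let pos = 0;
  const peek = (): string | undefined => tokens[pos];

  function expect(tok: string): void {
    if (peek() !== tok) throw new SyntaxError(`expected '${tok}' but found '${peek() ?? "end of input"}'`);
    pos++;
  }

  function parsePrimary(): AstNode {
    const tok = peek();
    if (tok === undefined) throw new SyntaxError("unexpected end of input");
    pos++;
    if (tok.startsWith('"')) return { kind: "StringLiteral", value: tok.slice(1, -1) };
    if (/^[A-Za-z_$]/.test(tok)) return { kind: "Identifier", name: tok };
    throw new SyntaxError(`unexpected token '${tok}'`);
  }

  function parseExpression(): AstNode {
    let node = parsePrimary();
    while (peek() === "(") { // postfix calls such as f(a)(b)
      pos++;
      const args: AstNode[] = [];
      if (peek() !== ")") {
        args.push(parseExpression());
        while (peek() === ",") { pos++; args.push(parseExpression()); }
      }
      expect(")");
      node = { kind: "Call", callee: node, args };
    }
    return node;
  }

  const root = parseExpression();
  if (pos !== tokens.length) throw new SyntaxError(`unexpected token '${peek()}'`);
  return root;
}

// `fetch` is never declared here; the parser does not care.
const tree = parse('fetch("https://example.com")');
```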

Typechecking and Semantic Validation

The prototype implemented its own typechecker.

(another side note: writing a typechecker is probably one of the quickest ways a developer can go insane).

The new engine currently does zero typechecking for multiple reasons:

  • The code you are Sempiling will reference APIs and symbols that exist in a target context (eg. the ‘fetch’ networking API), but Sempiler may not have visibility or knowledge of these symbols
  • Writing a typechecker is a sizeable and complex undertaking
  • The runtime overhead incurred from typechecking is non-trivial
  • The consumption phase is for linking against existing, mature tools like battle hardened compilers, that themselves can report type errors (eg. javac for Java/Android validation)

When collecting diagnostics from a consumer, Sempiler maps any errors back to the line in your original source text that caused them.

So if you are writing TypeScript, but that eventually becomes Java and an error is found by javac, that error will be mapped back to the line of your TypeScript that caused the issue.
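A minimal way to do that mapping is for the emitter to record, for every line it writes, which line of the original source produced it, and then to rewrite the consumer's diagnostics through that table. The sketch below assumes javac's usual `File.java:LINE: error: message` output format; the `LineMap` shape and `mapJavacDiagnostics` function are hypothetical, not Sempiler's actual implementation.

```typescript
// Hypothetical line map recorded by the emitter: generated line (1-based) -> original location.
interface OriginalLocation { file: string; line: number; }
type LineMap = Map<number, OriginalLocation>;

interface MappedDiagnostic { file: string; line: number; message: string; }

// Matches javac diagnostic lines such as "Main.java:12: error: cannot find symbol".
const JAVAC_DIAGNOSTIC = /^(.+\.java):(\d+): (error|warning): (.*)$/;

function mapJavacDiagnostics(javacOutput: string, maps: Map<string, LineMap>): MappedDiagnostic[] {
  const results: MappedDiagnostic[] = [];
  for (const raw of javacOutput.split(/\r?\n/)) {
    const m = JAVAC_DIAGNOSTIC.exec(raw);
    if (!m) continue; // skip source excerpts, caret lines and the summary count
    const [, javaFile, lineText, severity, message] = m;
    const generatedLine = Number(lineText);
    const origin = maps.get(javaFile)?.get(generatedLine);
    results.push(
      origin
        ? { file: origin.file, line: origin.line, message: `${severity}: ${message}` }
        // No mapping recorded: fall back to the generated location.
        : { file: javaFile, line: generatedLine, message: `${severity}: ${message}` }
    );
  }
  return results;
}

// Suppose lines 10-12 of Main.java were emitted from line 4 of app.ts.
const mainMap: LineMap = new Map([
  [10, { file: "app.ts", line: 4 }],
  [11, { file: "app.ts", line: 4 }],
  [12, { file: "app.ts", line: 4 }],
]);

const javacOutput = [
  "Main.java:12: error: cannot find symbol",
  "        fetchh(url);",
  "        ^",
  "1 error",
].join("\n");

const mapped = mapJavacDiagnostics(javacOutput, new Map([["Main.java", mainMap]]));
```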

So the net result is that your code will still be statically typechecked and validated against the rules for a particular domain; it just won't be Sempiler doing the work. (Phew!)

Disabling IDE Intellisense

The prototype tried to work with the existing Visual Studio Code Intellisense system by providing definitions that would play nicely with it.

However, the implementation relied on a number of hacks and workarounds just to fool the autocomplete into suggesting the correct options (i.e. Android-flavoured completions when writing JavaScript).

One particularly egregious hack has also been removed.

Fakeywords

The Sempiler prototype relied on a painful workaround called fakeywords.

This was basically an abuse of type annotations, where you could use ‘fake keywords’ in union types.

These fakeywords were defined as undefined in the type system, which meant Intellisense ignored them and they did not conflict with the actual type being expressed:

[Image: Fakeywords]
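A rough reconstruction of the idea, with hypothetical fakeyword names (`mutable`, `shared`) rather than the prototype's real ones. Because each fakeyword is just an alias of `undefined`, adding it to a union does not change which values the annotation accepts, yet the name still sits in the source as a marker (presumably what Sempiler read).

```typescript
// Hypothetical fakeywords: plain aliases of `undefined`, so they add no new values to a union.
type mutable = undefined;
type shared = undefined;

// The checker sees `number | undefined`; `mutable` is only a marker left in the syntax.
let retryCount: mutable | number = 3;
retryCount = 4;

// The same marker trick works alongside any real type.
const greeting: shared | string = "hello";
```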

The need for such mechanisms was due to the prototype’s reliance on tsc and the default Intellisense experience for TypeScript/JavaScript.

The new engine does not have this dependency, and so the need for fakeywords (thankfully) goes away, because your code is now validated by backend consumers, not the frontend.

Stub APIs

Developers also had to write stub APIs (placeholder APIs with the same signatures as their actual counterparts in the target domain) just to satisfy the typechecker.
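For illustration, calling Android's `Toast` from TypeScript meant first writing something along these lines. This is a hedged sketch, not a stub from the prototype: the `Toast` class and `Context` alias here are hypothetical stand-ins, and the real `Toast.makeText` takes a `CharSequence`, approximated as `string`.

```typescript
// Hypothetical placeholder for android.content.Context; the typechecker only needs a name.
type Context = unknown;

// Hypothetical stub mirroring the shape of android.widget.Toast.
// It exists purely to satisfy the typechecker; none of it is meant to run.
class Toast {
  static readonly LENGTH_SHORT = 0;
  static readonly LENGTH_LONG = 1;

  static makeText(_context: Context, _text: string, _duration: number): Toast {
    throw new Error("stub: Toast.makeText only exists on Android");
  }

  show(): void {
    throw new Error("stub: Toast.show only exists on Android");
  }
}

// Application code that now typechecks, at the cost of maintaining the stub above.
function greet(context: Context): void {
  Toast.makeText(context, "Hello from Sempiler", Toast.LENGTH_SHORT).show();
}
```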

This huge source of friction is akin to writing declaration files in the current world of TypeScript — something I have seen put people off adoption.

You want to grab a programming tool, and get going asap — not have to wrestle with configuration and setup.

In the new Sempiler there are no stub APIs or declaration files of any kind.

Where a symbol your code uses is actually defined can vary:

  • Maybe you defined it explicitly in your code?
  • Maybe it gets injected during the compilation process?
  • Maybe it already exists in the target domain?

In any of these cases, the symbol ends up defined when the code is executed in the target domain, and that is all that matters to us.

Language Service

The prototype suffered from terrible latency because it ran Sempiler on the same thread that was updating the IDE or code editor. (yerrr!)

You can imagine how painful that gets when every character you type causes a stutter.

(I had to hack in a rough caching mechanism retrospectively just to bring compile times down from 20 seconds to under 5 seconds!)

The advisable (read: correct) approach is to use a language server: Sempiler runs in a separate, long-lived background process and provides results asynchronously to the main thread.
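How the editor and the background process talk is not specified here. If the server followed the Language Server Protocol (an assumption for this sketch), each JSON-RPC message would be framed with a `Content-Length` header giving the size of the body in UTF-8 bytes. A minimal encoder and incremental reader for that framing, with hypothetical names (`encodeMessage`, `MessageReader`):

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

function concatBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}

// Frame a message as "Content-Length: <bytes>\r\n\r\n<utf-8 json body>".
function encodeMessage(message: unknown): Uint8Array {
  const body = encoder.encode(JSON.stringify(message));
  const header = encoder.encode(`Content-Length: ${body.length}\r\n\r\n`);
  return concatBytes(header, body);
}

// Index of the "\r\n\r\n" (13 10 13 10) that ends the header block, or -1.
function findHeaderEnd(bytes: Uint8Array): number {
  for (let i = 0; i + 3 < bytes.length; i++) {
    if (bytes[i] === 13 && bytes[i + 1] === 10 && bytes[i + 2] === 13 && bytes[i + 3] === 10) return i;
  }
  return -1;
}

// Reassembles messages from a byte stream (e.g. the server's stdout),
// where one chunk may hold part of a message or several messages.
class MessageReader {
  private pending: Uint8Array = new Uint8Array(0);

  push(chunk: Uint8Array): unknown[] {
    this.pending = concatBytes(this.pending, chunk);
    const messages: unknown[] = [];
    for (;;) {
      const headerEnd = findHeaderEnd(this.pending);
      if (headerEnd < 0) break; // header not complete yet
      const header = decoder.decode(this.pending.subarray(0, headerEnd));
      const match = /Content-Length:\s*(\d+)/i.exec(header);
      if (!match) throw new Error("missing Content-Length header");
      const bodyStart = headerEnd + 4;
      const bodyEnd = bodyStart + Number(match[1]);
      if (this.pending.length < bodyEnd) break; // body not complete yet
      messages.push(JSON.parse(decoder.decode(this.pending.subarray(bodyStart, bodyEnd))));
      this.pending = this.pending.subarray(bodyEnd);
    }
    return messages;
  }
}

// Usage: deliver one framed message in two chunks, as a pipe might.
const frame = encodeMessage({ jsonrpc: "2.0", id: 1, method: "initialize" });
const reader = new MessageReader();
const afterFirstChunk = reader.push(frame.subarray(0, 10));
const afterSecondChunk = reader.push(frame.subarray(10));
```

Counting bytes rather than characters is what lets the reader cope with non-ASCII source text flowing between the two processes.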

CancellationToken

In the new engine a compilation session can be terminated through the .NET CancellationToken mechanism.

This is incredibly useful in a live editor, where you want to re-evaluate the source text shortly after the developer stops typing (and discard any previous in-flight session).

The compilation process can be interrupted immediately, and started anew with updated data (the latest source text typed).
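The engine itself uses .NET's `CancellationToken`. As a rough TypeScript analogue (the `CancellationTokenSource`, `compile` and `CompilationSessions` names below are hypothetical, loosely mirroring the .NET ones), each phase checks the token before doing its work, and starting a new session cancels the one in flight. JavaScript is single-threaded, so the "new keystroke" is simulated from inside a phase rather than arriving from another thread.

```typescript
// Rough analogue of .NET's CancellationToken; illustrative only.
interface CancellationToken {
  readonly isCancellationRequested: boolean;
  throwIfCancellationRequested(): void;
}

function cancelledError(): Error {
  const err = new Error("compilation cancelled");
  err.name = "OperationCanceledError";
  return err;
}

const isCancellation = (e: unknown): boolean =>
  e instanceof Error && e.name === "OperationCanceledError";

class CancellationTokenSource {
  private cancelled = false;
  readonly token: CancellationToken;

  constructor() {
    const source = this;
    this.token = {
      get isCancellationRequested() {
        return source.cancelled;
      },
      throwIfCancellationRequested() {
        if (source.cancelled) throw cancelledError();
      },
    };
  }

  cancel(): void {
    this.cancelled = true;
  }
}

type Phase = (text: string) => string;

// Runs the phases in order, bailing out as soon as the session is cancelled.
function compile(text: string, phases: Phase[], token: CancellationToken): string {
  let current = text;
  for (const phase of phases) {
    token.throwIfCancellationRequested();
    current = phase(current);
  }
  return current;
}

// Keeps at most one live session: starting a new one cancels the previous one.
class CompilationSessions {
  private active: CancellationTokenSource | null = null;

  start(): CancellationToken {
    if (this.active) this.active.cancel();
    this.active = new CancellationTokenSource();
    return this.active.token;
  }
}

// Usage: a keystroke lands while the first session is between phases.
const sessions = new CompilationSessions();
const latest: { token: CancellationToken | null } = { token: null };

const interruptedPhases: Phase[] = [
  (t) => {
    latest.token = sessions.start(); // simulated new keystroke mid-session
    return t.trim();
  },
  (t) => t.toUpperCase(),
];

let firstOutcome = "";
try {
  firstOutcome = compile(" let a = 1 ", interruptedPhases, sessions.start());
} catch (e) {
  if (!isCancellation(e)) throw e;
  firstOutcome = "<cancelled>";
}

// The newer session, started with the latest text, runs to completion.
const secondOutcome = compile(" let a = 12 ", [(t) => t.trim(), (t) => t.toUpperCase()], latest.token!);
```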

Goals

Lastly, the goals of the project have shifted a bit.

This is a bit more fluffy to articulate, but the original prototype was staunchly opinionated:

As the developer, you should always be mindful of the target domain

Whilst the essence of this is still true in Sempiler, the implication of that statement was that the developer would write different code for different domains (platforms).

That is too purist, a drain on productivity, and not fun (a stance born more out of frustration at the state of software than out of rationale).

The developer may well need to write domain-specific code in certain scenarios, but the new engine places much more emphasis on using the same piece of code across multiple domains (e.g. iOS and Android).

Sempiler now is about producing native code quickly and easily. Being able to share solutions across multiple target domains is not antithetical to that. On the contrary, it is now a key part of the design.

You can see the current goals of the project on the homepage.

Software Eng // prev @Microsoft // passionate about compilers & tooling 🛠️