Monoglots: When a subset is not

Jasvir Nagra
10 min read · Sep 9, 2019


A polyglot in computer science is a snippet of text which is valid in two or more programming languages. Think of them as the “Yanny or Laurel” or “blue or gold dress” of programming languages¹. Or better yet, an Escher piece which you interpret differently based on what you’re expecting to see. Different interpreters will recognize the snippet of text as their own.

Day and Night, M. C. Escher, 1938

TypeScript vs JavaScript

Languages which are a superset of other languages are a degenerate form of polyglot, since every program in the subset is a valid program in the superset. An example of such a pair of languages is TypeScript and JavaScript.

In fact, the second paragraph of the TypeScript specification makes this claim explicit:

TypeScript is a syntactic sugar for JavaScript. TypeScript syntax is a superset of ECMAScript 2015 (ES2015) syntax. Every JavaScript program is also a TypeScript program.

All this says is that every syntactically valid JavaScript program is also a syntactically valid TypeScript program. Casually implied by this claim, though not explicitly stated, is that if a snippet of text is a valid JavaScript program, it would be semantically equivalent to that same snippet of text interpreted as a TypeScript program.

This is a much stronger claim and delves unexpectedly deeply into what it means for two snippets of text, when interpreted as programs, to be “semantically equivalent”. Let’s not follow this rabbit down its hole since it ends up pulling us into a topsy-turvy wonderland, but for an intuitive sense of why it might be complicated, consider what it means for two non-deterministic programs (like most JavaScript and TypeScript programs), which output a different result on every run, to be semantically equivalent.

Instead, we will focus on the following puzzle. Is it possible to write a snippet of text which when run:

- As TypeScript always outputs “I’m TypeScript”; and

- As JavaScript always outputs “I’m JavaScript”

The Setup

Both TypeScript and JavaScript have evolved over time and come in different flavors and versions. For our purposes, let’s use a recent version of TypeScript, version 3.6, with strict mode settings enabled and a recent version of JavaScript, ES6.

Ok now you have a choice in the story! You can skip the failed attempts that iterate towards the solution and see what we arrive at by skipping to the section called “Success”. Or you can join us for the ride:

A TypeScript/JavaScript monoglot

Failing Techniques

If TypeScript is a superset of JavaScript, one obvious thing we may try is to write TypeScript and catch the syntax error that gets emitted when the same snippet is interpreted as JavaScript. In our way is the fact that in JavaScript, a syntactically invalid snippet of text cannot be recovered from: you can’t catch syntax errors from inside a syntactically invalid script.

JavaScript Language Subsets are like Russian матрёшка dolls — each fits perfectly into the next: ES5 Strict ⊂ ES6 ⊂ TypeScript

In pure JavaScript, the way you can check if a string contains syntactically valid JavaScript is by passing it to the Function constructor inside a try catch block. You may think we could do the same thing in TypeScript too:

This does not work as expected because even in TypeScript, the Function constructor expects JavaScript

Unfortunately TypeScript does not (yet?) overload the Function constructor. Even in TypeScript, the Function constructor (and eval) only takes pure JavaScript as input. This means that even if you write TypeScript, the arguments to Function and eval have to be pure JavaScript.
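Here is a minimal sketch of the Function-constructor check described above. The helper name parsesAsJavaScript is my own, introduced only for illustration. It shows why the trick cannot see TypeScript syntax: the string is always parsed as plain JavaScript, no matter which language the surrounding file is written in.

```typescript
// Hypothetical helper: true when `source` parses as a JavaScript function body.
function parsesAsJavaScript(source: string): boolean {
  try {
    new Function(source); // the body is parsed here but never executed
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err; // anything else is unexpected, so let it surface
  }
}

// Plain JavaScript parses fine.
const plainParses = parsesAsJavaScript("var x = 1;");

// A type annotation is a syntax error for the Function constructor,
// even when this file itself is compiled as TypeScript.
const annotatedParses = parsesAsJavaScript("var x: number = 1;");

console.log(plainParses, annotatedParses);
```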

Foiled!

Another obvious way to try to achieve our goal is to find some variable or state that is undefined in JavaScript but which TypeScript defines globally. Superset languages tend to be cautious about introducing such globals, not just because “globals are bad” but because they can conflict with the same variables in existing programs. Techniques like this are often used to detect whether a JavaScript program is running on the server in node or in the browser: check for a global variable like self or window, or for objects such as document and navigator that a browser provides but which are not present in other environments.
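For reference, environment sniffing of this kind can be sketched as below. The helper hasGlobal and the three labels are my own names, and I am assuming a runtime recent enough to provide globalThis. It only illustrates the detection style; it does not tell TypeScript and JavaScript apart.

```typescript
// Hypothetical helper: is there a global binding with this name at runtime?
function hasGlobal(name: string): boolean {
  return name in globalThis;
}

// Classic sniffing: browsers expose window and document, node exposes process.
const environment =
  hasGlobal("window") && hasGlobal("document")
    ? "browser"
    : hasGlobal("process")
      ? "node"
      : "unknown";

console.log(environment);
```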

Given the large number of ambiently available useful standard utilities TypeScript exposes like Pick, Record, Partial and Required, it might appear at first glance that this would be the straightforward way for us to proceed. But this first glance is deceptive. While these are all globally available, they’re global utility types. On compilation, these types evaporate and, more generally, cannot be introspected dynamically at runtime, and runtime introspection is exactly what we need. You can see this using the TypeScript playground:

Once compiled to JavaScript, TypeScript types are not available for introspection.
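A small sketch of the same point, assuming a runtime that provides globalThis. The User and NameOnly types are invented for the example. The utility types are usable at compile time, yet nothing named Pick or Record exists once the code runs.

```typescript
// Hypothetical types, used only to exercise a utility type at compile time.
type User = { name: string; age: number };
type NameOnly = Pick<User, "name">;

// The annotation is checked by the compiler and then erased.
const user: NameOnly = { name: "Ada" };

// At runtime there is no global called Pick or Record to look for.
const pickExistsAtRuntime = "Pick" in globalThis;
const recordExistsAtRuntime = "Record" in globalThis;

console.log(user.name, pickExistsAtRuntime, recordExistsAtRuntime);
```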

Overall, as near as I can tell, TypeScript truly does not inject variables into the global state. Impressive and useful for the real world but a thorn in our side for the quest we have set ourselves.

Let’s try cheating instead.

Cheating Techniques

Another thing we could take advantage of is that, internally, the TypeScript compiler must analyze (and in fact does!) TypeScript sources and turn them into JavaScript. In order to achieve this, it injects small shims and utility functions which leak into our global space and are detectable.

For example, if you ever use async functions in TypeScript, you may notice that the compiler injects the following rather chatty stub into the beginning of your compiled code:

When using a feature of TypeScript that isn’t supported by the target JavaScript version you are compiling to, the TypeScript compiler sometimes inserts stubs to implement the support.

Notice in particular the var __awaiter =. So easily done! We can use an async function and detect the unexpected presence of __awaiter using typeof and we’re off to the races.
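A rough sketch of that detection follows. The names touchAsync and helperVisible are mine. I go through eval here only so the type checker does not complain about an undeclared name; in plain JavaScript a bare typeof __awaiter does the same job. Whether the flag ends up true depends entirely on the compile target, so treat the result as environment-specific.

```typescript
// Having any async function in the file is what makes a down-levelling
// compile emit the __awaiter helper at the top of the output.
async function touchAsync(): Promise<number> {
  return 1;
}

// Direct eval can see a file-level `var __awaiter` if one was emitted.
// true  -> the compiler injected its helper (older compile target)
// false -> async is supported natively by the target, or this ran as plain JavaScript
const helperVisible: boolean = eval("typeof __awaiter") !== "undefined";

console.log(helperVisible);
```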

Unfortunately, this definitely feels like cheating. Transpilers that compile to JavaScript often do reserve some part of the namespace for these kinds of utility functions, but they’re not really part of the specification of the language. Not only that, some compilers ensure that this reserved namespace isn’t accessible to “userspace programs”, either by forbidding the use of such names or by transforming these variables. Caja, a compiler for a subset of JavaScript, for example, reserved variables ending with two underscores (internally at Google, I have to admit, I’d christened these spicy wunderbars) and ensured no userspace program was able to create such variables. In that sense, the fact that TypeScript compiler artifacts are visible in userspace code can be considered a bug we’re taking advantage of rather than a deeper design of how the TypeScript language supersets JavaScript. What’s more, if you’re targeting ES2017, which supports async natively, no such stub is generated.

Iterating Towards Success: Ah ha!

Ok so we’re failing here.

Instead of a hack or a cheat, let’s do the work. Let’s consider what truly makes TypeScript a superset of JavaScript. It’s not the standard library that it imports implicitly, nor the way in which the transpilation is implemented. It’s the fact that TypeScript defines new syntax unrecognized by JavaScript interpreters. If we found ways in which this extension of the grammar itself was ambiguous then we might be able to tease out this difference into semantically different behavior.

What we are looking for is a snippet of text valid in both JavaScript and in TypeScript but which parses one way in the former, and a different way in the latter. By building on top of that difference, we can tease out a different execution.

This may seem like an exotic problem faced only by designers of language supersets but it’s a very common problem with live languages which are evolving. Many new features of programming languages extend the grammar of the language in some interesting way. This is true for JavaScript itself!

For example, TC39, the standards committee that proposes and ratifies new features of JavaScript, is regularly forced to tackle this problem: when adding a new grammar feature, they must ensure that all previous programs still parse the same as before.

Whether this is doable in general or not given a grammar is an interesting question left as an exercise for programming language academics and experts. :)

Back to our puzzle! TypeScript extends the JavaScript grammar in a lot of places but broadly for our purposes, in four important ways:

  • type casts using a newly introduced “as” keyword
  • type annotations on variables, method parameters and return values
  • type casts using angle brackets
  • type arguments to generics

It’s worth noting that the “import” statement which you may think is peculiar to TypeScript is a reserved word in JavaScript and actually spec’ed at least partially in one form or another since at least 2015.

A little bit of critical minutiae here which you can skip: by looking over the JavaScript grammar, you may (as I did) convince yourself that casting using as is not a polyglot-friendly mining space. The JavaScript language has a pretty complex grammar but it’s hard to find places where an expression with three or more tokens separated by whitespace like x as y can be valid without that magic wand in every JavaScript polyglot programmer’s pocket: automatic semicolon insertion or ASI. Unfortunately, the magical incantation that is ASI is denied to us because casting with as does not allow the whitespace on the left-hand side of the keyword as to be an end-of-line.

Similarly, type annotations may seem temporarily promising. You could imagine a snippet like:

let foo: String = "asdf";

Close! In TypeScript this is a declaration of “foo” as a String initialized to “asdf”. If it were not for the leading “let” it would be, in JavaScript, a label “foo” followed by an assignment of “asdf” to a global variable “String”.

…where “foo” looks pretty close to being interpreted as a label in JavaScript but a variable (of type String) in TypeScript. What stumps us here is that TypeScript only lets you declare types following a colon in …well… declarations, i.e. in a “let” statement, a “var” statement or parameters in a function declaration. These are all places in the JavaScript grammar where we can’t have labels!
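To make the near miss concrete, here is a sketch that feeds both spellings to the JavaScript parser through the Function constructor. The helper parsesAsJavaScript and the stand-in variable Text are my own; I use Text instead of String so that running the labelled version does not overwrite the real global.

```typescript
// Hypothetical helper: true when `source` parses as a JavaScript function body.
function parsesAsJavaScript(source: string): boolean {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err;
  }
}

// With the leading `let`, JavaScript rejects the colon outright.
const withLet = parsesAsJavaScript('let foo: String = "asdf";');

// Without it, JavaScript sees a label `foo` followed by an assignment.
const withoutLet = parsesAsJavaScript('foo: String = "asdf";');

// Running the labelled form with a local stand-in shows that the assignment really happens.
const assigned: unknown = new Function('var Text; foo: Text = "asdf"; return Text;')();

console.log(withLet, withoutLet, assigned);
```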

This brings us to typecasts and generics. Here we have a first taste of success. Let us take generics. They look like:

function foo<A>() {}

…clearly invalid JavaScript. But let’s look at the invocation of foo.

foo<String>();

…here your mad-hatter sense may already be tickling. Sure in TypeScript that looks like an invocation, but those first couple of tokens also look like a JavaScript comparison — comparing foo with String! But gah — followed by a syntax error. What if the generic function took not one but two arguments though?

foo<String, String>();

…almost there! In TypeScript, that’s still an invocation. But in JavaScript, that’s an expression statement foo less than String followed by another expression String greater than …oh… let’s just pass an argument:

foo<String, String>(1);

Yes! That’s both a:

  • TypeScript expression to call foo and pass an argument of 1; and a
  • JavaScript expression to compare foo with String, then compare String with 1.
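The two readings can be observed side by side with a sketch like the one below. Everything named here (foo, countArgs, parsesAsJavaScript) is hypothetical scaffolding of mine, and foo is declared with TypeScript-only syntax, so this is a demonstration of the ambiguity rather than a monoglot. The same call text is compiled once as TypeScript and parsed once as JavaScript through the Function constructor.

```typescript
// Hypothetical generic function with two type parameters.
function foo<A, B>(value: number, _a?: A, _b?: B): number {
  return value;
}

// Reports how many arguments it received.
function countArgs(...args: unknown[]): number {
  return args.length;
}

// Hypothetical helper: true when `source` parses as a JavaScript function body.
function parsesAsJavaScript(source: string): boolean {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err;
  }
}

// Compiled as TypeScript: one call to foo, so countArgs receives one argument.
const tsCount = countArgs(foo<String, String>(1));

// The same text parsed as JavaScript: two comparisons separated by a comma,
// so countArgs receives two (boolean) arguments.
const jsCount: unknown = new Function(
  "countArgs",
  "foo",
  "return countArgs(foo<String, String>(1));"
)(countArgs, foo);

// The earlier attempts fail in JavaScript because of the empty parentheses.
const oneTypeArg = parsesAsJavaScript("foo<String>();");
const twoTypeArgsNoValue = parsesAsJavaScript("foo<String, String>();");
const twoTypeArgsWithValue = parsesAsJavaScript("foo<String, String>(1);");

console.log(tsCount, jsCount, oneTypeArg, twoTypeArgsNoValue, twoTypeArgsWithValue);
```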

Iterating Towards Success: Bahaha!

Ok we’re not quite done yet though, are we? In order for us to be able to call a generic function, we need to first define one. And defining one, we run into all the syntactical problems we were trying to avoid in the first place.

We’d almost be back at square one — unless we’re able to find a built-in generic function that’s part of the standard library that we’re able to call.

In order to flow types properly, you need meta-functions like call, bind and apply to return expected types. One of the awesome features TypeScript recently introduced gives them this powerful ability — that itself is worth an entire blogpost. But for now, it also coincidentally gave us a very handily typed Function.prototype.apply:

What this gives us is a generic apply function that takes two generic arguments — just what we needed!
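As a sketch of what calling that overload looks like, assuming strict mode (which enables the strictly typed bind, call and apply): the greet function is my own example, and the signature in the comment is quoted from memory of the standard library declarations, so double-check it against your TypeScript version.

```typescript
// From memory, the one-argument overload in the standard declarations reads roughly:
//   apply<T, R>(this: (this: T) => R, thisArg: T): R;

// Hypothetical function with an explicit `this` type.
function greet(this: { name: string }): string {
  return "Hello, " + this.name;
}

// Both type arguments spelled out explicitly: T is the `this` type, R the return type.
const greeting = greet.apply<{ name: string }, string>({ name: "Ada" });

console.log(greeting);
```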

Iterating Towards Success: Callooh Callay!

We’re almost at the finish line. What we have now is the ability to create a function, say f, which:

  • In TypeScript, gets one argument — the result of calling apply; and
  • In JavaScript, gets two arguments, incidentally both of them booleans.

There are some nuances around optional arguments which are worth trying out yourself but which I will not delve into here. There are also some type flow weaknesses in TypeScript that out of laziness I’m taking advantage of here that you ought to spend some time thinking through.

Overall, it’s easy enough to distinguish a function which was called using either one or two arguments using the arguments object and we can use that to distinguish and respond appropriately.

Success

So here it is, the same snippet of code, called once as TypeScript and once as JavaScript.

Execute in TypeScript
Execute in JavaScript
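The embedded snippet is not reproduced in this text, so here is my own minimal reconstruction of the idea from the steps above. It is a sketch under assumptions, not the original code: the names g, f and result are mine, the default parameters are only there to make both arguments optional for the type checker, and it leans on the typed apply overload available in strict mode.

```typescript
// Valid in both languages: no annotations, no declarations with type syntax.
function g() {
  return true;
}

// The defaults make both parameters optional, so one or two arguments type-check.
function f(a = {}, b = {}) {
  return arguments.length === 1 ? "I'm TypeScript" : "I'm JavaScript";
}

// TypeScript reads this as one call: g.apply with two type arguments and thisArg 1.
// JavaScript reads it as two comparisons: (g.apply < Object), (Boolean > (1)).
var result = f(g.apply<Object, Boolean>(1));

console.log(result);
```

Compiled as TypeScript, f receives a single argument; pasted unchanged into a JavaScript console, it receives two.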

So what…

Well not a lot really — this was in some sense a silly self-imposed exercise. That said, there were a couple of interesting takeaways for me — three actually:

  1. Functionality: When extending the grammar of an existing language to create a superset, it is easy to accidentally let slip in changes which can arbitrarily change the semantics of the subset language you were extending. As far as I know, it’s an open problem whether, given a grammar and semantics, it’s possible to determine if a superset violates this intuitive property.
  2. Security: There may be cases where developers rely on JavaScript being a subset of TypeScript to run JS programs using a TS interpreter or compiler — for example during debugging or static analysis during CI/CD releases. It would be interesting to explore if these semantic differences mean that you can fool a developer or static analysis auditor using your third party library into thinking your code behaves one way during debugging and even static analysis and a different way during production.
  3. Fun!: Finding and teasing apart the parser differences between subset languages such that they continue to work but work differently is a lot of fun on a long weekend. Not only that, TypeScript is a treasure trove of rich features that help make your code better and more bug-free. I find puzzles like this a fun way to explore parts of the language I did not know about before and did not know I might want to use in the future.

Here’s the continuing puzzle for you! My snippet is pretty long.

What’s the shortest piece of text you can generate that has this same “monoglot” property:

- As TypeScript always outputs “I’m TypeScript”; and

- As JavaScript always outputs “I’m JavaScript”

Send it my way on Twitter with the hashtag #tsmonoglot!

[1] I want to call it dress code so badly now.

Jasvir Nagra

Security @ dropbox. Formerly security product @instart & @google , authored Surreptitious Software, TL for Caja. I love good food, fine wine & great JS.