What is wrong with the web

(And why we may want Moon)

A few days ago, I published an article about Moon, a fundamental building block of a decentralized browser that aims to solve many of Mist’s problems. I did so just to talk about some fancy features such as its decentralized package manager and a cool monadic notation. I guess that made some people angry, wondering why the hell I made yet another programming language when we have so many of them. If you’re in that group: you’re right. I’m sorry.

Believe me when I say I’m as tired of new languages as you, and I’m as pissed with myself as you are. But I’d not have done this if I didn’t have a very good reason. Give me, thus, a chance to justify my sins. For starters, I didn’t actually invent a programming language. At least, not according to several definitions around. From Wikipedia,

A programming language is a formal language that specifies a set of instructions that can be used to produce various kinds of output. (…) The description of a programming language is usually split into the two components of syntax (form) and semantics (meaning).

Moon has no side-effects and, as such, it can’t do trivial things such as “outputting” data to the console. There is no “hello world”. Moreover, Moon has no official “syntax”: it lets programmers pick whatever syntax they like. Finally, I didn’t invent it! So, again, WTF is Moon?

Top-down answer: Moon is just the minimal subset of JavaScript that removes as much as possible while still leaving enough things to remain practical.

Bottom-up answer: Moon is the minimal extension of the λ-calculus that adds just enough things to make it practical.

Formal answer: Moon is just an algebraic datatype,

data Term
= Var String
| Lam String Term
| App Term Term
| Let String Term Term
| Fix String Term
| Pri String [Term]
| Num Double
| Str String
| Map [(String, Term)]

plus a normalForm : Term -> Term function, following the semantics you'd expect from the PLT literature, and as explained in moon-core.js.
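To make that datatype less abstract, here is a minimal sketch of how such terms could be represented and evaluated in plain JavaScript. It is not the code from moon-core.js: the constructor helpers, the primitive names (add, sub, mul, eq, con, if) and the evaluate function are all assumptions made for illustration. It also evaluates to host values instead of returning a normalized Term, which is a simplification of what normalForm is described as doing.

```javascript
// Hypothetical constructors mirroring the Term datatype (not moon-core.js's API).
const Var = (name) => ({ tag: "Var", name });
const Lam = (name, body) => ({ tag: "Lam", name, body });
const App = (fun, arg) => ({ tag: "App", fun, arg });
const Let = (name, value, body) => ({ tag: "Let", name, value, body });
const Fix = (name, body) => ({ tag: "Fix", name, body });
const Pri = (op, args) => ({ tag: "Pri", op, args });
const Num = (value) => ({ tag: "Num", value });
const Str = (value) => ({ tag: "Str", value });
const Map_ = (entries) => ({ tag: "Map", entries }); // trailing underscore avoids shadowing the built-in Map

// An assumed, tiny set of primitive operations.
const primitives = {
  add: (a, b) => a + b,
  sub: (a, b) => a - b,
  mul: (a, b) => a * b,
  eq: (a, b) => (a === b ? 1 : 0),
  con: (a, b) => a + b, // string concatenation
};

// Strict evaluator: the environment is a plain object from names to values.
function evaluate(term, env = {}) {
  switch (term.tag) {
    case "Var":
      return env[term.name];
    case "Num":
    case "Str":
      return term.value;
    case "Lam":
      // The environment is extended at call time, so Fix can patch it below.
      return (arg) => evaluate(term.body, { ...env, [term.name]: arg });
    case "App":
      return evaluate(term.fun, env)(evaluate(term.arg, env));
    case "Let":
      return evaluate(term.body, { ...env, [term.name]: evaluate(term.value, env) });
    case "Fix": {
      // Tie the knot: the body (expected to be a Lam) sees itself under term.name.
      const recursiveEnv = { ...env };
      const value = evaluate(term.body, recursiveEnv);
      recursiveEnv[term.name] = value;
      return value;
    }
    case "Pri": {
      if (term.op === "if") {
        // Only the chosen branch is evaluated, otherwise recursion would never stop.
        const [condition, whenTrue, whenFalse] = term.args;
        return evaluate(evaluate(condition, env) ? whenTrue : whenFalse, env);
      }
      return primitives[term.op](...term.args.map((arg) => evaluate(arg, env)));
    }
    case "Map":
      return Object.fromEntries(term.entries.map(([key, value]) => [key, evaluate(value, env)]));
    default:
      throw new Error(`unknown term: ${term.tag}`);
  }
}

// Usage: factorial written directly as a Term, no syntax involved.
const factorial = Fix("f", Lam("n",
  Pri("if", [
    Pri("eq", [Var("n"), Num(0)]),
    Num(1),
    Pri("mul", [Var("n"), App(Var("f"), Pri("sub", [Var("n"), Num(1)]))]),
  ])));
const factorialOfFive = evaluate(App(factorial, Num(5))); // 120
```

Note how the whole "language" fits in one switch statement; that smallness is the point being made here.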

And that’s it. Note that I didn’t invent that. I just selected a very interesting subset of JavaScript, gave it a name, love, a compiler and I decided I’d use it instead of JavaScript as my browser’s main scripting language. But still, why? Why is this subset interesting and important? Why do I care? And, mostly, how on Earth dare I defy the omnipresent, almighty lord JavaScript?

  • Short answer: that micro subset is, given the way CPUs are built, given the way JIT compilers currently work, and as far as I could tell, the most concise format to pass arbitrary pieces of code around in a way that is safe and sufficiently efficient.

Using JavaScript would make critical optimizations and safety measures impractical and, as such, it had to be discarded, regardless of how frightening the consequences might be.

Now, that pretty much ends the article. You could stop here. If you, though, have nothing better to do and want to read a looong explanation, covering my entire line of thought from the history of the web to WASM to parallel lambda evaluators, then move ahead to the long answer.

  • Long answer: because that’s how the web should’ve worked, to begin with.

What is wrong with the web?

To understand what is wrong with the web as it stands, let’s first go through an overview of its history. When it started, the web consisted mostly of static pages with some text and images. It used to be fast, safe and robust, but also extremely restricted. To circumvent that, people invented JavaScript: a Turing-complete programming language that could be executed in response to events such as window.onclick or timers such as setTimeout, allowing a web-developer to select and modify contents of a web-page dynamically. That was extremely permissive, but equally problematic.

As the web grew, web-pages became more and more interactive and complex. Libraries such as jQuery were invented to make the task of “selecting” and “modifying” those pages’ contents easier. Soon, people noticed such coding style didn’t scale. The reason wasn’t obvious at first and they made Backbone.js, but, eventually, the problem was figured out: by keeping half of your application’s logic and state inside the DOM, the other half in JavaScript, and mutating that chaotically, programmers often ended up with different sources of truth and confusing, unmaintainable code; the so-called “jQuery spaghetti”. Angular attempted to solve that by making HTML a programming language, so you didn’t need that much JavaScript anymore. React attempted to solve it the other way around, by putting HTML inside JavaScript. Those lines of thought fought for a while but, nowadays, it is quite obvious that the React style won.

React wasn’t only a clever lib, it revealed something deeper than that. React showed us a well-principled and natural way to build any visual application. A React app (or component) combines 3 essential ingredients: a state, which holds all data on your app that can change, a render() function, which translates that state into a visible user-interface, and events, which talk to the outside world and update that state. There are many slightly different ways to make that spirit concrete, but all of them share one thing in common: they're much more restricted than normal JavaScript apps.

That’s important, so, let me elaborate on that. If you’re reading a properly-coded React render() function, you know for sure it isn't doing any HTTP request. If you're reading a "pure component", you know for sure it has no state or event. If you're reading a reducer, you know for sure it is not writing stuff to a database. And so on. Compare that to the jQuery era, where every piece of code could do anything, and you'll understand why it is so comforting to maintain well-made React apps. React achieved most of its success not by inventing a bunch of shiny features, but by removing as many bad ones as possible. Now, consider the following: despite React restricting JavaScript so much, Facebook, one of the most complex web-apps in the world, was built with it! All those restrictions were, thus, not limiting at all.

With that insight, we can finally understand what is wrong with the web: JavaScript. A Turing-complete language running in a dedicated thread is far too powerful, and clearly not needed to express arbitrarily dynamic applications. That’s why React is so brilliant: it restricted JS as much as possible while still keeping it equally powerful; or, alternatively, it added, to static pages, the right amount of JS to make them arbitrarily dynamic, but no more than that. By doing so, React cleared up the chaotic nature of web-development, leading to legacy code that is robust and highly maintainable, and giving code-maintainers a long-awaited relief.

Now, consider the following: if those restrictions did such a great favor for programmers, what could they do for the actual engine? I mean, suppose that a browser, instead of running a Turing-complete language in a dedicated thread performing mutations in a document-tree never meant to be used that way, acted as a mere interpreter of React-like apps? That is, what if it received just an initialState, a render() / reduce() function, and worked with that? As it turns out, that’d enable it to perform amazing optimizations. And when I say amazing, upgrade that to mind-blowing, world-changing. From diffing virtual DOM natively, to coordinating external-data pooling, to completely revamping how memory is managed, the possibilities are endless. Such a browser could have hundreds or thousands of tabs open with a tiny fraction of the memory usage of a normal one.
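To picture what "a mere interpreter of React-like apps" could mean, here is a minimal sketch. The app is nothing but data plus two pure functions, and the host owns the state and the update loop. The names initialState, render and reduce come from the paragraph above; counterApp, createHost, the event shape and the view-object shape are all made up for illustration, and a real host would obviously do far more (diffing, scheduling, memory management).

```javascript
// A hypothetical app: no DOM access, no timers, no globals. Just data and pure functions.
const counterApp = {
  initialState: { count: 0 },
  // render: state -> description of the user interface (a plain object here)
  render: (state) => ({
    tag: "button",
    text: `Clicked ${state.count} times`,
    onClick: { type: "increment" },
  }),
  // reduce: (state, event) -> next state
  reduce: (state, event) =>
    event.type === "increment" ? { count: state.count + 1 } : state,
};

// A hypothetical host: it alone holds the mutable state and decides when to render.
function createHost(app) {
  let state = app.initialState;
  let view = app.render(state);
  return {
    view: () => view,
    dispatch(event) {
      const nextState = app.reduce(state, event);
      // Because reduce is pure, an unchanged state means the old view is still valid.
      if (nextState !== state) {
        state = nextState;
        view = app.render(state);
      }
      return view;
    },
  };
}

const host = createHost(counterApp);
const firstView = host.view();
const secondView = host.dispatch({ type: "increment" });
```

Since the host is the only piece that ever mutates anything, it is free to skip, cache or reorder render calls, which is where the optimizations mentioned above would come from.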

How sexy is that? A lot, I tend to think. But how can we actually do it? That is, suppose we decide to test that hypothesis and build a browser for React-like apps. What would those apps look like? Would they be normal HTML? Would they be JavaScript files? Would the browser eval() them? Well, yes, technically, that could work. But, again: JavaScript sucks. It was made in 1995 to run in a single thread that imperatively mutates the document in a jQuery-like style and, as such, it is not ideal for our purposes. And I'm not even talking about stupid features like the ones in that "wat" video, I'm talking about stuff that downright ruins it as an option: eval, setInterval, with, global side-effects and so on. JS is flooded with things that break most assumptions our browser could use to be as fast as we want it to be. So, what now?

How can we fix it?

Once you accept that React-like apps are sufficient and that JavaScript sucks, then it becomes obvious that all we’re missing is a way to send those apps without all the overwhelmingly stupid JS baggage. Since apps are just code, all we need is, thus, a format to send code around the web that is:

  1. Fast to parse, compile and optimize just-in-time. Imagine we received plain-text Rust code and had to type-check it?
  2. Optimizable and performant. How ironic would it be if we did all that only to use Ruby?
  3. Compact. Doesn’t it bother you how inefficient minified JS bundles are?
  4. Safe. You don’t want apps to access your disk midway through their render() function, do you?
  5. Expressive. Nobody wants to write code in brainfuck.
  6. Small. This is not mandatory, but if a feature can be implemented outside of the core language, there is no reason for it to be a primitive.
  7. Pure. This is the main deal-breaker. Suppose the browser decides to optimize the render() function by memoizing it, but then somebody writes code like function render() { return <div>{++GLOBAL_VAR}</div> }. That'd break everything, leading to inconsistent behavior. This is just one of billions and billions of reasons such a language would need to be side-effect-free.
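The memoization hazard from the last item is easy to reproduce. The sketch below uses a toy memoize helper (an assumption, keyed on the JSON form of the state) and string-returning render functions instead of JSX, so it runs as-is:

```javascript
// Toy cache keyed on the serialized state; a real engine would be smarter.
function memoize(render) {
  const cache = new Map();
  return (state) => {
    const key = JSON.stringify(state);
    if (!cache.has(key)) cache.set(key, render(state));
    return cache.get(key);
  };
}

let GLOBAL_VAR = 0;

// Impure: the output depends on how many times it was called.
const impureRender = (state) => `<div>${++GLOBAL_VAR}</div>`;

// Pure: the output depends only on the argument.
const pureRender = (state) => `<div>${state.count}</div>`;

// Without the cache, two calls with the same state disagree...
const direct = [impureRender({}), impureRender({})];

// ...with the cache they agree, so the optimization changed what the user sees.
const memoizedImpure = memoize(impureRender);
const cached = [memoizedImpure({}), memoizedImpure({})];

// For the pure version, caching is invisible.
const memoizedPure = memoize(pureRender);
const pureDirect = pureRender({ count: 7 });
const pureCached = [memoizedPure({ count: 7 }), memoizedPure({ count: 7 })];
```

Here direct holds two different strings while cached holds the same string twice. Only for pureRender does the cached result match the uncached one every time, which is exactly the guarantee the browser needs before it can optimize anything.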

Most existing programming languages don’t meet half of those requirements. In fact, the last one disqualifies pretty much all of them: Python, Rust, C, PHP, Java, even Scheme, they all have side-effects. Problem is, those languages were not really designed to be used as a lightweight code-interchange format as we envisioned. WebAssembly is, perhaps, the closest thing to that. In fact, that is my #2 option. There are, though, many major points that make it not ideal. I could elaborate on that if requested.

Am I, then, looking for something impossible? No, not at all. There are some things quite like that around. Haskell’s Core, for example, is very inspiring. It is the intermediate language to which Haskell programs are compiled before being converted to machine code. This is its definition:

type CoreExpr = Expr Var

data Expr b -- "b" for the type of binders,
= Var Id
| Lit Literal
| App (Expr b) (Arg b)
| Lam b (Expr b)
| Let (Bind b) (Expr b)
| Case (Expr b) b Type [Alt b]
| Cast (Expr b) Coercion
| Tick (Tickish Id) (Expr b)
| Type Type

type Arg b = Expr b
type Alt b = (AltCon, [b], Expr b)

data AltCon = DataAlt DataCon | LitAlt Literal | DEFAULT

data Bind b = NonRec b (Expr b) | Rec [(b, (Expr b))]

No matter how complex, every single Haskell program is eventually compiled to that tiny language. That seems very close to what we need, no? Core is a simple Turing-complete language that is fast, pure (no side-effects) and safe. Moreover, Haskell is a practical language with many high-level features, demonstrating that those can be compiled to Core without loss of performance. In fact, if it wasn’t designed so specifically with Haskell in mind, it’d be perfect. Another option would be Morte, but the world is not ready for it yet. Let’s, thus, take both as an inspiration and design a core-language that is suitable for our purposes. Let’s start with the lambda-calculus, which is the most primitive subset of every functional language:

data Term
= Var String
| Lam String Term
| App Term Term

This gives us functions, variables and function application, which are obviously needed. While recursion and variable assignments can be expressed without further additions (with the Y-combinator and function calls), it has been shown that adding those two as primitives can provide essential performance benefits in many cases, so we add Let (assignments) and Fix (recursion):
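To see what "recursion without further additions" looks like in practice, compare a fixed-point combinator with a native self-reference. One caveat: the Y-combinator proper loops forever in a strict language like JavaScript, so this sketch swaps in its strict cousin, usually called the Z-combinator. The names Z, factorialViaZ and factorialNative are mine.

```javascript
// Z-combinator: a fixed-point combinator that works under strict evaluation.
// The inner (value) => ... wrapper delays the self-application until it is needed.
const Z = (f) =>
  ((x) => f((value) => x(x)(value)))((x) => f((value) => x(x)(value)));

// Recursion using only lambdas and application: "self" is supplied by Z.
const factorialViaZ = Z((self) => (n) => (n === 0 ? 1 : n * self(n - 1)));

// Recursion as a primitive, which is roughly what Fix gives the core language.
const factorialNative = function fact(n) {
  return n === 0 ? 1 : n * fact(n - 1);
};

const viaZ = factorialViaZ(5);       // 120
const native = factorialNative(5);   // 120
```

Both compute the same thing, but the first one builds extra closures on every recursive step and hides the recursion from the compiler, while the second is a plain self-call an engine can recognize. That difference is the motivation for making Fix (and, by the same argument, Let) a primitive.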

data Term
= Var String
| Lam String Term
| App Term Term
| Let String Term Term
| Fix String Term

Now, curiously, this language is almost good enough for our needs! Since there are ways to compile any high-level features and syntaxes to this subset (which is quite obvious, since it is Turing-complete), we don’t lose any expressivity. Problem is, how to do so efficiently? To represent data-structures on that core, we could use lambda-encodings: it has been shown that those can be pretty much as fast as native structs. For control-flow, Lisp has shown us tail call optimization is sufficient. Most other features can be expressed as syntactic sugar, monads being very handy for that. That leaves us with a few holes: native numbers (representing them with lambda-encodings would be prohibitive), a map/array-like structure with O(1) read/write, C-like structs and strings. All of those can, amazingly, be solved by extending this language with… JSON. (What?)
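For readers who haven't met lambda-encodings before, here is a small sketch of the idea: data-structures built from nothing but functions. It shows a pair and a Scott-encoded list; the helper names are mine, and the numbers inside are native on purpose, since encoding those as lambdas is exactly the part called prohibitive above.

```javascript
// A pair is a function that hands both fields to whoever asks.
const pair = (a) => (b) => (select) => select(a)(b);
const first = (p) => p((a) => (b) => a);
const second = (p) => p((a) => (b) => b);

// A Scott-encoded list is its own pattern-match:
// it takes a result for the empty case and a handler for the non-empty case.
const nil = (onNil) => (onCons) => onNil;
const cons = (head) => (tail) => (onNil) => (onCons) => onCons(head)(tail);

// Folding over the list is just calling it.
const sum = (list) => list(0)((head) => (tail) => head + sum(tail));

const point = pair(3)(4);
const numbers = cons(1)(cons(2)(cons(3)(nil)));

const x = first(point);      // 3
const y = second(point);     // 4
const total = sum(numbers);  // 6
```

No objects, no arrays, no tags: the only things used are Lam and App (plus native addition). Whether this ends up as fast as a native struct depends on the compiler, which is the claim the paragraph above is leaning on.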

data Term
= Var String
| Lam String Term
| App Term Term
| Let String Term Term
| Fix String Term
| Pri String [Term]
| Num Double
| Str String
| Map [(String, Term)]

Well, here, Num represents an IEEE 754 double, Str represents a UTF-8 string, and Map maps strings to anything else, i.e., like a JavaScript Object. Pri performs a primitive operation on those types (addition, string concatenation, etc.). The amazing insight is that JSON is all we need to take Moon all the way from slower-than-Python terrains to being faster than JavaScript itself in many cases. The reason: V8.

With Num, Moon can execute any number-crunching algorithm as fast as native JS. Map allows us to express 3 different things efficiently: C-like structs, arrays and, obviously, maps, all with O(1) reads and writes. This is possible because JIT engines can easily pick the right representation based on usage. Finally, since Moon is pure and strongly typed, it can perform a wide range of optimizations that impure languages can’t (stream fusion, for example), generating even better machine code.

So, this language is for sure looking optimizable; at least as much as JS, possibly more. We covered expressivity already. Safety comes naturally from its purity: Moon can only access as many system resources as you allow it to. Since Moon’s AST is so close to the λ-calculus, we’re able to compress it in a way similar to John Tromp’s BLC (binary lambda calculus), an extremely compact format designed to pack programs as much as their inherent entropies allow. As such, you can expect Moon bundles to be significantly smaller than JS bundles for equivalent programs. That also makes it extremely fast to parse. As a bonus, JSON allows Moon to be a subset of JS and share the same data structures with the host language, which is great for usability.
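To give a feel for how compact a BLC-like format is, here is a sketch of the binary lambda calculus encoding for the pure λ-calculus fragment only (Var, Lam, App). Variables become de Bruijn indices, an abstraction is written as 00, an application as 01, and a variable with index i as i ones followed by a zero. How Moon actually serializes Let, Fix, Pri, Num, Str and Map is not covered here, and the toBLC helper and constructors are illustrative names of my own.

```javascript
const Var = (name) => ({ tag: "Var", name });
const Lam = (name, body) => ({ tag: "Lam", name, body });
const App = (fun, arg) => ({ tag: "App", fun, arg });

// Encodes a closed lambda term as a string of bits.
// "scope" lists the bound names, innermost binder first.
function toBLC(term, scope = []) {
  switch (term.tag) {
    case "Lam":
      return "00" + toBLC(term.body, [term.name, ...scope]);
    case "App":
      return "01" + toBLC(term.fun, scope) + toBLC(term.arg, scope);
    case "Var": {
      const depth = scope.indexOf(term.name); // 0 means the innermost binder
      if (depth < 0) throw new Error(`unbound variable: ${term.name}`);
      return "1".repeat(depth + 1) + "0";
    }
    default:
      throw new Error(`not a pure lambda term: ${term.tag}`);
  }
}

// λx. x
const identity = Lam("x", Var("x"));
// λx. λy. x
const K = Lam("x", Lam("y", Var("x")));
// λx. λy. λz. (x z) (y z)
const S = Lam("x", Lam("y", Lam("z",
  App(App(Var("x"), Var("z")), App(Var("y"), Var("z"))))));

const identityBits = toBLC(identity); // "0010"
const kBits = toBLC(K);               // "0000110"
const sBits = toBLC(S);               // "00000001011110100111010"
```

The S combinator fits in 23 bits, less than 3 bytes, and variable names disappear entirely. A decoder only needs to read two bits at a time to know which node comes next, which is why parsing such a format can be so cheap.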

So, that’s it. We have it all. Moon is still not perfect, but it is at least very close to what is needed here.

As a last note, it is very comforting that, before specific compilers are built, Moon can just borrow JS JIT engines, using them as if designed for it. It is, though, in no way locked to them. Moon is so small we could, for example, just compile it to Haskell, and use all the power of the marvelous GHC. Or we could use Scheme, or OCaml, or anything with closures, numbers, strings and maps. In a crazy future, you could even compile it to Lamping’s abstract-algorithm, running your programs in a massively parallel manner. Not really, but it wouldn’t be an official article of mine without those words.
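"Borrowing a JS JIT engine" can be sketched in a few lines: translate each term to the JavaScript expression it already resembles, then hand the resulting source to the engine through the standard Function constructor. This is only an illustration under assumptions of mine, not Moon's actual compiler: the constructors, the operator table (add, sub, mul, eq, if), the compile and run helpers are all made up, Let, Str and Map are left out for brevity, and names are assumed to be valid, non-reserved JS identifiers.

```javascript
const Var = (name) => ({ tag: "Var", name });
const Lam = (name, body) => ({ tag: "Lam", name, body });
const App = (fun, arg) => ({ tag: "App", fun, arg });
const Fix = (name, body) => ({ tag: "Fix", name, body });
const Pri = (op, args) => ({ tag: "Pri", op, args });
const Num = (value) => ({ tag: "Num", value });

// Assumed primitives, each mapped to the JS expression that implements it.
const operators = {
  add: ([a, b]) => `(${a} + ${b})`,
  sub: ([a, b]) => `(${a} - ${b})`,
  mul: ([a, b]) => `(${a} * ${b})`,
  eq: ([a, b]) => `(${a} === ${b})`,
  if: ([condition, whenTrue, whenFalse]) => `(${condition} ? ${whenTrue} : ${whenFalse})`,
};

// Refuse anything that is not a plain identifier, so names cannot smuggle code in.
function identifier(name) {
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name)) throw new Error(`bad name: ${name}`);
  return name;
}

// Term -> JavaScript source for an expression.
function compile(term) {
  switch (term.tag) {
    case "Var":
      return identifier(term.name);
    case "Num":
      return `(${term.value})`;
    case "Lam":
      return `(${identifier(term.name)} => ${compile(term.body)})`;
    case "App":
      return `${compile(term.fun)}(${compile(term.arg)})`;
    case "Fix": {
      const name = identifier(term.name);
      return `(() => { const ${name} = ${compile(term.body)}; return ${name}; })()`;
    }
    case "Pri":
      return operators[term.op](term.args.map(compile));
    default:
      throw new Error(`unsupported term: ${term.tag}`);
  }
}

// Let the host engine parse, JIT-compile and run the generated source.
const run = (term) => new Function(`"use strict"; return ${compile(term)};`)();

const factorial = Fix("f", Lam("n",
  Pri("if", [
    Pri("eq", [Var("n"), Num(0)]),
    Num(1),
    Pri("mul", [Var("n"), App(Var("f"), Pri("sub", [Var("n"), Num(1)]))]),
  ])));

const factorialSource = compile(factorial);
const factorialOfFive = run(App(factorial, Num(5))); // 120
```

The generated source is ordinary JavaScript (arrow functions, arithmetic, a ternary), so whatever optimizations the engine applies to hand-written code of that shape apply here too. Swapping the string templates for Haskell, Scheme or OCaml syntax would retarget the same tiny compiler.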

Conclusion

The first mistake of the web was running each app in a dedicated thread with a Turing-complete programming language, when that is clearly overkill and there are much less expensive ways to achieve the same results. The second mistake was using an awful format to communicate code. Consider how it is 2017 and minified JavaScript, of all things, is the way we send code through the wire. At that point we might wonder why even bother with PLT research. Moon solves that by doing things in a saner way. There is nothing special or new about it; all that knowledge is decades old, but, for some reason, we just ignore it. Moon is, above anything, a manifesto to think about what we’re doing, and an initiative to try and do it right.


tl;dr

  1. Moon is not a new language I invented and that you’ll have to learn.
  2. It is just a nano-subset of JavaScript, which you already know well.
  3. That subset turns out to be a great code-interchange format: it is fast to parse and compile, has ultra compact bundles, is safe and performant.
  4. It will be the core language for apps on a new browser called Mist-Lite.
  5. Yes, I know you love JS, the universe is built with it and nobody will use Moon because it is not full JS. I understand, but I need things to work. Awful things get popular. Popularity is good, but not at all costs.