Rust and the Three Laws of Informatics

Simon Chemouil
Oct 9, 2018

--

After 10 years in the Java world, I have recently been looking at Rust more seriously. I am barely beginning Rust programming, yet I feel enthusiastic: I believe Rust deserves attention because it is a shift in the balance of languages.

tl;dr

Rust can run in embedded chips, Web applications and distributed systems. It combines the speed of C, a modern, strong type system that safely manages memory and a high-quality ecosystem. It makes writing concurrent code easy and safe. The Rust community is welcoming, kind, very active and building great software. Rust is already used in production and it works fine, but some parts are still being perfected. I’m going to join the fun and it would be wrong not to let you know :-)

From Java to Rust

Java and Rust are very different languages, supposedly targeting different spaces. I’ve been asked why a Java developer — a boring developer 😏— would be interested in Rust. It turns out that my rationale for choosing Java is the same that is now leading me towards Rust. Let me describe the principles guiding my choice of language and technology: *drum roll* The Three Laws of Informatics!

The Three Laws of Informatics

There are numerous rules or laws trying to formalize what makes — or how to make — software. A few take inspiration from Isaac Asimov’s Three Laws of Robotics. Asimov’s laws are fun first because they’re a somewhat geeky Sci-Fi reference, but they’re also good because they’re minimal and make precedence clear.

I’ll dub them the Three Laws of Informatics. These laws may not be very original in their content — and not truly laws — but like Asimov’s laws, they try to find the most essential, orthogonal axioms from which desirable properties can be derived. They are best effort laws, because perfection is out of our grasp.

Here we go:

  1. Programs must be correct.
  2. Programs must be maintainable, except where it would conflict with the First Law.
  3. Programs must be efficient, except where it would conflict with the First or Second Law.

The first law means programs should behave as expected in all situations. It implies software should try to be devoid of bugs and security issues, and should never crash.

The second law implies that programs should be well-designed and documented, because maintenance requires a good understanding of the code, and modular, because large monoliths are not maintainable. There should also be tools helping developers maintain programs. Partial evolutions should not warrant a full rewrite, and another developer should be able to make necessary changes because the design is clear and the language can be mastered.

The third law means programs should tend towards an optimal balance of speed and resource consumption for the most likely inputs, that is, according to how they are meant to be used. It implies we should use the best data structures for the job, aim at the best trade-offs (because most of the time there is no global optimum), plan for scale appropriately and make code sympathetic to the computer running it.

The three laws, together, create a requirement for a vibrant ecosystem and community. Developers cannot write correct, maintainable and efficient programs alone, because modern programs depend on more code than one sole developer can produce. They have to reuse components provided by the community. They have to share a belief that the community is healthy: either growing, or large enough that existing components will be maintained and new ones will be built. This makes the community surrounding a language a paramount factor when selecting a programming environment.

Three Laws and Order

Correctness is undoubtedly our main concern in software. If a piece of software doesn’t do its job properly, it’s not worth running. If it’s unsafe, it’s a security risk. This clearly makes correctness our First Law.

However, one might argue whether maintainability should matter more than efficiency or vice-versa. A program could potentially achieve better performance when ignoring maintainability completely, and programming with only correctness and maintainability in mind might result in a completely inefficient program.

In reality, maintainability and efficiency are intertwined: a good architecture is built around an efficient solution. For instance the Xi editor, aiming to be a highly performant editor for “the next 20 years”, is built around the rope data structure. Designing systems exclusively top-down may prevent the realization of an efficient solution; designing them exclusively bottom-up may leak too many implementation details and prevent a clean, loosely-coupled architecture.

Still, if a piece of software is correct and extremely efficient but cannot be maintained, it’s barely more than a black box. And are we that confident it is correct, if we can’t understand it? The other way around, an inefficient but correct+maintainable program might yet be salvageable. Maintainability gets to be the second law.

Laws and Languages

The Three Laws of Informatics refer to programs only. Yet programming languages may provide guarantees, constructs, tooling, an execution environment, a community or an ecosystem that help developers build law-abiding programs. Let’s look at some of the players.

System languages

C and C++ share some properties. They’re great for writing fast, low-footprint code. Manual memory management leaves safety in the hands of developers (who can barely remember where they left their phone!). C comes with no batteries included, so you often end up rewriting data structures; C++ is such a monster that each project defines which parts of the language it uses or doesn’t (Google is famous for not using exceptions). Because they are both unsafe, depending on another library increases the risk of security vulnerabilities.

Go filled the void left by C++ scaring developers away. It aims to be fast and to make writing concurrent code easy using goroutines. It is garbage collected, which makes it safer than C++. It features a simple type system, devoid of generics and algebraic data types, which does not support code reuse as well as other modern languages. Still, it has a very active community, maybe enticed by Google’s aura and following in the footsteps of Docker.

The JVM ecosystem

Java is a reasonably simple, regular language. There is a clear history of reuse in Java, which is probably champion in terms of integrating 3rd party artifacts, sometimes making large projects look like Frankenstein’s monster. Java has excellent tooling that is supported by the language and the virtual machine. Modern JVMs compile code “just-in-time” and turn it into rather efficient, native code. There are different garbage collectors that provide different trade-offs. Oh, did I mention the Java community is huge?

Kotlin tries to provide an alternative to Java “the language” by reducing verbosity and providing a stronger, null-safe, type system. It is mainly targeting the JVM and comes with the same benefits as Java there (there is also Kotlin/Native). Created by JetBrains, the tooling is obviously excellent. Now officially supported for Android development, it is here to stay.

Functional Programming — or FP — languages

Haskell and OCaml began their lives as research projects, but have been getting more popular in the industry in recent years. They are safe, provide great design primitives (notably type classes and modules) and a programming model that, in my experience, leads to fewer bugs. They are both garbage collected (GC was invented for LISP, the first FP language). Haskell, in particular, is completely pure, which provides great benefits: all effects, such as IO, are explicitly expressed as types, which frees the developer from worrying about side-effects happening unexpectedly, but can become cumbersome. Their communities both include many researchers, who help build solid, formal foundations.

And many more languages

I’m not going to go over every other language out there. Some have very obvious strengths in their ecosystem and community. Python has an excellent ecosystem for data analytics, Erlang helps building fault-tolerant distributed systems using actors, Scala is Kotlin’s older, wilder sibling, Clojure and Racket are modern Lisps, and TypeScript tries to make sense of JavaScript!

The Awakening of the Third Law

There are indeed many interesting languages out there. Most of them have their strengths and share of good ideas. How much do they help willing developers follow the Three Laws of Informatics?

Maintainability is mostly addressed by good design primitives (a.k.a. language constructs), good tooling and community. There are different schools with different opinions on what constitutes a good set of primitives: I personally favor those chosen by modern, strongly typed FP languages.

Leaving maintainability aside, there have been two groups of mainstream languages: those with manual memory management and those with garbage collection. Since we developers are imperfect, manual memory management also means unsafety and thus lack of correctness. Garbage collection, while incurring an overhead, has been the de facto standard in new mainstream languages for the last 25 years because safety matters more than absolute performance.

Rust is the first popular language to propose an alternative — automatic memory management and safety without garbage collection — and it comes with powerful FP-inspired design primitives to build high-level abstractions, and much more.

If we can now safely build more efficient software, shouldn’t we? Or should we optimize for development productivity? Could we have both?

Here comes a new challenger

Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety. (rust-lang.org)

Rust has been around for a while now, but it was experimental until 2015 when Rust 1.0 was released. Since then, the language and its ecosystem have grown and greatly improved. It will still be improving for years, but the core team committed to no breakage of user code.

Like Go, Rust aimed to provide a serious alternative to developers interested in system programming. However, it borrows (pun intended) much more from language research and from the successful experiences of existing programming languages.

Rust has a modern type system which automatically manages memory and, along with bounds checking, ensures it is accessed safely. It features generics and a trait system close to Haskell’s type classes. This enables what Rust calls zero-cost abstractions: having a great design should not impair performance. This awesome talk on embedded programming by James Munns describes how the Rust Embedded Working Group is building a set of reusable components to abstract over various types of hardware operations on tiny chips without adding execution costs.

As I dove into Rust resources, starting with the great Rust Book and exploring GitHub for projects I might be interested in, I noticed how much this uncompromising approach has shaped the community and ecosystem. Rust developers aim for the best runtime efficiency, the most accurate abstractions and the strongest execution safety. It makes Rust suited for low-level programming, obviously, but also very interesting for much higher-level applications: while people can write Linux kernel modules in Rust, it is also used to build REST Web applications, blockchain nodes and even Single Page Web apps in WebAssembly! Let’s go a bit deeper and see how it helps writing law-abiding programs.

Correctness

It is hard to write correct programs. There is often ambiguity in tasks we’re trying to accomplish. Still, languages can help by being expressive enough and not requiring programmers to jump through hoops to formulate their problem. Rust has the modern panoply of language constructs: algebraic datatypes (called enums here), generics, traits, type aliases, tuples, etc. It also features a powerful meta-programming system using macros that can do the heavy lifting, such as generating serializers, trait implementations, or defining embedded DSLs. Again, Rust borrows great ideas: its hygienic macros take inspiration from Scheme and its descendant Racket, which is particularly good at building DSLs.
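To make these constructs a little more concrete, here is a minimal sketch. The `Shape` enum, the `area` function and the `square!` macro are names I made up for illustration; they do not come from any library.

```rust
// Hypothetical types, for illustration only.
#[derive(Debug)]
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
}

// A tiny declarative macro: `square!(e)` evaluates `e` once and multiplies it by itself.
// Hygiene means the `v` introduced here cannot clash with a `v` at the call site.
macro_rules! square {
    ($e:expr) => {{
        let v = $e;
        v * v
    }};
}

fn area(shape: &Shape) -> f64 {
    // The compiler checks that every variant of the enum is handled.
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * square!(*radius),
        Shape::Rectangle { width, height } => width * height,
    }
}

fn main() {
    let shapes = vec![
        Shape::Circle { radius: 1.0 },
        Shape::Rectangle { width: 2.0, height: 3.0 },
    ];
    for s in &shapes {
        println!("{:?} has area {:.2}", s, area(s));
    }
}
```

Adding a third variant to `Shape` without updating `area` would be a compile error, which is the kind of help the type checker can give.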

Another part of correctness is the absence of undefined behavior, safety issues and crash risks. Rust manages memory without a garbage collector by making the type system aware of ownership. Resources are owned by one single variable binding and this ownership can be transferred by passing the variable to a function: Rust calls this moving the resource (in a sense similar to C++’s move semantics).

When a variable binding goes out of its lexical scope, the resource it owns will be dropped, i.e. deallocated (it is possible to drop the value earlier by just moving it into oblivion). While they are provably “alive”, values can be borrowed, either immutably in a shared manner or mutably in an exclusive manner. This is similar to C++ references but it allows the Rust compiler to prove there will be no shared, mutable access to a given data structure, and the developer will have one thing less to worry about.

Last but not least, Rust protects developers from a number of concurrency problems by ensuring there is no sharing of non-thread-safe values across threads.
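The difference between shared and exclusive borrows can be sketched as follows. The helper functions `total_len` and `shout` are hypothetical, and the example assumes a compiler with non-lexical lifetimes (the 2018 edition or any recent toolchain).

```rust
// Shared (immutable) borrow: the function can read the data but not modify it.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Exclusive (mutable) borrow: nobody else can observe `words` during the call.
fn shout(words: &mut Vec<String>) {
    for w in words.iter_mut() {
        *w = w.to_uppercase();
    }
}

fn main() {
    let mut words = vec![String::from("foo"), String::from("bar")];

    let first = &words[0];         // a shared borrow starts here
    let n = total_len(&words);     // a second shared borrow is fine
    println!("{} / {}", first, n); // last use of `first`; prints "foo / 6"

    shout(&mut words);             // exclusive borrow, allowed because `first` is no longer used
    // println!("{}", first);      // uncommenting this line makes the call above a compile error

    println!("{:?}", words);       // prints ["FOO", "BAR"]
}
```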
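As a small sketch of what this looks like in practice, here is a counter shared between threads using only the standard library. The function name `parallel_count` is mine, chosen for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: spawn `workers` threads that each add 1 to a shared counter
// `increments` times, then return the final value.
fn parallel_count(workers: usize, increments: usize) -> usize {
    // Arc is an atomically reference-counted pointer; Mutex guards exclusive access.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // `move` transfers ownership of this clone into the new thread.
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1000)); // prints 4000
}
```

Replacing `Arc` with the single-threaded `Rc` makes `thread::spawn` reject the closure at compile time, because `Rc` does not implement the `Send` marker trait.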

There are many resources to understand ownership and borrowing in Rust, but I’ll show a simple example. Please note that there are more intricacies to ownership, notably the fact that some types can opt to be copied rather than moved. For instance, it makes no sense to “own” an integer because it is so small it can be trivially copied. The Rust compiler is also constantly evolving, becoming smarter and allowing more intuitive code.
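The snippet embedded in the original post is not reproduced here, so what follows is a reconstruction based on the description in the next paragraphs, not the author's exact code. The names `foo`, `s`, `s2` and `newS` come from the text; the rest is assumed. The offending line is commented out so that the block compiles as written.

```rust
fn foo(s: String) -> String {
    let s2 = String::from("Bar");
    println!("{}", s2); // `println!` only borrows `s2`
    s                   // ownership of `s` moves back to the caller
}

// `newS` is named as in the text; rustc would normally suggest `new_s`.
#[allow(non_snake_case, unused_variables)]
fn main() {
    let s = String::from("Foo");
    let newS = foo(s); // `s` is moved into `foo` here
    // println!("{}", s); // uncommenting this is rejected with error E0382 (use of a moved value)
}
```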

In the snippet above, a new String called s is created and passed to the function foo. Because foo takes a String and not a reference to a String, it acquires its ownership rather than merely borrowing it: the String was moved to foo. Later, still in main, when trying to print s, the Rust compiler will complain that the resource s was bound to has moved and is thus no longer available. This program will fail to compile.

It turns out that foo returns an unchanged s but the compiler does not know it, and neither would any developer looking solely at the main function, foo’s signature and String’s traits. Knowing these metadata should be enough to know that foo takes ownership and our access to that resource is lost.

To make our program compile, we can simply print newS instead. Rust even lets us call it s again, which is great because s was not usable after passing it to foo anyway! The following program prints “Bar” then “Foo”.
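Again as a reconstruction under the same assumptions (the original snippet is missing), the fixed program might look like this:

```rust
fn foo(s: String) -> String {
    let s2 = String::from("Bar");
    println!("{}", s2); // prints "Bar"
    s
}

fn main() {
    let s = String::from("Foo");
    // Shadowing: the String returned by `foo` is bound to a new variable also called `s`.
    let s = foo(s);
    println!("{}", s); // prints "Foo"
}
```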

Looking at foo, it also creates another String bound under a variable named s2. It will get dropped when it goes out of scope. So far, this looks a lot like automatic memory management for structures allocated on the stack in C++ or GC-managed reference handles in Java. The difference is that resources — such as stack or heap allocated memory — always have precisely one owner.

Here, returning s2 instead of s in foo will move s2 back to the caller and the program will print “Bar” “Bar”. By the way, if you wonder why s2 is still usable after the call to println!, it’s because println! only borrowed it! Finally, the exclamation mark shows at a glance that println! is a macro.

Finally, there’s no reason to allocate a String here, because the “Foo” and “Bar” constants are already in the binary. Rust can directly point there and get a “slice” that we can borrow. We use a type called &str, a borrowed string slice that can point either into a String or directly into the static data of the binary!
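A sketch of that allocation-free variant, keeping the same hypothetical `foo`, could be:

```rust
// `foo` now borrows a string slice and hands the same borrow back.
fn foo(s: &str) -> &str {
    let s2 = "Bar"; // a &'static str pointing into the binary, no heap allocation
    println!("{}", s2);
    s
}

fn main() {
    let s = "Foo";
    let s = foo(s);
    println!("{}", s);

    // A borrowed String also works: `&String` coerces to `&str`.
    let owned = String::from("Baz");
    println!("{}", foo(&owned));
}
```

Taking `&str` rather than `String` or `&String` is the usual choice for read-only string parameters, since it accepts both literals and borrowed Strings.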

Maintainability

Not only does the ownership system make program execution safe, I’d argue it also makes the code more maintainable and reusable. With one glance at a function or at a lexical scope, Rust developers can determine the properties of its variables. They can build APIs that convey more precisely how they should be used, and summon help from the type checker.

Let me give some examples. In C, the compiler has no idea how long a pointer returned by a function will be valid. In a garbage collected language, you may store any reference returned by a function and prevent a potentially large chunk of memory from being freed. Many Java libraries come with documentation describing when it is safe to call which methods, how long objects will be in a valid state, and enforce those rules through exceptions or let you deal with the consequences of undefined behavior. This becomes even harder when multi-threading is involved. Rust makes it possible to return references that may be valid for a given lifetime known at compile time, give a copy or give a shared reference that you may dispose when you like.
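Here is a minimal sketch of how an API can encode that choice in its signatures. The `Config` type and its methods are invented for this example.

```rust
// Hypothetical key/value store, for illustration only.
struct Config {
    entries: Vec<(String, String)>,
}

impl Config {
    // The returned slice borrows from `self`: the signature says it cannot outlive the Config.
    fn get<'a>(&'a self, key: &str) -> Option<&'a str> {
        self.entries
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v.as_str())
    }

    // The returned String is an independent copy that the caller owns.
    fn get_owned(&self, key: &str) -> Option<String> {
        self.get(key).map(String::from)
    }
}

fn main() {
    let owned;
    {
        let config = Config {
            entries: vec![("lang".to_string(), "Rust".to_string())],
        };
        let borrowed = config.get("lang"); // only valid while `config` is alive
        println!("{:?}", borrowed);
        owned = config.get_owned("lang");  // a copy, free to escape this scope
    } // `config` is dropped here; letting `borrowed` escape would not compile
    println!("{:?}", owned);
}
```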

Rust comes with Cargo, a command line utility to build, manage dependencies (called crates), run tests, fix warnings and more. Having a community-approved build tool means efforts can be focused there. It helps that the developers of Cargo made good choices, such as supporting semantic versioning for crates out-of-the-box, using a human readable and editable configuration format (TOML), or supporting reproducible builds. There is also rustfmt, an automatic formatter that prevents wasting time manually formatting source files and on endless arguments about tabs-vs-spaces (spoiler alert: 4 spaces won).

Still, tooling for Rust is a work-in-progress. Java had a 20-year head start, but the language itself was also very well suited to tooling. How are IDEs supposed to support DSLs in macros? Time will tell. There is an official language server which has a VSCode and an Eclipse integration. There is also a plugin for IntelliJ IDEA.

The Rust compiler is backed by LLVM, a mature infrastructure with an efficient optimizer. It can also target WebAssembly, which lets Web applications be written in Rust, and may allow running non-trusted code in a sandbox.

It seems to me the core developers of Rust have been actively looking for the better ideas out there, which is a refreshing change to Sun’s “Not Invented Here” syndrome that has guided Java for so long. Among those good ideas, Rust’s traits and lack of structural inheritance provide great design primitives that help build modular and maintainable systems.

Rust developers made another great choice with error handling. Rust has the Result<T, E> type which may be either Ok(T) with the successful value or Err(E) with the error. Haskell programmers will recognize the Either type. Having errors handled using a regular construct means all the usual machinery can be used, including pattern matching, passing a Result as a value or serializing it.
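As a small illustration using only the standard library (the function `parse_and_double` is a made-up example), a Result can be propagated with the `?` operator and inspected with pattern matching:

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse an integer and double it, or return the parse error.
fn parse_and_double(input: &str) -> Result<i32, ParseIntError> {
    // `?` returns early with the Err value if parsing fails.
    let n: i32 = input.trim().parse()?;
    Ok(n * 2)
}

fn main() {
    for input in &["21", "not a number"] {
        // A Result is a regular value, so it can be pattern matched like any enum.
        match parse_and_double(input) {
            Ok(v) => println!("{} -> {}", input, v),
            Err(e) => println!("{} -> error: {}", input, e),
        }
    }
}
```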

Rust also uses traits to make the code less verbose. Like Java’s Iterable and its foreach loop, or Haskell’s do notation for monads, Rust has a healthy dose of syntactic sugar on top of traits which makes it easy to build types that feel natural.

For instance, Rust’s std::ops::Add trait is used when trying to use the + operation. Operator overloading always had a bad press in C++, but it is also a big reason why Python is so strong in data analytics. Numpy’s arrays and matrices conveniently support the same operators we use on paper. To prevent conflicts, Rust only allows the crate defining the trait or the one defining the target type to implement the trait for that type. Here’s a simple example of making a custom Point type support summation.
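The original snippet is missing from this copy of the post, so the following is a minimal sketch of what such an implementation can look like; the field names and the integer coordinates are assumptions.

```rust
use std::ops::Add;

// A hypothetical 2D point with integer coordinates.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Implementing `Add` is what makes `p1 + p2` compile for Point values.
impl Add for Point {
    type Output = Point;

    fn add(self, other: Point) -> Point {
        Point {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let p = Point { x: 1, y: 2 } + Point { x: 3, y: 4 };
    println!("{:?}", p); // prints Point { x: 4, y: 6 }
}
```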

Efficiency

Rust is fast, running at a speed comparable to C. Because it has no garbage collector, there is no hidden cost — even without pauses, GC code runs in separate threads and consumes resources.

Because of the focus on efficiency, the community is very prone to run benchmarks for everything. Because code sharing is easy and safe, we get to reuse performant data structures. In a recent blog post, Bryan Cantrill compares a C and a Rust version of a program and attributes the 40% runtime improvement to using BTreeSet, an efficient data structure available out-of-the-box in Rust’s standard collections.

Rust lets users control the memory layout of their data structures and makes indirections explicit. This helps writing cache-friendly code, but also interfacing with C. Rust’s FFI with C is straightforward and has no overhead, which makes calling any system primitive easy (but it is unsafe and should be appropriately wrapped). This is something we are reluctant to do in Java, notably for stability reasons (a segfault will crash the JVM), but that can be useful. For instance, one of the fastest Java Web servers is using JNI to call Linux’s epoll and seems to perform better than NIO, Java’s standard non-blocking networking lib.

Speaking of which, there’s no point being fast if we’re blocking a thread waiting for IO. Rust comes with zero-cost futures, including non-blocking, back-pressured streams. Because futures and stream chaining can become verbose, it is already possible to use async/await to write asynchronous code that reads like idiomatic Rust code. Right now, await is implemented as a macro but there is work underway to make it a standard Rust feature.
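To give an idea of what controlling layout and explicit indirection mean, here is a small sketch. The `Header` struct is hypothetical, and the expected size assumes a typical target where u32 is 4 bytes wide and 4-byte aligned.

```rust
use std::mem::size_of;

// `repr(C)` keeps the fields in declaration order with C-compatible padding,
// so the struct can be shared with C code.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    tag: u8,     // 1 byte, followed by 3 bytes of padding
    length: u32, // 4 bytes
    flags: u8,   // 1 byte, followed by 3 bytes of trailing padding
}

fn main() {
    // 12 bytes under the alignment assumption stated above.
    println!("Header: {} bytes", size_of::<Header>());

    // An indirection is visible in the type: a Box<Header> is just a pointer to the heap.
    println!("Box<Header>: {} bytes", size_of::<Box<Header>>());
    println!("usize: {} bytes", size_of::<usize>());
}
```

Without `repr(C)`, the compiler is free to reorder the fields to reduce padding, which is usually what you want when the type never crosses the FFI boundary.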

Rust’s flagship non-blocking IO library, Tokio, builds on futures to provide a consistent and fluent abstraction for non-blocking programming. Tokio is in turn used by the Hyper HTTP library, which is used by Web frameworks.

Community and Ecosystem

One can see the great efforts the Rust community has put into the language and its ecosystem for years. They made the experience starting with Rust very enjoyable and welcoming. I found the spirit animating so-called Rustaceans inviting and inciting.

Rust has official forums and discussion channels where you can get help and see core developers discussing technical matters. Everything is developed and debated in the open, and contributions are welcome.

A weekly newsletter gives updates and a sentiment of constant improvement. It elects a “Crate of the Week” to give publicity to community efforts. It calls for help on issues, sometimes even within the official Rust distribution.

The Rust community is very active on GitHub and tags many issues as “Good First Issues” for any wannabe contributor. In fact, there is no Open Source foundation like Apache or Eclipse in Rust, yet there is a strong Free Software culture. There is a sponsorship of Rust by Mozilla, with many core developers being Mozilla employees, but then many large projects are still living on individual GitHub accounts.

The community is still tight enough that everyone is working together to build a complete ecosystem. Rust developers rewrite basically everything for safety reasons, to depend as much as possible on Rust code rather than wrapped C, C++ or Go libraries.

Developers can publish their crates to crates.io. The Rust standard library is quite small and even optional, and by design most of the development of common functionality (such as futures, serialization or logging) happens in different crates. Some crates are standardized through the RFC process.

Because of its qualities and of its community, Rust attracts a lot of talent. Rust has a great cryptography ecosystem, libraries for concurrency and data parallelism. You might be interested in using QUIC? There’s a library for it! Were you thinking about Haskell’s Quickcheck? Check! Or fuzz testing? GTK+ UIs? No problem! You like GraalVM? Rust has HolyJit! Nom and Pest are two libraries for parsing. People write OpenGL video games in Rust, others write network services or WebAssembly VMs.

Future<Rust>

Rust creates a new deal by enabling zero-cost abstractions and mixing a set of good, proven ideas with a novel approach offering memory safety for free. It reinvents system programming by allowing high-level constructs, and reliably gives high-level programming speed and control.

Still, people choose languages and frameworks because they are productive and guaranteed for a particular purpose. If you want to build a Web application, Ruby on Rails or Java are safe choices.

I’ve personally been spoiled by Java’s overall engineering quality, tooling, productivity and comfort. Rust is clearly less mature there, but I will bet on the dynamic in action in the community, and on more companies adopting Rust and helping make it a world-class programming environment.

In a few years, Rust will likely provide a productive, safe programming environment that feels as dependable and risk-free as Java does today. The fun part is getting there.

--

Simon Chemouil

Insert recursive meta-circular joke. I’m @simach on Twitter.