Security By Design, A Brief Introduction To Rust

Software security, today, is still a critical issue for both developers and end users

Published in Tadaweb, Jan 27, 2020

By Antonin Carette, Machine Learning Engineer

Software vulnerabilities are painful because attackers can exploit them for remote code execution or privilege escalation. They can also use them to access, modify and break software systems. This can lead to critical issues for software users and damage the credibility of a software company, increasing the number of bad comments or reviews.

In a recent software security presentation given at the BlueHat Israel Security Conference in 2019, Microsoft Research showed that memory safety issues remain dominant, despite the developer tools published over the past several years to prevent and address them.

Slide from Microsoft’s presentation deck, showing the number of patches for memory safety bugs

As defined by Wikipedia, memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows, or dangling pointers.

Those issues are discovered very often and can lead to zero-day exploits in popular software, as happened with the Chrome web browser in November 2019. (If you are interested in this, “A Bug Hunter’s Diary” by Tobias Klein and “Hacking: The Art of Exploitation” by Jon Erickson will give you a clear view of how software vulnerabilities are discovered and exploited.)

There are two ways to avoid memory safety issues:

Increase the number of tools that analyse the code automatically, or improve self-reviews and pair code reviews
Change, or update, technologies and tools

However, each solution has its own limitations.

In the first solution, we could hire highly experienced developers to review every change and feature we write. But this could lead to other problems, such as a review pipeline that does not scale, or regressions that still slip through.

So, we could introduce static and dynamic code analyser tools. But, how do we conduct effective checks with a static analyser, if we have just detected a new way to leak memory in programs? And, in the case of a dynamic analyser, what if the code contains hidden parts that are not executed each time? And what if these tools contain memory leaks or exploits themselves?

This solution is possible, but it needs highly experienced people, and substantial infrastructure and organisation, just to ensure the robustness of the solution.

In the second solution, new technologies to overcome these problems are coming from garbage collected programming languages, which tend to perform slower than system programming languages (like C or C++).
We also let tools perform memory management checks by themselves. This could be problematic.

Developers might get comfortable with this automated, tool-driven process, stop thinking about memory management, and fail to notice performance issues. And, as you might already know, garbage collected programming languages are not exempt from memory leaks.

In 2009, the Mozilla Foundation financed a small internal team to build a new technology that would be the best of both worlds, combining, by design, the performance of system programming languages with the safety of garbage collected programming languages: Rust.

In part one, we will discuss the strengths of Rust by introducing the mechanisms behind its philosophy. Part two will examine how safe Rust programs really are. Part three will showcase existing software and companies that use Rust today, and why they chose this technology.

The philosophy behind Rust

To address the main advantages and targets of Rust, we must first understand the actions that lead to memory safety bugs:

Race conditions: sharing a variable between different code entities that can modify it at the same time, thereby introducing weird behaviours in your program (sometimes hard to reproduce).
Dangling pointers: sharing a reference to a code entity that does not exist anymore in the program.
Double free: deallocating a variable that has already been deallocated.

As you can see here, memory bugs can be introduced in very different ways: some with a single character, and some while using “modern” language features like multi-threading.

To avoid memory bugs, we can: i) follow variables across the program, ii) protect the access to each variable (by checking whether it is mutable or not), iii) make sure that the program is thread-safe, and iv) make sure that each variable is freed only once.

This summarises the philosophy behind Rust: to bring software security not through external tools or people, but by design.
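To make point iii) concrete, here is a minimal sketch of thread-safe sharing using only the standard library. The function name `count_in_parallel` is hypothetical and only serves the illustration. The shared counter has to be wrapped in `Arc` (shared ownership across threads) and `Mutex` (exclusive access); handing a plain mutable integer to several threads would be rejected by the compiler.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: spawns `workers` threads that each add 1 to a shared
// counter `increments` times, then returns the final value.
fn count_in_parallel(workers: usize, increments: usize) -> usize {
    // Arc gives shared ownership across threads, Mutex gives exclusive access.
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock is released at the end of this statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", count_in_parallel(4, 1000));
}
```

Because every increment goes through the lock, no update is lost, whatever the scheduling of the threads.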

As stated in the research paper “The Rust Language”, written by Nicholas D. Matsakis and Felix S. Klock:

Rust’s static type system is safe and expressive and provides strong guarantees about isolation, concurrency, and memory safety.

The most interesting part here is “Rust’s static type system”, which means that those guarantees are enforced at compile time. To provide them, Rust’s core maintainers integrated three well-known concepts as core components (or mechanisms) of the compiler: ownership, borrowing, and lifetimes.

Ownership
Ownership is a mechanism in the compiler that makes a single code entity the owner of a given value. This code entity alone takes care of the value, and the value lives only as long as its owner does.

To be clear, a code entity will ensure that memory is allocated and deallocated for variables it owns.

struct A {
  x: i32,
}

fn own(a: A) {
  println!("I own 'a', and the value of x is {}", a.x);
}

fn main() {
  let a = A{x: 0};
  own(a);
  // As 'a' is now owned by another function, the main function does not own 'a' anymore...
  println!("x is {}", a.x);
}

If you are new to Rust, this code does not look (so) bad.
In fact, we do something simple: we declare a structure, we fill it with a zero value, we pass it as a parameter to a function that prints its internal field, and we print this field again in the main function.

Unfortunately, this code does not compile:

Compiling playground v0.0.1 (/playground)
error[E0382]: borrow of moved value: `a`
  --> src/main.rs:13:23
   |
10 |   let a = A{x: 0};
   |       - move occurs because `a` has type `A`, which does not implement the `Copy` trait
11 |   own(a);
   |       - value moved here
12 |   // As 'a' is now owned by another function, the main function does not own 'a' anymore...
13 |   println!("x is {}", a.x);
   |                       ^^^ value borrowed here after move

The compiler’s message is pretty explicit: as we passed the a structure directly to own, this function now owns a and, when “dying”, drops a (and deallocates its memory at the same time).

In short: a does not exist anymore after the execution of own.

This behaviour makes it impossible for several code blocks to own the same variable.

We have different options to fix this code:

Implement the “Copy” trait to let the program copy the structure: ‘own’ then owns a copy of the structure and deallocates the memory of that copy, not of the original structure, which is a good fit for this example.
Pass a reference to the A structure to the own function, instead of the whole entity; we will explore this in the next subsection.
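The first option can be sketched as follows. This is one possible fix, not the only one: `Copy` can only be derived when every field is itself `Copy`, which is the case for an `i32` but would not be for a `String`. Compared to the original listing, `own` returns the field it read; this is an addition made only to make the behaviour easy to check.

```rust
// Deriving Copy (and Clone, which Copy requires) lets the compiler duplicate
// the value instead of moving it.
#[derive(Clone, Copy)]
struct A {
    x: i32,
}

// `own` receives its own copy of the structure and returns the field it read
// (the return value is an addition for this sketch).
fn own(a: A) -> i32 {
    println!("I own a copy of 'a', and the value of x is {}", a.x);
    a.x
}

fn main() {
    let a = A { x: 0 };
    own(a);
    // `a` was copied, not moved, so it is still usable here.
    println!("x is {}", a.x);
}
```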

To sum up, ownership ensures:

The memory of a variable is deallocated,
A deallocated variable can no longer be shared,
There is a single active binding to any heap allocated memory at any given time.

All of this, at compile time!
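To observe when the owner releases a value, here is a small sketch that logs each destruction through the standard `Drop` trait. The `Tracked` type, the `consume` function and the shared log are hypothetical names introduced for the illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that records its own destruction in a shared log.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

// Takes ownership of `t`; `t` is dropped when this function returns.
fn consume(t: Tracked) {
    t.log.borrow_mut().push(format!("consumed {}", t.name));
}

fn run() -> Vec<String> {
    let log: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let first = Tracked { name: "first", log: Rc::clone(&log) };
    {
        let _second = Tracked { name: "second", log: Rc::clone(&log) };
        // `_second` is dropped at the end of this inner scope.
    }
    consume(first); // ownership moves into `consume`
    log.borrow_mut().push(String::from("back in run"));
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in run() {
        println!("{}", event);
    }
}
```

Each value is dropped exactly once, at the point where its owner goes out of scope: `second` at the end of the inner block, `first` at the end of `consume`.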

Borrowing
Borrowing is comparable to C++ references. A function does not own the variable, but borrows it, which implies that this function is not responsible for the lifetime of the variable.
To borrow a variable in Rust, you prefix the type of the parameter with the & character, and pass the argument with & as well.

struct Disk {
  content: [u8; 5],
  title: String,
}

fn borrow(d: &Disk) {
    println!("A user borrowed the Disk named '{}'", d.title);    
}

fn main() {
  let d = Disk{content: [1u8, 2u8, 3u8, 4u8, 5u8], title: String::from("The Rustacean")};
  // A user borrows a disk...
  borrow(&d);
  // Another user borrows a disk...
  borrow(&d);
}

This code compiles, and runs as expected:

A user borrowed the Disk named 'The Rustacean'
A user borrowed the Disk named 'The Rustacean'

But does this mean that a user can overwrite the existing data of the borrowed structure?

struct Disk {
  content: [u8; 5],
  title: String,
}

fn borrow(d: &Disk) {
    println!("A user borrowed the Disk named '{}'", d.title);    
}

fn evil_borrow(d: &Disk) {
    println!("A user borrowed the Disk named '{}'", d.title);
    println!("User said: 'Let's try to override the data... AHAHAHAH!'");
    d.content = [0u8; 5];
}

fn main() {
  let d = Disk{content: [1u8, 2u8, 3u8, 4u8, 5u8], title: String::from("The Rustacean")};
  // A user borrows a disk...
  borrow(&d);
  // Another user borrows a disk...
  evil_borrow(&d);
}

Compiling playground v0.0.1 (/playground)
error[E0594]: cannot assign to `d.content` which is behind a `&` reference
  --> src/main.rs:13:5
   |
10 | fn evil_borrow(d: &Disk) {
   |                   ----- help: consider changing this to be a mutable reference: `&mut Disk`
...
13 |     d.content = [0u8; 5];
   |     ^^^^^^^^^^^^^^^^^^^^ `d` is a `&` reference, so the data it refers to cannot be written

It fails! As you can see, Rust references and variables are immutable by default. Immutable variables are useful because they are inherently thread-safe. Another benefit is that they are simpler to understand, and offer higher security than mutable ones.
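When a function genuinely needs to modify borrowed data, the compiler's hint above points to the way: a mutable reference, `&mut`. A minimal sketch, where `erase` is a hypothetical function written for this illustration:

```rust
struct Disk {
    content: [u8; 5],
    title: String,
}

// A mutable borrow: the caller explicitly allows this function to change the disk.
fn erase(d: &mut Disk) {
    println!("Erasing the Disk named '{}'", d.title);
    d.content = [0u8; 5];
}

fn main() {
    // Both the binding (`let mut`) and the reference (`&mut`) must opt in to mutation.
    let mut d = Disk { content: [1, 2, 3, 4, 5], title: String::from("The Rustacean") };
    erase(&mut d);
    println!("content is now {:?}", d.content);
}
```

Mutation is therefore always visible at the call site, and Rust allows only one mutable borrow of a value at a time.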

For more information about the borrowing mechanism, please consult this reference: https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html

Lifetimes
Lifetimes are the mechanism that tags a code entity with a life scope, to ensure that no reference to this entity outlives it.

To be clear, this mechanism ensures that each reference points to a non-freed variable. A lifetime can be determined by a scope on the stack, by the owner of a heap allocation, by the executed program itself (static), or by custom syntax written by the programmer.

Let’s make the previous example a little more complex: let’s take the example of a “virtual” DVD rental system. In this example, a renter can borrow an instance of a film if they do not already hold one. For this, we can write this code:

use std::time;

struct DVD {
    title: String,
    duration: u32,
}

struct Renter {
    first_name: String,
    last_name: String,
    dvd: Option<&DVD>
}

impl Renter {
    fn borrow(&mut self, dvd: &DVD) {
        if !self.has_borrowed_a_dvd() {
            self.dvd = Some(dvd);
        }
    }
    
    fn give_back(&mut self) {
        self.dvd = None;
    }
    
    fn has_borrowed_a_dvd(&self) -> bool {
        self.dvd.is_some()
    }
}

fn main() {
    // The main objective of a renter is to borrow a DVD
    let mut renter = Renter{first_name: String::from("Antonio"), last_name: String::from("Banderas"), dvd: None};
    // This DVD is in the same scope as the renter
    let dvd = DVD {title: String::from("The Mask of Zorro"), duration: 8160u32};
    // Renter tries to borrow the dvd...
    renter.borrow(&dvd); // ... which fails, because we have to make explicit the lifetime of the DVD structure
    // the renter wants to borrow
    std::thread::sleep(time::Duration::from_secs(3));
    renter.give_back()
}

If you try to compile this program, the Rust compiler refuses, informing you that it needs more information from you to be sure that the program is safe:

Compiling playground v0.0.1 (/playground)
error[E0106]: missing lifetime specifier
  --> src/main.rs:11:17
   |
11 |     dvd: Option<&DVD>
   |                 ^ expected lifetime parameter

error: aborting due to previous error

For more information about this error, try `rustc --explain E0106`.
error: could not compile `playground`.

To learn more, run the command again with --verbose.

Indeed, as explained in the Rust documentation:

Every reference in Rust has a lifetime, which is the scope for which that reference is valid. Most of the time lifetimes are implicit and inferred.

In this code, we have to introduce custom lifetimes, to make explicit that the reference to a DVD is valid until the renter gives it back to the system.

This is the code with lifetimes included:

use std::time;

struct DVD {
    title: String,
    duration: u32,
}

struct Renter<'a> {
    first_name: String,
    last_name: String,
    dvd: Option<&'a DVD>
}

impl<'a> Renter<'a> {
    fn borrow(&mut self, dvd: &'a DVD) {
        if !self.has_borrowed_a_dvd() {
            self.dvd = Some(dvd);
        }
    }
    
    fn give_back(&mut self) {
        self.dvd = None;
    }
    
    fn has_borrowed_a_dvd(&self) -> bool {
        self.dvd.is_some()
    }
}

fn main() {
    let mut renter = Renter{first_name: String::from("Antonio"), last_name: String::from("Banderas"), dvd: None};
    let dvd = DVD {title: String::from("The Mask of Zorro"), duration: 8160u32};
    renter.borrow(&dvd);
    // Wait 3 seconds...
    std::thread::sleep(time::Duration::from_secs(3));
    // And release the dvd
    renter.give_back()
}

To fix the error, we introduced a lifetime parameter named a, written 'a: the leading apostrophe (') is the syntax that marks a name as a lifetime. So, when we look at the code, we know that the Renter structure will borrow a DVD reference that has a lifetime called 'a. Each operation on the DVD reference will have to respect this lifetime in the code.
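The same annotation exists for functions. As a small sketch (both functions are hypothetical and not part of the rental example), the first signature lets the compiler infer the lifetime, while the second has two reference parameters and must state which of them the result may borrow from:

```rust
// Elided: with a single reference parameter, the compiler infers that the
// result borrows from `s`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Explicit: with two reference parameters, we must say that the result lives
// no longer than both `x` and `y`.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    println!("{}", first_word("The Mask of Zorro"));
    println!("{}", longest("Zorro", "Banderas"));
}
```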

Now, let’s do something evil… 😈

Can we tweak the code a bit in order to drop the dvd structure before the end of the rental? Let’s take a look…

use std::time;

struct DVD {
    title: String,
    duration: u32,
}

struct Renter<'a> {
    first_name: String,
    last_name: String,
    dvd: Option<&'a DVD>
}

impl<'a> Renter<'a> {
    fn borrow(&mut self, dvd: &'a DVD) {
        if !self.has_borrowed_a_dvd() {
            self.dvd = Some(dvd);
        }
    }
    
    fn give_back(&mut self) {
        self.dvd = None;
    }
    
    fn has_borrowed_a_dvd(&self) -> bool {
        self.dvd.is_some()
    }
}

fn main() {
    let mut renter = Renter{first_name: String::from("Antonio"), last_name: String::from("Banderas"), dvd: None};
    {
        // The dvd structure is created in an inner scope, so it lives for a shorter time than 'renter'
        let dvd = DVD {title: String::from("The Mask of Zorro"), duration: 8160u32};
        // Renter borrows an instance of the dvd
        renter.borrow(&dvd);
        // dvd structure is dropped... (Dangling reference!)
    }
    // Here, renter is borrowing a reference to something that does not exist anymore <- compile error!
    std::thread::sleep(time::Duration::from_secs(3));
    renter.give_back()
}

In this program, at the end of the inner scope, we introduced a dangling reference (basically a reference that points to nothing) by declaring dvd in another scope than renter (let’s call it a “child” scope). At the end of this scope, dvd is deallocated while renter is still borrowing it.

Fortunately, the compiler understood that we did something evil here, and explains to us that the lifetime of the dvd instance must be at least equal to the lifetime of the renter:

Compiling playground v0.0.1 (/playground)
error[E0597]: `dvd` does not live long enough
  --> src/main.rs:36:23
   |
36 |         renter.borrow(&dvd);
   |                       ^^^^ borrowed value does not live long enough
37 |         // dvd structure is dropped...
38 |     }
   |     - `dvd` dropped here while still borrowed
...
41 |     renter.give_back()
   |     ------ borrow later used here

error: aborting due to previous error

For more information about this error, try `rustc --explain E0597`.
error: could not compile `playground`.

To learn more, run the command again with --verbose.

Good job Rust!

To sum up, the lifetime mechanism helps eliminate dangling references from your program, and ensures that each code entity has a well-defined life scope.
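For completeness, here is a sketch of a layout that the compiler accepts: the DVD is declared before the renter, in the same scope, so it is guaranteed to outlive the reference the renter holds. The structures are trimmed down to the fields needed here, and `current_title` is a hypothetical helper added for the illustration.

```rust
struct DVD {
    title: String,
}

struct Renter<'a> {
    dvd: Option<&'a DVD>,
}

impl<'a> Renter<'a> {
    fn borrow(&mut self, dvd: &'a DVD) {
        if self.dvd.is_none() {
            self.dvd = Some(dvd);
        }
    }

    fn give_back(&mut self) {
        self.dvd = None;
    }

    // Hypothetical helper: title of the DVD currently held, if any.
    fn current_title(&self) -> Option<&str> {
        self.dvd.map(|d| d.title.as_str())
    }
}

fn main() {
    // `dvd` is declared first: locals are dropped in reverse order of
    // declaration, so `dvd` outlives `renter`.
    let dvd = DVD { title: String::from("The Mask of Zorro") };
    let mut renter = Renter { dvd: None };
    renter.borrow(&dvd);
    println!("holding: {:?}", renter.current_title());
    renter.give_back();
    println!("holding: {:?}", renter.current_title());
}
```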

How about "developing"?
The Rust syntax borrows from several functional programming languages while staying close to C++. Expressivity is one of the key features of using Rust for low-level programming.

Also, Rust comes with a great package manager and build tool named cargo, which helps developers write their own Rust crates (libraries or binaries), run unit tests, build documentation and, through third-party plugins, compute test code coverage, all of which helps write and maintain these crates.

The main reason new developers are attracted to Rust is that it prevents memory bugs and offers great performance through zero-cost abstractions, as memory checks are performed at compile time. This allows developers to prevent memory issues before releasing the product, instead of wasting time trying to debug them afterwards.

In short, development time before a release is well utilised, to prevent additional development time after a release.

But, is Rust devoid of any weaknesses? To be completely honest: no.

Two weaknesses currently weigh it down: its learning curve and a long compile time.

Is Rust 100% safe?

One could say that it is 99.9% safe, which is already a great score!
As the authors of the book “The Rust Programming Language” put it:

Rust’s memory safety guarantees make it difficult, but not impossible, to accidentally create memory that is never cleaned up (known as a memory leak). Preventing memory leaks entirely is not one of Rust’s guarantees in the same way that disallowing data races at compile time is, meaning memory leaks are memory safe in Rust.

Part of the explanation is that Rust lets developers use unsafe code blocks and functions for exotic memory management and to facilitate the integration of C and C++ code. Those unsafe code blocks must, of course, be used with extreme care.
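As a minimal sketch of what such a block looks like, the hypothetical function below dereferences a raw pointer. Creating the pointer is allowed in safe code; reading through it is not checked by the compiler, so it must be wrapped in `unsafe`, and the programmer becomes responsible for its validity.

```rust
// Hypothetical helper: reads a value through a raw pointer.
fn read_through_raw_pointer(value: &i32) -> i32 {
    // Creating a raw pointer from a reference is safe.
    let ptr: *const i32 = value;
    // Dereferencing it is not verified by the compiler, hence the `unsafe` block.
    // This is sound here only because `ptr` comes from a reference that is still alive.
    unsafe { *ptr }
}

fn main() {
    let x = 42;
    println!("{}", read_through_raw_pointer(&x));
}
```

Keeping the `unsafe` block this small makes it easy to audit: everything outside of it is still checked by the borrow checker.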

A lot of work has been done to improve the borrow checker, but also to improve existing Rust programs.

Despite these caveats, Rust could have prevented many of the big memory safety CVEs exposed in the last five years.

Projects, and successes

To fully engage with a technology, it is important to understand its context, its weaknesses, who its users are and why they use it.

The Mozilla Foundation has been using Rust since 2016 at the core of the Firefox web browser, to prevent memory safety bugs, browser module bugs and crashes.

A lot of third-party code is now written in Rust, including the browser media stack.

Given the main points covered in the first section of this post, Rust could be the perfect programming language for embedded systems, but also for the web, through WASM.

Many projects adopted Rust for complex system programming ecosystems, like Redox-OS, or Servo.

The Rust community is, personally, one of my favourite tech communities, and always a friendly one to learn with.

Also, many companies have successfully built their solutions using the programming language and its ecosystem. As an example of critical tools using Rust, Tor is currently trying to rebuild several components in Rust to increase its safety.

All those examples show why Rust has been the most loved language among developers in Stack Overflow's developer survey for the last four years, and why Microsoft, Dropbox, NPM, Mozilla and so many other companies use Rust in production to guarantee memory safe programs.

So, do we have to rewrite everything in Rust?

This question comes back from time to time in forums and initiates many heated discussions.

Honest answer: It depends!

In fact, each problem comes with a corresponding tool that is right for it.

If you write critical or performance-sensitive programs, like network stacks, embedded systems, drivers or parsers, Rust could be the right solution to implement your project.

To conclude, Rust is not just another language to learn and interact with for no purpose. Rust offers elegant mechanisms to build secure (by design) software and web applications, so we spend less time debugging after release.

Now, close your eyes, take a deep breath, and imagine a world where we can reduce 70% of our patches for memory safety issues. This world is not so far away, don’t you think?

Thanks a lot to Eric Landuyt, Gauthier Brion, Kiruthika Mani, and Olivier Veneri for giving their time to review this article. Also, thanks to Bárbara Dias and Marcio Souza for their time to post-process the article.

Antonin is a Bayesian Robot Psychiatrist and Computer Neurologist at Tadaweb (aka Machine Learning Engineer). He supports vector machines. Pythonista, Gopher, and Rust lover. Stepfather of two evil rabbits.