As Seen at PLDI: Precise, Dynamic Information Flow for Database-Backed Applications

Jean Yang
6 min readJun 18, 2016

--

Me giving the talk. Photo thanks to @3pid.

Greetings from the plane! I’m flying back from the Programming Language Design and Implementation conference, where I (finally) presented our paper on Jacqueline, a web framework that prevents information leaks by making it easier to write secure web programs. Here is what the work is about.

Wanted: The Double Hull for Information Security

Aftermath of an oil spill.

To motivate the work, let me first talk about a different kind of leak. In 1989, the Exxon Valdez tanker struck ground in Prince William Sound, leaking 30 million gallons of crude oil. The resulting spill was one of the biggest ecological disasters of the twentieth century. A major factor in the severity of the accident was that the Exxon Valdez had only a single layer of steel separating the crude oil from the sea. The Oil Pollution Act of 1990 required that all tankers have double hulls, where there is a layer of air or ballast water insulating the oil. This regulation of tanker design has prevented numerous spills.

From marineinsight.com.

What we would like is the equivalent of the double hull for information security. But, as you may have guessed, data is a bit more complex than crude oil. For instance, what it means to “strike ground” in information security depends on both the computation and the viewer. Managing sensitive data is so complex, in fact, that the entire area of research on language-based security focuses on developing techniques to envelope programs in a layer of checking to prevent information leaks. (Sabelfeld and Myers have a nice 2006 review of the area here.)

Code with policy spaghetti highlighted.

With existing techniques, however, there remains the problem that the programming models we use were not designed with security concerns in mind. Sensitive data is not treated in any special way in code — and so programmers are responsible for implementing checks across the code to ensure computations keep sensitive values appropriately confidential. As a result, state-of-the-art techniques require the programmer to navigate a tangled “spaghetti” of policy code and other functionality, where code is peppered with policy checks throughout. The programmer must reason about this policy spaghetti to implement new policies or new functionality. Any missing or buggy check can cause a leak. Even if we have tools that can prevent leaky programs from running, it is difficult to write that leak-free code in the first place.

To return to our analogy: people have designed double hulls for preventing information leaks, but with existing programming models it is difficult navigate the ship to avoid accidents. The information might not leak, but it is also difficult to get very far.

Solution: Reduce the Opportunity for Programmer Error

The goal of our work, then, is to make it easier to avoid accidents. In our paper, we propose the design for a web framework that factors out the implementation of information flow policies. We present Jacqueline, a policy-agnostic web framework that allows programs to implement each information flow policy once, instead of as repeated checks and filters across the program. Jacqueline is based on a standard model-view-controller architecture, where the model code specifies the data layout, the view code specifies the frontend layouts, and the controller code specifies everything in between.

In Jacqueline, the programmer specifies policies alongside the model — and only alongside the model, without need for policy checks or explicit declassifications anywhere else in the code. The controller runs in an enhanced runtime that propagates sensitive values and policies across the program. The framework is responsible for attaching policies to all data that flows into the system, propagating policies across the application and database, and displaying values based on the viewer and policies.

Securing the Application-Database Boundary

This paper builds on previous work on runtime policy enforcement with the policy-agnostic Jeeves programming language. Jeeves allows programmers to factor out information flow policies and has an enhanced runtime that propagates sensitive values and policies in order to show the appropriate results. In prior work, we have formalized the execution model and security guarantees and also demonstrated how to implement Jeeves as an embedded domain-specific library.

BP Deepwater Horizon oil rig explosion.

Things would be great if we could do all computations in the Jeeves runtime, just like things would be great if oil existed only within the confines of double-hulled tankers. But, as we learned the hard way with the BP Deepwater Horizon oil spill in 2010, all of the double hulls in the world don’t help if we haven’t taken the appropriate precautions with our oil rigs and the oil leaks even before it gets to the ship. Similarly, applications that naively use Jeeves (or any other language-based solution) have a big gaping hole that is the interaction with SQL databases. The application code can have all the guarantees in the world, but a single policy-unaware database query can subvert all of those guarantees.

To address this problem, we present a strategy for extending Jeeves’s computation model to incorporate unmodified, vanilla SQL databases. We were interested in targeting unmodified SQL databases for two reasons: 1) to avoid having to reinvent decades of database optimizations within the Jeeves runtime and 2) to make this approach more compatible with legacy code. The main challenge involved developing a technique that allowed us to take advantage of SQL query implementations while propagating information about sensitive values and policies through all queries, especially in a way that was not too complicated to model formally in order to extend our theoretical guarantees. You will need to read the paper to learn more about how we satisfied these requirements!

Why You Should Read the Paper

In this paper, we propose a useful way of factoring out policy code from other functionality in a web framework and describe how to make this approach work with an enhanced runtime and vanilla, unmodified SQL databases. Not only do we formalize our approach and prove that we have strong guarantees across the application and database, but we have also implemented Jacqueline as a Python web framework extending Django and demonstrate that we often have reasonable, even negligible overheads.

Even if you are working on a production system and cannot afford additional runtime overhead for policy enforcement, I encourage you to think about how you can factor out policy checks as much as possible and establish good practice to make your code modular with respect to policy checks. And make sure to include database queries when you think about policies!

Finally, a meta-goal I have in doing this work is to encourage people to take a harder look at the programming models we’re using. All kinds of people are writing all kinds of programs these days, with concerns that people did not have in mind when they designed the languages we are using today. This, combined with recent advantages in programming tools, make now a great time to rethink our programming models. I’m excited to see what the community comes up with in the next few years.

You may read our paper, “Precise, Dynamic Information Flow for Database-Backed Applications,” with coauthors Travis Hance, Thomas H. Austin, Armando Solar-Lezama, Cormac Flanagan, and Stephen Chong, here. The full slides for my talk are here. Here’s our current implementation on GitHub. I will post a video when that’s up too.

P.S. Conferences are fun when they are in Santa Barbara!

Outside the conference with my former groupmates Alvin Cheung and Nadia Polikarpova.

With thanks to Wombi Rose for teaching me about ships and to Armando Solar-Lezama, Nadia Polikarpova, Nate Foster, Arjun Guha, and Jean-Baptiste Jeannin for feedback and suggestions for my talk, on which this post is heavily based.

--

--

Jean Yang

Building @akitasoftware to help companies get back in control of their data. Previously @Harvard (ugrad) @MIT_CSAIL (PhD) @SCSatCMU (Assistant Prof).