Diamonds in the Rough

Illustration from the National Journal’s coverage of Quorum https://www.nationaljournal.com/s/26737

Quorum’s database never stops growing. What began as two simple tables, one for people and one for bills, today contains around 60 different sets of data spread across hundreds of tables and multiple databases.

During this time there have been a few inflection points: moments where the scope of what we were building grew so dramatically that it shattered many of the assumptions we’d made and invariants we’d maintained until then.

These moments in Quorum’s development have produced some retrospectively fascinating architectural decisions, the gemstones forged by the immense pressure of running a software startup.

The best example of such a moment was the decision to expand beyond the US Congress to all 50 state legislatures. This meant a dramatic increase in both the size of the database and the number of products: users would want access to single states, groups of states, or just the federal government, and wouldn’t be interested in anything else. This presented a tricky technical problem: we needed to create arbitrary partitions of our increasingly large tables on the fly, and had very little time to come up with a workable solution.

Micromanaging Models

This time constraint ruled out most of the obvious solutions. We couldn’t afford to create separate tables for each region, and didn’t have the resources to deploy a constellation of servers with different configurations for different clients.

We needed a layer of server-side logic that would preprocess database queries, like a “middleware,” so users would only ever see the data that was most relevant to them.

(query, user) => filtered_query
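Written out as Python, the contract above is simply a function from a query and a user to a narrower query. The sketch below is framework-free and makes two assumptions: a “query” is modeled as a plain list of records rather than a Django QuerySet, and the User.regions field is a hypothetical stand-in for however a profile records the regions being viewed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Record:
    name: str
    region: str


@dataclass
class User:
    # Hypothetical field: the regions this user is currently viewing.
    regions: Set[str] = field(default_factory=set)


# The "middleware" contract: (query, user) => filtered_query.
QueryFilter = Callable[[List[Record], User], List[Record]]


def region_filter(query: List[Record], user: User) -> List[Record]:
    """Return only the records from regions the user is viewing."""
    return [r for r in query if r.region in user.regions]


middleware: QueryFilter = region_filter

records = [Record("SB 1", "Utah"), Record("HR 2", "US Congress")]
print([r.name for r in middleware(records, User(regions={"Utah"}))])  # ['SB 1']
```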

In a canonical Django stack, filtering models for a particular user happens in the view layer, whose contract takes an HTTP request (containing a user) and returns the HTML to display. There were three problems with this:

  • We had a LOT of view functions, so rewriting all of them wasn’t realistic.
  • We were beginning a process of phasing out the view and template layers.
  • This behavior depended on the data, not on features, so it belonged with the models rather than the views.

The last point was a big clue for us. We realized that the layer we wanted was precisely the use case for Django’s Managers, which let you customize the QuerySets, and therefore the SQL, that Django builds for a model. Unfortunately, Managers have no notion of the current user, and for good reason: if they required one, you could never build a Django app that worked without users logging in.

So we built it.

The for_user method would be our “middleware”: by defining it on the managers for our numerous models, we could filter in advance to what a given user should be seeing.

More importantly, this Manager would cause queries to break if they were not properly initialized with a user.

Bill.objects.all() => ValueError: “Invalid method ‘all’...”

This query now appropriately throws an error, because it’s nonsensical to query the Bill model without taking into account the region that the user is currently viewing.
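A minimal, framework-free sketch of such a manager follows. UserDataManager and for_user come from the text; BillManager, the rows list standing in for a QuerySet, the User.regions field, and the error message are our own assumptions. In real Django code this would be a subclass of django.db.models.Manager overriding methods such as all().

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class User:
    regions: Set[str] = field(default_factory=set)  # hypothetical profile field


@dataclass
class Bill:
    title: str
    region: str


class UserDataManager:
    """Stand-in for a Django Manager that refuses to query without a user."""

    def __init__(self, rows):
        self._rows = list(rows)  # plays the role of the model's table

    def _unscoped(self, method: str):
        raise ValueError(f"Invalid method '{method}': call for_user(user) first")

    # Unscoped entry points fail loudly instead of returning every region.
    def all(self):
        self._unscoped("all")

    def filter(self, **lookups):
        self._unscoped("filter")

    def for_user(self, user: User) -> List:
        # Each model's manager supplies its own partitioning rule.
        raise NotImplementedError


class BillManager(UserDataManager):
    def for_user(self, user: User) -> List[Bill]:
        return [b for b in self._rows if b.region in user.regions]


bills = BillManager([Bill("SB 1", "Utah"), Bill("AB 5", "California")])
print([b.title for b in bills.for_user(User(regions={"Utah"}))])  # ['SB 1']
try:
    bills.all()
except ValueError as exc:
    print(exc)  # Invalid method 'all': call for_user(user) first
```

Failing loudly is the design choice that matters here: a forgotten for_user call surfaces as an exception during development rather than as data leaking across regions.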

By defining for_user on subclasses of UserDataManager, we achieved the polymorphism our “middleware” logic needed for database queries, guaranteeing that users would only receive appropriately partitioned data.
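To make the polymorphism concrete, here is a second sketch under the same assumptions (plain lists instead of QuerySets, hypothetical Bill, Vote, and User shapes). Each subclass supplies its own for_user, and VoteManager reuses BillManager’s rule rather than repeating the region logic: a vote is visible only if its bill is.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class User:
    regions: Set[str] = field(default_factory=set)  # hypothetical profile field


@dataclass(frozen=True)  # frozen makes Bill hashable, so it can go in a set
class Bill:
    title: str
    region: str


@dataclass
class Vote:
    bill: Bill
    result: str


class UserDataManager:
    def __init__(self, rows):
        self._rows = list(rows)

    def for_user(self, user: User) -> List:
        raise NotImplementedError  # every model defines its own rule


class BillManager(UserDataManager):
    def for_user(self, user: User) -> List[Bill]:
        return [b for b in self._rows if b.region in user.regions]


class VoteManager(UserDataManager):
    def __init__(self, rows, bill_manager: BillManager):
        super().__init__(rows)
        self._bills = bill_manager

    def for_user(self, user: User) -> List[Vote]:
        # Reuse the bill rule: a vote is visible only if its bill is.
        visible = set(self._bills.for_user(user))
        return [v for v in self._rows if v.bill in visible]


utah, calif = Bill("SB 1", "Utah"), Bill("AB 5", "California")
bills = BillManager([utah, calif])
votes = VoteManager([Vote(utah, "passed"), Vote(calif, "failed")], bills)
print([v.result for v in votes.for_user(User(regions={"California"}))])  # ['failed']
```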

Slice and Dice

The job wasn’t quite done yet. Changing the way queries were initialized broke a few things, and we often had to compensate by writing our own versions of core Django functions.

One particularly painful, but worthwhile, step was forking Tastypie and making the changes our REST API needed to work properly.

We needed to make sure that every query created inside the API was properly initialized with a user, which wasn’t too hard.

A trickier problem was overcoming the effects of our modified managers on relational fields. Because the default manager is also used for related lookups, accessing foreign keys and many-to-many fields would break unless we taught Tastypie to use our system.
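The sketch below illustrates the relational-field problem without Tastypie. LegislatorResource, RegionScopedManager, and the scenario (a member of Congress whose related bills include older state bills) are all hypothetical. The point is that the API layer has to scope related rows through for_user as well, instead of returning whatever the raw relation holds.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class User:
    regions: Set[str] = field(default_factory=set)  # hypothetical profile field


@dataclass
class Request:
    user: User


@dataclass
class Bill:
    title: str
    region: str


@dataclass
class Legislator:
    name: str
    region: str
    # Stand-in for a reverse relation such as legislator.bills.
    bills: List[Bill] = field(default_factory=list)


class RegionScopedManager:
    """Stand-in manager whose lookups are always scoped to a user."""

    def __init__(self, rows=()):
        self._rows = list(rows)

    def for_user(self, user: User, rows: Optional[list] = None) -> list:
        # `rows` lets a related lookup reuse the same rule on a subset.
        source = self._rows if rows is None else rows
        return [r for r in source if r.region in user.regions]


class LegislatorResource:
    """Hypothetical API resource, not Tastypie's actual class."""

    def __init__(self, legislators, bills):
        self.legislators = legislators
        self.bills = bills

    def get_list(self, request: Request) -> list:
        user = request.user
        return [
            {
                "name": leg.name,
                # Never read leg.bills raw: scope the related rows too.
                "bills": [b.title for b in self.bills.for_user(user, leg.bills)],
            }
            for leg in self.legislators.for_user(user)
        ]


rep = Legislator("Rep. A", "US Congress",
                 [Bill("HR 2", "US Congress"), Bill("SB 1", "Utah")])
api = LegislatorResource(RegionScopedManager([rep]), RegionScopedManager())
print(api.get_list(Request(User(regions={"US Congress"}))))
# [{'name': 'Rep. A', 'bills': ['HR 2']}]
```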

Hold Up

Wait a second. We were trying to avoid rewriting every view function in the codebase, and decided instead to fork our dependencies and rewrite every query?!

Well, yeah. It was surprisingly fast, and much easier. Rewriting all of the view functions would have meant combing through the platform in each region and identifying places where data was either missing or shouldn’t have been visible. This approach instead made Django break whenever a query wasn’t scoped to a user, which let us find bugs faster and be confident the data was correct once the code ran without errors.

Additionally, as more and more of our queries went through our REST API instead of the view layer, our modified version of Tastypie guaranteed that every endpoint would automatically apply its own middleware.

We’re the first to admit this isn’t perfect. Two unresolved issues we still experience are:

  • Quorum’s API cannot be considered truly RESTful, because the REST contract expects the same request from the same user to produce the same results. A user currently viewing all 50 states would see about 22k results at /api/bill?region=california, while the same user, once their profile restricted them to the US Congress, would get zero results at that same URL.
  • Maintaining a valid cache has become tricky, because the same user viewing different regions should see different results at the same endpoint.

Still, having solved our larger problem in around 20 lines of code, we were thrilled. Also, many of the drawbacks were actually blessings in disguise:

  • Any and all requests going through Quorum’s REST API would be automatically pre-filtered for the user that sent them. This meant our client-side scripts would not have to worry about what a user was allowed to see, and also provided a nice boost to security.
  • While deviating from the canonical Django query API has been a headache (particularly for new developers), forcing insecure code to break is much better in the long run than trying to get it right every time.
  • Augmenting Managers has proved very DRY and forward compatible, as arbitrarily complex for_user methods could be built and reused.

This last point came into play quickly, as a second inflection point was just around the corner.

Quorum’s 404 page, a slightly modified word cloud from the first presidential debate between Hillary Clinton and Donald Trump.

Did you mean: permission?

Originally, Quorum sold access to its vast database on a per-user basis. As our functionality expanded beyond aggregating, analyzing, and displaying data to improving the collaborative workflow of our users, we introduced the concept of organizations. Organizations are groups of users that share data and use Quorum to work together. From an engineering standpoint, we knew that organizations would want to have administrators, associates, and clients of their own. Perhaps certain organizations would only want access to certain subsets of Quorum’s features. Perhaps they would even want to collaborate with users in other organizations. Once again we found ourselves in the mantle, in immediate need of a permissioning system that could be arbitrarily complex.

We needed permissions that could change based on the feature, the data, the user, the region, the organization, or the relationship between two organizations. They needed to be accessible at every part of the stack. This complexity outstripped every tool we saw available to us.

The key insight was that permissions could be recursive. Conceptualizing them as a tree would allow us to have permissions within permissions, with parents defining the permissions of organizations and children corresponding to their users.
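One way to model “permissions within permissions” is a parent pointer on each grant, with a user’s grant narrowed by its organization’s. This is a sketch under our own assumptions: the parent field, the convention that None means “no region limit,” and the rule that a child’s regions are intersected with its ancestors’ are not taken from Quorum’s schema, and regions is a frozenset here rather than the list used in the examples that follow.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Permission:
    name: str                                  # feature or model, e.g. "Bill"
    regions: Optional[FrozenSet[str]] = None   # None means no region limit
    parent: Optional["Permission"] = None      # e.g. the organization's grant

    def __post_init__(self):
        # A child can only refine a permission its parent actually holds.
        if self.parent is not None and self.parent.name != self.name:
            raise ValueError("child permission must match its parent's name")

    def effective_regions(self) -> Optional[FrozenSet[str]]:
        """Regions granted here, narrowed by every ancestor up the tree."""
        if self.parent is None:
            return self.regions
        inherited = self.parent.effective_regions()
        if self.regions is None:
            return inherited
        if inherited is None:
            return self.regions
        return self.regions & inherited

    def allows(self, region: str) -> bool:
        regions = self.effective_regions()
        return regions is None or region in regions


org = Permission("Bill", frozenset({"Utah", "Texas"}))
alice = Permission("Bill", frozenset({"Utah", "Ohio"}), parent=org)
print(sorted(alice.effective_regions()))  # ['Utah']
print(alice.allows("Ohio"))               # False
```

Because the check walks up the tree, an organization can never hand its users more than it holds itself.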

To give a user access to a particular feature, we would do something like:

Permission(name="Outbox", user=u).save()

Or, to give access to all of Utah’s bills:

Permission(user=u, name="Bill", regions=["Utah"]).save()

They could be associated with a feature, a model, or a region, and be serialized and shipped via REST to the front end for use in all parts of the stack.
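As an illustration of the serialization step, the following sketch merges a user’s grants into one JSON object a front end could consult. The payload shape (each name mapped to a sorted region list, or null when the grant isn’t region-limited) and the serialize_permissions helper are assumptions, not Quorum’s wire format.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Permission:
    name: str                            # feature ("Outbox") or model ("Bill")
    regions: Optional[List[str]] = None  # None: not limited by region


def serialize_permissions(perms: List[Permission]) -> str:
    """Merge a user's grants into one JSON object keyed by name."""
    payload = {}
    for p in perms:
        if p.regions is None:
            payload[p.name] = None  # an unlimited grant overrides region lists
        elif p.name not in payload:
            payload[p.name] = sorted(set(p.regions))
        elif payload[p.name] is not None:
            payload[p.name] = sorted(set(payload[p.name]) | set(p.regions))
    return json.dumps(payload, sort_keys=True)


grants = [Permission("Outbox"), Permission("Bill", ["Utah"]), Permission("Bill", ["Ohio"])]
print(serialize_permissions(grants))
# {"Bill": ["Ohio", "Utah"], "Outbox": null}
```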

Hitting the Motherlode

While maintaining the permission table would become complicated (though not unmanageable), this system would radically simplify many parts of the code base.

For example, just as our for_user methods began to get complicated, we started being able to express them using permissions.
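A minimal sketch of a for_user method driven by permissions rather than hard-coded rules, assuming a hypothetical permissions list on User and a permitted_regions helper of our own. Adding a Permission row then changes what the manager returns without touching the manager’s code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Permission:
    name: str
    regions: Optional[List[str]] = None


@dataclass
class User:
    permissions: List[Permission] = field(default_factory=list)  # hypothetical


@dataclass
class Bill:
    title: str
    region: str


def permitted_regions(user: User, model_name: str) -> Set[str]:
    """Union of the regions in all of a user's grants for one model."""
    regions: Set[str] = set()
    for p in user.permissions:
        if p.name == model_name and p.regions:
            regions.update(p.regions)
    return regions


class BillManager:
    def __init__(self, rows):
        self._rows = list(rows)

    def for_user(self, user: User) -> List[Bill]:
        # The partitioning rule now lives in Permission rows, not in code.
        allowed = permitted_regions(user, "Bill")
        return [b for b in self._rows if b.region in allowed]


u = User([Permission("Outbox"), Permission("Bill", ["Utah"])])
bills = BillManager([Bill("SB 1", "Utah"), Bill("HR 2", "US Congress")])
print([b.title for b in bills.for_user(u)])  # ['SB 1']
```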

Permissions were particularly useful for preventing users from visiting pages they did not have access to, or from modifying records they weren’t authorized to edit.

from django.views.generic import TemplateView
@Permission.check("Outbox")
class Outbox(TemplateView):
    ...

By decorating URLs and views with permission checks, we boosted the site’s security again. Beyond Quorum’s extensive pre-existing security measures, anyone with malicious intent would now also have to get past the Permission table.
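The text doesn’t show how Permission.check works; one plausible shape is a class decorator that wraps the view’s dispatch method. In this framework-free sketch, View, Request, User.granted, and PermissionDenied are stand-ins (Django’s real exception is django.core.exceptions.PermissionDenied), and check is written as a static method on a hypothetical Permission class.

```python
import functools
from dataclasses import dataclass, field
from typing import Set


class PermissionDenied(Exception):
    """Stand-in for django.core.exceptions.PermissionDenied."""


@dataclass
class User:
    granted: Set[str] = field(default_factory=set)  # hypothetical: permission names


@dataclass
class Request:
    user: User


class View:
    """Minimal stand-in for a Django class-based view."""

    def dispatch(self, request, *args, **kwargs):
        return self.get(request, *args, **kwargs)


class Permission:
    @staticmethod
    def check(name: str):
        """Class decorator: refuse to dispatch unless the user holds `name`."""
        def decorate(view_cls):
            original = view_cls.dispatch

            @functools.wraps(original)
            def dispatch(self, request, *args, **kwargs):
                if name not in request.user.granted:
                    raise PermissionDenied(name)
                return original(self, request, *args, **kwargs)

            view_cls.dispatch = dispatch
            return view_cls
        return decorate


@Permission.check("Outbox")
class Outbox(View):
    def get(self, request):
        return "outbox page"


print(Outbox().dispatch(Request(User({"Outbox"}))))  # outbox page
```

Wrapping dispatch rather than get means every HTTP method the view handles goes through the same check.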

Conclusion

With unrestricted time and resources, we likely would have found more canonical solutions to these problems. Unfortunately, time and resources are the primary constraints of running a startup, and these constraints have forced us to approach engineering challenges differently than we might have otherwise. We’re constantly amazed at what is possible with today’s technology, and have consistently done some of our most creative work when we’ve been under the most pressure. The purpose of publishing this is not to prescribe our solutions, but to share what we’ve learned and reflect on the times we’ve been pleasantly surprised, the glints of gold against the silt.

Disclaimer: The code in this article is for illustrative purposes only. It has been editorialized from Quorum’s codebase, and is not intended to provide plug-and-play solutions to the problems discussed.

Interested in working at Quorum? We’re hiring.