Policy as Code—Security at Sesame

Scott Surette
Published in Sesame Engineering
Aug 27, 2021

Sesame’s website and infrastructure are used by a wide range of people, including patients, doctors, providers, and even our own internal support team. Each of these groups has different use cases and requires access to different, distinct areas of our application.

Role-based authorization is a common paradigm in web applications. Historically, it’s been implemented using roles defined in a SQL database which are then validated by the code. Tools like Hibernate and Spring Security have made this a lot friendlier, but their design still has a couple of issues:

  1. Scalability: Having authorization handling within service code is fine if you only have a few services, but this quickly becomes a headache as you scale to tens, hundreds, or maybe even thousands of microservices. At Sesame we’re now up to ~60 microservices, and always adding more!
  2. Maintainability: As your application grows, you’re likely to end up with microservices written in different languages or running on different technologies. Having to maintain separate authorization mechanisms for each technology is annoying at best, and can lead to security flaws at worst. Spring Security might work great for your Java services, but you’d need to write something home-grown for any services written in another language.

OPA

Open Policy Agent is a tool that unifies authorization handling across different technologies. Abstracting the policy enforcement layer away from the services themselves takes care of the scalability and maintainability issues and then some. Policies in OPA are defined using Rego, which is a high-level declarative language that makes it simple to define exactly who is allowed to access which resources of a given service.

Here’s an example of a policy that defines who can access an accounting endpoint: the user must be logged in as a valid owner or admin, and we also validate the account UUID within the URL.

Behind the scenes, OPA is deployed as a sidecar container alongside our authentication service, which is where we write these policy definitions. When a request comes in via our API gateway, it goes through our authentication service, which then queries OPA locally via HTTP to get a decision on whether to allow that request. Since OPA is running on the same host as our service, policy decisions can be made very quickly.
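To give a feel for what the accounting rule above decides, here’s a rough plain-Python stand-in. It is not Rego and not our actual policy: the URL shape `/accounts/<uuid>/accounting`, the layout of the `request` dictionary, and the reading of "validate" as "well-formed and equal to the logged-in user’s account" are all assumptions made for illustration.

```python
import re

# Roles named in the text; everything else below is a hypothetical shape.
ALLOWED_ROLES = {"owner", "admin"}

UUID_RE = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# Assumed URL layout for the accounting endpoint.
ACCOUNTING_PATH = re.compile(rf"/accounts/({UUID_RE})/accounting", re.IGNORECASE)


def allow(request: dict) -> bool:
    """Return True only if every condition of the rule holds (deny by default)."""
    user = request.get("user")
    if not user:
        return False  # not logged in

    if not ALLOWED_ROLES & set(user.get("roles", [])):
        return False  # neither owner nor admin

    match = ACCOUNTING_PATH.fullmatch(request.get("path", ""))
    if match is None:
        return False  # wrong endpoint or malformed UUID in the URL

    # Assumption: the UUID in the URL must be the user's own account.
    return match.group(1).lower() == str(user.get("account_id", "")).lower()
```

The useful property to notice is the shape: a handful of conditions that must all hold, with denial as the default. A declarative policy language lets you state those conditions directly instead of burying them in each service’s request handlers.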

simplified diagram of our current auth flow
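The hop from the authentication service to the local OPA sidecar could be sketched roughly like this in Python. OPA’s REST API takes the request context as an `input` document and returns the decision under `result`; the sidecar address, the policy path `accounting/allow`, the function names, and the timeout are assumptions, not our real configuration.

```python
import json
import urllib.request

# Hypothetical sidecar address and policy path.
OPA_URL = "http://localhost:8181/v1/data/accounting/allow"


def build_request(input_doc: dict, url: str = OPA_URL) -> urllib.request.Request:
    """Wrap the request context in the {"input": ...} envelope OPA expects."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(raw: bytes) -> bool:
    """Treat anything other than an explicit `"result": true` as a denial."""
    try:
        payload = json.loads(raw)
    except ValueError:
        return False
    # OPA omits "result" when the rule is undefined, so that also denies.
    return isinstance(payload, dict) and payload.get("result") is True


def is_allowed(input_doc: dict, url: str = OPA_URL, timeout: float = 0.5) -> bool:
    """Ask the sidecar for a decision; fail closed if it cannot be reached."""
    try:
        with urllib.request.urlopen(build_request(input_doc, url), timeout=timeout) as resp:
            return parse_decision(resp.read())
    except OSError:  # covers connection errors, HTTP errors, and timeouts
        return False
```

Failing closed is a design choice in this sketch: if the sidecar is unreachable or returns something unexpected, the request is denied rather than waved through.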

This setup works well and made it easy to get OPA running in our architecture, but as we scale, having a single authentication service could become a bottleneck. To resolve this, each service we deploy could have its own OPA sidecar running alongside it, so that it could decide for itself whether to allow or deny a request. To achieve this network of sidecars, we can use something called a service mesh.

What’s next?

A service mesh essentially adds another layer to an application for observability, security, and reliability. Similar to how we run OPA, this is accomplished by deploying a set of proxies as sidecars to the application code so that they can handle the communication between microservices. Not only would a service mesh allow us to implement OPA directly in our application pods, but it would also enable us to take advantage of further security enhancements such as mutual TLS.

Transport Layer Security (TLS) is an encryption protocol that’s very widely used on the internet — when a client connects to a server, the server presents a certificate which the client then verifies before exchanging information. Mutual TLS (mTLS) however is even stricter — before the client and server start exchanging information, the server additionally needs to verify the client’s certificate. This ensures that traffic in both directions is trusted, which can prevent many spoofing attacks and other malicious requests. In order to achieve mTLS across our system, we can use a service mesh such as Linkerd, which includes on-by-default mTLS. Since mTLS requires that servers also verify a client certificate in order to communicate, we can use Linkerd’s network of proxies to easily handle this verification. This additional layer of security ensures that each client is who they say they are.
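To make the difference between TLS and mTLS concrete, here’s a minimal sketch using Python’s standard `ssl` module. With a mesh like Linkerd the proxies take care of this, so it isn’t something our services would write themselves; the function name and its parameters are hypothetical and only illustrate the one server-side setting that changes.

```python
import ssl
from typing import Optional


def server_context(
    mutual: bool,
    certfile: Optional[str] = None,   # server certificate (hypothetical path)
    keyfile: Optional[str] = None,    # server private key (hypothetical path)
    client_ca: Optional[str] = None,  # CA bundle used to verify client certs
) -> ssl.SSLContext:
    """Build a server-side TLS context, optionally requiring client certificates."""
    # Purpose.CLIENT_AUTH yields a context for the server end of a connection.
    # By default it does not ask clients for a certificate (plain TLS).
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)

    if certfile:
        ctx.load_cert_chain(certfile, keyfile)

    if mutual:
        # mTLS: the handshake fails unless the client presents a certificate
        # that chains to a CA this server trusts.
        ctx.verify_mode = ssl.CERT_REQUIRED
        if client_ca:
            ctx.load_verify_locations(cafile=client_ca)

    return ctx
```

In plain TLS only the client verifies the server; flipping `verify_mode` to `ssl.CERT_REQUIRED` on the server is what makes the verification mutual.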

Although implementing Linkerd would help us move away from our current model, where all requests need to go through a single authentication service via an API gateway, there are other service mesh technologies such as Kong that bundle OPA within their proxies, rather than needing to deploy an additional OPA sidecar along with the service mesh proxies.

simplified diagram of potential future auth flow

Coupled with OPA, this service mesh architecture would allow us to achieve rock-solid authentication that’s entirely abstracted away from our application itself, without the bottleneck of an API gateway routing everything through a single service. The policy-as-code paradigm allows us to define clear and concise rules that aren’t tied to our application services. We’re free to scale as much as we want without worrying about how any new service in any given technology might handle its security. Having a unified way of dealing with these key components not only makes things easier on engineers, but makes our entire system more secure as well.
