Using mutual TLS authentication in a Serverless world.

[Image: Golden Spike ceremony completing the transcontinental railroad]


Some months ago, I was presented with a requirement to use mutual TLS authentication when connecting to backend resources located in traditional datacenters. The challenge was to come up with a solution that would satisfy not only our internal security requirements but also the strict requirements set out by financial and other regulators. What is detailed here is by no means a complete solution, but rather an approach to securing dynamic and ephemeral workloads in situations requiring mutual TLS authentication.


The traditional implementation and use of SSL/TLS certificates has been a very procedural and often manual process. Given that projects took months or years to complete, this was generally tolerated so long as it did not contribute to any major impediments.

This was also the case with event- and message-driven architectures, as they were deployed atop a fixed infrastructure provisioned to support some predetermined SWAG for peak capacity. While these architectures are not new[1], the rapid adoption of cloud services has enabled event- and message-based models to be applied to the underlying infrastructure and services as well. Certificate management and automation tools available in the cloud have all but eliminated the tedium surrounding the care and feeding of TLS certificates in server-like deployments such as AWS EC2 instances, Elastic Beanstalk and, to a large extent, even containers.

The introduction of AWS Lambda, though, requires us to rethink how TLS is utilized in serverless or ephemeral workloads. For starters, there is no fixed or predefined infrastructure such as a traditional server, and there is no expectation that the underlying container will be reused from one invocation to the next. There are also no fixed IP addresses or hostnames to anchor the certificates, and using generic certificates (and keys) shared by groups of Lambdas, or by all of them, is neither practical nor secure[2].

AWS Lambda functions are, by design, event-driven and short-lived, so we should adapt our usage of TLS certificates to match.


My typical first approach for this (or any problem, for that matter) was to search the internet for an existing solution. In my research, I came across the O’Reilly book “Zero Trust Networks” by Evan Gilman and Doug Barth[3]. The authors recommend the use of PKI, and in particular a private PKI, to provide digital certificates. Further searching for a PKI setup I could manage myself yielded only cold-war era products that were either expensive, bloated or couldn’t be easily automated[4]: more work and expense than I was willing to commit to a “light-weight, dynamic” solution.

AWS does offer a private CA and certificate management in its Certificate Manager (ACM) offering, but this was not well suited to my goal of a light-weight, dynamic PKI. One downside is cost: using ACM with a private CA adds $400–4000/month or more just to apply it to the free usage tier of Lambda alone.

In the end, I came to the obvious conclusion: create a serverless, light-weight PKI to dynamically create TLS certificates and keys. This too was not without challenges, but they were trivial in comparison.


The diagram below shows a basic architecture for requesting and delivering TLS client certificates and keys on demand to the requesting Lambda function. The architecture is divided into a data plane and a control plane.

The service hosted on the server in the upper right of the diagram is configured to require mutual TLS authentication (mTLS) and to accept the root or intermediate CA used to sign our client certificates.

The client Lambda function first makes a call to the control plane requesting a client certificate. In my example the request also includes identifying information such as a transaction ID and session ID. This information is used not only to validate the request for a client certificate; some of it is also added to the certificate using custom X.509 extensions.
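As a rough sketch, the request sent by the client Lambda might look like the following. All of the field names and the invoked function name here are illustrative assumptions, not the actual API:

```python
import time
import uuid


def build_cert_request(function_name: str, ttl_seconds: int = 360) -> dict:
    """Assemble a certificate request for the control plane.

    The schema here is hypothetical; a real control plane would define
    its own fields and validation rules.
    """
    return {
        "requester": function_name,
        "transaction_id": str(uuid.uuid4()),   # later embedded in a custom extension
        "session_id": str(uuid.uuid4()),
        "requested_ttl_seconds": ttl_seconds,  # slightly longer than the Lambda timeout
        "timestamp": int(time.time()),
    }


# The client Lambda would then invoke the signing Lambda, e.g. with boto3:
#   import boto3, json
#   boto3.client("lambda").invoke(
#       FunctionName="cert-signing-lambda",    # hypothetical function name
#       Payload=json.dumps(build_cert_request("my-client-lambda")),
#   )
```

Carrying a per-invocation transaction and session ID is what lets the control plane validate each request individually rather than trusting a long-lived shared credential.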

Upon validating the request, the Lambda in the control plane generates a private key and signs the client certificate using a signing certificate stored in the Systems Manager Parameter Store. As new certificates and keys are generated for every session or transaction, I opted to have the signing Lambda generate all the TLS artifacts, including the private key, so as not to introduce cryptography dependencies and overhead in the client Lambdas. The benefit is two-fold: first, all the cryptographic logic to generate keys and sign certificates is contained in a single Lambda; second, the client Lambdas can focus on the business logic.
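A condensed sketch of that signing step, using the Python cryptography package. The OID and helper name are placeholders, and in the real deployment the CA key and certificate would be fetched from the Parameter Store rather than passed in:

```python
import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import ExtendedKeyUsageOID, NameOID

# Placeholder OID for the custom session extension; a real deployment
# would use an OID from the organization's own private enterprise arc.
SESSION_OID = x509.ObjectIdentifier("1.3.6.1.4.1.99999.1")


def issue_client_cert(ca_key, ca_cert, common_name: str,
                      session_json: bytes, ttl_minutes: int = 6):
    """Generate a fresh key pair and a short-lived, CA-signed client cert."""
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    now = datetime.datetime.utcnow()
    cert = (
        x509.CertificateBuilder()
        .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, common_name)]))
        .issuer_name(ca_cert.subject)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(minutes=ttl_minutes))
        .add_extension(x509.ExtendedKeyUsage([ExtendedKeyUsageOID.CLIENT_AUTH]),
                       critical=False)
        # Session data rides along as a custom extension for downstream validation.
        .add_extension(x509.UnrecognizedExtension(SESSION_OID, session_json),
                       critical=False)
        .sign(ca_key, hashes.SHA256())
    )
    return key, cert
```

The backend can later recover the session data with `cert.extensions.get_extension_for_oid(SESSION_OID)` and hand it downstream for orchestration or fraud scoring.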

The client Lambda then uses the certificate and key to make the request to the backend service. The service also uses the data passed in the certificate extensions for application orchestration and makes calls to the control plane for further validation.

Serverless mTLS Architecture

Below is a snippet from a TLS client certificate showing custom X.509 extensions. The certificate is valid only for a short window: minutes, rather than years or decades. This window can and should be reasonably tuned for the given usage. For example, if a single Lambda with a timeout of 5 minutes requests a certificate, then anything longer than 5 1/2 to 6 minutes might be excessive. A more complex workload defined using Step Functions, for example, may share a client certificate for a given execution and in this case would use a single certificate generated with a longer duration.

The custom extensions in the certificate serve two purposes. First, along with the standard X.509 objects and extensions[5], they are used to verify trust by the application and also downstream for further validation such as fraud scoring. Second, they provide enriched logging and audit information to support non-repudiation requirements in financial or legal transactions.

Version: 3 (0x2)
Serial Number:
Signature Algorithm: sha256WithRSAEncryption
Validity
    Not Before: Nov  8 08:20:27 2018 GMT
    Not After : Nov  8 08:33:57 2018 GMT
Subject: C=NL, ST=ZH, L=Rotterdam, O=Your Bank, CN=92fea10a-0905-40cc-aca1-54ee52e89c71
Subject Public Key Info:
    Public Key Algorithm: rsaEncryption
        Public-Key: (2048 bit)
        Exponent: 65537 (0x10001)
X509v3 extensions:
    X509v3 Authority Key Identifier:
    X509v3 Subject Alternative Name:
    X509v3 Extended Key Usage:
        TLS Web Client Authentication
    iOS Agent String
        {"lat": "52.15", "lon": "4.49"}
Signature Algorithm: sha256WithRSAEncryption

Improvements and Variations

As I mentioned earlier, there were some challenges implementing this. They related more to the general state of available applications and libraries than to the approach itself or to AWS resources and services.

I wrote the Lambda functions in Python, so my observations are from that perspective, as well as from the underlying Linux platform and related open-source applications and libraries. Other runtimes may have the same or similar issues. The average execution time to issue a simple client certificate was around 9–11 ms, so not much time was spent further optimizing performance.

The issues I encountered in both the client and signing Lambdas stem mainly from the dependency on the OpenSSL libraries installed on the platform. Both the Python cryptography package and the Requests/urllib3 libraries rely on language bindings provided by a compiled binary package (CFFI) to call into OpenSSL. This introduced some fragility in my dependency management, though mostly I just ignored any warnings/errors related to CFFI and everything worked anyway.

The bigger issue I found with OpenSSL was its requirement that the certificate and key be referenced from files. Since the certificate and key are returned as attributes in the signing-request response, it would have been preferable to pass them directly to the Requests/urllib3 connection object. The workaround was to persist the certificate and key to randomly named files in the local /tmp directory. While less than ideal, the risk is minimal when using modern cryptographic ciphers in this context.
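A minimal sketch of that workaround using only the standard library. The endpoint in the comment is hypothetical; `mkstemp` gives both the random naming and restrictive (0600) permissions:

```python
import os
import tempfile


def persist_tls_artifacts(cert_pem: bytes, key_pem: bytes,
                          tmp_dir: str = tempfile.gettempdir()):
    """Write the returned certificate and key to randomly named files.

    mkstemp() creates each file with an unpredictable name and mode 0600,
    which limits exposure within the shared execution environment.
    In a Lambda, tempfile.gettempdir() resolves to /tmp.
    """
    cert_fd, cert_path = tempfile.mkstemp(suffix=".crt", dir=tmp_dir)
    key_fd, key_path = tempfile.mkstemp(suffix=".key", dir=tmp_dir)
    with os.fdopen(cert_fd, "wb") as f:
        f.write(cert_pem)
    with os.fdopen(key_fd, "wb") as f:
        f.write(key_pem)
    return cert_path, key_path


# The paths can then be handed to Requests, which passes them to OpenSSL:
#   import requests
#   requests.get("https://backend.example.com/api",  # hypothetical endpoint
#                cert=(cert_path, key_path))
# Unlinking the files right after the call keeps with the ephemeral spirit:
#   os.unlink(cert_path); os.unlink(key_path)
```

Since the certificate is only valid for minutes anyway, a file lingering in /tmp between invocations is a bounded risk, but deleting it promptly is still good hygiene.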


Though my initial challenge was to meet an internal requirement for mutual TLS authentication, this solution can also be implemented to meet mutual authentication requirements in other areas, such as those specified by Open Banking (UK)[6] and the Berlin Group (EU PSD2). These specifications require mTLS for calls to APIs as well as to Identity Providers (IdPs), and more specifically they recommend the use of OAuth mTLS[7] for secure connections. This may also be of interest to those deploying applications on AWS GovCloud.

Unfortunately, the reference architectures for Open Banking I’ve seen primarily use load balancers and proxies to offload TLS. This introduces problem areas where the offloading device must be at least minimally aware of the application security, either by interacting directly with the control plane or by extracting information from the certificate and securely forwarding it to the application. It’s a bit like the airline unpacking your bags before you get off the airplane.

Ultimately, with the broad adoption of public cloud services, and serverless in particular, there will need to be a shift in security architectures and in how underlying infrastructure is treated. A perimeter-based approach is not merely insufficient but flawed in a serverless, services-based context. The fundamentals of Zero Trust Networking provide us with guidance for implementing more robust security, but this will require changes not only in applications and supporting software, but also in the handling of infrastructure and the determination of trust over time (think Kairos, not Chronos).

Notes and References

[1] ESBs and IBM MQ come to mind.
[2] Sharing generic certificates and keys exposes risk in multiple dimensions: shared certificates with longer validity periods have a greater chance of a valid key being exposed over time, are exposed to a greater variety of possible code exploits, and lose uniqueness per transaction or session.
[3] Evan Gilman and Doug Barth, “Zero Trust Networks: Building Secure Systems in Untrusted Networks”, O’Reilly Media, 2017.
[4] In all fairness, Let’s Encrypt, built on Cloudflare’s PKI and TLS Toolkit, is a good modern implementation, but it is still predicated on a fixed infrastructure. One could implement a private PKI using the toolkit, but it’s not very lightweight or serverless.
[6] Open Banking Security Profile