Dragging Serverless Web Apps Into the VPC
Serverless web apps are becoming a de facto standard in the industry, with static JS web applications consuming serverless APIs. The experience of using AWS tooling such as S3, CloudFront and API Gateway spoils the developer and saps enthusiasm for spinning up and managing servers for mundane tasks such as serving static web content.
Internal apps running within corporate networks are a common sight, and over time more of these are being migrated to the serverless paradigm so that the operational pains of running Nginx/Apache instances can be forgotten. However, as it stands, one thorn in the process is that a run-of-the-mill serverless web app exposes a far larger surface for security incidents when it is publicly accessible than when it is locked away within a corporate network. Consider that one of the biggest risks to an organisation is phishing: a reasonably sized company is better off assuming that some number of its users will fall victim to phishing in any given year and have their credentials compromised. In such a case, having applications locked away within the corporate network adds a significant additional step for an attacker to overcome.
Internal, non-customer-facing applications need the same level of attention to detail and operational oversight as any publicly released application. However, as customers don’t use them directly, it’s not difficult to see how they end up without the attention and love that an engineering team would ideally give them.
With the onset of the corona lockdown and a sudden abundance of free time, I thought I’d take a look at bringing a serverless app within a VPC (Virtual Private Cloud) and see how much of an overhead that adds.
I set out with a few aims:
- No public access to static web assets
- API should not be reachable on the public Internet
- Should not noticeably increase the hassle of the deployment process
- Should not increase the hassle in using the application
- No significant running cost increases
- No significant changes to the application codebase
In order to do this I created a small app for a fictional company, “Cake Incorporated”. This app simply lists the cakes that are currently produced by the bakery; it is intended for internal use only, to inform sales of the current operations of the fictional cake factory. It makes a single API call to the Cake Incorporated CakeAPI for the currently available cakes.
From a tech perspective it’s an Angular SPA that calls on a Node API. For completeness, I have publicly deployed it here: https://cakeinc-vpc-app.projects.robertcurran.uk. I have not bothered to mock an authentication system on this, so we will just imagine that the user has authenticated via some method to get to this page and that this is then used to call the API in some secured manner. The authentication mechanism is out of scope for this project, lest I get distracted by how amazing Cognito is….
A Visualisation:
Step 1: Restricting Access to Web Assets
An unauthenticated user could easily fetch the entire SPA and reverse engineer the structure of the API that powers the application, even though they would still need to bypass the API authentication to actually call it successfully. I’d prefer that the very existence of the cakeAPI be a total unknown to external parties, so that they cannot derive the API structure from inspecting the Angular app. Taking the SPA off the public web and only exposing it to clients connecting from within the corporate network would be the easiest way to achieve this.
CloudFront is primarily a CDN, geared towards serving cached content to clients from edge nodes geographically close to users. With the cakeListing app I am not concerned about its CDN capabilities; its primary use here is to handle serving the site over SSL.
CloudFront is pretty great at this. The integration between Route53 and AWS Certificate Manager means a free public SSL certificate that renews automagically, which is ideal for a set-and-forget type solution. Renewing certificates, whether manually or through a periodic script (LetsEncrypt 😐), requires operational oversight that I just won’t be giving it. So any solution that adds operational work is an instant non-starter.
As a CDN, CloudFront logically has no concept of a VPC-only deployment. So in order to achieve the access restriction, a firewall will be used to filter all requests to the CloudFront distribution down to only those that originate from within the VPC. This does mean that the traffic is still potentially routed over public internet infrastructure, but this is an acceptable risk for the static web content. The discoverability of the application, however, can be markedly reduced: as the users will now be connecting from within the VPC, they will have access to the internal VPC DNS provision. This means that instead of having the CNAME record in the organisation’s public DNS, it can be placed in the private VPC DNS.
Putting the record in the private DNS hosted zone means that if the URL for the application is accidentally shared externally (overshared documentation, screenshot etc.), external parties will not be able to resolve it to the distribution. This can seem like a waste of effort as users would still need to authenticate, but personally, no matter how strong the application’s security is, I’d rather just not have strangers knocking at the door altogether.
From inside the network the site loads successfully; from outside the network DNS resolution fails. Leaking the app URL (https://app-test.internal.robertcurran.uk/) is no longer such a concern.
When you create a Route53 private hosted zone and link it to your VPC with DNS enabled, any compute brought up inside the VPC will automatically pick up the internal DNS server; there is no need to create an NS (Name Server) record within your public zone to delegate the subdomain. It is best to use a subdomain of your corporate domain: with the proliferation of new TLDs, using a fake TLD such as `mycompany.internal` could end in bad times.
Happily, nothing needs to be changed about the SSL process, as there is no requirement that a domain name listed on a public SSL certificate be publicly resolvable, so I added `*.internal.robertcurran.uk` to the list of domains on my certificate in Certificate Manager and let AWS propagate that through to the CloudFront distribution. This is handy because setting up dedicated private SSL infrastructure is super pricey in AWS 😱.
The firewall in front of the CloudFront distribution needs to allow only traffic that originates from the VPC. Traffic intended for resources outside of the VPC goes through the NAT gateway to be translated from an internal IP to a publicly routable address, which will be the EIP (Elastic IP) attached to the VPC’s NAT gateway. Therefore, as the CloudFront distribution exists as a publicly routable entity outside of the VPC, a request from a resource within the VPC will appear to CloudFront to come from the IP address of the NAT gateway. So we need to allow only this IP address through the firewall. With AWS WAF (Web Application Firewall) we define a WebACL that has a single rule to allow this IP.
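To make the allow rule concrete, here is a minimal TypeScript sketch of the decision such a WebACL makes: allow a request when its source IP falls inside an allowed range, and block it otherwise. This is only a model of the logic, not the AWS WAF API. The function names and example addresses are my own assumptions, and I’m assuming the WebACL’s default action is set to block.

```typescript
// Hypothetical model of the WebACL decision; this is not the AWS WAF API.
type WebAclAction = "ALLOW" | "BLOCK";

// Convert a dotted-quad IPv4 address into an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  return parts.map(Number).reduce((acc, octet) => acc * 256 + octet, 0);
}

// True when `ip` falls inside `cidr`, for example "203.0.113.10/32".
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (base === undefined || !Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`Not a CIDR range: ${cidr}`);
  }
  // A shift by 32 is a no-op in JavaScript, so /0 is handled separately.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  const ipNetwork = (ipv4ToInt(ip) & mask) >>> 0;
  const baseNetwork = (ipv4ToInt(base) & mask) >>> 0;
  return ipNetwork === baseNetwork;
}

// Default action is BLOCK; the single rule allows the listed ranges.
function evaluateRequest(sourceIp: string, allowedCidrs: string[]): WebAclAction {
  return allowedCidrs.some((cidr) => inCidr(sourceIp, cidr)) ? "ALLOW" : "BLOCK";
}

// Assumed NAT gateway EIP, taken from a documentation-reserved range.
const natGatewayAllowList = ["203.0.113.10/32"];
console.log(evaluateRequest("203.0.113.10", natGatewayAllowList));
console.log(evaluateRequest("198.51.100.7", natGatewayAllowList));
```

Using a `/32` range pins the rule to exactly one address, which matches the idea of allowing only the NAT gateway’s EIP.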
Now when a user navigates to the page from outside the VPC they get an error:
But if you navigate from within the VPC the validation against the firewall happens completely transparently to both your application and users.
Step 2: Bubble Wrapping The API
Adding authentication to an API is a widely solved problem, and whilst I have not modelled any authentication in the demo app, we will imagine the user logs in with SSO or something 🤷‍♂️.
So the value I am looking at here is in moving the cakeAPI away from where unknown users can talk to it; an additional bonus is that all interactions with the API are taken off the public internet infrastructure. The value in hiding the API away is clear, but the value in taking the API interactions off the public internet infrastructure is a more complex question that will depend entirely upon the context and nature of any given API. I’m not going to consider this here. What can be unambiguously said, however, is that the exposed surface for security incidents will be reduced by moving this into the VPC only.
There is good support for private APIs with API Gateway (henceforth APIG); it’s a pretty standard pattern to run internal microservice APIs in this way. One gripe I do hold is that it is still not possible to use a custom domain name for a private APIG endpoint. Just in building this test project, the unnecessarily tight coupling caused by an unwieldy generated APIG endpoint of the format `https://{rest-api-id}-{vpce-id}.execute-api.{region}.amazonaws.com/{stage}` meant I had to keep manually updating the cakeAPI resource URL as the api-id changed 😡, although this is just as much a feature of how AWS SAM deploys here. Minor gripes aside though, the AWS documentation for this is extremely succinct and robust, making the conversion from a public (edge) API to a private one a relatively stress-free process.
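One way to soften that coupling is to build the URL from its parts in a single place, so that a changed api-id means changing one config value rather than hunting through the app. A minimal sketch, assuming the URL format quoted above; the interface, function names, IDs and the `/cakes` path are all hypothetical.

```typescript
// The pieces of a private APIG endpoint URL, following the format quoted above.
interface PrivateApiEndpoint {
  restApiId: string;
  vpceId: string;
  region: string;
  stage: string;
}

// Build the base URL of the private API from its parts.
function privateApiBaseUrl(endpoint: PrivateApiEndpoint): string {
  const host = `${endpoint.restApiId}-${endpoint.vpceId}.execute-api.${endpoint.region}.amazonaws.com`;
  return `https://${host}/${endpoint.stage}`;
}

// Join a resource path onto the base URL, tolerating a missing leading slash.
function privateApiUrl(endpoint: PrivateApiEndpoint, path: string): string {
  const normalisedPath = path.startsWith("/") ? path : `/${path}`;
  return privateApiBaseUrl(endpoint) + normalisedPath;
}

// Hypothetical values; only restApiId should need to change after a redeploy.
const cakeApi: PrivateApiEndpoint = {
  restApiId: "abc123defg",
  vpceId: "vpce-0123456789abcdef0",
  region: "eu-west-1",
  stage: "Prod",
};

console.log(privateApiUrl(cakeApi, "/cakes"));
```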
Conceptually, we need to add a VPC endpoint, which will allow traffic to be routed from within the VPC to the host VPC of the API Gateway service without going out to the public internet. When a private APIG is deployed it can only be accessed in this manner, as it is not exposed to the internet. All of the private APIGs that we use in the VPC share the same VPC endpoint, so for this to work we also need to set a resource policy on the APIG to allow access from a specific source VPC endpoint.
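As a rough illustration of what such a resource policy looks like, the sketch below builds one from a list of VPC endpoint IDs. It follows the deny-unless-from-this-endpoint shape shown in the AWS documentation for private APIs; it is not necessarily identical to what any particular tool generates, and the function name and endpoint ID are hypothetical.

```typescript
// One statement of an API Gateway resource policy, narrowed to what is needed here.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Principal: "*";
  Action: "execute-api:Invoke";
  Resource: string[];
  Condition?: { StringNotEquals: { "aws:SourceVpce": string[] } };
}

interface ResourcePolicy {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Build a policy that denies any invoke not arriving via one of the given
// VPC endpoints, then allows whatever is left.
function buildPrivateApiPolicy(allowedVpcEndpointIds: string[]): ResourcePolicy {
  if (allowedVpcEndpointIds.length === 0) {
    throw new Error("At least one VPC endpoint ID is required");
  }
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Deny",
        Principal: "*",
        Action: "execute-api:Invoke",
        Resource: ["execute-api:/*"],
        Condition: {
          StringNotEquals: { "aws:SourceVpce": [...allowedVpcEndpointIds] },
        },
      },
      {
        Effect: "Allow",
        Principal: "*",
        Action: "execute-api:Invoke",
        Resource: ["execute-api:/*"],
      },
    ],
  };
}

// Hypothetical endpoint ID, printed as the JSON that would be attached to the API.
console.log(JSON.stringify(buildPrivateApiPolicy(["vpce-0123456789abcdef0"]), null, 2));
```

The explicit deny matters because a deny always wins over an allow in policy evaluation, so a request arriving through any other endpoint is rejected regardless of the second statement.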
If you like visuals, here is a network diagram:
SAM is pretty handy for this as we can simply pass an array of the VPC endpoints that we want to allow, and the SAM template transform will take care of all the verbose policy composition.
Transforms Into
It’s worth noting that accessing APIG through the VPC endpoint does impose a soft limit on the throughput that can be sent to APIG: the interface endpoint has a soft limit of 10 Gbps sustained and 40 Gbps burst. That’s probably more than enough for almost all applications, though.
APIG calls on the Lambda functions used to implement the API. Like the APIG service, these are not running within the cakeInc VPC; by default Lambdas run in an AWS-operated VPC. This means that they will not be able to access resources that are only available within your VPC unless you choose to explicitly attach your Lambdas to it. However, unless this is explicitly required, it really is best avoided. Some Googling found this blog, “To VPC or not to VPC”, which nicely explains the tradeoffs involved in Lambda VPC placement.
Using it
Any user accessing the app through the VPC will be hard pressed to notice any discernible change in the application; it’s just as simple to use as before, and all of the changes happen transparently to the user. However, in reality the assumption that their VPN is up and running is herculean, and this is basically guaranteed to cause some level of aggravation at some point when the VPN inevitably falls over.
More importantly, this is all assuming that the network is secure, which is 2000s-era thinking that has long since been debunked. Zero trust network policies are all the rage these days…
So at this point I was looking for ways to simplify access for the end user whilst still keeping the application within the VPC. I watched a webinar from CloudFlare a few weeks ago in which they discussed their Access product, and I was instantly sold. This product allows a user to authenticate with CloudFlare via a web portal, and CloudFlare then exposes to them the internal application that is being tunnelled out of their corporate network, with CloudFlare doing all the necessary stitching. This fits under the zero trust networking ideology and, importantly, allows you to increase the security of your applications without imposing friction-inducing hurdles on the end user or additional operational management on your IT department.
For this, CloudFlare needs to be able to access the application, which is achieved by running two tunnels: an API tunnel and a web tunnel. Based on the successful authentication of the user, it then serves the application. If you like diagrams, here is a diagram showing user access to the private API and CloudFront.
The diagram above shows the Argo tunnel appliance running on ECS (Elastic Container Service (Proprietary Kubernetes 😅)). I wanted to keep the application serverless whilst running persistent services, and ECS allowed me to do this with the least amount of work. I created Docker images for running each of the tunnels, set these running as a “tunnels service” and let the ECS platform worry about all the finer points of running services. This way I kept a lid on the mounting complexity involved in running the application.
On the downside, the CloudFlare Access product was extremely limiting in a number of ways and a bit of a pain to work with. Firstly, it seems that you have to allocate the root of your domain to CloudFlare’s DNS, and I had no interest in migrating away from Route53 (AWS DNS) because of its integration with AWS services. This means that I had to use a fresh domain for this rather than just a subdomain, visible in the screenshots as “ro5635.co.uk”. The Argo tunnel product is also not very customisable: calling CloudFront directly from the tunnel results in CloudFront errors (probably due to forwarding headers), and the Argo tunnel cannot be configured to fix this, so it was necessary to go through yet another proxy! Originally I had a separate layer for this with HAProxy, but I later swapped this out for a simple Node script in the same container as the tunnel to reduce the complexity of the system overall.
An additional tunnel is going to be needed for every separate API that your app calls on, which for some apps may just make this solution far too unwieldy. Further to this, the configuration of the CloudFlare side of things is all done through the web console, which presents its own challenges.
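To give a feel for what the header-fixing part of such a proxy script might do, here is a hedged TypeScript sketch of just that step. Which headers actually upset CloudFront is a guess on my part, so the dropped prefixes, the function name and the example values are all assumptions rather than the script from this project.

```typescript
// Header names mapped to values, similar in shape to Node's incoming headers.
type HeaderMap = Record<string, string | string[] | undefined>;

// Assumption: forwarding headers added along the way are the ones causing trouble.
const DROPPED_PREFIXES = ["x-forwarded-", "cf-"];

// Return a copy of the headers with the suspect ones removed and the Host
// header pointed at the hostname the origin is configured to answer for.
function rewriteHeadersForOrigin(incoming: HeaderMap, originHost: string): HeaderMap {
  const outgoing: HeaderMap = {};
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase();
    if (lower === "host") continue; // replaced below
    if (DROPPED_PREFIXES.some((prefix) => lower.startsWith(prefix))) continue;
    outgoing[lower] = value;
  }
  outgoing["host"] = originHost;
  return outgoing;
}

// Example request headers as they might arrive from the tunnel (made-up values).
const fromTunnel: HeaderMap = {
  Host: "app.ro5635.co.uk",
  "X-Forwarded-For": "198.51.100.7",
  "CF-Connecting-IP": "198.51.100.7",
  Accept: "text/html",
};

console.log(rewriteHeadersForOrigin(fromTunnel, "app-test.internal.robertcurran.uk"));
```

A real proxy would then pass the rewritten headers on with the upstream request; keeping the rewrite as a pure function makes it easy to adjust the dropped list once the offending header is identified.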
What drew me to do this project in the first place was watching a webinar from CloudFlare on their Teams/Access/Gateway product; from that I really wanted to see a solution where an app could be tucked away inside a VPC and then be accessed through some painless authentication with a web portal alone. CloudFlare Access has really caught my attention; as it stands it’s still a bit too much of a pain to use, but I wonder whether, if it were more tightly paired with their Gateway DNS control product, it may one day be able to do some pretty awesome magic…🤞
Just fantasizing here, but I’d love it to do some sort of service discovery and allow me to whitelist the internal services that I want it to expose, and then use its control over the users’ DNS to rewrite the resource URLs as necessary. 🤔
Their product page sells the dream well:
An alternative approach to getting users access to the app is to spin up AWS WorkSpaces (the AWS Desktop as a Service product) and attach them to the VPC, with a minor edit to add the WorkSpaces security group to the VPC endpoint security group.
I am a big fan of WorkSpaces. Employees of Cake Incorporated could simply connect to the WorkSpace from their personal devices and bang, have access to all the internal apps. No need to connect to, provision and manage a VPN! The WorkSpaces network interfaces are created within the private subnet of the Cake Incorporated VPC. I don’t want to derail this blog into a WorkSpaces blog, but suffice to say I’m very excited by the potential of WorkSpaces and think deploying apps in this way holds particular value for that sort of setup.
Which brings me to my original aims for this project:
- No public access to static web assets
- API should not be reachable on the public Internet
- Should not noticeably increase the hassle of the deployment process
- Should not increase the hassle in using the application
- No significant running cost increases
- No significant changes to the application codebase
Whilst I feel all of these have been met, I think there is still some way to go for this to become a dependable plug-and-play solution. All of the infrastructure is managed as CloudFormation stacks and there is no big stand-out monthly cost rise (assuming Cake Incorporated already had a VPN and VPC in place). As for changes to the existing application, only the API URLs needed to be updated. Here is the GitHub repo that I used to build this.
I really do hold out a lot of hope that soon there is going to be a much improved set of options for exposing VPC apps, and I’m going to be interested to see where CloudFlare’s Access goes. For today, the time I have to throw at this has very much elapsed. But I feel like I will be revisiting this again soon, just hopefully not during a pandemic next time 😂.