Three and a half ways with Lambda and VPC.

A few tricks for mitigating cold starts in VPC.

thomas michael wallace
DevicePilot
4 min read · Apr 29, 2018


These days we’re a serverless company. But we weren’t always. And like many migrants from the stability of EC2 and containers to the new world that is Lambda, we braved the rocky seas of needing to access resources in a VPC.

One of the first things any developer investigating Lambda will discover is that there are such things as cold starts; an extra latency when you’ve caught AWS off guard and they needed to spin up another instance of your function. If your function lives in a VPC, however, these cold starts can be positively arctic (think five to ten seconds).

But you’ve decided to go serverless, and you’ve learnt enough from Netscape not to just burn everything to the ground and start again, and so you need a way to mitigate the damage until you’ve kicked the VPC habit.

Approach One: Put Up.

It might seem a bit flippant, but under a lot of typical load patterns a cold start is very much the exception, not the rule. And sure, if you’re Amazon then an extra 100ms might well cost you 1% of sales, but for a lot of web applications your average user is going to forgive the odd action that seems sluggish.

For a lot of our VPC-hosted functions we found we could just hide any exceptional loading times in the UI. As long as something was happening on the screen (from a spinning wheel to an ‘I’m working on it’ button), the application still felt reactive enough for our users.
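As a rough illustration, here is a minimal browser-side sketch of that idea: toggle a loading indicator for the lifetime of the request, so a cold start just looks like a longer spin. The element id and the /api/login endpoint are invented for this example, not part of our actual application.

```typescript
// Show "something is happening" for the whole request, however long it takes.
async function login(credentials: { user: string; password: string }): Promise<void> {
  const spinner = document.getElementById("spinner")!; // hypothetical spinner element
  spinner.hidden = false; // "I'm working on it"
  try {
    const res = await fetch("/api/login", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(credentials),
    });
    if (!res.ok) throw new Error(`login failed: ${res.status}`);
  } finally {
    spinner.hidden = true; // hide it whether the call took 50ms or 5 seconds
  }
}
```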

Approach Two: Make a Proxy Bet.

Cold starts are a numbers game. You lose if the number of ‘warm’ lambda instances swimming in your VPC pool is less than the instantaneous concurrency required. To improve your chances of winning, you need to maintain a healthy population.

AWS kills off lambda instances for a few reasons. The most common is that they haven’t been used for a while (anecdotally this is around 45 minutes, however the real number is kept in a safe in AWS headquarters that only Jeff has the code to). Therefore the more you can keep active at any one time, the faster life will be.

The problem is, the more lambdas you have that need access to a VPC, the more concurrency you have to keep warm. You might want to make sure 100 people can log in at any one time, but keeping 100 instances of every function warm just in case will be hard.

You can improve your chances by routing all your VPC resource requests through a single “proxy” lambda. Now, instead of needing enough warm VPC-hosted instances of every function, you “only” need enough instances of the proxy for the whole of your application.
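A hedged sketch of that pattern, using the aws-sdk from a function that sits outside the VPC: only one VPC-attached function (here called “vpc-proxy”, a name made up for this example, along with the payload shape) ever touches the database.

```typescript
import * as AWS from "aws-sdk";

const lambda = new AWS.Lambda();

// Call the single VPC-attached "vpc-proxy" function and wait for its answer,
// instead of attaching this function to the VPC itself.
export async function queryThroughProxy(sql: string, params: unknown[]): Promise<unknown> {
  const res = await lambda
    .invoke({
      FunctionName: "vpc-proxy",
      InvocationType: "RequestResponse", // synchronous: wait for the result
      Payload: JSON.stringify({ action: "query", sql, params }),
    })
    .promise();
  return JSON.parse(res.Payload as string);
}
```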

Although we found it more effort than it was worth, a proxy also gives you an advantage if you’re inclined to use the ‘ping’ trick (where every minute or so you send a concurrent set of quick, short-circuited invocations in an attempt to beg AWS not to kill your instances), as sketched below. However, no matter what you do, every good lambda goes to heaven eventually.
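For what it’s worth, here is a rough sketch of that ping trick under the same invented names as above. A scheduled function fires a handful of concurrent no-op invocations at the proxy, and the proxy spots the marker payload and returns immediately; the { warmup: true } marker is purely our own convention, not an AWS feature.

```typescript
import * as AWS from "aws-sdk";

const lambda = new AWS.Lambda();
const POOL_SIZE = 5; // how many instances we hope to keep warm

// Run on a schedule (e.g. a CloudWatch Events rule every few minutes).
// Firing the invocations concurrently is what forces separate instances.
export async function pinger(): Promise<void> {
  const pings = Array.from({ length: POOL_SIZE }, () =>
    lambda
      .invoke({
        FunctionName: "vpc-proxy",
        InvocationType: "RequestResponse",
        Payload: JSON.stringify({ warmup: true }),
      })
      .promise()
  );
  await Promise.all(pings);
}

// Inside the proxy: short-circuit the warm-up ping before doing any real work.
export async function proxyHandler(event: { warmup?: boolean }): Promise<unknown> {
  if (event.warmup) return { warmed: true };
  // ... real VPC work would go here ...
  return {};
}
```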

Approach Three: Get Asynchronous

Once you’ve hidden everything you can in the UI, and played your chances with a proxy, you might also consider just how synchronous you need to be about things. Much like a cat in a box, a cold-start latency only really exists if you’re watching. If you’re processing a Kinesis stream, or responding to an SNS call, then, frankly, who cares if every once in a while you take a few seconds rather than milliseconds?

For many updating operations, it can be enough to have your lambda call into the proxy with the ‘Event’ invocation type. This invokes the lambda asynchronously, allowing you to return the expected outcome without hanging around to make sure you’re right. Of course, this won’t always be suitable, but you’d be surprised how often it is!
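A minimal sketch of that asynchronous variant, again against the invented “vpc-proxy” function from earlier: the only change is the invocation type, which tells Lambda to queue the call and hand control straight back.

```typescript
import * as AWS from "aws-sdk";

const lambda = new AWS.Lambda();

// Fire-and-forget: hand the update to the proxy and report success now,
// rather than waiting out a possible cold start.
export async function updateThroughProxy(record: object): Promise<void> {
  await lambda
    .invoke({
      FunctionName: "vpc-proxy",
      InvocationType: "Event", // asynchronous: queued and returns immediately
      Payload: JSON.stringify({ action: "update", record }),
    })
    .promise();
}
```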

Unapproach Four: Open Up

If you can’t go over it, you’ll have to go through it. The fact is, nothing in AWS needs to be in a VPC. It’s just amazingly good practice for a lot of their ‘instance-based’ offerings. If you want to, you can make your ElastiCache instance available to the world wide web; taking it, and your lambda, out of the VPC altogether.

We never did this. Typically I live by the maxim that AWS can afford people much better at my job than I am, and therefore I shouldn’t assume that I can somehow defend my resources better than they can without considerable effort. But if you think you can do it safely, then I guess that’s a solution.

While I would always recommend iterating towards a point where you no longer need a VPC-gated service in your life, with a few tricks it is more than possible to make it through a migration period without feeling too much pain.
