Maestr was adopting the latest and greatest in serverless technology, and had recently switched some microservices over from solid GCE instances to Google Cloud Functions.
Sadly, one problem cropped up: they were starting to exceed their connection quota, which looked exactly like this as soon as they deployed their functions:
GCF Connection quota
Most likely, your GCF code will be doing a lot of external RPC calls: querying APIs, fetching proxy resources, and so on. Each outbound request is a connection, and there’s a quota on those connections. So if each of your functions makes 10 RPC calls and you’re running a million instances, that is 10 million connections, and quota might get in your way.
Optimizing GCF connections
A little-known feature of GCF is that connections can be reused between invocations of a cloud function. This is not the default behavior, but used properly it means that if you’re fetching from the same host, you don’t pay the overhead of opening that connection again, since it’s already open.
The code below was being used by Maestr to fetch an asset from one of their databases:
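As a rough sketch of that kind of code (not Maestr's actual listing), a function that fetches an asset over plain HTTP might look like this. The handler name `fetchAsset` and the URL `http://example.com/asset` are placeholders, and the `(req, res)` signature assumes an HTTP-triggered function.

```javascript
const http = require('http');

// Hypothetical HTTP-triggered function; the URL below is a placeholder.
function fetchAsset(req, res) {
  // No agent is passed, so the request uses http.globalAgent. On Node.js
  // releases before 19, that agent does not keep sockets alive by default,
  // so every invocation opens a new connection.
  http
    .get('http://example.com/asset', (upstreamRes) => {
      let body = '';
      upstreamRes.setEncoding('utf8');
      upstreamRes.on('data', (chunk) => { body += chunk; });
      upstreamRes.on('end', () => res.status(200).send(body));
    })
    .on('error', (err) => res.status(500).send(`Error: ${err.message}`));
}

exports.fetchAsset = fetchAsset;
```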
Although this code is pretty standard, the HTTP request has no persistence, which means that it opens a new connection on every function invocation. We can fix this and maintain persistent connections by using a custom HTTP agent with the keep-alive option:
(Note: The same approach works for HTTPS; just use https.Agent instead of http.Agent.)
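A minimal sketch of the keep-alive approach, under the same assumptions as before: `fetchAsset`, `example.com`, and `/asset` are placeholders, not names from Maestr's code. The important part is that the agent is created once in global scope, outside the handler.

```javascript
const http = require('http');

// Created once, in global scope, so the same agent (and its open sockets)
// is shared by every invocation that this function instance handles.
const agent = new http.Agent({ keepAlive: true });

// Hypothetical HTTP-triggered function; the host and path are placeholders.
function fetchAsset(req, res) {
  const upstream = http.request(
    { host: 'example.com', port: 80, path: '/asset', method: 'GET', agent },
    (upstreamRes) => {
      let body = '';
      upstreamRes.setEncoding('utf8');
      // Reading the response to the end lets the socket go back to the pool.
      upstreamRes.on('data', (chunk) => { body += chunk; });
      upstreamRes.on('end', () => res.status(200).send(body));
    }
  );
  upstream.on('error', (err) => res.status(500).send(`Error: ${err.message}`));
  upstream.end();
}

exports.fetchAsset = fetchAsset;
```

If the agent were created inside the handler instead, each invocation would get a fresh agent with an empty socket pool, and nothing would be reused.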
To see how this influenced the number of socket connections, we ran a simple test with Artillery (fetching the given URL at 30 QPS for 300 seconds) and watched the connection quota on the Cloud Functions API quota page in Cloud Console.
This also has the side effect of decreasing DNS resolves, which, btw, is also something you have to provision quota for.
Also, Google APIs!
In my opinion, one of the strongest things about GCF is that it comes with out-of-the-box support & helper libraries for the rest of Google Cloud. However, because those APIs generally make RPC calls through socket connections, we can run into the same problem with connection quota.
It turns out that Maestr was running into this problem as well: their GCF was querying Cloud Pub/Sub (though this approach also works for other client libraries, for example the Natural Language API client or Cloud Spanner).
Below is an example of the code that they were using, which ends up opening one connection and making two DNS queries per invocation:
However, by moving the pubsub variable to global scope and declaring it const, we can remove the unnecessary connections and DNS queries.
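The difference between the two placements can be sketched without any dependencies. `FakeClient` below is a hypothetical stand-in for a Google Cloud client (in a real function it would be something like the `PubSub` class from `@google-cloud/pubsub`), and its `send` method and the topic name are invented for illustration. It only counts how many times a client is constructed, which stands in for how many times a connection would be set up.

```javascript
// Hypothetical stand-in for a Google Cloud client library class.
// It counts constructions so the effect of each placement is visible.
class FakeClient {
  constructor() {
    FakeClient.created += 1;
  }
  // Invented method; a real client has its own publish API.
  send(topic, data) {
    return `sent ${data.length} bytes to ${topic}`;
  }
}
FakeClient.created = 0;

// Before: the client is built inside the handler, once per invocation.
function publishPerInvocation(message) {
  const client = new FakeClient();
  return client.send('my-topic', Buffer.from(message));
}

// After: the client is built once in global scope and reused by every
// invocation that this function instance handles.
const sharedClient = new FakeClient();
function publishShared(message) {
  return sharedClient.send('my-topic', Buffer.from(message));
}
```

Calling `publishShared` any number of times leaves `FakeClient.created` unchanged, while every call to `publishPerInvocation` increments it by one.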
The fix is in!
While the main benefit of optimizing network connections lies in reducing the likelihood of running out of connection or DNS quota, a byproduct is that it also reduces the CPU time spent establishing new connections on each function call.
With these simple changes, Maestr was able to get an 8x reduction in connection & DNS usage, removing that pesky quota problem altogether!