Going Serverless: How we handled file upload, processing & notification
Early this year at PitchPrint, we got registered as a US company (we're based in Cape Town) through Stripe's Atlas program, and with it came $10,000 in AWS credits. At the time, we had a monolithic NodeJs app on DigitalOcean serving a few thousand stores, with millions of uploaded photos in S3. We decided against simply moving that lump over to AWS and chose instead to rebuild the app afresh on AWS using Serverless.
After some study, it was clear: Serverless is exciting, but it requires a new way of thinking about how systems are built. The platform is mostly event driven, and as such, bundling up an Express app into a single function flatly defeats the purpose. Breaking things down into simple, nimble functions presented some interesting challenges, and we'd like to share how we solved them over a few posts. In this one, we'll cover client file uploads.
One of the features of our app allows users to upload their photos and use them to customize a print product on our clients' stores. So people upload all sorts of “images”: TIFF, PSD, CorelDRAW, you name it. And these images need to be converted to browser-friendly versions.
So here’s how the file upload process works in the monolith app.
The client selects a file in their browser, and the file is uploaded to the server for processing (while the browser waits, keeping the connection open). The server then uploads the file to S3 and has it processed by an external service (we use cloudconvert.com for file processing). Once the file is processed and saved to a different S3 path, a response is sent back to the waiting browser with details of where to get the thumbnails, the image dimensions, the type, and so on. Pretty simple, right?
Well, cloning that in a Serverless environment is highly inefficient. You don't keep connections open while a slew of functions processes a job; a waiting function burns money 💵. Serverless systems are event based and need to be decoupled as much as is reasonable. Let each function perform its singular job and, when done, dispatch an event. Like the Hollywood mantra: we'll call you when we're ready, and don't you ever poll us.
So we broke the processes down into three and tied them using simple events:
File upload, File processing & browser notification.
But wait, how do you call a damn browser out in the wild when the job is processed? Well, we'll get there right at the end 🤞.
In our monolith, images are uploaded to the NodeJs app, which then moves the image to an S3 bucket. From there, it is processed by the external service. You see right there, that’s inefficiency.
Since the browser can upload files directly to an S3 bucket, we decided to skip the middleman (but don't put your IAM security credentials in a browser, never ever!). At first, finding the right documentation was pretty hard, but once we got hold of it, it was darn easy. We simply invoke a Lambda to generate an upload authorization signature with a time-based expiry, append the signature to the form attributes of the uploader, and send it all to S3. Before S3 accepts the upload, it validates the signature; if it checks out, the file is stored, otherwise the upload fails.
It's way more efficient than uploading files through Lambda, whose disk space is limited to 512 MB. And even when file size isn't an issue, it's a waste of compute cost compared to uploading straight to S3, which is built for exactly that. In a recent presentation, Chris Munns (Senior Developer Advocate for Serverless at AWS) made a salient point: “Use functions to TRANSFORM, not TRANSPORT data.”
I think statelessness should also extend to the file system; we currently have over 130 Lambdas, and none of them ever touches the disk or requires fs. We strictly use streams and buffers.
We set up event triggers on S3 to invoke a function based on the file path and extension (type). This function sends the job to cloudconvert for processing, along with the credentials needed to save the processed file to a different S3 path when finished. Then it shuts down; it does not wait for the job to complete. Functions should do one thing, and do that one thing well. When the processed image is saved, S3 dispatches a different event (based on the path the job was saved to), which invokes another function to notify the browser.
This is the trickiest part. In our initial versions, we implemented a polling mechanism: the browser calls a function at intervals with the file ID. This function checks for the image in the S3 complete-path and, if found, fetches its metadata (the conversion details) and returns it to the browser as the job details. If nothing is found, it returns null until called again. So from the moment the file upload completed, we'd be polling at an interval. A sheer and complete waste of resources 😖
So how do we notify the browser when the file is processed and ready? Well, here comes the sweetness: MQTT piggybacked on WebSockets.
When the app initializes in the browser, we create a messaging pipe to AWS IoT Core using an MQTT client over WebSockets. With this socket in place, a Lambda can dispatch an event to any browser running our app by publishing a message to the app's unique channel. It's super sweet and simple 😊
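The notifying Lambda can be sketched like this. The topic naming is our own hypothetical convention, and `iotPublish` stands in for the AWS IoT data-plane publish call (injected here so the sketch is testable; in the real function it would be an SDK client):

```javascript
// Build the message for the channel a given app instance subscribed to
function notificationFor(appId, fileId, meta) {
  return {
    topic: `pitchprint/${appId}/uploads`,         // hypothetical channel scheme
    payload: JSON.stringify({ fileId, ...meta })  // thumbnails, dimensions, type, etc.
  };
}

// Publish the job details to the browser's channel and return the topic used
async function notifyBrowser(iotPublish, appId, fileId, meta) {
  const { topic, payload } = notificationFor(appId, fileId, meta);
  await iotPublish({ topic, qos: 0, payload });
  return topic;
}
```

Because the browser is already holding the MQTT-over-WebSocket connection, the message arrives the moment the processed file lands, and nothing anywhere is polling.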
Of course, you can also achieve this using external services like Pusher, PubNub, etc., but IoT Core is way cheaper in both connectivity and message cost, though it only supports MQTT, which is pretty limiting.
The journey so far has been interesting. Serverless in general is first about efficiency, and then cost. It forces you to think in new ways and write more efficient, resilient code. You want each function to start as quickly as possible, by trimming dependencies, and to shut down as quickly as possible, by not doing too much in a single function.
Breaking things down also makes it easy to drastically reduce your costs. You get to allocate resources to each function based on how much it actually requires. It forces new thinking about balancing cost against latency. For background processes like sending mail, you want to allocate cheaper, leaner resources because millisecond latency isn't an issue. On the other hand, for user-facing endpoints, you want to dole out more RAM and CPU for a faster response time, while still not over-provisioning.
Decoupling things makes it easier to dissect even a large swath of resources and pinpoint deficiencies or issues. It also builds in resilience, because you can quickly fix that one function without bringing down or redeploying the whole application. You can roll features back at a discrete, granular level.
And most important of all: the ability to scale vastly without a care or worry in the world.