Under-using AWS Kinesis Stream to limit AWS Lambda throughput

At Quri we collect on a daily basis hundreds of thousands of data points through our crowd-sourced mobile application Easy Shift. We ask our Shifters to gather data about products and promotional displays in thousands of stores every day. To ensure the best data quality possible we need to help our users locate the items by showing them the most accurate product picture possible.

Despite numerous efforts, gathering those pictures from our customers has proven to be a lengthy task and sometimes almost an impossible process. Even though some Product Database APIs exist, we found them often outdated at best, incomplete in most cases.

We learnt however a few months ago from our main customers that they were already partnering with a third-party company to maintain a product database with key information and best-in-class product shots (mostly used for online retail websites).

Knowing that, we started a project to partner with this third-party to integrate their data into our systems. We quickly learnt that their API was not an on-demand API but rather replication API. Hence we had to design a system to ingest their data, but most importantly their hundreds of thousands of images in a fast and scalable way.

With our main Ruby application being hosted on a Platform-as-a-Service provider who bills by the hour, it made little sense to download these pictures there to then upload them to the cloud.

Thus, I decided to use a serverless design to download these images into our cloud storage systems. As we already used AWS S3 for such purposes, AWS Lambda was a perfect fit.

The design was pretty simple: when fetching the original dump or the incremental updates, we would store the third party remote image URL and then, asynchronously, we would trigger lambda functions with the AWS API to download the assets and then store them in our S3 buckets.

That seemed optimal, until I started processing the hundreds of thousands images we had recorded the URLs for. I quickly started noticing Lambda Errors, shortly after, I received an email from our 3rd party partner informing us that: “Quri has set up automation to make simultaneous requests for images from multiple virtual machines. Our system can’t handle that, Quri’s API account has been temporarily suspended. We need to limit simultaneous requests to no more than 20.” — duh

Basically, I was killing their servers. It didn’t cross our minds that there could be a non-scalable system on their end.

They were asking us to limit the number of simultaneous connections to 20. I was quite disappointed to have designed something that could in theory scale indefinitely on one side but clearly not on the other end. And honestly, I was a bit clueless as how to limit lambda’s throughput while keeping the API calls asynchronous (remember, we didn’t want to spend that time on our main app, the one that had the URL records).

I started researching AWS products. They have so many that there would for sure be one to suit our needs. First I looked into SQS but it didn’t seem to plug so well with Lambda. But I quickly realized it was possible to hook up Lambda functions as consumers for AWS Kinesis Streams.


Kinesis was built as a queue system that can huge amount of data. In the open source world it could be compared to Kafka. People use it to process massive amounts of logs, for instance.

A Kinesis Stream has shards to balance the workload and each can have a Lambda consumer that can fetch between 1 and 100 records at a time for processing.

I was starting to see how we could control AWS Lambda’s execution.

I changed the design to the following:

Instead of calling the AWS Lambda API when processing each image entry, I now put a new record into an AWS Kinesis Stream for each URL to process. I decided that I only wanted one shard so that the Partition Key I use is always the same. I then changed my AWS Lambda function a little bit so that it can take an array of records as input and then asynchronously download and drop to S3 all these images.

Finally, I set up the Lambda consumer of my unique shard to fetch 20 records at a time, that way I could ensure there would no more than 20 simultaneous connections to our 3rd party provider.

Six months later, this system has been running flawlessly and we haven’t heard from the 3rd party provider ever since.

Finally, there are a few further steps we could make to optimize the design:

  • Instead of using a single shard, we could use 20 of them with consumers that fetch only one record at a time. This would bring more concurrency
  • In case of network error, we could also post faulty messages to a Dead Letter Queue and then replay those. This would make the system more fault tolerant.