
Lessons learned building serverless data pipelines

Before many of the cooler features of AI products can be put into production, you need high-quality, correct data. Pulling data from across the enterprise can be a nontrivial task if it is not given proper consideration. Having built a couple of serverless data pipelines, I thought I’d share gotchas and lessons learned for others interested in building similar solutions.

What makes a good candidate for serverless data pipelining?

Good inputs to serverless data pipelines:

  1. Any message or queueing system (SQS, for example) is a great fit
  2. S3 (S3 file updates can easily trigger Lambdas)
  3. Any API that exposes deltas or change sets can work
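
As a minimal sketch of the S3 case, the handler below pulls the bucket and key out of each record in an S3 notification event. The function names and the `print` placeholder are assumptions for illustration; the event shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`) follows the S3 notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs found in an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # not an S3 record, skip it
        bucket = s3["bucket"]["name"]
        # Keys are URL-encoded in S3 notifications (a space arrives as '+').
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda entry point; the print stands in for real processing."""
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        print(f"would process s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

Keeping the parsing in its own function makes it easy to unit test with a hand-built event, without deploying anything.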

Important Lambda limits:

  1. Lambdas can run for up to 15 minutes
  2. Lambdas could use up to 3008 MB of memory at the time of writing (AWS has since raised this limit to 10,240 MB). You may be able to restructure your job to work on smaller portions of your data set, but don’t go out of your way to do so unless there is some other compelling reason to go serverless.
  3. Every AWS account starts with a pool of 1000 concurrent executions per region. This is a soft limit, so if you expect to scale beyond it, consider requesting an increase before release. Note that you can reserve concurrency for a function to prevent other pipelines or applications in the account from causing you problems; see the AWS documentation on reserved concurrency.
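
One way to live within the 15-minute limit is to stop early and hand back whatever is left, so it can be re-queued for another invocation. The sketch below assumes that approach; `process_within_budget`, `process_item`, and the 30-second safety margin are hypothetical names and values, while `get_remaining_time_in_millis()` is the method the Lambda context object provides.

```python
from collections import deque


def process_within_budget(items, context, process_item, safety_margin_ms=30_000):
    """Process items until the invocation is close to its timeout.

    Returns the items that were NOT processed so the caller can re-queue them.
    """
    pending = deque(items)
    while pending:
        # Stop while there is still time left to re-queue the remainder.
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break
        process_item(pending[0])
        pending.popleft()  # only drop the item once it was processed
    return list(pending)
```

Inside a handler, this would be called with the real `context` argument, and the returned leftovers could be sent back to a queue.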

Compare costs to traditional methods

Save copies

Make sure you’re reporting errors correctly

Logging

Don’t forget about users’ right to be forgotten (GDPR / CCPA)

Take advantage of SQS

Beware of ordering / duplication
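
Standard SQS queues deliver messages at least once and only in best-effort order, so a consumer may see the same update twice or see an older update after a newer one. A minimal sketch of one defence, assuming each update carries an `entity_id` and a monotonically increasing `version` (both hypothetical fields), and using a plain dict as a stand-in for a durable store:

```python
def apply_if_newer(store, update):
    """Apply an update only if it is newer than what is already stored.

    Returns True if the update was applied, False if it was a duplicate
    or arrived out of order.
    """
    entity_id = update["entity_id"]
    current = store.get(entity_id)
    if current is not None and current["version"] >= update["version"]:
        return False  # duplicate or stale message, safe to ignore
    store[entity_id] = update
    return True
```

Because replaying a message leaves the store unchanged, retries and redeliveries become harmless. In a real pipeline the comparison and the write would need to happen atomically in the database.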

DLQs are your friend
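
For SQS, a dead-letter queue is attached to a source queue through its `RedrivePolicy` attribute, a JSON string naming the DLQ's ARN and how many receives are allowed before a message is moved. The helper below is a sketch that only builds that attribute; the function name and the default of 5 receives are assumptions.

```python
import json


def redrive_policy_attributes(dlq_arn, max_receive_count=5):
    """Build the SQS queue attributes that route failed messages to a DLQ."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # SQS expects the policy as a JSON-encoded string, not a nested dict.
    return {"RedrivePolicy": json.dumps(policy)}
```

With boto3, the result could be passed as `Attributes=` to the SQS client's `set_queue_attributes` call along with the source queue's `QueueUrl`.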

Back pressure mitigation

For example: data transformation Lambda -> SQS -> delivery Lambda -> database

By setting a concurrency limit on your delivery Lambda, you can easily throttle requests and smooth out your demand on downstream systems, assuming the incurred delay is acceptable.
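
A concurrency limit like this can be set with the Lambda API's `put_function_concurrency` call. The sketch below wraps it in a small helper; the helper name and the example function name are hypothetical, and the client is passed in so the code can be exercised without AWS credentials.

```python
def throttle_delivery_lambda(lambda_client, function_name, max_concurrent):
    """Cap how many copies of a function may run at the same time."""
    if max_concurrent < 0:
        raise ValueError("max_concurrent must not be negative")
    # Reserved concurrency acts as both a guarantee and an upper bound.
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_concurrent,
    )
```

In practice this might be called as `throttle_delivery_lambda(boto3.client("lambda"), "delivery-lambda", 5)`, which would allow at most five concurrent writers against the database while the queue absorbs the backlog.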

Thanks!

If you enjoyed the article, let me know by commenting below, and follow me on Twitter to stay updated on all my latest content.

Hi, I’m Nathan. Let’s dive deep into how to code and take on challenging projects. http://nathanpointer.com/
