Building a Serverless Data Pipeline

How I designed my OpenWhisk-based app to work with data efficiently and cost-effectively

The project: A Stack Overflow dashboard

The data pipeline we built with serverless functions feeds a web dashboard that our team uses to monitor Stack Overflow for questions on the tags we’re interested in answering.

Why choose serverless?

Designing the serverless application

  • Modular, testable functions: Break the application down by drawing a flow diagram of its different steps. As a starting point, turning each step into a serverless function can work well. Thinking about the boundaries between components also makes the application easier to test.
  • Single-purpose components: Think of a Unix command-line program: it does one thing, and one thing only. It is probably brilliant at doing that thing, but if I want to format the output or write it to a file, I need a different utility for that. The same principle applies here: give each component a single purpose.
  • Data hygiene: Data hygiene is a bit like kitchen hygiene. Not every utensil in the kitchen should be used in every dish that goes to the table. Equally, not every component in our application should be calling every datastore. Think about which components need which data, and how to provide that access with as few contact points between components and datastores as possible.
The serverless architecture of the soingest data ingest tool written for OpenWhisk. We’ll review each JS file in this article.
  • The collector action makes an API call to Stack Overflow, checks that the response contains sane data, and then returns it.
  • The invoker loops over the data fed in from the collector and programmatically invokes a new sequence (qhandler) for each question it finds.
  • First, the storer determines whether to insert this record into the CouchDB database or update an existing record. It also adds some metadata to the data before passing it along…
  • …to the notifier, which looks at what has happened so far and, if the question is new, sends the webhook to trigger the notification.
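An OpenWhisk sequence passes the JSON object returned by each action to the next action as its `params`. That makes it possible to test single-purpose actions together locally by chaining plain functions. The sketch below mimics a qhandler-style storer-then-notifier sequence; `runSequence`, the stub bodies, and the `existingRevision` field are hypothetical simplifications, not the soingest code.

```javascript
// Hypothetical local harness: like an OpenWhisk sequence, it feeds the object
// returned by one action into the next action as its params.
async function runSequence(actions, params) {
  let result = params;
  for (const action of actions) {
    // Each step may return a plain object or a Promise; await handles both.
    result = await action(result);
  }
  return result;
}

// Stand-in for the storer: marks the record as new or updated.
// Assumption: a hypothetical `existingRevision` field says whether a DB record exists.
function storer(params) {
  const isNew = !params.existingRevision;
  return { ...params, stored: true, isNew };
}

// Stand-in for the notifier: decides whether a webhook would be sent.
function notifier(params) {
  return { question_id: params.question_id, notify: params.isNew };
}

const qhandler = [storer, notifier];

runSequence(qhandler, { question_id: 123, title: "Example question" })
  .then((out) => console.log(out)); // { question_id: 123, notify: true }
```

Each stub stays a single-purpose function, so it can also be unit-tested on its own before being wired into the real sequence.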

Stand by for code

Setting up triggers and rules

Making a serverless API call
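The collector is described above as an action that calls the Stack Overflow API, checks what comes back, and returns it. Below is a minimal sketch of such an action, not the soingest source. It assumes Node 18+ (for the global `fetch`), the public Stack Exchange API v2.3 `/questions` endpoint, and a hypothetical `tag` parameter. An OpenWhisk action must return a JSON object rather than a bare array, so the questions are wrapped in an object; on bad data the sketch returns an object with an `error` field, which OpenWhisk records as a failed activation.

```javascript
// Sketch of a "collector" action. Assumptions: Stack Exchange API v2.3,
// Node 18+ (global fetch and URL), and a hypothetical `tag` parameter.
const API_BASE = "https://api.stackexchange.com/2.3/questions";

// Build the query URL for the newest Stack Overflow questions on one tag.
function buildQuestionsUrl(tag) {
  const url = new URL(API_BASE);
  url.searchParams.set("order", "desc");
  url.searchParams.set("sort", "creation");
  url.searchParams.set("tagged", tag);
  url.searchParams.set("site", "stackoverflow");
  return url.toString();
}

// Sanity-check the decoded JSON: Stack Exchange wraps results in an
// object whose `items` field is an array.
function checkResponse(body) {
  if (!body || !Array.isArray(body.items)) {
    // A top-level "error" field makes OpenWhisk mark the activation as failed.
    return { error: "Unexpected response from Stack Overflow API" };
  }
  // Actions must return an object, so wrap the array.
  return { questions: body.items };
}

// OpenWhisk calls main(params); returning a Promise makes the action async.
async function main(params) {
  const res = await fetch(buildQuestionsUrl(params.tag));
  if (!res.ok) {
    return { error: `Stack Overflow API returned HTTP ${res.status}` };
  }
  return checkResponse(await res.json());
}
```

Keeping URL building and the sanity check as plain functions means they can be unit-tested without deploying anything or hitting the network.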

Invoking a sequence from code
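The invoker starts one qhandler sequence per question. Here is a sketch under some assumptions: the `openwhisk` npm client is available (it ships with the standard Node.js runtime images), the incoming params carry a `questions` array, and the sequence is deployed under the name `qhandler`. The `invokeForEach` helper is a hypothetical name. Sequences are invoked by name exactly like actions, and `blocking: false` fires each one without waiting for it to finish.

```javascript
// Sketch of an "invoker" action. Assumption: the sequence is named "qhandler".
const SEQUENCE_NAME = "qhandler";

// Fire one non-blocking invocation of the sequence per question.
// `ow` is any object exposing actions.invoke(), so tests can pass a fake.
async function invokeForEach(ow, questions) {
  const calls = questions.map((question) =>
    ow.actions.invoke({
      name: SEQUENCE_NAME, // sequences are invoked exactly like actions
      blocking: false,     // don't wait for each sequence to finish
      params: question,    // the question's fields become the sequence's params
    })
  );
  const activations = await Promise.all(calls);
  // Non-blocking invokes resolve with a record containing the activation ID.
  return {
    invoked: activations.length,
    activationIds: activations.map((a) => a.activationId),
  };
}

function main(params) {
  // Loaded lazily so invokeForEach can be tested without the package installed.
  const openwhisk = require("openwhisk");
  const ow = openwhisk(); // reads API host and key from the action's environment
  return invokeForEach(ow, params.questions || []);
}
```

Passing the client in as an argument keeps the fan-out logic testable with a fake object that implements `actions.invoke`.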

Store data in the database
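The storer decides between inserting and updating, then adds metadata. Here is one way to do that with CouchDB's HTTP API, as a sketch rather than the original code. Assumptions: Node 18+ `fetch`; hypothetical parameters `couchUrl`, `dbName`, `couchUser` and `couchPassword`, bound only to this action in keeping with the data-hygiene principle; the Stack Overflow `question_id` reused as the CouchDB document `_id`; and hypothetical `firstSeen`/`lastSeen` metadata fields. CouchDB needs the current `_rev` to update a document, which is why the action looks the document up first.

```javascript
// Decide between insert and update, and add metadata.
// `existing` is the current CouchDB document, or null if none was found.
function prepareRecord(question, existing, now = new Date()) {
  const doc = {
    ...question,
    _id: String(question.question_id), // reuse the question ID as the doc ID
    lastSeen: now.toISOString(),        // hypothetical metadata field
  };
  if (existing) {
    doc._rev = existing._rev;           // CouchDB requires the current _rev to update
    doc.firstSeen = existing.firstSeen;
  } else {
    doc.firstSeen = doc.lastSeen;
  }
  return { doc, isNew: !existing };
}

async function main(params) {
  const auth = "Basic " +
    Buffer.from(`${params.couchUser}:${params.couchPassword}`).toString("base64");
  const headers = { Authorization: auth, "Content-Type": "application/json" };
  const docUrl = `${params.couchUrl}/${params.dbName}/${encodeURIComponent(params.question_id)}`;

  // Look the document up: 404 means insert, 200 means update.
  const found = await fetch(docUrl, { headers });
  if (!found.ok && found.status !== 404) {
    return { error: `CouchDB lookup failed with HTTP ${found.status}` };
  }
  const existing = found.ok ? await found.json() : null;

  // Strip the database settings so credentials don't flow on to the notifier.
  const { couchUser, couchPassword, couchUrl, dbName, ...question } = params;
  const { doc, isNew } = prepareRecord(question, existing);

  const saved = await fetch(docUrl, { method: "PUT", headers, body: JSON.stringify(doc) });
  if (!saved.ok) {
    return { error: `CouchDB write failed with HTTP ${saved.status}` };
  }
  // Pass the record plus metadata to the next action, without the old _rev.
  const { _rev, ...record } = doc;
  return { ...record, isNew };
}
```

If two activations race on the same question, CouchDB rejects the stale write with HTTP 409 Conflict; this sketch reports that as an error rather than retrying.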

Sending webhooks from a serverless action
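The notifier only sends a webhook when the question is new. A sketch, assuming Node 18+ `fetch`, an `isNew` flag set earlier in the sequence, and a hypothetical `webhookUrl` parameter. The `{ text }` payload shape is also an assumption (it is the shape Slack incoming webhooks accept); `title` and `link` are fields of the Stack Exchange question object.

```javascript
// Build the webhook body for a question (payload shape is an assumption).
function buildPayload(question) {
  return { text: `New question: ${question.title} ${question.link}` };
}

async function main(params) {
  if (!params.isNew) {
    // Updated questions pass through quietly; only new ones trigger a notification.
    return { notified: false, question_id: params.question_id };
  }
  const res = await fetch(params.webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPayload(params)),
  });
  if (!res.ok) {
    // A top-level "error" field marks the activation as failed.
    return { error: `Webhook returned HTTP ${res.status}` };
  }
  return { notified: true, question_id: params.question_id };
}
```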

Data and serverless

Lorna Mitchell

Polyglot programmer, technology addict, open source fanatic and incurable blogger (see http://lornajane.net)