Building an Event-Driven Architecture Using AWS Kinesis, Lambda and Node.js
For the last year we have been building a microservice that processes incoming events, stores them, and delivers the aggregated information to users via an API or WebSockets.
This article explains its technical architecture, the design decisions we made, and how we made it robust and scalable.
What does event-driven mean?
In the modern world, systems no longer run in isolation. They need to talk to other systems and extract data from several of them (some legacy, some new).
One approach is to poll external systems constantly. This is usually the default because it requires little or no change to the external system.
However, polling has drawbacks and can harm other systems if implemented carelessly:
- You do not get real-time updates. Freshness is limited by the poll interval.
- You put unnecessary load on the other system.
- Most polls return no new data, so you burn CPU and pay for running machines without any benefit.
So what is the alternative? Event driven!
The solution is to tweak or change external systems so they will let you know when they have any new data for you.
The majority of our event producers were already hosted in AWS, and we had a requirement for low latency. Hence we chose Amazon Kinesis to ingest and process incoming events.
You also need to consider scalability: how will you handle triple the number of events you receive today? Amazon Kinesis is scalable. You can scale a stream up or down by adding or removing shards.
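To make the shard arithmetic concrete, here is a rough sizing sketch. It assumes the commonly documented per-shard write limits of 1 MB/s and 1,000 records/s (check the current Kinesis quotas before relying on them). The helper `estimateShards` and the traffic figures are made up for illustration and are not taken from our system.

```javascript
// Rough shard sizing for a Kinesis stream (illustrative sketch).
// Assumed per-shard write limits: 1 MB/s and 1,000 records/s.
const SHARD_MAX_BYTES_PER_SEC = 1024 * 1024;
const SHARD_MAX_RECORDS_PER_SEC = 1000;

// estimateShards is a hypothetical helper, not part of any AWS SDK.
function estimateShards(recordsPerSec, avgRecordBytes) {
  const byThroughput = (recordsPerSec * avgRecordBytes) / SHARD_MAX_BYTES_PER_SEC;
  const byRecordCount = recordsPerSec / SHARD_MAX_RECORDS_PER_SEC;
  // Whichever limit is hit first decides; a stream has at least one shard.
  return Math.max(1, Math.ceil(Math.max(byThroughput, byRecordCount)));
}

const today = estimateShards(1500, 512);       // made-up current peak
const tripled = estimateShards(1500 * 3, 512); // the same peak, tripled
console.log({ today, tripled });
```

With small records like these, the record-count limit rather than the byte limit is what drives the number of shards.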
We used Lambda functions (i.e. executable pieces of code) to process events by linking them to the Kinesis stream. As soon as an event appears on the stream, the Lambda function is executed.
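A minimal sketch of such a function is shown here. Lambda hands Kinesis records to the handler in batches, with each payload base64-encoded in `record.kinesis.data`. The JSON event shape and the `processEvent` placeholder are assumptions for illustration, not our production code.

```javascript
// Minimal Kinesis-triggered Lambda handler (sketch).
// Each record's payload arrives base64-encoded in record.kinesis.data.
// We assume here that producers send JSON.
function decodeRecord(record) {
  const json = Buffer.from(record.kinesis.data, 'base64').toString('utf8');
  return JSON.parse(json);
}

// processEvent is a hypothetical placeholder for storing/aggregating an event.
async function processEvent(event) {
  console.log('processing', event.type);
}

async function handler(event) {
  const decoded = event.Records.map(decodeRecord);
  for (const item of decoded) {
    // Process sequentially so events keep the order they had in the batch.
    await processEvent(item);
  }
  return { processed: decoded.length };
}

exports.handler = handler;
```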
Lambda is billed per request and for the compute time used, so it is inexpensive at this scale. If you allocate 128 MB of memory to your function, execute it 30 million times in one month, and it runs for 200 ms each time, you pay around $10.
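The arithmetic behind that figure can be sketched as follows. The rates and free-tier allowances in the constants are assumptions based on historical list prices, so check the current Lambda pricing page before budgeting. Under these assumed numbers the total comes to roughly $11.63, which is in line with "around $10".

```javascript
// Worked estimate for the example above. The rates and free-tier allowances
// below are assumptions (historical list prices); check current pricing.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_MILLION_REQUESTS = 0.20;
const FREE_GB_SECONDS = 400000;
const FREE_REQUESTS = 1000000;

// monthlyLambdaCost is a hypothetical helper written for this article.
function monthlyLambdaCost(invocations, durationMs, memoryMb) {
  // Compute is billed in GB-seconds: duration multiplied by allocated memory.
  const gbSeconds = invocations * (durationMs / 1000) * (memoryMb / 1024);
  const computeCost = Math.max(0, gbSeconds - FREE_GB_SECONDS) * PRICE_PER_GB_SECOND;
  const requestCost =
    (Math.max(0, invocations - FREE_REQUESTS) / 1e6) * PRICE_PER_MILLION_REQUESTS;
  return { gbSeconds, computeCost, requestCost, total: computeCost + requestCost };
}

const cost = monthlyLambdaCost(30e6, 200, 128);
console.log(cost.total.toFixed(2));
```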
How does Lambda actually work? There must be a server somewhere?
Lambda is serverless by design: you do not know about the server, nor do you care about it. You do not need to install security updates or worry about it running out of disk space or RAM.
There must be some drawbacks! Yes…
Since your function does not have a dedicated server, it is prone to cold starts. If you have not had events for 5 minutes, the first event will take longer to process.
We found that it can take up to 3–4 seconds to process the first event from Kinesis when using Node.js. With heavier runtimes (Java or C#) this time can be considerably worse.
Given the cold-start time and the inability to use C# at the time (support for C# was added much later), we decided to use Node.js as the primary language for writing Lambda functions.
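One general way to keep the cold-start cost from repeating on every event is to do expensive setup at module scope, outside the handler. This is a common Lambda practice rather than something specific to our system. In the sketch below, `createClient` is a hypothetical stand-in for opening a database connection or similar.

```javascript
// Work done at module scope runs once per container (during the cold start)
// and is reused by every warm invocation that follows.
let initCount = 0;

// createClient stands in for expensive setup such as opening a database
// connection; it is a hypothetical placeholder.
function createClient() {
  initCount += 1;
  return { save: async (item) => item };
}

const client = createClient(); // paid once, at cold start

async function handler(event) {
  // Only per-event work happens inside the handler.
  await Promise.all(event.Records.map((record) => client.save(record)));
  return { initCount };
}
```

However many times the handler runs in the same container, the client is created only once.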
How can other systems send events?
Amazon provides SDKs for almost every language you can imagine to send events to Kinesis. Alternatively, you can make an HTTP POST request directly, but you will need to sign the request in an Amazon-specific way (AWS Signature Version 4).
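On the producer side, sending an event comes down to a PutRecord call with a stream name, a partition key and the data. The sketch below only builds those parameters so that it runs without the SDK installed. The stream name, the event shape and the `buildPutRecordParams` helper are made up for illustration.

```javascript
// Builds the parameters for a Kinesis PutRecord call (sketch).
// StreamName, PartitionKey and Data are the PutRecord request fields;
// the stream name and event shape below are made up for illustration.
function buildPutRecordParams(streamName, event) {
  return {
    StreamName: streamName,
    // Records with the same partition key land on the same shard,
    // which preserves their relative order.
    PartitionKey: String(event.entityId),
    Data: Buffer.from(JSON.stringify(event)),
  };
}

const params = buildPutRecordParams('example-events', {
  entityId: 42,
  type: 'order.updated',
  payload: { status: 'shipped' },
});

// With the AWS SDK for JavaScript v3 installed, these params would be sent as:
//   await new KinesisClient({}).send(new PutRecordCommand(params));
console.log(params.PartitionKey, params.Data.length);
```

The SDK takes care of signing the request, which is the main reason to prefer it over a hand-built HTTP POST.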
A compromise: the hybrid solution
Not all systems can send events containing all the information. In that case, the external system can send an event indicating only that something has changed (without the actual data). Upon receiving such an event, you call its API to get the actual data.
This solution is still better than constantly polling an external system. We found that it works for low traffic or infrequent updates.
However, for high-traffic events and frequent updates, the best solution is to embed all the details inside the event.
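The two variants can be handled by one consumer, sketched here. The event shape (`entityId`, `data`) and the `fetchDetails` callback are hypothetical. In a real system `fetchDetails` would be an HTTP call to the external system's API.

```javascript
// Hybrid pattern (sketch): "fat" events carry their data, "thin" events only
// announce a change and the consumer fetches the details from the source API.
// The event shape (entityId, data) and fetchDetails are hypothetical.
async function resolveEvent(event, fetchDetails) {
  if (event.data !== undefined) {
    return event.data; // everything we need is already in the event
  }
  return fetchDetails(event.entityId); // one targeted call instead of polling
}

// Stand-in for an HTTP call to the external system's API.
const fakeApi = async (id) => ({ id, status: 'fetched-from-api' });

resolveEvent({ entityId: 7 }, fakeApi).then(console.log);
resolveEvent({ entityId: 8, data: { id: 8, status: 'embedded' } }, fakeApi).then(console.log);
```

The thin-event path costs one extra API round trip per event, which is why it suits low traffic better than high traffic.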
Serving data via API and WebSockets
On one side we have events entering the system. On the other side we need to inform users about any live updates that these events contain.
Since we need to provide live updates, we chose WebSockets to update the user's browser when changes are available.
The API servers provide a stateless API as well as WebSocket updates.
Since we had already chosen Node.js for the Lambda functions, it made sense to choose Node.js for the API and WebSockets too. We picked Express and Socket.IO.
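As a rough idea of the push side, here is a minimal sketch of broadcasting an update to subscribed browsers. It relies on Socket.IO's `io.to(room).emit(...)` call. The room naming, the event name and the `publishUpdate` helper are assumptions, and a small fake stands in for the server so the sketch runs without Socket.IO installed.

```javascript
// Pushes an aggregated update to every browser subscribed to an entity (sketch).
// `io` is expected to be a Socket.IO server instance; io.to(room).emit(...)
// broadcasts to all sockets in a room. The room and event names are made up.
function publishUpdate(io, entityId, update) {
  io.to(`entity:${entityId}`).emit('entity:updated', update);
}

// A tiny fake with the same shape, so the sketch runs without Socket.IO.
const sent = [];
const fakeIo = {
  to: (room) => ({
    emit: (name, payload) => sent.push({ room, name, payload }),
  }),
};

publishUpdate(fakeIo, 42, { status: 'shipped' });
console.log(sent);
```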
Does that even work?
Yes. The system we implemented processes around 200,000 events per day, and we expect it to grow to 1 million events per day.
We implemented an event-driven microservice that integrates with multiple external services (14 in total). It consumes events from all of them using the patterns described above.
I would recommend the Kinesis + Lambda pair for any event-driven or streaming architecture.
- How did we choose a database that can handle a million writes and reads per day with low latency?
- How do we scale up and down based on demand and the number of users?
- How is data served back to customers using Socket.IO?