How to store time series data with DynamoDB and Node.js

Abhinav Dhasmana
Published in mfine-technology · May 15, 2019

Time series data in its most basic form is a sequence of data points measuring a particular thing over a period of time. For more information, you can read the first part of this post What the heck is time-series data?

Our use case is pretty simple. We have users who want to store their medical observables, either manually or through a device. At its most basic, think of saving heart rate, blood pressure, or blood sugar several times a day for a few hundred thousand daily active users (with the possibility of much larger traffic). Our query pattern is: find these observables for a particular user for the last X days.

Here are some of the options we considered:

SQL databases (Postgres): This was the first choice that came to mind because of our query pattern. We could create a composite index on user_id and observable and sort by the createdAt field. We decided not to use it for the following reasons:

  • We were concerned about write throughput
  • We would not be able to shard this table as users and observables continue to climb
  • We rarely need data beyond a certain point in time, so there is no point in keeping it in the active database

Time series databases: We did not evaluate any of these, as our requirements were met by the tools we already use.

NoSQL databases (MongoDB/DynamoDB): These solve the problem for us. Even though MongoDB has more features than DynamoDB at this point in time, we decided to use DynamoDB since the rest of our stack is on AWS.

Implementation:

Our input data looks like this:

data: {
  userId: 10,
  observable: 'heartRate',
  time: <currentTime>, // time since epoch
  metaData: {
    // other data
  }
}

When using DynamoDB, it is really important to Choose the Right DynamoDB Partition Key. We decided to use a composite primary key, with <userId>_<observable> as the partition key and time as the sort key. So our table looks like this:

TableName: timeseries_2019-05
+---------------+---------------+---------+---------------+
| Partition Key | Sort Key      | User Id | Meta          |
+---------------+---------------+---------+---------------+
| 10_heartRate  | 1557807567857 | 10      | { other data} |
| ..            | ..            | ..      | ..            |
| ..            | ..            | ..      | ..            |
+---------------+---------------+---------+---------------+

We decided to create a separate table for each time slot. For now, it is kept at a month; if our traffic grows further, this can be done per day as well. So each month a new table is created with high read and write capacity. Since the previous month's data is used infrequently, we reduce its Read Capacity Units (RCU) and Write Capacity Units (WCU), as sketched below. You can read more about these here.
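As a rough illustration, dialling down the capacity of an older table is a single UpdateTable call. This is only a sketch: the table name and capacity numbers are placeholders, and the region/endpoint is assumed to come from your AWS config.

const AWS = require('aws-sdk');

// Low-level client; region/endpoint are assumed to come from your AWS config
const dynamodb = new AWS.DynamoDB();

const params = {
  TableName: 'timeseries_2019-04', // previous month's table (placeholder name)
  ProvisionedThroughput: {
    ReadCapacityUnits: 1,          // example values only
    WriteCapacityUnits: 1
  }
};

dynamodb.updateTable(params, (err, data) => {
  if (err) console.error('Unable to update capacity', err);
  else console.log('Updated capacity for', data.TableDescription.TableName);
});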

Now let's walk through the rest of the code. The first step is to create the table:
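Here is a minimal sketch of the table creation with the AWS SDK for Node.js. The attribute names (pk for the <userId>_<observable> partition key, time for the sort key) and the capacity numbers are assumptions, used consistently in the remaining snippets.

const AWS = require('aws-sdk');

const dynamodb = new AWS.DynamoDB(); // region/endpoint picked up from AWS config

const params = {
  TableName: 'timeseries_2019-05',
  KeySchema: [
    { AttributeName: 'pk', KeyType: 'HASH' },   // <userId>_<observable>
    { AttributeName: 'time', KeyType: 'RANGE' } // time since epoch
  ],
  AttributeDefinitions: [
    { AttributeName: 'pk', AttributeType: 'S' },
    { AttributeName: 'time', AttributeType: 'N' }
  ],
  ProvisionedThroughput: {
    ReadCapacityUnits: 10,  // example values; tune to your traffic
    WriteCapacityUnits: 10
  }
};

dynamodb.createTable(params, (err, data) => {
  if (err) console.error('Unable to create table', err);
  else console.log('Created table', data.TableDescription.TableName);
});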

Next, let's insert some dummy data:
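A sketch of a single write with the DocumentClient, following the input shape above (the pk attribute name and the sample metaData value are assumptions):

const AWS = require('aws-sdk');

const docClient = new AWS.DynamoDB.DocumentClient();

const userId = 10;
const observable = 'heartRate';

const params = {
  TableName: 'timeseries_2019-05',
  Item: {
    pk: `${userId}_${observable}`, // partition key: <userId>_<observable>
    time: Date.now(),              // sort key: time since epoch (ms)
    userId,
    metaData: { value: 72 }        // other data
  }
};

docClient.put(params, (err) => {
  if (err) console.error('Unable to insert item', err);
  else console.log('Inserted reading for', params.Item.pk);
});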

Next, let's read this data back:
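A sketch of the query pattern "this observable, for this user, for the last X days", again assuming the key names used above:

const AWS = require('aws-sdk');

const docClient = new AWS.DynamoDB.DocumentClient();

const days = 7;
const since = Date.now() - days * 24 * 60 * 60 * 1000;

const params = {
  TableName: 'timeseries_2019-05',
  KeyConditionExpression: 'pk = :pk AND #t >= :since',
  ExpressionAttributeNames: { '#t': 'time' }, // "time" is a DynamoDB reserved word
  ExpressionAttributeValues: {
    ':pk': '10_heartRate',
    ':since': since
  },
  ScanIndexForward: false // newest readings first
};

docClient.query(params, (err, data) => {
  if (err) console.error('Query failed', err);
  else console.log('Readings:', data.Items);
});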

There is one more problem to solve: how do we create a new table every time a month is about to change? Lambda functions to the rescue. The table-creation snippet above can be reused; you might have to set the endpoint explicitly in case it cannot be picked up from your AWS config. I prefer using serverless for Lambda functions, but you can use any other mechanism.
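A sketch of what such a scheduled handler could look like, reusing the table-creation params from above. The tableNameFor helper is hypothetical; with serverless you would wire the handler to a cron-style schedule event.

const AWS = require('aws-sdk');

const dynamodb = new AWS.DynamoDB();

// Hypothetical helper: builds timeseries_YYYY-MM for a given date
const tableNameFor = (date) =>
  `timeseries_${date.getFullYear()}-${String(date.getMonth() + 1).padStart(2, '0')}`;

exports.handler = async () => {
  const nextMonth = new Date();
  nextMonth.setMonth(nextMonth.getMonth() + 1);

  const params = {
    TableName: tableNameFor(nextMonth),
    KeySchema: [
      { AttributeName: 'pk', KeyType: 'HASH' },
      { AttributeName: 'time', KeyType: 'RANGE' }
    ],
    AttributeDefinitions: [
      { AttributeName: 'pk', AttributeType: 'S' },
      { AttributeName: 'time', AttributeType: 'N' }
    ],
    ProvisionedThroughput: { ReadCapacityUnits: 10, WriteCapacityUnits: 10 }
  };

  await dynamodb.createTable(params).promise();
  return { created: params.TableName };
};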

Limitations: At the time of writing (May 2019), DynamoDB does not provide aggregate functions (unlike MongoDB). So if your query involves aggregation (for example, average resting heart rate), your options are to either load the data in memory and compute the aggregate yourself, or run an offline script and store the computed result somewhere.
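For example, an average computed in memory over the items returned by the query above might look like this (assuming each item's metaData.value holds the reading, as in the insert sketch):

// Average over items already fetched with the query shown earlier
const averageReading = (items) =>
  items.length === 0
    ? null
    : items.reduce((sum, item) => sum + item.metaData.value, 0) / items.length;

// e.g. inside the query callback: console.log(averageReading(data.Items));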

Big thanks to Raghava Dss for this idea and help with the design and implementation.

If you found this story interesting or useful, please support it by clapping 👏.

