Amazon Timestream — 101
An overview and example with Amazon’s serverless time series database.
What is it?
Timestream became generally available on AWS on Sep 30, 2020. It’s a scalable, serverless time series database, ideal for IoT and operational applications. With the ability to store and analyse trillions of events per day whilst still being faster and cheaper than traditional relational databases, it will definitely be of interest to anyone looking to optimise the way they store their time series data.
Aside from the high performance, here are some other features that I liked whilst trying out Timestream:
- Lifecycle management — Recent data is stored in fast in-memory storage, older data is stored in magnetic storage; data is transferred automatically based on your preferences
- Encryption — Data is encrypted at rest and in transit, and Timestream integrates seamlessly with AWS KMS for data stored in magnetic storage
- Serverless — As your application grows, Timestream grows, no provisioning or managing servers, thank goodness 🤩
Getting Stuck In
Let’s go through an example of how to add records to Timestream. We’re going to use the OpenWeatherMap API to retrieve weather data and then load it into Timestream using the AWS SDK for Python (Boto3). Of course, we’ll be safely storing our API key and constant environment variables in AWS Secrets Manager 🔐.
First off, I’m going to create our secret store, along with a database and a table in Timestream. Note that you can insert data using the SDK (this is recommended, as it handles endpoint discovery for you), or you can implement the Endpoint Discovery Pattern yourself against the Timestream API.
All the usual IAM rules apply: make sure your IAM user/IAM role has permissions to get values from Secrets Manager and access Timestream.
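A minimal sketch of that setup with Boto3 might look like the following. The secret, database, and table names are assumptions for illustration, as is the region (Timestream is only available in certain regions), and the retention values are just examples:

```python
import json


def retention_properties(memory_hours: int, magnetic_days: int) -> dict:
    """Retention settings: how long data stays in the fast memory store
    before being moved to cheaper magnetic storage."""
    return {
        "MemoryStoreRetentionPeriodInHours": memory_hours,
        "MagneticStoreRetentionPeriodInDays": magnetic_days,
    }


def create_resources():
    import boto3  # imported here so the helper above has no AWS dependency

    # Store the OpenWeatherMap API key in Secrets Manager (assumed names).
    secrets = boto3.client("secretsmanager", region_name="eu-west-1")
    secrets.create_secret(
        Name="openweathermap-api-key",
        SecretString=json.dumps({"OPENWEATHERMAP_API_KEY": "your-api-key"}),
    )

    # Create the Timestream database and table.
    ts = boto3.client("timestream-write", region_name="eu-west-1")
    ts.create_database(DatabaseName="weather")
    ts.create_table(
        DatabaseName="weather",
        TableName="london",
        RetentionProperties=retention_properties(memory_hours=24, magnetic_days=365),
    )


if __name__ == "__main__":
    create_resources()
```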
Python Script
In the script below, I’m using PycURL to GET data from the OpenWeatherMap API (current temperature, “feels like” temperature, humidity and pressure) for London. I’ve then defined a class “AddRecord” and inserted the data into Timestream. Let’s run it to see what happens.
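The original script isn’t reproduced here, but a minimal sketch of the same flow might look like this. Note the differences from the original: it uses the standard library’s urllib rather than PycURL to keep the sketch dependency-free, and plain functions rather than an “AddRecord” class; the secret/database/table names and region are assumptions:

```python
import json
import urllib.request

# Assumed names and region for illustration; substitute your own.
SECRET_NAME = "openweathermap-api-key"
DATABASE_NAME = "weather"
TABLE_NAME = "london"
REGION = "eu-west-1"  # Timestream is only available in certain regions


def build_records(weather: dict) -> list:
    """Turn an OpenWeatherMap response into Timestream records:
    one row per weather attribute, all sharing the same timestamp."""
    # Epoch seconds -> milliseconds, passed as a string.
    timestamp_ms = str(weather["dt"] * 1000)
    measures = {
        "CurrentTemp": weather["main"]["temp"],
        "TempFeelsLike": weather["main"]["feels_like"],
        "Humidity": weather["main"]["humidity"],
        "Pressure": weather["main"]["pressure"],
    }
    return [
        {
            "Dimensions": [{"Name": "city", "Value": "London"}],
            "MeasureName": name,
            "MeasureValue": str(value),
            "MeasureValueType": "DOUBLE",
            "Time": timestamp_ms,
        }
        for name, value in measures.items()
    ]


def main():
    import boto3  # imported here so build_records stays dependency-free

    # Fetch the API key from Secrets Manager.
    secrets = boto3.client("secretsmanager", region_name=REGION)
    api_key = json.loads(
        secrets.get_secret_value(SecretId=SECRET_NAME)["SecretString"]
    )["OPENWEATHERMAP_API_KEY"]

    # GET the current weather for London.
    url = (
        "https://api.openweathermap.org/data/2.5/weather"
        f"?q=London&units=metric&appid={api_key}"
    )
    with urllib.request.urlopen(url) as resp:
        weather = json.loads(resp.read())

    # Write one record per attribute to Timestream.
    boto3.client("timestream-write", region_name=REGION).write_records(
        DatabaseName=DATABASE_NAME,
        TableName=TABLE_NAME,
        Records=build_records(weather),
    )


if __name__ == "__main__":
    main()
```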
Jumping over to Timestream in the AWS Management Console:
The script has entered the data with each attribute of the weather (i.e. Humidity, TempFeelsLike, CurrentTemp and Pressure) taking up a row each. You can imagine this applying to metrics from an Amazon EC2 instance for example, with cpu_utilization and memory_utilization being attributes of a certain instance in the London (eu-west-2) region.
You can use SQL to query data in Timestream to retrieve time series data from one or more tables. Hook this up with a JDBC connection to connect Timestream to your BI tools and other applications such as SQL Workbench!
Useful Tips
As with all software products, there are some quirks. Here are a few things I faced and what you can do to avoid them 🎉
RejectedRecordsException
You might get the following error message:
Error: An error occurred (RejectedRecordsException) when calling the WriteRecords operation: One or more records have been rejected. See RejectedRecords for details.
The twist here is that no “RejectedRecords” body is returned with the error, which is why I added the following to my script:
print(err.response)
print(err.response["Error"])
This will print the full error response, giving you a meaningful message to work from.
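As a sketch, here is how that might be wrapped up: catch the ClientError, print the response, and pull the per-record reasons out of it (the wrapper and helper names here are my own, not from the original script):

```python
def rejected_reasons(error_response: dict) -> list:
    """Pull per-record rejection reasons out of a RejectedRecordsException
    response dict (the part the exception message itself omits)."""
    return [
        f"record {r.get('RecordIndex')}: {r.get('Reason')}"
        for r in error_response.get("RejectedRecords", [])
    ]


def write_with_diagnostics(client, database: str, table: str, records: list):
    from botocore.exceptions import ClientError  # lazy: keeps the helper dependency-free

    try:
        client.write_records(DatabaseName=database, TableName=table, Records=records)
    except ClientError as err:
        print(err.response)
        print(err.response["Error"])
        for reason in rejected_reasons(err.response):
            print(reason)
        raise
```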
Invalid time for record
You might get the following error message:
Error: An error occurred (ValidationException) when calling the WriteRecords operation: Invalid time for record.
This may simply happen because your timestamp is not of data type string (Timestream expects the Time value as a string). But it can also be down to the time range of the memory store. The memory store has a data retention period, after which data is transferred to magnetic storage.
Suppose you set your “Memory store retention” to 1 hour when creating your table; any record you try to add that is older than 1 hour will be rejected. Increase this value, and make sure the timestamps you are adding are later than the current time minus the “Memory store retention” period.
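In other words, a record is only accepted if its timestamp falls inside the memory store window. A small helper (my own, for illustration) makes the rule concrete:

```python
import time
from typing import Optional


def earliest_accepted_ms(retention_hours: int, now_ms: Optional[int] = None) -> int:
    """Oldest record timestamp (in milliseconds) the memory store will
    accept, given its retention period in hours."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - retention_hours * 60 * 60 * 1000


# Usage: sanity-check a record before sending it, e.g.
# if record_time_ms < earliest_accepted_ms(retention_hours=1): skip or fix it
```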
The record timestamp is outside the time range
This error may occur if you are using a Unix epoch timestamp in seconds instead of one in milliseconds. In my script above, I got around this by multiplying by 1000; see the return values underneath to get a better idea.
timestamp_epoch = dict['dt']
--> returns 1604870976 (Timestream fails to add record)

timestamp_millisecond = timestamp_epoch*1000
--> returns 1604870976000 (Timestream adds record successfully)
In Summary
Of course, there are other time series databases on the market (InfluxDB springs to mind), but Timestream, like other AWS services, is fully managed by AWS, which means less maintenance overhead — ideal for small teams with big data needs!
I hope you’ve found this overview and example of Timestream useful, the possibilities are endless, get creating! 🏗
If you want to read more on the subject, here are some books I’ve found useful:
🤖 Mastering Machine Learning on AWS
Partha is a Senior DevOps Engineer at Perlego. Want to join the team? View our open positions on our careers page 👀