Time Series Data use cases and storage (Time Series Database)

Pranav Kulshrestha
4 min read · Jan 30, 2023
Fig-1: Monitoring parameters of server performance wrt time

What is time-series data? Why is it collected? How is it collected and stored? These may be some of the questions on your mind if you have ever come across time-series data or worked with a time-series database like Apache Druid, OpenTSDB or InfluxDB. I will try to answer these questions in this blog.

What is Time-Series Data?

Time-series data is not a new concept. It has been used for a long time, with an early use by sailors to build ocean charts (e.g. the Maury charts). Today it is used by stock market investors, environmental scientists and many other industries, because so much time-series data is now collected. It is becoming useful in more and more situations as more technologies produce this type of data.

So, what is time-series data?

Time-series data is the measurement and observation of events as a function of the time at which they occurred. In other words, it is a set of repeated measurements of a parameter, each recorded together with the time at which it was made.
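To make the definition concrete, here is a minimal Python sketch of one time series as a list of (timestamp, value) pairs. The `DataPoint` type and the CPU readings are hypothetical, invented purely for illustration.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class DataPoint(NamedTuple):
    """One observation: when it was measured and what was measured."""
    timestamp: datetime
    value: float


# Hypothetical CPU-utilisation readings (percent), one per minute.
cpu_usage = [
    DataPoint(datetime(2023, 1, 30, 10, 0, tzinfo=timezone.utc), 41.5),
    DataPoint(datetime(2023, 1, 30, 10, 1, tzinfo=timezone.utc), 43.0),
    DataPoint(datetime(2023, 1, 30, 10, 2, tzinfo=timezone.utc), 57.25),
]

# Because the points are ordered by time, consecutive readings can be
# compared to see how the parameter changes from one measurement to the next.
changes = [
    later.value - earlier.value
    for earlier, later in zip(cpu_usage, cpu_usage[1:])
]
print(changes)
```

The key point is that every value carries its own timestamp; the ordering by time is what makes trends and fluctuations visible.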

Where do we see time-series data getting used?

Fig-2: Plot of price at which Sensex ended after every two months in a 4 year period (2009–2013)

We all know about stock markets. There are around 16 major stock exchanges, and each can see up to hundreds of quotes per second. They also handle more than 100 million transactions every single day. Every measurement in the stock market is recorded as a function of time, so that the trend and fluctuations of a stock can be seen over a short time frame.

Trading data in a short time frame can give a lot of information to visualize the price and volume fluctuations and help in the prediction of the future movement of a stock. Many institutions, hedge funds and mutual funds use various trading algorithms that use this data to correlate trading behavior to other factors, including global events and sentiment analytics.

Fig-3: Plot of CO2 levels in atmosphere tracked by NASA (2005-Present)

Time-series data can also reveal environmental trends. By making many measurements of a parameter such as ozone level as a function of time, we can determine how the condition of the environment is changing. For example, continuous measurements of greenhouse gas levels (e.g. CO2) in the atmosphere as a function of time give us the horrifying long-term trend of steadily increasing CO2 levels.

Recording the exact time of a particular event, or of a change in a critical parameter, can also help us reduce risk.

Hence, time-series data has many use cases today. But the main question is: how can we store time-series data at scale?

How to store time-series data?

As we have discussed, a time series is a series of values ordered by time. A time-series database (TSDB) is required to store multiple time series. It should also provide queries to retrieve data from one or more time series for a particular time range.
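The two requirements above (store many series, query by time range) can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: `TinyTSDB`, its method names and the series names are hypothetical, and timestamps are assumed to be integer epoch seconds. Real time-series databases add persistence, compression and distribution on top of this idea.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class TinyTSDB:
    """Toy store: one time-sorted list of (timestamp, value) per series."""

    def __init__(self):
        self._series = defaultdict(list)

    def write(self, name, timestamp, value):
        # insort keeps each series sorted by timestamp,
        # even when a point arrives late.
        insort(self._series[name], (timestamp, value))

    def query(self, name, start, end):
        """Return points of one series with start <= timestamp <= end."""
        points = self._series.get(name, [])
        lo = bisect_left(points, (start,))
        hi = bisect_right(points, (end, float("inf")))
        return points[lo:hi]

    def query_many(self, names, start, end):
        """Run the same time-range query over several series."""
        return {name: self.query(name, start, end) for name in names}


db = TinyTSDB()
db.write("cpu.host1", 1675072800, 41.5)
db.write("cpu.host1", 1675072920, 57.25)
db.write("cpu.host1", 1675072860, 43.0)  # arrives out of order
db.write("mem.host1", 1675072800, 71.0)

recent_cpu = db.query("cpu.host1", 1675072860, 1675072920)
both = db.query_many(["cpu.host1", "mem.host1"], 1675072800, 1675072860)
print(recent_cpu)
print(both)
```

Keeping each series sorted by time is the design choice that makes range queries cheap: a binary search finds both ends of the range, and everything in between is returned as one slice.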

To implement a large-scale TSDB, we need to ask ourselves some questions. How many distinct time series are there? What kind of data is it? For how long do we want to store the data?
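These three questions translate directly into a rough capacity estimate. The sketch below is a back-of-the-envelope calculation, not a sizing tool: the function name, the workload numbers and the 16 bytes per point (an 8-byte timestamp plus an 8-byte value, uncompressed) are all assumptions chosen for illustration.

```python
def estimate_storage_bytes(num_series, interval_seconds, retention_days,
                           bytes_per_point=16):
    """Rough uncompressed size: series x points per series x bytes per point."""
    seconds_retained = retention_days * 86_400
    points_per_series = seconds_retained // interval_seconds
    return num_series * points_per_series * bytes_per_point


# Hypothetical workload: 10,000 series, one point every 10 seconds,
# kept for 365 days.
total = estimate_storage_bytes(10_000, 10, 365)
print(f"{total / 1e9:.1f} GB")
```

Under these assumptions the raw data comes to roughly 505 GB per year, before any indexing or replication overhead and before any savings from compression.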

We can think of a relational database (RDBMS) as a solution, but the cost and complexity of relational databases grow very fast as the scale of the time-series data grows. Relational databases can work reasonably well for hundreds of millions of data points, but the stock markets handle a billion trades in just over three months.

Hence, an RDBMS doesn't seem to be a scalable solution: its rates of data ingestion and retrieval do not keep up at this scale.

A NoSQL (non-relational) database seems to be the preferred solution for time-series data because it scales well, is efficient and supports fast queries over a given time range. The rates of data ingestion and retrieval for these databases are also high compared to relational databases.

There are drawbacks, including an increase in the complexity of the application, but the scalability benefit is far more important, especially in the case of time-series data. Some examples of good open-source NoSQL time-series databases are Apache Druid, InfluxDB and OpenTSDB.

Time Series Data and Machine Learning

I want to end this blog by briefly talking about how time-series data can be used with machine-learning algorithms. Businesses are using machine learning to unlock the potential value in their data. For example, machine learning can be used to spot the signs leading up to a failure in time-series data, so that predictive maintenance can be performed before the failure happens.
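As a very rough illustration of "seeing signs" in a time series, the sketch below flags readings that deviate sharply from the recent past. Note that this is a simple rolling-statistics baseline and not a trained machine-learning model; the function name, window size, threshold and vibration readings are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(values, window=5, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations
    away from the mean of the preceding `window` values."""
    history = deque(maxlen=window)
    flagged = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            # Skip windows with no variation to avoid dividing the world
            # into "exactly equal" and "anomalous".
            if sigma > 0 and abs(v - mu) > threshold * sigma:
                flagged.append(i)
        history.append(v)
    return flagged


# Hypothetical vibration readings from a machine; the jump at index 7
# stands in for an early warning sign.
readings = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 0.9, 2.5, 1.0, 1.1]
suspicious = flag_anomalies(readings)
print(suspicious)
```

A real predictive-maintenance pipeline would typically feed features like these, computed over long histories, into a trained model rather than rely on a fixed threshold.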

It can be helpful to look over years of performance data to understand what happened in the past. For some industries, the benefits of long-term time-series databases of sensor data for machine-learning models can be enormous.

Pranav Kulshrestha

Open-Source Contributor, Developer, Around bugs and exceptions most of the time