What is Time Series and Is There a Magic Storage Bullet? — Part 1

by Chris Herrera

There is a lot of talk these days about time series data storage and analysis in the big data world. However, not all time series datasets are created equal, nor are the use cases that involve them.

Below, we will begin outlining some of the key use cases around time series and why it is a focal point for a wide variety of different technologies.

So, What is Time Series?

The world is time series. Yes, that statement is a bit useless and vague, but it is true. Almost everything is captured, processed, and analyzed in time. Every financial transaction you make, every website interaction you have (this is where the high-powered stuff comes in, i.e. clickstream), every watt of power you use in your house, and every little blip and motion that your car makes is stored in time order.

The Nature of Time Series

The data can arrive as slowly or as quickly as the process that generates it. Tracking the growth of Mt. Everest (about 4mm per year) and processing clickstream data from Amazon (about 200TB of data daily) sit at opposite ends of that range, and where you fall on it will vastly affect how you store and process the data.

Of course, the use cases vary widely in how this data is consumed as well. No one at Amazon is likely to track specifically what one user did; the interest is in aggregate analysis, such as how to better optimize site navigation (a nice tutorial on this topic can be found at the Hortonworks site here).

Similarly, the data on how fast Everest is growing, when stored on an exception basis only (meaning a value is recorded only when it has changed by more than the error margin of the GPS device), can probably be analyzed easily in Excel (or Notepad…seriously, it’s not going to be that big).
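To make the "exception basis" idea concrete, here is a minimal sketch of what is often called a deadband filter: a reading is stored only when it differs from the last stored reading by more than some margin. The function name, the sample layout (year, height change in millimetres), and the 10mm margin are all hypothetical choices for illustration, not taken from any real GPS survey.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample layout: (year index, height change in millimetres).
Sample = Tuple[int, float]


def record_on_exception(samples: Iterable[Sample], deadband: float) -> List[Sample]:
    """Keep a sample only if it moved more than `deadband` from the last stored one."""
    stored: List[Sample] = []
    for timestamp, value in samples:
        # Always keep the first sample; after that, keep only real changes.
        if not stored or abs(value - stored[-1][1]) > deadband:
            stored.append((timestamp, value))
    return stored


# Illustrative readings growing 4mm per year, with an assumed 10mm error margin.
readings = [(0, 0.0), (1, 4.0), (2, 8.0), (3, 12.0), (4, 16.0)]
kept = record_on_exception(readings, deadband=10.0)
```

With these made-up numbers, five yearly readings shrink to two stored points, which is why a slow process stored this way stays small.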

Two things are evident from these examples…

1. There is a difference between continuous and discrete analysis, and in how the two are handled.

2. You need some vision of the end use case. I’m not saying it needs to be designed up front, but the way you construct the ingest, data access, and persistence differs depending on whether your data will predominantly be viewed as the result of some batch processing function or is needed in real time via a streaming pipeline.
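As a rough illustration of the second point, the sketch below computes the same statistic (a mean) two ways. The batch version assumes the whole dataset is already persisted and available; the streaming version sees one value at a time and keeps only a running count and total. Both function names are hypothetical, and a real pipeline would of course involve far more than this.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch style: the full dataset is available before processing starts."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated mean as each value arrives."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        # Only the running count and total are kept, never the full history.
        yield total / count


data = [2.0, 4.0, 6.0]
final_answer = batch_mean(data)
running_answers = list(streaming_mean(data))
```

The two agree once the stream has seen every value, but the batch function needs the data stored and readable in bulk, while the streaming function needs it delivered in order with low latency. That difference is what drives the ingest and persistence choices.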

In this brief introduction, I have not even scratched the surface of time series. The goal of this post was to show you that there is not just one “thing” called time series; rather, it is a vast array of domains, use cases, and (in a lot of cases) strong opinions. As we talk more about time series, I will touch on elements of data quality, typical types of analysis, technologies in play, and architectural design patterns.

I hope you enjoyed my initial discussion of time series, what it is and its nature. Please feel free to share this blog on Twitter, LinkedIn, and Facebook, and be sure to keep up with all new content from Hashmap at https://medium.com/hashmapinc.

In Part 2, coming soon, I will discuss Using Time Series Datasets and Issues/Challenges with Time Series Collection and Storage.

Chris Herrera is a Senior Enterprise Architect at Hashmap and works across industries to help accelerate the value associated with big data and connected data platforms. Feel free to tweet @cherrera2001 and connect with him on LinkedIn at linkedin.com/in/cherrera2001.
