Divakar P M
Jan 28, 2022

Dealing with Time-series Data issues

The growth of time-series data and the wider use of forecasting make data quality essential. Poor-quality data leads to failed predictions, which is why data quality is receiving more and more attention.

Most of our time goes into pre-processing and understanding the data before the actual modeling starts, so this is a good stage to check data quality. This article describes the data quality issues commonly found in time-series data.

1. Missing data intervals

Here the time series has a regular interval, but some values are simply not present. For example, data received through ingestion may not contain the continuous stream of events that was expected.

Missing values can be filled using:
* Linear interpolation
* Forward filling
* Backward filling
* Imputation using Mean, Median, or Mode
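A minimal pandas sketch of these options, assuming a daily series indexed by timestamps (the readings and dates are made up for illustration). Reindexing with asfreq first turns the missing days into explicit NaNs so the fill methods have something to act on:

```python
import pandas as pd

# Made-up daily readings; the values for Jan 3 and Jan 4 never arrived.
s = pd.Series(
    [10.0, 12.0, 18.0, 20.0],
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-05", "2022-01-06"]),
)

# Put the series on a regular daily grid so the gaps become explicit NaNs.
regular = s.asfreq("D")

filled = pd.DataFrame({
    "raw": regular,
    "linear": regular.interpolate(method="linear"),  # straight line between neighbours
    "ffill": regular.ffill(),                        # repeat the last observed value
    "bfill": regular.bfill(),                        # pull back the next observed value
    "mean": regular.fillna(regular.mean()),          # mean of the observed values
    "median": regular.fillna(regular.median()),      # median of the observed values
})
print(filled)
```

Linear interpolation suits smoothly varying signals, forward filling suits values that persist until they change (such as a status), and mean or median imputation ignores the order of the series, so it can flatten a trend. For categorical data, regular.mode().iloc[0] gives the most frequent value.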

2. Units of measurement

A sudden change in the unit of measurement will distort predictions and any recommendations generated from them later. Validate the units of measurement during pre-processing.

Ex: Data may arrive in MB instead of the agreed unit, GB.
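One way to guard against this, sketched in pandas under the assumption that each reading arrives with a unit label (the column names, values and conversion table are hypothetical), is to reject unknown units and convert everything to the agreed unit before modeling:

```python
import pandas as pd

# Conversion factors into the agreed unit (GB).
# Assumption: binary units, 1 GB = 1024 MB; use 1000 if the source uses decimal units.
TO_GB = {"GB": 1.0, "MB": 1 / 1024, "TB": 1024.0}

df = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-01", periods=4, freq="D"),
    "value": [2.0, 2.5, 3072.0, 3.5],  # the third reading arrived in MB
    "unit": ["GB", "GB", "MB", "GB"],
})

# Fail loudly on units we have no rule for instead of silently passing them through.
unknown = set(df["unit"]) - set(TO_GB)
if unknown:
    raise ValueError(f"Unexpected units: {sorted(unknown)}")

df["value_gb"] = df["value"] * df["unit"].map(TO_GB)
print(df[["timestamp", "value_gb"]])
```

If the source sends no unit label at all, a sudden jump of roughly 1000x between consecutive readings is a useful hint that the unit has changed.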

3. Data type issues

Validate the data types of important features during data ingestion so that sudden changes are caught early.

Ex: Float values may start arriving as strings, which requires an extra type cast.
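A minimal pandas sketch of such a check, assuming a hypothetical schema that maps each important feature to its expected numeric dtype. It only handles numeric targets, and it reports how many values could not be parsed instead of hiding them:

```python
import pandas as pd

# Hypothetical schema: the dtype each important feature is expected to have.
EXPECTED_DTYPES = {"cpu_usage": "float64"}

def enforce_dtypes(frame: pd.DataFrame, expected: dict) -> pd.DataFrame:
    """Cast columns whose dtype drifted and report values that could not be parsed."""
    out = frame.copy()
    for col, dtype in expected.items():
        if str(out[col].dtype) != dtype:
            # errors="coerce" turns unparseable entries such as "n/a" into NaN.
            out[col] = pd.to_numeric(out[col], errors="coerce").astype(dtype)
            bad = int((out[col].isna() & frame[col].notna()).sum())
            print(f"{col}: cast from {frame[col].dtype} to {dtype}, {bad} unparseable value(s)")
    return out

# Float values that arrived as strings, including one that is not a number.
raw = pd.DataFrame({"cpu_usage": ["12.5", "13.0", "n/a", "14.25"]})
clean = enforce_dtypes(raw, EXPECTED_DTYPES)
print(clean.dtypes)
```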

4. Timestamp format changes

A change in the timestamp format can cause serious issues in time-series forecasting, so convert all timestamps into one standard format during pre-processing.
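A minimal sketch using only the Python standard library: try each known format in turn and normalize the result to ISO 8601 in UTC. The format list is hypothetical and should reflect what your sources actually send, and treating timestamps without an offset as UTC is an assumption:

```python
from datetime import datetime, timezone

# Hypothetical formats observed from upstream sources; extend as needed.
KNOWN_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",  # ISO 8601 with offset, e.g. 2022-01-28T10:15:00+0530
    "%Y-%m-%d %H:%M:%S",    # no offset
    "%d/%m/%Y %H:%M",       # day-first
]

def to_utc_iso(raw: str) -> str:
    """Parse a timestamp in any known format and return it as ISO 8601 in UTC."""
    for fmt in KNOWN_FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
        return dt.astimezone(timezone.utc).isoformat()
    raise ValueError(f"Unrecognised timestamp format: {raw!r}")

for ts in ["2022-01-28T10:15:00+0530", "2022-01-28 10:15:00", "28/01/2022 10:15"]:
    print(ts, "->", to_utc_iso(ts))
```

Keep day-first and month-first formats apart: a string like 01/02/2022 parses under both, so the order of KNOWN_FORMATS would silently decide its meaning. Unknown formats raise an error rather than being guessed.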

5. Out-of-range values

Some features have a fixed valid range; a percentage, for example, cannot be below 0 or above 100. Values outside the range need to be clipped: floored to the minimum or capped at the maximum.
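In pandas this maps onto Series.clip. The sketch below uses made-up percentage readings and counts how many values were out of range before clipping them, so the fix does not silently hide an upstream problem:

```python
import pandas as pd

# A percentage feature with two readings outside the valid range.
pct = pd.Series([12.0, 101.5, -3.0, 87.0, 100.0])
LOWER, UPPER = 0.0, 100.0

# Count the violations before fixing them.
out_of_range = (pct < LOWER) | (pct > UPPER)
print(f"{int(out_of_range.sum())} value(s) outside [{LOWER}, {UPPER}]")

# Floor values below LOWER and cap values above UPPER.
clipped = pct.clip(lower=LOWER, upper=UPPER)
print(clipped.tolist())
```

If many values fall outside the range, clipping can mask a real problem such as a unit change; treating those values as missing may then be the safer choice.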

6. Collected timestamps are wrong or unexpectedly delayed

If the collected timestamps are wrong or arrive with a delay, predictions in production may fail. When prediction failures happen, monitor the data in its native tool and find out the cause.
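One way to surface these problems before they reach the model is to compare each event's timestamp with the time it was ingested and to check the spacing between events. The sketch below assumes the data carries both an event_time and an ingested_at column; the column names, the 5-minute sampling interval and the 15-minute tolerance are all hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "event_time": pd.to_datetime(
        ["2022-01-28 10:00", "2022-01-28 10:05", "2022-01-28 10:10", "2022-01-28 10:20"]),
    "ingested_at": pd.to_datetime(
        ["2022-01-28 10:01", "2022-01-28 10:06", "2022-01-28 10:55", "2022-01-28 10:19"]),
})

EXPECTED_INTERVAL = pd.Timedelta(minutes=5)  # hypothetical sampling interval
MAX_LAG = pd.Timedelta(minutes=15)           # hypothetical acceptable ingestion delay

lag = df["ingested_at"] - df["event_time"]
df["delayed"] = lag > MAX_LAG               # arrived much later than it happened
df["before_event"] = lag < pd.Timedelta(0)  # ingested before it happened: wrong clock or timestamp
# Events spaced further apart than expected mean something is missing before this row.
df["gap_before"] = df["event_time"].diff() > EXPECTED_INTERVAL
print(df)
```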

7. Rounded values or already aggregated data points

If values lack the needed level of detail, carry only slight variation, or have already been aggregated, the model may learn the wrong trend. Communicate clearly with the data owners and obtain the raw data wherever possible instead of already calculated values.
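Rounded or pre-aggregated data can sometimes be spotted before talking to the data owners. The heuristic below is a sketch, not a definitive test: it compares the typical spacing of the index with the expected sampling interval and measures how many values are whole numbers and how many are distinct. The function name, the expected interval and any thresholds you act on are assumptions to tune per dataset:

```python
import pandas as pd

def granularity_report(s: pd.Series, expected_interval: pd.Timedelta) -> dict:
    """Heuristic signs that a series was rounded or aggregated upstream."""
    values = s.dropna()
    median_interval = s.index.to_series().diff().median()
    return {
        "median_interval": median_interval,
        # Points spaced further apart than expected suggest upstream aggregation.
        "coarser_than_expected": median_interval > expected_interval,
        # A share near 1.0 for a continuous metric suggests rounding.
        "share_whole_numbers": float((values == values.round()).mean()),
        # Few distinct values relative to the number of points also hints at rounding.
        "distinct_ratio": values.nunique() / len(values),
    }

# Made-up hourly readings of a metric that was expected every minute.
s = pd.Series(
    [40.0, 40.0, 41.0, 40.0, 41.0, 41.0],
    index=pd.date_range("2022-01-28", periods=6, freq="60min"),
)
report = granularity_report(s, expected_interval=pd.Timedelta(minutes=1))
print(report)
```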

Summary: Understanding the data quality issues in time-series data and fixing them helps the model forecast effectively.

For more machine learning concepts, subscribe to my YouTube channel: https://www.youtube.com/channel/UCfDBCMTgV8bD-ngmsE3mb8Q
