Analysis of Time Series Data: Lecture 02

In this lecture, we will study the analysis of time series data.

Step 01: Read the data and make sure that you convert the date column to a datetime type.

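import pandas as pd
import matplotlib.pyplot as plt

# Load the raw CSV (the file has no header row) and name the columns
data = pd.read_csv('airline-passenger-traffic.csv', header=None)
data.columns = ['Month', 'Passengers']
# Parse the 'YYYY-MM' strings into datetime objects and use them as the index
data['Month'] = pd.to_datetime(data['Month'], format='%Y-%m')
data = data.set_index('Month')
data.head(12)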

Step 02: Plot the time series data

data.plot(figsize=(12, 4))
plt.legend(loc='best')
plt.title('Airline passenger traffic')
plt.show(block=False)

Observations from the graph above:

  • Traffic increases year on year.
  • The pattern repeats every year, following a summer → winter cycle.
  • Some data is missing in 1951, 1954 and 1960. The reason can be a data capture issue, or the data was not recorded because the event did not occur. For example, if no sales happened on a given date because of operational issues, the sales entry for that date would be 0.
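Before imputing anything, it can help to confirm where the gaps are instead of reading them off the plot. The following is a minimal sketch, assuming the gaps appear as NaN in the Passengers column; the small `demo` frame and its values are made up for illustration and are not the airline data.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series standing in for the airline data (values are made up).
idx = pd.date_range("1951-01-01", periods=8, freq="MS")
demo = pd.DataFrame(
    {"Passengers": [145.0, 150.0, np.nan, 163.0, 172.0, np.nan, 199.0, 199.0]},
    index=idx,
)

# Count the missing entries and list the months they fall in.
is_missing = demo["Passengers"].isna()
n_missing = int(is_missing.sum())
missing_months = list(demo.index[is_missing].strftime("%Y-%m"))

print(n_missing)
print(missing_months)
```

The same two lines (`isna()` followed by `sum()` or boolean indexing) should work on the `data` frame from Step 01, provided the missing months are present as rows with empty values.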

Step 03: Handling missing values

  • Mean Imputation: Imputing the missing values with the overall mean of the data.
data = data.assign(Passengers_Mean_Imputation=data.Passengers.fillna(data.Passengers.mean()))
data[['Passengers_Mean_Imputation']].plot(figsize=(12, 4))
plt.legend(loc='best')
plt.title('Airline passenger traffic: Mean imputation')
plt.show(block=False)

Imputing missing values with the mean, median or mode reduces the variance of the series and ignores its trend and seasonality, so it is not recommended for time series data.
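The variance claim can be checked on a toy example. This is only a sketch on made-up numbers (the series `s` is hypothetical, not the airline data): filling gaps with the mean adds points that sit exactly at the centre of the distribution, so the sample variance shrinks.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two gaps (values are made up for illustration).
s = pd.Series([10.0, 12.0, np.nan, 16.0, 18.0, np.nan, 22.0, 24.0])

# Mean imputation: every gap receives the same overall mean.
mean_filled = s.fillna(s.mean())

# Sample variance before (NaN values are skipped) and after imputation.
var_observed = s.var()
var_filled = mean_filled.var()

print(var_observed, var_filled)  # the second value is the smaller one
```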

  • Last observation carried forward: Each missing value is imputed with the previous observed value in the data.
data['Last_observation_carried_forward'] = data['Passengers'].ffill()

Imputing a missing value with the last observed value (or with the next observed value) can introduce bias into the analysis, and it performs poorly when the data has a visible trend.
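To see why a trend causes trouble, here is a small sketch on a made-up, steadily rising series (`s_true` and its values are hypothetical). Carrying the last observation forward undershoots every gap, and carrying the next observation backward overshoots it.

```python
import numpy as np
import pandas as pd

# Hypothetical upward-trending series; the "true" values are known here
# only because the data is synthetic.
s_true = pd.Series([100.0, 110.0, 120.0, 130.0, 140.0, 150.0])

# Knock out two consecutive points to simulate missing data.
s = s_true.copy()
s.iloc[2:4] = np.nan

locf = s.ffill()  # last observation carried forward
nocb = s.bfill()  # next observation carried backward

# Imputation error at the two gap positions.
err_locf = (locf - s_true).iloc[2:4]
err_nocb = (nocb - s_true).iloc[2:4]

print(err_locf.tolist())  # negative: LOCF lags behind a rising trend
print(err_nocb.tolist())  # positive: NOCB runs ahead of it
```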

  • Linear interpolation: Each missing value is placed on a straight line joining the previous and next observed points in the data.
data = data.assign(Passengers_Linear_Interpolation=data.Passengers.interpolate(method='linear'))

Linear interpolation is a good way to deal with missing values in time series data with a trend: a single missing value is imputed with the average of the previous and next values.
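A short sketch on made-up numbers shows both cases: a single gap receives the average of its two neighbours, and a longer gap is filled with evenly spaced points on the line between them. The series `s` is hypothetical; note that `method='linear'` treats the observations as equally spaced and ignores the index values.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one single gap and one two-point gap.
s = pd.Series([100.0, np.nan, 120.0, np.nan, np.nan, 150.0])

# Straight-line fill between the nearest observed neighbours.
interp = s.interpolate(method="linear")

print(interp.tolist())
```

On this toy series the trend is followed through the gaps, which is the behaviour that mean imputation and last observation carried forward lack.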

Step 04: Decomposition of Time Series

WIP !!
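Until this step is written up, the following is only a rough sketch of one common approach, a classical additive decomposition (series = trend + seasonal + residual). It is not necessarily the method the article will use. The synthetic monthly series, the 12-month period and all variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumption): linear trend plus a 12-month cycle.
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
seasonal_true = 10 * np.sin(2 * np.pi * t / 12)
series = pd.Series(100 + 2.0 * t + seasonal_true, index=idx)

# Trend: centered 2x12 moving average (13 weights, half weight at both ends).
weights = np.r_[0.5, np.ones(11), 0.5] / 12.0
trend = pd.Series(np.nan, index=idx)
trend.iloc[6:-6] = np.convolve(series.to_numpy(), weights, mode="valid")

# Seasonal: average of the detrended values for each calendar month,
# broadcast back to every row of that month.
detrended = series - trend
seasonal = detrended.groupby(idx.month).transform("mean")

# Residual: whatever trend and seasonal do not explain
# (NaN at both ends, where the moving average is undefined).
residual = series - trend - seasonal

print(trend.dropna().head())
print(seasonal.head(12))
```

Because the toy series contains no noise, the residual is essentially zero here; on real data such as the airline series it would hold the irregular component.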

Everything about Forecasting (Zero to Hero)

Aakash Goel

Senior Data Scientist @ Fractal Analytics
