Time Series Analysis with Python

Time series analysis is essential when looking at financial data.

Time series analysis simply means analysing data that changes over time. While many of the techniques we use to analyse other data types also apply to time series data, there are some additional concepts and techniques we need to learn.

Being able to handle dates and timestamps is obviously necessary when dealing with financial data. The datetime library has a number of functions to help us with this.

import datetime

datetime.datetime(2018, 12, 31)
Out:
datetime.datetime(2018, 12, 31, 0, 0)

Converting from a string to a datetime object.

datetime.datetime.strptime('2018/12/31', '%Y/%m/%d')
Out:
datetime.datetime(2018, 12, 31, 0, 0)

Converting from a datetime object to a string.

datetime.datetime(2018, 12, 31).strftime('%Y/%m/%d')
Out:
'2018/12/31'

We can also use the Pandas library for this.

pd.to_datetime("21st of Jan, 2018, 10am")Out:
Timestamp('2018-01-21 10:00:00')
pd.to_datetime("31/12/2018, 12pm")Out:
Timestamp('2018-12-31 12:00:00')

We can also generate a time series of random data.

import numpy as np
import matplotlib.pyplot as plt

index = [pd.Timestamp("2018-01-01, 12pm"),
         pd.Timestamp("2018-01-02, 6am"),
         pd.Timestamp("2018-01-03, 3pm"),
         pd.Timestamp("2018-01-04, 2pm"),
         pd.Timestamp("2018-01-05, 10pm")]

ts = pd.Series(np.random.randn(len(index)), index=index)
ts.plot(rot=90)  # rotate the x-axis labels so the timestamps are readable
plt.show()

A simpler way to generate a set of timestamps.

pd.date_range(start="2017-01-01", periods=12, freq='M')
Out:
DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
              dtype='datetime64[ns]', freq='M')

Other frequencies are also possible:

B (business day), D (calendar day), W (weekly), M (month end), MS (month start), Q (quarter end), A (year end), H (hourly), T (minutes), S (seconds), L (milliseconds), U (microseconds)
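For instance, swapping in a quarterly frequency gives quarter-end dates (a quick sketch of the same call with freq='Q'):

pd.date_range(start="2017-01-01", periods=4, freq='Q')
Out:
DatetimeIndex(['2017-03-31', '2017-06-30', '2017-09-30', '2017-12-31'],
              dtype='datetime64[ns]', freq='Q-DEC')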

We can slice and dice time series just like we would with any other Pandas data frame.

ts_random['2018-03-01':'2018-03-10']
Out:
2018-03-01 0.777562
2018-03-02 0.413725
2018-03-03 -0.183474
2018-03-04 0.567550
2018-03-05 1.035431
2018-03-06 -1.390770
2018-03-07 0.937524
2018-03-08 -0.192736
2018-03-09 -0.225089
2018-03-10 -0.047632
Freq: D, dtype: float64
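Partial string indexing also works, so we can pull out a whole month in one go (assuming ts_random is a daily series covering 2018, as above):

ts_random['2018-03']  # all the datapoints in March 2018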

We can also deal with differences in time using timedeltas.

pd.Timedelta(days=1) + pd.Timedelta(seconds=1)
Out:
Timedelta('1 days 00:00:01')
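Timedeltas can also be added to or subtracted from timestamps, which is handy for shifting dates:

pd.Timestamp('2018-12-31') + pd.Timedelta(days=1)
Out:
Timestamp('2019-01-01 00:00:00')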

We usually need to convert strings to timestamps after reading in a CSV.

SGD = pd.read_csv('SGD.csv')
SGD['Date'] = pd.to_datetime(SGD['Date'], format='%d-%m-%y')  # parse the date strings
SGD = SGD.set_index('Date')  # use the dates as the index
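pd.read_csv can also do the parsing and index-setting in one step (a sketch assuming the same SGD.csv; note that parse_dates uses pandas' default date parser rather than an explicit format):

SGD = pd.read_csv('SGD.csv', parse_dates=['Date'], index_col='Date')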

We can also condense our dataframe by grouping the data by years or months. Here we first derive a Year column from the datetime index.

SGD['Year'] = SGD.index.year
AnnualSGD = SGD.groupby('Year').mean()
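The same idea works at a monthly level, e.g. by grouping on both the year and month of the index:

MonthlySGD = SGD.groupby([SGD.index.year, SGD.index.month]).mean()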

To get the returns, think of the Pandas dataframe as a series of data that we can shift up and down, which makes matrix- and vector-like operations across time easy.

Returns (in % terms) can be computed by taking:

(price(today) - price(x days ago)) / price(x days ago)

In coding terms, this means that we can compute the 1-day (i.e. x = 1) returns by taking a series from the Pandas dataframe as it is (at time t) and comparing it against the same series shifted by 1 day.

To compute returns.

SGD['Returns'] = (SGD[' Last ']-SGD[' Last '].shift(1))/SGD[' Last '].shift(1)
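Pandas has a built-in shortcut for exactly this computation:

SGD['Returns'] = SGD[' Last '].pct_change()  # equivalent one-liner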

Log returns are possible too.

SGD['LogReturns'] = np.log(SGD[' Last ']/SGD[' Last '].shift(1))
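One nice property of log returns is that they add up across time, so the cumulative log return since the start of the series is just a cumulative sum:

SGD['CumLogReturns'] = SGD['LogReturns'].cumsum()  # cumulative log return to date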

We can also merge time series by dates.

t_series = pd.merge(SGD, CNH, left_index=True, right_index=True)
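Since both frames are indexed by date, an index join does the same thing (a sketch assuming CNH is another currency dataframe indexed by date; rsuffix disambiguates any clashing column names):

t_series = SGD.join(CNH, how='inner', rsuffix='_CNH')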

Time zones are another tricky issue we have to deal with. Most of the time, our datetime objects will not be timezone-aware.

t = SGD.index[0]
t.tz is None
Out:
True
t = pd.Timestamp('2017-01-01, 10am', tz='Asia/Singapore')
t.tz is None
Out:
False
import pytz
tz = pytz.timezone('Asia/Singapore')
rng = pd.date_range('1/1/2017 00:00', periods=10, freq='D', tz=tz)
rng
Out:
DatetimeIndex(['2017-01-01 00:00:00+08:00', '2017-01-02 00:00:00+08:00',
'2017-01-03 00:00:00+08:00', '2017-01-04 00:00:00+08:00',
'2017-01-05 00:00:00+08:00', '2017-01-06 00:00:00+08:00',
'2017-01-07 00:00:00+08:00', '2017-01-08 00:00:00+08:00',
'2017-01-09 00:00:00+08:00', '2017-01-10 00:00:00+08:00'],
dtype='datetime64[ns, Asia/Singapore]', freq='D')

It’s also possible to convert between timezones.

ts_utc = ts.tz_localize('UTC')  # attach a timezone to the naive index
ts_utc.index.tz

ts_utc.tz_convert('Asia/Singapore').index.tz  # convert to another timezone
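Note that tz_convert only changes how the same instants are displayed. Localising and converting a single timestamp makes this concrete (Singapore is UTC+8):

t = pd.Timestamp('2017-01-01 00:00').tz_localize('UTC')
t.tz_convert('Asia/Singapore')
Out:
Timestamp('2017-01-01 08:00:00+0800', tz='Asia/Singapore')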

We can also up and down sample.

Up and Down Sampling

It’s common to want to look at time series data at different time horizons. When we need higher frequency data (e.g. from months to days), we upsample, and when we need lower frequency data (e.g. from months to years), we downsample.

Down Sampling

Let’s create a random dataset that is 600 minutes long (i.e. 10 hours), with a datapoint at each minute (i.e. frequency = T).

datetime_long = pd.date_range(start='2018-01-01', periods=600, freq='T')
ts_long = pd.Series(np.random.randn(len(datetime_long)), index=datetime_long)
ts_long.head(10)
Out:
2018-01-01 00:00:00 -0.012577
2018-01-01 00:01:00 0.016296
2018-01-01 00:02:00 0.493567
2018-01-01 00:03:00 1.478707
2018-01-01 00:04:00 0.705589
2018-01-01 00:05:00 -1.289139
2018-01-01 00:06:00 -0.515903
2018-01-01 00:07:00 0.727075
2018-01-01 00:08:00 2.496662
2018-01-01 00:09:00 -1.051974
print('# of hours:', len(ts_long.resample('60min').mean()))
Out:
# of hours: 10

ts_long.resample('60min').mean()
Out:
2018-01-01 00:00:00 -0.150229
2018-01-01 01:00:00 -0.013467
2018-01-01 02:00:00 0.033141
2018-01-01 03:00:00 0.085284
2018-01-01 04:00:00 0.078558
2018-01-01 05:00:00 -0.032913
2018-01-01 06:00:00 -0.259231
2018-01-01 07:00:00 -0.291667
2018-01-01 08:00:00 -0.121116
2018-01-01 09:00:00 -0.174889
Freq: 60T, dtype: float64
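mean() is not the only choice when downsampling; resample accepts other aggregations as well:

ts_long.resample('60min').max()                 # hourly maximum
ts_long.resample('60min').agg(['mean', 'std'])  # several statistics at once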

Upsampling works the same way. Here, ts_short is a series with one datapoint a week; resampling it to daily frequency leaves NaNs on the days with no data.
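(A sketch of how such a series might be created; the weekly frequency is an assumption based on the output below.)

datetime_short = pd.date_range(start='2017-01-01', periods=3, freq='7D')  # assumed: one point a week
ts_short = pd.Series(np.random.randn(len(datetime_short)), index=datetime_short)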

ts_short.resample('D').mean()
Out:
2017-01-01 0.609543
2017-01-02 NaN
2017-01-03 NaN
2017-01-04 NaN
2017-01-05 NaN
2017-01-06 NaN
2017-01-07 NaN
2017-01-08 0.358023
2017-01-09 NaN
2017-01-10 NaN
2017-01-11 NaN
2017-01-12 NaN
2017-01-13 NaN
2017-01-14 NaN
2017-01-15 -0.600486
ts_short.resample('D').ffill()  # can also backfill with bfill
Out:
2017-01-01 0.609543
2017-01-02 0.609543
2017-01-03 0.609543
2017-01-04 0.609543
2017-01-05 0.609543
2017-01-06 0.609543
2017-01-07 0.609543
2017-01-08 0.358023
2017-01-09 0.358023
2017-01-10 0.358023
2017-01-11 0.358023
2017-01-12 0.358023
ts_short.resample('D').interpolate()  # linear interpolation
Out:
2017-01-01 0.609543
2017-01-02 0.573611
2017-01-03 0.537680
2017-01-04 0.501748
2017-01-05 0.465817
2017-01-06 0.429885
2017-01-07 0.393954
2017-01-08 0.358023
2017-01-09 0.221093
2017-01-10 0.084163
2017-01-11 -0.052767
2017-01-12 -0.189696
2017-01-13 -0.326626
2017-01-14 -0.463556

The full code is available here.
