In this article, we will try to get a big-picture view of a huge time series dataset. Stock market data is a good example: it can be collected every second and span decades.
To get a good idea of time series data, it's better to plot it, which is the application I will use for the code discussed in this article.
If you want to see how a stock performed over the last year, it is not worth looking at data collected every hour, maybe not every minute, and definitely not every second. One data point per day is enough to understand the highs and lows of the year.
To get a data point per day from a dataset that has points collected every second, we need to group the points collected for a day into one using a min, max or average operation.
The Pandas package in Python provides this functionality: it can group time series data using a single parameter called frequency, which is what we will explore in this article.
Here we will use data collected for Tesla's stock, which has been around for about 10 years. Below is the code showing how I downloaded the data.
The package used is yfinance, and you can find the details in "Reliably download historical market data from Yahoo! Finance with Python".
I will use this data to create plots similar to the ones we see on websites showing stock market trends. The plots will cover the past 1 day, 5 days, 1 month, 6 months, year-to-date, 1 year, 5 years and the maximum data available.
First, let's look at the Grouper class from Pandas that I will use in this article. You can find the whole documentation for this class on the pandas.Grouper page of the pandas 1.2.3 documentation; here, we will focus on the parameter called freq (frequency), which groups a column of type Datetime using specified frequencies, also known as offset aliases.
output = input.groupby(pd.Grouper(key='', freq='')).mean()
The groupby function takes an instance of the Grouper class, which in turn takes the name of the column to group by (key) and the frequency by which to group (freq). The freq parameter can range from nanoseconds to a year. The subset of frequencies that we will use in this article is listed in the table below.
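As a minimal sketch of the call above, the snippet below groups synthetic per-minute data into daily averages. The column names Datetime and Close and the generated values are assumptions made for illustration; they are not taken from the downloaded Tesla data.

```python
import numpy as np
import pandas as pd

# Synthetic per-minute values for two full days (a stand-in for real stock data).
timestamps = pd.date_range("2021-03-01", periods=2 * 24 * 60, freq="min")
prices = pd.DataFrame({
    "Datetime": timestamps,
    "Close": np.arange(len(timestamps), dtype=float),
})

# Group the 'Datetime' column into daily bins and average each bin.
daily = prices.groupby(pd.Grouper(key="Datetime", freq="D")).mean()

# One row per day; the grouped column is now the index.
print(daily)
```

Swapping mean() for min() or max() would give the daily low or high instead of the average.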
The column that you group by becomes the index. The frequency parameter can take just the character that defines the frequency, or that character prefixed by a quantity to scale it. Example: to group day-wise, you can use either D or 1D; the two are equivalent. To group by week or 7 days, you can use either W or 7D.
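The scaling rule can be checked with a short sketch. The hourly data here is synthetic and the column names are assumptions; the point is only that a bare alias and the same alias prefixed with 1 behave the same, while a larger prefix widens the bins.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset

# A bare alias and the same alias with a quantity of 1 parse to the same offset.
same_offset = to_offset("D") == to_offset("1D")

# Synthetic hourly data covering four full days.
idx = pd.date_range("2021-03-01", periods=4 * 24, freq="60min")
hourly = pd.DataFrame({"Datetime": idx, "Close": np.arange(len(idx), dtype=float)})

by_d = hourly.groupby(pd.Grouper(key="Datetime", freq="D")).mean()
by_1d = hourly.groupby(pd.Grouper(key="Datetime", freq="1D")).mean()

# A prefix of 2 scales the bin to two days, halving the number of rows.
by_2d = hourly.groupby(pd.Grouper(key="Datetime", freq="2D")).mean()

print(same_offset, len(by_d), len(by_1d), len(by_2d))
```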
The subsetLastNDays function below is a helper function that creates a subset of the data after grouping.
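One way such a helper might look is sketched here. The name subset_last_n_days, its signature, and the choice to measure the window back from the newest timestamp in the data are all assumptions for illustration, not the original implementation.

```python
import pandas as pd

def subset_last_n_days(df: pd.DataFrame, n_days: int) -> pd.DataFrame:
    """Keep rows whose index falls within n_days of the newest timestamp.

    Assumes df has a DatetimeIndex, as it does after grouping with Grouper.
    """
    cutoff = df.index.max() - pd.Timedelta(days=n_days)
    return df[df.index > cutoff]

# Example on ten days of made-up daily values.
idx = pd.date_range("2021-03-01", periods=10, freq="D")
grouped = pd.DataFrame({"Close": range(10)}, index=idx)

last_five = subset_last_n_days(grouped, 5)
print(last_five)
```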
There are 4 simple steps that I follow in the code that you will see below:
1. Read the original input data
2. Group-by and average
3. Subset the data as required
4. Save the output data
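The four steps above can be sketched end to end as follows. To keep the example self-contained, a small inline CSV stands in for the downloaded file and the result is written to an in-memory buffer; the column names and values are made up for illustration.

```python
import io
import pandas as pd

# Step 1: read the input. An inline CSV stands in for the downloaded file.
raw_csv = io.StringIO(
    "Datetime,Close\n"
    "2021-03-01 09:30:00,10.0\n"
    "2021-03-01 15:30:00,12.0\n"
    "2021-03-02 09:30:00,20.0\n"
    "2021-03-02 15:30:00,22.0\n"
    "2021-03-03 09:30:00,30.0\n"
)
data = pd.read_csv(raw_csv, parse_dates=["Datetime"])

# Step 2: group by day and average.
grouped = data.groupby(pd.Grouper(key="Datetime", freq="D")).mean()

# Step 3: subset, here the last two days relative to the newest timestamp.
cutoff = grouped.index.max() - pd.Timedelta(days=2)
subset = grouped[grouped.index > cutoff]

# Step 4: save. A buffer is used here; pass a file path to write to disk.
out = io.StringIO()
subset.to_csv(out)
print(out.getvalue())
```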
In the next part of the article you will find the code written to group the input data as per the frequencies mentioned in the table and the charts generated from the output data using Datawrapper.
One Day, One Minute
The original data downloaded had data for every minute. So here I just subset the data for the last day.
Five Days, Thirty Minutes
For this case, I use the T alias to group by minutes, and 30T to group 30 minutes of data into one data point.
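A minimal sketch of the 30-minute grouping on synthetic data follows. It spells the frequency as 30min rather than 30T: both mean the same thing, but pandas 2.2 and later deprecate the T spelling in favour of min, and min is also accepted by older releases. The column names and values are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic per-minute data for one hour starting at 09:00.
idx = pd.date_range("2021-03-01 09:00", periods=60, freq="min")
minute_data = pd.DataFrame({"Datetime": idx, "Close": np.arange(60, dtype=float)})

# Collapse every 30 minutes of data into one averaged data point.
half_hourly = minute_data.groupby(pd.Grouper(key="Datetime", freq="30min")).mean()

print(half_hourly)
```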
One Month, One Day
Here we start with data collected every 30 minutes. To group by day, I used the D value. You don't need to specify 1 to group by the unit frequency.
Six Months, One Day
Here the data was collected every hour, but grouping a day's data needs no code change.
Year to date, One Day
You can accomplish the one-day grouping with 24 hours as well. Here’s how.
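As a rough illustration, the sketch below groups synthetic hourly data once with D and once with a 24-hour frequency and compares the results. The frequency is written as 24h; older pandas releases also accept 24H, which newer ones deprecate. The data and column names are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data covering three full days.
idx = pd.date_range("2021-03-01", periods=3 * 24, freq="60min")
hourly = pd.DataFrame({"Datetime": idx, "Close": np.arange(len(idx), dtype=float)})

by_day = hourly.groupby(pd.Grouper(key="Datetime", freq="D")).mean()
by_24h = hourly.groupby(pd.Grouper(key="Datetime", freq="24h")).mean()

# Both groupings produce the same bins and the same averages.
print(by_day["Close"].tolist())
print(by_24h["Close"].tolist())
```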
One Year, One Day
And just for kicks, here is another one-day grouping using B, business days, which run Monday to Friday by default.
If you happen to be located in the UAE, where a typical business week runs from Sunday to Thursday, you might want to use C, the custom business day option, which can be set as:
pd.offsets.CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu')
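Both variants can be sketched on synthetic data. The example assumes that the input only contains timestamps on trading days, as exchange data normally would, and passes the CustomBusinessDay object directly as the freq argument. The variable names, column names and values are made up for illustration.

```python
import pandas as pd

# Hourly timestamps from Sunday 2021-02-28 through Saturday 2021-03-13.
idx = pd.date_range("2021-02-28", "2021-03-13 23:00", freq="60min")
hourly = pd.DataFrame({"Datetime": idx, "Close": range(len(idx))})

# Monday-Friday week: keep only weekday rows, then group by business day.
mon_fri = hourly[hourly["Datetime"].dt.dayofweek < 5]
by_business_day = mon_fri.groupby(pd.Grouper(key="Datetime", freq="B")).mean()

# Sunday-Thursday week: drop Friday (4) and Saturday (5), then group
# with a custom business day that uses the matching weekmask.
sun_thu_day = pd.offsets.CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu")
sun_thu = hourly[~hourly["Datetime"].dt.dayofweek.isin([4, 5])]
by_custom_day = sun_thu.groupby(pd.Grouper(key="Datetime", freq=sun_thu_day)).mean()

print(by_business_day.index.day_name().tolist())
print(by_custom_day.index.day_name().tolist())
```

Grouping the same data with D instead would also produce rows for the non-trading days, filled with NaN.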
Five Years, One Week
One-week groups can be done using W.
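A small sketch of weekly grouping on synthetic daily data is given below, with a 7D grouping alongside for comparison. With this particular data the two produce the same groups, but they label them differently: W labels each group with the Sunday that closes the week, whereas 7D counts seven-day blocks from the first day in the data and labels each block with its start. The data and column names are assumptions.

```python
import numpy as np
import pandas as pd

# Fourteen daily values, Monday 2021-03-01 through Sunday 2021-03-14.
idx = pd.date_range("2021-03-01", periods=14, freq="D")
daily = pd.DataFrame({"Datetime": idx, "Close": np.arange(14, dtype=float)})

# Weekly groups, labelled with the Sunday that ends each week.
weekly = daily.groupby(pd.Grouper(key="Datetime", freq="W")).mean()

# Seven-day blocks, labelled with the first day of each block.
seven_day = daily.groupby(pd.Grouper(key="Datetime", freq="7D")).mean()

print(weekly)
print(seven_day)
```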
Max, One Month
Lastly, for a month-long group use the option M, which sets the grouped data to the last day of the month. It makes sense to assign the combined data to the last day of the month, since it denotes the average of all the days of the month.
To set the combined data to the first of the month, you can use the MS offset alias.
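The difference in labelling can be seen in a short sketch. Instead of the string aliases, it passes the offset objects pd.offsets.MonthEnd() and pd.offsets.MonthBegin(), which correspond to M and MS; this is a choice made here because pandas 2.2 and later rename the month-end alias to ME, while the objects work across versions. The data and column names are made up.

```python
import numpy as np
import pandas as pd

# Daily values for January and February 2021 (31 + 28 days).
idx = pd.date_range("2021-01-01", "2021-02-28", freq="D")
daily = pd.DataFrame({"Datetime": idx, "Close": np.arange(len(idx), dtype=float)})

# Month-end grouping: each group is labelled with the last day of its month.
by_month_end = daily.groupby(
    pd.Grouper(key="Datetime", freq=pd.offsets.MonthEnd())
).mean()

# Month-start grouping: each group is labelled with the first day of its month.
by_month_start = daily.groupby(
    pd.Grouper(key="Datetime", freq=pd.offsets.MonthBegin())
).mean()

print(by_month_end)
print(by_month_start)
```

The averages are identical in both cases; only the date attached to each group changes.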
There are a lot of other offsets that I haven’t discussed here. I hope that this article makes it easier to work with the other frequencies.