Resampling of Financial Time-Series — an important point to consider

8 min readJul 18, 2024

In one of my studies, I needed to convert daily OHLC-stock-data to monthly data. Initially, it seemed straightforward: just apply the resampling function in pandas. However, upon reviewing the results, I was unpleasantly surprised. Without a thorough check, I would have proceeded with incorrect data. This story serves as a cautionary tale about this issue.

As usual, the full code is available on my GitHub.

Problem Statement

In one of my backtesting studies, I needed to review the markets on the first trading day of each month to decide whether to trade or not.

The basis was a download of daily OHLC stock data. My initial approach was simple: take the pandas data frame with the daily data and use the standar resampling method. I expected to receive a DataFrame with the first trading day of the month as the index and all the related data in the columns.

# download data
ticker = '^GDAXI'
start_date = '2020-01-01'
end_date = '2024-05-30'
dax_data = yf.download(ticker, start_date, end_date)

This gives the following data (first 30 rows of the data frame):

Fig. 1: Daily data of german DAX (starting 2020–01–02)

Based on the daily data, we observe that the first trading day in January 2020 was January 2nd, and for February it was February 3rd. Keep this in mind! Now we aim to convert our data frame of daily data into one that contains monthly data. Specifically, we want to display the first trading day of each month along with all the associated data. The most direct and straightforward approach appears to be the use of the pandas resampling function.

# straigthforward solution
dax_data.resample('MS').first()

The parameter ‘MS’ signifies the start of the month. The method ‘.first()’ yields the initial data set. I expect the code to produce a data frame containing all the first trading days of each month, along with the corresponding data.

Here are the actual results:

The discrepancies are evident:

First Line: The first trading day of January 2020 is January 2nd, according to the daily data. However, after resampling to monthly data, the first row is labeled with the index 2020–01–01, which is a day earlier.
A similar situation occurs for February. The first trading day is February 3rd, but after resampling, February 1st is returned as the label of moth February.

Clearly, the simple resampling function always assigns the first calendar day of the month as the label for the “monthly bin”, rather than the date of the first occurrence of a date in that month within the data frame (i.e., the first trading date of the month).

This method, subsequently referred to as M1, should only be used for a quick and rough estimate. Let’s now consider three alternatives, designated as M2 to M4.

Method 2: advanced resampling

We will use the following function:

def first_trade_day_of_month_2(data):
  ''' Identification the first trading date of a month
  Method 2:
  advanced resampling, using the content of the data, downloaded from yfinance,
  adding new column 'date', filling with content of the dataframe index,
  resampling by the index, colunm 'date' contains now the required dates
  --> delivers the first date of a month where yfinance has data --> first trading day

  Args:
    data: pandas dataframe containing OHLC daily stock data.

  Returns:
    list of trade Dates

  Raises:
    Void
  '''
  data['date']= pd.to_datetime(data.index).date
  return data.resample('MS').first()['date'].to_list()

Explanation:

We receive a data frame containing the daily OHLC (Open, High, Low, Close) data.
We then add a column named ‘date’ and copy the content of the data frame’s index, which represents the trading days.
Next, we resample the data using the `pd.resample` method with the ‘MS’ parameter and apply the `.first()` method to obtain the first dataset of each month.
The label, or index, of the resampled dataframe is the first calendar day of the respective month. However, the first trading day is in parallel still available in the ‘date’ column.
This column is then converted into a list of dates and returned for further processing.

Method 3: Working with calendar functions

Method 3 and 4 requires the installation of additional modules, that are not part of the standard Colab-Environment.

!pip install pandas_market_calendars
!pip install holidays

We will use the following function:

def first_trade_day_of_month_3(start_date, end_date):
  '''
  Method 3:
  generating lists of:
  - week-days between dates (using the pandas function)
  - holidays for the region under investigation
  - bank-days = weekdays, that are not in the list of holidays
  generating a dataframe with the bank-days as index and as value of a 'date'-columns
  resampling of the dataframe (monthly) and export

  Args:
    start_date, end_date: dates ('YYYY-MM-DD')

  Returns:
    list of trade Dates

  Raises:
    Void
  '''
  week_days = pd.date_range(start_date, end_date, freq='B')
  holidays_de = holidays.Germany(years=range(int(start_date[:4]), int(end_date[:4]) + 1))
  bank_d = pd.to_datetime([date for date in week_days if date not in holidays_de])
  bank_dd = pd.DataFrame({'date': bank_d},
                         index=bank_d)
  return bank_dd.resample('MS').first().date.tolist()

Explanation:

First, we create a list of all weekdays between the start and end dates. Note: The pd.date_range() function only identifies if a day is Monday, Tuesday, Wednesday, Thursday, or Friday. It does not account for holidays.
Consequently, we require a list of holidays. We utilize the Python module ‘holidays’ and apply the ‘Germany’ method with the parameters for the start and end years.
Next, we compile a list of all weekdays that are not holidays.
This list must be converted into a data frame to utilize the resampling function. Once again, the bins are labeled with the first calendar days, but the content includes the first trading day of the month. These dates are then transformed into a list and are returned by the function.

Method 4: Use of Schedules of Stock Exchanges

The schedues of the stock exchanges are available via the pandas_market_calendars module. This is a data frame, containing the timestamp of the openings and the close of the specified stock exchange.

Such exchange schedules can also be used for our purpose:

def first_trade_day_of_month_4(start_date, end_date, exchange):
  '''
  Method 4:
  importing the schedule of the relevant exchange (dataframe with timestamps of open and close)
  resampling dataframe (montly), collecting first dataset for each month
  generating a list of the date-part of the timestamps of all first closes of each month

  Args:
    start_date, end_date: dates ('YYYY-MM-DD')
    exchange: Code of the stock exchange (e.g 'XETR') as string

  Returns:
    list of trade Dates

  Raises:
    Void
  '''

  xetra_cal = mcal.get_calendar('XETR')
  schedule = xetra_cal.schedule(start_date, end_date)
  trades_x = schedule.resample('MS').first()
  trades_x['date'] = pd.to_datetime(trades_x['market_close']).dt.date
  return trades_x['date'].to_list()

Explanation:

We retrieve the calendar and extract the schedule, using the Frankfurt Stock Exchange/XETRA trading platform as our example.
The resulting data frame is then resampled (by ‘MS’). The bins are labeled with the first calendar day, and within the bin content, we also find information about the trading days. We utilize the stored timestamp of the market’s close to extract the date.
The closing dates of the first trading day of each month are compiled into a list and returned by the function.

Results

By employing the aforementioned four methods, we generate four lists of the “first trading day of the month” (NOTE: we are aware that M1 is imperfect, but we use it as a reference case).

Once we have compiled the list of the “first trading day of the month,” we can extract the corresponding closing prices from our list of daily stock data.

We utilized the following code for both tasks:

# generating lists of trading days
trade_dates_m1 = first_trade_day_of_month_1(dax_data)
trade_dates_m2 = first_trade_day_of_month_2(dax_data)
trade_dates_m3 = first_trade_day_of_month_3(start_date, end_date)
trade_dates_m4 = first_trade_day_of_month_4(start_date, end_date, 'XETR')

# generating dataframes with close dates of trading days
close_m1 = dax_data[dax_data.index.isin(trade_dates_m1)]['Close']
close_m2 = dax_data[dax_data.index.isin(trade_dates_m2)]['Close']
close_m3 = dax_data[dax_data.index.isin(trade_dates_m3)]['Close']
close_m4 = dax_data[dax_data.index.isin(trade_dates_m4)]['Close']

Explanation:

First block: Generate lists of the first trading day of the month using the four specified methods and various input parameters as requested.
Second block: Take these lists and filter the dax_data dataframe by applying the lists through the well-known isin() function.
Results are four timeseries of closing prices at the identified dates of the “first-trading-day-of-the-month”.

The visualization is provided by the following code:

# plotting results
main_title = 'comparision of various methods to define the first trading day of a month'
sub_title = f'instrument {ticker}  from {start_date} to {end_date} ' 
title = main_title + '<br><br><sup>' + sub_title + '</sup>'

fig = go.Figure()
fig.add_trace(go.Scatter(x=close_m1.index, y=close_m1, name='M1: simple resampling'))
fig.add_trace(go.Scatter(x=close_m2.index, y=close_m2, name='M2: advanced resampling'))
fig.add_trace(go.Scatter(x=close_m3.index, y=close_m3, name='M3: from weekday/holiday lists'))
fig.add_trace(go.Scatter(x=close_m4.index, y=close_m4, name='M4: from stock exchange schedules' ))

fig.update_layout(template='plotly_dark', autosize=False, width=1200, height=600)
fig.update_layout(title=title, xaxis_title='Date', yaxis_title=f'close of {ticker}', legend_title='method')
fig.show()

Explanation:

the code is self-explaining and uses standard plotly functionalities

If we plot now all 4 time-series, we can see, if the methods deliver identical results. If no, we should see deviations (in case some dates differ)

This is the plot for all the 4 time-series:

Fig.3: close-price series for methods M1, M2, M3 and M4.

Not surprisingly, M1 returns slightly different curves than the others. Let’s skip the M1-curve:

Let’s explore the differences between the M3 and M2/M4 in 2022. By enlarging the plot with the plotly lens function, we can examine the details more closely:

Fig.5: Deviation around 3rd and 4th of October 2022

Indeed, the 3rd of October, which is German Unity Day and a public holiday in Germany, was a working day for the Frankfurt Stock Exchange. This is confirmed by the official trading calendar published on the Frankfurt Stock Exchange’s website.

Fig.6: Official schedule of working days from the Frankfurt stock exchange (FSE) homepage, confirming that German reunification day 2022 was a workday at the FSE

With this detail everything fits together, so methods M2, M3, M4 delivery consistent results.

Conclusion

Simply resampling a pandas data frame with daily OHLC data on a monthly basis may yield inaccurate values. Thus, more sophisticated methods must be employed. Depending on the problem’s nature, advanced resampling of the OHLC data-frame, or utilizing Python’s holiday module in conjunction with pandas’ date_range() function, or directly downloading and processing the stock exchange’s schedule might be effective solutions.

In our short study we showed the consistency of the proposed methods, using data of the German DAX-Index (starting January 1sth 2020) . one observed difference could be explained.

Resampling of Financial Time-Series — an important point to consider

Problem Statement

Method 2: advanced resampling

Method 3: Working with calendar functions

Method 4: Use of Schedules of Stock Exchanges

Results

Conclusion

Written by Tobi Lux