Alternative to pandas for algorithmic trading using Python

Jofre
4 min read · Mar 14, 2023


Introducing DataArrays to create and develop trading strategies

Photo by Maxim Hopman on Unsplash

Introduction

Being able to manipulate time series data from financial instruments is a must when developing trading strategies. pandas is the most widely used Python package for data processing; in this article we introduce the Xarray package as an alternative. We use its DataArray structure and show some of the advantages it offers for algorithmic trading.

Why Xarray?

Xarray is a package for labelled multi-dimensional arrays. It lets us store historical data from any financial instrument, or any time series, together with dimension names, coordinate labels and attributes, which makes for “a more intuitive, more concise, and less error-prone developer experience” when processing and visualizing data.

The package offers two primary data structures: DataArray and Dataset. In this article, we use DataArrays to compute some technical indicators.
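To make the difference between the two structures concrete, here is a minimal sketch on hypothetical toy numbers (the tickers, dates and values are illustrative only): a DataArray is one labelled N-dimensional array, while a Dataset groups several DataArrays that share the same coordinates.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: 3 days of closing prices and volumes for two tickers
dates = pd.date_range("2023-01-02", periods=3, freq="D", name="Date")
assets = ["AAPL", "MSFT"]
coords = {"Date": dates, "asset": assets}

# DataArray: one N-dimensional array plus dimension names and coordinate labels
close_da = xr.DataArray(
    np.array([[125.0, 239.0], [126.4, 241.0], [124.9, 236.5]]),
    dims=("Date", "asset"),
    coords=coords,
    name="close",
)
volume_da = xr.DataArray(
    np.array([[1_000, 2_000], [1_100, 2_100], [900, 1_900]]),
    dims=("Date", "asset"),
    coords=coords,
    name="volume",
)

# Dataset: a dict-like collection of DataArrays that share coordinates
ds = xr.Dataset({"close": close_da, "volume": volume_da})

print(close_da.sel(asset="MSFT").values.tolist())  # [239.0, 241.0, 236.5]
print(list(ds.data_vars))                          # ['close', 'volume']
```

In this article everything lives in a single DataArray, with the price type (field) as one of its dimensions; a Dataset would instead store each field as a separate variable.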

Tutorial

For the demonstration, the yfinance package is used to download financial data, from which we create the DataArrays used to calculate the indicators. First, we import the packages.

import yfinance as yf
import xarray as xr

Then, two years of daily prices are downloaded for four companies and one index fund (SPY, an ETF that tracks the S&P 500).

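# auto_adjust=False keeps the 'Adj Close' column (recent yfinance releases adjust prices by default and omit it)
df = yf.download(['GOOG', 'META', 'SPY', 'AAPL', 'MSFT'], period='2y', auto_adjust=False)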
df.head()
Image by Author. Historical stock prices
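If you want to follow along without network access, a hypothetical stand-in with the same layout can be generated locally. This is a sketch under the assumption that the download has a `Date` index and two-level columns (price field outside, ticker inside); the numbers are random walks, not market data, and every field is generated the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical offline stand-in for the yf.download(...) result. The numbers are
# random walks, NOT market data; only the layout mimics the download:
# a "Date" index and two-level columns (price field, ticker).
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2021-03-15", periods=60, name="Date")
tickers = ["AAPL", "GOOG", "META", "MSFT", "SPY"]
fields = ["Adj Close", "Close", "High", "Low", "Open", "Volume"]

columns = pd.MultiIndex.from_product([fields, tickers])
values = 100 + rng.normal(0, 1, size=(len(dates), len(columns))).cumsum(axis=0)
df = pd.DataFrame(values, index=dates, columns=columns)

print(df.shape)                 # (60, 30)
print(list(df.columns.names))   # [None, None]: the levels have no names yet
print(df["Adj Close"].columns.tolist())
```

The rest of the tutorial can then run on this `df` unchanged, starting with naming the column levels.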

As we can see, the date-time information is used as the DataFrame index. The columns have two levels: one for the price field (Adj Close, Close, Open, High, Low, Volume) and one for the instrument ticker (AAPL, GOOG, META, MSFT, SPY). To transform the DataFrame properly into a DataArray, we need to name the two column levels: the level holding the price fields will be called field, and the level holding the tickers will be called asset.

# name the two column levels (outer level: field, inner level: asset)
df.columns = df.columns.rename(["field", "asset"])
df.head()
Image by Author. Naming the column levels

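Now the DataFrame is passed to the DataArray constructor, which produces a two-dimensional array with the dimensions Date and dim_1, where dim_1 holds the two-level column labels. The unstack method then splits those levels into separate dimensions, field and asset, in the new DataArray.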

dxr = xr.DataArray(df)

# unstack data attributes
dxr = dxr.unstack("dim_1")
dxr
DataArray Transformation
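What `unstack` does here can be reproduced by hand, which may help when reading the output above. A minimal sketch on a hypothetical four-day frame, assuming the columns form a complete field × asset grid: sort the columns, then reshape the 2-D values into a 3-D array labelled by `Date`, `field` and `asset`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy frame laid out like the article's df after renaming the levels:
# "Date" index, two-level columns named ("field", "asset").
dates = pd.bdate_range("2023-01-02", periods=4, name="Date")
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["AAPL", "MSFT"]], names=["field", "asset"]
)
df = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4), index=dates, columns=columns)

# Sort the columns so they run field-major, asset-minor, then read the unique labels.
df = df.sort_index(axis=1)
fields = df.columns.get_level_values("field").unique()
assets = df.columns.get_level_values("asset").unique()

# The reshape is only valid if every (field, asset) pair appears exactly once.
assert len(fields) * len(assets) == df.shape[1]

# (Date, field*asset) -> (Date, field, asset)
values = df.to_numpy().reshape(len(df.index), len(fields), len(assets))
dxr = xr.DataArray(
    values,
    dims=("Date", "field", "asset"),
    coords={"Date": df.index, "field": fields, "asset": assets},
)

print(dxr.dims)  # ('Date', 'field', 'asset')
print(dxr.sel(field="Volume", asset="MSFT").values.tolist())  # [3.0, 7.0, 11.0, 15.0]
```

`xr.DataArray(df).unstack("dim_1")` takes care of this bookkeeping for us; the manual version just makes explicit that each (field, asset) column becomes one slice of the new three-dimensional grid.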

After this transformation we have a single three-dimensional DataArray whose dimensions are labelled by coordinates. In this example there are three: Date, field, and asset. Let's inspect them.

Coordinates Information

Each coordinate labels one dimension of the array. Because we gave the DataFrame's column levels proper names, this information is now presented in a more user-friendly way. More importantly, the labels can now be used directly to index and manipulate the data.
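Because the coordinates carry labels, selections can be written in terms of names instead of positions. A sketch on a hypothetical five-day cube with the same three dimensions as `dxr` (the values are arbitrary):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy cube (Date x field x asset) standing in for dxr; values are made up.
dates = pd.bdate_range("2023-01-02", periods=5, name="Date")
dxr = xr.DataArray(
    np.arange(5 * 2 * 3, dtype=float).reshape(5, 2, 3),
    dims=("Date", "field", "asset"),
    coords={"Date": dates, "field": ["Adj Close", "Volume"], "asset": ["AAPL", "MSFT", "SPY"]},
)

# Label-based selection on several coordinates at once
spy_close = dxr.sel(field="Adj Close", asset="SPY")            # 1-D over Date
tech = dxr.sel(field="Adj Close", asset=["AAPL", "MSFT"])      # 2-D: Date x asset
window = dxr.sel(Date=slice("2023-01-03", "2023-01-05"))       # date slice, both ends included

# Position-based selection still works when labels are not needed
last_row = dxr.isel(Date=-1)                                   # 2-D: field x asset

print(spy_close.shape, tech.shape, window.sizes["Date"], last_row.dims)
```

Note that label slices with `sel` include both endpoints, unlike Python's positional slicing.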

Working with DataArrays

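As an example, we will calculate the MACD on the Adj Close price, using simple moving averages for its 12-day and 26-day lines (the classic MACD uses exponential moving averages, so this is a simplified variant). So, how can specific data be selected from the DataArray?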

clos_p = dxr.sel(field='Adj Close')
clos_p
Selecting Adj. Close

The sel method selects data by label along one or more of the DataArray's dimensions. Here we selected Adj Close from the field dimension, so the new variable holds the adjusted close prices of every ticker in the original DataArray.
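For comparison, the equivalent pandas selection on the two-level DataFrame is a cross-section on one column level with `DataFrame.xs`. The sketch below uses hypothetical toy numbers and checks that both routes return the same Date × asset values; the xarray version names the dimension it selects on, instead of an axis and a level.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data in the article's layout (field x asset columns)
dates = pd.bdate_range("2023-01-02", periods=3, name="Date")
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close"], ["AAPL", "MSFT"]], names=["field", "asset"]
)
df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns)

# pandas: cross-section on the "field" level of the columns
clos_pd = df.xs("Adj Close", axis=1, level="field")

# xarray: the same cube built explicitly, then a label lookup by dimension name
dxr = xr.DataArray(
    df.to_numpy().reshape(3, 2, 2),
    dims=("Date", "field", "asset"),
    coords={"Date": dates, "field": ["Adj Close", "Close"], "asset": ["AAPL", "MSFT"]},
)
clos_xr = dxr.sel(field="Adj Close")

# Both hold the same numbers, laid out as Date x asset
print(np.array_equal(clos_pd.to_numpy(), clos_xr.values))
```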

Now, how do we calculate a simple moving average?

sma12 = clos_p.rolling(Date=12).mean()
sma12

The rolling and mean methods compute the simple moving average. A single statement applies them to every asset in the DataArray, so one line of code gives us the SMA for all assets.
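To see exactly what `rolling(...).mean()` returns, here is a sketch on hypothetical prices chosen so the averages can be checked by eye. It also shows that, by default, the first `window - 1` values are NaN, which is why the earliest entries of `sma12` (and of `sma26`) are missing.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical prices for two made-up assets over 5 days
dates = pd.bdate_range("2023-01-02", periods=5, name="Date")
clos_p = xr.DataArray(
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]],
    dims=("Date", "asset"),
    coords={"Date": dates, "asset": ["AAA", "BBB"]},
)

# 3-day SMA for every asset in one call. By default min_periods equals the
# window length, so the first two rows are NaN.
sma3 = clos_p.rolling(Date=3).mean()
# AAA column: NaN, NaN, 2.0, 3.0, 4.0
print(sma3.sel(asset="AAA").values)

# With min_periods=1 the early rows average whatever data is available
sma3_partial = clos_p.rolling(Date=3, min_periods=1).mean()
print(sma3_partial.sel(asset="AAA").values.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

Keeping the default (NaN until a full window is available) is usually the safer choice for indicators, since a partial-window average is not the same quantity.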

Let's finish with the MACD calculation:

sma26 = clos_p.rolling(Date=26).mean()
macd = sma12 - sma26
macd

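Finally, we get the MACD! Notice that the subtraction aligns the two DataArrays by their coordinate labels: values are only subtracted where the labels match in both arrays (the same date and the same asset), rather than combining every label of one array with every label of the other.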
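A small sketch of that alignment rule, using two hypothetical DataArrays whose asset labels only partly overlap and appear in a different order:

```python
import xarray as xr

# Two hypothetical DataArrays whose "asset" labels only partly overlap,
# and whose labels are in a different order
a = xr.DataArray([1.0, 2.0, 3.0], dims="asset", coords={"asset": ["AAPL", "MSFT", "SPY"]})
b = xr.DataArray([10.0, 20.0], dims="asset", coords={"asset": ["SPY", "AAPL"]})

# Arithmetic matches values by label, not by position; with the default
# arithmetic_join="inner", only labels present in both operands survive.
diff = a - b
print(dict(zip(diff.asset.values.tolist(), diff.values.tolist())))

# Operands with *different* dimension names are broadcast against each other instead.
c = xr.DataArray([100.0, 200.0], dims="field", coords={"field": ["Close", "Open"]})
print((a + c).sizes)
```

Here AAPL gives 1 - 20 and SPY gives 3 - 10, while MSFT is dropped because `b` has no MSFT label. If you prefer to keep the union of labels (with NaN where one side is missing), `xr.set_options(arithmetic_join="outer")` changes that default.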

Summary

This article presented DataArrays as an alternative structure for storing time series data, showed how to calculate technical indicators with them, and how to select and manipulate the data through its coordinates.
