【Application】TQuant Lab ARIMA-GARCH Strategy: Using Time Series Models for Stock Prediction and Trading Strategies

Photo by BoliviaInteligente on Unsplash

  • Article Difficulty:★★★☆☆
  • Use the Custom Factor feature of Pipeline to write unit root tests and the ARIMA-GARCH model.
  • Based on the predicted returns, generate trading signals to determine entry and exit points.
  • To assess risk and performance, write and backtest trading strategies on the TQuant Lab backtesting platform.

Preface

Accurate price prediction and effective trading strategies are essential to successful investments in financial markets. This article introduces a strategy for stock price prediction using the ADF test (Augmented Dickey-Fuller Test) and the ARIMA-GARCH model. The ADF test assesses the stationarity of a time series, while the ARIMA-GARCH model combines the Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models to capture both the mean and the volatility of stock returns. This article details the implementation methods and logic of the strategy and discusses its potential in practical applications.

ARIMA-GARCH Strategy

First, a time series is a data structure that is ordered along a time axis, showing how historical data changes over time. Time series models, such as the ones we will discuss, use this data structure to analyze patterns and trends in the data. They then create models that fit these characteristics, enabling us to predict future movements with a certain degree of accuracy.

Let’s start by introducing the ADF test and the ARIMA-GARCH model, which will be used in the strategy.

ADF Test

The Augmented Dickey-Fuller (ADF) test is a statistical method used to test whether a time series has a unit root, indicating non-stationarity. If a unit root is present, it implies that the time series is non-stationary and exhibits random walk characteristics. The ADF test addresses autocorrelation issues by incorporating lagged difference terms, thus enhancing the test’s effectiveness. The primary null hypothesis of the test is that the time series has a unit root (non-stationary), while the alternative hypothesis is that the time series does not have a unit root (stationary). Performing the ADF test helps analysts determine whether differencing or other treatments are needed to achieve stationarity, making the series suitable for further statistical analysis or modeling.
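
To make the idea concrete, here is a minimal sketch of the regression behind the test. It is a simplified Dickey-Fuller test, without the lagged difference terms that the augmented version adds, and it is written in plain Python so that it runs without extra packages. The function name `dickey_fuller_stat` and the simulated data are our own illustration, and -2.86 is the approximate large-sample 5% critical value for a regression with a constant. In practice one would typically call a library routine such as `adfuller` from `statsmodels.tsa.stattools` instead.

```python
import math
import random
from itertools import accumulate

def dickey_fuller_stat(series):
    """t-statistic of rho in: (y_t - y_{t-1}) = c + rho * y_{t-1} + e_t."""
    lagged = series[:-1]                                         # y_{t-1}
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]     # y_t - y_{t-1}
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    rho = sxy / sxx                                              # OLS slope
    intercept = mean_d - rho * mean_x
    sse = sum((d - intercept - rho * x) ** 2 for x, d in zip(lagged, diffs))
    std_err = math.sqrt(sse / (n - 2) / sxx)                     # standard error of rho
    return rho / std_err

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]   # stationary by construction
walk = list(accumulate(noise))                     # random walk: has a unit root

CRITICAL_5PCT = -2.86  # approximate 5% critical value (constant, no trend)
stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
print(f"white noise: t = {stat_noise:.2f}, reject unit root: {stat_noise < CRITICAL_5PCT}")
print(f"random walk: t = {stat_walk:.2f}, reject unit root: {stat_walk < CRITICAL_5PCT}")
```

A strongly negative statistic (below the critical value) rejects the unit root hypothesis, which is what we expect for the white noise series but not for the random walk.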

ARIMA-GARCH Model

The ARIMA-GARCH model is a hybrid model that combines the ARIMA (Autoregressive Integrated Moving Average) and GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models for modeling and forecasting time series data. The ARIMA model handles the mean dynamics of the data, while the GARCH model captures the volatility of the data. This combination is particularly suitable for financial data as it can simultaneously describe the data’s mean and volatility clustering characteristics, thereby providing more accurate predictions and risk assessments.
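
One common way to write the combined model is shown below, with generic orders (p, q) for the mean equation and a GARCH(1,1) variance equation. The symbols are notation chosen for this sketch and are not taken from the strategy code; r_t denotes the (possibly differenced) return series.

```latex
\begin{aligned}
r_t &= c + \sum_{i=1}^{p} \phi_i \, r_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t, \\
\varepsilon_t &= \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1), \\
\sigma_t^2 &= \omega + \alpha \, \varepsilon_{t-1}^2 + \beta \, \sigma_{t-1}^2 .
\end{aligned}
```

The first line is the ARMA mean equation, and the last line lets the variance of the error term depend on the previous squared error and the previous variance.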

Autoregressive Integrated Moving Average Model (ARIMA)

The ARIMA model is a fundamental time series model with three main parameters: Autoregression (AR), Differencing (I, for "Integrated"), and Moving Average (MA).

  1. Autoregression (AR): This parameter determines how many past values from the series are used to predict the current or future values.
  2. Differencing (I): If the data shows a trend, differencing is used for data preprocessing. This parameter determines how many times differencing is needed.
  3. Moving Average (MA): This parameter determines how many past forecast errors (the deviations of earlier predictions from the observed values) are used to predict the current or future values.
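
The AR and I steps can be sketched in a few lines of plain Python. This is a minimal ARIMA(1,1,0) illustration on simulated data, fitted by ordinary least squares; the helper name `fit_ar1` and all numbers are assumptions made for the example, and the MA part is left out because it cannot be estimated with a simple regression.

```python
import random

def fit_ar1(x):
    """Least-squares estimate of (c, phi) in x_t = c + phi * x_{t-1} + e_t."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi

# Simulate a series whose first difference follows an AR(1) with phi = 0.6.
random.seed(1)
TRUE_PHI = 0.6
changes = [0.0]
for _ in range(2000):
    changes.append(TRUE_PHI * changes[-1] + random.gauss(0, 1))
levels = [100.0]
for change in changes[1:]:
    levels.append(levels[-1] + change)

# I: difference once (d = 1) to remove the stochastic trend.
diffed = [b - a for a, b in zip(levels[:-1], levels[1:])]
# AR: regress each change on the previous change (p = 1).
const, phi_hat = fit_ar1(diffed)
# Forecast the next change, then undo the differencing to get a level forecast.
next_change = const + phi_hat * diffed[-1]
next_level = levels[-1] + next_change
print(f"phi_hat = {phi_hat:.3f}, next level forecast = {next_level:.2f}")
```

The estimated coefficient should land close to the value used in the simulation, and the level forecast is simply the last observed level plus the forecast change.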

Generalized Autoregressive Conditional Heteroskedasticity Model (GARCH)

The GARCH model is used to analyze the error terms of time series data and is primarily used in finance to measure the volatility of assets or stock prices. In this context, the GARCH model will be used to examine the residuals of the ARIMA model and perform error correction. Unlike ARIMA’s AR and MA parameters, the GARCH model focuses on error terms and variance.
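
As a rough illustration of how a GARCH(1,1) model turns residuals into a time-varying variance, the sketch below applies the variance recursion to a short list of made-up residuals. The function name `garch11_variance` and the parameter values (omega, alpha, beta) are assumptions chosen for the example; in a real model they would be estimated from the data, typically by maximum likelihood.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances: s2[t] = omega + alpha * e[t-1]**2 + beta * s2[t-1]."""
    variances = [omega / (1.0 - alpha - beta)]   # start at the long-run variance
    for e in residuals[:-1]:
        variances.append(omega + alpha * e ** 2 + beta * variances[-1])
    return variances

# Hypothetical ARIMA residuals: calm, one large shock at t = 3, then calm again.
residuals = [0.5, -0.5, 0.5, 4.0, 0.5, -0.5, 0.5, -0.5]
sigma2 = garch11_variance(residuals, omega=0.1, alpha=0.1, beta=0.8)
for t, (e, s2) in enumerate(zip(residuals, sigma2)):
    print(f"t={t}  residual={e:+.1f}  conditional variance={s2:.3f}")
```

The variance jumps in the period after the large shock and then decays gradually, which is the volatility clustering behavior the model is designed to capture.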

Initial buy operation

  • Sort based on the predicted returns and select the top 3 stocks with the highest expected returns to add to the current stocks list.
  • Liquidate existing holdings and buy the three stocks with equal weight.

Continuous adjustment of the portfolio

  • Check the return rate of each stock in the portfolio. If the return rate is less than 0, sell the stock and add it to the ban_list.
  • Remove the stocks already held and those in the ban_list from the prediction results and select new high-return stocks to maintain the portfolio at three stocks.
  • Recalculate the weights and buy the newly selected stocks with equal weight.
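
The adjustment rules above can be sketched as a small pure-Python function. This is only an illustration of the logic, not the strategy's actual code: the function name `rebalance`, the `realized` dictionary of per-stock return rates, and the sample numbers are all assumptions.

```python
def rebalance(holdings, realized, forecasts, ban_list, size=3):
    """Return (new_holdings, weights) after applying the sell and refill rules."""
    kept = []
    for stock in holdings:
        if realized.get(stock, 0.0) < 0:
            ban_list.add(stock)          # losing position: sell and ban
        else:
            kept.append(stock)
    # Rank stocks that are neither held nor banned by forecast return, best first.
    candidates = sorted(
        (s for s in forecasts if s not in kept and s not in ban_list),
        key=forecasts.get,
        reverse=True,
    )
    new_holdings = kept + candidates[: max(0, size - len(kept))]
    weight = 1.0 / len(new_holdings) if new_holdings else 0.0
    return new_holdings, {s: weight for s in new_holdings}

ban_list = set()
holdings = ["2330", "2317", "2454"]
realized = {"2330": 0.02, "2317": -0.01, "2454": 0.0}           # made-up return rates
forecasts = {"2330": 0.004, "2317": 0.009, "2454": 0.001,
             "2412": 0.003, "2308": 0.006}                      # made-up forecasts
new_holdings, weights = rebalance(holdings, realized, forecasts, ban_list)
print(new_holdings, weights, ban_list)
```

In this example the losing stock is dropped and banned even though it has the highest forecast, and the free slot goes to the best-ranked stock that is neither held nor banned.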

Editing Environment and Module Requirements

This article uses Windows 11 and VSCode as the editor.

import os
import numpy as np
import pandas as pd
import matplotlib
import warnings
warnings.filterwarnings("ignore")

# tej_key
tej_key = 'your key'
api_base = 'https://api.tej.com.tw'
os.environ['TEJAPI_KEY'] = tej_key
os.environ['TEJAPI_BASE'] = api_base
start='2021-01-01'
end='2023-12-29'

Select Stock Pool and Trading Data Import

The data period is from January 1, 2021, to December 29, 2023. We select the 10 largest electronics companies by market capitalization as our stock pool for prediction and include the weighted return index IR0001 as a benchmark for comparison with the overall market.

os.environ['mdate'] = start + ' ' + end
os.environ['ticker'] = ' 2330 2317 2454 2412 2308 2303 3711 3045 2382 3008' + ' ' + 'IR0001'
!zipline ingest -b tquant

Create Pipeline function

Create Custom Factor function

The Custom Factor allows users to design the required custom factors. In this case, we use it to handle:

  • LogReturn, ADF test, and ARIMA-GARCH forecasting (details can be found in the ARIMA-GARCH Strategy section).
  • The window length for the ADF test is set to 91 days, and for the ARIMA-GARCH forecast, it is set to 90 days.
  • The ADF test is conducted on log returns to check for stationarity.
  • Before performing the ARIMA-GARCH forecast, if the data is not stationary, a first-order differencing is applied. This helps preserve the original data characteristics while improving the model’s predictive accuracy.
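
The preprocessing described in these bullets can be sketched as follows. The helper names `log_returns` and `prepare_for_model` are hypothetical, and the stationarity check is passed in as a function so that the sketch stays independent of any particular ADF implementation; in the strategy itself this logic lives inside the custom factors.

```python
import math

def log_returns(prices):
    """Log return r_t = ln(P_t / P_{t-1}); n prices give n - 1 returns."""
    return [math.log(b / a) for a, b in zip(prices[:-1], prices[1:])]

def prepare_for_model(returns, is_stationary):
    """Difference once only when the caller-supplied stationarity check fails."""
    if is_stationary(returns):
        return returns, 0
    return [b - a for a, b in zip(returns[:-1], returns[1:])], 1

prices = [100.0, 102.0, 101.0, 103.0, 106.0]      # made-up closing prices
rets = log_returns(prices)
# Force the differencing branch here; a real check would run the ADF test.
series, d = prepare_for_model(rets, is_stationary=lambda r: False)
print(rets)
print(series, d)
```

A convenient property of log returns is that they add up over time: the sum of the daily values equals the log return over the whole window.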

Pipeline() allows users to process quantitative indicators and price-volume data for multiple targets quickly. In this case, we use it to handle:

  • Importing stock prices and financial data.
  • Calculating the actual log returns of stocks.
  • Predicting log returns using the ARIMA-GARCH model.

Here’s an excerpt from the Pipeline content:

from zipline.pipeline import Pipeline, CustomFactor
from zipline.pipeline.data import TWEquityPricing
from zipline.pipeline.filters import StaticAssets
from zipline.TQresearch.tej_pipeline import run_pipeline
from zipline import run_algorithm
from zipline.api import *
from zipline.data import bundles

bundle_data = bundles.load('tquant')
asset_finder = bundle_data.asset_finder
benchmark_asset = asset_finder.lookup_symbol('IR0001', as_of_date=None)

def make_pipeline():
    # LogReturns and ARIMA_GARCH_Forecast are the custom factors described
    # above; their definitions are not part of this excerpt.
    return Pipeline(
        columns={
            'open': TWEquityPricing.open.latest,
            'close': TWEquityPricing.close.latest,
            'log_returns': LogReturns(),
            'fc_log_returns': ARIMA_GARCH_Forecast()
        },
        screen=~StaticAssets([benchmark_asset])  # exclude the benchmark index data
    )

pipeline = make_pipeline()
result = run_pipeline(pipeline, start, end)
result
Excerpt of the Pipeline output

Create initialize function

The initialize() function defines the trading environment before trading starts. In this example, we set up the following:

  • Slippage cost
  • Trading fees
  • Weighted return index (IR0001) as the benchmark
  • Strategy pipeline: will read the fc_log_returns column from the above Pipeline to determine trades
  • Set the context.current_stocks variable to record the held stocks
  • Set the context.ban_list variable to record stocks that cannot be bought

from zipline.finance import slippage, commission
from zipline.api import set_slippage, set_commission, set_benchmark, attach_pipeline, order_target_percent, symbol, pipeline_output, record, get_datetime, schedule_function, date_rules, time_rules

def initialize(context):
    context.current_stocks = []  # stocks currently held
    context.ban_list = set()     # stocks that cannot be bought
    set_slippage(slippage.VolumeShareSlippage())
    # 0.1425% brokerage fee plus half of the 0.3% securities transaction tax.
    # Note: these are percentage rates, while PerShare charges the amount per share traded.
    set_commission(commission.PerShare(cost = 0.001425 + 0.003 / 2))
    attach_pipeline(make_pipeline(), 'mystrats')
    set_benchmark(symbol('IR0001'))

Create handle_data function

handle_data() is an important function for building a trading strategy. It will be called daily after the backtest begins. The main tasks include setting up the trading strategy, placing orders, and recording trading information.

For the detailed trading rules of this strategy, please see ARIMA-GARCH.ipynb.

Here’s a portion of the code:

def handle_data(context, data):
    out_dir = pipeline_output('mystrats')
    # Remove rows without a forecast (NaN values)
    out_dir = out_dir.dropna(subset=['fc_log_returns'])

    # Initial buy: nothing is held yet
    if not context.current_stocks:
        buy_candidates = out_dir.sort_values(by='fc_log_returns', ascending=False).head(3)

        if buy_candidates.empty:
            print("No stocks selected for initial buying.")
            return
        current_date = get_datetime().strftime('%Y-%m-%d')
        print(f"Initial Rebalance on {current_date}")
        print(f"Initial Buy Candidates:\n{buy_candidates}")
        buy_weight = 1.0 / len(buy_candidates)
        # Liquidate any existing positions first
        for stock in context.portfolio.positions:
            order_target_percent(stock, 0)
        # Buy the candidates with equal weight
        for i in buy_candidates.index:
            sym = i.symbol
            close = out_dir.loc[i, "close"]
            fc_log_returns = out_dir.loc[i, 'fc_log_returns']

            print(f"Buying {sym} with weight {buy_weight:.2f}")
            order_target_percent(i, buy_weight)
            record(
                **{
                    f'price_{sym}': close,
                    f'fc_log_return_{sym}': fc_log_returns,
                    f'buy_{sym}': True
                }
            )
        context.current_stocks = buy_candidates.index.tolist()

Create analyze function

This is mainly used to visualize strategy performance and risk after backtesting. Here, we use Matplotlib to plot the portfolio value and benchmark value over time.

import matplotlib.pyplot as plt

capital_base = 100000  # initial capital
def analyze(context, results):
    plt.style.use('ggplot')
    fig = plt.figure()
    ax1 = fig.add_subplot(111)
    # Compound the benchmark's daily returns into a value series that starts at capital_base
    results['benchmark_cum'] = results.benchmark_return.add(1).cumprod() * capital_base
    results[['portfolio_value', 'benchmark_cum']].plot(ax = ax1, label = 'Portfolio Value($)')
    ax1.set_ylabel('Portfolio value (TWD)')
    plt.legend(loc = 'upper left')
    plt.gcf().set_size_inches(18, 8)
    plt.grid()
    plt.show()
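
The benchmark line in this chart comes from compounding daily returns: each day's value is the previous value multiplied by one plus that day's return. A tiny plain-Python sketch with made-up returns shows the same calculation that the pandas expression `add(1).cumprod() * capital_base` performs.

```python
from itertools import accumulate
from operator import mul

capital_base = 100000
benchmark_returns = [0.01, -0.02, 0.005]   # made-up daily returns

# Running product of (1 + r_t), scaled by the starting capital.
growth = accumulate((1 + r for r in benchmark_returns), mul)
benchmark_value = [capital_base * g for g in growth]
print([round(v, 2) for v in benchmark_value])
```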

Backtest the ARIMA-GARCH Strategy

Use run_algorithm() to execute the ARIMA-GARCH strategy configured above, setting the trading period from January 1, 2022, to December 29, 2023, with the tquant bundle and an initial capital of 100,000 TWD. The output contains the daily performance and transaction details.

import pytz
start = pd.Timestamp('2022-01-01', tz=pytz.UTC)
end = pd.Timestamp('2023-12-29', tz=pytz.UTC)

results = run_algorithm(
    start=start,
    end=end,
    initialize=initialize,
    handle_data=handle_data,
    analyze=analyze,
    capital_base=100000,
    data_frequency='daily',
    bundle='tquant'
)
Portfolio Value Comparison Chart
Partial Transaction Details Table

Performance Evaluation Using Pyfolio

from pyfolio.utils import extract_rets_pos_txn_from_zipline
import pyfolio as pf

# Extract returns, positions and transactions from the results table
returns, positions, transactions = extract_rets_pos_txn_from_zipline(results)
benchmark_rets = results.benchmark_return  # benchmark returns

# Plot all the charts provided by Pyfolio
pf.tears.create_full_tear_sheet(returns=returns,
                                positions=positions,
                                transactions=transactions,
                                benchmark_rets=benchmark_rets
                                )
Backtest Performance vs. Benchmark Comparison Chart

The above table shows that the ARIMA-GARCH strategy achieved an annualized return of 33.856% over two years, with an annualized volatility of approximately 25.573%. Additionally, the Sharpe ratio is 1.27, and the alpha value is 0.3, indicating that the ARIMA-GARCH strategy can generate substantial excess returns for investors under relatively controlled risk.
Because it calculates return rates on a rolling basis and updates its stock selection dynamically, the ARIMA-GARCH strategy can adapt to changing market conditions and look for entry points in different market phases. As the performance comparison chart shows, its outperformance was most pronounced during the bull market of 2023.
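
For readers who want to see how metrics like the ones quoted above are commonly derived from a series of daily returns, here is a minimal plain-Python sketch. It assumes 252 trading days per year and a risk-free rate of zero; the function names and sample returns are made up, and Pyfolio's exact conventions may differ in detail.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed number of trading days per year

def annual_return(daily):
    """Compound the daily returns, then scale the growth to one year."""
    total_growth = math.prod(1 + r for r in daily)
    return total_growth ** (TRADING_DAYS / len(daily)) - 1

def annual_volatility(daily):
    """Sample standard deviation of daily returns, scaled by sqrt(252)."""
    return statistics.stdev(daily) * math.sqrt(TRADING_DAYS)

def sharpe_ratio(daily):
    """Mean over standard deviation of daily returns, annualized (risk-free rate = 0)."""
    return statistics.mean(daily) / statistics.stdev(daily) * math.sqrt(TRADING_DAYS)

daily = [0.01, -0.005, 0.007, 0.002, -0.003, 0.004]   # made-up daily returns
print(f"annual return:     {annual_return(daily):.2%}")
print(f"annual volatility: {annual_volatility(daily):.2%}")
print(f"Sharpe ratio:      {sharpe_ratio(daily):.2f}")
```

Note that the Sharpe ratio here uses the arithmetic mean of daily returns, so it is not simply the annualized (compounded) return divided by the annualized volatility.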

Rolling Sharpe Ratio

The rolling Sharpe ratio chart shows that this strategy only started to perform exceptionally well from 2023 onwards. This may be because the overall market was bearish in 2022. Even so, the strategy still outperformed the benchmark in 2022.

Annual Return

From the annual return chart, we can also observe that most of the returns came from 2023, completely offsetting the losses from 2022.

Conclusion

During the backtesting period, this strategy demonstrated its unique trading logic and dynamic adjustment mechanism. By combining the ADF test and the ARIMA-GARCH model for predicting stock returns, the strategy can accurately select high-potential stocks for investment. The dynamic adjustment mechanism allows for periodic checking and rebalancing of holdings to adapt to market changes and manage risk. This approach effectively captures investment opportunities amid market volatility while maintaining robust risk management.
Overall, the strategy showcases its application value and potential in the market through its volatility model for time series and flexible adjustment mechanism. In the future, incorporating additional models or adjusting model settings could further optimize the strategy and enhance its accuracy and overall performance.

This strategy and the referenced targets are for informational purposes only and do not constitute financial or investment advice. In future discussions, we will introduce how to use the TEJ database to construct various indicators and backtest their performance. Therefore, readers interested in multiple trading backtests are welcome to consider purchasing relevant plans from TQuant Lab to build trading strategies tailored to their needs using a high-quality database.


TEJ 台灣經濟新報
TEJ-API Financial Data Analysis

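TEJ is Taiwan's largest local financial information company. Founded in 1990, it provides the information needed for fundamental analysis of financial markets, as well as solutions and consulting services in credit risk, regulatory technology, asset valuation, quantitative analysis, and ESG. As finance becomes increasingly diverse and complex, TEJ brings together top talent from industry and academia and is committed to developing new technologies such as machine learning, artificial intelligence (AI), and natural language processing (NLP) to keep delivering innovative services.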