【Quant】What Is Look-ahead Bias?

TEJ 台灣經濟新報
TEJ-API Financial Data Analysis
8 min read · Jul 18, 2023

Using Bollinger Bands to Reveal How Look-ahead Bias Affects a Trading Strategy

Photo by Emile Guillemot on Unsplash

Highlights

  • Difficulty:★★★☆☆
  • Automated trading with Bollinger Bands
  • Demonstrating how look-ahead bias affects trading results

Preface

Look-ahead bias occurs when an analysis or simulation of historical events unknowingly uses data that was not yet available or published at the time.
In other words, a decision or evaluation is made with information that could not have been known at that moment.
Look-ahead bias distorts results and makes them misleading because it violates the principle of using only the information available at the time of the analysis. It can appear in any field, including finance, economics, and data analysis, and it affects investment strategies, trading-system backtests, and performance evaluation.

Today’s practice demonstrates a typical case of look-ahead bias: backtesting a trading strategy with historical data that contains information about future price movements, even though that information had not yet been revealed at the point in the test where it is used. This leads to artificially inflated performance and unrealistic expectations of the strategy’s profitability.
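The scenario described above can be illustrated with a few rows of made-up prices. This is a minimal sketch, not the article's strategy: the prices, the threshold of 10.0, and the variable names are all hypothetical. It contrasts a fill at the same close that generated the signal with a fill at the next bar's open.

```python
import pandas as pd

# Hypothetical toy prices, not real market data.
prices = pd.DataFrame(
    {
        "open_d": [10.0, 10.5, 10.3, 10.2, 10.9],
        "close_d": [10.4, 9.9, 10.1, 10.8, 11.0],
    },
    index=pd.date_range("2023-01-02", periods=5, freq="D"),
)

# A signal computed from today's close is only known once the market has closed.
signal = prices["close_d"] < 10.0

# Biased: pretend the order fills at the very close that produced the signal.
biased_fill = prices.loc[signal, "close_d"]

# More realistic: the earliest tradable price is the next bar's open.
next_open = prices["open_d"].shift(-1)
realistic_fill = next_open[signal].dropna()

# Difference between the two fill prices, indexed by signal date.
fill_gap = realistic_fill - biased_fill
print(fill_gap)
```

Both series are indexed by the signal date, so the gap between them shows how much the biased backtest misstates the entry price on each signal.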

Programming Environment and Required Modules

This article uses macOS with Jupyter Notebook as the editor.

import pandas as pd 
import re
import numpy as np
import tejapi
from functools import reduce
import matplotlib.pyplot as plt
from collections import defaultdict, OrderedDict
from tqdm import trange, tqdm
import plotly.express as px
import plotly.graph_objects as go
tejapi.ApiConfig.api_key = "Your api key"
tejapi.ApiConfig.ignoretz = True

Database

  • Listed (OTC) adjusted stock price (daily), average price: TWN/AAPRCDA
  • Unadjusted (daily) technical indicators: TWN/AVIEW1

Import data

Taking YangMing Marine Transport Corporation (2609) over the period from 2021-06-01 to 2022-12-31 as an example, we build the Bollinger Band from the unadjusted closing price, BB-Upper(20), and BB-Lower(20), and then compare the strategy's return with the Market Return Index (Y9997).

stock_id = "6547"
gte, lte = '2020-01-01', '2023-06-30'

# Daily prices for the target stock
stock = tejapi.get('TWN/APRCD',
                   paginate = True,
                   coid = stock_id,
                   mdate = {'gte':gte, 'lte':lte},
                   opts = {
                       'columns':[ 'mdate', 'open_d', 'high_d', 'low_d', 'close_d', 'volume']
                   }
                  )

# Bollinger Band columns (upper, middle, lower) from the technical indicator table
ta = tejapi.get('TWN/AVIEW1',
                paginate = True,
                coid = stock_id,
                mdate = {'gte':gte, 'lte':lte},
                opts = {
                    'columns':[ 'mdate', 'bbu20', 'bbma20', 'bbl20']
                }
               )

# Market Return Index used as the benchmark
market = tejapi.get('TWN/APRCD',
                    paginate = True,
                    coid = "Y9997",
                    mdate = {'gte':gte, 'lte':lte},
                    opts = {
                        'columns':[ 'mdate', 'close_d', 'volume']
                    }
                   )

# Combine prices and indicators by date, and rename the market columns
data = stock.merge(ta, on = ['mdate'])
market.columns = ['mdate', 'close_m', 'volume_m']
data = data.set_index('mdate')

After acquiring the stock price and technical indicator data, we use plotly.express to visualize the Bollinger Band, as in the previous article. In the chart, bbu20 is the upper band, bbl20 is the lower band, and close_d is the closing price.

fig = px.line(data,
              x=data.index,
              y=["close_d","bbu20","bbl20"],
              color_discrete_sequence = px.colors.qualitative.Vivid
             )
fig.show()

YangMing Marine Transport Corporation(2609) Bollinger Band
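For readers without access to precomputed band columns, the bands can be approximated directly from closing prices. The sketch below assumes the common textbook definition, a 20-day simple moving average plus or minus two population standard deviations. The article does not state how TEJ computes bbu20 and bbl20, so the values may differ. The function name bollinger_bands, the output column names, and the random-walk prices are hypothetical.

```python
import numpy as np
import pandas as pd


def bollinger_bands(close: pd.Series, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Return middle, upper, and lower bands for a close-price series.

    Assumption: simple moving average +/- num_std population standard deviations.
    """
    ma = close.rolling(window).mean()
    sd = close.rolling(window).std(ddof=0)
    return pd.DataFrame(
        {"bbma": ma, "bbu": ma + num_std * sd, "bbl": ma - num_std * sd}
    )


# Hypothetical random-walk prices, used only to exercise the function.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 60).cumsum())
bands = bollinger_bands(close)
print(bands.dropna().head())
```

The first window - 1 rows are NaN because a full 20-day window is not yet available, which is itself a small guard against using data that did not exist at that point.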

Next, we implement two Bollinger Band trading strategies and compare the difference between them.

  1. As in the previous article: when the closing price touches the upper band, we sell the entire position at the next day's opening price; when the closing price touches the lower band, we buy 1 unit at the next day's opening price; when the buy condition is met again while we already hold a position, we still have enough cash, and the closing price is lower than the last purchase price, we buy one more unit.
  2. When the closing price touches the upper band, we sell the entire position at the same day's closing price; when the closing price touches the lower band, we buy 1 unit at the same day's closing price; when the buy condition is met again while we already hold a position, we still have enough cash, and the closing price is lower than the last purchase price, we buy one more unit.

Since the only difference between the two strategies is the transaction price, we modify the strategy code from the previous article. We define the strategy in def bollingeband_strategy and add an if condition controlled by a parameter, mode, that selects which strategy to run. When mode is True, strategy 1 runs; when mode is False, strategy 2 runs.

def bollingeband_strategy(data, principal, cash, position, order_unit, mode):
    trade_book = pd.DataFrame()

    for i in range(data.shape[0] - 2):
        cu_time = data.index[i]
        cu_close = data.loc[cu_time, 'close_d']
        cu_bbl, cu_bbu = data.loc[cu_time, 'bbl20'], data.loc[cu_time, 'bbu20']

        if mode:
            # Strategy 1: trade at the next day's opening price
            n_time = data.index[i + 1]
            n_open = data['open_d'].iloc[i + 1]
        else:
            # Strategy 2: trade at the same day's closing price (look-ahead bias)
            n_time = data.index[i]
            n_open = data['close_d'].iloc[i]

        if position == 0:  # entry condition
            if cu_close <= cu_bbl and cash >= n_open*1000:
                position += 1
                order_time = n_time
                order_price = n_open
                order_unit = 1
                friction_cost = (20 if order_price*1000*0.001425 < 20 else order_price*1000*0.001425)
                total_cost = -1 * order_price * 1000 - friction_cost
                cash += total_cost
                trade_book = pd.concat([trade_book,
                    pd.DataFrame([stock_id, 'Buy', order_time, 0, total_cost, order_unit, position, cash])],
                    ignore_index = True, axis=1)
        elif position > 0:
            if cu_close >= cu_bbu:  # exit condition
                order_unit = position
                position = 0
                cover_time = n_time
                cover_price = n_open
                friction_cost = (20 if cover_price*order_unit*1000*0.001425 < 20 else cover_price*order_unit*1000*0.001425) + cover_price*order_unit*1000*0.003
                total_cost = cover_price*order_unit*1000-friction_cost
                cash += total_cost
                trade_book = pd.concat([trade_book,
                    pd.DataFrame([stock_id, 'Sell', 0, cover_time, total_cost, -1*order_unit, position, cash])],
                    ignore_index = True, axis=1)
            elif cu_close <= cu_bbl and cu_close <= order_price and cash >= n_open*1000:  # add-on condition: touches the lower band and is not above the last purchase price
                order_unit = 1
                order_time = n_time
                order_price = n_open
                position += 1
                friction_cost = (20 if order_price*1000*0.001425 < 20 else order_price*1000*0.001425)
                total_cost = -1 * order_price * 1000 - friction_cost
                cash += total_cost
                trade_book = pd.concat([trade_book,
                    pd.DataFrame([stock_id, 'Buy', order_time, 0, total_cost, order_unit, position, cash])],
                    ignore_index = True, axis=1)

    if position > 0:  # close out any remaining position on the last day
        order_unit = position
        position = 0
        cover_price = data['open_d'].iloc[-1]
        cover_time = data.index[-1]
        friction_cost = (20 if cover_price*order_unit*1000*0.001425 < 20 else cover_price*order_unit*1000*0.001425) + cover_price*order_unit*1000*0.003
        cash += cover_price*order_unit*1000-friction_cost
        trade_book = pd.concat([trade_book,
            pd.DataFrame([stock_id, 'Sell',0, cover_time, cover_price*order_unit*1000-friction_cost, -1*order_unit, position, cash])],
            ignore_index=True, axis=1)

    # Each trade was stored as a column, so transpose to one row per trade
    trade_book = trade_book.T
    trade_book.columns = ['Coid', 'BuyOrSell', 'BuyTime', 'SellTime', 'CashFlow','TradeUnit', 'HoldingPosition', 'CashValue']

    return trade_book
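The fee expression is repeated several times in the strategy function above. As a possible refactoring, it could be pulled into one helper. This is a sketch only: the function name friction_cost and its parameter names are hypothetical, while the numbers (a 0.1425% fee with a minimum of 20, an extra 0.3% charged on sells, and 1000 shares per unit) are taken from the code above.

```python
def friction_cost(price: float, units: int, is_sell: bool,
                  fee_rate: float = 0.001425, min_fee: float = 20.0,
                  sell_rate: float = 0.003, shares_per_unit: int = 1000) -> float:
    """Transaction cost for one order, mirroring the inline expressions above."""
    notional = price * units * shares_per_unit
    # The fee has a floor: charge at least min_fee.
    fee = max(min_fee, notional * fee_rate)
    # The extra 0.3% applies only when selling.
    extra = notional * sell_rate if is_sell else 0.0
    return fee + extra


# Small buy order: the computed fee (14.25) is below the floor, so 20 is charged.
print(friction_cost(10, 1, is_sell=False))
# Sell order of 2 units at 100: fee plus the sell-side charge.
print(friction_cost(100, 2, is_sell=True))
```

Keeping the cost in one place makes it harder for the buy and sell branches to drift apart when the rates change.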

We have now built an automated trading strategy. bollingeband_strategy returns a transaction table that shows the details of each trade. We also define def simplify to condense this information for better readability.

def simplify(trade_book):
    trade_book_ = trade_book.copy()
    trade_book_['mdate'] = [trade_book.BuyTime[i] if trade_book.BuyTime[i] != 0 else trade_book.SellTime[i] for i in trade_book.index]
    trade_book_ = trade_book_.loc[:, ['BuyOrSell', 'CashFlow', 'TradeUnit', 'HoldingPosition', 'CashValue' ,'mdate']]
    return trade_book_

The final step is to calculate the performance of both strategies. The code in this part is largely the same as in the previous article. However, newer pandas versions no longer support the DataFrame append method, so we made a slight modification to keep the code working and wrapped it in def back_test.

def back_test(principal, trade_book_, data, market):
    cash = principal
    data_ = data.copy()
    data_ = data_.merge(trade_book_, on = 'mdate', how = 'outer').set_index('mdate')
    data_ = data_.merge(market, on = 'mdate', how = 'inner').set_index('mdate')
    # Fill missing values after the merge
    data_['CashValue'] = data_['CashValue'].ffill().fillna(cash).astype(float)
    data_['TradeUnit'] = data_['TradeUnit'].fillna(0).astype(float)
    data_['HoldingPosition'] = data_['TradeUnit'].cumsum()
    # Calculate strategy value and return
    data_["StockValue"] = data_['open_d'] * data_['HoldingPosition'] * 1000
    data_['TotalValue'] = data_['CashValue'] + data_['StockValue']
    data_['DailyValueChange'] = np.log(data_['TotalValue']) - np.log(data_['TotalValue']).shift(1)
    data_['AccDailyReturn'] = (data_['TotalValue']/cash - 1) *100
    # Calculate buy-and-hold return
    data_['AccBHReturn'] = (data_['open_d']/data_['open_d'].iloc[0] -1) * 100
    # Calculate market return
    data_['AccMarketReturn'] = (data_['close_m'] / data_['close_m'].iloc[0] - 1) *100
    # Calculate numerical output
    overallreturn = round((data_['TotalValue'].iloc[-1] / cash - 1) *100, 4)  # overall return
    num_buy, num_sell = len([i for i in data_.BuyOrSell if i == "Buy"]), len([i for i in data_.BuyOrSell if i == "Sell"])  # number of buys and sells
    num_trade = num_buy  # number of trades
    avg_hold_period, avg_return = [], []
    tmp_period, tmp_return = [], []
    for i in range(len(trade_book_['mdate'])):
        if trade_book_['BuyOrSell'][i] == 'Buy':
            tmp_period.append(trade_book_["mdate"][i])
            tmp_return.append(trade_book_['CashFlow'][i])
        else:
            # A sell closes every open buy; split its proceeds evenly across them
            sell_date = trade_book_["mdate"][i]
            sell_price = trade_book_['CashFlow'][i] / len(tmp_return)
            avg_hold_period += [sell_date - j for j in tmp_period]
            avg_return += [ abs(sell_price/j) -1 for j in tmp_return]
            tmp_period, tmp_return = [], []
    avg_hold_period_, avg_return_ = np.mean(avg_hold_period), round(np.mean(avg_return) * 100,4)  # average holding period, average return
    max_win, max_loss = round(max(avg_return)*100, 4) , round(min(avg_return)*100, 4)  # largest winning and losing trade returns
    winning_rate = round(len([i for i in avg_return if i > 0]) / len(avg_return) *100, 4)  # win rate
    min_cash = round(min(data_['CashValue']),4)  # minimum cash balance
    print('Overall return:', overallreturn, '%')
    print('Number of trades:', num_trade)
    print('Number of buys:', num_buy)
    print('Number of sells:', num_sell)
    print('Average return per trade:', avg_return_, '%')
    print('Average holding period:', avg_hold_period_ )
    print('Win rate:', winning_rate, '%' )
    print('Largest winning trade return:', max_win, '%')
    print('Largest losing trade return:', max_loss, '%')
    print('Minimum cash balance:', min_cash)
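To make the return columns in back_test easier to follow, here is the same arithmetic on a short, made-up portfolio value series. The four values are hypothetical; only the formulas (a log difference for the daily change and a percentage change relative to the principal for the accumulated return) follow the function above.

```python
import numpy as np
import pandas as pd

principal = 500_000
# Hypothetical end-of-day portfolio values (cash plus stock).
total_value = pd.Series([500_000.0, 505_000.0, 499_950.0, 520_000.0])

# Daily log return: log(V_t) - log(V_{t-1}); the first day has no prior value.
daily_log_return = np.log(total_value).diff()

# Accumulated return in percent, relative to the starting principal.
acc_return_pct = (total_value / principal - 1) * 100

print(daily_log_return)
print(acc_return_pct)
```

Log returns add up across days, so their sum equals the log of the final value divided by the initial value, which is a convenient consistency check on a backtest's bookkeeping.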

With the coding complete, we can compare the performance of both strategies on real data.

  1. Trade at the next day's opening price

principal = 500000
cash = principal
position = 0
order_unit = 0
trade_book = bollingeband_strategy(data, principal, cash, position, order_unit, True)
trade_book_ = simplify(trade_book)
back_test(principal, trade_book_, data, market)

Result: trade at the next day's opening price

  2. Trade at the same day's closing price

principal = 500000
cash = principal
position = 0
order_unit = 0
trade_book_cu_close = bollingeband_strategy(data, principal, cash, position, order_unit, False)
trade_book_cu_close_ = simplify(trade_book_cu_close)
back_test(principal, trade_book_cu_close_, data, market)

Result: trade at the same day's closing price

Comparing the results of the two strategies, trading at the same day's closing price yields better overall performance. That advantage is an artifact of the backtest. Novice market participants may take historical data and assume the closing price as the trading price, overlooking the fact that the closing price cannot be known in advance in the real market. Using information that was not known at the time of the trade constitutes look-ahead bias and distorts the backtesting results. It is therefore advisable to use the next day's opening price as the trading price, which reflects real trading conditions more closely.
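One way to catch this kind of mistake early is to record when each signal was generated and check that every fill happens strictly later. The sketch below is an illustration under assumptions: the article's trade_book does not store signal times, so the signal_time and fill_time columns, the function name fills_after_signal, and the sample rows are hypothetical.

```python
import pandas as pd


def fills_after_signal(trades: pd.DataFrame) -> pd.Series:
    """Return True for each trade whose fill time is strictly after its signal time."""
    return trades["fill_time"] > trades["signal_time"]


# Hypothetical trade log: the second trade fills on the same daily bar as its signal.
trades = pd.DataFrame(
    {
        "signal_time": pd.to_datetime(["2023-01-03", "2023-01-10"]),
        "fill_time": pd.to_datetime(["2023-01-04", "2023-01-10"]),
    }
)

ok = fills_after_signal(trades)
suspicious = trades[~ok]
print(suspicious)
```

With daily bars, a fill dated the same day as a close-based signal is a warning sign, since the close that triggered the signal is also the last price of that day.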

Conclusion

Through this simple trading backtest, we have demonstrated the presence of look-ahead bias in backtesting. It is not limited to trading; it is common throughout finance. To avoid look-ahead bias, historical analysis and decision-making must be based solely on the information available at that time. This means using historical data in a manner consistent with what was known in the past and excluding any information that only became available later. Being aware of look-ahead bias and handling data with caution is essential for maintaining the integrity and accuracy of statistical analysis and decision-making.

Last but not least, please note that the stocks mentioned in this article are for discussion only and should not be considered recommendations or suggestions for any investment or product. If you are interested in topics such as creating trading strategies, performance backtesting, and evidence-based research, you are welcome to purchase the plans offered in the TEJ E Shop and use its comprehensive database to build your own trading strategy.



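TEJ is Taiwan's largest local financial information company. Founded in 1990, it provides the information needed for fundamental analysis of financial markets, along with solutions and consulting services in credit risk, regulatory technology, asset valuation, quantitative analysis, and ESG. As the field of finance grows more diverse and complex, TEJ brings together top talent from industry and academia to develop new technologies such as machine learning, artificial intelligence (AI), and natural language processing (NLP), and continues to provide innovative services.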