Implement a stock trading strategy with Python: data exploration and code walkthrough

Published in Analytics Vidhya, Jun 10, 2020

In this tutorial we explore stock data and show how to analyze the performance of a custom trading system. This is not a suggestion to invest your money in stocks. The goal is to understand how such a strategy is implemented and what performance (positive or negative) it could have achieved. We will download the stock prices of two companies from Yahoo Finance, explore the data, and implement a simple trading system based on the SMA (simple moving average) and the tether line.

Data Preparation

The first step is to import our data. We will use pandas_datareader to download financial data from Yahoo Finance. In this tutorial we explore two companies whose stock prices behave differently, from January 2018 to the end of May 2020.

import numpy as np
import pandas as pd
from pandas_datareader import data as pdr

# Define the tickers: we want to look at Apple and Tesla.
tickers = ['AAPL', 'TSLA']
# Study period: 1 January 2018 to 30 May 2020 (ISO dates avoid day/month ambiguity).
start_date = '2018-01-01'
end_date = '2020-05-30'
# Use pandas_datareader's DataReader to load the prices from Yahoo Finance.
raw = pdr.DataReader(tickers, 'yahoo', start_date, end_date)
close = raw['Close']
# Reindex on every business day and carry the last known close over market holidays.
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
close = close.reindex(all_weekdays)
close = close.ffill().dropna()
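To see what the reindexing step does, here is a small sketch on made-up prices (the dates and values are illustrative only, not downloaded data). A market holiday has no quote, so the business-day calendar adds a row for it and forward filling copies the previous close into it.

```python
import pandas as pd

# Illustrative closes (made-up values): 2020-01-01, a market holiday, has no quote.
prices = pd.DataFrame(
    {'AAPL': [300.0, 297.0, 299.0]},
    index=pd.to_datetime(['2019-12-31', '2020-01-02', '2020-01-03']),
)
# freq='B' lists every Monday-to-Friday date, holidays included.
all_weekdays = pd.date_range(start='2019-12-31', end='2020-01-06', freq='B')
# Missing business days get the previous close; leading gaps (if any) are dropped.
filled = prices.reindex(all_weekdays).ffill().dropna()
print(filled)
```

The result has five rows: 2020-01-01 inherits the 300.0 close of 2019-12-31, and Monday 2020-01-06 (also missing from the toy data) inherits the 299.0 close of Friday.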

Define functions and strategy

The simple moving average (SMA) is the unweighted mean of the price over a specified window. Periods of 50, 100, and 200 days are commonly used to gauge longer-term trends in the market. These indicators are closely watched by market participants, and prices often react to the levels themselves: technical analysts tend to attach importance to a decisive break of a widely followed moving average.
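As a minimal illustration of the definition, the sketch below computes a 3-day SMA with pandas on toy prices (made-up values; the 3-day window is only there to keep the output short). It also shows the effect of min_periods=1, which the 14-day SMA in the code later in this article uses.

```python
import pandas as pd

# Toy closing prices (illustrative values, not real data).
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# 3-day SMA: mean of the current and the two previous closes.
sma3 = close.rolling(window=3).mean()
# With min_periods=1 the first values average whatever data is available.
sma3_partial = close.rolling(window=3, min_periods=1).mean()
print(sma3.tolist())          # [nan, nan, 11.0, 12.0, 13.0]
print(sma3_partial.tolist())  # [10.0, 10.5, 11.0, 12.0, 13.0]
```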

A crossover of a “fast” SMA above or below a “slow” SMA may also signal a change in trend. With 50- and 200-period moving averages, the 50-period SMA is considered fast because it responds more quickly to price, while the 200-period SMA is slow because it responds less. The 100-period SMA is slow relative to the 50-period one but fast relative to the 200-period one. In this example we use a fast SMA of 14 days.
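The crossover rule can be sketched as follows. This is a minimal illustration, assuming crossovers are detected on closing prices; the function name `crossover_signals` and the very short windows (2 and 4 days) are hypothetical choices to keep the toy output readable, not the 50/200 or 14-day settings discussed here.

```python
import pandas as pd

def crossover_signals(prices: pd.Series, fast: int, slow: int) -> pd.Series:
    """Hypothetical helper: +1 where the fast SMA crosses above the slow SMA,
    -1 where it crosses below, 0 otherwise."""
    fast_sma = prices.rolling(window=fast).mean()
    slow_sma = prices.rolling(window=slow).mean()
    # 1 while the fast SMA is above the slow one, 0 otherwise
    # (comparisons with NaN are False, so 0 while the slow SMA is undefined).
    above = (fast_sma > slow_sma).astype(int)
    # A change of state marks the crossover days.
    return above.diff().fillna(0).astype(int)

# Illustrative prices (made up): a decline, a rally, then a fall.
prices = pd.Series([10, 9, 8, 7, 8, 10, 12, 14, 12, 9, 6, 4], dtype=float)
signals = crossover_signals(prices, fast=2, slow=4)
print(signals[signals != 0])
```

The fast SMA crosses above the slow one on day 5 (+1) and back below it on day 9 (-1). The same "flag, then diff" pattern is what the code later in the article applies to the SMA14 and the tether line.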

The second concept we use is the tether line: the midpoint between the highest and the lowest price over the last 50 days. It owes its name to the tendency of stock prices to cluster around it: prices tend to move away from this midpoint and then return to it at some point in the future, so on a chart the price looks as if it were tethered to the line. (The code below computes it from closing prices over a 40-day window.)
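The tether line can also be computed without a custom function, using rolling maximum and minimum. This is a sketch of that vectorized alternative on made-up closing prices, with a 3-day window instead of the article's 40 days to keep the output short; unlike the article's `tether` function it does not round the result.

```python
import pandas as pd

# Illustrative closing prices (not real data).
prices = pd.Series([10.0, 14.0, 12.0, 8.0, 9.0, 11.0])
window = 3  # assumption for the example; the article's code uses 40

# Tether line: midpoint between the highest and lowest close in the window.
high = prices.rolling(window=window).max()
low = prices.rolling(window=window).min()
tether_line = (high + low) / 2
print(tether_line.tolist())  # [nan, nan, 12.0, 11.0, 10.0, 9.5]
```

For example, on the fifth day the window holds 12, 8 and 9, so the tether line is (12 + 8) / 2 = 10.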

We compute both indicators with the pandas .rolling(window) method. For the tether line we define a custom function and pass it to .rolling(window).apply().

In the code below we also compute whether the SMA14 is above the tether line, the daily return of the stock, and a column called TTH_diff: this is the indicator used to decide whether to buy or sell (+1 when the SMA14 crosses above the tether line, -1 when it crosses below).

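def tether(x):
    # Midpoint between the highest and lowest close in the window, rounded to the unit.
    return np.round((x.max() + x.min()) / 2, 0)

for name in tickers:
    # 14-day simple moving average ("fast" SMA); min_periods=1 averages fewer days in the first 13 rows.
    close[name+'_SMA14'] = close[name].rolling(window=14, min_periods=1).mean()
    # Tether line over a 40-day window (NaN for the first 39 rows).
    close[name+'_THT'] = close[name].rolling(window=40).apply(tether, raw=True)
    # 1 when the SMA14 is at or above the tether line, else 0 (also 0 while the tether line is NaN).
    close['SMA14>Tether_'+name] = (close[name+'_SMA14'] >= close[name+'_THT']).astype(int)
    close[name+'_Return'] = close[name].pct_change()
    # +1 when the SMA14 crosses above the tether line, -1 when it crosses below.
    close[name+'_TTH_diff'] = close['SMA14>Tether_'+name].diff()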
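To check the signal logic on a small case, here is a sketch with made-up SMA14 and tether-line values (hypothetical numbers, chosen only so that the two lines cross). It reproduces the flag and TTH_diff steps of the loop above. Because the flag is 0 while the tether line is still undefined, the first signal is always a buy, and buys and sells then alternate.

```python
import numpy as np
import pandas as pd

# Hypothetical SMA14 and tether-line values over eight days (tether undefined at first).
sma = pd.Series([10.0, 10.5, 11.0, 12.0, 11.5, 10.0, 9.5, 10.5])
tht = pd.Series([np.nan, np.nan, 11.5, 11.5, 11.0, 11.0, 10.0, 10.0])

# Same steps as in the loop: 0/1 flag, then its day-to-day change.
flag = (sma >= tht).astype(int)
tth_diff = flag.diff()
buy_order = tth_diff > 0   # SMA14 crossed above the tether line
sell_order = tth_diff < 0  # SMA14 crossed below the tether line
print(tth_diff[buy_order].index.tolist(), tth_diff[sell_order].index.tolist())  # [3, 7] [5]
```

The buy on day 7 has no sell after it in this toy data, which corresponds to a position that is still open at the end of the period.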

The plots for AAPL and TSLA are shown below: the blue line is the closing price, the green line is the 14-day SMA, and the red line is the tether line. The AAPL graph shows an overall upward trend from early 2018, with some downward periods.

The TSLA ticker is different: because of the sharp rise in the stock price in January 2020, the earlier data looks almost flat by comparison.

Both stocks show the effect of the coronavirus crash in March 2020, and we want to find out whether this strategy would have avoided losing too much money in it.

TSLA

Let’s start with TSLA. For this ticker we define buy_order and sell_order masks, which we use to calculate returns. Both are built from the TTH_diff column: we buy when it is greater than 0 (the SMA14 has just crossed above the tether line) and sell when it is less than 0 (the SMA14 has just crossed below it).

We also calculate a column “tot_return”, which is our baseline: what if we had invested all the money at the beginning and simply forgotten about it?

Then we create one column for each buy_order date, holding the cumulative return from that date onward.

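tickers = ['TSLA']
for tick in tickers:
    # Keep only the columns that belong to this ticker.
    df_tick = close.filter(like=tick).copy()
    # Entry signal: the SMA14 crossed above the tether line on this day.
    buy_order = df_tick[tick+'_TTH_diff'] > 0
    # Exit signal: the SMA14 crossed below the tether line on this day.
    sell_order = df_tick[tick+'_TTH_diff'] < 0
    # Baseline: buy on the first day and hold (cumulative sum of daily returns).
    df_tick['tot_return'] = df_tick[tick+'_Return'].cumsum()
    # One column per buy signal: cumulative return from that buy date onward.
    buy_dates = df_tick[buy_order].index
    for i, buy_date in enumerate(buy_dates):
        df_tick['Returns_'+str(i)] = df_tick.loc[buy_date:, tick+'_Return'].cumsum()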
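Both tot_return and the Returns_i columns add up daily returns with cumsum(). That is a reasonable approximation for small daily moves, but it is not the compounded growth of the money invested. A small check with made-up numbers (a +10% day followed by a -10% day) shows the difference; `(1 + r).cumprod() - 1` gives the compounded figure if exact returns are needed.

```python
import pandas as pd

# Hypothetical daily returns after a buy signal: +10%, then -10%.
daily = pd.Series([0.10, -0.10])

# Summing returns, as the Returns_i columns do (an approximation).
summed = daily.cumsum()
# Compounding returns: the actual growth of the money invested.
compounded = (1 + daily).cumprod() - 1
print(summed.iloc[-1], compounded.iloc[-1])
```

The summed return is 0%, while the compounded return is about -1%, since 1.1 × 0.9 = 0.99.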

Plotting the TTH_diff column as a line makes the entry and exit points of our strategy clear. There is a lot of jumpiness in 2018, and we can already see some shortcomings: while the price was rising from October 2019, strong oscillations produced several buy orders right after sell orders.

Now we evaluate the investment. Each Returns_i column accumulates returns from its buy date onward, so if we plot the dataframe as a heatmap using sell_order as a mask, the diagonal holds the return of each trade (buy i is closed by sell i). The sell dates are on the left axis.
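The heatmap is a visual way of reading off that diagonal. The same numbers can be obtained more directly by pairing each buy date with the first sell signal after it. This is a minimal sketch, where `trade_returns` is a hypothetical helper (not part of the original code) that assumes the three Series share one index, as they do in df_tick, and sums daily returns to match the Returns_i columns.

```python
import pandas as pd

def trade_returns(daily_returns, buy_order, sell_order):
    """Hypothetical helper: summed return of each completed trade, indexed by sell date.

    Assumes the three Series share the same index, as in df_tick.
    """
    buys = daily_returns.index[buy_order.to_numpy()]
    sells = daily_returns.index[sell_order.to_numpy()]
    results = {}
    for buy_date in buys:
        # The first sell signal after this buy closes the trade.
        later_sells = sells[sells > buy_date]
        if len(later_sells) == 0:
            continue  # position still open at the end of the data
        sell_date = later_sells[0]
        # Same convention as the Returns_i columns: sum of daily returns from buy to sell date.
        results[sell_date] = daily_returns.loc[buy_date:sell_date].sum()
    return pd.Series(results, dtype=float)

# Made-up example: one completed trade and one still-open position.
days = pd.date_range('2020-01-06', periods=7, freq='B')
daily = pd.Series([0.0, 0.01, 0.02, -0.01, 0.03, 0.01, -0.02], index=days)
buy = pd.Series([False, True, False, False, True, False, False], index=days)
sell = pd.Series([False, False, False, True, False, False, False], index=days)
trades = trade_returns(daily, buy, sell)
print(trades)
```

The trade opened on 2020-01-07 is closed on 2020-01-09 with a summed return of about 2%; the buy on 2020-01-10 has no later sell and is skipped.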

Heatmap with dataframe returns at sell points.

We can plot those returns in a more understandable way:

On the left, the return of each trade is plotted against its sell date. Our strategy started to work from mid-2019, but it did not avoid the major fall of March 2020. How does it compare to the simplest strategy: buy and hold?

We can now plot the total return of both the simplest strategy (green line) and our strategy (blue line). In this case buy and hold would have done better, but our strategy did not do too badly either.

What would have happened if we had invested a fixed amount and, at every trade, sold and bought back the whole balance?

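investment = 1000
returns = []
sell_rows = df_tick[sell_order].filter(like='Returns')
for col in sell_rows.columns:
    # Return of the trade opened at buy i, read at the first sell signal after it.
    closed = sell_rows[col].dropna()
    if closed.empty:
        # No sell signal after this buy yet: the position is still open.
        print(col, 0)
        continue
    sell_return = closed.iloc[0]
    # Reinvest the whole balance in every trade (compounding across trades).
    investment = investment * (1 + sell_return)
    returns.append(investment)
    print(col, sell_return, investment)
# Overall return of the strategy relative to the initial 1000.
print(investment / 1000 - 1)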
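The loop above multiplies the balance by (1 + r) for every closed trade. Here is a pure-Python sketch of that arithmetic with made-up trade returns (+20%, -10%, +5%), where `compound` is a hypothetical helper: 1000 × 1.2 × 0.9 × 1.05 = 1134, i.e. +13.4% overall.

```python
def compound(trade_returns, initial=1000.0):
    """Hypothetical helper: balance after reinvesting everything in each trade, in order."""
    balance = initial
    history = []
    for r in trade_returns:
        # Every trade scales the whole balance by (1 + r).
        balance *= 1 + r
        history.append(balance)
    return balance, history

# Made-up trade returns, not results from the article.
final, history = compound([0.20, -0.10, 0.05])
print(round(final, 2), round(final / 1000 - 1, 4))  # 1134.0 0.134
```

Since the balance is a product of factors, the order of the trades changes the path of the balance but not its final value.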

In the graph below we see that our strategy did not pay off at all until late 2019, when the stock price boomed. The major fall in March did not wipe out our earnings, and as of the last trading day (29 May 2020) our last buy order was up 12%.

APPLE

We can use the same code to calculate the returns for Apple, so let’s just look at the final graphs. Again, the left plot shows the return of each trade of our strategy, while the right plot compares the cumulative returns of both strategies. Our strategy sometimes outperforms the general trend: it avoids some major falls, but it also misses some major upward moves.

Conclusions

We analyzed and implemented a simple trading strategy in Python, together with some code to evaluate its returns. The results are good only when there is a clear trend in the stock price: the behaviour of TSLA in 2018–2019 suggests that without such a trend the strategy is not effective.

You can find the source code on GitHub.