Becoming a Trader Data Scientist: Transforming Bollinger Bands (Part 4)

By the end of this article, you will understand how to take a business concept from scratch and turn it into a variable for a machine learning model, illustrated with an applied case.

Mauricio Letelier

Published in Coinmonks · 9 min read · Apr 12, 2020

Data Science Project: Cryptocurrencies Part 1: Motivation

Data Science Project: Cryptocurrencies Part 2: Volume and Data Source

Data Science Project: Cryptocurrencies Part 3: Becoming a Trader Data Scientist

Intro

Today I’m going to introduce you to one of the variables that we will be using in our models. I will walk you through the whole process, from picking up a trading concept to creating an insightful variable. You will understand how to transform your business insights into real machine learning material.

Our data

The horizon that I would like to evaluate is a 5-minute data aggregation. This means that I will collect data at 5-minute intervals, with the standard OHLCV (open, high, low, close, volume) information. I ran into a little problem with Cryptocompare: I just found out that gathering more than a week of minute data requires an enterprise account. I love surprises!

So to solve this, we will be using cbpro, which we used in our first chapter. If you already read it, you know that we were facing the problem of the maximum of 300 data points per request. To fix that problem, I developed this method. When I first tried using it, I got an unspecific error (the story of my life).

I had no idea what it could be, but after trying a couple of useless things, the lightbulb turned on! Maybe there were too many requests, so I added a time.sleep(0.5) to leave some time between requests. Being as modest as I am: the method worked delightfully. Here it is in case you want to improve it. I took the data for the last 3 years of our beloved ETH-USD.
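In case the embedded method does not show up for you, here is a minimal sketch of the same idea: split the date range into windows of at most 300 candles and pause between requests. This is not the exact method I used. The names `iter_windows` and `fetch_candles` are made up for illustration, and `fetch` is a hypothetical callable that you would wrap around your data client.

```python
import time
from datetime import datetime, timedelta

MAX_POINTS = 300  # per-request candle limit mentioned above


def iter_windows(start, end, granularity, max_points=MAX_POINTS):
    """Yield (window_start, window_end) pairs that cover [start, end)."""
    step = timedelta(seconds=granularity * max_points)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


def fetch_candles(fetch, start, end, granularity=300, pause=0.5):
    """Collect candles window by window, pausing between requests.

    `fetch(start_iso, end_iso, granularity)` is a hypothetical callable that
    returns a list of candle rows whose first element is the timestamp.
    """
    rows = {}
    for window_start, window_end in iter_windows(start, end, granularity):
        batch = fetch(window_start.isoformat(), window_end.isoformat(), granularity)
        if not isinstance(batch, list):
            # Assumption: anything that is not a list is an error payload.
            raise RuntimeError(f"Unexpected response: {batch!r}")
        for row in batch:
            rows[row[0]] = row  # de-duplicate candles on window edges
        time.sleep(pause)  # space out requests to avoid rate-limit errors
    return [rows[key] for key in sorted(rows)]
```

With cbpro, `fetch` could be a small wrapper around the public client's `get_product_historic_rates` call for "ETH-USD". Check the argument names and the row layout against the version you have installed before relying on it.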

Finding the concept

Our first step is to do some quick research about which technical indicators are most used by traders. Ok, a simple Google search: “Best technical indicators for intraday trading.” And I found these in almost every article: MACD, Bollinger Bands®, RSI, and VWAP.

Proof that I actually clicked the articles. Search by Google.com.

Wait, I just realized that I never even told you what intraday means. Do you remember that we need time horizons? If you are looking to hold your position for hours or minutes, you are probably doing intraday trading. Besides, the shorter the time period that you want to maintain a position, the finer the data granularity that you should use.

This concept applies better to the regular stock market, because those markets actually close during nights and weekends. The goal of day traders is not to hold any position while the market is closed. In our case, we have the advantage (or pain) that the markets don’t close, so for us, intraday will just mean that we don’t want to hold positions for days.

Creating the features

Ok, we are all set! Our next step is constructing these indicators in our dataframe. These technical indicators are the business wisdom, so we will analyze how to use them to generate our variables. At first, I was considering building them on my own, as we did in the last chapter with the MA and the golden cross (beautiful times).

But this great package came to the rescue. I don’t want to imagine the tons of work it would have taken to build all of them on my own. The TA package has a method to construct all these indicators and attach them to the dataframe. The great thing about it is that all the parameters that we need to choose have their own default values, meaning that we barely need to think about anything to create them.

The other great thing is that the parameters can be changed, so in case we are eager to test different parameters for our analysis, we can do that too.

Ok, the indicators themselves are a significant step forward in our analysis. Still, we cannot take them raw as features. We need to understand their interpretation to get the most out of them. But how? As usual, I’m going to explain it to you with an example.

Bollinger Bands®

I know there are a lot of articles describing and creating Bollinger Bands®, and precisely because of that we are going to use them for our example. Those descriptions are generic and rarely materialize into a variable that could fit a machine learning model. They focus more on the observable things, and that’s not bad for a trader, but we need a little more.

Our goal is to transform trading concepts into variables that can fit in machine learning models. So, the example of Bollinger Bands® is my way of explaining to you a little bit of feature engineering based on business concepts.

As a quick definition, Bollinger Bands® use the volatility of the last periods, measured in standard deviations around a simple moving average (SMA), to set some kind of limits within which the price should stay. Here you can see an example.

Wow! Your confusion seems to have crossed the space-time barriers. I’m hearing your questions all the way from here (the past): “What do the cyan and gold points mean?” “Why are there colored areas?” “What is the Lower2 legend?” They are resonating in my ears. So to put your brain back inside your head, and to clean up the whole mess you made in space-time, I will answer all of those questions.

Gaining Insights!

First, a little about the concepts. In the chart, the Higher and Lower legends each have a number attached. That number is how many standard deviations the band sits from the SMA, so a band with a lower number is closer to the SMA. (The usual setup is a 20-period SMA and two standard deviations.)
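If you want to see what is behind those legends, here is a rough sketch of how the bands could be computed by hand with pandas. It is only an illustration under a few assumptions: the dataframe has a `close` column, the function name `add_bollinger_bands` is mine, and I use the population standard deviation (some implementations use the sample one, so the values can differ slightly from a library's output).

```python
import pandas as pd


def add_bollinger_bands(df, close_col="close", window=20, devs=(1, 2)):
    """Return a copy of df with SMA, Higher{k} and Lower{k} columns."""
    out = df.copy()
    sma = out[close_col].rolling(window).mean()
    # Population standard deviation (ddof=0); this is an assumption.
    std = out[close_col].rolling(window).std(ddof=0)
    out["SMA"] = sma
    for k in devs:
        out[f"Higher{k}"] = sma + k * std  # k standard deviations above the SMA
        out[f"Lower{k}"] = sma - k * std   # k standard deviations below the SMA
    return out
```

The first `window - 1` rows have no bands (NaN), because there is not enough history yet to fill the rolling window.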

Why is this important? Because if the close price is in the blue filled area, it is an indicator that the uptrend will continue; on the other side, if the price is in the red filled area, the downtrend will continue. The zones inside the bands that are not colored are neutral zones. In those, there’s no strong momentum or trend.

Let’s zoom (nasty word these days) into a downtrend.

Here, the points also make more sense. The points marked with cyan or gold are those where the close price goes beyond the bands. It looks like, in the downtrend, the close price oscillates between sitting in the red filled zone and escaping it a little bit. It also seems that the farther the point is from the bands, the more strongly the price tries to come back.

Let’s look at an uptrend.

It is the same but on the opposite side! This is great because now we can start thinking about our variable.

Transformation and Exploration!

Now is when the artistic part begins, where you can try the craziest functions as long as they replicate the behavior of the variable. The concepts suggest that when the close price is in the filled areas, it will go up in the blue one and down in the red one.

To try this out, we are going to use a simple dummy approach (1 if the close price is higher than the “Higher1” band, 0 otherwise). Having created the variable, which we will name D_BB_uptrend, let’s make our first analysis.
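As a small sketch, the dummy can be built in one line with pandas. I am assuming here that the dataframe already has a `close` column and a `Higher1` column; the helper name `add_uptrend_dummy` is just for illustration.

```python
import pandas as pd


def add_uptrend_dummy(df, close_col="close", band_col="Higher1"):
    """Return a copy of df with D_BB_uptrend: 1 if close > Higher1, else 0."""
    out = df.copy()
    # Rows where the band is still NaN (warm-up period) compare as False, so they get 0.
    out["D_BB_uptrend"] = (out[close_col] > out[band_col]).astype(int)
    return out
```

You may prefer to drop the warm-up rows instead of labeling them 0, so the model does not learn from rows where the band did not exist yet.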

The exciting part of creating this variable is understanding that we will try to predict an uptrend. We are assuming that the behavior of the crypto is the same regardless of where the price sits inside this filled band. For example, if it’s nearest to the first band, the variable will act the same as if it were nearest to the second band.

I made a method that determines what happens first: a 1% increase or a 1% decrease. The logic is the following: the method looks at the next 12 time periods (1 hour for 5-minute data) and checks whether the price crosses our defined threshold. There may also be no outcome, because the price might never close outside those bounds. Below you can see what happens if we analyze all of 2019. Only 20% of the cases had an outcome.
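Here is a minimal sketch of that "what happens first" logic, written from the description above rather than copied from my original method. The function name `first_touch_outcome` and the returned pair are my own choices: the label is 1 for the increase, -1 for the decrease and 0 for no outcome, and the second value is how many periods it took.

```python
def first_touch_outcome(closes, i, horizon=12, threshold=0.01):
    """Return (label, steps) for the close at position i.

    label: 1 if a close reaches +threshold first, -1 if a close reaches
    -threshold first, 0 if neither happens within `horizon` periods.
    steps: number of periods until the touch, or None if there is no outcome.
    """
    entry = closes[i]
    upper = entry * (1 + threshold)
    lower = entry * (1 - threshold)
    # Look only at the next `horizon` closes after position i.
    for step, price in enumerate(closes[i + 1 : i + 1 + horizon], start=1):
        if price >= upper:
            return 1, step
        if price <= lower:
            return -1, step
    return 0, None
```

For example, `first_touch_outcome([100, 100.5, 101.2, 98], 0)` returns `(1, 2)`: the second close after the entry is more than 1% above it, and the later drop no longer matters.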

Shocking! If the price crosses the 1% barrier soon, we have incredible odds! But this power seems to lose effect when the time distance increases.

If we create our second dummy, D_BB_Downtrend, defined as 1 if the close price is lower than the “Lower1” band and 0 otherwise, we should expect the -1 outcomes to increase. So, let’s see.

Wow! The same pattern, but with the odds calling for a negative outcome!

It looks like the variable can separate positive from negative outcomes depending on where in the bands the price is located. It means we already have two variables created from the concept of Bollinger Bands®, and they are looking good.

Testing Ideas

Suppose now that we look at our charts and come up with the firm belief that the nearer the close price is to the second band, the stronger the predictor will be. We can’t know if that feeling is right without testing, so let’s try that. Now we will define the Distance_From_H2 variable, which goes from 0 at the Higher1 band to 1 at the Higher2 band. The logic is as described below. If the close price is at the top of the zone, the value will be near 1, and if it is at the bottom, near 0.
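In code, that variable is just a linear rescaling between the two bands. This is a minimal sketch for single values; the function name `distance_from_h2` and the choice to return None when the bands collapse are my assumptions.

```python
def distance_from_h2(close, higher1, higher2):
    """Position of the close between Higher1 (0.0) and Higher2 (1.0)."""
    width = higher2 - higher1
    if width <= 0:
        return None  # bands collapsed (zero volatility): position is undefined
    return (close - higher1) / width
```

Note that the value goes above 1 when the close is beyond Higher2 and below 0 when it is under Higher1, so you may want to keep only the rows inside the zone before analyzing it.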

We can define a success rate, based on the probability of the cases in which our outcome was positive or negative. Let’s see how the success rate moves along ten bins of the variable.

Wow! We were wrong: there is no reliable evidence that the higher the distance, the higher the success rate. But on our way to testing that idea, we learned something! In the chart, the values in [0.3, 0.6] have the best success rates. That tells us that it is crucial to define bins for our variable and not to give the model just a continuous number.
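A chart like this one can be approximated with a short pandas sketch. Several things here are assumptions on my side: the `outcome` column name (holding 1, -1 or 0), the helper name `success_rate_by_bin`, and the definition of success as the share of +1 outcomes among the rows that had any outcome.

```python
import numpy as np
import pandas as pd


def success_rate_by_bin(df, feature="Distance_From_H2", outcome="outcome", n_bins=10):
    """Success rate and row count per equal-width bin of the feature in [0, 1]."""
    # Keep only rows with an outcome; 0 means neither threshold was reached.
    decided = df[df[outcome] != 0]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = pd.cut(decided[feature], bins=edges, include_lowest=True)
    success = (decided[outcome] == 1).astype(float)
    table = success.groupby(bins, observed=False).agg(["mean", "count"])
    return table.rename(columns={"mean": "success_rate", "count": "n"})
```

Looking at `n` next to the success rate matters: a bin with a handful of rows can show a great rate by pure chance.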

Try it yourself!

I will leave the fun part of this to you. You already know the process. Do you remember that we saw that if the point was too far from the second band, the price seemed to try to come back? Well, you can test that idea and add, for example, a variable “Upper distance from the band.”

You will have to figure out whether the distance is correlated with a reversal of the price by testing it and looking at charts. Maybe there’s a threshold: for example, when the close price is +/- 2% away from the bands, it may start coming back, and you will have to adapt your variable to that.
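To get you started, here is one possible starting point, and nothing more than that. The names `upper_distance_pct` and `beyond_threshold` are hypothetical, I measure the distance relative to the Higher2 band, and the 2% default is only the example threshold from above, not a tested value.

```python
def upper_distance_pct(close, higher2):
    """Signed distance of the close from the Higher2 band, as a fraction of the band."""
    return (close - higher2) / higher2


def beyond_threshold(close, higher2, threshold=0.02):
    """True if the close is at least `threshold` above the Higher2 band."""
    return upper_distance_pct(close, higher2) >= threshold
```

From here, the test is the same as before: check which outcome comes first for the rows where the flag is on, and compare it with the rows where it is off.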

That’s only one example, but you can see a lot of other things going on in the charts and in your business problems. I want to leave you with the “how do I translate this concept into a number” mindset. If you achieve that while dealing with your machine learning problems, your model performance can skyrocket.

That’s all for today, folks! We made another huge step toward creating our model: business feature engineering, as I love to name this process. Don’t get impatient. In 14 from now, we will continue this journey!