Generating meaningful Bitcoin data for our RL agent to learn from

Mathieu Cesbron
Dec 12, 2019


We will use Binance data and customize it for our needs. All you need is a Binance account, which is free to create. Binance is one of the best trading platforms for finding cryptocurrency data.

All of the code for this article will be available on my GitHub.

Steps for generating data:

  • Add our Binance API keys
  • Pulling data from Binance
  • Normalize our data as we wish
  • Save it in a CSV file

Add our Binance API keys

After cloning the repository, we need to add our Binance keys. We can generate them on the Binance site, then add them to keys.py.

apiKey = ""
secretKey = ""

After adding them:

apiKey = "glebfxHdBtDR3RWt3hVRtHmvQaYduT3WodIRdKCOg51qwQsRWbn3aeS298GINGaM"
secretKey = "NSBJOldWu9xXARDCcJRdQ5C0uVth3Qx5bY4y9YyKJubvz7HPcbJL3Racao9xEo1o"

That’s it! Our keys are in place, and we are ready to connect to the API.
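If you would rather not keep the keys in a file inside the cloned repository, one possible variant of keys.py reads them from environment variables, in the same way main.py reads PAIR and SINCE. This is only a sketch: the variable names BINANCE_API_KEY and BINANCE_SECRET_KEY and the helper load_keys are hypothetical and not part of the original repository.

```python
import os


def load_keys(env=os.environ):
    """Return (apiKey, secretKey) from environment variables, or empty strings if unset."""
    # Hypothetical variable names, chosen for this example only.
    api_key = env.get("BINANCE_API_KEY", "")
    secret_key = env.get("BINANCE_SECRET_KEY", "")
    return api_key, secret_key


# Same names as in keys.py, so the rest of the script would not need to change.
apiKey, secretKey = load_keys()
```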

Pulling data from Binance

For all the steps below, we will work in main.py.
The first step is to import a handful of modules that you may already be familiar with; we will explain them as we go:

from binance.client import Client
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from matplotlib import pyplot
import pandas as pd
import keys
import os

In our case we want BTCUSDT (Bitcoin/Tether), but this script can create data for any pair available on Binance. We want the data from the last 2 years, but this can be changed if needed. The data has one row per minute, so that is approximately 1 million rows.

# Change these values to obtain the desired CSV
pair = os.getenv("PAIR", "BTCUSDT")
since = os.getenv("SINCE", "2 year ago UTC")
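As a quick sanity check on the "approximately 1 million rows" figure, the arithmetic can be sketched as follows. It assumes 365-day years and no gaps in the exchange's history, so the real row count may differ slightly.

```python
# Rough estimate of how many 1-minute candles a number of years contains.
# Assumption: 365-day years and no missing candles.
MINUTES_PER_DAY = 24 * 60
DAYS_PER_YEAR = 365


def expected_rows(years: int) -> int:
    """Return the number of 1-minute candles in the given number of years."""
    return years * DAYS_PER_YEAR * MINUTES_PER_DAY


print(expected_rows(2))  # 1051200
```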

To connect to our Binance account we use Client(apiKey, secretKey). We then fetch the historical klines (candlesticks) into df, a list of rows that we can work on.

# Connect to the Binance client
client = Client(keys.apiKey, keys.secretKey)

# Get the data from Binance
df = client.get_historical_klines(pair, Client.KLINE_INTERVAL_1MINUTE, since)

We want to normalize our data (scale it to between 0 and 1). However, we still want to keep the “Real open” and the “Real close” values, which will be useful to evaluate our bot.

# Store the open and close values in a pandas dataframe
# (in each kline row, index 1 is the open and index 4 is the close; index 3 is the low)
real_values = []
for i in df:
    real_values.append([i[1], i[4]])
real_df = pd.DataFrame(real_values, columns=["Real open", "Real close"])
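To make the indices concrete, here is a small sketch of how one kline row is laid out, assuming the 12-field order that the column names used later in this article follow (open time, open, high, low, close, and so on). The sample row is invented for illustration and is not real market data, and the helper real_open_close is hypothetical.

```python
# Positions of the fields we care about in one kline row.
# Assumption: rows follow the order Open time, Open, High, Low, Close, Volume, ...
OPEN_INDEX = 1
CLOSE_INDEX = 4  # index 3 would be the low, not the close

# Made-up sample row. Prices and volumes are given as strings here.
sample_kline = [
    1576108800000,  # Open time (ms)
    "7200.00",      # Open
    "7210.50",      # High
    "7195.25",      # Low
    "7205.75",      # Close
    "12.5",         # Volume
    1576108859999,  # Close time (ms)
    "90000.0",      # Quote asset volume
    42,             # Number of trades
    "6.0",          # Taker buy base asset volume
    "43000.0",      # Taker buy quote asset volume
    "0",            # Ignore
]


def real_open_close(kline):
    """Return (open, close) as floats from a single kline row."""
    return float(kline[OPEN_INDEX]), float(kline[CLOSE_INDEX])


print(real_open_close(sample_kline))  # (7200.0, 7205.75)
```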

Normalizing data is easy with sklearn. After this, all our data is between 0 and 1, except for the real open and the real close, of course.

# Normalize the data
normalizer = MinMaxScaler().fit(df)
df = normalizer.transform(df)
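For intuition, min-max scaling maps each column with x' = (x - min) / (max - min). The sketch below applies that formula by hand to one made-up column of prices and then inverts it, which shows why the real prices (or the column's min and max) are needed to get back from a normalized value to an actual price. It illustrates the formula only and is not a replacement for MinMaxScaler; the function names are hypothetical.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has no range; map it to 0.0 to avoid dividing by zero.
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original range [lo, hi]."""
    return [s * (hi - lo) + lo for s in scaled]


# Made-up closing prices, for illustration only.
closes = [7000.0, 7100.0, 7250.0, 7500.0]
scaled = min_max_scale(closes)
print(scaled)  # [0.0, 0.2, 0.5, 1.0]
```

Without the original minimum and maximum, a normalized close of 0.5 says nothing about the price the bot actually traded at.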

We then transform our data into a pandas DataFrame and drop the useless columns “Open time”, “Close time” and “Ignore”.

# Transform the data to a pandas DataFrame
df = pd.DataFrame(df,
                  columns=["Open time", "Open", "High", "Low", "Close", "Volume",
                           "Close time", "Quote asset volume", "Number of trades",
                           "Taker buy base asset volume",
                           "Taker buy quote asset volume", "Ignore"])

# Drop useless columns
df.drop(["Open time", "Close time", "Ignore"], axis=1, inplace=True)

Plot the data! It gives us a good idea of whether our generation was successful.

# Plot the data
df.plot()

We then concatenate our two dataframes. We also need to drop the last timestep, because that candle may not be finished yet.

# Add the real open and the real close to df
df = pd.concat([df, real_df], axis=1)

# Remove the last timestep
df = df[:-1]
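The script drops the last row unconditionally. A slightly more defensive variant, sketched here as an assumption and not as what main.py does, would work on the raw kline rows and drop the last candle only when its close time is still in the future. It assumes the close time is the 7th field of each row, expressed in milliseconds since the Unix epoch; the helper drop_unfinished is hypothetical.

```python
import time

# Assumption: close time (ms since epoch) is the 7th field of a kline row.
CLOSE_TIME_INDEX = 6


def drop_unfinished(klines, now_ms=None):
    """Return klines without a trailing candle whose close time is still in the future."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if klines and klines[-1][CLOSE_TIME_INDEX] > now_ms:
        return klines[:-1]
    return klines


# Two made-up one-minute candles (only the time fields matter here).
rows = [
    [0, "1", "1", "1", "1", "1", 59999],
    [60000, "1", "1", "1", "1", "1", 119999],
]
print(len(drop_unfinished(rows, now_ms=90000)))  # 1
```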

Our CSV file will be saved in the datafile folder, and at the end we show the plot.

# Add to the csv file
df.to_csv("datafile/" + pair + ".csv")

# Show the plot
pyplot.show()
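Writing the CSV fails if the datafile folder is missing, for example if it was not part of your clone. A small helper along these lines, which is hypothetical and not in the original script, could create the folder on demand and build the output path:

```python
import os


def csv_path(pair, folder="datafile"):
    """Create the output folder if needed and return the CSV path for a pair."""
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, pair + ".csv")
```

The result could then be passed to the save call, as in df.to_csv(csv_path(pair)).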

Test our script

Running the script can take a while: for 2 years of data, I needed 15 minutes.

Running main.py gives us a plot:

But above all it gives us a precious csv file in the folder datafile that we will use for our trading agent.

The next article will create our custom OpenAI Gym environment, made for a Binance trading agent.
