Train our Agent using our environment — Bitcoin Binance trading

Mathieu Cesbron
2 min read · Dec 14, 2019


Our objective is to train an agent to trade on the Binance environment created in the last article. Let's see if it can make money trading Bitcoin. All of the code for this article is available on my GitHub.

In this article we will be using OpenAI's gym and the PPO agent from the stable-baselines library. We already have an env.py file which contains our RL environment, a static.py file which contains all of our constants (like the fees paid), and a .csv file which contains the past Bitcoin data that we created in an earlier article.
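For reference, static.py is just a place for constants that the environment imports. A minimal sketch could look like this; the exact names and values are assumptions, check the repository for the real ones:

# static.py -- constants used by the environment (names and values are illustrative)
FEES = 0.001            # assumed Binance spot trading fee per trade (0.1%)
INITIAL_BALANCE = 1000  # assumed starting balance in USDT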

We will create a main.py file that we will run to train our agent.

The main.py file

First, let's import what we will need for our environment; we will explain each import afterwards:

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2
from env import CryptoEnv
import pandas as pd
import os

Then we read the data that we have put in a data folder:

df = pd.read_csv('data/BTCUSDT.csv', index_col=0)
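If you want to make sure the data loaded correctly, a quick sanity check helps; the exact column names depend on how the CSV was built, so treat them as assumptions:

# Quick look at the data we just loaded
print(df.shape)
print(df.head())  # expecting OHLCV-style columns, e.g. open/high/low/close/volume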

We have imported the environment we created (CryptoEnv). The way to instantiate it is not really intuitive: the stable-baselines agents expect a vectorized environment, so we wrap our single environment in a DummyVecEnv:

env = DummyVecEnv([lambda: CryptoEnv(df)])

We create our agent, which will try its best to trade on our environment. We can change parameters like gamma or the learning rate to get better results.

# Instantiate the agent
model = PPO2(MlpPolicy, env, gamma=1, learning_rate=0.01, verbose=0)
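As a sketch, here are a few other PPO2 knobs from stable-baselines you could experiment with; the values below are arbitrary examples, not tuned ones:

# Alternative instantiation with a few more hyperparameters exposed (values are illustrative)
model = PPO2(
    MlpPolicy,
    env,
    gamma=0.99,           # discount factor
    learning_rate=0.0003,
    n_steps=256,          # steps collected per environment before each update
    ent_coef=0.01,        # entropy bonus, encourages exploration
    verbose=1,
)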

We then train the agent, by default for 500,000 timesteps (this can be overridden with the TOTAL_TIMESTEPS environment variable):

# Train the agent
total_timesteps = int(os.getenv('TOTAL_TIMESTEPS', 500000))
model.learn(total_timesteps)
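Training for that many timesteps takes a while, so it is worth saving the model once it is done. A minimal sketch, where the file name is just an example:

# Save the trained agent so we don't have to retrain it every run
model.save('ppo2_btcusdt')

# Later, reload it against the same environment
# model = PPO2.load('ppo2_btcusdt', env=env)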

And let's render the results to see whether we succeed in increasing our reward over time:

# Render the graph of rewards
env.render(graph=True)

After training, we need to check whether it can predict the market:

# Trained agent performance
obs = env.reset()
env.render()
for i in range(100000):
    action, _states = model.predict(obs)
    obs, rewards, done, info = env.step(action)
    env.render(print_step=True)
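If you also want a single number at the end instead of only the per-step rendering, a slightly extended version of the same loop can accumulate the rewards. A sketch; DummyVecEnv returns batched arrays, hence the [0] indexing:

# Trained agent performance with a running total of the reward
obs = env.reset()
total_reward = 0.0
for i in range(100000):
    action, _states = model.predict(obs)
    obs, rewards, done, info = env.step(action)
    total_reward += rewards[0]   # one reward per wrapped environment
    env.render(print_step=True)
print('Total reward over the evaluation run:', total_reward)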

Run main.py!

Let's look at the results of a short training run:

So after a few training runs, what I see is that the agent does very poorly at first because it is trading WAY TOO MUCH and paying the trading fees every time.

After ~50 episodes it understands that trading comes with a cost. After ~1000 episodes it barely trades anymore, as if it knows that it will just lose money most of the time.

That's great news! It means we have succeeded in creating an environment close to the real one. Surpassing 0% profit is the real challenge; for the moment we have only recreated the environment.

Stay tuned for the next article.
