
Google Trends Data for automated stock trading using Reinforcement learning.

During the COVID-19 lockdowns, most people were stuck at home. Many turned to investing, and retail investment rose. Cases like GameStop ("Gamestonk") show how large the impact of retail investing can be. These investors usually have no formal background in finance and are mostly self-taught using internet resources. Google is a dominant player in internet services, and most of these investors directly or indirectly pass through its channels when deciding which stocks to pick and how much to invest. As of June 2021, Google Search held 92.47% of the search engine market. So Google Trends can be a good proxy for the financial sentiment about a company.

So, what are we trying to achieve in this project?

  • Financial language is hard to reason about, and drawing sentiment from online reports given by CEOs and CFOs can be misleading because they tend to mask problems with euphemisms. Google Trends can instead serve as a good representation of what people are saying about the asset online.
  • Our main aim is to validate whether Google Trends data can improve the observability of an RL trading agent over plain OHLCV data.
  • Google Trends data only measures how much interest there is, not whether the trend is up or down. So we also explore the Chaikin Money Flow indicator, which captures the buying and selling pressure on an asset.
  • We also explore whether adding technical indicators to our OHLCV data gives better results than trends data.


  • The link to the codebase can be found below
  • An RL agent has 3 main components: the state sₜ is the agent's current observation, the action aₜ is what the agent does in the environment, and the reward rₜ is what the agent receives as a result of that action. The agent interacts with the environment sequentially and accumulates reward until the end of the episode. Its aim is to maximize the reward over the episode. Here the rewards are the daily returns that the agent accumulates from its actions.


  • The state space represents the information the agent has at time t. We assume an MDP formalization. The state is sₜ = (vₜ, nₜ, cₜ, uₜ), where vₜ is the cash value, nₜ is the number of shares owned, cₜ is the closing price of the asset and uₜ is the vector of user-defined features at time t. For an individual stock, its dimension is 3 + |uₜ|: one entry each for vₜ, nₜ and cₜ, plus the user-defined features. The total portfolio value is vₜ + nₜ·cₜ.
  • The user-defined features vary across the test cases in our experiment. We use the Chaikin Money Flow indicator, Google Trends data and other technical indicators. The other technical indicators are MACD (Moving Average Convergence Divergence), RSI (Relative Strength Index), CCI (Commodity Channel Index) and DX (Directional Movement Index).
  • Chaikin Money Flow indicator (cmfₜ): It represents the buying and selling pressure in the current market. -1 ≤ cmfₜ ≤ +1, where negative values indicate selling pressure and positive values indicate buying pressure.
  • Google Trends data (Pₜ): It is a score based on the search volume of a keyword on Google search services. 0 ≤ Pₜ ≤ 100, where a higher score represents higher search volume. We leave it unscaled to avoid any data leakage problem. The trends data was downloaded by specifying the category of the asset's keyword. Specifying the category is necessary because Apple can be a technology company or a fruit. Similarly, Amazon can be an internet company or a forest.
The categories for downloading the trends data. The source can be found here
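To make the state definition and the CMF feature concrete, here is a minimal sketch in plain Python. It uses the standard Chaikin Money Flow formula (money flow volume summed over a window, divided by total volume over the same window). The 20-bar default window, the function names and the handling of flat bars are assumptions for illustration; they are not taken from the project's codebase.

```python
def chaikin_money_flow(high, low, close, volume, window=20):
    """Return CMF values per bar; None until `window` bars are available.

    `window=20` is an assumed default, not a value stated in the article.
    """
    money_flow_volume = []
    for h, l, c, v in zip(high, low, close, volume):
        price_range = h - l
        # Money flow multiplier lies in [-1, 1]; treat a flat bar (high == low) as 0.
        multiplier = ((c - l) - (h - c)) / price_range if price_range else 0.0
        money_flow_volume.append(multiplier * v)

    cmf = []
    for i in range(len(money_flow_volume)):
        if i + 1 < window:
            cmf.append(None)  # not enough history yet
            continue
        start = i + 1 - window
        volume_sum = sum(volume[start:i + 1])
        flow_sum = sum(money_flow_volume[start:i + 1])
        cmf.append(flow_sum / volume_sum if volume_sum else 0.0)
    return cmf


def build_state(cash, shares, close_price, features):
    """Assemble s_t = (v_t, n_t, c_t, u_t) as a flat list of length 3 + len(features)."""
    return [cash, shares, close_price, *features]


# Example: one stock with two user-defined features (a CMF value and a trends score).
state = build_state(100_000.0, 0, 150.0, [0.12, 55.0])
```

Because both the multiplier and the volume weighting are bounded, the resulting CMF stays within [-1, +1], which matches the range given above.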


  • The action space consists of the discrete values {-1, 0, +1}. Here +1 means buy one share of the asset, -1 means sell one share and 0 means hold.
  • We use PPO (Proximal Policy Optimization), an actor-critic method with an actor network and a critic network. The actor network outputs actions drawn from a Gaussian distribution. These outputs were discretized by multiplying by 1 and taking the floor value.
  • The reward is the discounted sum of daily returns over the entire episode, with a discount factor (gamma) of 0.99.
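The action and reward definitions above can be sketched as a single-share trade step plus a discounted-return helper. This is an illustrative sketch, not the FinRL environment: the function names are hypothetical, the 0.001 cost rate is the transaction cost the article reports for its environment, and the rules "no short selling" and "skip a buy when cash is insufficient" are assumptions.

```python
COST_RATE = 0.001  # transaction cost rate reported in the article


def step(cash, shares, price, action, cost_rate=COST_RATE):
    """Apply one action from {-1, 0, +1} and return the new (cash, shares)."""
    if action == 1:
        total_cost = price * (1 + cost_rate)
        if cash >= total_cost:  # assumption: skip the buy if cash is insufficient
            cash -= total_cost
            shares += 1
    elif action == -1 and shares > 0:  # assumption: no short selling
        cash += price * (1 - cost_rate)
        shares -= 1
    return cash, shares


def portfolio_value(cash, shares, price):
    """Total portfolio value v_t + n_t * c_t."""
    return cash + shares * price


def discounted_return(rewards, gamma=0.99):
    """Discounted sum r_0 + gamma * r_1 + gamma^2 * r_2 + ... over an episode."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

Iterating backwards over the rewards avoids computing explicit powers of gamma and keeps the helper linear in the episode length.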


  • The pipeline uses the following open-source libraries: (i) Stable Baselines3 for the RL agents, (ii) Optuna for hyperparameter tuning in the validation period, (iii) FinRL for the stock market trading environment and other implementation details, and (iv) Yahoo Finance to download trading data.
The pipeline for our implementation
  • Next is our time frame. We have 2 sets of assets, crypto and non-crypto. Non-crypto stocks include Apple, Microsoft, Amazon, Facebook, Google and Tesla. Crypto assets include Bitcoin, Ethereum and Dogecoin.
  • The time frame for the non-crypto stocks is:
Non-crypto stocks
  • For the crypto assets, the dates were picked based on their respective inception years. Crypto markets are always open, so they provide more data than stock markets, which are open for only 252 days in a given year. For Bitcoin, the data range is:
Bitcoin range
  • The ranges for Ethereum and Dogecoin are listed on the GitHub project page.
  • The testing environment is implemented as follows. We start each train, validation and testing phase with an initial amount of $100,000. The transaction cost for both buying and selling, which includes the slippage cost, is set at 0.001. There is no out-of-cash penalty: since we only buy or sell one share at a time, it is highly unlikely that we will run out of cash.
  • Next, for hyperparameter optimization, we specify the search space of the PPO algorithm over the entropy coefficient, episode length, learning rate and batch size. For more details, follow my previous blog post
The search space for hyperparameter optimization
  • Our model is fine-tuned in the validation phase, and the optimal set of hyperparameters is chosen by setting the tuning objective to maximize the Sharpe ratio.
  • We then explore multiple test cases and ablation studies:
Different test cases that we use for our results
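Since the Sharpe ratio is both the tuning objective and the reported metric, a small sketch of how it can be computed from a series of account values may help. The annualization factor of 252 mirrors the trading-day count mentioned above (365 may be more appropriate for crypto), and the zero risk-free rate and the function name are assumptions; the project's own calculation may differ.

```python
import math
import statistics


def sharpe_ratio(account_values, periods_per_year=252, risk_free_rate=0.0):
    """Annualized Sharpe ratio from a sequence of daily account values.

    Assumes a zero annual risk-free rate by default and returns 0.0 when
    there is too little data or the returns have no variance.
    """
    # Simple daily returns between consecutive account values.
    returns = [
        current / previous - 1.0
        for previous, current in zip(account_values, account_values[1:])
    ]
    if len(returns) < 2:
        return 0.0
    daily_risk_free = risk_free_rate / periods_per_year
    excess = [r - daily_risk_free for r in returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        return 0.0
    return math.sqrt(periods_per_year) * statistics.mean(excess) / volatility
```

In a tuning loop, a function like this would be evaluated on the validation-period account values and returned as the objective to maximize.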


  • Below are the charts of account value over the testing period. For a better visualization, jump over to here
  • Sharpe ratio values for the testing period
Note that the data for Buy and Hold (B&H) and TDQN are taken from here


  • Amazon: It is a fairly stable stock with steady returns, so features like trends data and CMF don't help much here compared to technical analysis. With technical indicators, the agent generated better returns than in the other cases. Normalized trends data (ohlcv_cmf_pytrends_100) also performed well and gave returns close to the best-performing case.
  • Apple and Google: The TDQN model gave the best returns for Apple, and OHLCV + trends data came close to it. There is no solid explanation for these results, but these are stable companies with healthy returns.
  • Bitcoin, Facebook and Microsoft: Again, the normalized trends data gave good results compared to the other test cases. So normalized trends can paint an accurate picture of the asset.

Trends tell us what people are saying about the asset, and CMF tells us what the data is saying about the asset.

  • Tesla and Dogecoin: The decoupled CMF and trends data gave good returns because this reduces the volatility in the transactions, and the agent considers both trading parameters when making decisions instead of being biased toward trends data. It is not strange to see the correlation between Tesla and Doge😁.
  • Trends data does lead to more volatility: the agent overestimates during bearish and bullish periods because the spikes in trends data are larger than those in plain OHLCV data.


  • With the democratization of financial trading, more and more players are entering the market every day. COVID-19 accelerated this trend, as many people were stuck at home and turned to investing to earn some extra money. The internet helps publishers disseminate information, and it helps investors stay better informed and do their own research. Google Trends, being the most comprehensive view of internet traffic, can paint a good picture of the current trend in the market.
  • In our testing, augmenting the state space with trends data led to a performance increase, particularly for volatile stocks and cryptocurrencies. This resonates with the idea of democratized finance that crypto aims to achieve. Google Trends can also provide real-time data, which can help with real-time trading, and it can serve as a good proxy for the financial sentiment about an asset. The trends data can be especially helpful if you have a higher ceiling on the number of shares you can trade at a given time, because it is continuous and shows the extent of sentiment on a given stock.



Astarag Mohapatra


Hi, Astarag here. I am interested in deep learning and related topics. If you have any queries, I am one comment away.