Can Social Media Predict Asset Price Movement?

TradR is an open source model and trading application that produces buy and sell signals based on social media activity and sentiment.

Kyle Benzle
Geek Culture
Mar 13, 2021


It certainly seems that online forums can have a certain premonition about price movements. As chatter about a given asset increases, so, often, does the price. This is the hunch I wanted to validate and, if correct, profit from with this project. I hope that some parts of the project, if not the whole, will be useful to others. It is hosted on GitHub and PythonAnywhere. Please get in touch with any questions or advice.

Using a large amount of data scraped hourly from Reddit, along with historic price data, a model is built to try to predict positive or negative future price movements.

Overview of TradR

This project has 5 parts, with the final result being the front-end app, where users can see the hourly updated buy and sell signals.

1. Web scraper / API — gather real-time social media and price data.

2. Feature engineering — what data to use and what to predict.

3. Build a model — maximize buy/sell signal accuracy.

4. Trade — make real-time trades.

5. App — real-time interactive app for users.

Program Control

Several modules are controlled from a single file. Once per hour the following are run:

  • /TradeMaker/
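
The hourly control flow described above can be sketched roughly as follows. This is a minimal sketch and not the project's actual control file: the names run_once and run_hourly are hypothetical, and the steps passed in stand in for whatever modules the real project runs each hour.

```python
import time
from typing import Callable, List, Sequence


def run_once(steps: Sequence[Callable[[], None]]) -> List[str]:
    """Run every step in order and return the names of the steps that failed."""
    failed = []
    for step in steps:
        try:
            step()
        except Exception as exc:  # assumption: one failing module should not stop the rest
            print(f"{step.__name__} failed: {exc}")
            failed.append(step.__name__)
    return failed


def run_hourly(steps, interval_seconds=3600.0, max_cycles=None, sleep=time.sleep):
    """Repeat run_once every interval_seconds; stop after max_cycles if it is given."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        started = time.monotonic()
        run_once(steps)
        cycles += 1
        # Sleep only for the remainder of the interval so runs stay roughly hourly.
        elapsed = time.monotonic() - started
        sleep(max(0.0, interval_seconds - elapsed))
    return cycles
```

Calling run_hourly with a list of functions such as a scraper, a model update, and a trade step (all hypothetical here) would run them in that order once per hour. The sleep argument is injectable so the loop can be tested without waiting.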

1. Scraping Reddit

Selenium is used because it gives more flexibility without the API's restrictions, but the Reddit API could be tried too.

The cryptocurrency community is nicely separated into forums based on asset, which gives a more granular view. Because cryptocurrency is traded 24/7 and the forums are very active, a lot of data is available.

Data scraped hourly:

  • All comment text
  • Number of current users
  • Number of posts in last hour
  • Number of comments in last hour
  • Number of votes in the last hour
  • Hourly price and volume data from API

For Assets/Subreddits

  • r/bitcoin, r/btc, r/ethereum, r/ethtrader, r/ethfinance, r/monero, r/xmrtrader
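
One way to hold a single hourly scrape per subreddit is a small record type. The sketch below is an assumption about how such data could be organized, not the project's actual schema: HourlySnapshot, its field names, and count_in_last_hour are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class HourlySnapshot:
    """One hourly scrape of one subreddit (hypothetical layout)."""
    subreddit: str
    scraped_at: datetime
    current_users: int
    posts_last_hour: int
    comments_last_hour: int
    votes_last_hour: int
    comment_text: str


def count_in_last_hour(timestamps: Iterable[datetime], now: datetime) -> int:
    """Count the timestamps that fall within the hour ending at `now`."""
    cutoff = now - timedelta(hours=1)
    return sum(1 for stamp in timestamps if cutoff < stamp <= now)
```

A helper like count_in_last_hour could turn a list of scraped post or comment times into the "number in last hour" values listed above.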

Price data is grabbed from a free API.

Code for the Reddit scraper and the price scraper is in the project's GitHub repository.


2. Feature engineering

Altogether, the following features are used:

  • Hour of day
  • Day of week
  • Number of users/hour
  • Number of posts/hour
  • Comments/hour
  • 15 most significant words
  • NLP on comments
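
To make the feature list concrete, here is a minimal sketch of turning one hourly scrape into a feature row. The function names and dictionary keys are hypothetical, and the vocabulary argument is assumed to already contain the significant words; how those words are selected is not shown here.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_features(scraped_at: datetime, users: int, posts: int, comments: int,
                   comment_text: str, vocabulary: List[str]) -> Dict[str, int]:
    """Build one feature row from an hourly scrape (hypothetical layout)."""
    word_counts = Counter(tokenize(comment_text))
    features = {
        "hour_of_day": scraped_at.hour,
        "day_of_week": scraped_at.weekday(),  # Monday is 0, Sunday is 6
        "users": users,
        "posts": posts,
        "comments": comments,
    }
    # One count feature per word in the supplied vocabulary.
    for word in vocabulary:
        features[f"word_{word}"] = word_counts[word]
    return features
```

Word counts are only one simple way to represent comment text; the NLP features in the real project may well differ.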

The target for the training data is the sign (+/-) of the percent price change over the next hour.
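
As a rough illustration of that target, the sketch below labels each hour by whether the price rises over the following hour. The function name is hypothetical, and treating a flat hour as 0 is my assumption, since the text does not say how ties are handled.

```python
from typing import List, Sequence


def label_next_hour(prices: Sequence[float]) -> List[int]:
    """Return 1 where the next hour's percent change is positive, else 0.

    The last price has no following hour, so the result is one item shorter.
    """
    labels = []
    for current, following in zip(prices, prices[1:]):
        pct_change = (following - current) / current * 100.0
        labels.append(1 if pct_change > 0 else 0)  # assumption: flat counts as 0
    return labels
```

For hourly prices of 100, 110, 105, 105 this gives the labels 1, 0, 0.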


3. Build a model

Random forest models were used both to identify the most significant words in the comments and to classify the +/- price signal from all the features above. The full code is in the GitHub repository. The output is an update to the file "SignalInput.csv", where 1 = Buy and 0 = Sell for each subreddit and asset.
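
The signal file could be written and read with the standard csv module along these lines. The column names subreddit, asset, and signal are assumptions, since the actual layout of SignalInput.csv is not described, and the two helper functions are hypothetical.

```python
import csv
from typing import Dict, Iterable, TextIO, Tuple

FIELDNAMES = ["subreddit", "asset", "signal"]  # assumed column layout


def write_signals(rows: Iterable[Dict[str, object]], fileobj: TextIO) -> None:
    """Write one row per subreddit/asset pair, with signal 1 = Buy and 0 = Sell."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        if row["signal"] not in (0, 1):
            raise ValueError(f"signal must be 0 or 1, got {row['signal']!r}")
        writer.writerow(row)


def read_signals(fileobj: TextIO) -> Dict[Tuple[str, str], int]:
    """Read the signals back as {(subreddit, asset): signal}."""
    return {
        (row["subreddit"], row["asset"]): int(row["signal"])
        for row in csv.DictReader(fileobj)
    }
```

With a real file, open it with newline="" as the csv documentation recommends, for example open("SignalInput.csv", "w", newline="").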


4. Trade

The target output of the analysis is an hourly updated array of buy/sell signals. After the user has added their API key, the signals are read in and trades are made on Binance once per hour.

The basic algorithm for purchases and sales is simple: for a buy signal, 20% of the account is spent on that asset; for a sell signal, all of that asset is sold.
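
That sizing rule can be sketched as a small pure function, kept separate from any exchange calls. The function name and the returned dictionary are hypothetical, and I am assuming "20% of the account" means 20% of the available cash balance.

```python
from typing import Dict, Union


def plan_trade(signal: int, cash_balance: float, asset_quantity: float,
               buy_fraction: float = 0.20) -> Dict[str, Union[str, float]]:
    """Turn a 1/0 signal into an order plan; no exchange is contacted here."""
    if signal == 1:
        # Buy: spend a fixed fraction of the cash balance (assumed meaning of "account").
        return {"side": "BUY", "spend": cash_balance * buy_fraction}
    if signal == 0:
        # Sell: liquidate the whole position in this asset.
        return {"side": "SELL", "quantity": asset_quantity}
    raise ValueError(f"signal must be 0 or 1, got {signal!r}")
```

With 1,000 USD in cash, a buy signal plans a 200 USD purchase, while a sell signal plans to sell whatever quantity of the asset is held. Keeping this logic separate from the code that places orders makes it easy to check before real money is involved.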


5. App

In the app, a user can choose which features to include and rerun the model to try to get the best possible score and predictions.


As for performance, after about one month it is flat. The algorithm is very conservative and spends most of its time sitting in USD. Sell signals seem to be much more common, and the algorithm makes a purchase only 10–20% of the time.

Future Work

Continue to test accuracy with new data.

Tweak features.

Optimize the number of words.

Try time series models with price data included.

Use sliding window to test at what time interval signals are most accurate.

Add ability for users to trade on

Thanks for reading, and please get in touch on Twitter.




I am a plant biologist with an MS from OSU and broad experience in data science, cell biology, genetics, genomics, and plant breeding.