Predicting ICO returns with machine learning

Piotr Płoński
Feb 19, 2018


Have you heard about bitcoin? It is a cryptocurrency. According to Wikipedia:

cryptocurrency is a digital asset designed to work as a medium of exchange that uses cryptography to secure its transactions, to control the creation of additional units, and to verify the transfer of assets

Did you know that there are many more cryptocurrencies? In fact, new coins (tokens) are created every day. The process of creating a new coin is called an ICO (Initial Coin Offering). During this process, participants can buy the newly created coins. It is a form of funding for new companies and organizations. Usually, the token price during an ICO is low, because behind the token there is only a prototype or an idea and few people are aware of the new project. However, when the token reaches an exchange or the company ships a product, the token price can increase several times over. For example, if you were an early investor in Ethereum (ETH), you could buy 1 ETH for about 0.30 USD, while at the time of writing it costs 936 USD. That is a return of about 300,000%. If you had invested 100 USD in the Ethereum ICO, it would now be worth about 300,000 USD :-)

Ethereum price from https://coinmarketcap.com/currencies/ethereum/ (accessed 16Feb2018)
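The return quoted above follows from simple arithmetic on the two prices in the text. A quick check in Python (the variable names are only illustrative):

```python
# Prices quoted in the text: ICO price and price at the time of writing.
ico_price_usd = 0.30
current_price_usd = 936.0

# Price multiple: how many times the token price has grown since the ICO.
multiple = current_price_usd / ico_price_usd

# Value today of a 100 USD investment made at the ICO price.
invested_usd = 100.0
value_usd = invested_usd * multiple

print(f"multiple: {multiple:.0f}x")
print(f"value of {invested_usd:.0f} USD: {value_usd:,.0f} USD")
```

The multiple comes out at roughly 3,120x, which is where the rounded figures of about 300,000% and about 300,000 USD come from.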

Looking at these crazy returns, you can understand why many people would like to invest in such projects. But it is not so easy. Currently, there are so many ICOs running that selecting the right project to invest in is difficult. Very often an ICO turns out to be a scam and investors lose all their money.

Top-5 ICOs returns, source https://icostats.com/roi-since-ico (accessed 16Feb2018)

There are many sites that aggregate information about upcoming ICOs, for example icodrops. There are also people in the cryptocurrency community who track and score upcoming ICOs. One of my favourite ICO influencers is Ian Balina. He takes a data-driven approach to selecting ICO investments. He keeps a spreadsheet where each ICO is described with numbers, points and grades, and he selects his investments based on the computed score. Nice!

Ian’s spreadsheet

Let’s do more than spreadsheet analysis! I will build a machine learning model that predicts the return on investment based on Ian’s data.

First, I took the data from Ian’s spreadsheet and created my data set. For each ICO I added the minimum, maximum and entry price from coinmarketcap.com (CMC); the entry price is the first price on the CMC chart. I also corrected a few ICO prices using data from icodrops.com and tokendata.io.

As you can see, some rows are missing prices from coinmarketcap.com. These are new projects that haven’t reached exchanges yet. Let’s predict what their maximum returns can be!

Rows with full CMC price information will be used as training data. The input columns for the model will be:

  • Ian_ICO_Grade — score created by Ian Balina
  • Largest_Bonus — information about bonus size
  • ICO Price
  • Number of ICO tokens (number of tokens available for sale in ICO)
  • Total Supply Ratio (number of tokens in ICO divided by total token number)
  • ICO Market Cap
  • Information if a prototype is available
  • Number of points for team scored by Ian
  • Number of points for advisors scored by Ian
  • Number of points for ICO idea
  • The size of community
  • Information if it is a Security token
  • Type of the product
  • Audience of the product
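The split between training rows and rows to predict can be sketched as follows. This is a minimal illustration, not the pre-processing code used for this post: the column names `ICO_Price` and `Max_CMC`, the token names and all numbers are assumptions made up for the example.

```python
# Hypothetical rows; names and numbers are made up for illustration only.
rows = [
    {"Name": "TokenA", "ICO_Price": 0.10, "Max_CMC": 1.50},
    {"Name": "TokenB", "ICO_Price": 0.50, "Max_CMC": 0.75},
    {"Name": "TokenC", "ICO_Price": 0.25, "Max_CMC": None},  # not on an exchange yet
]


def has_cmc_price(row):
    """True when the row has a maximum CMC price, i.e. the token is already traded."""
    return row["Max_CMC"] is not None


# Rows with full price information are used for training ...
train_rows = [r for r in rows if has_cmc_price(r)]
# ... and rows without it are the ones whose returns we want to predict.
predict_rows = [r for r in rows if not has_cmc_price(r)]

print([r["Name"] for r in train_rows])
print([r["Name"] for r in predict_rows])
```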

The output of the model will be Max_CMC_x which is defined as:

Max_CMC_x = Max_CMC / ICO_Price

(note: in training I used the logarithm of Max_CMC_x because it has a better distribution for ML model training)
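Return multiples tend to be strongly skewed, with a few tokens far above the rest, and taking the logarithm compresses that long tail. A minimal sketch of the target computation and its inverse, assuming the natural logarithm (the post does not say which base was used) and made-up prices:

```python
import math


def max_cmc_x(max_cmc, ico_price):
    """Return multiple: maximum CMC price divided by the ICO price."""
    return max_cmc / ico_price


def to_log_target(multiple):
    """Training target: natural log of the multiple (the base is an assumption)."""
    return math.log(multiple)


def from_log_target(log_value):
    """Invert to_log_target to turn a model prediction back into a multiple."""
    return math.exp(log_value)


# Made-up prices for illustration only.
multiple = max_cmc_x(max_cmc=1.50, ico_price=0.10)
log_target = to_log_target(multiple)
print(multiple, log_target, from_log_target(log_target))
```

Predictions made on the log scale need to be passed through the inverse transform before they can be read as "x times the ICO price".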

I used mljar.com to train the machine learning models, with 5-fold cross-validation, MSE (mean squared error) as the metric, and the xgboost, lightGBM and random forest algorithms.

Setting up machine learning model training in mljar.com
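For readers who want to see what 5-fold cross-validation with an MSE score does under the hood, here is a rough sketch using NumPy. It is not what MLJAR runs: a plain least-squares linear model stands in for xgboost, lightGBM and random forest, the data is synthetic, and all function names are made up for this example.

```python
import numpy as np


def kfold_indices(n_rows, n_folds=5, seed=0):
    """Shuffle row indices and split them into n_folds nearly equal parts."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_rows), n_folds)


def fit_linear(X, y):
    """Least-squares linear model with intercept (a stand-in for the real models)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply the fitted coefficients (intercept first) to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def oof_predictions(X, y, n_folds=5, seed=0):
    """Predict every row with a model trained only on the other folds."""
    oof = np.empty(len(y))
    for val_idx in kfold_indices(len(y), n_folds, seed):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        coef = fit_linear(X[train_mask], y[train_mask])
        oof[val_idx] = predict_linear(coef, X[val_idx])
    return oof


# Synthetic data for illustration only: 3 features and a log-return-like target.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = 1.0 + X @ np.array([0.8, -0.5, 0.0]) + rng.normal(scale=0.1, size=100)

oof = oof_predictions(X, y)
mse = float(np.mean((y - oof) ** 2))
print(f"cross-validated MSE: {mse:.4f}")
```

Because each row is predicted by a model that never saw it during training, the resulting MSE is a fairer estimate of performance on new ICOs than the error on the training data itself.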

MLJAR searches through many different machine learning algorithms and selects the best fit for the training data. OK, we have a lot of models!

Models from MLJAR

What can we do with these models? Many interesting things!

Which features are important for predicting ICO investment returns?

Let’s check the feature importance for the best model.

Feature importance from the best model (lowest MSE score). The larger the bar, the more important the feature.

This is interesting!

  • The ICO price is the most important feature for predicting ICO returns! One possible explanation: the lower the price, the more room for future growth. Simple.
  • Ian’s grade is the second most important feature. Ian’s grade is correlated with the project’s quality: the higher the grade, the better the project and the higher the chance of price growth.
  • The top-6 features also include the number of tokens sold in the ICO, the ICO market cap, and the total supply. In my opinion, this is similar to the ICO price: investors are trying to assess the room for future growth.
  • The number of community members is also very important: the more people interested in the project, the better :)
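The importance chart above comes from MLJAR, and the post does not say how it is computed. One common model-agnostic way to get a similar ranking is permutation importance: shuffle one column at a time and measure how much the error grows. The sketch below is only an illustration of that idea, with a linear stand-in model, synthetic data and hypothetical feature names.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept (stand-in for the real model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply the fitted coefficients (intercept first) to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def permutation_importance(coef, X, y, seed=0):
    """Increase in MSE when one column is shuffled; larger means more important."""
    rng = np.random.default_rng(seed)
    baseline = mse(y, predict_linear(coef, X))
    scores = []
    for col in range(X.shape[1]):
        X_shuffled = X.copy()
        X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
        scores.append(mse(y, predict_linear(coef, X_shuffled)) - baseline)
    return np.array(scores)


# Synthetic data for illustration only; feature names are hypothetical.
feature_names = ["feature_a", "feature_b", "feature_c"]
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Fit on the first 200 rows, measure importance on the held-out 100 rows.
coef = fit_linear(X[:200], y[:200])
scores = permutation_importance(coef, X[200:], y[200:])

for name, score in sorted(zip(feature_names, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

Measuring on held-out rows matters here: importance computed on the training rows can reward features the model has merely memorised.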

What returns for past ICOs does the model predict?

Let’s take the out-of-fold (OOF) predictions from the machine learning model and check what returns it predicts for past ICOs. It will be interesting to compare them with the real returns. Below are the top-20 tokens with the highest returns according to the model:

The OOF_Prediction column is the ML model’s prediction of returns for past ICOs

According to the model, the highest returns should be for District0x, 0x and Red Pulse. The interesting results are for tokens where OOF_Prediction differs greatly from Max_CMC_x (Max_CMC_x = Max_CMC/ICO_Price). For example, Icon has a 115x return on investment, but the model predicts ‘only’ an 18x return.

My interpretation: tokens whose OOF_Prediction is much higher than Max_CMC_x still have room for growth; for example, AirToken can grow 5 times more. On the other hand, tokens whose OOF_Prediction is much lower than Max_CMC_x may be over-priced, or the machine learning model made a mistake.
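This comparison can be expressed as a simple ratio of predicted to realised multiple. The sketch below assumes both values are already on the multiple scale (not the log scale); the Icon numbers are the ones quoted in the text, while "TokenA" and the helper name `remaining_room` are made up for illustration.

```python
def remaining_room(oof_prediction, max_cmc_x):
    """Ratio of predicted to realised return multiple.

    Above 1: the model predicts more than the token has reached so far.
    Below 1: the token has already exceeded the model's prediction.
    """
    return oof_prediction / max_cmc_x


# Icon values are quoted in the text; "TokenA" is made up for illustration.
tokens = {
    "Icon": {"oof_prediction": 18.0, "max_cmc_x": 115.0},
    "TokenA": {"oof_prediction": 10.0, "max_cmc_x": 2.0},
}

for name, t in tokens.items():
    ratio = remaining_room(t["oof_prediction"], t["max_cmc_x"])
    label = "room to grow" if ratio > 1 else "above prediction"
    print(f"{name}: {ratio:.2f} ({label})")
```

A ratio far from 1 in either direction is a prompt for a closer look at the token, not a verdict, since it can just as well reflect a model error.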

All predictions from ML model for past ICOs are here.

What returns for new ICOs does the model predict?

Let’s compute predictions for tokens whose returns are unknown! Below are the top-20 highest returns computed by the machine learning model:

Returns predicted by machine learning model

You can see that, according to the model, the token listed under the code name Zombie in Ian’s spreadsheet will have the highest return.

On the other hand, the returns aren’t promising for the Telegram token, as the model predicts ‘only’ a 6.8x return. In my opinion, this is caused by the huge ICO cap, which is why the model was conservative in its predicted returns.

All predictions from ML model for upcoming ICOs are here.

Conclusions

Investing in ICOs can be very profitable. Picking a good token for investment requires careful research. The data-driven approach is, in my opinion, the most consistent. Enriching the spreadsheet with machine learning predictions can give investors more insights:

  • knowledge of potential returns,
  • assessment of potential exit price.

The ML insights can also be used by people who create ICOs to set token economics that are attractive for investors.

Please comment below if you have any questions or ideas, or if you have more data about ICOs that can be used for scoring.

Data used in this post is available here.

Code used for data pre-processing is here.

The MLJAR service has been used for creating machine learning models.

Disclaimer: I take no responsibility or liability for any losses which may be incurred by any person or persons using the whole or part of the contents of this article. It is not financial advice! It is not investment advice!
