Forecasting Future Price Movement of Ethereum using Big Data Analytics and Machine Learning

Domain — Blockchain & Cryptocurrency

Shehara Pandithasekera
8 min read · Nov 8, 2023
Ethereum Price Prediction

Introduction

A blockchain is a distributed, decentralized ledger that records transactions across many nodes (computers). Because of its peer-to-peer (P2P) network structure, there is no need for an intermediary third party (e.g., a bank or a seller). It enables:

  • Transparency
  • Security
  • Immutability

Cryptocurrencies are digital currencies that use cryptography to safeguard transactions and regulate the generation of new units. The market value of Ethereum currently stands at USD 224 billion (etherscan.io, 2023).

Figure — Transactions without third parties

For example, let’s say Alice holds cryptocurrency worth 400 dollars and wants to send Bob 100 dollars. The ledger updates Alice’s balance to 300 USD (400 minus 100) and Bob’s balance to his current balance plus 100 dollars. These records are continually validated by the other nodes in the network, each of which holds a copy of the ledger. Transaction records are stored in blocks, and at a high level the blocks are linked into a chain-like structure. The blocks are verified and secured using cryptography.
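The Alice and Bob example and the chain of blocks can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Ethereum is implemented: Bob's starting balance of 50 is made up, SHA-256 stands in for the hash function (Ethereum itself uses Keccak-256), and there are no signatures or consensus. The function names are hypothetical.

```python
import hashlib
import json


def transfer(ledger, sender, receiver, amount):
    """Move `amount` from sender to receiver, rejecting overdrafts."""
    if ledger[sender] < amount:
        raise ValueError("insufficient balance")
    ledger[sender] -= amount
    ledger[receiver] += amount


def block_hash(block):
    """SHA-256 digest of the block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_block(chain, transactions):
    """Append a block that stores the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev,
                  "transactions": transactions})


def chain_is_valid(chain):
    """Each block must reference the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


# Alice starts with 400; Bob's starting balance (50) is an assumption.
ledger = {"Alice": 400, "Bob": 50}
transfer(ledger, "Alice", "Bob", 100)

chain = []
add_block(chain, [])  # first (genesis) block
add_block(chain, [{"from": "Alice", "to": "Bob", "amount": 100}])
add_block(chain, [])  # a later block that points back at the transfer
```

Because each block stores the hash of its predecessor, editing an old transaction changes that block's hash and breaks the link held by the next block, which is what makes tampering detectable.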

Problem…

Forecasting Future Price Movement of Ethereum to support Investment companies

Investment companies face many problems when investing in the Ethereum blockchain.

  • Volatility of price movements
  • Market liquidity
  • Traditional valuation metrics

For example, the Ethereum token dropped more than 30%, from $1,574.80 on 8th November to $1,083.29 on the following day (capital.com). Even so, many investment companies remain willing to take on these risks.
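The size of that drop can be checked with a one-line percentage change. The helper name `pct_change` is only for illustration.

```python
def pct_change(old, new):
    """Percentage change from `old` to `new` (negative means a drop)."""
    return (new - old) / old * 100.0


drop = pct_change(1574.80, 1083.29)
print(f"{drop:.1f}%")  # -31.2%
```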

The price of Ethereum is difficult to predict because of the large quantity of transactions and the rapid price movements. Therefore, this article recommends using Big Data Analytics and Machine Learning to tackle this task.

Data Analytics Concepts

Advanced statistical techniques help predict Ethereum price movements. Using big data, you can make informed decisions in the cryptocurrency industry. Investment companies can utilize machine learning, data mining, and statistical modeling to evaluate and comprehend vast amounts of data.

Figure — source : Data mining process

Big Data in terms of 4Vs

  1. Volume — A decentralized system maintains the transaction records of every participant, which results in a data-heavy environment. For example, Ethereum had a trading volume of 38 billion dollars in 2021 and hosted 16 million smart contracts (digital contracts) in 2020, a number that has grown further since then.
  2. Velocity — Cryptocurrency platforms operate worldwide, 24/7, so trading and transaction velocity is very high almost all the time. For instance, the Bitcoin network computes around 50 million tera cryptographic hashes every second (a cryptographic hash is a fixed-length fingerprint of data, used to verify blocks and transactions in the blockchain).
  3. Variety — Cryptocurrency platforms produce many kinds of data, such as transaction data, network utilization data, smart contracts, and token data.
  4. Value — The value of this data is hard to overstate, given that it underpins an alternative digital currency. Transactions on the Ethereum network hold high value because they are verified by multiple nodes of the network, and they are highly secure because the blocks are made tamper-resistant with cryptography.

Machine Learning

Long Short-Term Memory (LSTM) and Random Forest models are used to forecast the future price movement of Ethereum. I created the dataset from data retrieved through the Etherscan API. Etherscan is a widely used Ethereum block explorer (not a trading platform) that exposes the transaction records of the Ethereum network.

TARGET LABEL : Ethereum price (‘price’)

FEATURES : Date (‘date’), transaction volume (‘daily_tx_volume’), active users in network (‘unqiue_address_count’), hashes made with cryptography (‘network_hashes’), market capitalization of Ethereum (‘marketcap’), transaction fee (‘tx_fee’), network utilization (‘network_util’) and gas usage (‘gas_used’).
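As a rough sketch, the target and features listed above could be separated with pandas as shown below. The column names are taken from the list (spelling kept as written), but everything else is an assumption: the article does not say how the date was encoded, so the sketch converts it to an ordinal day number, and the rows are made-up placeholders rather than real Etherscan data.

```python
import pandas as pd

# Column names as listed in the article (spelling kept as written there).
FEATURES = ["date", "daily_tx_volume", "unqiue_address_count", "network_hashes",
            "marketcap", "tx_fee", "network_util", "gas_used"]
TARGET = "price"


def split_features_target(df):
    """Return (X, y) in chronological order, with the date as an ordinal day."""
    df = (df.assign(date=pd.to_datetime(df["date"]))
            .sort_values("date")
            .reset_index(drop=True))
    X = df[FEATURES].copy()
    # Assumed encoding: days since year 1, so the model sees a number.
    X["date"] = X["date"].map(lambda ts: ts.toordinal())
    y = df[TARGET]
    return X, y


# Made-up placeholder rows, NOT real Etherscan data.
rows = pd.DataFrame({
    "date": ["2023-01-03", "2023-01-01", "2023-01-02"],
    "daily_tx_volume": [3.0, 1.0, 2.0],
    "unqiue_address_count": [30, 10, 20],
    "network_hashes": [0.3, 0.1, 0.2],
    "marketcap": [300.0, 100.0, 200.0],
    "tx_fee": [0.03, 0.01, 0.02],
    "network_util": [0.6, 0.4, 0.5],
    "gas_used": [3000, 1000, 2000],
    "price": [30.0, 10.0, 20.0],
})
X, y = split_features_target(rows)
print(X.shape, list(y))  # (3, 8) [10.0, 20.0, 30.0]
```

Sorting by date before splitting matters for time series, since a later train/test split should not let the model train on days that come after the test period.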

Long Short-Term Memory (LSTM) Model

Figure — source : RNN vs. LSTM

LSTM is a recurrent neural network architecture used for time series analysis. It is effective at capturing long-term and non-linear dependencies. The LSTM model here is implemented using the TensorFlow library.

The model is built with the Sequential API provided by the Keras library, which allows you to stack layers one after another. In this case, the LSTM model consists of two LSTM layers and a Dense layer.

  1. The model takes input data with a shape of (number of samples, 1, number of features).
  2. The first LSTM layer processes the input sequence, learning patterns and dependencies in the training data. It uses the ReLU activation, which sets any negative input value to zero and leaves positive values unchanged. Because a negative input produces an output of zero, which effectively turns the neuron off, ReLU activations can be sparse.
  3. The second LSTM layer captures additional patterns and dependencies to reach a higher level of accuracy.
  4. The output of the second LSTM layer is fed into a Dense layer, which maps the extracted features to the final predicted value.
  5. The optimizer (Adam) adjusts the model’s weights during training to improve its performance.
Figure — LSTM Model
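The five steps above can be approximated with the Keras Sequential API roughly as follows. This is a minimal sketch, not the exact model behind the figure: the layer sizes (64 and 32 units), the number of epochs, and the batch size are assumptions, the second LSTM layer keeps the Keras default activation because the article does not state it, and random numbers stand in for the scaled features and prices.

```python
import numpy as np
from tensorflow import keras

n_samples, n_features = 32, 8  # 8 features as listed earlier; sample count is arbitrary
rng = np.random.default_rng(0)
X = rng.random((n_samples, n_features)).astype("float32")  # random stand-in features
y = rng.random((n_samples, 1)).astype("float32")           # random stand-in prices

# Step 1: reshape to (number of samples, 1 time step, number of features).
X_seq = X.reshape(n_samples, 1, n_features)

model = keras.Sequential([
    keras.layers.Input(shape=(1, n_features)),
    # Step 2: first LSTM layer with ReLU; return_sequences=True passes the
    # full sequence on so that a second LSTM layer can be stacked on top.
    keras.layers.LSTM(64, activation="relu", return_sequences=True),
    # Step 3: second LSTM layer (unit count and default activation assumed).
    keras.layers.LSTM(32),
    # Step 4: Dense layer maps the extracted features to one predicted value.
    keras.layers.Dense(1),
])

# Step 5: Adam optimizer, with mean squared error as the training loss.
model.compile(optimizer="adam", loss="mse")
model.fit(X_seq, y, epochs=2, batch_size=8, verbose=0)

pred = model.predict(X_seq, verbose=0)
print(pred.shape)  # (32, 1)
```

In practice the features and the price would usually be scaled (for example to the 0 to 1 range) before training and the predictions scaled back afterwards, so that the error is reported in USD.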

Evaluation of LSTM Model

Mean Squared Error (MSE) ≈ 233578.842
Root Mean Squared Error (RMSE) ≈ USD 483.299
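For reference, the two metrics are related by a square root: RMSE is the square root of MSE, which puts the error back into USD. A small sketch with hypothetical helper names and a made-up two-point example:

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same unit as the prices."""
    return math.sqrt(mse(actual, predicted))


# Made-up example: both predictions are off by 10 USD.
print(mse([100.0, 200.0], [110.0, 190.0]))   # 100.0
print(rmse([100.0, 200.0], [110.0, 190.0]))  # 10.0

# The reported RMSE follows from the reported MSE.
print(f"{math.sqrt(233578.842):.2f}")  # 483.30
```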

Figure — Test data (Ethereum price movements)
Figure — Predicted Ethereum price movements

The actual Ethereum price movement chart is based on the most recent real data. The predicted and actual price movements are clearly similar: for example, both show the drop that begins on 13th June and the renewed upward trend from 17th June, at closely matching USD values.

Random Forest Model

Figure — source : Random Forest

Because this is a regression problem, the Random Forest Regression algorithm is used. It is effective at capturing complex relationships between Ethereum prices and the contributing features. By averaging the predictions of many decision trees, the ensemble reduces the overfitting of any single tree and improves overall accuracy.

Figure — Random Forest
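A Random Forest regressor of this kind can be sketched with scikit-learn as below. The data is synthetic (random features and a made-up price formula), and the number of trees, the 80/20 split, and the random seed are assumptions rather than the settings used for the results in this article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data: 200 days, 8 features, made-up price formula.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = 1000 + 500 * X[:, 0] + 200 * X[:, 1] ** 2 + rng.normal(0, 10, 200)

# Keep the order of the rows: first 80% for training, last 20% for testing.
split = 160
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# 100 trees; each tree is fit on a bootstrap sample of the training rows.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"RMSE on synthetic data: {rmse:.2f}")

# Relative contribution of each feature to the forest's splits.
print(model.feature_importances_.round(3))
```

The `feature_importances_` attribute is one way to see which inputs the forest relies on most, which can guide the feature engineering mentioned later.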

Evaluation of Random Forest Model

Mean Squared Error (MSE) ≈ 203.613
Root Mean Squared Error (RMSE) ≈ USD 14.269

The same modeling approach can be applied to other price-forecasting tasks, such as oil and gold trading, forex, and more. The accuracy of these models can be improved further through better feature engineering and by using big data analytics to analyze data from multiple sources, which is discussed in the next section.

Use of big data analytics to support decision making

  1. Value Investing — Recognizing undervalued stocks and consistently monitoring their value are key to making returns in this investment category.
  2. Big data helps investors identify profitable investment opportunities by analyzing data from many sources, such as platform instability, trading volumes, and consumer behavior trends. The model can be improved further to provide insights into other cryptocurrencies.
  3. Cryptocurrency market volatility requires real-time data monitoring so that investment businesses can react quickly to changes, spot trends, and modify investment plans. For example, with Elon Musk’s takeover of Twitter, the token started a rally that culminated in a high of $1,652.38 on 29 October. Many factors therefore contribute to the price movement of Ethereum.
Figure — source : Big data ecosystem software and tools
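The real-time monitoring described in the list above can be illustrated with a simple rolling check that flags large moves in a stream of prices. This is a toy sketch: the function name, the window of 3 ticks, the 10% threshold, and the price series are all made up for illustration.

```python
from collections import deque


def price_alerts(prices, window=3, threshold_pct=10.0):
    """Yield (index, pct_change) when the price has moved by at least
    `threshold_pct` percent compared with `window` ticks earlier."""
    recent = deque(maxlen=window + 1)
    for i, price in enumerate(prices):
        recent.append(price)
        if len(recent) == window + 1:
            change = (recent[-1] - recent[0]) / recent[0] * 100.0
            if abs(change) >= threshold_pct:
                yield i, change


# Made-up price stream with a sudden drop at index 4.
stream = [100.0, 101.0, 102.0, 103.0, 90.0, 91.0]
for index, change in price_alerts(stream):
    print(f"tick {index}: {change:+.1f}% over the last 3 ticks")
```

Because the function is a generator, it can consume prices one at a time as they arrive instead of waiting for a complete history.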

Big Data Intervention strategies

  1. Data Integration and Centralization — Data lakes enable investment companies to combine data from various sources, such as news organizations, cryptocurrency platforms, social media, and market data suppliers. These data lakes store Ethereum-related data, which helps investors analyze and learn about market dynamics using techniques like data mining, sentiment analysis, and pattern recognition.
  2. Cutting-edge Data Analytics Techniques — Data lakehouses offer a unified, scalable platform for storing and analyzing Ethereum-related structured and unstructured data, combining the flexibility of data lakes with the structured storage of data warehouses.
  3. Real-time Data Processing and Monitoring — Data lakehouses support real-time data processing, allowing investors to respond quickly to market movements and sentiment using on-chain metrics from Ethereum and other blockchain platforms.
  4. Data Visualization and Reporting — Investment firms can use data visualization techniques to convey complex data in clear, usable ways. Interactive dashboards and visual representations, such as price charts, sentiment heatmaps, market indicators, and portfolio performance measurements, enable decision-makers to quickly understand key insights and trends.
  5. Predictive Analytics — Big data analytics enables investment companies to develop accurate prediction models and perform scenario analysis. Through simulation, they can evaluate the effects of various investment strategies on the performance of their portfolios. This helps decision-makers understand the risks and rewards associated with investment decisions.
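The scenario analysis mentioned in the last item can be sketched as a small Monte Carlo simulation. Everything here is hypothetical: the daily return parameters (mean 0.1%, volatility 4%), the 30-day horizon, the two strategies (fully invested versus half held in cash), and the assumption of normally distributed daily returns are chosen only to show the mechanics.

```python
import random
import statistics


def simulate_final_values(exposure, start_value=10_000.0, daily_mean=0.001,
                          daily_vol=0.04, days=30, n_paths=1000, seed=0):
    """Simulate final portfolio values when `exposure` (0 to 1) of the
    portfolio follows random daily returns and the rest stays in cash."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        value = start_value
        for _ in range(days):
            daily_return = rng.gauss(daily_mean, daily_vol)
            value *= 1.0 + exposure * daily_return
        finals.append(value)
    return finals


def summarize(finals, start_value=10_000.0):
    """Median outcome, 5th percentile, and share of paths ending at a loss."""
    ordered = sorted(finals)
    return {
        "median": statistics.median(ordered),
        "p5": ordered[int(0.05 * len(ordered))],
        "prob_loss": sum(v < start_value for v in ordered) / len(ordered),
    }


full = simulate_final_values(exposure=1.0)
half = simulate_final_values(exposure=0.5)
for name, finals in [("fully invested", full), ("half in cash", half)]:
    print(name, summarize(finals))
```

Comparing the 5th percentile and the probability of a loss across strategies gives decision-makers a rough view of the downside that comes with each level of exposure.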

You can check out the below PowerPoint for further information.
