# Predicting Ethereum (ETH) Prices With RNN-LSTM in Keras (TensorFlow)

The idea of this article is to present a simple way of predicting future prices of the Ethereum cryptocurrency using exploratory analysis and recurrent neural networks (RNNs), primarily LSTMs.

I will not go into detail about cryptocurrencies or the inner workings of LSTMs, as many articles already cover these subjects.

I invite you to look at these articles if the topics are still unclear to you:

**How does Ethereum work, anyway?**

https://medium.com/@preethikasireddy/how-does-ethereum-work-anyway-22d1df506369

**Understanding LSTM and its diagrams**

https://medium.com/mlreview/understanding-lstm-and-its-diagrams-37e2f46f1714

The first step in any project is to obtain a dataset; in our case, we need historical Ethereum price data.

The CryptoDataDownload platform offers this type of data. I used the **Kraken** market and the **ETH/USD** dataset with **hourly granularity**.

Here is the CSV file imported with pandas:

```python
import pandas as pd

df_ethusd = pd.read_csv("kraken_ethusd_1h.csv")

# The first row of the file holds the real column names
df_ethusd.columns = df_ethusd.iloc[0]
df_ethusd.drop(df_ethusd.index[[0, 1]], inplace=True)
df_ethusd.head()
```

Before continuing, it is important to carry out some data structuring operations:

```python
df_ethusd.dtypes

# Convert the date column to datetime format
df_ethusd['Date'] = pd.to_datetime(df_ethusd['Date'], format='%Y-%m-%d %H-%p').dt.strftime('%Y-%m-%d %H:%M')
df_ethusd['Date'] = pd.to_datetime(df_ethusd['Date'])

# Sort values by date
df_ethusd = df_ethusd.sort_values(by='Date')
df_ethusd.rename(columns={'Date': 'datetime'}, inplace=True)

# Drop columns that are not needed
del df_ethusd['Symbol']
del df_ethusd['Unix Timestamp']
```

Simple check for null values:

`df_ethusd.isnull().sum()`

Before starting the correlation analysis, it is important to convert the quantitative variables to numeric types:

```python
# Convert all quantitative variables to numeric format
df_ethusd = pd.concat([
    df_ethusd.iloc[:, 0],
    df_ethusd.iloc[:, 1:len(df_ethusd.columns)].astype('float')
], axis=1)

df_ethusd.dtypes
```

# Correlations

With the data preparation complete, the next step is to analyze the different correlations to identify the most interesting variables.

The idea is to predict future values of the “Close” price. It is therefore important to know whether the other variables can explain its variability.

Here we first plot the Pearson correlation heatmap and inspect the correlation of the independent variables with the output variable Close. We will only select features that have an absolute correlation **above 0.5** with the output variable.

The correlation coefficient takes values between -1 and 1:

- A value closer to 0 implies a weaker correlation (exactly 0 implies no correlation)
- A value closer to 1 implies a stronger positive correlation
- A value closer to -1 implies a stronger negative correlation
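As a minimal sketch of this selection step (using a small synthetic OHLC-style frame in place of the real dataset, so the column values are purely hypothetical):

```python
import numpy as np
import pandas as pd

# Toy OHLC-style frame standing in for df_ethusd (hypothetical values)
rng = np.random.default_rng(0)
close = np.cumsum(rng.normal(0, 1, 500)) + 200
df = pd.DataFrame({
    'Open': close + rng.normal(0, 0.5, 500),   # strongly tied to Close
    'High': close + rng.normal(0, 0.5, 500),
    'Low': close + rng.normal(0, 0.5, 500),
    'Volume': rng.normal(0, 1, 500),           # unrelated noise
    'Close': close,
})

# Pearson correlations of every feature with the target 'Close'
target_corr = df.corr()['Close'].drop('Close')

# Keep features with an absolute correlation above 0.5
selected = target_corr[target_corr.abs() > 0.5].index.tolist()
print(selected)
```

The heatmap itself can be drawn from the same matrix, e.g. with `seaborn.heatmap(df.corr(), annot=True)`, and the `target_corr.abs() > 0.5` filter applies unchanged to the real dataframe.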

I then retrieve the different correlation pairs to check their p-values and keep only the significant correlations, using a threshold of 0.05.
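A sketch of that p-value check with `scipy.stats.pearsonr`, again on toy columns since the real pairs come from the dataset:

```python
from itertools import combinations

import numpy as np
from scipy.stats import pearsonr

# Toy columns standing in for the dataset's features (hypothetical values)
rng = np.random.default_rng(1)
close = np.cumsum(rng.normal(0, 1, 300)) + 200
data = {
    'Close': close,
    'Open': close + rng.normal(0, 0.5, 300),
    'Volume': rng.normal(0, 1, 300),
}

# Keep only the pairs whose correlation is significant at the 0.05 level
significant_pairs = []
for a, b in combinations(data, 2):
    r, p_value = pearsonr(data[a], data[b])
    if p_value < 0.05:
        significant_pairs.append((a, b, round(r, 3)))

print(significant_pairs)
```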

As we can see, only the features **High, Low** and **Open** are highly correlated with the output variable Close.

To confirm this first hypothesis, a second analysis is performed with the Recursive Feature Elimination (RFE) method. RFE works by recursively removing attributes and building a model on those that remain, ranking the features by their importance to the model.
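One possible sketch of that RFE step, using scikit-learn's `RFE` with a linear regression estimator (the estimator choice and the toy data are assumptions, not the article's exact setup):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Toy OHLC-style frame (hypothetical values) standing in for the real dataset
rng = np.random.default_rng(2)
close = np.cumsum(rng.normal(0, 1, 400)) + 200
X = pd.DataFrame({
    'Open': close + rng.normal(0, 0.5, 400),
    'High': close + rng.normal(0, 0.5, 400),
    'Low': close + rng.normal(0, 0.5, 400),
    'Volume': rng.normal(0, 1, 400),
})
y = close

# Recursively eliminate features until the 3 most informative remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe.fit(X, y)

selected = X.columns[rfe.support_].tolist()
print(selected)
```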

Here is the result:

The RFE method also recommends the same variables as the first method with the Pearson coefficient.

We will therefore keep the **Open, High** and **Low** variables to predict the **Close** price.

# Data pre-processing

We now come to the pre-processing stage. It is important to select the different variables and to define a number of time steps.

The specified number of time steps defines the number of input variables (*X*) used to predict the next time step (*y*). As such, for each time step used in the representation, that many rows must be removed from the beginning of the dataset.

In our case, I set the time steps to 24. This means that the model will each time use the last 24 hours to predict the next future hour.

```python
# Select the features (columns) to be involved in training and predictions
cols = ['Close', 'High', 'Low', 'Open']

# Target feature
y_target = 'Close'

# Number of time steps used to predict the future
n_time_steps = 24

# Extract the dates (will be used in visualization)
dataset_datelist = list(df_ethusd['datetime'])

# Keep the selected features and index the rows by timestamp
dataset = pd.DataFrame(df_ethusd, columns=cols)
dataset.index = dataset_datelist
dataset.index = pd.to_datetime(dataset.index)

print('Training set shape == {}'.format(dataset.shape))
print('All timestamps == {}'.format(len(dataset_datelist)))
print('Features selected: {}'.format(cols))
print('Target feature selected: {}'.format(y_target))
print('Number of time steps selected: {}'.format(n_time_steps))
```

Here is the dataset at this point:

# Evaluate normal distribution of selected features

The next step is to normalize the data so that the LSTM model is not affected by variations in scales. Differences in the scales across input variables may increase the difficulty of the problem being modeled.

There are several methods; the two main ones are offered by scikit-learn:

- **StandardScaler**: removes the mean and scales the data to unit variance.
- **MinMaxScaler**: rescales the dataset so that all feature values are in the range [0, 1].
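A quick toy comparison of the two scalers (the column values below are made up purely for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Toy column with one large outlier (hypothetical values)
values = np.array([[1.0], [2.0], [3.0], [100.0]])

standardized = StandardScaler().fit_transform(values)  # zero mean, unit variance
minmaxed = MinMaxScaler().fit_transform(values)        # squashed into [0, 1]

print(standardized.ravel().round(2))
print(minmaxed.ravel().round(2))
```

Note how the outlier compresses the MinMaxScaler output: the three small values end up crowded near 0, while StandardScaler keeps them distinguishable.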

In order to use the best method, it is important to know beforehand whether our variables follow a normal distribution.

To do this, we must look at the distributions of the variables and compare them with the normal density.

We can conclude that the variables do not seem to follow a normal distribution.

To confirm this, a Kolmogorov-Smirnov test is applied to each of the variables. The Kolmogorov-Smirnov test is a goodness-of-fit method that compares the maximum distance between the empirical cumulative distribution function and the theoretical cumulative distribution function.
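A sketch of such a test with `scipy.stats.kstest`, on synthetic samples rather than the real price columns:

```python
import numpy as np
from scipy.stats import kstest

# Synthetic samples (hypothetical): one normal, one heavy-tailed like price data
rng = np.random.default_rng(3)
normal_sample = rng.normal(0, 1, 1000)
skewed_sample = rng.lognormal(0, 1, 1000)

for name, sample in [('normal', normal_sample), ('skewed', skewed_sample)]:
    # Standardize, then compare against the theoretical N(0, 1) CDF
    standardized = (sample - sample.mean()) / sample.std()
    statistic, p_value = kstest(standardized, 'norm')
    print('{}: reject normality == {}'.format(name, p_value < 0.05))
```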

Here are the results:

None of the variables has a p-value greater than the significance threshold of 0.05, which leads us to reject the hypothesis of normality. This is probably linked to the large number of outliers in the time series. A StandardScaler normalization seems well suited in this context.

# Train-test split

Now it’s time to split our dataset into two parts: one for training the model and a second for validation.

I used a standard **80/20** split, which gives us 22,999 samples for training and 5,749 for validation, which is sufficient in our case.

```python
import math

train_split = 0.8
Data = dataset.values  # convert to a NumPy array

train_data_size = math.ceil(len(Data) * train_split)
test_data_size = len(dataset) - train_data_size
print('train size == {}.'.format(train_data_size))
print('test size == {}.'.format(test_data_size))

# Split the actual dataframe into train/test sets
train, test = dataset[0:train_data_size], dataset[train_data_size:len(dataset)]
print('train shape == {}.'.format(train.shape))
print('test shape == {}.'.format(test.shape))
```

Visualization of the training and test data:

# Feature Scaling Normalization

We can then normalize the two datasets with the **StandardScaler** method.

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
training_scaled_data = scaler.fit_transform(train)
print('training scaled data shape == {}.'.format(training_scaled_data.shape))

# Look back n_time_steps so the first test prediction has a full input window
look_back_train_data = train.tail(n_time_steps)
testing_data = pd.concat([look_back_train_data, test])

scaler_test = StandardScaler()
testing_scaled_data = scaler_test.fit_transform(testing_data)

# A separate scaler fitted on the target column only,
# used later to inverse-transform the predictions
scaler_test_predict = StandardScaler()
scaler_test_predict.fit_transform(testing_data.iloc[:, 0:1])
print('testing scaled data shape == {}.'.format(testing_scaled_data.shape))
```

With the data now normalized, it now becomes important to transform the data structure into the input data expected by an LSTM model.

You always have to give a three-dimensional array as input to an LSTM network: the first dimension is the **batch size**, the second the number of **time steps**, and the third the number of **features** in one input sequence. The input shape therefore looks like *(batch_size, time_steps, features)*.

There are many ways to transform the data; here is one that produces the 3 desired dimensions:

```python
import numpy as np

def split_sequences_multivariate_output(sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # Find the end of this pattern
        end_ix = i + n_steps
        # Check if we are beyond the dataset
        if end_ix > len(sequences) - 1:
            break
        # Gather the input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

# Convert into input/output
X_train, y_train = split_sequences_multivariate_output(training_scaled_data, n_time_steps)
X_test, y_test = split_sequences_multivariate_output(testing_scaled_data, n_time_steps)

print('X_train shape == {}.'.format(X_train.shape))
print('y_train shape == {}.'.format(y_train.shape))
print('X_test shape == {}.'.format(X_test.shape))
print('y_test shape == {}.'.format(y_test.shape))
```

Check the shapes (again) before starting training:

We have **24 time steps** in the **X_train** and **X_test** datasets with **4 features**, and the same 4 features in **y_train** and **y_test**.

# Build LSTM network

It is now time to prepare the LSTM model. I define a function that takes the training and test data as input, along with some hyperparameters.

The model is then formed with **two LSTM hidden layers**, each with **50 units**.

**25% dropout layers** are also used between the LSTM hidden layers.

Dropout on the input means that, with a given probability, the data on the input connection to each LSTM block will be excluded from node activation and weight updates.

In Keras, this can be specified with a *dropout* argument when creating an LSTM layer or, as here, with separate *Dropout* layers. The dropout value is a fraction between 0 (no dropout) and 1 (all connections dropped).

It is important to specify the input shape on the first LSTM hidden layer so that it uses the same as the training data.

**Linear activation** is then used on the Dense output layer.

```python
def train_keras_model(X_train, y_train, X_test, y_test, epochs, batch_size, shuffle=False):
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dropout, Dense

    # Initialize the neural network based on LSTM
    model = Sequential()
    model.add(LSTM(units=50, return_sequences=True,
                   input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(Dropout(0.25))
    model.add(LSTM(units=50))
    model.add(Dropout(0.25))
    model.add(Dense(units=X_train.shape[2], activation='linear'))

    model.compile(optimizer='adam', loss='mean_squared_error')
    history = model.fit(X_train, y_train, shuffle=shuffle,
                        validation_data=(X_test, y_test),
                        epochs=epochs, verbose=2, batch_size=batch_size).history
    return history, model
```

The training can begin. I used **30 epochs** with a **batch size of 256**; these values seem to make the model converge quickly.

```python
# Fit the model
history, model = train_keras_model(X_train, y_train, X_test, y_test,
                                   epochs=30, batch_size=256, shuffle=False)
```

Here are the training and validation loss curves:

The model seems to converge quickly toward 0 on both the training and validation data.

# Performance visualization using the Test Set

It is important to check the performance of the model with different metrics, not just with the loss curves.

The most interesting metrics for this are **MAE, MAPE, MSE, RMSE, R-squared and adjusted R-squared**.

They must be calculated from the values predicted on the validation data:

```python
import numpy as np

# Perform predictions
predictions_test = model.predict(X_test)

# Inverse-transform the predictions back to the original scale
y_pred_test = scaler_test_predict.inverse_transform(
    np.array(predictions_test)[:, 0].reshape(-1, 1)).ravel()
y_actual_test = scaler_test_predict.inverse_transform(
    np.array(y_test)[:, 0].reshape(-1, 1)).ravel()
```

It is then possible to apply the different metrics to **y_pred_test** and **y_actual_test**.
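For illustration, these metrics can be computed by hand with NumPy. The `regression_metrics` helper and the sample values below are hypothetical, not the article's actual results:

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_features):
    """Common regression metrics; n_features is used for adjusted R-squared."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    n = len(y_true)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    mape = np.mean(np.abs(errors / y_true)) * 100
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {'MAE': mae, 'MAPE': mape, 'MSE': mse, 'RMSE': rmse,
            'R2': r2, 'Adjusted R2': adj_r2}

# Hypothetical actual vs predicted prices
actual = np.array([200.0, 210.0, 205.0, 220.0, 215.0, 225.0])
predicted = np.array([198.0, 212.0, 203.0, 222.0, 213.0, 227.0])
print(regression_metrics(actual, predicted, n_features=3))
```

In practice, `y_actual_test` and `y_pred_test` replace the toy arrays (the `sklearn.metrics` equivalents can also be used for MAE, MSE and R²).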

Here are the results:

We obtain a **MAPE value of 3.43%**, which means a very low average error between the actual values and the values predicted by the model.

In addition, the **R2 and adjusted R2 coefficients** are very **close to 1**, which means that the predicted values are strongly correlated with the real values and therefore explain much of their variance.

We can visualize the performance of the model with a graph. For this I define two time series, one with the validation data and another with the predicted data:

```python
y_test_serie = pd.DataFrame(y_actual_test, columns=[y_target]).set_index(testing_data[n_time_steps:].index)
y_pred_serie = pd.DataFrame(y_pred_test, columns=[y_target]).set_index(testing_data[n_time_steps:].index)
```

Then we just plot the two series:

```python
import matplotlib.pyplot as plt

plt.plot(y_test_serie.index, y_test_serie[y_target], color='green', linewidth=2, label='Actual')
plt.plot(y_pred_serie.index, y_pred_serie[y_target], color='red', linewidth=2, label='Testing predictions')

plt.grid(which='major', color='#cccccc', alpha=0.5)
plt.legend(shadow=True)
plt.title('Testing predictions vs Actual')
plt.xlabel('Timeline', fontsize=10)
plt.ylabel('Value', fontsize=10)
plt.xticks(rotation=45, fontsize=8)
plt.show()
```

We can focus on the last month:

Although there is still room to improve the data processing and the model parameters, the predictions made by the model follow the main trend in the test data.

# Predicting the future

It is then possible to use the model to predict the future prices of Ethereum for the next few hours.

There are several approaches to predicting the future: direct prediction and recursive prediction.

I used recursive prediction to predict the next 12 hours. At each step, the model predicts the 4 features for 1 future time step.

I then fed the predicted values back as input variables, shifting the last window by one step each time.

This approach is best suited to short-horizon predictions, because the prediction error compounds at each step, which can considerably impact the quality of the predictions over long periods.
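The recursive loop can be sketched as follows. The `predict_recursive` helper and the toy stand-in model are hypothetical; in practice the trained Keras `model` takes the place of `LastValueModel`:

```python
import numpy as np

def predict_recursive(model, last_window, n_future):
    """Recursively predict n_future steps; each prediction is fed back as input."""
    window = last_window.copy()  # shape (n_time_steps, n_features)
    predictions = []
    for _ in range(n_future):
        # Predict one step ahead from the current window (batch of 1)
        next_step = model.predict(window[np.newaxis, :, :])[0]
        predictions.append(next_step)
        # Slide the window: drop the oldest row, append the new prediction
        window = np.vstack([window[1:], next_step])
    return np.array(predictions)

# Toy stand-in for the trained LSTM: simply repeats the last row of the window
class LastValueModel:
    def predict(self, batch):
        return batch[:, -1, :]

window = np.arange(24 * 4, dtype=float).reshape(24, 4)  # 24 steps, 4 features
future = predict_recursive(LastValueModel(), window, n_future=12)
print(future.shape)  # (12, 4)
```

The predictions come out in scaled units, so they still need an `inverse_transform` before plotting.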

Here is the result of the next 12 hours:

# Summary

The Ethereum time series was used with the Open, High, and Low variables to measure network performance on the Close variable.

The estimation results were compared on graphs, and the MSE, MAPE, and R² values were examined as criteria of predictive success.

It should be possible to achieve better results with more data points and by tuning the hyperparameters of the LSTM network.

That’s it! I hope this article provides a good understanding of using LSTMs to forecast time series.

*References:*

https://machinelearningmastery.com/use-dropout-lstm-networks-time-series-forecasting/

https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/

https://machinelearningmastery.com/use-timesteps-lstm-networks-time-series-forecasting/

https://towardsdatascience.com/feature-selection-with-pandas-e3690ad8504b
