🚀 Dogecoin Price Prediction using LSTM network

Published in hackerdawn · 6 min read · May 21, 2021

Dogecoin (DOGE) is a cryptocurrency created as a joke, making fun of the wild speculation in cryptocurrencies at the time. Dogecoin features the face of the Shiba Inu dog from the “Doge” meme as its logo. It has quickly developed its own online community, reaching a gigantic market capitalization. In this tutorial, we will predict the prices of Dogecoin. For this purpose, we’ll use the Dogecoin Historical Data from Kaggle.

Importing Libraries

Let’s first import the required libraries. If you don’t have a particular library installed, run the command ‘pip install <package_name>’ to install it.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import Activation
from sklearn.metrics import mean_squared_error, mean_absolute_error

Loading the Dataset

We have already downloaded the dataset from Kaggle. Let’s now load it and print its head to see what it looks like.

doge = pd.read_csv("../Kaggle/dogecoin-historical-data/DOGE-USD.csv")
doge.head()

Let us see the shape of the dataset. The shape turns out to be (2438, 7).

doge.shape

Shape of the Dataset

We will use the info() function to see the column-wise non-null counts and data types.

doge.info()

Let’s see the total count and percentage of null values in each column. As you can see in the output, all the columns except ‘Date’ contain 4 null values.

total = doge.isnull().sum().sort_values(ascending=False)
percent = (doge.isnull().sum()/doge.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data

Column-wise total null count and Percentage
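To make the null summary easier to follow, here is a minimal sketch on a toy dataframe. The column values are made up for illustration and are not taken from the Kaggle file. It uses isnull().mean(), which gives the same fraction as dividing the null count by the row count.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): one null in 'Open', two nulls in 'Volume'
toy = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"],
    "Open": [0.005, np.nan, 0.006, 0.007],
    "Volume": [100.0, np.nan, np.nan, 250.0],
})

total = toy.isnull().sum()      # number of nulls per column
percent = toy.isnull().mean()   # fraction of rows that are null per column

# Combine both into one table, most affected column first
summary = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
summary = summary.sort_values("Total", ascending=False)
print(summary)
```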

Data Visualization

We will use a heatmap to see the correlation between different features in the dataset. The higher the number in a box, the stronger the correlation.

#Plotting correlation
plt.figure(figsize=(7,7))
corr = doge[doge.columns[1:]].corr()
#Hide the diagonal and the upper triangle, which only mirror the lower triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, vmax=.3, center=0,
            square=True, linewidths=.5, annot=True)
plt.show()
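The mask passed to the heatmap may look cryptic, so here is a small sketch of what np.triu produces. The 3x3 correlation matrix is a made-up example, not the Dogecoin correlations.

```python
import numpy as np

# Hypothetical symmetric correlation matrix for three features
corr = np.array([
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.4],
    [0.2, 0.4, 1.0],
])

# True on the diagonal and above; seaborn hides cells where the mask is True
mask = np.triu(np.ones_like(corr, dtype=bool))
print(mask)

# The cells that stay visible are the strictly lower triangle
visible = corr[~mask]
print(visible)
```

Because a correlation matrix is symmetric, hiding the upper triangle loses no information.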

We’ll use violin plots to see what the distribution looks like for the different columns in the dataset. In seaborn, a violin plot can be created using the violinplot function.

fig,axes = plt.subplots(3,2,figsize = (10,10))
fig.suptitle("Violin plots for numeric features")
#Open
sns.violinplot(ax=axes[0,0],data=doge,x='Open',color='#4e89ae')
#High
sns.violinplot(ax=axes[0,1],data=doge,x='High',color='#c56183')
#Low
sns.violinplot(ax=axes[1,0],data=doge,x='Low',color='#ff8000')
#Close
sns.violinplot(ax=axes[1,1],data=doge,x='Close',color='#ffd700')
#Adj Close
sns.violinplot(ax=axes[2,0],data=doge,x='Adj Close',color='#7cfC00')
#Volume
sns.violinplot(ax=axes[2,1],data=doge,x='Volume',color='#00FFFF')

Preparing the Data

We will take care of the nulls first. Let’s fill the nulls in the ‘Volume’ column with 0 and the nulls in the price columns using the method ‘ffill’. ‘ffill’ stands for ‘forward fill’ and propagates the last valid observation forward.

#Assign the result back instead of using inplace=True on a single column
doge['Volume'] = doge['Volume'].fillna(0)
#ffill() replaces the deprecated fillna(method='ffill')
price_cols = ['Open', 'High', 'Low', 'Close', 'Adj Close']
doge[price_cols] = doge[price_cols].ffill()
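Here is a minimal sketch of how forward fill behaves, using a made-up price series. Note that a null at the very start has no earlier observation to copy, so it stays null.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with gaps in the middle and at the end
prices = pd.Series([0.010, np.nan, np.nan, 0.013, np.nan])
filled = prices.ffill()   # each gap takes the last valid value before it
print(filled.tolist())

# A leading null cannot be forward filled
leading = pd.Series([np.nan, 0.02]).ffill()
print(leading.tolist())
```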

We’ll convert the data type of ‘Date’ column to datetime. We will also rename the column ‘Close’ to ‘Price’ for better comprehension.

doge['Date'] = pd.to_datetime(doge['Date'])
doge.rename(columns = {'Close':'Price'}, inplace = True)

We will apply the dt.tz_localize method on the ‘Date’ column. Given a time zone, this method makes time zone-naive timestamps time zone-aware. Given None, as we do here, it removes any time zone information and leaves naive timestamps.

We will also set the dataframe’s index to the ‘Date’ column. Then we’ll reduce doge to the single column ‘Price’.

doge['Date'] = doge['Date'].dt.tz_localize(None)
doge = doge.set_index('Date')
doge = doge[['Price']]
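To see tz_localize working in both directions, here is a short sketch on two made-up dates. The "UTC" zone is chosen only for illustration; the tutorial itself only passes None.

```python
import pandas as pd

# Two naive timestamps (no time zone attached)
naive = pd.Series(pd.to_datetime(["2021-05-01", "2021-05-02"]))

# Passing a time zone makes the timestamps time zone-aware
aware = naive.dt.tz_localize("UTC")

# Passing None strips the time zone again and keeps the wall-clock time
back = aware.dt.tz_localize(None)

print(naive.dt.tz, aware.dt.tz, back.dt.tz)
```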

Let’s split the data between train and test.

#Splitting data
split_date = '2018-06-25'
data_train = doge.loc[doge.index <= split_date].copy()
data_test = doge.loc[doge.index > split_date].copy()
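As a quick check of the date-based split, here is a sketch on a five-day toy dataframe with made-up prices. Everything on or before the split date goes to the training set, and everything after it goes to the test set, so the two never overlap in time.

```python
import pandas as pd

# Five consecutive days around the split date (hypothetical prices)
idx = pd.date_range("2018-06-23", periods=5, freq="D")
toy = pd.DataFrame({"Price": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

split_date = "2018-06-25"
train = toy.loc[toy.index <= split_date].copy()   # June 23 to June 25
test = toy.loc[toy.index > split_date].copy()     # June 26 onwards

print(len(train), len(test))
```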

Let’s preprocess the data to make it ready for feeding into the model.

#Data preprocessing
training_set = data_train.values
training_set = np.reshape(training_set, (len(training_set), 1))
sc = MinMaxScaler()
training_set = sc.fit_transform(training_set)
X_train = training_set[0:len(training_set)-1]
y_train = training_set[1:len(training_set)]
X_train = np.reshape(X_train, (len(X_train), 1, 1))
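The preprocessing above does three things: it scales prices to the range 0 to 1, it pairs each day's price with the next day's price, and it reshapes the inputs to (samples, timesteps, features). Here is a minimal NumPy sketch of those steps on made-up prices. The scaling is written out by hand and is assumed to match MinMaxScaler with its default settings.

```python
import numpy as np

# Hypothetical prices as a column vector, like data_train.values
prices = np.array([2.0, 4.0, 6.0, 10.0]).reshape(-1, 1)

# Min-max scaling by hand: (x - min) / (max - min)
lo, hi = prices.min(), prices.max()
scaled = (prices - lo) / (hi - lo)

# Inputs are days 0..n-2, targets are days 1..n-1 (the next day's price)
X = scaled[:-1]
y = scaled[1:]

# The LSTM expects 3D input: (samples, timesteps, features)
X = X.reshape(len(X), 1, 1)

print(X.shape, y.shape)
```

With a single timestep per sample, the network only ever sees one previous price when it predicts the next one.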

We’ll create a plot to see the historical price of Dogecoin.

#Historical price
_ = doge.plot(style='', figsize=(15,5), color="#F8766D", title='DOGE Price (USD) by Days')

Let’s create a plot to see what our test and training sets look like.

#Plotting the test and training sets
_ = data_test.rename(columns={'Price': 'Test Set'}).join(data_train.rename(columns={'Price': 'Training Set'}), how='outer').plot(figsize=(15,5), title='Test & Training Sets', style='')

Creating the Model

It’s time to create the model. We will use LSTM for this purpose. Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems.

#Creating the model
model = Sequential()
model.add(LSTM(128,activation="sigmoid",input_shape=(1,1)))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=100, batch_size=50, verbose=2)

Let’s see what our model looks like by using the summary() method.

model.summary()
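The parameter counts that summary() reports can be worked out by hand. The sketch below assumes the standard LSTM layout with four gates, each having input weights, recurrent weights, and a bias. The helper function names are made up for this example.

```python
def lstm_params(units, input_dim):
    # Four gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, outputs):
    # One weight per input-output pair plus one bias per output
    return inputs * outputs + outputs

lstm = lstm_params(units=128, input_dim=1)
dense = dense_params(inputs=128, outputs=1)
total = lstm + dense   # the Dropout layer has no trainable parameters

print(lstm, dense, total)
```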

Predicting the Prices

We will feed the test data into the model after reshaping and transforming it. The predictions from the model will be stored in the predicted_DOGE_price variable.

#Making the predictions
test_set = data_test.values
inputs = np.reshape(test_set, (len(test_set), 1))
inputs = sc.transform(inputs)
inputs = np.reshape(inputs, (len(inputs), 1, 1))
predicted_DOGE_price = model.predict(inputs)
predicted_DOGE_price = sc.inverse_transform(predicted_DOGE_price)
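One thing to keep in mind is that the scaler was fitted on the training set only, so test prices above the training maximum are mapped to values greater than 1. Here is a small sketch with made-up prices. The scaling is written out by hand and is assumed to match MinMaxScaler with its default settings.

```python
import numpy as np

train = np.array([0.001, 0.005, 0.017])   # hypothetical training prices
test = np.array([0.004, 0.300])           # hypothetical test prices

# The scaling range comes from the training data only
lo, hi = train.min(), train.max()

def scale(x):
    return (x - lo) / (hi - lo)

def unscale(s):
    return s * (hi - lo) + lo

scaled_test = scale(test)
print(scaled_test)            # the second value is far above 1

# Undoing the scaling recovers the original prices
restored = unscale(scaled_test)
print(restored)
```

Inputs this far outside the training range are something the network never saw while it was being fitted.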

Let’s store the predictions in a ‘Price_Prediction’ column and create a dataframe data_all that holds both ‘Price’ and ‘Price_Prediction’.

data_test['Price_Prediction'] = predicted_DOGE_price
data_all = pd.concat([data_test, data_train], sort=False).sort_index()

Let us see how our predictions compare to the actual prices. The blue line shows the actual prices and the yellow line shows the predicted prices.

_ = data_all[['Price','Price_Prediction']].plot(figsize=(15, 5))

Let’s zoom in on the graph for January to May 2021. We can clearly see a big difference between the actual and predicted prices. One reason is that the scaler and the model were fitted only on prices up to June 25, 2018, and prices in 2021 climb well beyond that range. More generally, history may not repeat itself in the real world, and new factors affecting the market come into play over time. A model like this should therefore not be relied on for real-world price predictions.

#Plotting the forecast v/s actual price
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
_ = data_all[['Price_Prediction','Price']].plot(ax=ax)
ax.set_xbound(lower='2021-01-02', upper='2021-05-20')
plot = plt.suptitle('Jan to May: Forecast vs Actual')

Now let us see the Mean Squared Error (MSE) and Mean Absolute Error (MAE) for our predictions.

#MSE
mean_squared_error(y_true=data_test['Price'],y_pred=data_test['Price_Prediction'])

MSE

#MAE
mean_absolute_error(y_true=data_test['Price'],y_pred=data_test['Price_Prediction'])

MAE
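Both metrics are simple to compute by hand, which helps when interpreting them. Here is a minimal NumPy sketch on made-up values. RMSE is added as an extra because it is in the same units as the price.

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0])

err = y_true - y_pred

mse = np.mean(err ** 2)       # average squared error, punishes large misses
mae = np.mean(np.abs(err))    # average absolute error, in price units
rmse = np.sqrt(mse)           # square root of MSE, also in price units

print(mse, mae, rmse)
```

Since MSE squares the errors, a single large miss dominates it, while MAE weighs every miss in proportion to its size.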

We are done with the Dogecoin price prediction. If you liked this tutorial, do hit the clap button!