🚀 Dogecoin Price Prediction using LSTM network

Sidharth Pandita
Published in hackerdawn
6 min read · May 21, 2021
Photo by Clay Banks on Unsplash

Dogecoin (DOGE) is a cryptocurrency created as a joke, making fun of the wild speculation in cryptocurrencies at the time. Its logo features the face of the Shiba Inu dog from the “Doge” meme. It quickly developed its own online community and reached a very large market capitalization. In this tutorial, we will predict the price of Dogecoin. For this purpose, we’ll use the Dogecoin Historical Data from Kaggle.

Importing Libraries

Let’s first import the required libraries. If you don’t have a particular library installed, run the command ‘pip install <package_name>’ to install it.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import Activation
from sklearn.metrics import mean_squared_error, mean_absolute_error

Loading the Dataset

We have already downloaded the dataset from Kaggle. Let’s now load it and print its head to see what it looks like.

doge = pd.read_csv("../Kaggle/dogecoin-historical-data/DOGE-USD.csv")
doge.head()
Head of the Dataframe

Let us see the shape of the dataset. The shape turns out to be (2438, 7).

doge.shape
Shape of the Dataset

We will use the info() function to see the column-wise non-null counts and data types.

doge.info()
Dataframe Info

Let’s see the total count and share of null values in each column (the ‘Percent’ column holds a fraction between 0 and 1). As you can see in the output, every column except ‘Date’ contains 4 null values.

total = doge.isnull().sum().sort_values(ascending=False)
percent = (doge.isnull().sum()/doge.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data
Column-wise total null count and Percentage
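To make the null-count table concrete, here is a minimal sketch on a toy dataframe. The values and the names `toy` and `missing` are made up for illustration; only the pandas calls mirror the step above.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"],
    "Close": [0.005, np.nan, 0.010, 0.009],
    "Volume": [100.0, np.nan, np.nan, 250.0],
})

total = toy.isnull().sum()     # number of nulls per column
percent = toy.isnull().mean()  # fraction of rows that are null (same as sum / count)
missing = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
missing = missing.sort_values("Total", ascending=False)
print(missing)
```

Taking the mean of the boolean null mask gives the same fraction as dividing the sum by the count, which keeps the expression a little shorter.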

Data Visualization

We will use a heatmap to see the correlation between different features in the dataset. The closer the number in a box is to 1, the stronger the positive correlation between the two features.

#Plotting correlation
plt.figure(figsize=(7,7))
corr = doge[doge.columns[1:]].corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, vmax=.3, center=0,
            square=True, linewidths=.5, annot=True)
plt.show()
Correlation Heatmap
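The mask passed to the heatmap hides the redundant half of the symmetric correlation matrix. A small sketch with a made-up 3×3 matrix shows which cells survive:

```python
import numpy as np

# Made-up symmetric correlation matrix for three features.
corr = np.array([
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.1],
    [0.2, 0.1, 1.0],
])

# True marks the diagonal and everything above it; seaborn leaves
# cells blank wherever the mask is True.
mask = np.triu(np.ones_like(corr, dtype=bool))
print(mask)

# The values that would still be drawn: the strictly lower triangle.
visible = corr[~mask]
print(visible)
```

Only the three values below the diagonal remain, so each pair of features appears exactly once.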

We’ll use violin plots to see what the distribution looks like for different columns in the dataset. In seaborn, a violin plot can be created using the violinplot function.

fig,axes = plt.subplots(3,2,figsize = (10,10))
fig.suptitle("Violin plots for numeric features")
#Open
sns.violinplot(ax=axes[0,0],data=doge,x='Open',color='#4e89ae')
#High
sns.violinplot(ax=axes[0,1],data=doge,x='High',color='#c56183')
#Low
sns.violinplot(ax=axes[1,0],data=doge,x='Low',color='#ff8000')
#Close
sns.violinplot(ax=axes[1,1],data=doge,x='Close',color='#ffd700')
#Adj Close
sns.violinplot(ax=axes[2,0],data=doge,x='Adj Close',color='#7cfC00')
#Volume
sns.violinplot(ax=axes[2,1],data=doge,x='Volume',color='#00FFFF')
Violin plots for different features

Preparing the Data

We will take care of the nulls first. Let’s fill the nulls in the discrete column (‘Volume’) with 0’s and the nulls in the continuous price columns using the method ‘ffill’. ‘ffill’ stands for ‘forward fill’ and propagates the last valid observation forward.

# Assign the results back instead of calling fillna(..., inplace=True) on a
# column selection, which may not modify the dataframe in recent pandas versions.
doge['Volume'] = doge['Volume'].fillna(0)
price_cols = ['Open', 'High', 'Low', 'Close', 'Adj Close']
doge[price_cols] = doge[price_cols].ffill()
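As a quick illustration of forward fill, here is a minimal sketch on a made-up series. Note that a null at the very start has no earlier observation to copy, so it stays null.

```python
import numpy as np
import pandas as pd

# Made-up prices with gaps, including one at the very start.
prices = pd.Series([np.nan, 0.010, np.nan, np.nan, 0.013, np.nan])

# Each gap takes the most recent valid value that precedes it.
filled = prices.ffill()
print(filled.tolist())
```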

We’ll convert the data type of the ‘Date’ column to datetime. We will also rename the column ‘Close’ to ‘Price’ for better comprehension.

doge['Date'] = pd.to_datetime(doge['Date'])
doge.rename(columns = {'Close':'Price'}, inplace = True)

We will apply the dt.tz_localize method on the ‘Date’ column. Given a time zone name, this method turns time zone-naive timestamps into time zone-aware ones. Given None, as here, it does the opposite: it removes any time zone information, so the column is guaranteed to hold time zone-naive timestamps.

We will also set the dataframe’s index to the ‘Date’ column. Then we’ll reduce doge to the single column ‘Price’.

doge['Date'] = doge['Date'].dt.tz_localize(None)
doge = doge.set_index('Date')
doge = doge[['Price']]
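The two directions of tz_localize can be seen in a short sketch. The dates and the choice of "UTC" are arbitrary and only serve the illustration.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(["2021-05-01", "2021-05-02"]))

aware = naive.dt.tz_localize("UTC")  # naive -> time zone-aware
back = aware.dt.tz_localize(None)    # aware -> naive, wall-clock time preserved

print(naive.dt.tz, aware.dt.tz, back.dt.tz)
```

After the round trip the timestamps are the same as the originals, just without any time zone attached.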

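Let’s split the data into training and test sets. Rows dated on or before 2018-06-25 go to the training set, and all later rows go to the test set.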

#Splitting data
split_date = '2018-06-25'
data_train = doge.loc[doge.index <= split_date].copy()
data_test = doge.loc[doge.index > split_date].copy()

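Let’s preprocess the training data to make it ready for feeding into the model. We scale the prices to the range [0, 1] and pair each price with the price that follows it as its target. The input is then reshaped to (samples, 1, 1), the (samples, timesteps, features) layout that the LSTM layer expects.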

#Data preprocessing
training_set = data_train.values
training_set = np.reshape(training_set, (len(training_set), 1))
sc = MinMaxScaler()
training_set = sc.fit_transform(training_set)
X_train = training_set[0:len(training_set)-1]
y_train = training_set[1:len(training_set)]
X_train = np.reshape(X_train, (len(X_train), 1, 1))
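The preprocessing above can be sketched with plain NumPy on a handful of made-up prices. The manual formula below is what MinMaxScaler computes with its default range of 0 to 1; the variable names are illustrative only.

```python
import numpy as np

prices = np.array([2.0, 4.0, 6.0, 10.0]).reshape(-1, 1)  # made-up prices

# Min-max scaling: (x - min) / (max - min) maps the data onto [0, 1].
lo, hi = prices.min(), prices.max()
scaled = (prices - lo) / (hi - lo)

# Each input is the price at step t; its target is the price at step t + 1.
X = scaled[:-1].reshape(-1, 1, 1)  # (samples, timesteps=1, features=1)
y = scaled[1:]

print(scaled.ravel())
print(X.shape, y.shape)
```

With a single timestep per sample, the network only ever sees the previous price when predicting the next one.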

We’ll create a plot to see the historical price of Dogecoin.

#Historical price
_ = doge.plot(style='', figsize=(15,5), color="#F8766D", title='DOGE Price (USD) by Days')
Dogecoin Historical Price

Let’s create a plot to see what our test and training sets look like.

#Plotting the test and training sets
_ = data_test.rename(columns={'Price': 'Test Set'}).join(data_train.rename(columns={'Price': 'Training Set'}), how='outer').plot(figsize=(15,5), title='Test & Training Sets', style='')
Plot denoting Test & Training Sets

Creating the Model

It’s time to create the model. We will use LSTM for this purpose. Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems.

#Creating the model
model = Sequential()
model.add(LSTM(128,activation="sigmoid",input_shape=(1,1)))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=100, batch_size=50, verbose=2)
Epochs getting completed (Truncated)

Let’s see what our model looks like by using the summary() method.

model.summary()
Summary of the model
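The parameter counts reported by summary() can be checked by hand. The sketch below assumes the standard Keras LSTM and Dense layers with biases enabled; the helper functions are hypothetical and written only for this check.

```python
def lstm_params(units, input_dim):
    # Four weight sets (input, forget and output gates plus the cell candidate),
    # each with input weights, recurrent weights and a bias vector.
    return 4 * (units * input_dim + units * units + units)


def dense_params(inputs, outputs):
    # One weight per input-output pair plus one bias per output.
    return inputs * outputs + outputs


lstm = lstm_params(units=128, input_dim=1)
dense = dense_params(inputs=128, outputs=1)

# Dropout has no trainable parameters, so it does not appear in the total.
print(lstm, dense, lstm + dense)
```

For the model above this gives 66,560 parameters for the LSTM layer and 129 for the Dense layer, 66,689 in total.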

Predicting the Prices

We will feed the test data into the model after reshaping and transforming it. The predictions from the model will be stored in the predicted_DOGE_price variable.

#Making the predictions
test_set = data_test.values
inputs = np.reshape(test_set, (len(test_set), 1))
inputs = sc.transform(inputs)
inputs = np.reshape(inputs, (len(inputs), 1, 1))
predicted_DOGE_price = model.predict(inputs)
predicted_DOGE_price = sc.inverse_transform(predicted_DOGE_price)

Let’s create a dataframe data_all to store the ‘Price’ and ‘Price_Prediction’ columns.

data_test['Price_Prediction'] = predicted_DOGE_price
data_all = pd.concat([data_test, data_train], sort=False)

Let us see how our predictions compare to the actual prices. The blue line shows the actual prices and the yellow line shows the predicted prices.

_ = data_all[['Price','Price_Prediction']].plot(figsize=(15, 5))
Actual Price v/s Predicted Price

Let’s zoom in on the graph for January to May 2021. We can clearly see a big difference between the actual and predicted prices. This is because, in the real world, history might not repeat itself, and new factors affecting the market come into play over time. Thus, relying on such predictions in a real-world scenario is not a great idea.

#Plotting the forecast v/s actual price
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
_ = data_all[['Price_Prediction','Price']].plot(ax=ax)
ax.set_xbound(lower=pd.Timestamp('2021-01-02'), upper=pd.Timestamp('2021-05-20'))
plot = plt.suptitle('Jan to May: Forecast vs Actual')
Zooming in for the months Jan-May

Now let us see the Mean Squared Error (MSE) and Mean Absolute Error (MAE) for our predictions.

#MSE
mean_squared_error(y_true=data_test['Price'],y_pred=data_test['Price_Prediction'])
MSE
#MAE
mean_absolute_error(y_true=data_test['Price'],y_pred=data_test['Price_Prediction'])
MAE
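For reference, both metrics are simple averages over the prediction errors. A minimal NumPy sketch on made-up numbers (not the tutorial’s results):

```python
import numpy as np

actual = np.array([1.0, 2.0, 4.0, 3.0])     # made-up prices
predicted = np.array([1.5, 2.0, 3.0, 5.0])  # made-up model output

errors = actual - predicted

# MSE squares each error, so large misses dominate the average.
mse = np.mean(errors ** 2)

# MAE averages the absolute errors and stays in the same unit as the price.
mae = np.mean(np.abs(errors))

print(mse, mae)
```

Because MSE is in squared price units, MAE is usually the easier of the two to read as “how many dollars off on average”.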

We are done with the Dogecoin price prediction. If you liked this tutorial, do hit the clap button!
