Crash Course in Forecasting: Predicting Bitcoin Price Using ARIMA

Junyang Luo
Published in AI Skunks
7 min read, Apr 16, 2023

What is a time series?

A time series is a sequence of data points collected at regular intervals over time. Time series analysis is the set of statistical techniques used to analyze and model such data. It can be used to study past patterns and to forecast future values.

Examples of time series data include stock prices, weather conditions, and monthly sales figures. In time series analysis, the goal is often to identify patterns in the data, such as trends, seasonality, or cycles, and use this information to forecast future values.

Time series analysis techniques include moving averages, exponential smoothing, autoregression, and spectral analysis. These methods can be used to model the data, identify patterns, and make predictions about future values. Time series analysis is widely used in fields such as finance, economics, engineering, and environmental science.
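As a rough illustration of two of these techniques, the sketch below applies a 3-point moving average and simple exponential smoothing to a tiny made-up series using pandas. The data values and the smoothing factor alpha=0.5 are assumptions chosen only for the example.

```python
import pandas as pd

# Made-up series, for illustration only
series = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])

# 3-point moving average: mean of the current and two previous observations.
# The first two entries are NaN because the window is not yet full.
moving_avg = series.rolling(window=3).mean()

# Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}
exp_smooth = series.ewm(alpha=0.5, adjust=False).mean()

print(moving_avg.tolist())
print(exp_smooth.tolist())
```

The moving average weights the last three points equally, while exponential smoothing gives older points geometrically shrinking weights.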

The steps in Time Series forecasting are as follows:

Step 1 Visualize the Time Series

Step 2 Detect Missing Values and impute

Step 3 Train/Test Splitting

Step 4 Calculate Time Series Statistics

* First Intuitions on (Weak) Stationarity

* Autocovariance function

* Gamma

* Autocovariance coefficients

Step 5 Moving Averages, MA(1) — One Step Back

Step 6 Moving Averages, MA(2) — Two Steps Back

Step 7 Moving Averages, MA(N) — N Steps Back

Step 8 — AIC, BIC, etc

Step 9 — ARMA Models

Step 10 — ARIMA Models

Step 11 — fbProphet Models

Step 12 — GreyKite Models

Step 13 — Neural Network Models

What is ARIMA?

ARIMA stands for Autoregressive Integrated Moving Average. It is a widely used time series modeling technique that combines the concepts of autoregression (AR) and moving average (MA) models with differencing. ARIMA models are commonly used for forecasting future values of a time series by analyzing its past behavior.

The “AR” in ARIMA refers to the autoregressive part of the model, which models the dependence of the current value on past values. The “MA” part refers to the moving average part of the model, which models the dependence of the current value on past error terms. The “I” stands for “integrated”, which refers to differencing the time series (replacing each value with its change from the previous value) to remove trends and make it stationary. Plain differencing does not remove seasonality; that requires seasonal differencing, as in seasonal ARIMA (SARIMA).
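To make the differencing idea concrete, here is a minimal sketch using NumPy on made-up series. A linear trend disappears after one difference, and a quadratic trend needs two; the series themselves are assumptions for illustration, not Bitcoin data.

```python
import numpy as np

t = np.arange(10)

# Made-up series with a linear trend: y_t = 2*t + 5
linear = 2.0 * t + 5.0
# First difference: y_t - y_{t-1}
linear_diff = np.diff(linear, n=1)

# Made-up series with a quadratic trend: y_t = t^2
quadratic = t.astype(float) ** 2
# Differencing twice removes the quadratic trend
quadratic_diff2 = np.diff(quadratic, n=2)

print(linear_diff)      # constant, so the trend is gone
print(quadratic_diff2)  # constant after two differences
```

Each round of differencing shortens the series by one observation, which is why the differenced arrays are shorter than the originals.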

ARIMA models are usually specified using three parameters: p, d, and q. The “p” parameter represents the number of autoregressive terms, the “d” parameter represents the number of times the time series is differenced to make it stationary, and the “q” parameter represents the number of moving average terms.
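The roles of p, d, and q can be seen in a hand-rolled one-step forecast for the simplest non-trivial case, ARIMA(1,1,0) without a constant. This is only a sketch: the function name arima_110_forecast is hypothetical, and it estimates the AR coefficient by least squares, whereas statsmodels fits ARIMA by maximum likelihood, so the numbers will generally differ from a library fit.

```python
import numpy as np

def arima_110_forecast(y):
    """One-step forecast from a hand-rolled ARIMA(1,1,0), no constant (illustrative)."""
    y = np.asarray(y, dtype=float)
    diffs = np.diff(y)                        # d = 1: difference the series once
    prev, curr = diffs[:-1], diffs[1:]        # p = 1: relate each difference to the previous one
    phi = np.dot(prev, curr) / np.dot(prev, prev)  # least-squares AR(1) coefficient
    next_diff = phi * diffs[-1]               # q = 0: no error terms in the forecast
    return y[-1] + next_diff, phi             # add back to the last level to undo differencing

# Made-up series whose differences (8, 4, 2, 1) halve at every step
forecast, phi = arima_110_forecast([0.0, 8.0, 12.0, 14.0, 15.0])
print(forecast, phi)
```

With a larger p the regression would use more lagged differences, and with q > 0 past forecast errors would enter the forecast as well.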

ARIMA models can be fitted with statistical software in languages such as R or Python, and can be used to forecast future values of a time series. However, it is important to note that ARIMA assumes the series becomes stationary after being differenced d times, which may not always be the case in practice.

What is Bitcoin?

Bitcoin is the longest running and most well known cryptocurrency, first released as open source in 2009 by the pseudonymous Satoshi Nakamoto. Bitcoin serves as a decentralized medium of digital exchange, with transactions verified and recorded in a public distributed ledger (the blockchain) without the need for a trusted record keeping authority or central intermediary. Each block of transactions contains a SHA-256 cryptographic hash of the previous block, so the blocks are “chained” together and serve as an immutable record of all transactions that have ever occurred. As with any currency or commodity on the market, bitcoin trading and financial instruments soon followed public adoption of bitcoin and continue to grow.

First, we will import the necessary libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

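Next, we load the minute-level Bitstamp dataset and preprocess it. We drop rows with missing values, convert the ‘Timestamp’ column (Unix seconds) to datetime objects, set it as the index of the dataframe, and resample the data to daily means. We then keep the ‘Open’, ‘High’, ‘Low’, ‘Close’, and ‘Volume_(BTC)’ columns and rename the last one to ‘Volume’ for ease of use. Note that after resampling, ‘Close’ is the daily average of the minute-level closing prices rather than the final price of each day.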

df = pd.read_csv('/content/sample_data/bitstampUSD_1-min_data_2012-01-01_to_2021-03-31.csv')
df = df.dropna()
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')
df = df.set_index('Timestamp')
df = df.resample('D').mean()

df = df[['Open', 'High', 'Low', 'Close', 'Volume_(BTC)']]
df = df.rename(columns={'Volume_(BTC)': 'Volume'})
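The effect of the timestamp conversion and daily resampling can be checked on a tiny made-up frame. The sketch below uses four invented minute-level rows with a one-day gap; the name raw and all of the values are assumptions for illustration.

```python
import pandas as pd

# Made-up minute-level rows; Timestamp is in Unix seconds.
# Two rows fall on 2012-01-01 and two on 2012-01-03 (no rows on 2012-01-02).
raw = pd.DataFrame({
    'Timestamp': [1325376000, 1325376060, 1325548800, 1325548860],
    'Close': [4.0, 6.0, 5.0, 5.5],
})

raw['Timestamp'] = pd.to_datetime(raw['Timestamp'], unit='s')
daily = raw.set_index('Timestamp').resample('D').mean()

print(daily)
# Days with no rows come out as NaN after resampling
print(daily['Close'].isna().sum())
```

If the real dataset has days with no trades, the same thing happens there, so it may be worth checking the daily frame for NaN values before modeling.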

Next, we will plot the Bitcoin closing prices to visualize the data.

plt.plot(df['Close'])
plt.title('Bitcoin Closing Prices')
plt.xlabel('Year')
plt.ylabel('Price')
plt.show()

To perform time series forecasting, we will use the ARIMA (Autoregressive Integrated Moving Average) model. The ARIMA model is a popular method for time series forecasting that uses past values of a variable to predict future values.

First, we will split the dataset into training and testing sets. We will use the first 80% of the data for training and the remaining 20% for testing.

train_size = int(len(df) * 0.8)
train, test = df[:train_size], df[train_size:]
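The split above is chronological rather than random. A small sketch on a made-up 10-day frame shows the property that matters for forecasting: every training date comes before every test date. The names df_demo, train_demo, and test_demo are hypothetical.

```python
import pandas as pd

# Made-up daily data, already sorted by date
dates = pd.date_range('2021-01-01', periods=10, freq='D')
df_demo = pd.DataFrame({'Close': range(10)}, index=dates)

# Same slicing pattern as above: first 80% for training, the rest for testing
train_size_demo = int(len(df_demo) * 0.8)
train_demo, test_demo = df_demo[:train_size_demo], df_demo[train_size_demo:]

print(len(train_demo), len(test_demo))
print(train_demo.index.max() < test_demo.index.min())
```

A shuffled split would leak future prices into the training set, which makes forecast accuracy look better than it is.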

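Next, we will evaluate the ARIMA model with walk-forward validation. At each step of the test period, we fit an ARIMA(5,1,0) model on all observations seen so far, forecast one day ahead, and then append the actual value to the history before moving on. We will use the mean squared error (MSE) to evaluate the performance of our model. Because the model is refit at every step, this loop can take a while to run.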

history = list(train['Close'])
predictions = []
for t in range(len(test)):
    # Refit on everything observed so far
    model = ARIMA(history, order=(5, 1, 0))
    model_fit = model.fit()
    # One-step-ahead forecast
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    # Use positional access; the index holds dates, not integers
    obs = test['Close'].iloc[t]
    history.append(obs)
    print('predicted=%f, expected=%f' % (yhat, obs))
error = mean_squared_error(test['Close'], predictions)
print('Test MSE: %.3f' % error)
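An MSE value is hard to judge on its own. One common sanity check, which is not part of the walkthrough above, is to compare it with a naive persistence forecast that predicts each day's price as the previous day's price. The sketch below uses made-up numbers; persistence_forecast is a hypothetical helper and model_preds is a stand-in for the ARIMA predictions.

```python
import numpy as np

def persistence_forecast(last_train_value, actual):
    """Forecast each point as the previously observed value (naive baseline)."""
    actual = np.asarray(actual, dtype=float)
    return np.concatenate(([last_train_value], actual[:-1]))

# Made-up test values and model output, for illustration only
actual = np.array([102.0, 101.0, 105.0, 107.0])
model_preds = np.array([101.0, 102.5, 103.0, 106.0])
naive_preds = persistence_forecast(100.0, actual)

mse_model = np.mean((actual - model_preds) ** 2)
mse_naive = np.mean((actual - naive_preds) ** 2)
print(mse_model, mse_naive)
```

With the variables from this tutorial, the baseline could be built with something like persistence_forecast(train['Close'].iloc[-1], test['Close']). A one-step-ahead model that does not clearly beat this baseline is adding little beyond "tomorrow looks like today".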

Finally, we will plot the predicted values against the actual values to visualize the accuracy of our model.

plt.plot(test.index, test['Close'], label='Actual')
plt.plot(test.index, predictions, color='red', label='Predicted')
plt.title('Bitcoin Closing Prices')
plt.xlabel('Year')
plt.ylabel('Price')
plt.legend()
plt.show()

Questions

**1. What is a time series?**

* A time series is a sequence of data points collected at regular intervals over time.

**2. What are some common applications of time series analysis?**

* Time series analysis is widely used in fields such as finance, economics, engineering, and environmental science for forecasting, trend analysis, and anomaly detection.

**3. What is stationarity in time series analysis?**

* Stationarity is a property of a time series in which statistical properties such as the mean, the variance, and the autocovariance at each lag do not change over time. Stationarity is an important assumption for many time series analysis techniques.

**4. What is the difference between trend and seasonality in time series analysis?**

* Trend refers to a long-term pattern or direction of a time series, while seasonality refers to a repeating pattern that occurs over a fixed period, such as a year or a month.

**5. What is autocorrelation in time series analysis?**

* Autocorrelation is a measure of the correlation between a time series and a lagged version of itself. It is used to identify patterns in the time series that are related to its own past values.
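As a minimal sketch of this definition, the sample autocorrelation at a given lag can be computed with NumPy. The function name autocorr and the alternating test series are assumptions for illustration.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    # Covariance between the series and its lagged copy, scaled by the total variance
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)

# Made-up alternating series: neighbors disagree, values two steps apart agree
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r1 = autocorr(alternating, lag=1)
r2 = autocorr(alternating, lag=2)
print(r1, r2)
```

The negative value at lag 1 and positive value at lag 2 reflect the period-2 pattern in the series.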

**6. What is a moving average in time series analysis?**

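* A moving average is a technique used to smooth out fluctuations in a time series by calculating the average of a rolling window of observations. It is often used to identify trends and remove noise from the data. This smoothing technique is distinct from the MA component of an ARIMA model, which models the dependence of the current value on past error terms.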

**7. What is the difference between AR and MA models in time series analysis?**

* AR (autoregressive) models are used to model the dependence of the current value on past values, while MA (moving average) models are used to model the dependence of the current value on past error terms.
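One way to see the difference is to feed a single shock into each model and watch how long it lingers. The sketch below does this for an AR(1) and an MA(1) process with NumPy; the function names and the coefficient 0.5 are assumptions for illustration.

```python
import numpy as np

def ar1_response(shocks, phi):
    """AR(1): y_t = phi * y_{t-1} + e_t"""
    y = np.zeros(len(shocks))
    for t, e in enumerate(shocks):
        prev = y[t - 1] if t > 0 else 0.0
        y[t] = phi * prev + e
    return y

def ma1_response(shocks, theta):
    """MA(1): y_t = e_t + theta * e_{t-1}"""
    shocks = np.asarray(shocks, dtype=float)
    lagged = np.concatenate(([0.0], shocks[:-1]))
    return shocks + theta * lagged

# A single unit shock at time 0, then nothing
shocks = [1.0, 0.0, 0.0, 0.0, 0.0]
ar_path = ar1_response(shocks, phi=0.5)
ma_path = ma1_response(shocks, theta=0.5)
print(ar_path)
print(ma_path)
```

In the AR(1) case the shock decays geometrically and never fully disappears, while in the MA(1) case it is gone after one step.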

**8. What is an ARIMA model in time series analysis?**

* ARIMA (Autoregressive Integrated Moving Average) models are a type of time series model that combines autoregression, moving average, and differencing to model time series data.

**9. What is the purpose of forecasting in time series analysis?**

* The purpose of forecasting in time series analysis is to predict future values of a time series based on its past behavior.

**10. What are some common evaluation metrics for time series forecasting models?**

* Common evaluation metrics for time series forecasting models include mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE). These metrics measure the accuracy of the forecasted values compared to the actual values.
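For reference, the three metrics can be computed directly with NumPy. The arrays below are made-up values used only to show the arithmetic.

```python
import numpy as np

# Made-up actual and forecasted values
actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.0, 5.0, 4.0, 8.0])

errors = actual - predicted
mse = np.mean(errors ** 2)      # mean squared error
mae = np.mean(np.abs(errors))   # mean absolute error
rmse = np.sqrt(mse)             # root mean squared error, in the same units as the data

print(mse, mae, rmse)
```

RMSE and MAE are in the same units as the series (US dollars for the Bitcoin example), which often makes them easier to interpret than MSE.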

License

All code in this notebook is available as open source through the MIT license.

All text and images are free to use under the Creative Commons Attribution 3.0 license. https://creativecommons.org/licenses/by/3.0/us/

These licenses let people distribute, remix, tweak, and build upon the work, even commercially, as long as they give credit for the original creation.

Copyright 2023 AI Skunks https://github.com/aiskunks

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
