Harnessing SageMaker DeepAR+ for Bitcoin Price Predictions
7 min read · Jul 14, 2024
This tutorial is part 3 of the Machine Learning on AWS series. You can find the previous tutorial here.
Pre-Requisites
- Basic Python knowledge
- An AWS account (if you don’t have one, go to aws.amazon.com and sign up for a free account)
- Basic AWS knowledge (Optional but recommended)
Project Overview
In this project we will use SageMaker to download historical Bitcoin price data and then use the DeepAR+ algorithm (built into SageMaker) to make predictions about the Bitcoin price.
- SageMaker: Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning models quickly. It offers a comprehensive set of tools and features to simplify each step of the ML workflow, including data preparation, model training, and model deployment.
- DeepAR+: Amazon SageMaker DeepAR+ is a forecasting algorithm designed to handle large datasets of time series data. It uses deep learning to provide accurate forecasts by modeling complex patterns and trends across multiple time series, making it suitable for various applications such as finance, retail, and supply chain management.
Setup S3 Bucket
We need to create an S3 bucket which will hold our training data and the trained model.
- Navigate to S3 in AWS Management Console.
- Click on the “Create bucket” button.
- Enter a unique bucket name like bitcoin-price-predictor-data (bucket names must be unique across all existing bucket names in Amazon S3).
Setup IAM Role
Our SageMaker notebook instance will require permissions to access SageMaker and S3 resources.
- Navigate to IAM in AWS Management Console.
- Create a new role and attach the AmazonS3FullAccess and AmazonSageMakerFullAccess policies to it. (In a production setting we would restrict these to the bare minimum permissions needed.)
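For reference, the role also needs a trust policy that allows SageMaker to assume it. The console generates this automatically when you choose SageMaker as the trusted service, but it looks roughly like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "sagemaker.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```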
Create SageMaker Notebook
- Navigate to SageMaker in AWS Management Console
- Click “Create notebook instance”
- Give the notebook a name like ‘BitcoinPricePredictorNotebook’
- Assign it the IAM role previously created
- The rest of the options can be left at their default settings
- Click ‘Create notebook instance’ again
- Once the notebook is created you can “Start” it and click “Open Jupyter”
Download Training Data
- Create a new code cell in the Jupyter notebook and install the following dependencies
!pip install yfinance pandas boto3
- We will be using the excellent yfinance library, which lets us download historical market data.
- Next we will download historical Bitcoin data from the start of 2015 until 2024-07-11 (note that yfinance treats the end date as exclusive)
import yfinance as yf
import pandas as pd
import json
# Download historical Bitcoin data
btc_data = yf.download('BTC-USD', start='2015-01-01', end='2024-07-11')
print(btc_data.head())
- The downloaded data should look something like this
Open High Low Close Adj Close \
Date
2015-01-01 320.434998 320.434998 314.002991 314.248993 314.248993
2015-01-02 314.079010 315.838989 313.565002 315.032013 315.032013
2015-01-03 314.846008 315.149994 281.082001 281.082001 281.082001
2015-01-04 281.145996 287.230011 257.612000 264.195007 264.195007
2015-01-05 265.084015 278.341003 265.084015 274.473999 274.473999
Volume
Date
2015-01-01 8036550
2015-01-02 7860650
2015-01-03 33054400
2015-01-04 55629100
2015-01-05 43962800
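DeepAR assumes the target series is sampled at a regular frequency (daily here). Bitcoin trades every day so BTC-USD rarely has gaps, but it is cheap to verify before paying for a training job. A small sketch using made-up data (the column name and daily frequency mirror the tutorial; the prices themselves are synthetic):

```python
import pandas as pd

# Synthetic daily closes with one missing day, standing in for btc_data
idx = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-04"])
closes = pd.Series([314.25, 315.03, 264.20], index=idx, name="Close")

# Reindex to a full daily range so any gap shows up as NaN
full_range = pd.date_range(closes.index.min(), closes.index.max(), freq="D")
regular = closes.reindex(full_range)
print("missing days:", int(regular.isna().sum()))  # missing days: 1

# Forward-fill small gaps so DeepAR sees an unbroken daily series
regular = regular.ffill()
```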
Transform Training Data
- Next we need to convert the data into the JSON format expected by DeepAR+: an object with the start date of the first reading and an array of closing prices since then (since we are interested in learning the patterns of, and predicting, future closing prices)
# Prepare the data
btc_data.reset_index(inplace=True)
btc_data['Date'] = btc_data['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))

# Convert the data to the required JSON format
data = {
    "start": btc_data['Date'].iloc[0],
    "target": btc_data['Close'].tolist()
}

# Save to a JSON file (each object should be on its own line)
with open('bitcoin_historical_data.json', 'w') as f:
    f.write(json.dumps(data) + '\n')

# Verify the first few lines of the JSON file
with open('bitcoin_historical_data.json', 'r') as f:
    for _ in range(5):
        print(f.readline())
- The code above should produce a JSON line that looks like this
{"start": "2015-01-01", "target": [314.2489929199219, 315.0320129394531, 281.0820007324219, 264.19500732421875, 274.4739990234375, 286.1889953613281, 294.3370056152344, 283.3489990234375, 290.4079895019531, 274.7959899902344, 265.6600036621094, 267.7959899902344, 225.86099243164062, 178.10299682617188, 209.843994140625, 208.0970001220703, 199.25999450683594, 210.33900451660156, 214.86099243164062, 211.31500244140625, ...
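Because DeepAR is strict about its input format (one JSON object per line, each with a start timestamp and a numeric target array), a quick validation pass over the file can catch formatting mistakes before you pay for a training job. A minimal sketch; the helper function is my own, but the filename matches the one written above:

```python
import json

def validate_deepar_jsonlines(path):
    """Check that every line is a JSON object with 'start' and a numeric 'target'."""
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            obj = json.loads(line)
            assert "start" in obj, f"line {lineno}: missing 'start'"
            assert isinstance(obj["target"], list) and obj["target"], \
                f"line {lineno}: 'target' must be a non-empty list"
            assert all(isinstance(v, (int, float)) for v in obj["target"]), \
                f"line {lineno}: non-numeric value in 'target'"
    return True

# Demonstrate on a tiny stand-in file
with open("sample.json", "w") as f:
    f.write(json.dumps({"start": "2015-01-01", "target": [314.25, 315.03]}) + "\n")

print(validate_deepar_jsonlines("sample.json"))  # True
```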
Upload Train data to S3
- The training data should be saved to S3 so it does not need to be re-downloaded for future training sessions (replace the S3 bucket name with the name of the bucket you created).
import boto3
# Initialize S3 client
s3 = boto3.client('s3')
# Define your bucket name and the prefix (folder path)
bucket_name = 'bitcoin-price-predictor-data'
prefix = 'deepar-bitcoin-forecast/train'
# Upload the JSON data
s3.upload_file('bitcoin_historical_data.json', bucket_name, f'{prefix}/bitcoin_historical_data.json')
print(f'Data uploaded to s3://{bucket_name}/{prefix}/bitcoin_historical_data.json')
- Navigate to your S3 bucket and verify that the JSON file has been uploaded successfully
Train DeepAR+ Model
- With all this setup done, we are finally ready to train the model. Use the following code (replacing the bucket name) to grab the forecasting-deepar container image and train a model on the data previously saved in S3. Training will take approximately 5-10 minutes.
import sagemaker
from sagemaker import get_execution_role
from sagemaker.image_uris import retrieve
# Initialize SageMaker session and role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
# Define the S3 bucket and prefix
bucket = 'bitcoin-price-predictor-data'
prefix = 'deepar-bitcoin-forecast'
# Get the DeepAR+ container image
image_uri = retrieve('forecasting-deepar', sagemaker_session.boto_region_name)
# Define the DeepAR+ estimator
estimator = sagemaker.estimator.Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type='ml.c4.xlarge',
    volume_size=5,
    max_run=3600,
    input_mode='File',
    output_path=f's3://{bucket}/{prefix}/output',
    sagemaker_session=sagemaker_session
)

# Set hyperparameters
estimator.set_hyperparameters(
    time_freq='D',          # daily frequency
    epochs=50,
    prediction_length=30,   # predict the next 30 days
    context_length=30,
    num_layers=2,
    num_cells=40,
    likelihood='gaussian',
    mini_batch_size=64,
    learning_rate=0.001,
    dropout_rate=0.05,
)
# Define the input data format
train_data = sagemaker.inputs.TrainingInput(f's3://{bucket}/{prefix}/train/bitcoin_historical_data.json', content_type='json')
# Fit the model
estimator.fit({'train': train_data})
- Verify that the training job completed without errors and that a model file now exists in your S3 bucket
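Before trusting the model, it helps to hold out the most recent prediction_length points so forecasts can be compared against known prices. DeepAR does this internally if you supply a test channel, but the split itself is just list slicing. A sketch using a synthetic stand-in series (the real data would come from btc_data['Close'].tolist()), with a naive last-value baseline as a comparison point:

```python
prediction_length = 30  # matches the hyperparameter above

# Synthetic stand-in for the real closing-price list
closes = [100.0 + i for i in range(365)]

# Training series excludes the final window; the held-out tail is ground truth
train_target = closes[:-prediction_length]
holdout = closes[-prediction_length:]
print(len(train_target), len(holdout))  # 335 30

# Naive baseline: repeat the last observed value; any useful model
# should beat this error on the held-out window
naive = [train_target[-1]] * prediction_length
mae = sum(abs(a - b) for a, b in zip(holdout, naive)) / prediction_length
print("naive MAE:", mae)  # naive MAE: 15.5
```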
Deploy SageMaker Endpoint
- To run inference against the trained model, SageMaker lets us deploy it to an endpoint that inference requests can be sent to. Use the following code to do that (once again, verify the S3 paths are correct)
import sagemaker
from sagemaker import get_execution_role
# Initialize SageMaker session
sagemaker_session = sagemaker.Session()
role = get_execution_role()
model_key = 'deepar-bitcoin-forecast/output/forecasting-deepar-2024-07-13-04-06-28-228/output/model.tar.gz'

# Define the model artifacts (replace the key with your own training job's output path)
model_artifact = f's3://bitcoin-price-predictor-data/{model_key}'

# Create the model
from sagemaker.model import Model
deepar_model = Model(
    model_data=model_artifact,
    role=role,
    image_uri=sagemaker.image_uris.retrieve("forecasting-deepar", sagemaker_session.boto_region_name)
)

# Deploy the model to an endpoint
predictor = deepar_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='deepar-bitcoin-endpoint'
)
Infer Future Bitcoin Prices
- Finally we are ready to start predicting future Bitcoin prices. In the following code we invoke the deployed endpoint to fetch predictions for the next 30 days
import boto3
import json
import pandas as pd
import matplotlib.pyplot as plt
# Use the specified endpoint name
endpoint_name = 'deepar-bitcoin-endpoint'
# Create a SageMaker runtime client
runtime_client = boto3.client('runtime.sagemaker')
# Prepare the input data for prediction (the most recent 30 closes)
recent_data = btc_data['Close'].tolist()[-30:]

# Restore a datetime index from the 'Date' column (the dates became a
# plain string column when we called reset_index earlier, so converting
# the RangeIndex itself would produce bogus 1970 timestamps)
btc_data.index = pd.to_datetime(btc_data['Date'])

# Format the data as required by DeepAR
input_data = {
    "instances": [
        {
            "start": str(btc_data.index[-30].date()),  # start date of the recent window
            "target": recent_data                      # recent closing prices
        }
    ]
}
# Convert input data to JSON format
payload = json.dumps(input_data)

# Make the prediction
response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=payload
)
# Parse the response
result = json.loads(response['Body'].read().decode())
# Print the entire response to understand its structure
print("Full response from the endpoint:")
print(json.dumps(result, indent=4))
# Extract predictions based on the actual response structure
if 'predictions' in result and len(result['predictions']) > 0:
    if 'quantiles' in result['predictions'][0]:
        predictions = result['predictions'][0]['quantiles']['0.5']  # median prediction
    elif 'mean' in result['predictions'][0]:
        predictions = result['predictions'][0]['mean']
    else:
        predictions = []
else:
    predictions = []

# Print the predictions
print("Predicted prices for the next 30 days:")
for i, prediction in enumerate(predictions):
    print(f"Day {i+1}: {prediction}")

# Convert predictions to a DataFrame for better visualization
if predictions:
    prediction_dates = pd.date_range(start=btc_data.index[-1] + pd.Timedelta(days=1), periods=len(predictions), freq='D')
    prediction_df = pd.DataFrame(data={'Date': prediction_dates, 'Predicted Close': predictions})

    # Plot the historical and predicted prices
    plt.figure(figsize=(14, 7))
    plt.plot(btc_data['Close'], label='Historical Prices')
    plt.plot(prediction_df['Date'], prediction_df['Predicted Close'], label='Predicted Prices', linestyle='--')
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.title('Bitcoin Price Prediction')
    plt.legend()
    plt.show()
else:
    print("No predictions were made. Please check the response structure.")
- I got the following predictions for the next 30 days
Predicted prices for the next 30 days:
Day 1: 52501.43359375
Day 2: 54837.73046875
Day 3: 56198.109375
Day 4: 56217.41015625
Day 5: 55887.828125
Day 6: 56257.8046875
Day 7: 56906.1796875
Day 8: 56187.9453125
Day 9: 56758.234375
Day 10: 56161.765625
Day 11: 54751.31640625
Day 12: 54658.140625
Day 13: 54198.4140625
Day 14: 55457.953125
Day 15: 53713.32421875
Day 16: 54059.24609375
Day 17: 53099.015625
Day 18: 52433.3203125
Day 19: 51978.546875
Day 20: 50440.765625
Day 21: 50154.13671875
Day 22: 50394.96484375
Day 23: 49662.32421875
Day 24: 49925.25
Day 25: 48535.6015625
Day 26: 48221.78515625
Day 27: 50349.7734375
Day 28: 50746.55078125
Day 29: 50819.0
Day 30: 51720.92578125
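As a quick sanity check on any forecast run, you can summarize the predicted drift over the horizon. Using the Day 1 and Day 30 figures from the run above:

```python
# Day 1 and Day 30 median predictions from the run above
day1 = 52501.43359375
day30 = 51720.92578125

# Net predicted change over the 30-day horizon
change = (day30 - day1) / day1 * 100
print(f"Predicted 30-day drift: {change:.2f}%")  # about -1.49%
```

A reminder worth stating plainly: asset prices are close to a random walk, so a 30-day point forecast like this is best treated as a modeling exercise, not trading advice.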
Title background: “200520” by takawo (http://openprocessing.org/sketch/857874), licensed under Creative Commons Attribution-NonCommercial-ShareAlike (https://creativecommons.org/licenses/by-nc-sa/3.0).