02# Kaggle: Challenge Crypto Forecasting with Machine Learning

Andre Vianna

Published in

My Data Science Journey

6 min read · Nov 20, 2021

Kaggle Competitions — Use your ML expertise to predict real crypto market data

Description

Over $40 billion worth of cryptocurrencies are traded every day. They are among the most popular assets for speculation and investment, yet have proven wildly volatile. Fast-fluctuating prices have made millionaires of a lucky few, and delivered crushing losses to others. Could some of these price movements have been predicted in advance?

In this competition, you’ll use your machine learning expertise to forecast short-term returns in 14 popular cryptocurrencies. We have amassed a dataset of millions of rows of high-frequency market data dating back to 2018, which you can use to build your model. Once the submission deadline has passed, your final score will be calculated over the following 3 months using live crypto data as it is collected.

The simultaneous activity of thousands of traders ensures that most signals will be transitory, persistent alpha will be exceptionally difficult to find, and the danger of overfitting will be considerable. In addition, since 2018, interest in the cryptomarket has exploded, so the volatility and correlation structure in our data are likely to be highly non-stationary. The successful contestant will pay careful attention to these considerations, and in the process gain valuable insight into the art and science of financial forecasting.
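One common way to guard against the overfitting and non-stationarity described above is to validate only on data that comes after the training window. The sketch below is not part of the competition materials; the function name `walk_forward_splits` and its parameters are hypothetical, and it assumes the rows are already sorted by timestamp.

```python
def walk_forward_splits(n_rows, n_folds, min_train):
    """Yield (train_indices, valid_indices) pairs for time-ordered data.

    Hypothetical helper: every validation block lies strictly after the
    rows used for training, so the model never sees the future.
    """
    if n_folds < 1 or min_train < 1 or min_train >= n_rows:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n_rows")
    fold_size = (n_rows - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough rows for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size   # training window grows each fold
        valid_end = train_end + fold_size       # validation block follows it
        yield range(0, train_end), range(train_end, valid_end)


# Example: 10 time-ordered rows, 3 folds, at least 4 rows of training data.
for train_idx, valid_idx in walk_forward_splits(10, 3, 4):
    print(list(train_idx), list(valid_idx))
```

Comparing scores across the folds gives a rough sense of how stable a signal is over time; a model that only performs well on the earliest fold may be fitting a transitory pattern.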

G-Research is Europe’s leading quantitative finance research firm. We have long explored the extent of market prediction possibilities, making use of machine learning, big data, and some of the most advanced technology available. Specializing in data science and AI education for workforces, Cambridge Spark is partnering with G-Research for this competition.

Evaluation

Submissions are evaluated on a weighted version of the Pearson correlation coefficient. You can find additional details in the ‘Prediction Details and Evaluation’ section of this tutorial notebook.
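To make the metric concrete, here is a minimal sketch of a weighted Pearson correlation using NumPy. It is a generic textbook formulation, not the official scoring code: the function name `weighted_pearson` is hypothetical, and the assumption that each row carries a non-negative weight (for example, a per-asset weight) should be checked against the tutorial notebook.

```python
import numpy as np


def weighted_pearson(y_true, y_pred, weights):
    """Weighted Pearson correlation between two 1-D arrays (illustrative only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1

    mean_true = np.sum(w * y_true)  # weighted means
    mean_pred = np.sum(w * y_pred)
    cov = np.sum(w * (y_true - mean_true) * (y_pred - mean_pred))
    var_true = np.sum(w * (y_true - mean_true) ** 2)
    var_pred = np.sum(w * (y_pred - mean_pred) ** 2)
    return cov / np.sqrt(var_true * var_pred)


# With equal weights this reduces to the ordinary Pearson correlation.
targets = [0.01, -0.02, 0.03, 0.00]
preds = [0.02, -0.01, 0.02, 0.01]
print(weighted_pearson(targets, preds, [1, 1, 1, 1]))
```

Because the score is a correlation, it rewards getting the relative ordering and direction of returns right rather than their exact magnitude.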

You must submit to this competition using the provided Python time-series API, which ensures that models do not peek forward in time. To use the API, follow this template in Kaggle Notebooks:

import gresearch_crypto
env = gresearch_crypto.make_env()   # initialize the environment
iter_test = env.iter_test()    # an iterator which loops over the test set and sample submission
for (test_df, sample_prediction_df) in iter_test:
    sample_prediction_df['Target'] = 0  # make your predictions here
    env.predict(sample_prediction_df)   # register your predictions

A more detailed introduction to the API is available here.

You will get an error if your submission includes nulls or infinities.
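Since nulls and infinities are rejected, it can help to clean predictions before registering them. The following is a minimal sketch under stated assumptions: `sanitize_predictions` is a hypothetical helper, and replacing bad values with 0 is an arbitrary neutral choice, not a rule from the competition.

```python
import numpy as np


def sanitize_predictions(values, fill_value=0.0):
    """Replace NaN and +/-inf entries with fill_value (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    # np.isfinite is False for NaN, +inf and -inf, so those get replaced.
    return np.where(np.isfinite(arr), arr, fill_value)


raw = [0.1, float("nan"), float("inf"), -float("inf")]
clean = sanitize_predictions(raw)
print(clean)
```

Inside the API loop, this could be applied to the predicted values just before they are assigned to `sample_prediction_df['Target']` and passed to `env.predict`.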

Timeline

This is a forecasting competition with an active training phase and a second period where models will be run against real market data.

Training Timeline

November 2, 2021 — Start Date
January 25, 2022 — Entry deadline. You must accept the competition rules before this date in order to compete.
January 25, 2022 — Team Merger deadline. This is the last day participants may join or merge teams.
February 1, 2022 — Final submission deadline.

All deadlines are at 11:59 PM UTC on the corresponding day unless otherwise noted. The competition organizers reserve the right to update the contest timeline if they deem it necessary.

Forecasting Timeline

After the final submission deadline, the leaderboard will be updated periodically as the selected notebooks are run against new market data. Updates will take place roughly every two weeks.

May 3, 2022 — Competition End Date — Winner’s announcement

Prizes

1st Place — $50,000
2nd Place — $20,000
3rd Place — $15,000
4th Place — $10,000
5th Place — $5,000
6th Place — $5,000
7th Place — $5,000
8th Place — $5,000
9th Place — $5,000
10th Place — $5,000

Code Requirement

This is a Code Competition

Submissions to this competition must be made through Notebooks. In order for the “Submit” button to be active after a commit, the following conditions must be met:

CPU Notebook <= 9 hours run-time
GPU Notebook <= 9 hours run-time
Internet access disabled
Freely & publicly available external data is allowed, including pre-trained models
Submission file must be named submission.csv

Please see the Code Competition FAQ for more information on how to submit. Review the code debugging doc if you are encountering submission errors.

Dataset

G-Research Crypto Forecasting

This dataset contains information on historic trades for several cryptoassets, such as Bitcoin and Ethereum. Your challenge is to predict their future returns.

As historic cryptocurrency prices are not confidential, this will be a forecasting competition using the time series API. Furthermore, the public leaderboard targets are publicly available and are provided as part of the competition dataset. Expect to see many people submitting perfect submissions for fun. Accordingly, THE PUBLIC LEADERBOARD FOR THIS COMPETITION IS NOT MEANINGFUL and is only provided as a convenience for anyone who wants to test their code. The final private leaderboard will be determined using real market data.