The Overfitting Challenge in Blockchain Analysis

Machine learning models tend to overfit when used with blockchain datasets. What is overfitting, and how can we address it?

Jesus Rodriguez
IntoTheBlock


The idea of using machine learning to analyze blockchain datasets sounds incredibly attractive at first glance, but it’s a road full of challenges. Among those challenges, the lack of labeled datasets remains by far the biggest hurdle to overcome when applying machine learning methods to blockchain datasets. These limitations force many machine learning models to train on very small data samples and to over-optimize for them, a phenomenon known as overfitting. Today, I would like to take a deep dive into the overfitting challenge in blockchain analysis and propose a few ideas to address it.

Overfitting is considered one of the biggest challenges in modern deep learning applications. Conceptually, overfitting occurs when a model generates a hypothesis that is too tailored to a specific dataset, making it impossible to adapt to new datasets. A useful analogy for understanding overfitting is to think of it as hallucination in the model: essentially, a model hallucinates/overfits when it infers an incorrect hypothesis from a dataset. A lot has been written about overfitting since the early days of machine learning so I…
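To make the idea concrete, here is a minimal sketch (not from the article) of what overfitting looks like in practice. It uses scikit-learn and a small synthetic dataset standing in for blockchain-derived features; the names and feature choices are illustrative assumptions. An unconstrained decision tree fits a tiny, noisy training sample perfectly, and the gap between its training accuracy and its accuracy on held-out data is the signature of overfitting.

```python
# Minimal sketch of overfitting, assuming scikit-learn and synthetic data that
# stands in for address-level blockchain features (e.g. tx count, average value).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Synthetic features; the label depends weakly on the first feature plus noise,
# so a perfect fit on a small training sample is memorization, not a real pattern.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=40, random_state=0  # deliberately small training sample
)

# No depth limit: the tree is free to carve out a leaf per training point.
overfit_model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", accuracy_score(y_train, overfit_model.predict(X_train)))
print("test accuracy: ", accuracy_score(y_test, overfit_model.predict(X_test)))

# A regularized tree (limited depth) trades some training accuracy for a
# hypothesis that transfers better to unseen data.
regularized_model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
print("regularized test accuracy:", accuracy_score(y_test, regularized_model.predict(X_test)))
```

The large gap between training and test accuracy of the unconstrained model mirrors what happens when blockchain models are trained on very small labeled samples: the model over-optimizes for the sample it has seen and fails to generalize.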
