5 Minimum Requirements of an Operational Feature Store

Ben Epstein
Feature Stores for ML
6 min read · Jul 14, 2020

By Ben Epstein, Sergio Ferragut, and Monte Zweben

Source: Adobe Stock Ribkhan

I’ve spent the last few months thinking heavily about feature stores. “Feature store” is the hottest new buzzword in the ML space, and everyone has a distinct implementation laser-focused on their own use cases.

A recent article¹ that I read talked about this exact topic and did a great job summarizing the fundamental problem: these implementations don’t create a general-purpose, conceptual framework for what a feature store is, focusing instead on the outcomes of their particular use cases. If we forget what we’ve read about these implementations and rethink this from the ground up, we may be able to design a general-purpose feature store that works for any use case.

What is a feature store?

A feature store is a shareable repository of predictive features, both complex and simple, for use in near real-time machine learning and business intelligence. As you can see, this is a broad and general definition. That’s because a feature store has a plethora of use cases, all of which should be possible if it is architected correctly. So let’s start from the ground up and list out some of the minimal requirements.

Shareable

A fundamental premise of the feature store is that it must be shareable across an organization; features must be accessible by any team that needs them. This is what allows for the reuse of complex features that may take weeks or months to develop. The idea of shared features is so prevalent that Twitter coined the metric “Sharing Adoption: The number of teams who use another team’s features in production.” Having a single repository of features that data scientists can search and reuse to help solve their problems is crucial to their productivity.
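
To make the Sharing Adoption metric concrete, here is a minimal sketch of how it could be computed from a log of production feature usage. The record layout (`feature`, `owner_team`, `consumer_team`), the team names, and the feature names are all hypothetical; this is not Twitter’s implementation, only one plausible reading of the definition quoted above.

```python
from typing import Iterable, NamedTuple


class FeatureUsage(NamedTuple):
    """One production usage of a feature (hypothetical record layout)."""
    feature: str
    owner_team: str
    consumer_team: str


def sharing_adoption(usages: Iterable[FeatureUsage]) -> int:
    """Count distinct teams that use at least one feature owned by another team."""
    return len({u.consumer_team for u in usages if u.consumer_team != u.owner_team})


# Illustrative data only: two teams consume features they do not own.
usages = [
    FeatureUsage("avg_basket_size_30d", "pricing", "pricing"),
    FeatureUsage("avg_basket_size_30d", "pricing", "fraud"),
    FeatureUsage("days_since_last_login", "growth", "fraud"),
    FeatureUsage("days_since_last_login", "growth", "pricing"),
]
print(sharing_adoption(usages))
```

A team that only uses its own features does not count toward the metric, so a value near zero suggests features are being built in silos.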

If you have a fairly mature data science organization, you will likely have hundreds or thousands of features, with potentially millions of records. Easily searching through those features, whether that be through SQL or a dataframe-like API, is a must-have for data scientists to be successful.
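
As a rough illustration of searching features through SQL, the sketch below uses an in-memory SQLite database as a stand-in for a feature store’s metadata catalog. The `feature_catalog` table, its columns, the sample rows, and the `search_features` helper are assumptions made for this example, not the schema of any particular feature store.

```python
import sqlite3

# In-memory database standing in for the feature store's metadata catalog.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feature_catalog (
        name        TEXT PRIMARY KEY,
        owner_team  TEXT NOT NULL,
        description TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO feature_catalog VALUES (?, ?, ?)",
    [
        ("avg_basket_size_30d", "pricing", "Mean basket value per customer over 30 days"),
        ("days_since_last_login", "growth", "Days since the customer last logged in"),
        ("chargeback_rate_90d", "fraud", "Share of orders charged back over 90 days"),
    ],
)


def search_features(conn, keyword):
    """Return (name, owner_team) for features whose name or description mentions keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT name, owner_team FROM feature_catalog "
        "WHERE name LIKE ? OR description LIKE ? ORDER BY name",
        (pattern, pattern),
    )
    return rows.fetchall()


print(search_features(conn, "customer"))
```

Keeping descriptions and ownership next to each feature name is what makes this kind of keyword search useful; the same catalog could just as well be exposed through a dataframe-like API.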

Transparent

In order for a feature store to be trusted, the origin and implementation of each feature must be available for investigation. To achieve this, the…

