Rethinking Feature Stores

Chang She
Jul 25, 2019 · 6 min read

or, Let the Punching Begin

Feature engineering is a critical component in any machine learning system. As the basic input into a predictive model, the quality of features can make or break the overall performance of the model.

Feature engineering also takes a tremendous amount of work. If a new machine learning application requires one new feature, it probably means that the ML engineers discarded ten other features and tried ten variations of each candidate feature. Features have to be computed, versioned, backfilled, and shared. Finally, the method for storing and accessing features differs fundamentally between offline and online contexts.

This means that as more and more companies scale up their machine learning applications, they are finding it extremely difficult to manage this combinatorial explosion of needs. It’s become a big enough problem that many of the biggest tech companies have invested significant resources into creating specialized systems to manage model features.

Source: https://www.logicalclocks.com/feature-store/

Existing Systems

Michelangelo

Uber’s feature store (source: https://eng.uber.com/michelangelo/)

Zipline

Source: https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform

Feast

id: word.count
name: count
entity: word
owner: bob@feast.com
description: number of times the word appears
valueType: INT64
uri: https://github.com/bob/example

Hopsworks

Source: https://www.logicalclocks.com/feature-store/

So what’s the problem?

Having said that, I’ve found it difficult to think about these existing systems coherently. In general the focus has been “well, we needed to satisfy requirement X so we implemented Y in this way” (e.g., Zipline’s backfilling, Michelangelo’s online store, etc.). But I haven’t really seen a conceptual framework that helps us think about how all of these different pieces fit together. Without one, the architecture can look drastically different depending on which use cases you decide to add or remove. For anyone trying to select or design a feature store, this is not very encouraging.

I think we can do better.

Be like Joe

Joe Armstrong talked about Erlang programs being structured as processes that generate messages to communicate with each other. This was in stark contrast to the traditional OO paradigm and a primary factor that makes Erlang unique and successful. I’d like to propose an analogous statement about features:

Features are data-generating processes. Processes take inputs and generate outputs. And in different contexts, processes may have different concrete implementations.

In particular, let’s discuss several unique benefits of this mental model:

Versioning and provenance

Lazy evaluation

df = feature_store.get("user_view_time").materialize(start, end)

And not have to worry about what’s being fetched from a database vs computed in memory. Moreover, we can also chain lazy feature processes together into a pipeline and enable a feature compiler to optimize the underlying computations on the fly.
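To make the lazy part concrete, here is a minimal sketch of what such a feature object could look like. Everything in it (the `LazyFeature` class, its `map` method, the fake source function) is hypothetical and made up for illustration; it is not the API of any existing feature store. The point is only that chaining records steps, and nothing is read or computed until `materialize` is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration; not part of any real feature store API.
Row = Tuple[int, float]  # (timestamp, value)


@dataclass(frozen=True)
class LazyFeature:
    """A feature described as a recipe; nothing runs until materialize()."""
    name: str
    source: Callable[[int, int], List[Row]]           # loads rows for [start, end)
    steps: Tuple[Callable[[float], float], ...] = ()  # deferred transformations

    def map(self, fn: Callable[[float], float]) -> "LazyFeature":
        # Record the step and return a new recipe; no data is touched here.
        return LazyFeature(self.name, self.source, self.steps + (fn,))

    def materialize(self, start: int, end: int) -> List[Row]:
        # Only now is the source read and the chained steps applied, in order.
        rows = self.source(start, end)
        for fn in self.steps:
            rows = [(ts, fn(value)) for ts, value in rows]
        return rows


calls = []  # records when the source is actually read


def fake_view_time_source(start: int, end: int) -> List[Row]:
    # Stand-in for a database read or a batch job (assumed data, in seconds).
    calls.append((start, end))
    data = [(1, 30.0), (2, 90.0), (5, 120.0)]
    return [(ts, v) for ts, v in data if start <= ts < end]


# Building the pipeline does not touch the source.
view_minutes = LazyFeature("user_view_time", fake_view_time_source).map(lambda s: s / 60)
assert calls == []

# The source is read exactly once, when the result is requested.
result = view_minutes.materialize(1, 3)
```

Because the steps are stored as plain data rather than executed eagerly, a compiler-like layer could in principle inspect and rewrite them before running anything. This sketch does not attempt that.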

Online vs offline

df = (feature_store.get("user_hist")
      .filter(user_id)
      .materialize(start, end))

And the offline version would get executed as perhaps some Spark code snippet whereas the online version would add the user_id filter into the request body for an API call to a user-profile service that updates in real-time.
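Here is one way this dispatch could be sketched, under heavy assumptions. The `FeatureStore` and `Feature` classes, the `"offline"`/`"online"` context strings, and both implementations are hypothetical. The offline branch scans an in-memory list as a stand-in for a Spark job, and the online branch only builds the request body that a user-profile service might receive, without making a real call.

```python
from typing import Callable, Dict, Optional

# Hypothetical sketch: the names below are illustrative, not a real library's API.
Impl = Callable[[Optional[int], int, int], object]


class Feature:
    """One logical feature with a concrete implementation per context."""

    def __init__(self, name: str, impls: Dict[str, Impl], context: str,
                 user_id: Optional[int] = None):
        self.name = name
        self._impls = impls
        self._context = context
        self._user_id = user_id

    def filter(self, user_id: int) -> "Feature":
        # Record the filter; each implementation decides how to apply it.
        return Feature(self.name, self._impls, self._context, user_id)

    def materialize(self, start: int, end: int):
        # Same call site, different concrete process depending on context.
        return self._impls[self._context](self._user_id, start, end)


# Assumed toy data standing in for a historical table.
HISTORY = [
    {"user_id": 1, "ts": 10, "title": "a"},
    {"user_id": 2, "ts": 11, "title": "b"},
    {"user_id": 1, "ts": 20, "title": "c"},
]


def offline_user_hist(user_id, start, end):
    # Stand-in for a batch job (e.g. Spark) scanning historical data.
    return [r for r in HISTORY
            if start <= r["ts"] < end
            and (user_id is None or r["user_id"] == user_id)]


def online_user_hist(user_id, start, end):
    # Stand-in for an API call: the filter becomes part of the request body.
    return {"feature": "user_hist", "user_id": user_id, "start": start, "end": end}


class FeatureStore:
    def __init__(self, context: str):
        self._context = context
        self._registry = {
            "user_hist": {"offline": offline_user_hist, "online": online_user_hist},
        }

    def get(self, name: str) -> Feature:
        return Feature(name, self._registry[name], self._context)


# Identical feature code in both contexts; only the store's context differs.
offline_df = FeatureStore("offline").get("user_hist").filter(1).materialize(0, 15)
online_req = FeatureStore("online").get("user_hist").filter(1).materialize(0, 15)
```

The design choice worth noticing is that the data scientist's code never names Spark or the service. The context is chosen once, when the store is constructed.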

No data scientist is an island

But also be like Mike

Most importantly, this is a beginning rather than an end. We’re currently building out our next-generation machine learning platform at Tubi. If you’re passionate about creating tools and products to enable ML-driven companies, come join us and build this platform with me from vision to execution.

And let the punching begin.

