Common Feature Store Workflow with Feast
Feature stores are a critical piece of Machine Learning infrastructure within the company. At DKatalis, we’ve come to rely on Feast as our feature store implementation.
One of the core benefits of a feature store is that Data Scientists and Data Analysts alike can share features, saving the time they would otherwise spend recreating them.
We’ve come to realize that this is easier said than done!
As the team scales up, we’ve reached the point where we need clearly defined processes to add guard rails and rein in potential chaos.
While Feast is great at feature definition and retrieval, many core issues (not necessarily Feast problems) still need to be resolved before Feast can be used effectively.
Think of a tool like Git. Using Git as a single user is simple: you can git add, git commit, and git push to master all you want, and everything will be fine. However, when working in larger teams, this certainly will not fly. Similarly, one of the things that we had to figure out was how to effectively allow all the Data Scientists to collaborate on a single, common feature store.
There hasn’t been much written about best practices for collaborating effectively around a single, common feature store, so this post is meant to get the conversation going and provide some inspiration. I’m under no illusions that this is a “best practice”, but it is what seems to work for us for now. So, as usual, YMMV.
Brief Outline
If you think hard about implementing a common feature store, you’ll realize that the biggest problems lie not in the software but in the process. Solving them means defining a feature creation lifecycle. If you unpack this, it includes defining:
- A proper folder structure
- Naming conventions and tagging
- Feature versioning