What to consider before going the machine learning route
With machine learning being such a hot topic and buzzword, teams are quick to start building features around it.
However, making that decision too quickly can cause trouble in the long run.
Before you decide whether machine learning is the answer to your problem, you should first:
- Focus on the “what”
- Then figure out the “how”
Focusing on what the problem is, or exactly what feature you want to build, will shape the “how”. Sometimes a feature can start with a simple engineering solution that does not require the time or resources machine learning demands. That can still lead to machine learning later, once the feature or platform is more established.
Once you decide on the “what”, figuring out the “how” is the fun part.
If machine learning is the answer to your problem, first consider the data you will need to feed your model. Do you have enough relevant data to train and evaluate it? If so, make sure you have an expert or data labeler who understands the feature set. With that in place, the next key step is to build a stable platform that persists the data. More often than not, teams start building models and realize months later that the data they were using was unstable or inaccurate, or that they were interpreting it incorrectly.
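The data questions above can be turned into a quick check that runs before any modeling starts. The following is a minimal sketch, not a definitive implementation: the `Example` class, the `readiness_report` function, and both thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Example:
    """One training example (hypothetical shape, for illustration only)."""
    features: dict
    label: Optional[str]  # None means the example has not been labeled yet


def readiness_report(
    examples: Sequence[Example],
    min_examples: int = 1000,           # assumed threshold, tune for your problem
    max_unlabeled_ratio: float = 0.05,  # assumed threshold, tune for your problem
) -> dict:
    """Summarize whether there is enough labeled data to start training."""
    total = len(examples)
    unlabeled = sum(1 for e in examples if e.label is None)
    # Treat an empty dataset as fully unlabeled so it never passes the check.
    unlabeled_ratio = unlabeled / total if total else 1.0
    return {
        "total": total,
        "unlabeled": unlabeled,
        "enough_data": total >= min_examples,
        "labels_ok": unlabeled_ratio <= max_unlabeled_ratio,
    }
```

A check like this does not replace an expert who understands the feature set, but it can surface obvious gaps early instead of months into model building.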
It’s also important to acknowledge that data fluctuates, especially data that describes how customers use a particular feature or platform. For example, on content discovery platforms customers typically watch movies on the weekend and TV shows during the week. Usage also varies with the day of the week, holidays, school calendars, and so on. Because of this, models need to be retrained at a regular interval, and keeping relevant data persisted and easily accessible is crucial to the success of the platform.
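One way to see this kind of fluctuation in your own data is to split usage events by weekday versus weekend. This is a small sketch under stated assumptions: the event shape (a date paired with a content type) and the function name `counts_by_day_type` are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def counts_by_day_type(events: Iterable[Tuple[date, str]]) -> Counter:
    """Count events per (day type, content type) pair.

    Each event is assumed to be a (date, content_type) tuple.
    """
    counts = Counter()
    for day, content_type in events:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        day_type = "weekend" if day.weekday() >= 5 else "weekday"
        counts[(day_type, content_type)] += 1
    return counts
```

If the weekend and weekday distributions differ noticeably, a model trained on only one of them is likely to mispredict the other.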
Regarding retraining, some teams train their model daily on the latest data persisted in their system to avoid stale predictions. That is another reason the data layer is key to solving a problem or building a feature with machine learning.
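A retraining schedule can be expressed as a simple staleness check. The sketch below assumes the platform records when the model was last trained and when the newest data was persisted; the function name `needs_retraining` and the one-day default are illustrative assumptions, not a recommendation.

```python
from datetime import datetime, timedelta


def needs_retraining(
    last_trained_at: datetime,
    latest_data_at: datetime,
    now: datetime,
    max_age: timedelta = timedelta(days=1),  # assumed daily cadence
) -> bool:
    """Return True when the model is older than max_age and newer data exists."""
    model_is_old = now - last_trained_at >= max_age
    has_new_data = latest_data_at > last_trained_at
    return model_is_old and has_new_data
```

Requiring newer data as well as an old model avoids retraining on exactly the same dataset when the data layer has stalled, which is itself a signal worth alerting on.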
Once you have a stable data platform and people who understand what the data represents, here are a few high-level questions to consider as you go down the route of engineering a machine learning platform:
- How easy is it to iterate on the model or make improvements? If a data scientist works on the model and then hands it over to the engineering team to build out, will the data scientist have easy access to fresh data to experiment with or tweak the model?
- Once your model is enabled in production, what performance metrics will you have to ensure the quality of its predictions?
- Will the platform itself allow for online experimentation, such as A/B testing?
- If predictions fail to load or surface for some reason, what will the fallback logic be?
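Two of the questions above, online experimentation and fallback logic, can be sketched in a few lines. This is one possible approach under stated assumptions: `assign_variant`, `recommendations_with_fallback`, the `predict` callable, and the `popular_items` list are all hypothetical names, and hashing the user ID is just one common way to get stable experiment buckets.

```python
import hashlib
from typing import Callable, List, Optional, Sequence, Tuple


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically place a user in 'treatment' or 'control'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


def recommendations_with_fallback(
    user_id: str,
    predict: Callable[[str], Optional[Sequence[str]]],
    popular_items: Sequence[str],
) -> Tuple[List[str], str]:
    """Return model predictions, or a non-ML default if they are unavailable."""
    try:
        predictions = predict(user_id)
    except Exception:
        # Any failure to load or compute predictions triggers the fallback.
        predictions = None
    if not predictions:
        return list(popular_items), "fallback"
    return list(predictions), "model"
```

Returning the source ("model" or "fallback") alongside the items makes it possible to track how often the fallback is served, which feeds directly into the production quality metrics mentioned above.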
