ML Platform User Experience: Build for Data Scientists
I’ve twice been part of building a large-scale ML platform, over multiple iterations, in a cross-functional capacity. Here are five key principles for providing a seamless user experience to your data scientists.
Abstraction
Data scientists do not need to know where data is stored, how it is pulled, or how to deploy containers on K8s. Provide SDKs and libraries that make it easy for them to create, configure, and deploy their experiments on your platform.
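A minimal sketch of what such an SDK could look like. Everything here is hypothetical (the `Experiment` class, the `submit` method, the logical dataset name): the point is that the user declares what to run, while storage URIs and Kubernetes manifests stay behind the platform's API.

```python
# Hypothetical platform SDK sketch: the data scientist declares *what* to run,
# and the platform resolves storage, compute, and deployment details.
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Declarative experiment spec -- no storage paths, no K8s manifests."""
    name: str
    dataset: str              # logical dataset name, resolved by the platform
    entry_point: str          # training function to invoke, e.g. "train.main"
    params: dict = field(default_factory=dict)

    def submit(self) -> str:
        # A real SDK would call the platform's control plane here;
        # this stub just simulates a submission and returns a run id.
        return f"run-{self.name}-001"


exp = Experiment(
    name="churn-model",
    dataset="customers/weekly",   # no S3/HDFS URI exposed to the user
    entry_point="train.main",
    params={"learning_rate": 0.01, "epochs": 5},
)
run_id = exp.submit()
```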
Interactive Experimentation
Data scientists need a space to test their code and inspect data and attributes before running a full training cycle. Provide Jupyter notebooks so they can work interactively with data and model algorithms.
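The "peek before you train" workflow a notebook enables might look like the cell below: pull a small sample and inspect its attributes before committing to a full run. `load_sample` is a stand-in for whatever sampling call your SDK exposes; here it reads an inline CSV.

```python
# Notebook-style sketch: sample a few rows and inspect the schema before
# a full training cycle. The inline CSV stands in for a real dataset.
import csv
import io

RAW = "age,tenure_months,churned\n34,12,0\n51,48,0\n22,3,1\n"


def load_sample(raw: str, n: int = 2) -> list:
    """Return the first n rows as dicts, as a quick notebook cell might."""
    reader = csv.DictReader(io.StringIO(raw))
    return [row for _, row in zip(range(n), reader)]


sample = load_sample(RAW)
columns = list(sample[0].keys())   # inspect attributes before training
```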
Flexibility
Give data scientists the ability to author and run experiments in the framework of their choice. At the very least, provide first-class support for TensorFlow and PyTorch.
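One common way to get this flexibility is a trainer registry: the platform dispatches on a declared framework name instead of hard-coding one. The sketch below uses illustrative stubs in place of real TensorFlow or PyTorch training code.

```python
# Sketch of framework flexibility via a trainer registry. The trainer
# functions are stand-ins, not real TensorFlow/PyTorch code.
from typing import Callable, Dict

TRAINERS: Dict[str, Callable[[dict], str]] = {}


def register(framework: str):
    """Decorator that makes a trainer available under a framework name."""
    def wrap(fn):
        TRAINERS[framework] = fn
        return fn
    return wrap


@register("tensorflow")
def train_tf(config: dict) -> str:
    return f"tf-model(lr={config['lr']})"


@register("pytorch")
def train_torch(config: dict) -> str:
    return f"torch-model(lr={config['lr']})"


def run_experiment(framework: str, config: dict) -> str:
    if framework not in TRAINERS:
        raise ValueError(f"unsupported framework: {framework}")
    return TRAINERS[framework](config)


artifact = run_experiment("pytorch", {"lr": 0.01})
```

Adding support for another framework is then a matter of registering one more trainer, not changing the platform's dispatch logic.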
Research to Production
Create a pipeline to easily promote the model algorithm (e.g. a neural network) and its related business logic (pre/post-processing) to production. Provide a CI/CD pipeline that packages the data scientist’s code behind an API and deploys it as a container.
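The promotable unit can be sketched as a single handler that bundles pre-processing, the model, and post-processing; this is what the CI/CD pipeline would wrap behind an HTTP route inside the container. All names here (`preprocess`, `model_predict`, `Handler`) are illustrative.

```python
# Sketch of the promotable unit: model plus pre/post-processing business
# logic bundled into one handler the CI/CD pipeline packages behind an API.

def preprocess(payload: dict) -> list:
    # business logic authored alongside the model
    return [float(payload["tenure_months"]) / 12.0]


def model_predict(features: list) -> float:
    # stand-in for the trained model's forward pass
    return 1.0 if features[0] < 1.0 else 0.0


def postprocess(score: float) -> dict:
    return {"churn_risk": "high" if score >= 0.5 else "low"}


class Handler:
    """What the container's API route would invoke for each request."""

    def predict(self, payload: dict) -> dict:
        return postprocess(model_predict(preprocess(payload)))


response = Handler().predict({"tenure_months": 6})
```

Because the handler is one self-contained object, the same artifact that passed review in research is the one that ships, which is the whole point of the promotion pipeline.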
Visibility
Capture metrics from the model prediction workflows and provide a way to visualize model behavior (e.g. compute SHAP values). Data scientists need a feedback loop to measure how the model behaves in production and improve it iteratively.
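The feedback loop can be sketched as two pieces: record every prediction, then summarize the log against expectations from training. The mean-shift check below is a deliberately simple drift heuristic, not SHAP; a real platform would layer richer explainability on top of the same captured data.

```python
# Sketch of production visibility: log each prediction, then summarize so
# data scientists can compare live behavior against training expectations.
from statistics import mean

prediction_log = []


def record(features: dict, score: float) -> None:
    """Capture one prediction event (in practice, to a metrics store)."""
    prediction_log.append({"features": features, "score": score})


def summarize(training_mean_score: float, tolerance: float = 0.2) -> dict:
    """Compare the live mean score against the training-time mean."""
    live_mean = mean(p["score"] for p in prediction_log)
    return {
        "count": len(prediction_log),
        "mean_score": live_mean,
        "drifted": abs(live_mean - training_mean_score) > tolerance,
    }


for s in (0.9, 0.8, 0.85):
    record({"tenure_months": 6}, s)
report = summarize(training_mean_score=0.4)
```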