How We Built a Machine Learning Platform in 6 Months
Our journey using Segment and GCP to build a scalable platform that helps us achieve our company’s mission.
Norrøna’s mission is to create the greatest outdoor products. We design and produce physical products to satisfy the most extreme users — we are not your typical data science company.
In 2029, Norrøna will turn 100. By then, we think, machine learning (ML) will be a fundamental part of our company, contributing to better customer experiences and higher quality products.
This article describes how we built a Google Cloud machine learning platform that allows us to realize some short-term use cases and brings us closer to our future vision. The complete platform, from data collection to serving machine learning predictions, took us about 6 months to build.
With this article we hope to inspire, and to show that ML use cases are very much within reach, also for small- and medium-sized companies.
The fewer moving parts, the better
Our ambition was to build a platform that would be relatively future-proof. It should scale in terms of the number of different ML scenarios and accommodate the likely increasing volume of queries to the endpoints serving the algorithmic output to the organization. Lastly, we wanted the solution to be as simple as possible. The fewer moving parts, the better.
A separate environment for testing
As companies become more data- (read: algorithm-) driven, the ability to test new solutions without affecting production becomes hugely important. Traditionally, though, establishing a test environment has not been high on data scientists’ priority lists (at least this is true for the cowboy data scientists we have been hanging out with). Here too, data science has plenty to learn from software development.
Early on we opted to build two separate environments, which we creatively call development and production. In practice these are two identical projects in GCP, and they allow us to test a complete ML solution without affecting any production processes.
Deployment of code to the two projects is orchestrated by Jenkins, which runs on a third project in our GCP organization: infrastructure. A commit to our development GitHub repository triggers a deployment by Jenkins via a webhook. Similarly, a merge to the production repository deploys a new version of our ML platform to production.
Jenkins is also used for other tasks, such as copying data from the data warehouse to development and scheduling Google AI Platform jobs, but more about that later.
BigQuery as the main data platform
The majority of Norrøna’s data assets are stored in Google BigQuery (in yet another GCP project). This data supports not only our ML work but also enables a number of business intelligence solutions.
Norrøna uses Segment for data collection and master data management. Segment offers out-of-the-box and close-enough-to-real-time tracking of user interactions on both client and server side and assigns a customer ID, either identifiable or anonymous depending on GDPR consent, to each customer.
The largest source of data in Norrøna is our e-commerce platform, accompanied by several in-house-built digital products, such as the loyalty, product return, and pro user platforms. Data from these solutions is enriched with information from our brick-and-mortar stores and the enterprise resource planning system. All the data combined offers numerous possibilities for ML use cases.
We tend to do data transformations and feature engineering directly in BigQuery (mainly due to a personal preference for SQL). The nearly final ML training data assets are constructed as BigQuery views on top of live production data. The benefit of this approach is that the training data is always fresh and easy to find for anyone in the organization. The same goes for the output from the ML models: everything is stored in one place.
Finally, while our ML production environment uses data directly through views against live data, the development environment is kept isolated. The development data is, however, refreshed regularly by a scheduled Jenkins process that executes GCP command-line commands to copy data between the environments.
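The refresh step can be sketched in Python: a small helper builds the `bq cp` commands that a scheduled Jenkins job would execute. The project and dataset names (`ml-production`, `ml-development`, `ecommerce`) are illustrative placeholders, not our actual identifiers.

```python
import subprocess

# Hypothetical project IDs -- stand-ins for the real environments.
PROD_PROJECT = "ml-production"
DEV_PROJECT = "ml-development"

def build_copy_command(dataset: str, table: str) -> list:
    """Build the `bq cp` command that copies one table from
    production to development, overwriting the target (-f)."""
    source = f"{PROD_PROJECT}:{dataset}.{table}"
    target = f"{DEV_PROJECT}:{dataset}.{table}"
    return ["bq", "cp", "-f", source, target]

def refresh_dev_data(tables, dry_run=True):
    """Copy each (dataset, table) pair to development.

    With dry_run=True the commands are only printed; a scheduled
    Jenkins job would call this with dry_run=False."""
    commands = [build_copy_command(ds, tbl) for ds, tbl in tables]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Wrapping the command construction in a function keeps the table list in one place and makes the copy step easy to dry-run before wiring it into the scheduler.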
Integrations are everything
The harsh truth about ML is that data integrations are everything. An advanced algorithm in a Python notebook alone is not going to bring your company any value. The fact that the actual “science” part of data science is also becoming easier to develop (e.g. Google Cloud AutoML and BigQuery ML) further increases the relative importance of engineering in machine learning systems.
That said, machine learning algorithms naturally deserve a mention. Our initial ML use cases are e-commerce and customer communication related mainly due to the tangible short-term business effects and good data availability.
Our programming language of choice has been Python. The training data is read from BigQuery, and the code can be run either locally or submitted to Google AI Platform. The algorithms themselves have so far been relatively simple to ensure easy implementation and transparency towards stakeholders, which in turn builds the organization’s trust in the system. It is still fairly feasible to explain variations of a collaborative filtering algorithm to a management group; the same cannot be said about more complex models.
The models are scheduled to train regularly in Google AI Platform by Jenkins. For example, we could have a training job that determines the similarity between two products based on the number of customers that have interacted with the particular pair of products.
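As a minimal sketch of such a job, pairwise product similarity can be computed from co-occurring customer interactions in pure Python. In reality the interaction data would come from BigQuery, and the cosine-style normalization below is one common choice for this kind of similarity, shown here for illustration.

```python
from collections import defaultdict
from itertools import combinations
import math

def product_similarities(interactions):
    """Compute similarity between product pairs from
    (customer_id, product_id) interaction events.

    sim(a, b) = |customers(a) & customers(b)|
                / sqrt(|customers(a)| * |customers(b)|)
    """
    customers = defaultdict(set)  # product -> set of customers
    for customer, product in interactions:
        customers[product].add(customer)

    sims = {}
    for a, b in combinations(sorted(customers), 2):
        overlap = len(customers[a] & customers[b])
        if overlap:
            sims[(a, b)] = overlap / math.sqrt(
                len(customers[a]) * len(customers[b])
            )
    return sims
```

For example, if two customers interacted with both a jacket and a pair of pants while a third interacted only with the jacket, the pair scores 2 / sqrt(3 × 2) ≈ 0.82.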
These training jobs generate prediction outputs in the form of tables, which are stored both in BigQuery and in Google Datastore.
Serving model predictions at scale
One of the key functions of our platform is its ability to serve model predictions to whatever system wishes to use them. We designed a set of applications whose function is to expose model predictions to other systems via an API. These applications run on Google’s App Engine, which is highly scalable and allows us to serve a large number of predictions consistently across many destination systems.
One particularly successful implementation of this is in-mail product recommendation: our email tool fetches a data feed from one of these applications via a GET request, and every time we send an email where product recommendations are relevant, it uses this feed to populate a placeholder within the email template. So far we have seen a significant improvement in click rates on these emails.
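A simplified sketch of what such a feed might return follows. In production this would be an App Engine request handler looking up precomputed predictions; here the `recommendations` argument stands in for that lookup, and the field names and top-4 cutoff are assumptions for illustration, not our actual schema.

```python
import json

def recommendation_feed(customer_id, recommendations):
    """Serialize a customer's top product recommendations as a
    JSON feed that an email tool can fetch via GET.

    `recommendations` is a list of (product_id, score) pairs;
    the feed keeps the four highest-scoring products, ranked."""
    top = sorted(recommendations, key=lambda r: r[1], reverse=True)[:4]
    items = [
        {"product_id": pid, "rank": rank}
        for rank, (pid, _score) in enumerate(top, start=1)
    ]
    return json.dumps({"customer_id": customer_id, "items": items})
```

Keeping the feed a plain JSON document means the email tool only needs a URL and a placeholder, with no knowledge of the model behind it.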
Performance monitoring — closing the loop
The actual performance of the machine learning models is monitored on a case-by-case basis. The use cases vary so much that there is no universal solution for performance monitoring. In general we use Power BI, together with data in BigQuery, for monitoring and visualization of results.
When it comes to product recommendation use cases, the approach has been to run A/B tests with different versions of the models and evaluate several metrics to determine the winning version.
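One standard way to compare two model variants on a click-rate metric is a two-proportion z-test. The sketch below is a generic illustration of that test, not our exact evaluation code.

```python
import math

def ab_z_score(clicks_a, sends_a, clicks_b, sends_b):
    """Two-proportion z-test for the difference in click rates
    between model variants A and B.

    A positive score means B outperforms A; |z| > 1.96 is
    significant at the 5% level (two-sided)."""
    p_a = clicks_a / sends_a
    p_b = clicks_b / sends_b
    pooled = (clicks_a + clicks_b) / (sends_a + sends_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    return (p_b - p_a) / se
```

For instance, 100 clicks out of 1,000 sends for variant A against 140 out of 1,000 for variant B gives z ≈ 2.75, so B's higher click rate is unlikely to be chance.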
The future is up to our imagination
The immediate, short-term ML use cases have been strongly related to e-commerce and customer communication. We foresee that future use cases can be found in areas like product development, sustainability, demand prediction, and logistics. The diversity of future use cases is limited only by our imagination.