How We Built a Machine Learning Platform in 6 Months

Our journey in using Segment and GCP to build a scalable platform that helps us achieve our company’s mission.

Norrøna
Jun 7, 2019

by Jaakko Mikkonen and Thomas Gariel

90 years of craftsmanship in Norrøna: we are not your typical data science company

Norrøna’s mission is to create the greatest outdoor products. We design and produce physical products to satisfy the most extreme users — we are not your typical data science company.

In 2029 Norrøna will turn 100. By then, we think, machine learning (ML) will be a fundamental part of our company, contributing to better customer experiences and higher-quality products.

This article describes how we built a Google Cloud machine learning platform that allows us to realize some short-term use cases and brings us closer to our future vision. The complete platform, from data collection to serving machine learning predictions, took us about 6 months to build.

With this article we hope to inspire and to show that ML use cases are very much within reach, even for small and medium-sized companies.

The fewer moving parts, the better

A separate environment for testing


Early on we opted to build two separate environments, which we creatively call development and production. In practice these are two identical GCP projects, and they allow us to test a complete ML solution without affecting any production processes.

Deployment of code to the two projects is orchestrated by Jenkins, which runs in a third project in our GCP organization: infrastructure. A commit to our development GitHub repository triggers a deployment by Jenkins via a webhook. Similarly, a code merge to the production repository deploys a new version of our ML platform to production.

Jenkins is also used for other tasks, such as copying data from the data warehouse to development and scheduling Google AI Platform jobs, but more about that later.

BigQuery as the main data platform

Norrøna uses Segment for data collection and master data management. Segment offers out-of-the-box, close-enough-to-real-time tracking of user interactions on both the client and the server side, and assigns each customer an ID, either identifiable or anonymous depending on GDPR consent.

Segment is a great tool to get up and going with clean and accessible data in record time.

The largest source of data at Norrøna is our e-commerce platform, accompanied by several digital products built in-house, such as the loyalty, product return and pro user platforms. Data from these solutions is enriched with information from our brick-and-mortar stores and the enterprise resource planning system. Combined, all this data offers numerous possibilities for ML use cases.

We tend to do data transformations and feature engineering directly in BigQuery (mainly due to a personal preference for SQL). The nearly final ML training data assets are constructed as BigQuery views on top of live production data. The benefit of this approach is that the training data is always fresh and easy for anyone in the organization to find. The same goes for the output of the ML models: everything is stored in one place.

Finally, while our ML production environment uses data directly through views against live data, the development environment is kept isolated. The development data is, however, refreshed regularly by a scheduled Jenkins process that executes GCP command-line commands to copy data between the environments.
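As an illustration of the kind of refresh step described above, here is a minimal sketch that builds `bq cp` commands for copying tables from one project to another. The article only says "GCP command-line commands", so the choice of `bq cp`, the project names, the dataset name and the table names are all assumptions made for the example.

```python
from typing import List


def build_bq_copy_command(src_project: str, dst_project: str,
                          dataset: str, table: str) -> List[str]:
    """Build a `bq cp` invocation that copies one table between projects."""
    source = f"{src_project}:{dataset}.{table}"
    destination = f"{dst_project}:{dataset}.{table}"
    # -f overwrites the destination table without asking for confirmation,
    # which is what an unattended scheduled job needs.
    return ["bq", "cp", "-f", source, destination]


# Hypothetical project, dataset and table names, for illustration only.
TABLES_TO_REFRESH = ["orders", "pageviews"]

if __name__ == "__main__":
    for table_name in TABLES_TO_REFRESH:
        command = build_bq_copy_command(
            "example-production", "example-development", "ecommerce", table_name
        )
        # A scheduled job could pass this list to subprocess.run(command, check=True).
        print(" ".join(command))
```

Building the command as a list rather than a single string keeps it safe to hand to `subprocess.run` without shell quoting issues.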

Integrations are everything


That said, machine learning algorithms naturally deserve a mention. Our initial ML use cases are related to e-commerce and customer communication, mainly because of the tangible short-term business effects and good data availability.

Our programming language of choice has been Python. The training data is read from BigQuery, and the code can either be run locally or submitted to Google AI Platform. The algorithms themselves have so far been relatively simple, to ensure easy implementation and transparency towards stakeholders, which in turn builds the organization’s trust in the system. It is still fairly feasible to explain variations of a collaborative filtering algorithm to a management group; the same cannot be said of some more complex models.

The models are scheduled to train regularly in Google AI Platform by Jenkins. For example, we could have a training job that determines the similarity between two products based on the number of customers that have interacted with the particular pair of products.
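To make the product-pair example concrete, here is a minimal sketch of co-interaction counting: for every pair of products, count the distinct customers who interacted with both. The input format (a list of customer ID and product ID tuples) and the sample data are assumptions made for illustration, not the actual training code.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def co_interaction_counts(
    interactions: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], int]:
    """Count, per product pair, the distinct customers who interacted with both."""
    products_by_customer = defaultdict(set)
    for customer_id, product_id in interactions:
        # A set ignores repeated interactions with the same product.
        products_by_customer[customer_id].add(product_id)

    pair_counts = defaultdict(int)
    for products in products_by_customer.values():
        # Sorting gives each pair one canonical key, e.g. ("gloves", "jacket").
        for product_a, product_b in combinations(sorted(products), 2):
            pair_counts[(product_a, product_b)] += 1
    return dict(pair_counts)


# Made-up interactions, for illustration only.
sample_interactions = [
    ("c1", "jacket"), ("c1", "pants"),
    ("c2", "jacket"), ("c2", "pants"), ("c2", "gloves"),
    ("c3", "jacket"), ("c3", "gloves"),
    ("c1", "jacket"),  # repeated interaction, counted once
]

if __name__ == "__main__":
    counts = co_interaction_counts(sample_interactions)
    for pair, count in sorted(counts.items(), key=lambda item: -item[1]):
        print(pair, count)
```

Raw counts favour popular products, so a real job would probably normalize them (for example by each product's total number of customers) before ranking; that step is left out here.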

Google AI Platform training jobs scheduled by Jenkins.

These training jobs generate prediction outputs in the form of tables, which are stored both in BigQuery and in Google Datastore.

Serving model predictions at scale

One of the model APIs being queried by the Norrøna e-commerce platform.

One particularly successful implementation of this is in-mail product recommendation: our email tool fetches a data feed from one of these applications with a GET request, and every time we send an email where product recommendation is relevant, it uses this feed to populate a placeholder within the email template. So far we have seen a significant improvement in the click rate of these emails.
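A rough sketch of the payload such a feed might return for one customer is shown below. The in-memory lookup table stands in for the stored prediction output, and the field names, product IDs and the fallback list for unknown customers are all hypothetical; the article does not describe the actual feed format.

```python
import json
from typing import Dict, List

# Hypothetical stand-in for the stored prediction table.
RECOMMENDATIONS: Dict[str, List[str]] = {
    "customer-123": ["product-a", "product-b", "product-c"],
}

# Assumed fallback for customers without predictions, so the email
# placeholder is never left empty.
FALLBACK_PRODUCTS: List[str] = ["bestseller-1", "bestseller-2"]


def build_feed(customer_id: str, limit: int = 2) -> str:
    """Return a JSON feed with up to `limit` recommended products."""
    products = RECOMMENDATIONS.get(customer_id, FALLBACK_PRODUCTS)[:limit]
    return json.dumps({"customer_id": customer_id, "products": products})


if __name__ == "__main__":
    # A GET handler would return this string as the response body.
    print(build_feed("customer-123"))
    print(build_feed("unknown-customer"))
```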

In-mail product recommendation example.

Performance monitoring — closing the loop

For product recommendation use cases, our approach has been to A/B test different versions of a model and evaluate several metrics to determine the winning version.
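One common way to compare two model versions on a metric such as click rate is a two-proportion z-test. The sketch below is one such comparison, assuming click rate is the metric; the article does not say which statistical test is used, and the numbers in the example are invented, not real results.

```python
import math
from typing import Tuple


def two_proportion_z_test(clicks_a: int, sends_a: int,
                          clicks_b: int, sends_b: int) -> Tuple[float, float, float, float]:
    """Compare the click rates of variants A and B.

    Returns (rate_a, rate_b, z, two_sided_p_value).
    """
    rate_a = clicks_a / sends_a
    rate_b = clicks_b / sends_b
    # Pooled click rate under the null hypothesis that both variants perform equally.
    pooled = (clicks_a + clicks_b) / (sends_a + sends_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    z = (rate_b - rate_a) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    rate_a, rate_b, z, p_value = two_proportion_z_test(100, 2000, 140, 2000)
    print(f"A: {rate_a:.3f}  B: {rate_b:.3f}  z: {z:.2f}  p: {p_value:.4f}")
```

A small p-value suggests the difference in click rate is unlikely to be due to chance alone, but with several metrics evaluated at once it is worth deciding on the primary metric before looking at the results.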

A/B testing results in Power BI from two different versions of a product recommender model.

The future is up to our imagination

Like what you’ve just read? We are hiring a Machine Learning Enthusiast to join our incredible team in Oslo. Apply here or spread the word!

Check out norrona.com to learn more about Norrøna’s mission and discover our products, and follow us on Twitter and Linkedin.
