How we do machine learning at iZettle

Eric Hansander
Feb 20, 2019

Data science and machine learning are central at iZettle when building tools to support small businesses and cater to each business’ unique needs. In this post, we’ll describe three of the factors we’ve found important to enable a small team to ship lots of value.

At iZettle, we live and breathe small businesses. Understanding them is key: what makes a small self-made company succeed? To be able to help each merchant with their unique challenges, we have to be smart about how we leverage insights and decisions from data, and that’s where data science and machine learning come in.

A small team with a huge scope, shipping a lot

We’re using machine learning solutions to solve several challenges around the company, including:

  • issuing credit to small merchants by predicting which businesses will be doing well and would benefit from a credit offering (calibrated probabilities for new products with limited data).
  • offering the right product to the right merchant at the right time, to get more, and more engaged, merchants (tailored product offering based on predicted business needs).
  • detecting fraudsters, to keep both merchants and customers safe (imbalanced dataset).

This is all done by a relatively small team of machine learning professionals, working in a hybrid model. We’ve tried fully embedding machine learning competence before (in Risk, Credit, etc.) but found that bringing people with different domain expertise together into one group strikes a good balance at our current scale. Having a team provides great support and cross-learning, and working tightly together with the rest of the organization ensures we maximize our impact.

So how do we scale this efficiently?

1. Partnering with product and business owners, and owning the full delivery

Data scientists at iZettle own the full scope of delivering a machine learning solution, all the way from finding the business opportunity to shipping the solution to production. This includes:

  • partnering with the owner of a product or business area, and working with them to identify and prioritize the most valuable opportunities.
  • developing the machine learning model or other solution that is necessary to solve the problem.
  • deploying solutions to our production data platform, and monitoring the performance and business impact.

We’re done when we have measured the desired business impact — not before.

Distributing ownership of deciding what to do and how, and removing as many dependencies as possible, ensures that we can scale nicely. To achieve the full potential of this independence, the development environment must also support it.

2. Everyone ships to production, and it’s fast and safe

We have iterated on our data platform for many years, and although these things are never perfect, we’re certainly getting there. Some key features are:

  • easy access to “all” the data you need (of course with strict access controls and anonymization), with several tools for processing it (like Spark, SQL, and Apache Beam).
  • a framework that makes it easy to deploy a model together with its preprocessing steps, executing in isolation so failures don’t disturb unrelated pipelines.
  • treating data science and machine learning applications just like any other software, with pull requests and peer reviews (also for exploratory and analysis code), tests, continuous delivery, etc.
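To make the last two points a little more concrete, here is a minimal sketch of a model packaged together with its preprocessing and covered by ordinary unit tests. Everything in it is a hypothetical illustration: `BundledModel`, the feature names, and the toy weights are assumptions made for this example, not the actual iZettle framework or a real credit model.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class BundledModel:
    """A model shipped together with the preprocessing it expects (hypothetical)."""
    preprocess: Callable[[Mapping[str, object]], Dict[str, float]]
    predict: Callable[[Dict[str, float]], float]

    def __call__(self, raw: Mapping[str, object]) -> float:
        # Callers pass raw records; they never run preprocessing themselves.
        return self.predict(self.preprocess(raw))


def preprocess(raw: Mapping[str, object]) -> Dict[str, float]:
    # Missing or empty fields fall back to 0 so every feature is always present.
    return {
        "log_monthly_sales": math.log1p(float(raw.get("monthly_sales") or 0.0)),
        "months_active": float(raw.get("months_active") or 0.0),
    }


def predict(features: Dict[str, float]) -> float:
    # Toy logistic score with made-up weights, for illustration only.
    z = -3.0 + 0.4 * features["log_monthly_sales"] + 0.05 * features["months_active"]
    return 1.0 / (1.0 + math.exp(-z))


credit_model = BundledModel(preprocess=preprocess, predict=predict)


# Tests that would run in continuous delivery like those of any other service.
def test_score_is_a_probability_even_with_missing_fields():
    assert 0.0 < credit_model({}) < 1.0


def test_more_sales_gives_a_higher_score():
    low = credit_model({"monthly_sales": 100, "months_active": 24})
    high = credit_model({"monthly_sales": 10_000, "months_active": 24})
    assert high > low
```

Bundling the two steps means a reviewer sees the feature logic and the model in the same pull request, and a change to either is caught by the same tests.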

Most of this hopefully sounds obvious, but is easier said than done, and building a good data platform is a huge topic in itself. For now let’s just say that we have done several iterations over the years. So try not to lock yourself in to one specific tool or ecosystem, and try to make it easy and cheap to iterate and improve.

3. Ensure impact from the start, by prioritizing projects by value and probability of success

At a fairly mature, multi-product fintech company like iZettle, there is an abundance of opportunities for leveraging data with high potential return on investment. So how do you choose where to start?

The approach we’ve found successful is to think of the projects we take on as a “portfolio”. Just like in any portfolio management, we aim for a good balance between reward and risk. Often it makes sense to think of software projects in terms of value and effort, but for machine learning we’ve found it helpful to also think of “probability of success”. Software engineering projects rarely fail because it turns out to be impossible to implement a solution, but some data efforts legitimately fail because there simply is not enough signal in the data to reach the required performance.

What the right balance is varies over time. When we started, we prioritized opportunities that had a high probability of success, even if that meant not starting with the absolute maximum business value. That way we could build experience and tooling while delivering some value, but most important of all: we built trust in the organization from the start. Later we transitioned to also having a healthy dose of high-risk, high-reward projects in the portfolio.
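As a rough sketch of the portfolio view, candidate projects can be scored on estimated value and probability of success and then filtered or ranked. The project names, numbers, the 0.7 threshold, and the use of expected value (value times probability) as a ranking key are all assumptions made for this illustration; the approach described here is a judgment about balance, not a formula.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Project:
    name: str
    value: float      # estimated business value, arbitrary units
    p_success: float  # estimated probability of success, between 0 and 1

    @property
    def expected_value(self) -> float:
        return self.value * self.p_success


def rank_by_expected_value(projects: Sequence[Project]) -> List[Project]:
    """Order projects from highest to lowest expected value."""
    return sorted(projects, key=lambda p: p.expected_value, reverse=True)


def safe_bets(projects: Sequence[Project], min_p_success: float = 0.7) -> List[Project]:
    """Early-stage view: keep only projects that are likely to succeed."""
    ranked = rank_by_expected_value(projects)
    return [p for p in ranked if p.p_success >= min_p_success]


# Hypothetical candidates, invented for this example.
CANDIDATES = [
    Project("churn early warning", value=40, p_success=0.9),
    Project("new credit product scoring", value=100, p_success=0.3),
    Project("fraud rule tuning", value=60, p_success=0.8),
]

for project in safe_bets(CANDIDATES):
    print(f"{project.name}: expected value {project.expected_value:.0f}")
```

With these made-up numbers, the highest-value project is excluded from the early portfolio because its probability of success is low, which mirrors the choice to build trust first and add high-risk, high-reward work later.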

To visualize this, you could plot potential projects by value and probability of success (ignoring effort for this exercise). Then this is what you want:

Use these early wins to define a pattern for success that can be shared as an example for others to follow. Our pattern includes things like:

  • a committed stakeholder: A person within the area that knows the domain, can help move things along, and enable a tight feedback loop.
  • an actionable outcome: Have a clear idea of what (automatic) actions will be taken based on the model output, and that they will have the intended business impact (frameworks like CoNVO can help here).
  • available data: Do we have the right data, and enough history for detecting relevant patterns? If we want to do supervised machine learning (a common case): do we have labeled outcomes?
  • reasonable feedback cycle time: How long do we have to wait to know if we have added value? If more than a month or so, you should probably work to define some early indicator of success.

Like the idea of using machine learning to deliver real value?

At iZettle, Data Scientists can have a lot of impact, and deliver a lot of value. To summarize, some of the key enablers are:

  1. Distributed ownership and prioritization, by partnering with product and business area owners around the company.
  2. A solid data platform that supports teams in delivering independently.
  3. A portfolio perspective on the balance between risk and reward in projects we take on.

Does this sound exciting? Join us! We’re always looking for exceptional data science and machine learning talent.

