How we do machine learning at iZettle
Data science and machine learning are central at iZettle when building tools to support small businesses and cater to each business’s unique needs. In this post, we’ll describe three factors we’ve found important in enabling a small team to ship a lot of value.
At iZettle, we live and breathe small businesses. Understanding them is key: what makes a small self-made company succeed? To be able to help each merchant with their unique challenges, we have to be smart about how we leverage insights and decisions from data, and that’s where data science and machine learning come in.
A small team with a huge scope, shipping a lot
We use machine learning to solve several challenges across the company, including:
- issuing credit to small merchants by predicting which businesses will be doing well and would benefit from a credit offering (calibrated probabilities for new products with limited data).
- offering the right product to the right merchant at the right time, to get more, and more engaged, merchants (tailored product offering based on predicted business needs).
- detecting fraudsters, to keep both merchants and customers safe (imbalanced dataset).
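“Calibrated probabilities” means that, among merchants given a predicted probability of 0.2, roughly 20% should actually have the positive outcome. The post doesn’t say how calibration is checked; one common diagnostic is expected calibration error (ECE), which bins predictions and compares the mean predicted probability in each bin with the observed outcome rate. The following is a minimal plain-Python sketch; the function name and the equal-width binning are our own choices, not iZettle’s.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between predicted probability and observed frequency.

    probs:  predicted probabilities of the positive class, in [0, 1].
    labels: observed outcomes, 0 or 1.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("need at least one prediction")

    # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_pred = sum(p for p, _ in b) / len(b)
        observed = sum(y for _, y in b) / len(b)
        # Each bin contributes its gap, weighted by its share of predictions.
        ece += (len(b) / total) * abs(mean_pred - observed)
    return ece


# Perfectly calibrated toy example: 25% and 75% predictions match observed rates.
print(expected_calibration_error([0.25] * 4 + [0.75] * 4, [1, 0, 0, 0, 1, 1, 1, 0]))  # → 0.0
```

A value near 0 means predicted probabilities match observed rates. It says nothing about how well the model ranks merchants, so it complements rather than replaces ranking metrics such as AUC.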
This is all done by a relatively small team of machine learning professionals, working in a hybrid model. We’ve tried fully embedding machine learning competence before (in Risk, Credit, etc.) but found that bringing people with different domain expertise together into one group strikes a good balance at our current scale. Having a team provides great support and cross-learning, and working tightly together with the rest of the organization ensures we maximize our impact.
So how do we scale this efficiently?
1. Partnering with product and business owners, and owning the full delivery
Data scientists at iZettle own the full scope of delivering a machine learning solution, all the way from finding the business opportunity to shipping the solution to production. This includes:
- partnering with the owner of a product or business area, and working with them to identify and prioritize the most valuable opportunities.
- developing the machine learning model or other solution that is necessary to solve the problem.
- deploying solutions to our production data platform, and monitoring the performance and business impact.
We’re done when we have measured the desired business impact — not before.
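The post doesn’t specify how business impact is measured. When a model drives an action, a common approach is a randomized holdout: compare a business metric, such as conversion, between merchants who received the model-driven treatment and a control group. Below is a minimal sketch using a two-proportion z-test with the normal approximation; the function name and the choice of test are our assumptions.

```python
import math


def two_proportion_z_test(
    conv_a: int, n_a: int, conv_b: int, n_b: int
) -> tuple[float, float, float]:
    """Compare conversion rates of control (a) and treatment (b).

    Returns (absolute uplift, z statistic, two-sided p-value) using the
    pooled-variance normal approximation.
    """
    if min(n_a, n_b) <= 0:
        raise ValueError("group sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Illustrative numbers: 10% conversion in control vs 15% in treatment.
uplift, z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"uplift={uplift:.3f}, z={z:.2f}, p={p:.4f}")
```

The normal approximation is reasonable for large groups. With small samples, repeated peeking at results, or many metrics at once, a more careful analysis is needed.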
Distributing ownership of what to do and how, and removing as many dependencies as possible, lets us scale well. To realize the full benefit of this independence, the development environment must support it too.
2. Everyone ships to production, and it’s fast and safe
We have iterated on our data platform for many years, and although these things are never perfect, we’re certainly getting there. Some key features are:
- easy access to “all” the data you need (of course with strict access controls and anonymization), with several tools for processing it (like Spark, SQL, and Apache Beam).
- a framework that makes it easy to deploy a model together with its preprocessing steps, executing in isolation so failures don’t disturb unrelated pipelines.
- treating data science and machine learning applications just like any other software, with pull requests and peer reviews (also for exploratory and analysis code), tests, continuous delivery, etc.
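The post doesn’t describe its deployment framework’s API. As a rough illustration of two ideas from the list above, shipping preprocessing together with the model and isolating failures, here is a plain-Python sketch. `ModelBundle`, `run_isolated`, and the feature names are hypothetical. In a real platform, isolation would typically come from separate jobs, processes, or containers rather than a try/except within one process.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Sequence

Record = dict[str, Any]
Step = Callable[[Record], Record]


@dataclass
class ModelBundle:
    """Hypothetical deployable unit: preprocessing steps shipped with the model."""
    name: str
    steps: Sequence[Step]
    predict: Callable[[Record], float]

    def score(self, record: Record) -> float:
        # Run the same preprocessing the model was trained with, then predict.
        for step in self.steps:
            record = step(record)
        return self.predict(record)


def run_isolated(bundles: Sequence[ModelBundle], record: Record) -> dict[str, Any]:
    """Score each bundle independently; one failure does not stop the others."""
    results: dict[str, Any] = {}
    for bundle in bundles:
        try:
            # Shallow copy so one bundle cannot mutate another's input dict.
            results[bundle.name] = bundle.score(dict(record))
        except Exception as exc:
            # Record the failure for monitoring and continue with the next bundle.
            results[bundle.name] = f"error: {exc!r}"
    return results


# Illustrative preprocessing steps (hypothetical feature names).
def fill_missing_revenue(r: Record) -> Record:
    return {**r, "revenue": r.get("revenue") or 0.0}


def log_revenue(r: Record) -> Record:
    return {**r, "log_revenue": math.log1p(r["revenue"])}


def needs_missing_field(r: Record) -> Record:
    return {**r, "x": r["field_that_does_not_exist"]}


credit = ModelBundle("credit", [fill_missing_revenue, log_revenue],
                     predict=lambda r: min(1.0, r["log_revenue"] / 20))
broken = ModelBundle("broken", [needs_missing_field], predict=lambda r: 0.0)

# The broken bundle fails with a KeyError, but the credit bundle still scores.
print(run_isolated([broken, credit], {"revenue": None}))
```

Keeping preprocessing inside the bundle means the exact same transformations run in training and in production. Because each bundle is a plain object with plain functions, it can also be unit-tested and peer-reviewed like any other code.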
Most of this probably sounds obvious, but it is easier said than done, and building a good data platform is a huge topic in itself. For now, the main lesson from our many iterations is this: avoid locking yourself in to one specific tool or ecosystem, and make it easy and cheap to iterate and improve.
3. Ensure impact from the start, by prioritizing projects by value and probability of success
At a fairly mature, multi-product fintech company like iZettle, there is an abundance of opportunities for leveraging data with high potential return on investment. So how do you choose where to start?
The approach we’ve found successful is to think of the projects we take on as a “portfolio”. Just like in any portfolio management, we aim for a good balance between reward and risk. Often it makes sense to think of software projects in terms of value and effort, but for machine learning we’ve found it helpful to also think of “probability of success”. Software engineering projects rarely fail because it turns out to be impossible to implement a solution, but some data efforts legitimately fail because there simply is not enough signal in the data to reach the required performance.
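One simple way to combine the two dimensions is expected value: estimated value multiplied by probability of success. This is offered as an illustration, not as iZettle’s actual scoring. The sketch also sorts candidates into value/probability quadrants; the class, function names, cut-offs, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    value: float      # estimated business value if the project succeeds (any consistent unit)
    p_success: float  # estimated probability that the data supports a useful model

    @property
    def expected_value(self) -> float:
        return self.value * self.p_success


def quadrant(c: Candidate, value_cut: float, p_cut: float) -> str:
    """Place a candidate in one of four value/probability quadrants."""
    high_v = c.value >= value_cut
    high_p = c.p_success >= p_cut
    if high_v and high_p:
        return "high value, likely to succeed"
    if high_p:
        return "safer bet, lower value"
    if high_v:
        return "high risk, high reward"
    return "deprioritize"


# Hypothetical candidate projects.
candidates = [
    Candidate("A", value=10.0, p_success=0.8),
    Candidate("B", value=50.0, p_success=0.2),
    Candidate("C", value=3.0, p_success=0.9),
]

for c in sorted(candidates, key=lambda c: c.expected_value, reverse=True):
    print(f"{c.name}: EV={c.expected_value:.1f}, {quadrant(c, value_cut=8.0, p_cut=0.5)}")
```

Sorting purely by expected value puts the risky project B first. The quadrant label keeps the risk visible, so that, for example, a team that is still building trust can deliberately favor A.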
What the right balance is varies over time. When we started, we prioritized opportunities that had a high probability of success, even if that meant not starting with the absolute maximum business value. That way we could build experience and tooling while delivering some value, but most important of all: we built trust in the organization from the start. Later we transitioned to also having a healthy dose of high-risk, high-reward projects in the portfolio.
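To visualize this, you could plot potential projects by value and probability of success (ignoring effort for this exercise). Early on, most of the portfolio should sit among the projects with a high probability of success; over time, you add a healthy share of high-value, high-risk projects.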
Use these early wins to define a pattern for success that can be shared as an example for others to follow. Our pattern includes things like:
- a committed stakeholder: A person within the area who knows the domain, can help move things along, and enables a tight feedback loop.
- an actionable outcome: Have a clear idea of what (automatic) actions will be taken based on the model output, and be confident that they will have the intended business impact (frameworks like CoNVO can help here).
- available data: Do we have the right data, and enough history to detect relevant patterns? If we want to do supervised machine learning (a common case), do we have labeled outcomes?
- reasonable feedback cycle time: How long do we have to wait to know if we have added value? If it’s more than a month or so, you should probably define an early indicator of success.
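A pattern like this can be turned into a lightweight screening step for new project ideas. The following is a hypothetical sketch: the field names are our own, the 12-month history threshold is an assumed placeholder to tune per domain, and the 30-day feedback limit mirrors the “month or so” above.

```python
from dataclasses import dataclass


@dataclass
class ProjectReadiness:
    """Hypothetical summary of a candidate project against the checklist."""
    committed_stakeholder: bool
    actionable_outcome: bool
    history_months: float    # months of relevant historical data available
    has_labels: bool         # are outcomes recorded for past examples?
    feedback_days: float     # days until we can tell whether we added value
    supervised: bool = True  # labels only matter for supervised learning


def readiness_gaps(
    r: ProjectReadiness,
    min_history_months: float = 12.0,  # assumed threshold, tune per domain
    max_feedback_days: float = 30.0,   # "a month or so"
) -> list[str]:
    """Return the checklist items a project does not yet satisfy."""
    gaps: list[str] = []
    if not r.committed_stakeholder:
        gaps.append("find a committed stakeholder")
    if not r.actionable_outcome:
        gaps.append("define the action taken on the model output")
    if r.history_months < min_history_months:
        gaps.append("collect more history")
    if r.supervised and not r.has_labels:
        gaps.append("obtain labeled outcomes")
    if r.feedback_days > max_feedback_days:
        gaps.append("define an early indicator of success")
    return gaps


# Illustrative project whose outcomes take about three months to observe.
credit_offer = ProjectReadiness(True, True, history_months=24, has_labels=True, feedback_days=90)
print(readiness_gaps(credit_offer))  # → ['define an early indicator of success']
```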
Like the idea of using machine learning to deliver real value?
At iZettle, data scientists can have a large impact and deliver a lot of value. To summarize, some of the key enablers are:
- Distributed ownership and prioritization, by partnering with product and business area owners around the company.
- A solid data platform that supports teams in delivering independently.
- A portfolio perspective on the balance between risk and reward in projects we take on.
Does this sound exciting? Join us! We’re always looking for exceptional data science and machine learning talent.