A Journey from Software to Machine Learning Engineer at iZettle
I have been working at iZettle for about four years now. I did not start at the company as a Machine Learning Engineer; I transitioned into the role as I grew within the company.
My formal education is in Computer Science/Software Engineering, and I have been working as a software developer for about 8 years. Those years as a software engineer gave me skills that made my path towards ML Engineering rather particular. I want to share what I wish I had focused on more when preparing for the role, and how I think my background helped me in this transition.
How it started
First off: why? Was I not happy with my job as a Software Engineer? Do I think being an ML Engineer is somehow better than being a Software Engineer? By no means! It is just a matter of what I like to spend my time on, and that is data. I have been coding since I was 15, and I have always loved it. But what I love most is using my work to understand the world. Like a lot of people, I have a list of “pet project” ideas that never come to life, and when I read through it, I find things like: build a tool that uses Twitter data to detect natural disasters, analyse data from my social media accounts to understand my own behaviour, detect moods in friends’ chats, etc. Do you see the pattern? All of those ideas seek to understand some situation, and all of them are centred around data. So I have always had a great interest in data, and about two years ago I decided that I wanted it to be my main job. I like to think that Machine Learning was my own choice, but in the spirit of being data-driven, I have to admit that my decision coincides with the peak of the Machine Learning hype of recent years, when every tech article you read was about some ML innovation. So I guess I will never know.
Anyway, I made that decision, and fortunately for me, iZettle was running a Machine Learning mentoring program, which I happily enrolled in.
The learning process
In that mentoring program, we went through all the chapters of the book Python Machine Learning. Every week, we would discuss a chapter and code some exercises to experiment with the concepts we had learned. I found it very exciting, and it really solidified my basic understanding of how Machine Learning works.
Besides that, I used my free time to take online courses, such as a Deep Learning specialisation on Udacity, to participate in Kaggle competitions (with not much success, I have to admit), and to try to implement some of the project ideas I mentioned before.
I am telling you all of this not to brag, but so that you understand that I was absorbing as much knowledge on the topic as I could, and that when I was ready to change titles, I felt I had a very solid understanding of Machine Learning.
After more than 6 months of intense studying, I joined the Machine Learning team at iZettle.
What courses and books don’t teach you
I joined the team full of energy and eager to start on my first project. It was during that first project that I started noticing how different studying machine learning is from actually doing it. The project was about predicting bankruptcy among our merchants, so that we could reach out and help them with their business.
When you do a project from a course or a book, the most important part of it is already done for you: deciding what exactly you are trying to do. In a course, you are given a dataset and a target metric, and all you have to do is “massage” the data and train models that perform well on that metric.
There are a few things you don’t learn, or ever even question, in that situation:
Problem definition: How is the problem formulated so that it makes sense from a Machine Learning point of view? For the bankruptcy problem, I was shocked by how many questions suddenly arose in my mind, almost out of nowhere: what does it mean to predict bankruptcy? Does it mean a merchant is going bankrupt tomorrow? In a week? In a month? How do I know which of our merchants have already gone bankrupt? Is it a lack of activity? What about seasonality, then? Is it some external information? How do I map that to a label that my algorithm can learn from?… I was so used to being given a labeled dataset that I had never considered that just creating the label requires a lot of thinking, domain knowledge and business considerations, or that the problem, and the features you can use, change completely depending on how you define it.
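To make those questions concrete, here is a minimal sketch of one possible label: “a merchant that was active before a cutoff date has no transactions in the following N days”. This is not the definition used at iZettle; the function name `inactivity_labels`, the 90-day horizon and the toy transactions are all hypothetical, chosen only to show how many decisions hide inside a single label.

```python
from datetime import date, timedelta


def inactivity_labels(transactions, cutoff, horizon_days=90):
    """Hypothetical label: 1 if a merchant active on or before `cutoff`
    has no transactions in the following `horizon_days`, else 0.

    `transactions` is an iterable of (merchant_id, transaction_date) pairs.
    """
    horizon_end = cutoff + timedelta(days=horizon_days)
    seen_before = set()  # merchants with activity up to and including the cutoff
    seen_after = set()   # merchants with activity inside the horizon window
    for merchant_id, tx_date in transactions:
        if tx_date <= cutoff:
            seen_before.add(merchant_id)
        elif tx_date <= horizon_end:
            seen_after.add(merchant_id)
    # Merchants that first appear after the cutoff get no label at all:
    # at prediction time we would know nothing about them.
    return {m: int(m not in seen_after) for m in sorted(seen_before)}


transactions = [
    ("a", date(2019, 1, 10)),
    ("a", date(2019, 2, 1)),
    ("b", date(2019, 1, 5)),
    ("c", date(2019, 2, 5)),
]
labels = inactivity_labels(transactions, cutoff=date(2019, 1, 31))
print(labels)  # {'a': 0, 'b': 1}
```

Every argument here is a business decision in disguise: a seasonal merchant with a quiet winter would be labelled 1 with a short horizon and 0 with a longer one, which is exactly the seasonality question above.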
Data: I have already hinted at this one, and it is widely known that getting the right data is one of the hard parts of an ML problem. Still, it strikes you when you run into it for the first time. Data is hard to get, it is messy, and you shouldn’t trust it blindly. Building the label actually comes after you have your data sources. In my first assignment, I had two datasets from different sources that I had to merge and map to our own set of features per merchant. For each new source of information you introduce, you need to make sure not only that the data quality is acceptable, but also that you are not introducing any kind of bias, or at the very least that you account for it.
In some cases, you don’t even have data for the problem you want to tackle, and the Machine Learning work starts months before the first line of code is written, with designing data collection strategies and building relationships with other teams.
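A merge like the one described above is also a natural place to check quality and coverage before trusting the result. The sketch below is a hypothetical illustration, not the code from that assignment: `merge_sources` and the toy feature dictionaries are assumptions. It joins two per-merchant sources and reports which merchants exist in only one of them, since silently dropping those rows in an inner join is one easy way to introduce bias.

```python
def merge_sources(source_a, source_b):
    """Join two {merchant_id: {feature: value}} dicts on merchant id.

    Returns the merged features for merchants present in both sources, plus
    a report of merchants found in only one source and of merged rows with
    missing (None) values. If both sources share a feature name, the value
    from `source_b` wins.
    """
    shared = source_a.keys() & source_b.keys()
    merged = {m: {**source_a[m], **source_b[m]} for m in sorted(shared)}
    report = {
        # Merchants an inner join would drop: check whether they form a
        # particular segment before discarding them.
        "only_in_a": sorted(source_a.keys() - source_b.keys()),
        "only_in_b": sorted(source_b.keys() - source_a.keys()),
        "missing_values": sorted(
            m for m, feats in merged.items() if any(v is None for v in feats.values())
        ),
    }
    return merged, report


payments = {"m1": {"revenue": 100}, "m2": {"revenue": None}, "m3": {"revenue": 5}}
registry = {"m1": {"country": "SE"}, "m2": {"country": "NO"}, "m4": {"country": "DK"}}
merged, report = merge_sources(payments, registry)
print(report)
```

If, for example, every merchant in `only_in_b` came from one country, the merged dataset would under-represent that country; the report makes that visible instead of leaving it to the join.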
Evaluation: We have our dataset and our label. We start modelling and… how do we measure performance? This is not only a matter of which metric to use, but also of whether it makes business sense. Trade-offs play a big role here. I had never had to think about which metric to use to measure my model’s performance before; that was always a given. I was literally waiting for someone to tell me: use accuracy/ROC-AUC/etc. When that didn’t happen, and I had to choose a metric and think through its implications, I realised how important it is to spend a lot of time on this, and I was somewhat disappointed by how little attention the topic gets in the books and courses I had taken. Just think about it: depending on how “bad” it is to predict true when the answer is actually false (a.k.a. a False Positive), or the other way around, you might want to keep a minimum precision or recall, regardless of how your overall metric (say, ROC-AUC) goes up or down. That is just one example; there are many more.
These were the main things I realised I had not learned in any of the courses I took or books I read. They are what I have been learning on the job, day to day, thanks to really experienced and patient colleagues.
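The “minimum precision” idea can be sketched in a few lines of plain Python. ROC-AUC summarises ranking quality over all thresholds, but in production you pick one threshold, so what matters is, for instance, the best recall you can get while precision stays above a floor. The function `threshold_for_min_precision` and the toy scores are hypothetical; on real data, scikit-learn’s `precision_recall_curve` computes the underlying precision/recall pairs.

```python
def threshold_for_min_precision(scores, labels, min_precision):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least `min_precision`, or None if no
    threshold reaches it. A sample is predicted positive when score >= threshold.
    """
    total_positives = sum(labels)
    best = None
    # Thresholds are tried in ascending order, so recall never increases;
    # the strict ">" keeps the lowest threshold that achieves the best recall.
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        if tp + fp == 0:
            continue  # nothing predicted positive: precision is undefined
        precision = tp / (tp + fp)
        recall = tp / total_positives if total_positives else 0.0
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(threshold_for_min_precision(scores, labels, min_precision=0.75))  # (0.6, 0.75, 1.0)
```

Two models with similar ROC-AUC can behave quite differently under such a constraint, which is why the floor on precision (or recall) is worth agreeing on with the business before comparing models.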
What helped me in my journey
There were, of course, also some pleasant surprises, where I could leverage the skills I had acquired over the years as a software engineer. To list a few:
- General software development practices: In lots of courses, a great deal of time is spent explaining common software development practices: version control, basic programming, a focus on a single programming language (more often than not, Python), etc. Having worked with software for years, this is not only something I already had practice in; it is something that is just “in me”. This made it much easier for me to start testing and implementing ideas and, hopefully, to spread these practices and inspire the team I had just joined.
- Reading and understanding others’ code: Because this is what I have always done, I am very comfortable reading my colleagues’ code and, hopefully, giving constructive feedback.
- Flexibility when it comes to “the right tool for the job”: Machine Learning Engineers tend to stick to whatever tools they are comfortable with. I have no data to back up that statement; it is just something I have noticed. Because I have worked with many languages and frameworks, I find it relatively easy (and exciting!) to try out new tools and libraries, which hopefully gives me a wider view and a broader set of tools to work with.
So… what to do?
If you are in a similar situation to the one I was in, I suggest an add-on for the next exercise in the book you are reading or the course you are taking. Try answering the following questions:
- What are you trying to solve/predict? Do you understand all the parts of the problem?
- Does the data you are given make sense for the problem you are trying to solve?
- How was the data collected? If you can’t answer that with certainty, how would you collect the data? How long would it take? Who would need to be involved in collecting it: web developers (maybe for click events), app developers (for app usage data), someone else?
- If this were a problem to be solved within a company, what else besides your predictions would the project need? Some infrastructure? A business decision? A new feature in the app? Who would be involved?
- What evaluation metric are you using? Do you understand it? Is it appropriate for the problem? Is there an alternative metric that would make more sense for the problem? What is the cost of misclassifying a sample?
- Regardless of what the target performance is: if this were a real project, at what performance would you be happy? Why that number? Why not lower or higher?
I am confident that if you really try to answer all those questions for every exercise you encounter in your learning process, you will develop a much wider and more realistic view of Machine Learning “in real life”.
I hope this post helps others on their learning journey! Please reach out if you have any questions or want to share your story with us; we’d love to hear it!