How to Stop Being a Taker and Give More As a Data Scientist in 6 Simple Steps

OLX Group Careers
OLX Group Careers Blog
12 min read · Jul 22, 2021

We asked Serge what advice he would give his childhood self, and got an inspiring answer:

“I left Belgium when I was 26. Now, I’ve lived in three countries and five different cities. I would tell myself to go abroad earlier. Living abroad opens your mind. I would tell others to go abroad, too.”

For Serge, a Data Science Manager at OLX Motors, much of his career has been about adventure and taking leaps. It’s why she started out working in space research and why he jumped into the data science world by working at startups and then joining OLX Group.

In our talk with Serge, we learned a lot about what data scientists at user-focused companies should strive to accomplish and how they can make life better for folks.

Throughout this article, by agreement with Serge, we alternate between the pronouns he and she.

Without further ado, let’s jump in.

1. Fairness

Vehicles are one of the most expensive purchases people ever make. Understanding that, Serge and the data science team have taken steps to make certain pricing is fair.

Through the Price Fairness project, OLX Motors uses AI-powered price evaluation tools to analyze whether car prices are fair. While a car shopper peruses the platform, they can see if a price is above, below, or at the average price range for that type of car.

“The model flags to the buyer if the car price is fair or not, and explains why,” describes Serge. “It then provides an explanation on which car characteristics are driving the price evaluation up or down.”

The output of our price evaluation model on OTOMOTO.pl
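
To make the above/below/average idea concrete, here is a minimal sketch of how a listing could be labeled against comparable listings. It is not OLX's model: the function name `price_label`, the sample prices, and the choice of the interquartile range as the "average price range" are all assumptions made for illustration.

```python
from statistics import quantiles

def price_label(price, comparable_prices):
    """Label a listing price relative to the middle band of comparable listings."""
    # Assumption: the "average price range" is the interquartile range (Q1..Q3).
    q1, _, q3 = quantiles(comparable_prices, n=4)
    if price < q1:
        return "below"
    if price > q3:
        return "above"
    return "within"

# Hypothetical prices for similar cars (same make, model, and year).
comparables = [40000, 42000, 45000, 47000, 50000, 52000, 55000]

for asking in (39000, 48000, 60000):
    print(asking, price_label(asking, comparables))
```

A production model would also need to explain which car characteristics push the estimate up or down, which this sketch does not attempt.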

The Price Fairness project has been a multi-departmental initiative. Serge, along with the Product Manager and Head of Product Research, led the project and invited team members from multiple backgrounds to generate ideas. They invited people from product, data, tech, and other business functions. They had people working on the backend who were new to design thinking.

“Having a diverse group helped create a better model.”

The team not only wanted to estimate pricing as accurately as possible, but they also wanted to see the impact of the AI pricing model on buyers and sellers. Serge and the team used a confusion matrix to gain insights into the model’s performance.

“We used a confusion matrix to estimate the performance of a binary classifier. Specifically, we utilized the confusion matrix in design thinking to identify the impact of AI price recommendations. For instance, what happens if in reality the price is fair and the AI model says it’s high? What are the consequences for buyers and sellers?”

Using a confusion matrix has helped Serge see the implications of adding AI pricing models to the platform in a methodical way. It enables OLX Motors to see just how the AI pricing model impacts buying behavior. With such data on hand, they can build a fairer model.
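
For readers new to the idea, a confusion matrix for a binary "fair" versus "high" classifier can be sketched in a few lines. The labels and sample data below are hypothetical and only illustrate the counting. They are not results from the Price Fairness project.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs for a binary classifier."""
    return Counter(zip(actual, predicted))

# Hypothetical data: what the price really was vs. what the model flagged.
actual    = ["fair", "fair", "high", "high", "fair", "high"]
predicted = ["fair", "high", "high", "fair", "fair", "high"]

cm = confusion_matrix(actual, predicted)
for truth in ("fair", "high"):
    for flag in ("fair", "high"):
        print(f"actual={truth:<4} flagged={flag:<4} count={cm[(truth, flag)]}")
```

Each of the four cells can then be discussed in a workshop: for example, the cell where the price is actually fair but the model flags it as high is the case Serge describes.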

2. Elimination of Biases & Assumptions

To make the platform enjoyable and fair for users, the data science team must research without assumptions. With the Price Fairness project, Serge and others have made a point to challenge their assumptions.

“Here’s one notable finding: We originally thought that if the model flagged a car as too expensive, people would click less. But buyers clicked more actually,” exclaims Serge.

That willingness to challenge assumptions through testing has helped build a better platform. To further eliminate biases, Serge and his teammates have done extensive research.

“An issue we found was the potential for location bias with the pricing model. And this could be a historical bias with car pricing. By taking into account the location of where cars were being sold, the AI model’s analysis began giving lower prices on average for cars being sold in the countryside. Is it fair to provide a different price for the same car based on location? This is where ethics comes into play with the use of AI on user-focused platforms.”

Recognizing how biases could affect the model and OLX’s users, Serge has worked hard to make fairness the foundation of all the data models his team builds. This requires thinking about what’s best for the users long-term (and not what’s easiest now).
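
One simple way to surface a location gap like this is to compare the model's estimates for the same car across locations. The sketch below assumes hypothetical data and names (`estimates`, `location_gap`). It illustrates the check, not the team's actual method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical model outputs: (car_spec, location, estimated_price).
estimates = [
    ("golf-2015", "city", 10000),
    ("golf-2015", "countryside", 9000),
    ("corsa-2017", "city", 8000),
    ("corsa-2017", "countryside", 7600),
]

def location_gap(rows, group_a="city", group_b="countryside"):
    """Mean relative price gap between two locations for identical car specs."""
    by_spec = defaultdict(dict)
    for spec, location, price in rows:
        by_spec[spec][location] = price
    # Only compare specs that have an estimate in both locations.
    gaps = [
        (prices[group_a] - prices[group_b]) / prices[group_a]
        for prices in by_spec.values()
        if group_a in prices and group_b in prices
    ]
    return mean(gaps)

gap = location_gap(estimates)
print(f"countryside estimates are {gap:.1%} lower on average")
```

A measured gap does not say whether the difference is fair. That remains a decision for the data science, product, and business teams.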

“We have to build models that don’t have historical bias. A bias can’t be the base of learning for the model. We also must realize the trade-off that happens when you take action to avoid bias. You lose data points and use a lot of resources. But you make the data model more complex by defining what is fair. You can push the learning to go the route of what you’ve defined as fair with your data science, product management, and business teams.

“We need to get better at detecting bias in the models we deploy in the world. Because these tools have an impact in the external world,” stresses Serge.

“For instance, even when you search online, advertising technologies analyze you using parameters such as gender or sex. Whatever page you visit and whatever you see, they will put you in a box in terms of gender. This creates difficulties for non-binary and trans people.”

For Serge, things like advertising for non-binary people have to get better.

“If I have another characteristic, such as if I’m a racial minority, I may encounter the same issue. The AI we’re deploying applies these labels to define who we are, and often it’s not fair or accurate.”

You can read more about Serge’s journey in “How to build a more inclusive environment for people in tech” here:

Serge, the team and other participants during the price fairness workshops

3. A Better User Experience

A better user experience is the motivation behind many of the data team’s projects at OLX Motors. Some of the team’s most important projects right now include:

  1. Wholesale Price: When sellers buy cars in bundles, they need to get a good price so that they can resell at a profit. The wholesale price tool recommends to sellers the price at which they should buy cars in bundles. Thanks to the wholesale price estimator, sellers have more data and knowledge about the best price to maximize profit.

“This project is very valuable for our sellers. We must have good data to get the estimates right. Because if we make a mistake with the price recommendation, the impact will be massive. This project involves multiple teams in the company. It looks complex, but we’ve done a good job of aligning teams,” states Serge.

  2. Relevance of Query Results: The idea driving this project is to display the most attractive cars for each buyer while ensuring search optimizes seller listings and ads. Buyers and sellers are connected on the OLX Motors platform, so improving the relevance of query results requires striking a balance.

“If we put too much weight on the relevance characteristics for buyers and ignore the seller side, sellers might be unhappy. Conversely, if we only focus on revenue, or the seller side, and promote sellers that pay for ads on top of the search results, the buyers might not get the best results and they’ll be unhappy.”

As you can see, to enhance the user experience for both buyers and sellers, Serge and the team have to strike a balance. This makes the work both challenging and fulfilling.

“There’s a continuous trade-off on a platform with buyers and sellers. We can’t think of it as a zero-sum game. We can uplift both buyers and sellers simultaneously.”
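
The balance Serge describes can be pictured as a weighted blend of a buyer-side score and a seller-side score. This is only a sketch: the field names, the listings, and the 0.7 weight are hypothetical, and a real search ranking would be far more involved.

```python
def blended_score(buyer_relevance, seller_boost, weight=0.7):
    """Blend buyer relevance with paid promotion; weight is the buyer share."""
    return weight * buyer_relevance + (1 - weight) * seller_boost

def rank(items, weight):
    """Return listing ids ordered by blended score, best first."""
    ordered = sorted(
        items,
        key=lambda x: blended_score(x["buyer_relevance"], x["seller_boost"], weight),
        reverse=True,
    )
    return [x["id"] for x in ordered]

# Hypothetical listings: relevance to the buyer and whether the seller paid for ads.
listings = [
    {"id": "A", "buyer_relevance": 0.9, "seller_boost": 0.0},
    {"id": "B", "buyer_relevance": 0.6, "seller_boost": 1.0},
    {"id": "C", "buyer_relevance": 0.3, "seller_boost": 1.0},
]

for w in (1.0, 0.7, 0.0):
    print(f"buyer weight {w}: {rank(listings, w)}")
```

With the weight at either extreme, one side is ignored entirely. Intermediate weights let a relevant promoted listing rise without burying the best match for the buyer.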

4. Knowing the How & the Why

For projects to have success, Serge stresses the importance of data provenance.

“Data provenance enables us to track data from the source to the final recommendation. We need to know how data arrives at a destination, the how and the why. We want to track data from the source till the recommendation and we want to examine transformations that happen in the data,” states Serge.
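
As a rough illustration of tracking data "from the source till the recommendation," the sketch below logs a fingerprint of the data before and after each transformation. Everything here (the function names, the steps, the sample rows) is hypothetical and written with the Python standard library. It is not how Pachyderm or OLX implements provenance.

```python
import hashlib
import json

def fingerprint(data):
    """Short, stable hash of JSON-serializable data."""
    blob = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]

def run_with_lineage(data, steps):
    """Apply each (name, function) step and record input/output fingerprints."""
    lineage = []
    for name, fn in steps:
        result = fn(data)
        lineage.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        data = result
    return data, lineage

# Hypothetical transformations applied to raw listing prices.
steps = [
    ("drop_missing", lambda rows: [r for r in rows if r["price"] is not None]),
    ("to_thousands", lambda rows: [{**r, "price": r["price"] / 1000} for r in rows]),
]
raw = [{"id": 1, "price": 12000}, {"id": 2, "price": None}]

clean, lineage = run_with_lineage(raw, steps)
for entry in lineage:
    print(entry)
```

Because each step's output fingerprint matches the next step's input, the log can be walked backwards from a final value to the raw source.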

Data has become increasingly valuable to decision-making and enhancing the tools and capabilities of the OLX Motors platforms. This is why data provenance matters so much and why Serge has been busy building a team with a diverse set of skills when it comes to big data.

The how and the why…

“The data science field is expanding and teams benefit from having a variety of data professionals, from data engineers to data analysts. For data science to be at its best, it must be explainable, repeatable, and scalable. This is why we’ve seen the rise of the machine learning engineer.”

Machine learning engineers have indeed become more important to the data science function. As Writuparna Banerjee, a machine learning and data science enthusiast, writes:

“Machine learning engineers sit at the intersection of software engineering and data science. They leverage big data tools and programming frameworks to ensure that the raw data gathered from data pipelines are redefined as data science models that are ready to scale as needed. Machine learning engineers feed data into models defined by data scientists.”

In addition to having the right people, Serge ensures the team has all the right tools too. To better grasp the why and how of data, Serge has employed Pachyderm, a “data science platform that combines Data Lineage with End-to-End Pipelines on Kubernetes.” This summary highlights what Pachyderm can deliver to Serge and the data science team.

The platform allows users to comply with emerging AI legal standards while ensuring that machine learning developers can accurately recreate and repeat data science experiments. The ability to deliver data lineage (data provenance) is seen as a key step toward explainable AI.

Pachyderm’s platform targets machine learning pipelines and ETL workflows, managing data and models while tracking output directly to the input datasets from which they were created. The result is data provenance. Promoted as “Git for data science,” the service provides data science teams with version control for software development tools.

For Serge, data provenance is the main driver of their solutions, and that’s why they use tools like Pachyderm. Knowing the how and the why of the data helps build better models.

“As we use AI-powered models more, data provenance becomes even more crucial. I believe that AI must explain itself in order to work better on platforms like OLX Motors. Data provenance helps get you on that path to explainable AI,” adds Serge.

5. Explainable AI

Imagine this: You apply for a loan on a site, then you get this response:

  • “Your loan application has been denied.”

You may be wondering, “Why?”

Now, imagine you really needed that loan to buy a car to get to work. Not getting an explanation would frustrate you, right?

An article in Noema Magazine uses this loan denial example to show why AI must explain itself. Unfortunately, this isn’t so simple (yet). But we must figure it out.

Nicole Rigillo, an anthropologist and the author of the article, writes that the mass implementation of machine learning and artificial intelligence tools as decision-makers “poses important questions concerning the fairness and transparency” of those decisions. The consequences are massive. This is why “better understanding how AI systems ‘reason’ is a necessary step to bridging the gap between human and machine intelligence.”

As we’ve discussed, Serge believes data provenance forms the foundation for understanding how AI systems reason. At OLX Motors, Serge stresses how data provenance brings a transparency layer, helping teams work more efficiently and manage the models better.

“We have the Pachyderm tool because we want to improve transparency and explain the reasoning process better. The ML Ops team needs a tool like this. It makes our reaction time much shorter. If the service breaks or the quality degrades, we have to look for the source of the problem. Maybe the source of the problem is at certain parts of the pipeline,” details Serge.

“Imagine that you have a pipeline with various components, such as an NLP deep learning component and another component that’s like a classifier. If you have a solution that includes provenance and you identify a problem in the classifier, you can just update that part and not everything else.”

For Serge, such transparency helps the team work more efficiently. That’s because his team understands the models more deeply.

“By getting to explainable AI, you can do everything from detecting bias to debugging models more effectively.”

This drive towards explainable AI is also why Serge has taken an interest in better data sharing.

“I really think data sharing can contribute to explainable AI. I just read a great research piece from Oxford about how markets for datasets could work.”

If you’re interested, here’s the article abstract that introduces the need and idea of markets for data well:

Although datasets are abundant and assumed to be immensely valuable, they are not being shared or traded openly and transparently on a large scale. In the context of data trading, this article shares a conceptual market design approach and demonstrates the importance of provenance to overcome appropriability and quality concerns. It analyzes the requirements for efficient data exchange, comparing existing trading arrangements against efficient market models, and shows that it is possible to achieve either large markets with little control or small markets with greater control.

6. Think Bigger-Than-Life

Coming from a background in engineering and physics, Serge has worked on some incredible projects over his career. And his story is a testament to the importance of thinking big and jumping towards opportunities.

Serge studied space technologies with a specific interest in remote sensing. At the end of his studies, he left Belgium and went to Italy, where she worked as a remote sensing engineer for NATO in the Underwater Research Centre.

“We analyzed satellite images relevant for acoustics models. We extracted wind intensity and wind direction data from satellite radar images of the sea,” describes Serge.

Following his time at NATO, Serge worked for the European Space Agency. He stayed there for 13 years, working in Rome then Madrid.

“I was mostly contributing to innovative satellite missions of the European Space Agency. We did experimental missions where we put in new sensors in space that hadn’t measured at that level of precision and frequency before. We measured geophysical parameters, and this sort of data wasn’t available before.”

Serge was involved in designing, developing and operating satellite data centers. The objective was to build a capacity for each satellite mission to deliver high quality data as fast as possible mostly for research purposes to laboratories across Europe and the rest of the world.

“My daily work was interacting with engineers and scientists to ensure the data centers were being developed properly and then operated as expected. Before the launch of the satellite, it was to ensure the data centers would work once the satellite was launched. It was a lot of big data before that became the big thing,” says Serge.

“I began with data at the space engineering level,” notes Serge.

“Now here I am, exploring how data can help folks on earth unlock the hidden value in everything, as we say at OLX. Data science has an exciting road ahead. Dream big when it comes to setting your goals in this field.”

Read more about Serge’s journey in Part 2:

That concludes our conversation with Serge. We hope his list of goals for data scientists and his personal journey have inspired you.

Discover more about Data Science at OLX Group here:


We are one of the world’s fastest-growing networks of trading platforms, operating in 30+ countries, with over 300 million Monthly Active Users.