How to Build an Impactful Career in Data Science?

Interview with Alexey Grigorev on Data-Driven Chat

Alexey Grigorev
Data Science Insider
6 min read · May 6, 2020

I had the great pleasure of talking to Ganna Pogrebna on her Data-Driven Chat podcast. Ganna asked me many interesting questions, so I decided to summarize our conversation in a blog post.

We covered many things, including:

  • The work I do at OLX Group
  • The importance of learning by doing when getting into data science
  • The effect of the current crisis on the future of machine learning

Let’s start!

Hello, everyone, this is Data-Driven Chat and today I’m very pleased to introduce Alexey Grigorev.

Alexey is a Lead Data Scientist at OLX Group in Berlin, Germany. He is an expert in software engineering and machine learning, and he is the author of two books: Mastering Java for Data Science and Machine Learning Bookcamp. I’m really excited to talk to him today.

What are you currently working on, Alexey?

I work at OLX Group. OLX is a platform for online classified advertisements: this is a place where you go to sell and buy things.

One of the first projects I did at OLX was predicting the quality of images. Low-quality images are not very attractive, so we wanted to build a model that helps sellers take better pictures. If a picture is too bright, too dark, or blurry, or if the object is not properly placed, we could detect that and suggest how to improve it.
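
To make these checks concrete, here is a minimal sketch of how "too dark", "too bright", and "blurry" could be flagged with simple heuristics. It is not the model built at OLX, which the interview does not describe. The function names and all thresholds are hypothetical, and the image is assumed to be a grayscale grid of 0-255 values that is at least 3x3 pixels.

```python
from statistics import mean, pvariance


def mean_brightness(gray):
    """Average pixel intensity of a grayscale image (rows of 0-255 values)."""
    return mean(pixel for row in gray for pixel in row)


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over interior pixels.

    Sharp images have strong edges, so the variance is high;
    blurry or flat images give values close to zero.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def quality_issues(gray, dark=60, bright=195, blur=50.0):
    """Return a list of detected issues. Thresholds are illustrative guesses."""
    issues = []
    brightness = mean_brightness(gray)
    if brightness < dark:
        issues.append("too dark")
    elif brightness > bright:
        issues.append("too bright")
    if laplacian_variance(gray) < blur:
        issues.append("blurry")
    return issues


# A flat, dark 8x8 image: no edges at all, so it is also reported as blurry.
dark_flat = [[20] * 8 for _ in range(8)]
print(quality_issues(dark_flat))  # ['too dark', 'blurry']
```

Heuristics like these cannot judge whether the object is properly placed; that part would need a trained model.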

Another project I worked on was a duplicate detection system. Sometimes our users want to get more exposure and sell their items faster, so they publish the same listing multiple times, spamming the platform. In some cases, it’s worse: there are fraudsters who copy existing listings, make the price more attractive, and pretend to be genuine sellers. Eventually, they trick honest buyers into paying a deposit and disappear with the money. The duplicate detection system addresses both of these problems.
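
As an illustration of the general idea, the sketch below flags near-duplicate listings by comparing their texts with Jaccard similarity over word pairs. This is a common textbook technique, not the system used at OLX. The listing texts, the function names, and the 0.6 threshold are all made up for the example.

```python
import re
from itertools import combinations


def shingles(text, n=2):
    """Split text into a set of lowercase word n-grams."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(listings, threshold=0.6):
    """Return pairs of listing ids whose texts are at least `threshold` similar."""
    sets = {listing_id: shingles(text) for listing_id, text in listings.items()}
    return [
        (first, second)
        for first, second in combinations(sorted(sets), 2)
        if jaccard(sets[first], sets[second]) >= threshold
    ]


# Hypothetical listings: "b" copies "a" and adds one word.
listings = {
    "a": "iPhone 11 64GB black, like new",
    "b": "iPhone 11 64GB black like new cheap",
    "c": "Wooden dining table with four chairs",
}
print(find_duplicates(listings))  # [('a', 'b')]
```

Comparing every pair is fine for a handful of listings but grows quadratically, so a real platform would need indexing or hashing on top of this, and probably image comparison as well.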

You can read more about these projects on our tech blog.

Can you give some tips for people who want to get into data science?

I strongly believe in learning by doing. This is the concept I use in Machine Learning Bookcamp.

The best recipe for learning is: come up with a problem and then solve it.

Take a library, find out how to use it, and learn enough to solve this particular problem. Often knowing how to work with available tools is more important than knowing how to implement ML algorithms from scratch.

Scikit-Learn: a great library for doing machine learning (scikit-learn.org)

The most important thing in applied machine learning is being able to use ML to solve a business problem. This is what you should focus on. Knowing formulas doesn’t always solve the problem, but libraries often do.
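
As a small illustration of how far a library can take you, here is a minimal scikit-learn sketch that trains and evaluates a classifier in a few lines. The dataset (the Iris data bundled with scikit-learn) and the choice of logistic regression are assumptions made for the example; any small problem you care about would serve the same purpose.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows to check how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train the model without implementing the algorithm ourselves.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on data the model has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Once this works, swapping in a different model or a different dataset is a one-line change, which is a good way to start experimenting.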

Instead of concentrating on practice, many newcomers invest time in taking courses and going deep into theory. The amount of information on the Internet is overwhelming: there are so many things to learn. This creates a false sense of knowing nothing at all. I remember experiencing it myself: I tried to fill these gaps by studying more and more, only to discover other things I didn’t know yet. I believed that I really needed to learn them all before I could do anything practical. Don’t let this feeling fool you.

Focusing on solving the actual problem is the best way of dealing with this feeling. So, find a problem and try to solve it.

Of course, knowing theory is also important, but in my opinion, practice comes first and theory comes second.

I like your point about finding a practical problem. I also tell my students: you can spend hours listening to my lectures or reading something, but you will never learn it unless you start doing something.

To what extent have data science and machine learning changed business practices?

My professional career started 10 years ago. I remember the job landscape and the required skills back then: they were pretty different from today. Now there are roles like data scientist, machine learning engineer, and data engineer, none of which existed 10 years ago. This shows that companies have already realized that machine learning is useful for their business. That’s why we have all these new jobs now.

Machine Learning is useful for businesses and makes their customers happy

Machine learning has proven useful for the industry. There are already “classical” applications of machine learning: demand forecasting, fraud detection, marketing, and advertising, just to name a few. We have seen many cases where an organization took the data it already had and used machine learning to improve its product and business.

Companies know this and realize that if they don’t use their own data, they will lose their competitive edge.

Speaking of jobs, Alexey recently made a video on how to get hired as a data scientist. If you want to get tips on how to get a job, watch the video.

We’re all feeling the impact of COVID-19. Do you think data science can play a role in solving the current crisis?

The amount of informational noise on the internet these days is unbelievable.

The best thing most data scientists can do now is to stay home and try not to contribute to this noise. We cannot fit an exponential curve and then call ourselves experts in epidemiology. Without the proper background, we may arrive at the wrong conclusions. This is definitely not something we should spread on the internet.

What do you think about the future of machine learning? What will we see in the next five years?

Prior to the current crisis, machine learning was at the peak of hype. People were excited about it, perhaps too excited. Sometimes companies expected that they’d hire a team of data scientists who’d magically solve all their problems.

Machine learning was at the peak of hype (source: Wikimedia Commons)

Other companies knew pretty well what machine learning could do for them and, as an investment, hired a lot more data scientists than their business actually needed.

It was a good time to be a data scientist.

Unfortunately, times have changed. Businesses will now optimize how they spend money and reinvest it in something more important for the company instead of hiring more data scientists. This is sad: as a data scientist, I’d like to see the field continue to grow. But it was inevitable; the crisis only caused it to happen sooner.

Of course, there are many areas where machine learning is still needed. I, however, expect companies to be more selective and thoughtful when hiring data scientists. The demand will likely decrease.

To stay in demand, we need two more skills in addition to machine learning: business understanding and software engineering.

Business understanding is about being able to formulate a business problem in machine learning terms. This also includes managing expectations and knowing when machine learning is useful, and when it’s not.

Once the problem is formulated in ML terms, there’s a lot of work to do: get the data, transform and aggregate it, plug it into a model, and use the output. Mostly, it’s engineering work, so being able to do it in a reliable way is quite important.
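
The four steps above can be sketched end to end in a few lines. This is only a toy illustration under stated assumptions: the events, the labels, and the `aggregate` helper are hypothetical, and scikit-learn is used as one possible way to wire the transformation and the model together.

```python
from collections import defaultdict

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Get the data: hypothetical raw events, one per published listing.
events = [
    {"seller": "s1", "price": 300.0},
    {"seller": "s1", "price": 320.0},
    {"seller": "s2", "price": 15.0},
    {"seller": "s3", "price": 280.0},
    {"seller": "s3", "price": 310.0},
    {"seller": "s3", "price": 295.0},
    {"seller": "s4", "price": 20.0},
]
# Hypothetical binary label per seller (what it means is up to the business).
labels = {"s1": 1, "s2": 0, "s3": 1, "s4": 0}


def aggregate(events):
    """2. Transform and aggregate: one feature dict per seller."""
    prices = defaultdict(list)
    for event in events:
        prices[event["seller"]].append(event["price"])
    return {
        seller: {"n_listings": len(p), "avg_price": sum(p) / len(p)}
        for seller, p in prices.items()
    }


features = aggregate(events)
sellers = sorted(features)
X = [features[seller] for seller in sellers]
y = [labels[seller] for seller in sellers]

# 3. Plug it into a model: vectorize the dicts, scale, then classify.
model = make_pipeline(
    DictVectorizer(sparse=False),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)

# 4. Use the output: a score per seller that downstream code can act on.
scores = model.predict_proba(X)[:, 1]
for seller, score in zip(sellers, scores):
    print(f"{seller}: {score:.2f}")
```

Keeping the aggregation in a plain function and the preprocessing inside the pipeline means the same code runs at training time and at prediction time, which is where much of the reliability comes from.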

You shouldn’t underestimate the importance of software engineering for data science — especially if you’re now starting your career.

Thank you. It was really nice to have you on Data-Driven Chat.

It wasn’t possible to cover everything that we discussed on the podcast in this post. For the full episode, check here and subscribe to the Data-Driven YouTube channel.

Follow me on Twitter (@Al_Grigor) for updates!
