Dogs, Wolves, Data Science, and Why Machines Must Learn Like Humans Do

Published in

VEON Careers

7 min read · Jun 9, 2017

We’re living in a world where machines are learning to tell the difference between dogs and wolves. Now, a robot knows not to pet a wolf! We’ll get back to this stuff in a second.

For now, let’s introduce Evgeniy, a senior data scientist at VEON. In our interview, we talk about a lot of awesome things (including dogs and wolves!). Let’s begin.

You have a Ph.D. from Lomonosov Moscow State University in mathematical modeling. What led you into the field of statistics?

Yes, my thesis dealt with non-negative matrix factorization. I guess my interest in statistics stems from starting programming when I was young. I’ve always liked to analyze and create.

But my journey into statistics truly all started as a joke with my friend. We both knew statistics to be this super complicated subject, so we applied just to see what would happen. We both were accepted and just decided to do it. It turns out we loved the field.

It’s funny how it all happened. I just really enjoy working with statistical models, especially when it involves real data. After school, when I analyzed data from medical and biological research, I realized this was the direction for me. A simple joke ended up being my lifelong pursuit!

We discussed learning, research, and analysis a lot. Now that the age of AI has begun, what insights do you have that can help navigate this new world successfully?

It’s an exciting world. Things like neural networks are everywhere. Neural networks are designed to learn like the human brain, but we have to be careful. This is not because I’m scared of machines taking over the planet. Rather, we must make sure machines learn correctly.

[1602.04938] “Why Should I Trust You?”: Explaining the Predictions of Any Classifier

Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons…

arxiv.org

One example that always pops into my head is how one neural network learned to differentiate between dogs and wolves. It didn’t learn the differences between dogs and wolves, but instead learned that wolves were on snow in their picture and dogs were on grass. It learned to differentiate the two animals by looking at snow and grass. Obviously, the network learned incorrectly. What if the dog was on snow and the wolf was on grass? Then, it would be wrong.
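To make the snow-and-grass trap concrete, here is a minimal sketch on made-up data. It is not the network from the story: the "photos" are just two invented numbers each (a noisy animal-shape cue and a background flag), and the model is a small logistic regression written with NumPy. All names, such as `make_photos` and `train_logistic`, are hypothetical.

```python
import numpy as np

def make_photos(n, snow_for_wolves, rng):
    """Return (features, labels) for n synthetic 'photos'.

    Label 1 = wolf, 0 = dog. Each photo has two invented features:
    column 0: a noisy 'animal shape' cue (only weakly informative),
    column 1: background, 1.0 for snow and 0.0 for grass.
    """
    labels = rng.integers(0, 2, size=n)
    shape_cue = labels + rng.normal(0.0, 2.0, size=n)
    if snow_for_wolves:
        background = labels.astype(float)   # wolves on snow, dogs on grass
    else:
        background = 1.0 - labels           # dogs on snow, wolves on grass
    return np.column_stack([shape_cue, background]), labels

def train_logistic(x, y, steps=2000, lr=0.5):
    """Plain gradient-descent logistic regression (weights plus bias)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        w -= lr * (x.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, x, y):
    """Share of photos whose predicted label matches the true label."""
    return float(np.mean(((x @ w + b) > 0).astype(int) == y))

rng = np.random.default_rng(0)
x_train, y_train = make_photos(500, snow_for_wolves=True, rng=rng)
x_same, y_same = make_photos(500, snow_for_wolves=True, rng=rng)
x_swapped, y_swapped = make_photos(500, snow_for_wolves=False, rng=rng)

w, b = train_logistic(x_train, y_train)
acc_same = accuracy(w, b, x_same, y_same)
acc_swapped = accuracy(w, b, x_swapped, y_swapped)
print(f"same backgrounds: {acc_same:.2f}, swapped backgrounds: {acc_swapped:.2f}")
```

On new photos with the same backgrounds the model looks excellent, which is exactly why the problem is easy to miss. Once the backgrounds are swapped, it gets nearly every photo wrong, because it leaned on the background column rather than the animal.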

So you believe there must be great attention given to the way these machines learn, right?

Absolutely. It’s tremendously vital. Not that I’m saying being able to differentiate between dogs and wolves isn’t important, but neural networks have the power to do incredibly important things.

Picasso: A free open-source visualizer for CNNs

Cloudy with a chance of tanks

medium.com

There is an example of when an army used neural networks to distinguish between camouflaged tanks and plain forest. The problem was that photos of the tanks were taken on cloudy days, while photos of the forest were taken on sunny days. The neural network passed all the tests successfully, but it was merely distinguishing between clouds and sun — not tanks and forest. It learned the incorrect way to differentiate the two.

By sequentially blocking out parts of the image, we can tell which regions are more important to classification. This image was classified by the VGG16 model, with a 94% classification probability of “tank.” Bright parts of the image correspond to higher probability of the given classification. For instance, **the sky regions are very bright because occluding the sky doesn’t affect the probability of this image being classified as a tank.** And conversely, the tank tread regions are darker because without them, it’s hard for the model to know if it’s looking at a tank.
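The blocking-out idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the Picasso tool or VGG16: `toy_tank_prob` is a hypothetical stand-in classifier that looks only at the bottom rows of a 16 by 16 image (the "treads"), and `occlusion_map` is an invented helper name.

```python
import numpy as np

def occlusion_map(image, predict_prob, patch=4, fill=0.0):
    """Block out one patch at a time and record the model's probability.

    High (bright) cells: hiding that region barely changes the prediction.
    Low (dark) cells: the model depends on that region.
    """
    height, width = image.shape
    heat = np.zeros((height // patch, width // patch))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            rows = slice(i * patch, (i + 1) * patch)
            cols = slice(j * patch, (j + 1) * patch)
            occluded[rows, cols] = fill
            heat[i, j] = predict_prob(occluded)
    return heat

def toy_tank_prob(image):
    """Hypothetical classifier: 'tank' probability driven only by the
    average brightness of the bottom four rows of the image."""
    treads = image[12:, :]
    return float(1.0 / (1.0 + np.exp(-(8.0 * treads.mean() - 4.0))))

image = np.ones((16, 16))             # made-up image, all pixels equal to 1
baseline = toy_tank_prob(image)       # probability with nothing hidden
heat = occlusion_map(image, toy_tank_prob)
print(np.round(heat, 2))
```

In the printed map, the top three rows equal the baseline probability, because hiding the "sky" changes nothing for this toy model, while the bottom row is lower, because hiding part of the "treads" hurts the prediction. With a real network, `predict_prob` would wrap the model's forward pass for the class of interest.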

In both of these examples, the models erroneously tuned themselves to an implicit bias in the samples. That actually happens quite often (here’s another example). That’s why it is important that we understand how a model makes its decision.

AI can save lives, protect the environment, and overall help us build a better future. Clearly, this technology can get complicated. We not only have to make sure that what we’re doing with it holds value, but we must also handle it cautiously.

Can you tell us more about your career path after university?

Now, I’m one of the senior data scientists at VEON, but this job role, data scientist, wasn’t really a common term when I started. You could call me a statistician as well.

“All data scientists follow a crazy career path. It’s like a cool roller-coaster through all sorts of new territory — like artificial intelligence (AI).”

Over my career, I’ve worked on medical research data for research institutions. I’ve done budget optimizations and time series forecasting for pharmaceutical companies and supermarket chains.

Before VEON, I worked in Moscow for Yandex, the largest search engine in Russia. I specifically worked at the Yandex Data Factory, helping external clients leverage data for more success. I guess Yandex was where I technically became a data scientist.

You have a breadth of experience. What have you learned along the way?

My jobs after university have been about learning and reaching solutions by collecting and analyzing numbers and information — that’s the main point.

The biggest thing I’ve realized is the power of data. For instance, in medicine, understanding how to utilize findings in correlation and causation can save lives. Also, I should mention I’ve become very interested in causal graphs, which are booming in use in statistics right now. As we get better at discovering causal relationships, data will only get more powerful.

Maybe soon being a data scientist will be just as cool as being a techno star. Did I mention that I like techno? Let’s have a listen:

What made you join VEON?

VEON is an exciting company that’s undergoing an incredible digital transformation. The culture here allows for lots of creativity. There are two other main reasons.

One, VEON is sitting on a goldmine of data. I’m having a blast going through all of it. There’s a lot of rich stuff and I’m learning so much. There are just so many ways we can use this data to enrich the lives of our customers.

Two, the location in Amsterdam really attracted me. It’s beautiful here. I love watching the birds chirp in the trees, my cat enjoying the garden, and the cherry blossoms in Westerpark. The city also has a great work-life balance, which is important for me.

We heard you teach a course. Can you share with us some details?

I actually have a specialization we made for Coursera with my former colleagues. It is about machine learning and data analysis, and I specifically teach applied statistics.

I’m really proud of the courses because we go beyond the basic stuff so students can actually use what they’ve learned in real-world situations. For instance, we explore advanced things like multiple hypothesis testing and play with real datasets.

What do you do for inspiration outside of work?

For personal inspiration, I read a lot about statistics and data science. One book I like is Computer Age Statistical Inference, which gives you a great picture of where we’re coming from and where we’re going in the data world.

Computer Age Statistical Inference: Algorithms, Evidence and Data Science

The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and in influence.…

web.stanford.edu

I also read Rob J. Hyndman’s blog. Hyndman has written the best R package for time series forecasting and his advice and opinions on forecasting are very valuable.

Another blog I like is one written by Francis X. Diebold, an econometrician most famous for the Diebold-Mariano Test. Sometimes he writes about mind-blowing results that went rather unnoticed (for example, the proof that the conditional mean minimizes the expected loss under any Bregman divergence, including asymmetric ones; it is just nuts).
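For readers who want to poke at the Bregman claim, here is a small numerical check rather than a proof. It assumes one particular asymmetric Bregman divergence, the Itakura-Saito divergence generated by phi(x) = -log(x), and five made-up positive observations. It covers only the plain sample-mean version of the statement; the conditional case is the same idea applied within each group.

```python
import numpy as np

def itakura_saito(y, c):
    """Itakura-Saito divergence D(y, c) = y/c - log(y/c) - 1.

    This is the Bregman divergence generated by phi(x) = -log(x),
    and it is asymmetric: D(y, c) and D(c, y) generally differ.
    """
    ratio = y / c
    return ratio - np.log(ratio) - 1.0

y = np.array([0.5, 1.0, 2.0, 4.0, 7.5])     # made-up positive observations
candidates = np.linspace(0.1, 10.0, 991)    # candidate forecasts, step 0.01

# Average loss of each candidate forecast over the observations.
average_loss = np.array([itakura_saito(y, c).mean() for c in candidates])
best = candidates[np.argmin(average_loss)]

print("best forecast on the grid:", round(float(best), 2))
print("sample mean:", y.mean())
```

Even though the loss penalizes over- and under-prediction differently, the best forecast on the grid lands on the sample mean (up to the grid step) rather than somewhere shifted to one side, which is the surprising part.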

No Hesitations

Francis X. Diebold's Blog

fxdiebold.blogspot.nl

In general, I stay aware of the latest developments in statistics, machine learning, data science, and other related subjects. I read a lot of new articles from scientific journals and data science mailing lists.

Dogs like snow, too

Quick question: Do you know the difference between a dog and a wolf?

Anyway, you get the point. If you’re going to learn something, learn it the correct way. Machines should, too.

Hopefully, Evgeniy’s interview has shown you the importance of learning and using the data you have to reach true solutions, especially in this amazing digital age. This way, you’ll always know where the sun shines and animals play.