The good, the bad and the ugly — finding your path in data science with the insider’s help

Y-DATA
Yandex school of Data Science
Jul 13, 2019

We sat down with Omri Allouche, Head of Research in a start-up company and senior lecturer at Y-DATA, to discuss Data Science and what it takes to become a good data scientist.

Source: Doody Atrkatsi

What is your professional background?

I am a data science lecturer at Bar Ilan University in Israel, and I head the Research department at the startup Gong.io. It’s a pretty well-established company by now: I joined a few years ago when we were only 15 people, and today we are over 150. We provide conversation intelligence to enterprise clients such as Facebook, LinkedIn, GE and Nasdaq. And of course, I also teach Deep Learning at Y-DATA.

Everyone is talking about Data Science and Data Scientists, but these terms are fairly recent in the industry. Can you please define, once and for all, what a data scientist is?

There’s indeed a lot of confusion about this, especially among new practitioners. One of the best definitions I have heard is that “A data scientist is a data analyst working in Silicon Valley”. But on a more serious note, it is really challenging to properly define a data scientist, as the industry itself has not figured that out just yet. It cracks me up to look at the latest job postings in data science, which have very different, confusing titles.

Recent job postings for “Data Science”

However, a data science position requires a mixture of different skills: mathematics, statistics, computer science, data analysis and business understanding. Good data scientists are knowledgeable in machine learning, statistics, and deep learning and its theory; understand the business they are in well, with serious domain expertise; have strong coding skills with a firm grasp of both SQL and NoSQL databases; and can communicate their findings clearly with visualization tools. That’s about it… Have I forgotten anything? :)

Wait… Does it mean that I have to be a math whiz with mad coding skills and years of experience in a certain field?

The need for data scientists and machine learning researchers in the industry led to many people making the transition from various backgrounds. It is not uncommon to see someone with a PhD in Ecology or Cognitive Sciences working as a data scientist.

At Y-DATA the program creators dissected the required skills in the industry and came up with a program tailored to strengthen them all. Do you actually need to know all of those things? Well yes, you do. But it doesn’t have to happen straight away — it is a journey.

Some of the program participants are students fresh out of university who don’t have a lot of experience in data analysis and visualization. Others lack project management skills. Some participants are good at analysis, but may not understand Logistic Regression on a deep level.

At Y-DATA the students learn the core skills a good data scientist needs, and practice them regularly. We strive for a balance between theory and practice. In my class, I emphasize the mathematical intuition that underlies the amazing advances in the field of Deep Learning, which I find much more important than mathematical formulation and proofs. I try to make the students understand something intuitively first, and only then verify it in theory and in practice.

It is still confusing. For example, my strength is software engineering and data analysis — a subset of the necessary skills — can I call myself a data scientist?

Well, you’re definitely off to a good start :) Generally speaking, I find there are three main kinds of “data scientists”:

  • BI/Data Analyst: strong analysis skills and a good understanding of data, with the ability to tell a complex story in a simple way.
  • Machine Learning Engineer: strong grasp of Machine Learning and Deep Learning building blocks and architecture, coupled with an understanding of the full deployment cycle of a model from inception to production at scale, with performance in mind.
  • Researcher: finds solutions to Machine Learning problems using various algorithms, often not just off-the-shelf but with special attention to what’s needed to get the best performance for a specific problem.

The good news is that the 1-year Y-DATA program was created with all these types of data scientists in mind.

Contribution of each Y-DATA program component to different types of Data Scientist

I don’t have practical data science experience — how should I develop it?

I find that there is a big difference between learning the theory of machine learning and data science, and using it in practical scenarios. Good data scientists are able to work with real, dirty data, manage project priorities and quickly identify project pitfalls to obtain the best model for the job.

Many participate in Kaggle competitions to improve their skills, but these often include clean data, or the data cleaning is done by the entire community jointly, and the research questions are often clear and set.

At Y-DATA the students are assigned to real world projects in leading high-tech companies in Israel, where they are required to provide solutions to actual business problems using the data science tools they acquire during the program. Nothing beats the exposure to real life product needs in an everyday data science environment — Machine Learning is much more than just applying different models with different hyperparameters to a matrix with rows being observations and columns being features.

A top data scientist understands the research question and the product needs, and knows how to use her knowledge of algorithms and methods to address them. The projects at Y-DATA are done in collaboration with a project owner from the company as well as with a senior data scientist who serves as a technological mentor in weekly sessions.

We have discussed so many skills that it is difficult to keep track. Which is the most important skill of them all?

Don’t forget the data in data science — it comes first. Y-DATA is notorious for its challenging home assignments which were designed to teach the students how to work with data.

One of the course assignments we had at Y-DATA was to classify song lyrics into various genres (“rock”, “pop”, “indie”, etc). In theory, you use a Recurrent Neural Network (RNN) to classify the text. However, following a “recipe” will give you very mediocre results! Why is that?

An experienced researcher would first examine the data and understand it before applying the model. She would then cycle the model through debugging, error analysis and improvement.

For example, looking at the data, you see that some of the tracks are purely instrumental and that others have an undetermined category. The problem is also very imbalanced, with over 100 times more rock songs than folk ones, which eventually hurts the model’s predictions for folk tracks. Just following the recipe, without understanding the challenges in the data, will only give mediocre results.
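
The kind of data inspection described above can be sketched in a few lines of Python. This is a minimal illustration, not the course’s assignment code: the toy records, the placeholder labels in UNDETERMINED, and the helper names clean and class_weights are all hypothetical. It drops tracks with no lyrics or no usable genre, counts what is left per genre, and derives inverse-frequency class weights, one common way to keep a rare genre such as folk from being drowned out during training.

```python
from collections import Counter

# Hypothetical toy records (lyrics, genre); NOT the actual course dataset.
tracks = [
    ("we will rock you tonight", "rock"),
    ("guitar riffs and thunder", "rock"),
    ("loud drums on the highway", "rock"),
    ("", "rock"),                      # instrumental track: no lyrics
    ("dancing under neon lights", "pop"),
    ("some words", "Not Available"),   # undetermined genre
    ("old river and a banjo", "folk"),
]

# Assumed placeholder labels that mean "no usable genre".
UNDETERMINED = {"Not Available", "Other", ""}


def clean(records):
    """Drop tracks without lyrics or without a usable genre label."""
    return [
        (text, genre)
        for text, genre in records
        if text.strip() and genre not in UNDETERMINED
    ]


def class_weights(records):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(genre for _, genre in records)
    total = sum(counts.values())
    return {genre: total / (len(counts) * n) for genre, n in counts.items()}


cleaned = clean(tracks)
counts = Counter(genre for _, genre in cleaned)
weights = class_weights(cleaned)

# Ratio between the most and least frequent genre after cleaning.
imbalance_ratio = max(counts.values()) / min(counts.values())

print("tracks kept:", len(cleaned))
print("per-genre counts:", dict(counts))
print("imbalance ratio:", imbalance_ratio)
print("class weights:", weights)
```

On this toy data, five tracks survive cleaning, rock outnumbers folk three to one, and folk receives a weight three times larger than rock. Such weights could then be passed to a loss function; whether weighting, resampling, or collecting more data works best depends on the problem.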

Suppose I know my strengths and have a goal in mind to become an ML engineer. Can I get there alone by taking courses on Coursera?

It certainly depends on what kind of person you are. You can surely take an online course and gain some understanding of data science. From my experience interviewing such candidates at my workplace, you will probably know how to answer the basic questions.

However, you will have difficulty with advanced topics, which you will not understand deeply. Moreover, the data science field changes rapidly, and online content is usually several years behind the industry.

The Y-DATA program is not just a convenient setting to put your data science studies on a schedule. It is a community of high achievers who constantly want to learn something new. They don’t want to feel smart, but to be smart.

Don’t make me feel good; make me better.

I constantly challenge my students with assignments that require a deep understanding of concepts. Many students have already taken courses online, but those courses won’t give answers to such assignments.

This learning model is embraced with openness by some students; others may find it challenging.

You have experience teaching in academia. How is the program different from, or similar to, a university course?

In academia I teach Master’s and PhD students. While they are very smart and work hard, they are most often in their twenties and have little experience working in the industry on real-world problems. In the Y-DATA program, most of the participants are much more experienced and have been working in the industry for many years.

On the other hand, one of my challenges as a lecturer in the Y-DATA program is the vastly different levels of experience and background of the students. The majority have already taken machine learning courses online, read through the material, and understand the basics well. They ask in-depth questions about how things work. As we go deeper, valuable insights and relevant experience are shared with all the students.

Students who come with less background and experience work harder to catch up, but we do not compromise on the quality or complexity of the materials. We keep the bar well above the lowest common denominator. We work hard to provide a top-tier, world-class program in Israel, with content unparalleled in academia or in other courses.

Any unusual topics covered in lectures?

I try to come up with relevant topics that students sometimes overlook. I recently spoke about the human brain, for example. I described the biological neural network by comparing similarities and differences between the human brain and Deep Learning networks.

Following the huge jump in performance on Natural Language Processing tasks, I also taught advanced Deep Learning architectures for Natural Language Understanding, like transformers, which had only been out for a few months when the course was taught and certainly weren’t covered in any online courses.

Is it necessary to quit my job to focus on the studies?

Nothing comes for free, and such a demanding discipline certainly requires a considerable commitment of time and effort. Y-DATA spans a year, and it is a very selective program that takes highly talented people who can cope with the workload. Generally, you don’t have to quit your current position.

Our students come from high-tech companies that understand that the skills their employees learn in the Y-DATA program (data analysis, data clustering, machine learning, deep learning, etc.) are very valuable from a business perspective, and students get a chance to use the knowledge they acquire in their existing positions.

I am still skeptical that even a year is enough to cover all the skills we discussed earlier.

Data science is a blend and you are bound to be better at some skills than others — understand and focus on your superpowers, while strengthening your weaker points.

You never stop learning in this field. You must always move out of your comfort zone.

The Y-DATA campus is in Tel Aviv. Can you imagine how such a program differs from, say, a similar program in Silicon Valley where all the “real” data scientists are?

I don’t think there would be big differences. Israel is recognized in the Valley as an oasis for top-class Data Science research and practice, and many companies have, or are building, Data Science teams in Israel.

Some aspects of Israeli culture make it a good fit for research in general, and machine learning research in particular. Israelis don’t take things for granted and question the status quo to the extreme. We don’t do something just because it has worked before; we wonder whether we can make it even better.

In a relatively young field like Deep Learning, which is growing so explosively, the “I am smart; let me try a few ideas I have” mentality is key to innovation. This is part of what we try to do in the Y-DATA program: encouraging students to ask deep questions and get to the heart of things, and letting them work on real-world projects.

P.S. Y-DATA recently announced that the program is also opening at Ben-Gurion University, and applications for the October class are open for both campuses. More details can be found here: https://yandexdataschool.com/israel/
