Rachel Shalom — How I spent a year becoming a Data Scientist at Y-DATA

Published in

Yandex school of Data Science

Jul 13, 2019

Rachel Shalom, Product Manager and a Y-DATA graduate, talks about her background, aspirations, studies and industry project.

Tell us a bit about yourself

My background is quite varied. I studied at the Hebrew University in Jerusalem and graduated with a B.Sc. in Mathematics and Economics.

While I was at university, I was a teaching assistant for a few years, mainly in mathematics undergraduate courses. After that, I went on to earn my MBA at Tel Aviv University.

I spent six months in Austria as part of a student-exchange program and studied Python as part of the business degree there. Afterwards, I went on to work as a product manager at Technion — Israel Institute of Technology.

At the Technion, I designed and headed coursework focused on creating startup programs. It was an intense, problem-solving type of work I found very rewarding.

These days I work as a product manager at a young startup. Since we are small, my tasks vary from coding to managing complicated projects with large enterprise clients. In my free time (that is, when I still had free time, before starting Y-DATA :)) I built web applications to familiarize myself with the parts of our technology stack I didn’t get a chance to work with during office hours.

What sparked your interest in data science?

My interest in data science started with a project at work, when I began working with large corporations. About two years ago I joined Save A Train, an Israeli travel tech startup. It operates worldwide, and its main goal is to help travelers save money on train tickets they have already purchased.

Say you bought a €100 ticket to a popular destination, for example from Milan to Paris. The ticket price may drop a week or two later. If it does, we let you exchange the ticket for a small fee and keep the difference. We’re able to do that because we build direct business relationships with the train operators. Unlike other OTAs (online travel agencies), our website and apps therefore benefit from direct integrations with the operators’ APIs.
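As a minimal illustration of the exchange arithmetic, here is a short Python sketch. It assumes a flat exchange fee. The interview doesn’t give Save A Train’s actual fee structure, so the new price and the €5 fee below are hypothetical numbers.

```python
def exchange_savings(paid: float, new_price: float, fee: float) -> float:
    """Amount a traveler keeps after exchanging a ticket whose price dropped.

    Assumes a flat exchange fee (hypothetical). Returns 0 when the price
    drop does not cover the fee, i.e. exchanging would not be worth it.
    """
    return max(0.0, paid - new_price - fee)


# Hypothetical example: a €100 Milan-Paris ticket drops to €70,
# and the exchange fee is assumed to be €5.
print(exchange_savings(100.0, 70.0, 5.0))  # 25.0
```

Clamping at zero covers the case where the drop is smaller than the fee. In that case an exchange would cost the traveler money.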

Recently, we came up with more B2B ideas for saving money on enterprise travel budgets. I can’t name any clients since I’ve signed NDAs, but suffice it to say that each of these companies can provide us with over 10,000 tickets a month, so things were already going well.

We needed to monitor prices algorithmically and track their trends. It was at this point that I felt I didn’t have enough tools to answer all of our clients’ questions. I wasn’t familiar with time series forecasting and the other amazing tools I know now. This case triggered my interest in data science, and I discovered I LOVE it.

Do you get to use what you’re learning at your work?

Definitely! Some examples:

I started building our company dashboard from scratch, using the Dash package that I encountered in class. It presents information about sales, users and other insights in a clear, visual way. It’s an internal tool for managers, investors and other stakeholders, and it is already in use.

Data cleaning techniques help me with the tough problem of extracting the correct data and presenting it to our users. Station names are just one example. In Europe alone there are over 10,000 stations, and some of their names are spelled differently in different languages, which makes them hard to find for some of our international users. I used techniques I learned in class to overcome this.
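The interview doesn’t say which cleaning techniques were used. The following is only a sketch of one common approach, using the Python standard library. It normalizes names (case, accents, punctuation) and then fuzzy-matches a user’s query against a canonical station list. The station list, the function names and the 0.8 similarity cutoff are hypothetical choices for illustration.

```python
import difflib
import unicodedata
from typing import Optional, Sequence


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse punctuation/whitespace."""
    # NFKD splits letters like 'ü' into 'u' plus a combining mark, which we drop.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat hyphens, dots, etc. as spaces, then collapse repeated spaces.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.casefold())
    return " ".join(cleaned.split())


def match_station(query: str, stations: Sequence[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the canonical station name closest to `query`, or None."""
    # Map normalized form -> canonical name (later duplicates overwrite earlier ones).
    index = {normalize(s): s for s in stations}
    hits = difflib.get_close_matches(normalize(query), list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None


# Hypothetical canonical list; a real one would come from the operators' data.
STATIONS = ["München Hbf", "Milano Centrale", "Paris Gare de Lyon", "Zürich HB"]
print(match_station("Munchen-Hbf", STATIONS))     # München Hbf
print(match_station("Milan Centrale", STATIONS))  # Milano Centrale
print(match_station("Berlin Hbf", STATIONS))      # None
```

Accent stripping and fuzzy matching handle spelling variants such as “Zurich” vs. “Zürich”. Translated names such as “Munich” vs. “München” are too different for string similarity and would need an explicit alias table on top of this.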

And last but not least: the more tools you have, the better you can solve problems. I am in the process of applying some machine learning models to additional business problems we have.

Why did you choose to study in the Y-DATA program?

Unlike most education programs in Israel, Y-DATA doesn’t force you to take a huge risk and quit your day job to study, so I can combine studying with my current work. While my end goal is to become a data scientist, I think that now, when data is the new “gold”, every product manager should be familiar with machine learning tools. The Y-DATA syllabus looks like a mature one-year program. Learning takes time, and I don’t think one can learn data science in a few weeks or a couple of months, as some other courses promise. One year is an excellent amount of time for understanding new concepts.

What are the insights you gained from your studies at Y-DATA?

One of the things I learnt and liked most is the art of exploratory analysis. Before applying a fancy algorithm to your data, try the simple things: get to know your data, play with it, visualize it and try to make sense of it with simple tools. This can teach you amazing things.

In one of the early classes at Y-DATA we worked on a Kaggle challenge about bike sharing in San Francisco. The task was to come up with interesting insights from the data that would bring business value. That really helped me understand the power of simply exploring your data. It was my first experience of actually making sense of 5 GB of data :)

Another interesting task was working with a dataset of 350,000 song lyrics from different genres (hip-hop, rock, country, etc.). This was a text classification task: given a song, output the genre it belongs to. As part of the exercise we used word2vec, a model for learning vector representations of words, called word embeddings. Each word is represented as a vector, and the vectors are placed so that words with similar meanings end up close together while dissimilar words are far apart. A funny thing I got when adding and subtracting word vectors was Man minus Love equals Banker :) According to the world of these songs, a man without love becomes a banker.
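A common way to do this kind of word-vector arithmetic works in three steps. Sum the “positive” word vectors and subtract the “negative” ones. Then return the vocabulary word whose vector has the highest cosine similarity to the result, excluding the query words themselves. Below is a minimal Python sketch of that vector-offset idea, using the classic king/queen example. The tiny three-dimensional embeddings are made up for illustration. Real word2vec vectors are learned from a corpus such as the lyrics dataset and have many more dimensions, so this toy does not reproduce the Man minus Love result.

```python
import math

# Hypothetical toy embeddings (3 dimensions). Real word2vec vectors are learned
# from text and typically have a few hundred dimensions.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "child": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(positive, negative, embeddings):
    """Word closest to sum(positive) - sum(negative), excluding the query words."""
    dim = len(next(iter(embeddings.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, embeddings[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, embeddings[word])]
    candidates = [w for w in embeddings if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


# king - man + woman lands closest to queen in this toy space.
print(analogy(["king", "woman"], ["man"], EMBEDDINGS))  # queen
```

In this notation, “Man minus Love” would be `analogy(["man"], ["love"], trained_vectors)`. Here `trained_vectors` is a hypothetical placeholder for embeddings trained on the lyrics.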

What can you tell us about your industry project?

My industry project was at Nexar, a startup whose mission is to create a world without car crashes. Working on a project with real business goals inside a company is entirely different from regular studies. Nexar’s connected car technology uses data from users’ mobile sensors (mainly the accelerometer) and video.

Our task was binary classification: distinguishing dangerous from non-dangerous driving. The mobile sensors (accelerometer, gyroscope, magnetometer and GPS) can tell us a lot about dangerous situations on the road.
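For a binary classifier like this one, accuracy, precision and recall are computed from the counts of true and false positives and negatives. The Python sketch below makes the definitions concrete. It treats dangerous driving as the positive class (label 1), and the labels in the example are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 1 = dangerous driving, 0 = non-dangerous.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75}
```

Precision answers “of the drives flagged as dangerous, how many really were?”. Recall answers “of the truly dangerous drives, how many were flagged?”.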

I worked on this project with two other students. Our approach was to start with a baseline model and standard time series feature engineering (mean, standard deviation, autocorrelation, etc.), using random forests and gradient boosting. We wanted to get a sense of where we stood, and we were surprised to get a good accuracy of around 80% using only accelerometer data. Adding the other sensors improved the results a bit. Since Nexar was skeptical about these results, we tested on additional validation data and showed that our models performed consistently.

Then we took the deep learning approach. We based our work on a paper comparing different architectures for time series classification. After some trial and error, using the accelerometer, the gyroscope, some of the GPS data and even hand-crafted features, we reached the targets Nexar had set for us: 88% accuracy, 87% recall and 88% precision.

At this point we were already at the end of the program and started preparing our presentation for demo day. Out of 14 projects, ours was selected to present in the final round, at a demo day with mentors, lecturers and other industry practitioners.
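The baseline relied on standard time series features (mean, standard deviation, autocorrelation) computed over windows of sensor readings. The Python sketch below shows what such features can look like for one window of 3-axis accelerometer data. Using the acceleration magnitude, adding min/max and choosing lags 1 and 2 are assumptions for illustration, not details of the team’s actual pipeline. Feature dictionaries like these could then be fed to a random forest or gradient boosting model.

```python
import math
import statistics


def autocorrelation(xs, lag):
    """Sample autocorrelation of the series xs at the given lag."""
    n = len(xs)
    if lag >= n:
        return 0.0
    mean = statistics.fmean(xs)
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0  # a constant signal has no defined autocorrelation; use 0
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


def window_features(ax, ay, az, lags=(1, 2)):
    """Summary features for one window of 3-axis accelerometer readings."""
    # Combine the three axes into one signal that ignores phone orientation.
    magnitude = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]
    features = {
        "mean": statistics.fmean(magnitude),
        "std": statistics.pstdev(magnitude),
        "min": min(magnitude),
        "max": max(magnitude),
    }
    for lag in lags:
        features[f"autocorr_lag{lag}"] = autocorrelation(magnitude, lag)
    return features


# Hypothetical window: a steady ~9.8 m/s^2 (gravity) reading with one sharp spike.
window = window_features(
    ax=[0.0, 0.0, 12.0, 0.0, 0.0],
    ay=[0.0, 0.0, 16.0, 0.0, 0.0],
    az=[9.8, 9.8, 0.0, 9.8, 9.8],
)
print(window)
```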

Our mentors were a big help in the project. Glib Ivashkevich is a real professional; I learnt a lot from him, and he helped us achieve a lot in this project. Dr. Eli Brosh and Tomer Ahrak from Nexar also guided us and helped us through the work. Another thing I gained was presentation and public speaking skills, which I consider a life skill. Y-DATA gave us a presentation skills workshop before demo day, which was an invaluable experience.

What are the major challenges you’re facing?

The combination of working, taking the courses and completing all the exercises plus the industry project was very challenging. At a certain point I decided to reduce my position to 50%, since for me this was the only way to keep up with the workload. Lecturers gave us a week to get through subjects we knew nothing about. We were also required to read and understand research papers for weekly seminars, and we worked on these in study groups. Making a plan and prioritizing helped me keep my sanity, but I still study all the time: on the bus on my way to work, during lunch and on weekends.

What are your concerns?

The industry is developing so fast that the knowledge we acquired yesterday may be irrelevant tomorrow. The lecturers teach us how to learn, but I still feel that I’m missing some parts of the first semester. I wish the program had been two months longer so I could understand the nuances properly.

I also think that some prior knowledge of machine learning from online courses would have helped me keep up with the fast pace in class. Either way, I plan to go through some selected subjects again, since there are some concepts I feel I didn’t fully grasp.

What are your thoughts about the future?

The program made me realize that I have only begun my journey into data science, so I’m planning to keep learning new and interesting things now that I understand the basics. I plan to become a data scientist. I feel that the industry project, along with my product experience, makes this the most sensible career option for me.

But, as I mentioned before, I think every product manager should learn about machine learning. When I started the program, it was out of my own professional interest: I wanted more tools and knowledge as a product manager. Over the course of the year, though, I realised that data science is what I want to do as a career.