Data Science vs Data Engineering

Roy Klaasse Bos
The Outlier by Pattern

--

From multiple offices around the globe, Studyportals is trying to make education transparent worldwide by listing thousands of international study opportunities in one place. With over 30 million unique visitors per year from 240 different countries, this “Google for studies” is not only one of the fastest-growing scale-ups in the Netherlands but also a playground for every data enthusiast. Last April, Agis and I visited their Eindhoven HQ to interview Lead Data Engineer Tara Farzami in the Control Room, the same room where I had interviewed for a job four weeks earlier. This time I had the privilege of asking the questions. We talked about work, but also about fancy research proposals, the importance of teamwork, and whether you should mention the significance of your results during presentations.

Data Science is a Team Sport

Originally from Iran, Tara moved to Germany for a master’s in Neural Information Processing after obtaining a bachelor’s degree in Statistics in her homeland. She explains: “Somewhere in the middle of college I realized that I’d like to apply my scientific toolkit to better understand the brain, not only from a biological perspective but also from a statistical one. The master’s program in Germany has been specifically designed for people with a Science, Technology, Engineering and Mathematics (STEM) background. It focuses on simulating the brain using neural networks by tuning mathematical models and testing them on data collected from a real brain.”

Thereafter, she joined the new neuro-informatics department at Radboud University as a PhD candidate. Halfway through her PhD she realized that academia was no longer for her: “They train you to become a successful researcher, someone who can independently investigate and solve a problem. At the same time, I believe teamwork is a very important part of any success. In machine learning, most of the computational neuroscience labs in Germany had good publications simply because they collaborated with other labs that had mastered their craft. However, you don’t see this in every university. So it’s fair to say that teamwork is a must in business; in academia, though, it’s a choice.”

Teamwork is a must in business; in academia it’s a choice.

Fancy is not always better

As a researcher you need a leader who creates the right environment for you to shine. In practice, this is not always the case, which is why you see a lot of smart and hardworking individuals burn out and leave academia. In the end, it’s all about who writes the fanciest proposal, not necessarily about the best scientific idea. A large network and a number of highly cited publications will get you far but don’t guarantee a grant. If you don’t manage to win any grants, it’s very hard to stay in your current job.

It’s all about who writes the fanciest proposal, not necessarily about the best scientific idea.

Tara mentions another risk of working as an academic: “In most work environments there’s a high chance of getting promoted and securing permanent employment if you live up to expectations. In academia, however, many postdocs still have to deal with the uncertainty of a temporary contract.”

Data Science vs Data Engineering

Given her experience in both fields, I asked her to share her view on the differences between the two. “By definition, a data engineer is in charge of building and maintaining a data system (also known as ETL: Extracting, Transforming and Loading data). So you make sure that whatever data comes in also comes out, without losing any information (unless you deliberately process it). A data scientist, on the other hand, cleans the data and then starts extracting patterns. Throughout the process they help data engineers improve the validity of the data by spotting outliers. For example, a broken tracker in the front end may lead to unusual patterns.

“In general, data scientists also have some gut feeling: ‘hmmm… I think this data has got something in it!’. That is something engineers sometimes lack a bit. Therefore, data scientists moving to data engineering are a really valuable asset because they bring these insights with them. I wouldn’t say one is better than the other, but I do think a mutual understanding of both fields is really helpful. Knowing the data engineering principles helps you become a better, and less naive, data scientist. For example, you can give suggestions on how to load and prepare the data optimally or which data is worth collecting.

“So, in short, you can’t have a data science department without a data engineering one, because otherwise you don’t have any data for your analysis. But you also need a data science department to gather proper feedback about the quality of your data.”
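
To make that division of labor concrete, here is a minimal sketch in Python. It is an illustration only, not Studyportals’ actual pipeline: the click counts, the function names and the in-memory “warehouse” are all hypothetical. The engineering half checks that every row that comes in also comes out; the science half flags a suspicious day, such as one caused by a broken front-end tracker, using the median absolute deviation.

```python
from statistics import median

# Hypothetical daily click counts from a front-end tracker (made-up numbers).
RAW_EVENTS = [
    {"day": "Mon", "clicks": "1020"},
    {"day": "Tue", "clicks": "980"},
    {"day": "Wed", "clicks": "1005"},
    {"day": "Thu", "clicks": "12"},  # tracker assumed broken on this day
    {"day": "Fri", "clicks": "995"},
    {"day": "Sat", "clicks": "1010"},
    {"day": "Sun", "clicks": "990"},
]


def extract(raw_rows):
    """Extract: copy rows from the source (an in-memory list stands in for it)."""
    return list(raw_rows)


def transform(rows):
    """Transform: normalise types without dropping any row."""
    return [{"day": r["day"], "clicks": int(r["clicks"])} for r in rows]


def load(rows, warehouse):
    """Load: append rows to the target store and report how many were written."""
    warehouse.extend(rows)
    return len(rows)


def run_etl(raw_rows, warehouse):
    """Data engineering side: whatever comes in must also come out."""
    extracted = extract(raw_rows)
    loaded = load(transform(extracted), warehouse)
    if loaded != len(extracted):
        raise ValueError(f"lost rows: {len(extracted)} in, {loaded} out")
    return loaded


def flag_outliers(rows, threshold=3.5):
    """Data science side: flag days whose modified z-score exceeds the threshold.

    The modified z-score is based on the median absolute deviation (MAD),
    which stays stable even when the outlier itself is extreme.
    """
    values = [r["clicks"] for r in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be scored
    return [
        r["day"]
        for r in rows
        if 0.6745 * abs(r["clicks"] - med) / mad > threshold
    ]


if __name__ == "__main__":
    warehouse = []
    print("rows loaded:", run_etl(RAW_EVENTS, warehouse))
    print("days to report back to engineering:", flag_outliers(warehouse))
```

The threshold of 3.5 is a common rule of thumb rather than a fixed standard, so treat it as a starting point to tune against your own data.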

Both job roles, Data Engineer (left) and Data Scientist (right), require a different set of capabilities (image retrieved from DataCamp). Surprisingly, according to Jesse Anderson, many data scientists currently do data engineering work. The differentiating factor between the two: data scientists pick up programming out of necessity, to accomplish what they couldn’t do otherwise. Their programming and system creation skills typically aren’t at the level of a data engineer, nor should they be.

Leadership in Data

Tara started out as a Data Science Researcher; now she’s leading a team of data engineers. What’s her view on leadership in data? “A manager’s highest priority should be protecting the team from all the noise and distractions. At the same time, you need to inspire the team so that individual dreams become team dreams.”

She goes on to explain how this affects her recruitment policy: “I tend to look a lot at potential in people: I don’t necessarily hire someone with twenty years of working experience. I prefer to work with people whose eyes start shining once they start talking about data. As a leader it’s your job to provide them with opportunities to become a better version of themselves.”

Same story, different versions and all are true

Every few weeks Studyportals organizes a company-wide showcase where all engineering teams give an update on their progress. I asked Tara how she deals with the variety of backgrounds in the audience while presenting there: “The first thing that comes to my mind is: stay away from the details. It’s very important to put yourself in someone else’s shoes. In practice this means I usually present differently to the engineering department than to colleagues in sales and marketing. For a non-technical audience I typically focus on high-level results. For example, this recommendation engine is winning because all metrics are going up. You don’t necessarily need to explain the details, like why, by how much, or whether it is significant or not. Of course, behind the scenes you should check for that, but you don’t need to present it on stage. On the other hand, most data scientists and data engineers really do want to know such details. It makes them excited and they are really good at picking them up.”
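
What might such a behind-the-scenes check look like? The interview doesn’t say which test Tara’s team uses, so the sketch below is only one common option: a two-sided two-proportion z-test comparing the click-through rate of an old and a new recommendation engine. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference between two click-through rates.

    Returns the z statistic and the p-value. Assumes independent samples
    that are large enough for the normal approximation to hold.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both engines perform equally.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Made-up numbers: old engine (A) versus new engine (B).
    z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
    print(f"z = {z:.2f}, p = {p_value:.4f}")
    print("significant at the 5% level" if p_value < 0.05 else "not significant")
```

On stage you would only say that the new engine is winning; the z statistic and p-value stay in your notes for the colleagues who ask.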

Job Hunting Advice

Finally, I asked for her advice for students entering the job market. “Data Science is a bit of a buzzword these days: some companies need simple bar charts but are looking for data scientists. I don’t see that as data science, it’s just basic statistics. Nowadays anyone who brings value with data is a data scientist. But as a candidate I think it’s good to choose your job title wisely, so that it suits your interests best.

“Also realize that a job interview is an opportunity for you to ask questions too, not just for the interviewer. So keep questioning your potential employer until you really understand what job you are going to do, so that you don’t end up in the wrong place at the wrong time.”

Data Science vs Data Engineering: can we conclude that one is better than the other? Nope, we can’t. What we can say, however, is that they are different yet inextricably connected. One cannot live without the other, because it takes both sides to build a bridge...

Which side are you on? Let us know in the comments down below!

--

Roy Klaasse Bos
The Outlier by Pattern

Senior Product Analyst @ bol | Formerly Microsoft, Volkskrant, Studyportals