Two weeks in the life of a Data Scientist

Tech@ProSiebenSat.1
ProSiebenSat.1 Tech Blog
Aug 31, 2022

Our interns Estelle and Ha Vu share what a typical sprint in our AI Products department looks like

by Estelle Weinstock and Ha Vu

Over the past few months, we’ve had the opportunity to work at P7S1 as Data Science interns in the AI Products department, where we learned a lot about the tasks and responsibilities of a Data Scientist.

Various tasks around Data Science

Our department’s mission is to be a driving force for P7S1 by championing the adoption of AI and data-driven thinking. We build innovative products that create value, unlock business opportunities, and empower our colleagues. By combining data science and software development expertise, we develop AI products based on the main data source areas of performance/usage data and content data.

With usage data from TV formats and advertising, the potential for various products is enormous. Among other things, this includes predicting the market shares of TV shows to help our program planners make decisions, or recommending the best ad blocks for our customers to reach their target groups.

Countless videos, texts, images, sounds, and other forms of content data are available for any kind of analysis you might think of. This data is transformed into useful information about the content using state-of-the-art machine learning models and deep neural networks. Examples of applications that you have probably already encountered while using media are subtitle generation and thumbnail selection. Automating such tasks frees up valuable time for our colleagues.

Two weeks in a P7S1 project team

To show you what our work looks like, let us take you through a typical two-week timeframe in our project team. You may ask yourself: “Why two weeks?” This is because we use Scrum as our agile software development method, and its two-week timeframes are called sprints. If you want to learn more about it, take a look here.

Working in sprints

Every sprint usually starts with sprint planning, where tasks from the backlog are prepared so that we developers can just grab and execute them. We estimate the complexity of each task and aim to put as many tasks into the sprint as we have the capacity for. We also set a common sprint goal that summarizes the individual tasks.
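To make the capacity idea concrete, here is a minimal Python sketch of how a sprint could be filled from a prioritized backlog. It is only an illustration: the `Task` class, the use of story points as the complexity estimate, the task titles, and the capacity of 12 are all assumptions for this example, not a description of our actual planning tooling.

```python
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    story_points: int  # hypothetical complexity estimate


def plan_sprint(backlog, capacity):
    """Walk the backlog in priority order and pick every task that still fits."""
    selected = []
    remaining = capacity
    for task in backlog:
        if task.story_points <= remaining:
            selected.append(task)
            remaining -= task.story_points
    return selected, remaining


# Hypothetical backlog, already sorted by priority.
backlog = [
    Task("Refactor feature pipeline", 5),
    Task("Set up model monitoring", 8),
    Task("Tune market share model", 3),
    Task("Write API documentation", 2),
]

sprint, points_left = plan_sprint(backlog, capacity=12)
for task in sprint:
    print(f"{task.story_points:>2}  {task.title}")
print("Unused capacity:", points_left)
```

In practice the selection is a team discussion rather than a greedy loop, but the sketch shows why estimates matter: a task that is too big for the remaining capacity stays in the backlog.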

The tasks within the sprint mainly consist of Machine Learning, Data Engineering, and DevOps, which can be summarized under the term MLOps (see figure). While Machine Learning and Data Engineering are no foreign concepts to us as Math and Data Science students, we did not encounter many DevOps tasks during our studies. Because DevOps combines software development and IT operations, it is a critical component of product development. The concepts it entails, such as security, speed, stability, and scale, play a much bigger role when developing products for production than when setting up separate models in scattered notebooks, as is common practice at universities.

Diverse requirements for the optimal solution

The requirement that the product be continuously available also makes a well-tuned environment necessary, which usually means working with cloud-based systems. Examples of such tasks are refactoring a module so that it handles large amounts of data more efficiently, or setting up a monitoring system to ensure the quality of the model over time.
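As a rough idea of what monitoring model quality over time can mean, the following Python sketch tracks the mean absolute error of the most recent predictions and flags the model once that error exceeds a threshold. The class name `QualityMonitor`, the window size, the threshold, and the numbers are assumptions chosen for illustration; a production setup would typically rely on a dedicated monitoring service.

```python
from collections import deque


class QualityMonitor:
    """Tracks the mean absolute error over a sliding window of predictions."""

    def __init__(self, window_size, max_mean_abs_error):
        # deque with maxlen drops the oldest error automatically
        self.errors = deque(maxlen=window_size)
        self.max_mean_abs_error = max_mean_abs_error

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def mean_abs_error(self):
        if not self.errors:
            return 0.0
        return sum(self.errors) / len(self.errors)

    def is_degraded(self):
        # Only judge the model once the window is completely filled.
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.mean_abs_error() > self.max_mean_abs_error


# Hypothetical predicted vs. actual values.
monitor = QualityMonitor(window_size=3, max_mean_abs_error=1.0)
monitor.record(predicted=10.0, actual=10.5)
monitor.record(predicted=12.0, actual=11.0)
early_alert = monitor.is_degraded()   # window not full yet
monitor.record(predicted=9.0, actual=12.0)
late_alert = monitor.is_degraded()    # window full, error above threshold
print(early_alert, late_alert, monitor.mean_abs_error())
```

The sliding window keeps the check focused on recent behavior, so a model that was accurate months ago but has drifted since still triggers an alert.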

Figure: MLOps Venn Diagram

In daily standup meetings, we then take a joint look at our Jira backlog board, which contains this sprint’s tasks. We also update each other on yesterday’s progress and today’s challenges. To overcome those challenges, we can arrange pair programming sessions and work on them together. Particularly interesting solutions and other ideas worth sharing are presented and discussed with the entire team in occasional operational meetings.

After we have met the acceptance criteria for a task with our code, other team members review our work to ensure it meets all set quality standards. Also, we get feedback on our coding style to continuously improve our coding skills. Once our code is approved, we can merge it into the main project branch and grab the next ticket to work on.

The final sprint

Towards the end of the sprint, the team meets to start thinking about possible tasks for the next sprint. These meetings are called “Backlog Refinement”. Finally, the review and retrospective meetings end the sprint: we present our work to the stakeholders, reflect on the past two weeks, and discuss how we can improve our way of working within the team.

But the sprints do not only consist of working on the backlog tasks. We also take the time to socialize and learn. Dedicated knowledge-sharing sessions are held regularly to spread gained knowledge across all our project teams. And at the end of a sprint, we find ourselves sitting together in one of Munich’s beautiful beer gardens and celebrating the last sprint’s achievements.

Our takeaways

“We can say that during our internships, we got a good insight into the life of a Data Scientist. Our main takeaway is that Data Science in a product team doesn’t stop at Machine Learning.”

Estelle and Ha

Depending on the skillset of the team, you may not spend as much time on pure model training and data analysis as expected. Software development and documentation are also important tasks that are needed to turn an AI model into a successful product. This makes the job even more challenging and interesting for us. The range of tasks is so broad that no two days are the same, and we learn something new every day.
