Data Science and Engineering: Two pieces of a puzzle.
Is it correct to say that a data science product succeeds because of the statistics and machine learning algorithms used by data scientists? Yes, but only partly. The other significant part is the team of engineers who develop and maintain the scalable infrastructure for the end-to-end workflow. This underscores the difference in the nature of the work, and the significance, of the two types of team members: data scientists and engineers (here meaning both software engineers and data engineers). Both deal with data, but they differ drastically in skills, responsibilities, tools, goals, and the language they speak. And yet they come together like two pieces of a puzzle that complete each other. This unusual mix of skills brings challenges of its own.
I would like to share our experience of working with data scientists and engineers while developing a data science product, along with a few practices that proved instrumental in making it successful.
The first and most obvious challenge we faced was collaboration between the data scientists and the engineers. At the outset, the two teams, data science and engineering, simply handed requirements to each other and consumed the output. They were two teams working in silos on the same product. Scoping a new feature, dividing tasks among the team, or resolving an issue started to become a nightmare. Because neither team understood what the other did, communication between them was difficult. To resolve this, we began with knowledge transfer sessions from the data science team to the engineering team, to help the engineers better understand the lifecycle and requirements of the product, and vice versa.
Having a data scientist with prior experience in software development was a boon for our team. As noted earlier, the two teams speak different languages, which creates misunderstanding even over basic terms such as ‘feature’: to a data scientist it usually means an input variable for a model, while to an engineer it usually means a piece of product functionality. This data scientist helped the two teams translate requirements and dependencies. We leveraged his skills to ensure that the software engineers understood the requirements of a data science product, and that the data scientists knew how to use the infrastructure built by the software engineers to transform raw data into recommendations. Once each team started to understand the other half of the product, it became easier to collaborate and resolve problems. In a continued effort to close the gap, the teams made a point of sharing how each issue had been resolved. It was quite a journey to move from a stakeholder mindset to a one-team mindset, and it united the two strong pillars of the product.
We did not realize it then, and at the time it felt like we were having a lot of meetings. In hindsight, it really helped to have everyone (product manager, engineers, analyst, and data scientists) in the same room from the beginning. It built empathy, which eased the collaboration issues. Team meetings were the crucial time when both teams aligned their work into one product. Each team needed to understand the progress of the other's tasks in order to function as one team following the agile methodology of software development. This helped us plan better, but we were still missing something.
We knew it, but it took us some time to acknowledge that we could not keep treating a data science workflow like a software development workflow. Compared with software development, data science work is experimental, iterative, and exploratory. It is rightly said that you do not know how long a data science task will take until you do it. This led us to tag each task as either delivery or discovery, fine-tuning the agile framework to suit our use case. Delivery items were easier to estimate, while we accepted that discovery items could take an uncertain amount of time. We started being mindful of this while planning and handling dependencies. Rome was not built in a day: with time and experience, the estimates got better and the situation improved. The team developed a better sense of the buffer needed to align the work of the two teams. The scrum ceremonies of the agile framework proved to be a very good tool for increasing the teams' efficiency.
Having two types of team members with different skill sets has its advantages too. The team can draw on the best of both worlds, and each side has best practices to absorb from the other. The engineers helped the data scientists adopt unit testing, branching methods, releases, deployment, and the list goes on. The learning is continuous and ongoing. It has made the product more robust, secure, and scalable. The team also puts a lot of focus on optimizing its workflows.
In our journey of building this product so far, we have had a lot of lows, and we have learned from them and come out stronger. The practices described above are a few of those that enabled us to write our success story.