From Chaos to Collaboration: 5 “Do’s and Don’ts” for Data Engineers Working in Teams

Eden Bar-Tov
Published in Wix Engineering
6 min read · May 8, 2023
Image by Midjourney

Data Engineering is often a solitary role: the Data Engineer works with various stakeholders but is the only engineer on the team. But what happens when there are a few Data Engineers working together on adjacent projects?

At Wix, each business group, referred to as a “company” internally, has a dedicated data team with a few Data Analysts and one or more Data Engineers. With this structure, effective collaboration becomes essential for data engineers working in teams.

For the last two and a half years, I have had the privilege of working in a team with three amazing Data Engineers. In this article, I’ll explore some tips to get your team to the next level, including sharing design plans and alternatives before implementing, conducting quality code reviews, investing in data quality, writing generic code, and allocating tasks with growth and interests in mind.

1. Share Design Plans and Alternatives Before Implementing

Sharing your plans before you start implementing can be a huge time saver for the following reasons:

  • One of your teammates might have faced a similar problem and can share from their experience.
  • An issue that you hadn’t considered might surface early, so you can plan for it in advance.
  • You can avoid redundancy by consolidating tasks (e.g. “oh, we both need to transform this table before our process, so let’s do it once and share it between our processes”).
  • Making sure that everyone is on the same page can facilitate the code review process, which we will address later.

There are a lot of ways to achieve effective collaboration among team members, such as creating a design document that everyone can comment on:

Pros: Everything is written down, so all the alternatives, debates, and considerations can be referenced later.

Cons: It probably won’t be done for smaller projects that could also benefit from the team’s input.

Another method (which is what our team opted for) is to have a weekly meeting, where we share our progress and talk about what we will work on soon and what our current plan for it is.

Pros: Much less formal and friendlier. It can be face-to-face or via Zoom, and it can cover anything from a small problem to a giant project.

Cons: If not documented properly, a lot of the “why we chose this” will be lost.

2. Invest in Quality Code Review

When it comes to pushing changes or new code into production, it might be tempting to just push your code without review or, at the other extreme, to do a rushed, shallow review. But the benefits of a proper review are massive for both the reviewer and the reviewee:

  • It allows team members to review and assess the quality of each other’s code, identify potential bugs or issues that might have been missed while developing, and ensure that the code is readable (which can save a lot of headaches in the future when this code needs to be maintained).
  • A code review also helps to promote knowledge sharing among team members and provides opportunities to learn from each other and improve coding skills (I had a lot of moments when I thought “ah, I did not know I could do that in Python, good to know”).
  • Lastly, it gives the entire team a better understanding of this process, which will come in handy next time the developer of this project is on vacation and there is a need for a fix or change.

When requesting a code review, it helps to write a few sentences about what the change is about and which places the reviewer should focus on (e.g. “I’m not sure if my implementation for the method `func` is the most ideal, what do you think?”).

When giving a review, be respectful and constructive in your feedback, focusing on the code rather than the individual who wrote it. Encouraging open communication and discussion allows team members to ask questions and provide explanations for their coding choices. Following up on code review feedback, either by incorporating changes or explaining why a suggested change may not be appropriate, is also crucial.

Image by Freepik

Some tips for the reviewer:

  1. Use “We” instead of “You”: make reviews less personal and more professional by focusing on the team instead of the individual.
  2. Stay positive and down to earth: Never say “this is wrong”. Instead, present an edge case that might not have been considered, ask questions that lead to a more suitable solution, and phrase suggestions as questions rather than statements (e.g. “Maybe this method could live in a separate utils module rather than in this module, WDYT?”).
  3. Compliment good code: It is always nice to get a good word for your work. If you see something worth praising, don’t hold back (a fire emoji also works here)!
  4. Share links and documentation: Don’t give away all the solutions up front. Open the door for learning and let the solution come from the author of the code. It might be tempting to just write the full implementation as you see it in your mind, but this might harm the growth of your teammate and also prevent them from creating their own solution on top of the materials you pointed them to.

3. Data Quality

No Data Engineering post is complete without talking about data quality.

Investing in quality tests can save a lot of time and frustration downstream and provide confidence in the data.
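To make this concrete, here is a minimal sketch of what such a test could look like in plain Python. The function name `check_quality`, the row format (a list of dicts), and the two rules it checks (no nulls in required columns, no duplicate keys) are assumptions chosen for illustration, not a description of any specific framework or of the checks our team runs.

```python
from collections import Counter


def check_quality(rows, key, required):
    """Return a list of human-readable problems found in `rows`.

    rows: list of dicts, one per record (hypothetical input format)
    key: name of the column that is expected to be unique
    required: names of the columns that must not be None
    """
    problems = []
    if not rows:
        problems.append("table is empty")
        return problems

    # Rule 1: required columns must not contain nulls.
    for column in required:
        null_count = sum(1 for row in rows if row.get(column) is None)
        if null_count:
            problems.append(f"{column}: {null_count} null value(s)")

    # Rule 2: the key column must be unique.
    counts = Counter(row.get(key) for row in rows)
    duplicates = sorted(str(value) for value, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"{key}: duplicate value(s) {', '.join(duplicates)}")

    return problems
```

A pipeline could call a check like this right after loading a table and fail (or send an alert) when the returned list is not empty, so bad data is caught before it reaches downstream consumers.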

My colleague Itai Sevitt wrote a great article on the subject, “Mastering Data Quality: 6 Considerations for Success”. It’s well worth a read!

I also hope to write another article about unit testing for data engineers, so stay tuned 🤩

4. Writing Generic Code for Reusability

When writing data pipelines, there is a lot of shared logic. Some can be implemented as methods in Python, while others can be expressed as business logic in SQL. There is no need to reinvent the wheel in each new pipeline.

This alignment and code reuse can be achieved by having a shared Utils file that holds the commonly used constants (e.g. Slack channel, environment variable) and commonly used methods (e.g. create a data quality task, create a generic filter to insert into a query).

This approach both saves a lot of time and keeps everyone aligned on the correct methods and filters that are needed.
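As a rough illustration, a shared utils module along these lines might look like the sketch below. Everything in it is hypothetical: the constant names and values, the `PIPELINE_ENV` environment variable, and the helpers `date_range_filter` and `build_query` are invented for this example and are not taken from our actual codebase.

```python
"""utils.py: a hypothetical shared module that every pipeline imports."""
import os
from datetime import date

# Commonly used constants (names and values are illustrative only).
ALERTS_SLACK_CHANNEL = "#data-alerts"
ENV = os.environ.get("PIPELINE_ENV", "dev")


def date_range_filter(column, start, end):
    """Build a SQL predicate selecting rows with start <= column < end.

    `start` and `end` are datetime.date objects, so only well-formed
    dates can end up inside the generated SQL text.
    """
    if not column.isidentifier():
        raise ValueError(f"unexpected column name: {column!r}")
    return (
        f"{column} >= DATE '{start.isoformat()}' "
        f"AND {column} < DATE '{end.isoformat()}'"
    )


def build_query(table, filters):
    """Compose a SELECT over `table` from reusable filter snippets."""
    where = " AND ".join(f"({f})" for f in filters) or "TRUE"
    return f"SELECT * FROM {table} WHERE {where}"
```

Each pipeline can then build its queries from the same snippets, for example `build_query("events", [date_range_filter("event_date", date(2023, 5, 1), date(2023, 5, 8))])`, so a fix to a shared filter reaches every pipeline that uses it.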

Image by Freepik

5. Allocate Tasks with Growth and Interests in Mind

When allocating tasks to team members, it’s essential to consider each person’s growth and interests. This is not only the job of the team lead. This is something that each of the members of the team should think about as well — own your growth.

Assigning tasks that align with an individual’s career goals and passions can increase motivation, engagement, and overall job satisfaction. Allocating tasks with individual strengths and interests in mind also helps create a more well-rounded team in which members learn from each other and grow their skill sets, which in turn contributes to the project’s success.

So, when discussing which teammate should take the next task, also think about who would benefit most from it.

Image by Midjourney

In conclusion, to be a successful data engineering team you need to “play as a team”, invest in each other, invest in your shared resources and invest in the quality of all your code.

Some of the points, such as sharing designs and reviewing code, might seem to harm velocity in the short term, but in our experience they boost productivity in the long term.

By sharing design plans and alternatives before implementing, conducting code reviews, writing generic code, and allocating tasks with growth and interests in mind, teams can work together more effectively, create higher quality projects, and achieve greater success.

Eden Bar-Tov is a Data Engineer at Wix, fascinated by big data and the business it represents.