Cultivate good working relationships with data consumers
The relationship between Data Engineering and the data consumers (be they data scientists, BI, or one of a multitude of analytics teams) is always complex. All of these functions exist to serve the overall data-driven goals of the organization and are expected to integrate seamlessly. So there is clear motivation to cooperate, but more often than not the division of labor is far from balanced, a situation that can develop into real tension between the teams.
There is no recipe for creating a perfect symbiotic relationship, and this is doubly true given how much variation there is in the structure of data teams across different organizations. That said, the following points can serve as indicators for the data engineer of where the major pitfalls are and what direction to aim for:
- Avoid the temptation of letting data consumers solve data engineering problems: There are many types of data consumers, and the core competencies of each individual vary across multiple spectrums (coding skills, statistical knowledge, visualization abilities, etc.). In many cases the more technically capable ones will attempt to close infrastructure gaps themselves by applying ad hoc fixes. This can take the form of applying additional data transformations to a pipeline that isn’t serving its purpose, or even of actual infrastructure design. Superficially it may appear to the data engineer as a win-win situation: their own time is saved and the consumer’s work proceeds unhindered. However, this usually results in convoluted layers of suboptimal solutions that make the organization’s data infrastructure increasingly hard to manage.
- Avoid applying engineering sensibilities to data consumers: As fully trained developers, data engineers usually follow contemporary programming best practices. These entail a stringent coding style and a focus on efficiency and unit-test coverage. Some organizations opt for very close-knit data teams or even unify all of the functions inside a single “data ops” team. In such cases, data engineers should adjust their expectations of data consumers who work intensively with code but do not follow those best practices. This is usually not a matter of ignorance on their part, but rather of adherence to their primary business function, which requires that they prioritize other things.
- Put a premium on knowing what data consumers actually do: Data consumers rely on data infrastructure to do their respective jobs. Their level of comfort, productivity and adoption depends on the fit between that infrastructure and the dynamics of their work. Data engineers are tasked with developing this infrastructure from conception to implementation, and the actual day-to-day needs of the respective consumers are therefore critical context. Getting a clear read on those needs usually takes both time and effort, be it in the form of shadowing sessions, iterative POCs, or both low- and high-level ideation discussions. The increase in professional familiarity between the teams also leads to an increase in mutual respect and amiability, and that in itself is a powerful driver of success.