How should our company structure our data team?

David Murray
Published in Super.com
12 min read · Oct 22, 2020

Real-life learnings from five data team iterations: centralized, embedded, full-stack, pods and business domains.

Three years ago, the data team at Snaptravel started with a software-engineer convert interested in helping the company make more data-driven decisions. At the time, the company’s mantra was to move fast, so refined decision-making with data took a back seat to shipping new features. The structure of our data team (one software engineer who periodically had to build front-end applications) was perfectly aligned with the size and needs of a young seed-funded startup.

Since 2017, our needs have changed. We’ve grown to process $1MM per day in sales, so our decisions now carry far more financial leverage, and supporting them takes more than a part-time software engineer. Three years later, we’re a team of eleven people with varying specialties combining analysis, engineering, and data science, who act as a centre of excellence in data and analytics for the rest of the org. And yes, the visionary software engineer who started it all, Nehil Jain, is still here!

As we’ve grown, we’ve optimized our team’s organizational structure to reduce communication overhead while maximizing context in two areas: between the various skill sets on the data team, and between our team members and the rest of the organization. We’ve adopted numerous frameworks along the way, which we describe below. We’ve also made lots of mistakes while we tried to scale: pervasive meetings, too many decision makers in those meetings, and different people coding the same metric in different ways.

Chapter One — Chaos (mostly)

Chaos is probably a hyperbolic term, but anyone who’s worked at a seed-stage startup knows what we mean. Snaptravel began with two key verticals: engineering and business. Nehil, the protagonist of our evolution, worked as a data engineer on the engineering side. Our business side used data to make its decisions, but that work was disparate and irreproducible (done in Google Sheets). Three years later, being data driven is one of our six company values.

Before we go further, we should differentiate between two typical industry models for data teams: embedded vs. centralized. The former means data analysts (DAs) work closely with the business alongside account managers, operations managers, growth marketers or finance analysts. They go to similar meetings and report to the same heads of department. The centralized model means all the data people sit beside each other and operate like consultants to the various teams (their clients). Both models have pros and cons that are worth weighing when deciding which one works for your company. Snaptravel didn’t really consider them and blindly opted for the embedded model. Chaos, remember?

Centralized models have data engineers (DEs) and data analysts (DAs) reporting to data leaders, who work with heads of business units in a consultant-client relationship.
Embedded models centralize the technical-heavy data engineers, but have analysts report directly to heads of business units.

Series A fundraising meant the launch of our Growth team, and the beginning of embedded DAs. Their purview was to tackle the heavy-lifting data projects that come with digital analytics, and to explore how our marketing affects CAC, LTV, ROAS and GMV. Our supply and operations team also needed heavy lifting to be more efficient with our hotel rates and our customer success agents, so another DA was added.

Having more analysts was good for the company, but gave Nehil a big headache. How could the data engineering team make sure all the required data sources were regularly ingested, and the data was verified, reproducible, and consistent across teams? This was one of many problems facing the team at the time:

  • How can we standardize our definitions for revenue when it can be measured in 10 different ways? Is it gross of cancellations or net? What time zone?
  • How do we ensure our data is discoverable and understandable for all our internal partners?
  • How do we make sure there is ownership over data sources when pipelines or models break?
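To illustrate the first question above, here is a minimal sketch in Python. The `Booking` record, its fields, and the `daily_revenue` function are hypothetical names invented for this example, not Snaptravel's actual schema or code. It only shows how the same two bookings can produce four different "revenue" numbers depending on the cancellation rule and the time zone used for the day boundary.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone, tzinfo


@dataclass(frozen=True)
class Booking:
    # Hypothetical fields for illustration; not Snaptravel's actual schema.
    booked_at: datetime  # timezone-aware timestamp
    amount: float
    cancelled: bool


def daily_revenue(
    bookings: list[Booking],
    day: date,
    tz: tzinfo = timezone.utc,
    net_of_cancellations: bool = True,
) -> float:
    """Sum booking amounts whose local booking date (in tz) equals day."""
    total = 0.0
    for booking in bookings:
        if net_of_cancellations and booking.cancelled:
            continue  # net revenue ignores cancelled bookings
        if booking.booked_at.astimezone(tz).date() == day:
            total += booking.amount
    return total


# Two sample bookings, both on 2020-03-02 when measured in UTC.
bookings = [
    Booking(datetime(2020, 3, 2, 3, 0, tzinfo=timezone.utc), 100.0, False),
    Booking(datetime(2020, 3, 2, 12, 0, tzinfo=timezone.utc), 50.0, True),
]
utc_minus_5 = timezone(timedelta(hours=-5))  # fixed offset, assumed for the demo
day = date(2020, 3, 2)

print(daily_revenue(bookings, day))                              # 100.0
print(daily_revenue(bookings, day, net_of_cancellations=False))  # 150.0
print(daily_revenue(bookings, day, tz=utc_minus_5))              # 0.0
print(daily_revenue(bookings, day, tz=utc_minus_5,
                    net_of_cancellations=False))                 # 50.0
```

Putting the cancellation rule and the time zone into one shared function with explicit defaults is one way to keep two analysts from silently reporting different numbers for the same day.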

The problem was partly one of tooling and partly one of process: we needed a tool that helped us keep track of all our data models (representations of data, commonly as a SQL VIEW or TABLE) and also allowed our DAs to work more closely together to keep our practices consistent. Nehil vacationed in London, England and stumbled upon the perfect tooling solution, dbt, while reading a blog post on a crowded airplane. To fix the process, we decided to centralize the DAs onto one team. And switch to agile.

Key Takeaways

  • We were evolving our analytics team at a rapid pace and lacked consistency across the organization
  • There were disconnected interpretations of the data that worked on a small team in an embedded model, but became a problem as we scaled

What is dbt?

Data build tool (dbt) is a tool that allows DAs to adopt software engineering best practices in how they manage their data. The general principle is that data is ingested into the warehouse in its raw form and SQL is used with version control, testing and metadata (data dictionaries) to manage the data that lives in production. It incorporates the benefits of functional programming to ensure reproducibility and simplicity.
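dbt itself works with SQL models, so the following is not dbt code. It is a minimal Python sketch of the principle described above, with hypothetical names throughout (`RAW_BOOKINGS`, `COLUMN_DOCS`, `bookings_model`, `check_model`): raw data is left untouched, a model is a pure transformation of it, and the result is checked by tests (loosely in the spirit of dbt's `unique` and `not_null` tests) and documented in a data dictionary.

```python
# Raw rows as they might land in the warehouse (hypothetical columns).
RAW_BOOKINGS = [
    {"id": 1, "amount_cents": 12500, "status": "confirmed"},
    {"id": 2, "amount_cents": 8000, "status": "cancelled"},
    {"id": 3, "amount_cents": 4300, "status": "confirmed"},
]

# A small data dictionary documenting every column the model exposes.
COLUMN_DOCS = {
    "booking_id": "Unique identifier of the booking",
    "amount": "Booking value in dollars",
    "is_cancelled": "True if the booking was cancelled",
}


def bookings_model(raw_rows):
    """Pure transformation: the same raw input always gives the same output."""
    return [
        {
            "booking_id": row["id"],
            "amount": row["amount_cents"] / 100,
            "is_cancelled": row["status"] == "cancelled",
        }
        for row in raw_rows
    ]


def check_model(rows, key, column_docs):
    """Return a list of failed checks; an empty list means the model passes."""
    failures = []
    keys = [row[key] for row in rows]
    if any(k is None for k in keys):
        failures.append(f"{key} contains nulls")
    if len(set(keys)) != len(keys):
        failures.append(f"{key} is not unique")
    for row in rows:
        undocumented = set(row) - set(column_docs)
        if undocumented:
            failures.append(f"undocumented columns: {sorted(undocumented)}")
            break  # one report is enough
    return failures


model = bookings_model(RAW_BOOKINGS)
print(check_model(model, "booking_id", COLUMN_DOCS))  # []
```

Because the transformation never mutates the raw rows, rebuilding the model from scratch gives the same result every time, which is the reproducibility benefit mentioned above.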

Chapter Two — Centralize the Analysts

In hindsight, the key tradeoffs between an embedded structure and a centralized structure can be condensed into the following:

Major Trade-offs of the Centralized vs. Embedded Models

When setting quarterly priorities, company alignment is preferred when there are significant differences in the leverage of various teams’ projects. Team alignment puts control of data in the hands of the business units, which enhances ownership. Knowledge sharing is more technical in a centralized model because DAs are in the same stand-ups and help each other through problems, at the expense of developing a deep understanding of the business context.

One potential drawback of a centralized model, taken from Bob Iger’s The Ride of a Lifetime, is the morale degradation when control over data is taken from heads of business units. At a small company like ours, it has worked because the members of our data team are not only coworkers of the heads of business but also their friends. At a larger company, or when/if Snaptravel becomes looser-knit, this may be a much larger issue.

We needed company-level prioritization because we were hyper-focused on growing the business, and we fuelled that growth with data insights. We were also spending a lot of time refactoring SQL views from multiple schemas into our dbt ecosystem, which required lots of technical communication around naming conventions, best practices for code syntax and metric standardization.

Together, fixing our tooling with dbt and our business processes with a centralized team enabled our DAs to move quickly along with our data engineers (DEs) to reshape our data infrastructure. At that point, our engineering team was focused on meeting data service-level agreements (SLAs), continuing to reduce tech debt that had been accrued, and supporting DAs in adopting software engineering best practices (tests, code review, SQL optimization). The analytics lead pushed back against these initiatives — why should we waste our time on that stuff when the biggest revenue driver is to continue tuning algorithms to connect demand and supply?

This misalignment in priorities caused lots of problems: data engineering was focused on building the long-term data health of the company, while the DAs were pursuing short-term revenue at the cost of the company’s long-term data integrity. Our solution was to take the best of both teams and combine them into one full-stack data team.

Key Takeaways

  • A centralized analyst model allowed us to share our technical knowledge between team members
  • There was a disconnect between data extraction and loading (done by DEs) and transformation and analysis (done by DAs)
  • We needed the priorities of the DEs and DAs to align with company-level priorities, so we merged the teams

Chapter Three — Full Stack Team

Our newly branded data analytics team merged four DEs with four DAs, and also merged each team’s objectives for the quarter into one set. The two skill sets were already working closely together, but the new team formalized the relationship and improved knowledge sharing by putting everyone in the same meetings. DEs met with DAs and business users at the same time. The merge allowed us to prioritize the right things, balancing long-term infrastructure health with our growth targets. Our business had grown enough that there were distinct areas of the business that needed support, with some overlap of key data like hotel bookings information.

The merge contrasted two fundamental ways to structure a data team, borrowed from multiple HBR articles on the topic: functional vs. divisional org structures. In an embedded model, our analysts were split divisionally and our engineers were grouped functionally. After we centralized the analysts, but before we merged with data engineers, both groups were divided functionally. After the merge, there were no formal separations of responsibilities: our combined team dealt with all data tasks for all divisions in the organization.

Engineers were on the same team, but analysts were distributed and reported up to the head of each business unit.
Engineers were on the same team and analysts were combined into one analytics team, but there was limited communication between the two.
Engineers and analysts were on one large team, though with two leaders: the Analytics Lead and the Engineering Lead.

The change from a divisional, to a functional, to a holistic model happened over a timespan of 9 months. If we did it again, we would skip the functional model in step two and go straight to full stack. We’d do this for two reasons: there was confusion internally (who reports to who?) and externally (who do I ask about our revenue targets?).

In the new structure, DEs and DAs worked together more closely, which was a major benefit. Everyone was aligned that we needed to pursue high-value revenue initiatives in the short term without accruing technical debt. The merger helped because the DEs could level up the DAs’ best practices through close communication while both groups prioritized revenue initiatives. Knowledge sharing for best practices happened naturally around the team desk and during standup.

Key Takeaways

  • A holistic full-stack approach to supporting our business stakeholders allowed us to keep everyone in the loop as we scaled
  • Changing our team structure so frequently disrupted manager relationships and created instability for team members, which was a big drawback

Chapter Four — Pods

By this point (March 2020), Snaptravel had exceeded growth targets for 4+ quarters, and we continued to scale our data team. Our one-team approach worked with eight people, but as we scaled to 12 our meetings became pervasive and irrelevant for many, and they crippled our productivity. Many agile rituals became wastes of time for the majority of team members when only one or two people were required for a decision. It was good to share knowledge, but our large team removed all specialization.

Our solution was to create multiple pods, each of which owned full-stack problems in a given area of the business. For example, one pod worked on a BI tool migration as well as data infrastructure. The entire pod (DEs + DAs) worked on the same problems and made decisions together.

This worked because it meant meetings were more specific to the problems being worked on, pods had full stack visibility into all the tasks required for our BI migration, and the priority of tasks being worked on within pods was aligned with the company priorities.

Large team was split into three teams, each of which worked on self-contained projects in the organization. Analytics and Engineering leads co-managed all pods.

The pod structure ultimately did not work, for one reason: too many cooks in the kitchen. Shared ownership meant there was an abundance of opinions from people who had a stake in the outcome of a project. In some cases, four people on a six-person pod were all trying to make decisions, including cross-pod managers. Snaptravel has grown, but not to the point where all our decisions are so high leverage that they require that degree of discourse. In some areas of the business, it’s preferable to move fast instead of deliberating.

Key Takeaways

  • Separating our large team into separate pods for each area of the business aligned our priorities and kept communication relevant
  • Unfortunately, there was no clear ownership over objectives in the full-stack pod structure and it slowed down our team’s progress

Chapter Five — One Team, Many Domains

The last change to our team structure, affectionately named the Domain Structure, has emerged as the one we prefer in our current state, as well as the structure most likely to scale. The structure denotes one senior member of the team as the ‘domain lead’ of a given area of the business (Growth, Data Infrastructure, Finance). That person (and their manager) ‘owns’ the domain. Depending on the size of the domain, or the priority for that quarter, other members of the data team (DEs + DAs) are volunteered as ‘Contributors’ to that domain, meaning they work as individual contributors supporting the work of the domain lead.

Projects are owned by domain leaders, who are accountable for liaising with heads of business and dividing work within their domain. Domain leads can be contributors in other areas of the business.

The above format works because it takes all the benefits of the pod structure (full transparency in meetings, alignment with company priorities, relevant communication only) and adds in the layer of ownership that was missing. The domain lead’s performance evaluation is directly related to the success of the domain that they lead and their contributions to other domains. In meetings, team members fall into one of two defined categories: active or passive contributors. Active contributors influence the decision making and are held accountable for outcomes. Passive contributors are involved in the meeting because it affects their domain, but their role is to gain context and to provide feedback on how the decisions being made affect their domain. They are not responsible for outcomes.

In getting to this structure, we considered two correlated tradeoffs: knowledge sharing and ownership. Knowledge sharing was the degree to which a role should be working closely with other people and have shared context. Ownership was the degree to which a role should be able to make decisions in a silo if desired, and be held accountable for outcomes. We split the types of roles on the team into management, analysts, and engineers and decided the ideal degree of knowledge sharing and ownership that each role should have in the future structure.

Proposed Future State and Implications for Ownership and Skill-set Focus

Analysts should have the deepest knowledge of their domains because business context was so important. They should also have a high degree of ownership over their decision making because constantly being questioned slowed their output and moving fast was important. Analysts were also held accountable for the efficacy of our production models. Building a good data engineering stack meant shared knowledge and the ability of all DEs to step in and commit code to various repositories whenever required. Managers’ jobs were to share knowledge throughout the company and between domains, so they should have the broadest knowledge.

Whenever more analytics work is required, we scale the domain structure by adding contributors or, in the case of launching flights or ticketing, creating new domains. Analysts and engineers grow as individual contributors by building context in more domains, or as managers by becoming domain leaders.

Key Takeaways

  • The shift to a domain structure gave ownership over business outcomes while optimizing skill specialization, prioritization and knowledge share
  • DAs, DEs and managers all have separate responsibilities and the domain structure reflects that diversity in roles
  • Scaling the domain structure has been flexible with little disruption to org charts or areas of focus

Conclusion

We learned a lot in the time it took to find a data team structure that worked for us. We interviewed other tech startups and soon discovered that many teams struggled with the same problems we did. We made organizational changes that disrupted our team’s productivity. We had problems with communication within our team, and externally with other company stakeholders. The new domain-based structure fixes those problems. Ownership over key company priorities is defined and scalable. We can be flexible between quarters to focus on areas of importance to the company. People are (mostly) only in meetings that require their attention or are relevant to their work. We effectively share knowledge, both technical and business-related, as we grow our team. It’s a structure that has enabled us to make the difficult data-team transition from reactive to proactive, working as thought partners with stakeholders instead of as order-takers. It’s been a long road, but well worth the effort as we aim to grow in the coming years.

Interested in chatting further? We’re hiring.

Snaptravel Careers

Data Engineer

Data Engineering Lead

David Murray
Super.com

David has worked in analytics for 5+ years and helped to grow two analytics teams from their infancy, most recently with Snaptravel.