Decentralize Data with Confidence: How Amazon’s Successful Bar Raiser Program Applies to Safely Adopting Data Domains

Read Maloney
Data, Analytics & AI with Dremio
5 min read · Sep 23, 2024

In today’s data-driven world, central teams often become bottlenecks, unintentionally stifling innovation. These bottlenecks are critical to resolve in the AI era, as competitive advantage is increasingly based on how quickly organizations can bring changes and new ideas to market. The solution? Decentralization. But decentralizing core operations, especially data, is risky without the appropriate governance controls. How do you ensure that governance, quality, and security remain intact when ownership spreads across different business units?

In my time at Amazon, I served as a Bar Raiser — a unique program designed to keep hiring standards high as the company scaled. The recruiting team decentralized hiring decisions by empowering trained employees to maintain quality across the organization. Now 15 years into my career as a data-driven leader, I see how these principles can be applied beyond hiring — specifically to the world of data, analytics, and AI.

What is Amazon’s Bar Raiser Program?

Amazon’s Bar Raiser program is one of Amazon’s most effective mechanisms for maintaining high standards in hiring. Bar Raisers are selected and specially trained employees who are brought into hiring processes outside of their own teams. Their primary responsibility is to ensure that every candidate meets or exceeds the quality of existing team members, essentially “raising the bar” with each new hire.

The key element of this program that I think can be borrowed in the world of data is the concept of selecting specific talented individuals (think of those already known to the data engineers in the central team) and then training them as data stewards for their domains. In this manner, the central team maintains a strong (and trusted) link with the individuals who will be central to governing each domain.

The Challenge of Data Domains

Organizations are increasingly moving toward data domains, where each business unit (such as marketing, finance, risk, or energy trading) owns and manages its own data. This approach unlocks agility, allowing teams to innovate without waiting for a central data team to give the green light. However, it also brings challenges. How do you govern decentralized data so that it doesn’t increase business risk, such as exposed sensitive data or poorly written queries that eat resources and block mission-critical ones?

Enter the Data Steward Program

One approach to decentralizing data with confidence (and I recognize there are many) is to install a program that mirrors Amazon’s Bar Raiser approach. I call it the Data Steward Program: a way to balance autonomy with oversight.

Here’s how it works:

1. Appoint Data Domain Stewards

At Amazon, Bar Raisers were selected from employees with proven track records of upholding hiring standards. Similarly, Data Stewards are experienced data and analytics professionals, selected and trained to ensure governance standards are met across domains (or, if you’re in a large enterprise, sub-domains). These Stewards are your gatekeepers: they ensure that data domains adhere to the central team’s policies, but they are funded by and report directly to the domain. A Steward would likely be a team member within the data domain who works closely with the data domain leader or the Data Product Manager.

2. Train, Train, Train

Bar Raisers went through rigorous training at Amazon, and Data Stewards should be no different. Stewards are trained in compliance, governance, and data stewardship. They know the ins and outs of your company’s data policies and are empowered to validate that new domains are adhering to those standards.

3. Create a Clear, Standardized Process

Much like how Bar Raisers had a well-defined checklist when evaluating candidates, Data Stewards use a standardized framework to assess data domains. From access control to data lineage, they ensure that all domains comply with your governance standards before they’re green-lit.
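To make the idea of a standardized framework concrete, here is a minimal sketch of a steward checklist in Python. It is only an illustration: the `DomainReview` class and the `CHECKLIST` tuple are hypothetical names, and the two items listed are simply the ones mentioned above. A real checklist would come from your central team’s policies.

```python
from dataclasses import dataclass, field

# Hypothetical checklist. Only the two items named in the article are listed;
# a real list would be defined by the central data team.
CHECKLIST = ("access_control", "data_lineage")


@dataclass
class DomainReview:
    """Tracks a steward's review of one data domain against the checklist."""
    domain: str
    passed: dict = field(default_factory=dict)  # item name -> bool

    def record(self, item: str, ok: bool) -> None:
        # Reject items that are not part of the shared checklist so every
        # domain is assessed against the same standard.
        if item not in CHECKLIST:
            raise ValueError(f"unknown checklist item: {item}")
        self.passed[item] = ok

    def missing(self) -> list:
        """Items not yet reviewed, or reviewed and failed, in checklist order."""
        return [item for item in CHECKLIST if not self.passed.get(item, False)]

    def green_lit(self) -> bool:
        # A domain is green-lit only when nothing is missing or failed.
        return not self.missing()


review = DomainReview("marketing")
review.record("access_control", True)
print(review.green_lit(), review.missing())
```

The point of the sketch is that every domain is measured against the same list, and a domain is green-lit only when every item has passed.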

4. Decentralized Ownership, Central Oversight

At Amazon, hiring was decentralized, but Bar Raisers ensured consistency. In the Data Steward Program, each business unit retains ownership of its data while Stewards maintain a central oversight function. The domains have the autonomy to operate independently, but Stewards ensure that the integrity of the data remains intact.

5. Automate Where You Can

At scale, we used tools at Amazon to support the Bar Raiser program, automating many of the administrative tasks. In the same way, tools such as Dremio make it easier to operate a data domain. Technology can help organizations automate governance tasks like access control, data quality monitoring, and compliance checks, reducing the manual overhead while maintaining control.
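As a rough illustration of what an automated access-control check could look like, here is a small Python sketch. It does not use any Dremio API: the `Grant` type, the `find_violations` function, and the sample users and dataset names are all hypothetical, and in practice the grants would be exported from whatever platform you use.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    """One access grant: a user who can read a dataset (hypothetical shape)."""
    user: str
    dataset: str


def find_violations(grants, sensitive_datasets, approved_users):
    """Return grants that give a non-approved user access to a sensitive dataset."""
    return [
        g for g in grants
        if g.dataset in sensitive_datasets and g.user not in approved_users
    ]


# Made-up sample data for illustration only.
grants = [
    Grant("ana", "marketing.campaigns"),
    Grant("ben", "finance.payroll"),
    Grant("cho", "finance.payroll"),
]
violations = find_violations(
    grants,
    sensitive_datasets={"finance.payroll"},
    approved_users={"cho"},
)
for v in violations:
    print(f"review needed: {v.user} can read {v.dataset}")
```

A check like this could run on a schedule and hand the Data Steward a short list to review, instead of the Steward auditing every grant by hand.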

Decentralizing with Confidence

The core takeaway here is that decentralization doesn’t have to mean increased risk and decreased governance. It is possible to scale while also decreasing risk and improving governance, largely because you solve the scale issues that cause low-quality work and put someone with domain expertise in charge of the domain’s governance.

The icing on the cake is the resourcing for an initiative like this one. When a business unit pressures your team to improve response times, you can say, “Great, we’ll set up a data domain for you, but you need one Data Steward for every 50 end users.” These are likely to be the data or analytics professionals that your team already works with, and your team will likely have strong opinions on whom to trust. Those people are trained up, and you’re quickly in a train-the-trainer model. The domain now owns solving its own agility pain instead of pointing the finger.
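The staffing ratio is simple arithmetic, sketched below in Python. The function name `stewards_needed` is hypothetical, and rounding up to a whole Steward is my assumption; the article only states the one-per-50 ratio.

```python
import math

USERS_PER_STEWARD = 50  # ratio quoted in the article


def stewards_needed(end_users: int, users_per_steward: int = USERS_PER_STEWARD) -> int:
    """Number of Data Stewards a domain should fund, rounding up (an assumption)."""
    if end_users < 0 or users_per_steward <= 0:
        raise ValueError("end_users must be >= 0 and users_per_steward > 0")
    return math.ceil(end_users / users_per_steward)


print(stewards_needed(120))  # 120 end users at one Steward per 50 -> 3
```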

Again, this is just one approach of many, and I’ve used it to scale other operational functions I’ve managed throughout my career. Let me know what you think.

Read Maloney
Current CMO of Dremio, and a technology leader for the last 15 years.

Dremio is a data lakehouse platform designed to empower organizations by enabling unified data access with governance. With Dremio, companies can move toward decentralized data domains while maintaining the necessary governance controls to keep data secure and compliant. By simplifying data access and governance across multiple sources, Dremio helps teams unlock innovation without compromise, making it the perfect partner for organizations transitioning to a domain-driven data architecture.
