all hands meeting cancelled

API Driven Management

Service dependency driven org charts

Greg Wester
3 min read · Dec 1, 2013

--

The slowest-evolving structure in a fast-growing technology company is the org chart. Big-bang rebalancing of the management hierarchy relies on the art of politics and negotiation. What if we could instead solve this problem incrementally with data? Management boundaries in organizations can be determined by the science of measuring dependencies.

Consider a software startup with 60 engineers that scales to 400 engineers over 3 years and goes through as many major reorganizations. Engineers working individually or in pairs will organically form isolated engineering teams such as apps, globalization, search, network ops, and site reliability. Rational VPs will split the organization every time it exceeds 150 people (or another target for span of control), but they may not fully understand the dependencies and bottlenecks within their logical departments. Management rightly has a customer-centric view and may organize around interfaces exposed to customers, like apps. This view gets murkier the further a team is from the customer.

I propose three tenets for self-balancing technology organizations:

  1. Define software interfaces between teams
  2. Measure traffic on the interfaces
  3. Continuously balance resources to relieve “hot spots”
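As a rough sketch of how the three tenets could fit together, the Python below models each interface as a consumer/provider pair, attaches a request count to it, and ranks provider teams by inbound requests per engineer. The `Interface` class, the `hot_spots` function, and every number are hypothetical and chosen only for illustration; a real organization would pick its own measure of load.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """A software interface that one team exposes to another (tenet 1)."""
    consumer: str
    provider: str
    name: str


def hot_spots(traffic, headcount, top=3):
    """Rank provider teams by inbound requests per engineer (tenet 3).

    traffic:   dict mapping Interface -> request count for one period (tenet 2)
    headcount: dict mapping team name -> number of engineers
    """
    inbound = defaultdict(int)
    for interface, count in traffic.items():
        inbound[interface.provider] += count
    load = {team: inbound[team] / headcount[team] for team in inbound}
    return sorted(load.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical weekly request counts and headcounts, for illustration only.
traffic = {
    Interface("Apps", "RelEng", "deploy"): 8,
    Interface("Search", "RelEng", "deploy"): 4,
    Interface("Search", "NetOps", "config-change"): 1,
}
headcount = {"RelEng": 3, "NetOps": 4}

ranking = hot_spots(traffic, headcount)
for team, requests_per_engineer in ranking:
    print(f"{team}: {requests_per_engineer:.2f} requests per engineer per week")
```

Requests on different interfaces are not equally expensive, so a raw count is only a starting point; weighting each interface by effort or by waiting time would likely give a fairer ranking.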

As an industry, we’ve been successful at aligning teams at one layer of the stack, but we’ve failed at overall delivery. For instance, we can align the Apps team to work with the Search team on an important release, but we fail to consider the participation of Network Operations until deploy time and forget to incentivize Release Engineering, who may resist the change.

Apps depends on the Search service over HTTP. Search depends on Network Operations for config changes but the ops team has no software interface. Apply management here. Network Ops now needs to expose their hardware as software interfaces. Management will ensure the configuration change won’t be done over a conference bridge during a maintenance window at night. It will be tested thousands of times in software before deployment.

The Apps team also depends on Release Engineering to deploy the new code. RelEng doesn’t know how to build the app, so they write their own deployment script in Perl. Apply management here. Management will ensure that Apps provides a descriptor to deploy their code in any environment, not just their workstations and the CI system.
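One way to picture such a descriptor is as a small, environment-agnostic data structure that Apps publishes alongside its code and RelEng consumes. The sketch below is an assumption about what it might contain; the `DeployDescriptor` class, its fields, the build command, and the environment names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeployDescriptor:
    """Hypothetical descriptor the Apps team publishes with its code."""
    service: str
    build_command: list
    artifact: str
    health_check_path: str
    environments: dict = field(default_factory=dict)  # env name -> settings

    def plan(self, environment):
        """Return the ordered steps needed to deploy to one environment."""
        if environment not in self.environments:
            raise KeyError(f"{self.service} has no settings for {environment!r}")
        target = self.environments[environment]["target"]
        return [
            ("build", " ".join(self.build_command)),
            ("upload", f"{self.artifact} -> {target}"),
            ("verify", f"GET {self.health_check_path} on {target}"),
        ]


# Illustrative values only; none of these names come from a real system.
descriptor = DeployDescriptor(
    service="apps-web",
    build_command=["make", "package"],
    artifact="apps-web.tar.gz",
    health_check_path="/health",
    environments={
        "staging": {"target": "staging-pool"},
        "production": {"target": "production-pool"},
    },
)

for step, detail in descriptor.plan("staging"):
    print(f"{step}: {detail}")
```

Because the same descriptor yields a plan for every environment it declares, RelEng no longer has to reverse-engineer the build, and a missing environment fails loudly instead of at deploy time.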

RelEng depends on dashboards and monitoring from the Site Reliability team to know if the deploy was successful. Apply management here. Teams that depend on RelEng to deploy should be logging metrics and configuring synthetic user transactions that appear on this dashboard for system operators.

Now that we have defined and exposed software interfaces between teams, we can define metrics that illustrate how well those pathways are functioning. Apps makes 10,000,000 API calls to Search per day. Search changes network configurations 11 times per quarter. Apps asks RelEng to deploy 8 times a week. RelEng refreshes the Site Reliability dashboard 100,000 times per day and adds 1 new metric per week.
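The figures above are quoted over different periods (per day, per week, per quarter), so comparing them takes a common unit. The sketch below converts each one to a per-day rate. The counts come from the paragraph above; the 91-day quarter and the names `DAYS_PER_PERIOD`, `observations`, and `per_day` are assumptions made for the example.

```python
# Length of each reporting period in days; a 91-day quarter is an assumption.
DAYS_PER_PERIOD = {"day": 1, "week": 7, "quarter": 91}

# (consumer, provider, what is counted, count, period)
observations = [
    ("Apps", "Search", "API calls", 10_000_000, "day"),
    ("Search", "Network Ops", "config changes", 11, "quarter"),
    ("Apps", "RelEng", "deploy requests", 8, "week"),
    ("RelEng", "Site Reliability", "dashboard refreshes", 100_000, "day"),
    ("RelEng", "Site Reliability", "new metrics", 1, "week"),
]


def per_day(count, period):
    """Convert a count measured over a named period to a daily rate."""
    return count / DAYS_PER_PERIOD[period]


daily_rates = {
    (consumer, provider, what): per_day(count, period)
    for consumer, provider, what, count, period in observations
}

for (consumer, provider, what), rate in daily_rates.items():
    print(f"{consumer} -> {provider}: {rate:,.2f} {what} per day")
```

A daily rate makes the pathways comparable in frequency only. A single network configuration change may cost far more human time than millions of API calls, so volume is best read alongside a cost or latency measure.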

When we track these metrics over time, they expose trends that can be managed. Software executives can check with teams to see what obstacles they would face in scaling if the trends were extrapolated. A group may need to refactor the code for a release to accommodate 10x or 100x growth.
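To make the extrapolation concrete, the sketch below assumes compound growth: it estimates an average growth factor from equally spaced samples and computes how many more periods it would take to reach a given multiple. The sample series is hypothetical (only the starting value of 8 deploys per week appears in this article), and compound growth is itself an assumption that real data may not follow.

```python
import math


def average_growth_factor(samples):
    """Geometric-mean growth per period for equally spaced, positive samples."""
    if len(samples) < 2 or min(samples) <= 0:
        raise ValueError("need at least two positive samples")
    return (samples[-1] / samples[0]) ** (1 / (len(samples) - 1))


def periods_until(multiple, growth_factor):
    """Periods needed to grow by `multiple` at a constant growth factor."""
    if growth_factor <= 1:
        return math.inf  # flat or shrinking traffic never reaches the multiple
    return math.log(multiple) / math.log(growth_factor)


# Hypothetical deploys-per-week figures, sampled once per quarter.
deploys_per_week = [8, 10, 12, 16]

factor = average_growth_factor(deploys_per_week)
quarters_to_10x = periods_until(10, factor)
quarters_to_100x = periods_until(100, factor)

print(f"growth per quarter: {factor:.2f}x")
print(f"quarters until 10x: {quarters_to_10x:.1f}")
print(f"quarters until 100x: {quarters_to_100x:.1f}")
```

The output is a conversation starter rather than a forecast: it tells an executive roughly when to ask a team what would break at 10x.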

Lastly, we can analyze waiting or blocking time between teams to rebalance and eliminate bottlenecks. Teams with internal customers are notoriously under-resourced, routinely firefight, and aren’t able to plan for or predict who will depend on them. As in our Network Operations example above, the Search team interrupts them for last-minute changes to production to accommodate deployments. For the first time, we can see that the teams furthest from the customer or business goal may be the ones that need more attention earlier for the deploy to be successful. The VP of Apps may take a keen interest in working with the VP of Ops from the outset to make sure the effort is successful.
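A minimal sketch of the waiting-time analysis, assuming each cross-team request is logged with the time it was made and the time work on it began. The record layout, the `blocked_hours_by_provider` function, and the timestamps are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def blocked_hours_by_provider(requests):
    """Total hours each provider team kept its internal customers waiting.

    requests: iterable of (consumer, provider, requested_at, started_at)
    Returns (provider, hours) pairs, longest wait first.
    """
    totals = defaultdict(float)
    for consumer, provider, requested_at, started_at in requests:
        totals[provider] += (started_at - requested_at).total_seconds() / 3600
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical request log, for illustration only.
requests = [
    ("Search", "Network Ops", datetime(2013, 11, 4, 9), datetime(2013, 11, 6, 9)),
    ("Apps", "RelEng", datetime(2013, 11, 4, 10), datetime(2013, 11, 4, 16)),
    ("Apps", "RelEng", datetime(2013, 11, 5, 10), datetime(2013, 11, 5, 12)),
]

bottlenecks = blocked_hours_by_provider(requests)
for provider, hours in bottlenecks:
    print(f"{provider}: {hours:.0f} blocked hours")
```

In this made-up log, Network Ops handles the fewest requests but accounts for the most blocked time, which is the kind of signal that would justify shifting resources toward a team far from the customer.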
