Reducing Batch Sizes
Something struck me about a 2019 talk by Irakli Nadareishvili of Capital One Tech advocating for a “reduction in batch sizes” in engineering. The idea was that with smaller but cohesive agile processes, lean product management processes, and services, you get the main prize: more frequent deployments. More frequent deployments correlate strongly with more productive teams and successful companies. So it seems that reducing batch size indirectly creates or enables high-performing teams.
Note: A batch here refers to what you deliver. Pushing 2 weeks of commits to prod is a larger batch than pushing every commit in a continuous delivery system. You can apply the concept of batches to project management (1-week sprints over 1-month sprints) and to services (microservices over monoliths).
However, the fact is that we mostly see negative outcomes when organizations attempt to reduce batch sizes. They typically begin implementing microservices and agile without planning ahead for what side-effects this approach will have. We see organizations that end up with distributed monoliths, accidental architectures, high rework costs, heavy cross-communications, unclear ownership, and speed issues. The empirical data suggests we haven’t established robust methods to reduce batch sizes in technical organizations.
The (strawman) simple claim “reducing batch sizes increases performance and productivity” is almost always false.
The more nuanced statement made in the talk that “reducing batch sizes while maintaining cohesion increases productivity” is misleading. This claim implies splitting things into smaller batches results in net productivity and performance gains and ignores the taxes of coordination, communication, and the complexity of distributed systems.
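One rough way to picture the coordination tax: if every unit (a service or a team) might need to talk to every other unit, the number of potential pairwise channels grows quadratically with the number of units. The sketch below is only an upper-bound illustration, not something from the talk; the function name `pairwise_channels` is hypothetical, and real systems rarely connect every pair.

```python
def pairwise_channels(units: int) -> int:
    """Upper bound on distinct pairwise communication paths among `units` parts."""
    if units < 0:
        raise ValueError("units must be non-negative")
    # Each unit can pair with every other unit; divide by 2 so A-B and B-A count once.
    return units * (units - 1) // 2


if __name__ == "__main__":
    for n in (1, 5, 20, 90):
        print(f"{n:>3} units -> up to {pairwise_channels(n)} channels")
```

Splitting 5 units into 20 multiplies the units by 4 but the possible channels by 19 (10 versus 190), which is the part of the bill the “smaller batches” claim leaves out.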
The charitable interpretation here is that reducing batch sizes, on average, outperforms the previous setup of larger batch sizes if your organization has the proper capabilities and practices. The problem is this talk provides no way of establishing what those capabilities and practices are. Where is the difficulty in this process? Many companies are ignoring the difficulties of reducing batch sizes and slowing to a crawl. We are promoting something vague that is likely to weaken most businesses.
Note: The book Accelerate goes into more detail about batch sizes and how to implement them properly.
Why Not Increase Batch Sizes?
There is a reason Amazon is able to handle having 90+ microservices constitute a single product offering to clients while most companies would flounder under the pressure of having this many services (and the resulting smaller service batches) for one product.
In fact, a recomposition of services could boost performance and productivity. Recomposition means fewer agile projects, fewer product management functions, fewer CI/CD pipelines, and fewer microservices. Sometimes, perhaps in year 3 of a company lifecycle, increasing the batch size would yield performance gains for teams. We see this often when people perform inverse Conway maneuvers and recompose services to better align teams with deliverables.
A reasonable frame of reference for the skeptic of small batch size efficiency would be Fowler’s notion of business capability-centric teams (BCCTs). BCCTs are not antithetical to smaller batch sizes, but when I envision BCCTs, I do not envision the single responsibility principle (which has really become the entity-service antipattern) in extreme form. Instead, I think of volatility-based decomposition and elemental architectures: larger, more comprehensive units based on workflow that capture a business capability or context cohesively. With these architectural patterns in place, there is more flexibility when it comes to batch size.
Reducing Batch Size is Hard
The problem is that the talk seemed to skip the essential part of the small batch size discussion: how do we size the services and teams? What is the Goldilocks batch size? According to the talk, Capital One needed extensive automation to ensure compliance across its large set of microservices. The first two phases in the picture below were “too much of an uphill battle” given the number of distributed systems. Could a “decomposition at the last moment” mentality have saved them here?
Digging More Deeply
Without a mature and integrated DevOps/SRE (automation) and platform services culture, increasing the number of services you have is almost sure to slow things down and cause problems. Picking up 100 pennies is harder than picking up 5. However, if I had something which automated the toil in picking up pennies, then it might be fine or better to have 100 pennies than 5.
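The penny analogy can be turned into a toy cost model. This is a minimal sketch with made-up numbers: the class `ToilModel` and all of its fields are hypothetical, and it assumes that manual toil scales linearly with the number of services while a platform adds a fixed cost and leaves a small residual cost per service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToilModel:
    # All numbers are illustrative assumptions, not measurements.
    manual_cost_per_unit: int = 10    # effort to handle one service by hand
    automated_cost_per_unit: int = 1  # residual effort once a platform exists
    platform_cost: int = 200          # fixed effort to build and run the platform

    def manual(self, units: int) -> int:
        """Total toil when every unit is handled by hand."""
        return units * self.manual_cost_per_unit

    def automated(self, units: int) -> int:
        """Total toil when a shared platform absorbs most per-unit work."""
        return self.platform_cost + units * self.automated_cost_per_unit


model = ToilModel()
for units in (5, 100):
    print(units, model.manual(units), model.automated(units))
```

Under these assumed numbers, the platform is a net loss at 5 services (205 versus 50) and a large win at 100 (300 versus 1000). The real break-even point depends entirely on costs that each organization has to estimate for itself.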
Do you have a platform that engineers can leverage to simplify the shift towards smaller batch sizes? Has your platform team generated leverage in the form of uniformity that makes standing up and grokking new services simple? Or will every team have its own unique full-stack solution?
Have you integrated the execution of unit tests, regression tests, and automation tests as part of every commit and build?
Can your org handle (and lead) the rework needed to move to smaller batch sizes?
Do you have sufficient architectural expertise in your organization to reduce batch sizes properly? The Goldilocks batch size is why we switch from centralization to decentralization cyclically.