Moving to Trunk-Based Development

Simple Guideline to Leave Feature Branch Workflow Behind

Published in Inside Bukalapak, Nov 25, 2020

For years, we engineers at Bukalapak used the Git Feature Branch Workflow. As shown in Figure 1 below, we coded each feature in a dedicated branch. After finishing development, we merged the feature branch into the trunk/main branch (formerly master in Git terminology, a name we dropped for a better world).

This way, the trunk branch never contains broken code, and any deployment from the trunk is safe (but then, the reality is not that sweet).

Our problems with feature branch workflow

This common workflow kept giving us at least four problems.

1. Big PR (pull request)

Not always, but often, our PRs were too big. Too many changed files make code reviews painful. Reviewers get caught in review fatigue, where the final approval is just “LGTM” (which might really mean “ILGM”: I’m Lazy, Go Merge).

2. Unavoidable merge conflicts

Besides making code reviews ineffective, a big PR also leads to merge conflicts. These not only delay the deployment but are also difficult to fix. Sometimes there’s no way to resolve them other than discarding the whole feature branch.

3. Feature competitions

When different squads or tribes need to work on the same microservice, merge conflicts lead to a feature competition. The team that releases later will be at a disadvantage because they need to resolve the conflict. Sometimes a feature needs to be delayed because another urgent one needs to be deployed first.

4. Running out of staging servers

Yes, we have plenty of staging servers to test each feature branch in isolation. But with enough branches in flight, they eventually run out.

New trunk-based workflow

In early 2020, the whole engineering team moved to Trunk-Based Development. As shown in Figure 2, the idea is simple: keep feature branches small and short-lived, and merge them into the main branch as fast as possible. This reduces the merge risk we experienced with the old workflow.

But moving to a new way of working is not easy. We tried, learned, and failed a lot. From all the challenges we faced, we compiled a short set of internal guidelines for ourselves:

Small branch

The core of trunk-based development is easy-to-merge branches.

1. Pull request should be small

We agreed to limit a single branch to at most 10 changed files. Some teams went further and set the limit at 5. A small PR is easy to merge, easy to resolve, and easy to review. Effective code reviews make it easier to maintain the SOLIDity and Clean-Code-ness of our codebase.
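A limit like this is easy to check automatically. The following is only a minimal sketch of such a check, not our actual CI step: the function names, the `origin/main` base branch, and the message wording are all assumptions made for illustration.

```python
import subprocess

MAX_CHANGED_FILES = 10  # the limit from the guideline; some teams use 5


def changed_files(base: str = "origin/main") -> list[str]:
    """List files changed on the current branch relative to `base`.

    `origin/main` is an assumed base branch name.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def check_pr_size(files: list[str], limit: int = MAX_CHANGED_FILES) -> tuple[bool, str]:
    """Return (ok, message) for a PR that touches `files`."""
    count = len(files)
    if count > limit:
        return False, f"PR changes {count} files; limit is {limit}. Consider splitting it."
    return True, f"PR changes {count} files (limit {limit})."
```

In a pipeline, one could call `check_pr_size(changed_files())` and fail the build when the first element of the result is `False`.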

2. Develop in phases

What if a feature is so big that it changes 50 files? Then we split the development into smaller tasks that can be developed and merged independently. If some tasks are tightly coupled with others, we can use a release toggle (more on this later).

3. Pull request should be forward-and-backward-compatible

New code should be able to safely interact with old data or APIs. The opposite should also hold: old code should be able to safely interact with new data or APIs. This is common sense, but it’s easier to do and to verify in a smaller context.
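To make this concrete, here is a small sketch of a reader that accepts both an old and a new data shape. The payload shapes, the field names, and the default currency are hypothetical, invented only to illustrate the idea.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Price:
    amount: int
    currency: str


def parse_price(payload: Mapping[str, Any]) -> Price:
    """Read a price from either the old or the new (hypothetical) payload shape."""
    raw = payload["price"]
    if isinstance(raw, Mapping):
        # New shape: {"price": {"amount": 15000, "currency": "IDR"}}
        return Price(amount=int(raw["amount"]), currency=str(raw.get("currency", "IDR")))
    # Old shape: {"price": 15000}; the currency was implicit (assumed IDR here).
    return Price(amount=int(raw), currency="IDR")
```

The reader looks only at the keys it knows and ignores the rest. That is what lets old code survive new data: a payload that gains an extra field does not break it.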

4. Yes, merge unfinished feature to the main branch

Ideally, a branch should live for 3 days at most. We merge branches as soon as possible, even if the code is still inactive behind a release toggle set to OFF. Meanwhile, our high-coverage unit tests and end-to-end integration tests safeguard all these changes, making sure existing behavior does not break.

5. The main branch should be safely deployable at any time

Every PR merged to the trunk branch should be a safe merge. It should not introduce new CI/CD pipeline errors and should pass all the automated tests. Once a production release is set, we should never cherry-pick commits directly to the release branch. This is dangerous (and evil).

Release toggle

I personally hate toggles because they add complexity to our code. They add more paths and permutations to our system flows. But when we develop in phases, release toggles are our friends (in a love-and-hate relationship).

1. Release toggle is temporary

A release toggle is different from a feature toggle. A feature toggle is permanent: it belongs to the product and business teams, who use it for business decisions. Engineering’s release toggle is temporary. We use it to manage development phases and staged rollouts.

2. Always toggle OFF unsafe code

When a feature is spread across more than one branch, always toggle off unsafe code so that it can be merged safely to main. If unsafe code runs in production because it was not toggled off, that is an incident.

3. Toggle should be OFF by default

By default, the toggle value should be false. False means off and safe. True means the feature is live in the production environment and affecting users.
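A minimal sketch of what OFF by default looks like in code follows. The `ToggleStore` class is an in-memory stand-in with a hypothetical interface, not the API of our Toggle microservice. The toggle name and the shipping fee are invented for illustration.

```python
class ToggleStore:
    """In-memory stand-in for a toggle service (hypothetical interface)."""

    def __init__(self) -> None:
        self._values: dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._values[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Unknown or unset toggles resolve to False: OFF is the safe default.
        return self._values.get(name, False)


def checkout_total(prices: list[int], toggles: ToggleStore) -> int:
    """Sum the prices; the new fee applies only when the release toggle is ON."""
    total = sum(prices)
    if toggles.is_enabled("release_new_shipping_fee"):
        # Unfinished code path: merged to main, inactive until toggled ON.
        total += 5000
    return total
```

With an empty store, `checkout_total([1000, 2000], ToggleStore())` follows the existing path. The new path runs only after someone explicitly sets the toggle to true.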

4. Don’t forget to clean up the release toggles

Since release toggles are temporary, we should never forget to remove them once a feature is fully live in production. We should include the cleanup as a task in the feature’s epic.
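One way to keep cleanup from being forgotten is to list the toggles that have been fully ON for a while. This is only a sketch under assumptions: the `ReleaseToggle` record, its `fully_on_since` field, and the 14-day grace period are hypothetical and not part of our guideline.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ReleaseToggle:
    name: str
    # Date the toggle reached 100% ON; None while the rollout is in progress.
    fully_on_since: Optional[date]


def toggles_due_for_cleanup(
    toggles: list[ReleaseToggle], today: date, grace_days: int = 14
) -> list[str]:
    """Names of toggles that have been fully ON for at least `grace_days`."""
    cutoff = today - timedelta(days=grace_days)
    return [
        t.name
        for t in toggles
        if t.fully_on_since is not None and t.fully_on_since <= cutoff
    ]
```

A report like this could feed the cleanup tasks in each epic. Toggles still rolling out are never listed.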

5. Where to put toggles?

Both feature and release toggles should be placed in our Toggle microservice. It’s a universal service that provides quick set-and-get access to toggle values for the whole of Bukalapak. This way, all toggles are registered and visible in a single source of truth.

Preproduction Server

Since the true source code is now always close to the main branch, there’s no point in maintaining multiple staging servers. For each microservice, we now have only one preproduction server. Everything is there (like a singularity), and its environment is set up to be as similar as possible to production.

1. Preproduction = main branch

Every time a branch is merged to main, it triggers a deployment pipeline to the preproduction server. Unless it’s a matter of life and death, we don’t deploy other branches to preproduction. If a merge breaks main, preproduction is meant to break right away too.

2. Preproduction is stable for testing

Since preproduction always contains the latest “safe” main branch, the test engineering team can always assume preproduction is safe. Since it’s nearly identical to the production environment, all QA (quality assurance) activities can be performed there: automated end-to-end tests, manual/exploratory tests (yeah, not everything can be automated, like OTP (one-time password) tests), and user acceptance tests with the business team.

This “guideline” is by no means complete. As we overcome new challenges, we’ll keep updating it along the way. Finally, quoting my spiritual leader: as engineers, we should always strive “to boldly go where no one has gone before!”