Sixty Prod Pushes in a Month

Published in FordLabs · 7 min read · Jun 4, 2020

At FordLabs, we were working on an engagement with the customer experience division at Ford. The project itself was referred to as “Owner”, and the broadest description of our goal was “to improve the owner experience.” That ultimately took the form of https://myfordvehicle.ford.com. It is a portal for getting information about your vehicle, initially by its year-make-model, and later by its VIN. At the time of writing, the 2020 Ford F-150’s page looked like this.

FordLabs’ participation in this project wound down last week, and part of that winding down included a lot of reflecting on what went well and what didn’t. We walked away from this project with many lessons learned, but I want to focus on one in particular: continuous deployment and how it led us to success. Between April 24th and May 29th we did sixty-three production builds. At FordLabs, we aspire to ship to production as often as possible, but never in my nearly two years at the office have I come close to doing it that many times for one product, let alone in that short a timeframe.

We didn’t start out like that, though. Our first code commit was in March, and by early April we were in production. From there, we repeatedly ran into a problem with our deployments. Whenever we wanted to deploy to production, we would manually trigger a build in Jenkins. This would use the latest commit to create a production bundle of our website and then deploy it. This pipeline didn’t run automatically at first, but it didn’t need to. My pair and I could complete a story, have it reviewed for design and functionality, and then send it to prod. If changes needed to be made, we could make them, review, and deploy.

Unfortunately, this workflow assumes that features can be reviewed as soon as they are finished. That was rarely the case, so multiple stories would frequently pile up waiting for review before anyone could look at them.

This was further complicated when stories were rejected as a part of the review process. Imagine a situation where Feature B was rejected, but Feature A and Feature C were both accepted.

Our Jenkins pipeline was configured to send the most recent commit. That meant it would try to deploy Feature A, Feature B, and Feature C in this case. Since that meant shipping a rejected feature, we were forced to fix Feature B before we could ship two working features. There are plenty of things we could have done to solve this problem, but we chose to hold deployments until all stories were ready to go. Now, suppose we had Feature A, Feature B, Feature C, and Feature D in our backlog. Our product manager would arbitrarily pick a cutoff point, let’s say Feature C. We would do all the work for Feature A, B, C, and then we would stop. Once all the stories were approved, our PM gave us the go-ahead to deploy. Then work could begin again.
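To make the batching problem concrete, here is a minimal sketch of the rule described above. It is not our Jenkins configuration; the `Story` type and the `shippable_from_latest_commit` function are hypothetical names used only for illustration. It assumes every story is committed to the same branch and that the pipeline can only deploy the latest commit.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """A finished story sitting on the shared branch (hypothetical model)."""
    name: str
    accepted: bool


def shippable_from_latest_commit(history: list[Story]) -> list[str]:
    """Return the stories that ship when only the latest commit can be deployed.

    The latest commit contains every story before it, so a single rejected
    story blocks the whole batch, including the accepted ones.
    """
    if history and all(story.accepted for story in history):
        return [story.name for story in history]
    return []


history = [Story("A", True), Story("B", False), Story("C", True)]
print(shippable_from_latest_commit(history))  # nothing ships while B is rejected
```

Once Feature B is fixed and accepted, the same call returns all three stories, which is exactly the "hold everything until the batch is ready" behavior we ended up with.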

It wasn’t the best solution, but the problem outlined above didn’t happen to us often enough at the start for that to matter. Our designs were low-fidelity, and the functionality was simple. There was also only one pair of engineers actively committing code at the time.

All at once, however, our designs became higher fidelity, our functionality less simple, and a second pair of engineers joined the endeavor. In the same timeframe we got more stories done, but had more rejections, which increased the size of our batch and the time between deployments.

With those changes I made a proposal: we would do the work for a feature on a branch. Every branch would get its own deployment for the designers and PM to use in their review. Once the story was accepted, we would merge it back into master. Every push to master would then trigger a production build.
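The routing rule in that proposal can be sketched in a few lines. This is an illustration under assumptions, not our actual pipeline: the function name `deployment_target`, the `review-` prefix, and the example branch name are all hypothetical. The only parts taken from the proposal are that master goes to production and every other branch gets its own deployment.

```python
import re


def deployment_target(branch: str) -> str:
    """Pick where a pushed branch should be deployed (hypothetical naming)."""
    if branch == "master":
        # Every push to master triggers a production build.
        return "production"
    # Any other branch gets its own review deployment, named after the branch.
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"review-{slug}"


print(deployment_target("master"))
print(deployment_target("feature/VIN-entry"))
```

The point of the design is that a rejected story only ever blocks its own branch. Accepted stories merge and ship without waiting for it.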

In the past, teammates of mine had been skeptical of using branches, so on those teams we often used the workflow I originally outlined for the Owner team. They had good reasons too. Branches often lived too long, which was a problem. When a branch lives for a long time, it is likely to include changes to many files across the codebase. Additionally, if the team is using a branching strategy, there are likely multiple streams of work going on at the same time, probably touching the same code. When it comes time to merge them into master, there are going to be merge conflicts. A merge conflict is especially dangerous when you don’t have the context around the code you’re merging in, making it an easy place to accidentally remove a feature or introduce a bug.

In my experience, a long-lived branch tends to grow out of a story with a large scope. The Owner team used “T-shirt sizes” for our stories, meaning a story could be small, medium, large, or extra-large. Branches that lived for a long time usually came from large and extra-large stories, so the simple solution was to break them down into small and medium-sized stories.

At first, the website allowed a user to enter their vehicle’s year and model, and then they were sent to a page that displayed the relevant information. The next feature we wanted was to allow them to enter their VIN to get more specific information about their vehicle. Specifically, we wanted to display the recalls and field service actions (FSAs) for their vehicle. This started out as one user story written like so:

As the owner of a vehicle
I would like to be able to enter my VIN
so that I can get specific information about my vehicle

This story seems pretty straightforward. The flow could be imagined like this:

The user enters their VIN on the homepage.
They are redirected to /vehicle/vin/1234567890ABCDEFGH.
Their data is shown to them.

When we looked at how this flow could work from an engineering perspective, it got more complicated:

We would have to add an input for VIN on the homepage form, which includes an entire redesign of the form.
We would have to create a component that gets rendered when the user hits the appropriate URL.
Upon landing on the VIN page, we would have to make a request to get the year-make-model of the given VIN.
If that request failed, we would have to add an error state for an invalid VIN.
We would use the year-make-model to get generic information for that vehicle.
If this request failed (which usually meant the vehicle wasn’t supported in the user’s region), we would have to create an error state to handle it.
At the time, we could get VIN-specific recall and field service action (FSA) information, so we would have to create a service to make the request, and add the tile to the page.
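The request chain in that list can be sketched as follows. This is a rough illustration in Python rather than the site's real code, and every name in it is an assumption: `decode_vin`, `fetch_vehicle_info`, and `fetch_recalls` stand in for whatever services were actually called, and the `InvalidVin` and `UnsupportedVehicle` exceptions are hypothetical ways of signaling the two failures described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class InvalidVin(Exception):
    """Raised by the (hypothetical) VIN decode call when the VIN is invalid."""


class UnsupportedVehicle(Exception):
    """Raised when no generic information exists for the vehicle in this region."""


@dataclass
class VinPage:
    state: str  # "ok", "invalid_vin", or "unsupported_vehicle"
    vehicle: Optional[dict] = None
    recalls: list = field(default_factory=list)


def load_vin_page(
    vin: str,
    decode_vin: Callable[[str], dict],
    fetch_vehicle_info: Callable[[dict], dict],
    fetch_recalls: Callable[[str], list],
) -> VinPage:
    """Run the three requests in order, mapping each failure to an error state."""
    try:
        year_make_model = decode_vin(vin)
    except InvalidVin:
        return VinPage(state="invalid_vin")
    try:
        vehicle = fetch_vehicle_info(year_make_model)
    except UnsupportedVehicle:
        return VinPage(state="unsupported_vehicle")
    # Recalls and FSAs are looked up by VIN, not by year-make-model.
    return VinPage(state="ok", vehicle=vehicle, recalls=fetch_recalls(vin))
```

Laid out this way, the hidden stories are easier to spot: the happy path, each of the two error states, and the recall lookup are separate branches of the same function.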

The story as written would have been extra-large. “What stories are hiding in here?”, we asked ourselves. We eventually broke it down like this:

Navigating to /vehicle/vin/1234567890ABCDEFGH shows the vehicle's VIN and all the default information for the corresponding year-make-model. If this request fails, just navigate to the 404 page for now.
Update homepage to include form for entering VIN.
Display the recall information.
When the VIN decode fails, navigate to a page saying that an invalid VIN was given.
When the VIN decode succeeds, but the vehicle isn’t supported in the current region, navigate to an error page explaining that information for that vehicle isn’t available.

One extra-large story became one medium story and four small stories.

Not only do smaller stories make it so developers don’t conflict as often, they allow us to parallelize our work. Of those five stories, the first one had to be done first. From there, the stories really could have been done in any order, and could be done at the same time without causing conflicts with the other developers.
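One way to see the parallelism is to write the five stories down as a dependency graph and ask which ones are ready at each step. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The story identifiers are shorthand I made up for the five stories above, and `parallel_waves` is a hypothetical helper, not a tool we used.

```python
from graphlib import TopologicalSorter

# Each story maps to the set of stories that must be finished before it.
# Only the first story (the VIN page itself) blocks the others.
dependencies = {
    "vin_page": set(),
    "homepage_vin_form": {"vin_page"},
    "recall_tile": {"vin_page"},
    "invalid_vin_error": {"vin_page"},
    "unsupported_region_error": {"vin_page"},
}


def parallel_waves(deps: dict) -> list:
    """Group stories into waves; stories in the same wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in parallel_waves(dependencies):
    print(wave)
```

The first wave contains only the VIN page story, and the second contains the other four, which matches how the work could be split across pairs.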

As we wrapped up our final retrospectives and reviews with our management, product owners, and stakeholders, we received praise for a number of things, but by far the loudest praise was for our ability to iterate and ship to production so quickly and so often. I said earlier that the solution was simple: “just write smaller stories.” One person alone can’t make this happen. The team wanted to put stories in production as soon as they were ready. For that to work, we needed a branch for each feature. For that to work, we needed branches to be merged back into master frequently. For that to work, we needed small stories.

The technique succeeded for us where it had failed for others because the team put in place a system that supported its success.

Written by Christopher M. Boyer