The specifics of this post are about software development — but the general lessons are about leadership and communication.
Last year was the first time in my ten years as a software manager that I wasn’t upset at my engineering team’s pace.
Let me define upset: that feeling you have when you’re in a conversation and aren’t being understood.
For ten years, I’ve been trying to make the case for a higher pace of software development. And for ten years, the engineers I’ve worked with have been hearing my case as some variant of “just make it shitty.”
Lots of managers are in this situation too — it feels like you’re in an argument with your engineers that you don’t know how to get out of.
Who’s at fault? It doesn’t matter. Just know that you can get out of it.
The case for faster development
The case for rapid iteration has been well hashed out in famous books, most recently in The Lean Startup. But the concept isn’t new. I learned engineering management from Steve McConnell’s book Rapid Development, first printed in 1996. I’m sure observations about the value of rapid iteration go back centuries.
But if I had to summarize, I’d gravitate to a quote, backed by mathematical justifications, from The Principles of Product Development Flow.
“The perceived efficiencies of slower cycle times never pan out”.
Faster is safer because it requires working smaller. You make smaller changes you can fully understand.
Smaller is then faster still because it requires less coordination.
Radically faster development opens up new design opportunities — you can just build it and know for sure whether the design works.
The problem is that this mode of development is a leap of faith and, in many cases, requires a change in habits.
What is a fast iteration cycle?
For the purposes of discussion, I’m going to say that a fast iteration cycle is one day. It could be a week.
But let’s imagine that it’s a single day. That gives everyone time to goof off, take a long lunch, chat with coworkers and still feel like they got something done.
Here’s a story about one of our iterations.
Coaches were asking for a per-user-notes field so that they could keep their own summary of a client’s goals rather than having to scroll through a long chat history.
I’d call what I just wrote a simple user story.
In traditional siloed development, a design team would debate and research that user story. Then they’d pass the design over to an engineering team who would slot it into their development backlog, hopefully launching it within a few weeks.
Then users would start giving feedback to the product support team. If those comments were harsh, the support team might pass them back up to a product management team, who would pass them back to the designers. In other words, a one-month cycle time for this feature would be a miracle. The real cycle time is probably four months between designing, building, launching and reacting.
Here’s how it worked in our world. An engineer started the work with roughly the same level of information that you have right now. She started at 8am and launched it at 10:30am. At 11:00am I posted the live feature to our coaches’ Slack to ask for feedback. They loved the idea, but hated the implementation.
At 12:00 we all went to lunch and talked about philosophy or movies or TV or something. At 1pm we came back to work, we looked at the feedback and decided to change the implementation. At 3pm, the engineer launched a new version and the coaches have been happy with it ever since.
In other words, the cycle time was 7 hours, including lunch.
That seven-hour cycle time took a lot of work to get to. That work comes from an organization design principle called Change Elements.
If you want a team to change, you can’t just ask. You also have to build in elements that support the new behavior while preventing a return to the old status quo. Much longer explanation here.
I visualize change elements like this:
In that model, everyone commits to a change to the status quo and then puts in place new elements that prevent a return to the old status quo.
Below are the change elements that were in place that allowed one engineer to take a one sentence specification, build a feature, get feedback, and rebuild that feature in the same day.
#1. Continuous deployment
The simple version of continuous deployment, which is the one we use, is that every code commit for the web and server gets sent through an automated testing framework run by Snap CI. If the tests pass, then that code is deployed immediately to our production servers.
We also built an iPhone version of continuous deployment: each commit sent a new build to everyone on the team. Actually deploying to all of our users is blocked by the Apple review process.
More advanced teams build user metrics into the deploy process. For example, my understanding is that Flickr’s continuous deployment system deploys just to a single server first, checks that core usage metrics remain in range, and then rolls the change out to the rest of the servers.
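The staged rollout described above can be sketched roughly as follows. This is a minimal illustration, not Flickr's actual system: `canary_deploy`, the `deploy` and `read_metric` callbacks, and the `low`/`high` bounds are all hypothetical names invented for the example.

```python
from typing import Callable


def canary_deploy(
    servers: list[str],
    deploy: Callable[[str], None],
    read_metric: Callable[[str], float],
    low: float,
    high: float,
) -> list[str]:
    """Deploy to one server, check a usage metric, then deploy to the rest.

    Returns the servers that received the new build. All names here are
    hypothetical; a real system would also roll the canary back on failure.
    """
    if not servers:
        return []

    canary, rest = servers[0], servers[1:]
    deploy(canary)

    # Halt the rollout if the canary's metric falls outside the expected range.
    metric = read_metric(canary)
    if not (low <= metric <= high):
        return [canary]

    for server in rest:
        deploy(server)
    return [canary, *rest]
```

Passing the deploy and metric functions in as arguments keeps the rollout logic testable without touching real servers.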
#2. Feature flags
Practically no feature is understandable without production data. So either you can spend your entire life mocking up production-like data in your development server or you can wrap a feature in a flag that locks it down for a subset of users.
Most commonly, we use an admin flag to only show the feature to people on the team.
Feature flags are a way to remove excuses for not shipping and getting feedback. Almost always my feedback is “this moves us forward, launch it.”
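Here is a minimal sketch of an admin-only flag, assuming a simple in-process flag table. The `User` class, the `FLAGS` dictionary, and the `coach_notes` flag name are hypothetical and only stand in for whatever your framework provides.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    is_admin: bool = False


# Hypothetical flag table: flag name -> rule deciding who sees the feature.
FLAGS = {
    # Team-only while feedback is being collected.
    "coach_notes": lambda user: user.is_admin,
}


def feature_enabled(flag: str, user: User) -> bool:
    """Unknown flags count as off, so unfinished code stays hidden."""
    rule = FLAGS.get(flag)
    return bool(rule and rule(user))


def render_client_page(user: User) -> list[str]:
    """Return the page sections this user should see."""
    sections = ["chat_history"]
    if feature_enabled("coach_notes", user):
        sections.append("notes_field")
    return sections
```

Launching to everyone then means changing one rule in the table rather than touching the feature code.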
#3. Basic unit tests
A friend of mine calls test driven development a “downward pressure on innovation.”
I think he means that an engineer will try to make a change, the tests will fail, and then they’ll end up spending their entire day trying to understand whether the test has caught a bug or simply needs to be changed.
The reaction to that experience is “don’t make changes.”
That’s the opposite of what testing is supposed to do. You want your tests to create a feeling of a safety net rather than a roadblock.
The balance we found was just to test at a higher level. In Rails, that mostly means testing at the controller level only. This catches the “completely breaks a feature” bugs — which are pretty common.
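To illustrate the idea of testing only at the entry point, here is a small framework-free sketch in Python rather than Rails. `NotesStore` and `notes_controller` are hypothetical stand-ins for a model and a controller action; the point is only that the tests never reach into the internals.

```python
import unittest


class NotesStore:
    """In-memory stand-in for a database table (hypothetical)."""

    def __init__(self):
        self._notes = {}

    def save(self, coach_id, client_id, text):
        self._notes[(coach_id, client_id)] = text

    def load(self, coach_id, client_id):
        return self._notes.get((coach_id, client_id), "")


def notes_controller(store, method, params):
    """Tiny stand-in for a controller action: returns (status, body)."""
    if "coach_id" not in params or "client_id" not in params:
        return 400, "missing ids"
    key = (params["coach_id"], params["client_id"])
    if method == "POST":
        store.save(*key, params.get("text", ""))
        return 200, "saved"
    if method == "GET":
        return 200, store.load(*key)
    return 405, "method not allowed"


class NotesControllerTest(unittest.TestCase):
    # Only the controller entry point is exercised, so internals such as
    # NotesStore can be reworked without rewriting these tests.
    def setUp(self):
        self.store = NotesStore()

    def test_save_then_read(self):
        ids = {"coach_id": 1, "client_id": 2}
        status, _ = notes_controller(self.store, "POST", {**ids, "text": "run 5k"})
        self.assertEqual(status, 200)
        self.assertEqual(notes_controller(self.store, "GET", ids), (200, "run 5k"))

    def test_missing_ids_is_rejected(self):
        status, _ = notes_controller(self.store, "GET", {})
        self.assertEqual(status, 400)
```

A test like this fails when the feature is completely broken, but it stays green through most internal changes.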
#4. First version scope
One of the reasons for fast pace is accepting that product ideas are wrong a lot of the time. Real feedback on a real feature with real data from real users is always better than theoretical discussion.
But that real feedback can be prohibitively expensive if development is too slow. This is why I gave one day as my example of fast turnaround. One day is rarely too expensive.
Given that, my rule for the scope of the first version is that we should go with whatever implementation the engineer finds convenient.
If the feature rocks, it will get expanded and revised. If it doesn’t, it will get abandoned or deleted.
#5. There are always alternative implementations
If someone tells you that a feature is too hard or will take too long, what they’re really saying is that the first implementation they thought of is too hard or too long.
But there are always alternative implementations.
This is where I break out my rule of three.
It’s a brainstorming observation: one idea is a bad idea, two ideas is an argument, and three ideas is a brainstorm.
So, if someone seems blocked by the size of a project, you can lead them to a white board and start throwing out alternative implementations until you get to three. At this point the discussion moves more easily from blocked to brainstorm.
To get things moving I throw out things like “we could not do the feature” or “we could shut the company down.” Honestly, it doesn’t seem like the quality of the first few options matters — just that you break the false notion that you have limited options.
#6. Directional refactoring
Directional refactoring means knowing roughly what code standards and architecture you want, and then having engineers refactor toward those standards whenever they update or add code.
The core rule of refactoring is that you only change code that’s related to what you’re intending to work on. Otherwise you don’t really have the context to know if you broke something.
Refactoring code that isn’t related to a feature you’re working on is a lie — the correct word for that is rewriting. And rewriting is slow and error prone.
A simple Rails example for us was moving toward Interactors. Like most Rails shops we started with a simple Model-View-Controller architecture and then ended up with fat models and fat controllers with duplicated code. Interactors help fix that.
We never had a halt to feature development so that we could rewrite code. Now, several years later, I count 183 interactors. That’s a pretty clear indication that refactoring over time worked for us.
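For readers who haven't seen the pattern, here is a rough sketch of what an interactor looks like, written in Python rather than Ruby and not taken from our code base. `SaveCoachNote`, `Result`, and the dictionary used for storage are hypothetical; the idea is one class per business action, with a single `call` method that controllers can share.

```python
from dataclasses import dataclass


@dataclass
class Result:
    success: bool
    error: str = ""


class SaveCoachNote:
    """One business action in one place, callable from any controller."""

    def __init__(self, notes: dict):
        self.notes = notes  # stand-in for a persistence layer

    def call(self, coach_id: int, client_id: int, text: str) -> Result:
        text = text.strip()
        if not text:
            return Result(success=False, error="note is empty")
        self.notes[(coach_id, client_id)] = text
        return Result(success=True)
```

Moving one action at a time into a class like this is the kind of change that fits inside ordinary feature work.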
#7. Have slack time
When development is taking a long time, everyone else in the company starts getting incredibly anxious. They feel like the company is falling behind and not learning fast enough.
This anxiety turns into pressure. And then suddenly every single minute of development time is accounted for.
However, if the product is changing and improving on a daily basis, then that company anxiety goes away.
The method above, directional refactoring, requires that engineers have some amount of slack time. That time is available because the team is moving quickly.
#8. Ban the phrase “Technical Debt.”
Technical debt is a bullshit term that destroys rational discussion.
It’s the developer’s version of the question, “Do these clothes look good on me?” You can’t answer this question by saying, “No, you look ugly as hell.”
Same with technical debt. It’s hard to say, “Just work in a shit hole a little longer, please.”
Yes, the architecture and accumulated features do impact the speed of development. But the technical debt phrase is a lazy way of talking about that problem.
Instead, refactor the code you’re in without telling anyone. This is mostly what happened for us.
Or, say something specific and solution oriented like “this module has got a lot of edge cases — what do people think about refactoring it toward this other simpler architecture?”
#9. Instant user feedback
There are a lot of ways to get quick feedback. If you can get that feedback within an hour of writing your code, then you can go back to changing that code without breaking your flow state.
If it takes a day or a week, then the feedback is probably coming too late. The engineer will have moved on to a new project.
Here are two ways to get instant feedback.
One — be the user. This is the joy of working on consumer apps. You can test the feature yourself because you’re a passionate user.
Two — have power users you can ask easily. We have a Slack team of our coaches and can release features there and get immediate feedback.
We post Trello and GitHub notifications into Slack whenever people commit code. What this means is that everyone knows whether you are getting work done.
It helps create a culture of fast iteration.
Now that we’ve gone remote, those notifications are basically the only way to show people that you’re at your desk.
I don’t mean this as purely manipulative social engineering — I often need this for myself as well.
And when the whole team is knocking out features, the feeling of momentum ends up being a lot of fun.
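The commit notifications mentioned above are usually set up through built-in integrations, but the underlying idea is small enough to sketch. The example below only formats a message body of the kind a Slack incoming webhook accepts (a JSON object with a `text` field). The `commit` dictionary and its `author`, `repo`, and `message` keys are a simplified, hypothetical shape, not the real GitHub payload.

```python
import json


def commit_to_slack_payload(commit: dict) -> str:
    """Format one commit as the JSON body of a Slack incoming-webhook message."""
    # Use only the first line of the commit message to keep the channel readable.
    lines = commit["message"].splitlines() or ["(no message)"]
    text = f'{commit["author"]} pushed to {commit["repo"]}: {lines[0]}'
    return json.dumps({"text": text})
```

Sending the result is then a single HTTP POST to the webhook URL configured for your channel.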
All of the above achieves three goals:
- Rapid iteration.
- High quality — bugs are rare and quickly fixed.
- Evolving code base — code quality improves organically.
None of that is enough to guarantee your company is a success or that the final product is fantastically polished. But getting this pillar of the company right helped us out a lot.