Wrangling the Wild West
Growing up in a startup has been amazing. I had the fortune of joining FullContact in its infancy, when things were still the wild west. We slung code as fast as we could write it, watched it fly onto production servers, and scrambled to deliver bug fixes in a recursive fashion. All of us were on call and would collectively receive the pages day or night. It was all part of the gig, and we embraced it.
As the company began to take off, the pressure to deliver continued to mount. We formed a larger posse of cowboys, each of us independently dueling against the clock to get to the promised land. Tech debt accumulated with each line of code, and when we inevitably rewrote it, we hastily rewired the necessary components, leaving the cruft for the future™.
Over time, things became harder to simply rewire and maintain, and we started slowing down. Our phones were now blowing up more than once a day, we had a customer success team with questions, we pushed the occasional “500 the world” bug, and our visibility into our systems and “supported” functionality was next to none. Further, we had created a culture of silos where it was hard to jump between projects if Billy Bob was out sick or on vacation.
It was becoming clear that our wild west needed an upgrade.
We evolved in a number of ways, but one of the most significant boons to our stability was a process (yes, that dirty word, process) which we called Fireman. The basic idea is that one person each week acts as the team’s shield.
When you are the Fireman, this means that you:
- Are the first responder to any pages. You analyze the issue and either fix it or reroute it as necessary.
- Answer questions and triage ad hoc issues coming from our dedicated Slack channel for inter-departmental teams (Customer Success, Sales, Marketing, etc.).
- Monitor a slew of dashboards throughout the week, looking for undetected anomalies.
- Address any functional test failures as soon as possible.
- Are empowered to fix non-critical bugs, add tooling, and expand the scope of test coverage, all in place of feature-based work.
This may sound like a lot, and you’re right. But here’s the thing: we’re all in it together, including me. We have each other’s backs and learn from each other every step of the way.
I have found that with a Fireman in place, the rest of the team can maintain focus despite all the questions, bugs, and alerts. When you are on Fireman duty, you learn more each day about how customers use your product and how to better engineer reliability and visibility into your systems, and you try not to pass the buck to the next Fireman. A byproduct is that it begins to break down those silos, since each person shares ownership and contributes to different repositories. Standards begin to emerge as the team discovers better and simpler ways to do things.
This process builds both trust and accountability without having to create mandates or perform trust falls. Trusting that your peers are working to improve the status quo is a great feeling. It means you don’t have to be the only one adding tests, or the ‘know-it-all’ who is always getting PMed by Sales; we’re all on the hook. Best of all, it lets work flow fluidly and dynamically through your team. We are all working toward the common goal of stability while still servicing external demands.
One of the hardest things for us was getting the process into motion. Product didn’t like us stepping away from feature work, and engineers didn’t like embracing a new system. Frankly, it’s a tough nut to crack, but if you build a case around the things that drag your team down the most, you will begin to detect themes. Take those themes to your team, agree on a set of guidelines that works for you, and then start chipping away. In some cases, you may find that the sprawl is too vast for one person, at which point you may want more than one Fireman. Little by little, though, things will improve in those dark areas of despair.
Our success was not overnight. It took a series of incremental improvements to our Fireman process, tweaking the responsibilities and general scope, until we arrived where we are. It took about 6 months for it to feel natural on a team of 4–5 members.
Three years later, it is still in action and has been incredibly successful. Its success on my guinea-pig team led most of our engineering teams to adopt it. As a result of our efforts, our phones rarely ring, other departments know whom to reach and how, our systems are now industrial strength, and standards have emerged organically to the point where we are comfortable working across multiple areas.
This post first appeared on the FullContact blog.