Lessons learned from the government’s biggest attempt to fix tech procurement

Where I did not want to be spending my weekends. (Source: Google Street View)

Late last year, on a beautiful fall Sunday morning, I found myself in a windowless conference room in a run-down DC office building poring over a 900-page report and wondering where my life had gone wrong.

I was reviewing the technical evaluation results for the Department of Homeland Security’s procurement for agile software development, and preparing to make a decision on which companies would be awarded spots on the $1.5 Billion contract. Deciding billion-dollar government contracts sounds like it should be glamorous, with high-powered negotiations over fancy meals and Congressmen calling about jobs in their districts. In reality, it’s spending your weekend (and for most of my team, many of the weekends prior) sifting through mind-numbingly boring reports in order to prepare a slightly different mind-numbingly boring report to make sure that every part of every decision is documented ad nauseam.

DHS launched the Federal government’s biggest-ever attempt to implement a radically new way of procuring software development services, in the form of a $1.5 Billion, three-year contract vehicle that would have helped build everything from online immigration applications to import control systems. We did a lot of things right, most notably evaluating companies by having them design and build working software instead of submitting hundred-page proposals. We also made some huge mistakes which allowed the project to get out of hand. This led to the biggest disappointment of my time in government: the entire effort being canceled over a year in, thousands of hours of work from dozens of people reduced to a few pages of legalese on an obscure government agency’s website.

This is the story of how a former Google engineer dove head-first into the strange world of government contracting, and of what happens when you try to bring modern tech approaches into a system that’s designed around buying fighter jets rather than mobile apps.

Part One: Why in the world am I talking to you about government contracting?

If a government agency wanted to buy a helicopter, they’d start out by writing down all of the requirements for what the helicopter would need to do, then ask companies to bid on how much it would cost them to deliver a helicopter that met those requirements. Companies would submit long written proposals with their costs and how they’d approach the project. The agency would pick one, and the company would get to work. When they were done, the agency would test out the helicopter to make sure it flew and met their other needs. The company would be responsible for fixing the helicopter when it needed maintenance. Once it fully broke down or was obsolete and the agency needed a new one, this entire process would restart.

That approach makes a lot of sense for a helicopter, but think about what happens when it’s translated to software development. A government agency that needs a new piece of software starts by spending months or years writing down all of its requirements for what that system needs to do. Companies submit hundreds of pages of proposals, written more or less as marketing material and often with very little relevance to their actual work. The agency then takes several more months to review these proposals and awards a contract potentially worth hundreds of millions of dollars based solely on these written proposals and references. The winning contractor spends years building to the exact requirements the agency put together several years ago. Those requirements are fixed in the contract and tough to change. Once they’re finally done, if the software works at all, it’s likely already obsolete because the agency’s needs have changed.

Anyone who’s built technology in the private sector knows that this approach is doomed to fail. User needs, technology, and product requirements change frequently over the course of a project. The government’s approach closely follows the traditional waterfall model of software development, which has been out of favor in the private sector for well over a decade. Part of the answer is agile development, an approach to building software that focuses on embracing change and delivering small pieces of functionality very quickly.

When we created the U.S. Digital Service, we realized that changing this approach was one of the most important ways we could have a lasting impact on the services government delivers to the American people. We released the Digital Services Playbook which outlines 13 principles for government projects to follow. We also invested heavily in helping the government’s procurement workforce learn and apply these new principles — creating an in-house procurement team and releasing the TechFAR, a handbook which shows how procurement regulations allow modern practices.

Agencies across the government started to embrace new approaches to contracting over the last several years. U.S. Citizenship and Immigration Services transformed a half-billion dollar contract with IBM into four smaller contracts where companies compete with each other for future business based on the quality of the software they were building. 18F in the General Services Administration created an Agile Blanket Purchase Agreement which asked companies to create an open source project on GitHub as part of their bid.

During my first year in government, I watched all of this with interest but spent most of my time working with important projects at the Department of Homeland Security to help them succeed. As my team grew, we started to notice a trend. More and more of the projects we were working with wanted to embrace these new contracting approaches, particularly evaluating actual code rather than lengthy written proposals, but their contracting teams didn’t have the skills in-house to do it. These projects had spent years building out the skills to do things the old way, which meant they were well equipped to read through thousands of pages of pitches but didn’t have enough government employees who could properly evaluate code and design. The requests for help started coming in, and it looked like we could quickly be spending all of our time evaluating contractors.

Part Two: Enter FLASH

(Side note: I may not have learned many modern tech skills during my time in government, but I am now EXTREMELY good at coming up with acronyms for things.)

Wrong FLASH. (Source: Wikimedia)

Having a big contract that many projects can use is a fairly common approach in the government, but we had a few new ideas to set FLASH apart:

First, we ditched the written proposals and evaluated companies based on how well they delivered working software. We made every company that bid come into our offices for half a day and work together with a government employee to iteratively design and build a simple product. My team of engineers, designers, and PMs would sit in and evaluate their practices. After they were done, we’d go over their code in detail.

This exercise, which we called a technical challenge, became far more complicated than we expected when we had over 100 different companies bid on the contract and come in to code. But it proved invaluable. Rather than reading documents which listed out buzzwords, we got to see if they actually could execute. For example, this meant that:

  • Instead of reading that they practiced test-driven development, we got to see in real time if they started writing tests early or snuck them in at the last minute. After the challenge, we reviewed their code to see who wrote working tests and who threw in assert(true) everywhere just to get them to pass.
  • Instead of scrolling through long lists of Agile Certifications, we got to observe their planning process firsthand and see how their team worked together.
  • Instead of hearing how user-centered they were, we were able to see how they talked to a user and incorporated it into their designs.
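To make the first bullet concrete, here is a minimal sketch of the difference between a test that was thrown in just to pass and one that actually checks behavior. The `slugify` function and both test names are hypothetical, invented for illustration; they are not taken from any company's submission.

```python
def slugify(title: str) -> str:
    """Hypothetical function under test: turn a title into a URL slug."""
    return "-".join(title.lower().split())


def test_vacuous():
    # Passes no matter what slugify does, so it verifies nothing.
    assert True


def test_meaningful():
    # Fails if slugify stops lowercasing or joining words correctly.
    assert slugify("Hello Big World") == "hello-big-world"
    # Surrounding whitespace should not leak into the slug.
    assert slugify("  FLASH  ") == "flash"
```

Both tests show up as green in a test runner, which is why a passing suite on its own told us little and reading the code mattered.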

You can see how this was a big step up from written proposals. The longtime government employees working with us on the contract started out skeptical but became huge fans. One who had managed complex tech programs for decades told me that she had developed a good sense for when contractors were glossing over something important, but actually being able to ask an engineer to review their code and know for sure was a game changer. Some of the contractors started to tell us how much they hated spending months preparing proposals and how much they preferred to demonstrate their abilities by actually writing code.

Second, we aggressively sought to get companies that were not traditional government contractors to bid on the contract. This is hard, because the world of discovering and competing for government business is pretty complex and requires a lot of specialized knowledge. We spread the word about FLASH far and wide, and worked directly with newer companies to help them get registered as contractors and understand what they needed to do to compete. We didn’t show them any preference once they bid, but tried hard to get them in the door.

Finally, we set out to enable DHS projects to bring teams on board with FLASH extremely quickly. That meant that from the time a project put out its request, or Task Order, it would have developers writing code in roughly a month. This was unheard of in government, and many told us it was impossible. But with strong support across the department, we were able to streamline many of the steps involved in the ordering process such as various reviews and security clearances. We were also committed to keeping the number of companies we awarded spots on FLASH pretty small: 8–12 companies. We wanted every company on the contract to be highly vetted and extremely capable, so that a small team at FEMA or the Coast Guard would be able to get quality services even if they didn’t have the people in-house to do their own tech review. Giving spots to too many companies would make that impossible, as every individual Task Order competition would be longer and more complex.

Still the wrong FLASH. (Source: Wikimedia)

We first announced FLASH in May 2016 and got feedback from companies about the process for the next two months. We released the formal Request for Proposals in August and ran the technical challenges in September. Shortly after that, I found myself spending my weekend reviewing the results of the challenges and deciding on the winners. We announced 13 winners in November 2016. It was one more than we originally planned, but we were confident they were all outstanding and could do the work.

We hoped it would be smooth sailing from there, but I soon became familiar with another important part of government contracting — protests. Government has an obligation to be fair in spending the public’s money. Companies that think they were unfairly denied a contract can issue a formal protest to the Government Accountability Office, an independent office that reports to Congress, or go to the courts. This is a good thing. It’s why, for example, the government can’t give away Federal land to build a new Trump Hotel without seeing if another company will give it a better deal.

Several of the losing companies had filed protests alleging that they deserved spots on the slate of FLASH winners. We reviewed their protests and our own documentation, and found there were a few cases where our justifications were lacking. So we underwent what is called Corrective Action — in this case it meant redoing part of our analysis of the different bids and making a new decision. This took a while, and in March 2017 (just a few weeks before I left the government), we announced a new decision with 11 winners. One of the original 13 had dropped out and we had decided one other shouldn’t have actually won.

A few days after that, as I was cleaning out my office and packing for a long vacation, we received even more protests on the new decision. Contract protests aren’t a one-time thing — companies have the right to protest after every new action the government takes. I was off to start my sabbatical, so it was up to others on the team to navigate the process from then on.

Part Three: The Cancellation

That made it all the more shocking when last week, the news came out that DHS decided to cancel FLASH altogether. After thousands of hours of time invested by my old team and others at DHS, and even more put in by the companies who bid, the Department determined errors in the process were so serious that they needed to stop it completely, rather than working to tweak things to find a solution, as almost always happens.

In many situations like this, government agencies make the problem go away by giving everyone who protested a spot on the contract. This can sometimes work, as each individual project using the contract will run its own competitions, or Task Orders, among the pool that makes it through. If some of the companies didn’t deserve to be on the slate, they wouldn’t win these Task Order competitions and wouldn’t actually get work. But it would have killed FLASH, as we wanted to allow projects at DHS to run quick competitions against the smaller number of companies on FLASH without needing to do their own deep technical reviews. Letting a few weak companies on just to get past their protests would have made that impossible.

It’s incredibly difficult to see something I worked on for over a year meet this fate, but I ultimately think it was the right call. We screwed up, and the mistakes were serious enough that correcting them would have been impossible or turned FLASH into something that no longer did what it was supposed to do. DHS’s “motion to dismiss” (warning: PDF link) document, which effectively ended the process, goes into detail on some of the issues. Ultimately, I think there are a few foundational areas that got us to this point which future efforts should learn from.

To start, we went way too big with FLASH, way too quickly. We knew the demand for these services was high across DHS and released a $1.5 Billion contract that would serve many big projects for several years. This helped attract over a hundred bids, and we weren’t ready for it. While the techniques we were using had been proven out in other agencies, the teams involved at DHS didn’t have experience with them, and the techniques didn’t scale the way we thought they would. We should have put a much lower dollar value on FLASH to start and run a larger contract later on.

We should never have gotten into a position where 100 companies were participating in half-day technical challenges. Running the challenges became a logistical nightmare: we had to scramble to open a second location to hold everyone and had to usher over 1,000 people in and out of our buildings over the course of just three weeks. We had to schedule contractors to come in for their challenge with very little advance notice, which led to some feeling they had inconsistent experiences. Multiple companies were participating in the challenge at once, meaning our teams were literally running between different conference rooms rather than focusing all of their attention on one team.

More importantly, the number of companies participating also meant we were dealing with hundreds of pages of writeups of the results of those challenges. In the rush to award FLASH and meet the needs of many DHS programs, we weren’t as thorough as we needed to be in producing these writeups. This resulted in us saying inconsistent things across different companies, which was one of the biggest reasons FLASH was on track to lose its protests. It also contributed to one of the more glaring errors that’s being reported: making edits to these documents after they were considered signed and final to try to fix inconsistencies noticed late in the game. I’m confident we picked the right winners, but we didn’t document that properly, and having a well-documented process is key to ensuring the procurement was fair.

To fix this, we should have used what’s called a down select: asking for an initial, less intensive application and then inviting only a smaller number of companies to participate in the longer challenges. We decided not to do this because companies could have then protested not making it past the first phase, but it ultimately would have been worth the hassle. In the future, I could imagine remote coding challenges for a down select that require companies to write code that hits provided APIs, with the submitted code then automatically scanned for evidence of basic good practices. This could help narrow the field before conducting deeper reviews.
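As a rough sketch of what one such automated check could look like, the snippet below flags assertions that can never fail, the assert(true) pattern we had to find by hand during FLASH. It is an illustration under stated assumptions, not something we built: the function name `find_vacuous_assertions` and the sample submission are hypothetical, it handles only Python source, and a real down select would need many more checks than this one.

```python
import ast
from typing import List


def find_vacuous_assertions(source: str) -> List[int]:
    """Return line numbers of assertions on a constant truthy value."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        # Statement form: `assert True`, `assert 1`, `assert "ok"`.
        if isinstance(node, ast.Assert):
            if isinstance(node.test, ast.Constant) and node.test.value:
                flagged.append(node.lineno)
        # unittest form: `self.assertTrue(True)`.
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "assertTrue"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and node.args[0].value
        ):
            flagged.append(node.lineno)
    return sorted(flagged)


# Hypothetical submission: one vacuous test, one real one.
SAMPLE = '''
def test_total():
    assert True

def test_real():
    assert add(2, 2) == 4
'''

print(find_vacuous_assertions(SAMPLE))  # prints [3]
```

The scan parses the code without running it, so it is safe to apply to untrusted submissions. A check like this can only narrow the field; it says nothing about whether the remaining tests are well designed, which still takes a human reviewer.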

Beyond the challenges, we completely screwed up the pricing portion of our evaluation. We asked companies to submit price ranges for a variety of different staff, but were vague about what skill levels we were looking for. These price ranges never would have mattered much because every future task order using FLASH would ask companies on the contract to propose specific pricing for that project. We needed to figure out a more useful way to evaluate price at this early stage, or just not care about it and focus on technical competence.

Conclusion: Where do we go from here?

These people couldn’t be more wrong. While FLASH was flawed, it showed the tremendous potential of running a contracting process that rewards excellence at designing and building working software, rather than competence in writing proposals and navigating bureaucracy. The technical challenges, exhausting as they were to execute, were viewed by government and contract teams alike as a far better way for companies to demonstrate their skills. FLASH also got many innovative firms thinking about doing business for the government for the first time. And the demand for contracts like FLASH isn’t going away. During my last few months at DHS, more and more projects were abandoning old school waterfall development and lining up to use FLASH or similar approaches to move to an agile model.

We could have run FLASH better as an individual procurement, but this experience also highlights the need to reform how technology procurements are run across the government. While debriefing FLASH, DHS’s Chief Procurement Officer said: “We had an agile process for doing the solicitation and the evaluation, but we fell back to a waterfall documentation process.”

This gets to the heart of the problem. We can innovate in how we recruit and evaluate companies, but ultimately procurement regulations require massive, waterfall-style documentation to be produced at the very end. The protest process enforces this practice: Government Accountability Office staff only get involved after protests are filed and do their work by reviewing final documents after decisions have been made, during months-long periods where no work can take place. What if, instead of coming in at the end, GAO auditors participated throughout the entire process during particularly new and challenging efforts like FLASH, doing even more to keep the procurement fair without adding months of delays?

We also need to address the incentive structure behind protests in the first place. Let me be clear: protests are a vital part of ensuring government funds are spent wisely and fairly. But the way the system works today, the costs for companies to protest every decision they have even a shred of hope of winning are far lower than the costs to the government of seeing vital acquisitions delayed without end. A 2013 study showed that less than one percent of protests “resulted in the objecting vendor winning the work,” while the 1,600 protests in the period studied surely delayed many important government services for months if not years. Something is very wrong if companies find it worthwhile to file increasing numbers of protests with so little likelihood of actually winning the business they sought in the first place. There’s no easy answer here, but we need a system that allows the government to continue serving the public while maintaining the integrity of the acquisition process.

Finally, procurement challenges show the need for continued focus on bringing more technical talent into government service. For decades, the overwhelming majority of practicing software engineers and product designers working on government services have worked for contractors, not the government itself. This ratio will likely never completely flip (and it probably shouldn’t), but it has been far too unbalanced for far too long. A single non-technical government employee overseeing hundreds of contract developers (a fairly common occurrence) will never be able to make the best acquisition and project management decisions no matter his or her intentions. We need more engineers, designers, and product managers working as Federal employees and involved in contracting decisions. I’m encouraged to see USDS continue to receive funding from the Trump administration and support from the new Office of American Innovation. Bringing technologists into government isn’t a partisan issue, and there’s a lot more to be done.

Despite what some who have spent decades wasting taxpayer dollars would like to believe, modern software development isn’t going away. The failures of massive contracts, proprietary off-the-shelf systems, and expensive hardware are well documented and understood by Republicans and Democrats alike. FLASH shows that the process of integrating a modern technical approach with the government procurement system isn’t simple and will take time to perfect. But this work couldn’t be more important. If the government is ever going to stop wasting billions on technology and delivering sub-par citizen services, future acquisitions need to learn the lessons of what worked from FLASH and other early efforts, fix what didn’t, and keep iterating until they get it right.

Tech, data, and design for a better country. Former @USDS @DHSgov, @ObamaWhiteHouse, and @GooglePolitics.