Let’s Return Upfront Design to Software Development

If you think the quality of apps and operating systems is declining fast, it’s not your imagination.

David Ziffer
Feb 18, 2016

I’ve been writing software professionally for almost forty years, and in all that time I’ve found that there are two things that divide the winning projects from the losers:

  1. winners have a great upfront design, and
  2. winners have the discipline to implement that design.

That’s it. There’s no magic formula. There is no magic methodology, despite the army of consultants who will tell you otherwise. There are no magic development tools or languages; such things are relevant only to the extent that they might help you implement a great design.

The prevailing culture

The most unfortunate misconception in software development today is the idea that there is little or no design involved in writing software. In the minds of most people, it seems, software is nothing more than an amorphous blob of text — keystrokes typed into a keyboard. There is nothing solid about it at all, nothing that cannot be changed at a moment’s notice. If you write some software and it turns out to be not what you want, you just change it. It’s just text, after all, right?

This is perhaps the greatest and costliest fallacy in the business world. It’s a belief that could be held only by non-technical people and by the most fantastically naïve and inexperienced programmers. Anyone with real industry experience who has written anything larger than a simple class project knows differently; anyone who doesn’t must be too dense to see the obvious.

The primary force behind this failed idea is a pervasive ideology called “Agile”. According to an article entitled “Agile Design”, the entire extent of the software design process should be limited to something called “Model Storming”, which is “light-weight modeling for a few minutes on a Just-In-Time (JIT) basis to think through an aspect of your solution.” That’s it. According to this bit of published wisdom, there is no design problem that cannot be solved by persons of unknown qualifications brainstorming for just a few minutes. This same article also states that software design is “emergent” and “not defined up front” and that “your unit tests form much of your detailed design documentation”. All of these notions are utterly preposterous, but the non-technical reader, which includes many if not most managers of technical staff, cannot grasp this.

It is difficult to imagine the incompetence and inexperience of a person who would make such statements, and who would further have the bravado to commit such nonsense to print. Do you really believe, in your heart of hearts, that complex technical problems are this easy to solve? Could anyone imagine such a thing in any other field of endeavor? Would you believe that a working car can be designed by small teams of junior craftsmen, iteratively “modeling” small components and eventually assembling them? How about a building — could we build a skyscraper by designing the whole thing one room at a time and then incrementally adding more rooms?

Imagine building a bridge using the stated Agile methodology and the staffing practices of today’s typical software project. We’d hire some contractors, preferably junior ones to keep costs low; the only hiring criterion would be the applicant’s familiarity with the tools we’re using. We’d set up a task system to funnel tasks from the project manager to the builders. Requirements would be collected; they would consist primarily of the stated need to have a road that crosses the river. Construction would start immediately. Nobody would spend more than a few minutes per day, iteratively conceiving a “just in time” model of the bridge design. Contractors would just start on one side of the river, improvising a bridge day by day, tacking on piece after piece. Every time the initial section got large enough to collapse, we’d spend a few minutes improvising another approach. Never mind the casualties.

Preposterous, you say? Doomed to failure? Of course. Only in software could we fantasize that such a method would produce a workable result. But fantasize we do, day after day in thousands of software projects around the world. Yes, in software you can actually get away with it, at least for a while. In software you might actually slap up something that looks like a bridge and it might even work for a short time, but only for a short time; the changes and pressures of reality would quickly bring it down and thereafter we’d need an army of workers to keep propping it up.

The obviousness of this failure has not prevented the wide adoption of Agile, however. Agile is popular because it’s exactly what most of the people in the industry want to hear. Non-technical managers desperately want to believe that they can manage technical projects by simply following a process. And for whatever reason, most programmers despise any form of design; they want solely to crunch out code, and they think fixing an endless stream of bugs is a normal part of programming. Perhaps most of all, they hate documenting and refactoring their code so as to make it readable to others. They get a kick out of seeing code run; the long-term consequences of their coding practices are of no concern to them. Few managers understand anything about programming, and so they cannot fathom the consequences of these behaviors, much less rein them in. So we have a culture in which the innate undisciplined desires of junior programmers control the dominant practice, and we have an ideology that has become popular by extolling such practice.

There’s another way …

In my first major project, where I was the team lead, there were two different kinds of processors being used in one solution. Another team had been working on a software package to support one of the processors, and my team was asked to copy that package as closely as possible for the second processor. I promptly went to the other team to get a copy of their software so that we could adapt it.

When I reviewed the software I realized two things:

  • the other team’s design was unworkable and
  • the other team was apparently not disciplined enough to even organize its code properly, let alone apply formal design methods.

I spent many hours in analysis determining this, plus days more redesigning the project. I knew that if I used the software base I’d been given, we’d have no chance of success. A solution of this complexity required mathematical modeling, without which there could be no reason to expect the software to work. Even as a recent computer science graduate I knew that you cannot improvise or iteratively develop a complex solution by just brainstorming or sitting down and writing code, any more than a sniper could hit a target a half mile away by hip-shooting with a pistol.

The story has a happy ending, for my team at least. We developed the package within a year, and it was an astounding success. The story with the other team, which had started before us and employed more people, was not so happy. They lingered on at least a year beyond the end of our project. Eventually another programmer from outside both projects was assigned to adapt our package to the original team’s processor. So our solution ended up being the total solution for both processors; the other team was folded after having burned uselessly through probably a million dollars.

If you’re not in from the start …

More recently I was on a project where I didn’t have the opportunity to design things from the start. The code had been developed using today’s prevailing methodology; it was written directly off of the requirements, with no intervening design phase. The project suffered from the usual malaise:

  • The code was very difficult for a new person coming on the project to comprehend.
  • It contained multiple and sometimes inconsistent implementations of the same functions.
  • Making changes was difficult and risky because system behavior was unpredictable.
  • There were frequent failures because nobody could truly understand how the code worked; much of it was a nearly incomprehensible labyrinth.

This is the fate of every project I’ve ever seen that used seat-of-the-pants design. The only way out was to spend some serious time — days at a time — thinking about a structure that would actually work. I tackled the problem piecewise by identifying each set of functions that needed to be re-implemented and then building new sub-systems, one at a time, to replace the existing code. I chose to rewrite rather than refactor because the original code was too confusing to refactor. Using this piecewise replacement, I was able to use up-front design on a project that was already well underway.
Oh yes, the result. Our error rate, along with the financial losses of the company due to errors, dropped to approximately zero within a year.

Why incremental design generally doesn’t work

People get sucked into believing in incremental design because on a very small scale it works. You can actually build working trivial software solutions by just sitting down and coding. This is all especially attractive to junior programmers, who often have no aptitude (and consequently no enthusiasm) for designing things; it’s so easy for them to believe that real-world problems are all as simple as the trivial exercises they were given in their 12-week boot camps.

An example of how this backfires: a common design problem in software projects is the absence of a mechanism for deleting data. I find that most people in most organizations, including account service agents, business analysts, project managers, department managers, and programmers, are almost incapable of imagining data deletion until the need arises. Clients don’t imagine it either. The order of popularity in the stated requirements is always:

  1. Adding data
  2. Updating data
  3. Deleting data

Usually “deleting data” is so unpopular that it isn’t mentioned at all; the first time anyone thinks about deletion is after delivery, on the day when the client calls asking how to delete something.

Projects tend to fear actually deleting records, so they almost always opt for “soft deletion”, which is where you have a “deleted” column in each table indicating that the record has been deleted and therefore should be ignored by all readers. The project was developed iteratively, of course, which means that no experienced designer was hired at the outset to anticipate this simple problem.

The developers solve the problem by adding “deleted” columns to the tables requiring deletion. In keeping with Agile minimalism, they do this only for the currently required tables; if more tables require deletion later, the programmers will have to undertake this process again. But we don’t worry about the inconsistency or the future effort; with Agile we leave tomorrow’s problems for tomorrow.

The problem with this solution is that it doesn’t work. The project is already built in standard haphazard distributed fashion. There are dozens or perhaps hundreds of queries scattered throughout the database code, the middleware, and possibly even the UI code, that do not know about the new deletion flags; all these queries continue to recognize the records that are marked “deleted”. Because of the unsystematic and non-centralized nature of these queries, finding and fixing them all is somewhere between prohibitively expensive and impossible.

Even if someone actually finds all the queries, they’ll be fixed in the quickest, dirtiest manner possible: to each query the developer will add an independent test for the “deleted” flag (rather than redirecting it to a centralized view or other mechanism that implements that test in one centralized place). When our system requires yet another change in the way that data is read, we’ll have to repeat all this effort. It’s called “rework”, and it’s one of the many hidden costs of building software badly.
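The centralized mechanism mentioned above can be sketched briefly. This is a minimal illustration, not the project from the story; the `orders` table and its columns are invented for the example. The soft-delete rule is written exactly once, in a view, and every reader goes through the view:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        item    TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0   -- soft-delete flag
    );
    -- The deletion rule lives in exactly one place, instead of being
    -- repeated in every query scattered through the code base.
    CREATE VIEW active_orders AS
        SELECT id, item FROM orders WHERE deleted = 0;
""")
conn.executemany("INSERT INTO orders (item, deleted) VALUES (?, ?)",
                 [("widget", 0), ("gadget", 1), ("gizmo", 0)])

# Readers query the view and never mention the flag.
rows = conn.execute("SELECT item FROM active_orders ORDER BY id").fetchall()
print([r[0] for r in rows])  # ['widget', 'gizmo']
```

With this structure, a future change to the reading rule (say, also hiding archived records) is a one-line change to the view, not a hunt through dozens of scattered queries.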

Perhaps the saddest part is that nobody on the project realizes that there was another way to do things. The non-technical department managers cannot even imagine how it was possible to build a system in which data could not be deleted. Even if they’ve been through the same problem before on a previous project, the programmers, who are prisoners of the incremental design ideology, cannot imagine that such a thing could have been anticipated. Finally, the low-level project managers measure success in terms of how many sprints they execute; they have no conception that it would have been even more successful to have avoided the need for this new set of sprints in the first place.

Of course it is not possible for even an experienced designer to anticipate every future need of a project. But a competent upfront designer can spare you the cost of childish mistakes like this one, and further can design your system such that unanticipated needs can be more easily accommodated.

How software gets built today

Microsoft distributes a bunch of free videos that help developers use Microsoft’s development tools. Usually these are very good, but recently I watched one in which the presenter was marveling about the wonders of modern software development. He stated that he was overjoyed about how today’s developers no longer have to bother with building boring infrastructure. I was aghast. Really? Does this guy also believe that Santa Claus delivers presents every Christmas, mysteriously free of charge to everyone? Apparently this presenter is unaware that his projects are indeed building infrastructure, it’s just that now they’re building really crappy infrastructure.

Here’s how software gets built today. The non-technical account services team collects requirements from the non-technical client. These requirements get translated by non-technical project managers into tasks, under the mistaken idea that requirements translate one-for-one into independent bits of code. The project managers generate no tasks devoted to building any sort of common infrastructure because “common infrastructure” is not a requirement. Voilà! We’ve built a project plan that is unencumbered by any design tasks. Through the magic of Agile, we’ve simply removed “design” from the bad old Waterfall paradigm (requirements, design, implementation, testing). No wonder we have more time to code, test, and fix an endless stream of bugs than we used to!

Two or more tasks get generated that both require access to the same data, and these tasks get farmed out to two different developers, each of whom has no idea what the other is doing (since their communications are limited to a five-minute slot each during a “scrum” in which they are admonished to discuss blocking problems only). So the two developers independently develop the infrastructure required to access that data. Having no centralized design for doing such things efficiently, both implementations might be inefficient, and if they are complex enough they will almost certainly disagree. The project now has two disparate implementations of the same function. Since the project has no central repository for infrastructure, nobody except the two implementers knows where these bits of code are located.

Now multiply the above scenario by hundreds or, in large projects, by thousands. Instead of being centrally managed, well controlled and efficiently written, the project infrastructure is massively replicated in bits of code that are scattered throughout, embedded in whatever features happen to need the functionality. The implementations were conceived in a big hurry and so are not necessarily efficient, consistent, or even correct, and furthermore each one is buried within other functionality and so it is not separately unit-testable. But everyone is congratulating himself because he made his sprint on time. By skipping the design phase, the project has managed to throw up more code in less time than a well-designed effort.

The bill comes due when the infrastructure needs to change. Nobody even knows where the infrastructure is. The many copies of the data-access code may be written in different languages and distributed among various sections of the application and/or the database, so there is no deterministic way to search for it; you cannot know whether you’ve found it all because the code is not organized in such a way as to convey a sense of totality.
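For contrast, here is a minimal sketch of what centrally owned infrastructure looks like; every name in it is invented for illustration. Two features need the same data, but both call a single access function, so a fix or behavior change to how the data is read lands in exactly one place:

```python
# One shared data-access function instead of two private reimplementations.
# All names and data here are hypothetical, invented for this sketch.
_CUSTOMERS = {42: {"name": "Acme Corp", "region": "US"}}  # stand-in data store

def get_customer(customer_id):
    """The single, centrally owned way to read a customer record."""
    record = _CUSTOMERS.get(customer_id)
    if record is None:
        raise KeyError(f"no customer {customer_id}")
    return record

# Feature A: billing needs the customer's name.
def billing_name(customer_id):
    return get_customer(customer_id)["name"]

# Feature B: shipping needs the customer's region.
def shipping_region(customer_id):
    return get_customer(customer_id)["region"]

print(billing_name(42), shipping_region(42))  # Acme Corp US
```

The point is not the trivial code but the ownership: there is exactly one implementation to find, unit-test, and change.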

My physical analogy is to imagine a car being built using the Agile prescription. There are five sets of user-visible systems requiring an underlying electrical system: the radio, exterior lights, interior lights, instruments, and engine. These tasks would be farmed out to five junior builders, each of whom would incorporate his own electrical system to drive his specific component. None of these ad hoc electrical systems would work particularly well, since electrical systems are not the specialty of their implementers. There would be no call for a single, efficient, unified electrical power supply because there is no “power supply” in the requirements, and the subject of building common components simply does not arise in team meetings. So we deliver a car with five batteries and alternators. In the world of cars, where such design flaws are visibly evident, the product would be rejected on the loading dock. But in the world of software, where fantastically huge and ugly structures are completely invisible to the buyer, even horrific design flaws are undetected.

At a higher level, there is simply no way for the typical modern development project to deal with the behaviors of complex systems. In order to write a large system that does not constantly exhibit unpredictable behaviors, you need designers who are capable of building mathematical models (e.g. “finite state” models) of the system behavior well before the system is coded. Without such modeling and without the resulting architectural specs (like the architectural specs for a bridge), there is no way for the developers to structure their code so as to actually work.

After coding, the project moves on to testing (though in a “test first” project the tests are written first). The modern project strategy is to farm out the test development tasks in much the same way it farms out coding tasks. It ends up with a large number of distributed unit tests; integration testing is often left to QA staff who tend to perform ad-hoc tests by hand.

The modern project prides itself on how much testing it does, completely unaware that it would be many times more efficient to use formal modeling and proper architecture, which would enable the project to effectively do orders of magnitude more testing using orders of magnitude fewer tests. In the real world, there are potentially billions, trillions, quadrillions or more combinations of external stimuli to which the application might be exposed. There is no way to construct enough tests to anticipate all these combinations. The only way to deal confidently with this complexity is to use formal models and a highly disciplined architecture that strongly limits the number of states that the application might assume in response to all these possible stimuli. In a well-architected project (where behavior is controlled by a very small amount of well-organized infrastructure), it is possible to impute the system’s responses to quadrillions of combinations of stimuli using just a handful of tests. In the poorly or non-architected project, the only way to be sure of such things is to write quadrillions of tests.
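The claim about imputing responses to vast numbers of stimuli can be made concrete with a toy finite-state model; the states and events below are invented for illustration. Because the transition table strictly limits what states can exist, an invariant can be verified over everything the system can do with a handful of assertions, rather than with one test per stimulus sequence:

```python
from itertools import product

# A toy finite-state model of an order's lifecycle (invented for illustration).
# Any (state, event) pair not listed here is simply ignored.
TRANSITIONS = {
    ("new",     "pay"):    "paid",
    ("paid",    "ship"):   "shipped",
    ("paid",    "refund"): "refunded",
    ("shipped", "return"): "refunded",
}
EVENTS = ["pay", "ship", "refund", "return"]

def step(state, event):
    return TRANSITIONS.get((state, event), state)  # illegal events change nothing

# Real systems face astronomically many stimulus sequences. Here we sweep
# every length-4 sequence (256 of them), which visits every reachable state,
# and verify one invariant throughout: a refunded order can never ship.
reachable = set()
for seq in product(EVENTS, repeat=4):
    state = "new"
    for ev in seq:
        state = step(state, ev)
        reachable.add(state)
        assert not (state == "refunded" and step(state, "ship") != "refunded")

print(sorted(reachable))  # ['new', 'paid', 'refunded', 'shipped']
```

The model guarantees there are only four states, so checking the invariant on those four covers every sequence of stimuli, of any length, that the system could ever receive.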

A depressing situation

The business world today is caught up in a fantasy of reducing costs that cannot be reduced. I refer them to Einstein, who said, “Everything should be made as simple as possible, but not simpler.” When you run out of firewood, the imbecile points out the exceedingly clever low-cost solution: just burn your own furniture. Today businesses everywhere are saving time and effort by burning their own furniture, incurring future liabilities that are far beyond their managements’ imaginations. It is hard to watch.

Recently I had lunch with a friend with whom I formerly worked at a consulting company that is frequently called in to fix software disasters. The company mainly hires very capable senior guys, and while I worked there I don’t recall any discussions about methodology. Senior programmers who write software that really works, I find, tend to discuss design. At the end of our lunch my friend summarized the consulting business: out of every three projects, two will suck and one might be fun. That sounds about right; I’d say two out of three projects I’ve seen are nearly intractable messes.

I hold little hope that this article or anything else will do much to change the landscape of software development. The PR in this business is completely dominated by the Agile consulting industry, which has littered the Internet with so much propaganda that it is almost impossible to find any competing viewpoints. If you search specifically for articles that either disparage Agile or that support some competing idea, you will find hundreds of high-ranking industry-authored articles that are cleverly written to first sympathize with your point of view and then to very professionally coach you back to the one and only Indisputable Truth, which is that Agile is the One Way to do software development. I am unaware of any competing consulting industry that supports the commonsense notion of good design, which is not surprising. There just aren’t enough competent software designers to properly serve the now-enormous software development industry, and those few who exist are busy designing software, not running management consulting firms (how many Agile consultants could write a single line of working code?).

The entire “design as you go” philosophy, regardless of what name you slap on it, is utterly bankrupt, and if you follow it you stand a good chance of going bankrupt as well. There are some very limited small endeavors where you can iteratively design things on the fly — creating game characters, for example. But notice the characteristics of this example:

  • the task is a very small one;
  • the thing being designed is merely a small component in a larger system that was not designed in such a manner;
  • the larger system provides a well-defined interface into which rapid prototypes can be easily inserted;
  • the success of the component is either highly subjective, or cannot really be determined until after you’ve implemented it, or both.

So practice all the methodology you want. Perhaps you could even lead the next methodology bandwagon. You could start by changing all the names of all your existing practices: your morning meeting could become a Scrimmage and the person who runs it could be the Coach or maybe the Quarterback. Perhaps, in addition to limiting team communications to a few minutes while standing up, you could require everyone to dress in a clown suit or do the Hokey-Pokey. Shorten your sprints to one week instead of two — wouldn’t that logically make you even more responsive (after all what is magical about two weeks)? Maybe you could get your sprints down to a day. Who knows?
Have fun. But I won’t be betting on your success.

Where are we heading?

It is hard to imagine any substantive change occurring while management remains bamboozled by the superficial appearance of short-term success that Agile delivers. My sister Katherine, who writes complex embedded software for the cell phone industry, puts it this way:

“My design-first approach did take much longer to show even one piece of code, but by the deadline, my code was finished and had very few bugs. The ad-hoc agile programmers had pieces of code much earlier than I did, and therefore management was more comfortable with them during the development process. But in the end, it became clear that programmers who did the upfront design produced code with drastically fewer bugs. I think it is partly an issue of trust. If the project timeline is 6 months, and one programmer spends the first 4 months doing design work and writing high-level and detailed designs, and doesn’t produce any code till month 5, there is a large leap of faith required by management. It makes them nervous. Contrast that to the programmer that can show bits and pieces of something small that partially works after just a week or two. Management tends to be a lot more comfortable with that. I think it boils down to a problem of human nature. Most people, including programmers, do not have the desire to do design work. It is a long, tedious process, and requires a great tolerance of delayed gratification. Most programmers therefore will do everything in their power to hold onto a process that does not require design work or documentation, because they hate both.”

There is no need to worry about my sister’s job security, by the way. Despite her finishing her working code on time, her managers always find plenty for her to do following the project deadline — she usually has months of subsequent employment helping the other coders fix and test their broken stuff.

Agile pays lip service to “design”; in fact one of the stated principles in the “Twelve Principles of Agile Software” is, “Continuous attention to technical excellence and good design enhances agility.” We couldn’t agree more, except that essentially everything else in Agile seems dedicated toward excluding “design” from the software development process almost entirely. You don’t need to take my word for it; just read any of the voluminous tomes on Agile and you’ll see either no mention of design at all, or admonitions to absurdly minimize design effort. In my experience, in the most heavily Agile-influenced projects the developers almost never discuss design; their focus is entirely on process and methodology.

Before buying things, customers care about requirements. After taking delivery, customers care about design, or at least the obvious effects of design. Read some Amazon product reviews and see how much ordinary people talk about the designs, both good and bad, of the products they receive. Agile claims to have the customer’s interest as its highest focus, while simultaneously deriding the one thing customers seem to care about most after delivery: the design, or in the case of software, the effects of design. Agile says we should deliver “Working software over comprehensive documentation”, but alas leads us down a path that makes it almost impossible to deliver working software.

Since 1980 the Japanese (and now also South Korean) auto industry has been decimating American auto makers’ sales. The foreigners have done so by delivering cars that can easily run for 200,000 miles or more without requiring major maintenance. The average car buyer has no conception of how Honda, Toyota and Hyundai do this. To get even the faintest concept of their methods you’d need to read a book entitled “Dr. Deming: The American Who Taught the Japanese About Quality.”

It turns out that car quality is all about fanatical attention to design. It’s about using mathematical modeling and other design techniques that are far beyond the imaginations of most people. It’s about employing up-front design to a degree that most people, including most of the people building cars, could scarcely imagine. To put it simply, Honda does not have junior line workers designing auto parts and assembly processes on the fly.

Until recently when the American auto industry thought about design, it was all about styling and about building something that looks enough like a car and runs long enough that the customer will sign the paperwork. This seems consistent with the ethos of most of today’s software projects. In contrast, when Honda thinks about design, it is conceiving of how to assemble transmissions in such a way that the tiny statistical variations in the parts’ physical properties cancel each other out in order to reduce destructive vibration. This is not the province of amateurs.

Line workers don’t design working systems in software any more than they do in the automotive industry. Perhaps someday there will be a revolution that will replace today’s dominant software ideology. If so, I suspect it will be led by some major software vendor that produces demonstrably more reliable products than everyone else. Perhaps you will lead the revolution.
