Software: Sand Castles or Concrete Structures

I have always been uncomfortable with the fragility of software: make a small change and the software can grind to a halt. In this light, I have heard arguments about why engineering software is different from engineering buildings. However, I was never convinced by them because I was never able to put my finger on such a difference, if it truly existed.

When I was reading the May 2017 issue of Computing Edge magazine, two snippets from Gerard Holzmann’s Code Craft section of the magazine caught my attention. Contrary to the original purpose of the snippets, they kinda helped me understand the difference between software engineering and physical engineering disciplines.

Why build sand castles?

“The often-made analogy between constructing a bridge or a house and constructing a software system is therefore flawed. Building software is perhaps closer to constructing a sand castle at the beach or playing a game of sticks. Building sand castles instead of concrete structures requires different skills and tools.” (snippet 1)

While I agree with Holzmann’s observation that building sand castles instead of concrete structures requires different skills and tools, my concern is this: why are we building sand castles instead of concrete structures? We do not build our buildings to be fragile like sand castles. We build them to be robust. So, why do we want software to be fragile like sand castles?

Because we are committed, incorrectly, to building sand castles, we conclude that building them requires different skills and tools. Consequently, we spend resources on improving how we build sand castles when we should not be building sand castles at all.

Don’t get me wrong. I agree with Holzmann that we should improve our ways of building software. However, before improving our ways, we should change our way of building software. Specifically, we should build software to be like concrete structures: robust to both internal and external changes.

This brings me to the second snippet.

Is engineering software systems inherently harder than engineering physical structures?

“Modelling and analyzing the design of a physical structure might be easier than modeling and analyzing the reliability of a complex software system. I say this because a large bridge or tall building is unlikely to come tumbling down if a single rusty nail hiding somewhere deep in its internal structure unexpectedly breaks. Similarly, paint peeling off at one end of a bridge is unlikely to cause the roadway to collapse at the other end. Yet this type of thing can happen in large software systems.” (snippet 2)

Software is a human creation with (almost) no physical representation. It is all based on man-made concepts and artifacts. Unlike in bridge engineering, we do not wrangle with physical forces in software engineering. Hence, we should have absolute control over engineering software and have a highly predictable engineering process. Yet, we observe that modeling and analyzing software systems (an integral part of the engineering process) is harder than modeling and analyzing physical structures. What are we missing?

Let’s look at it from another angle. What makes a bridge not crumble when a rusty nail in it breaks or its paint peels off? Here’s what I think.

  1. Each component used in the bridge has well-defined characteristics. For an adhesive, say, these include what it can do (e.g., bind dry surfaces), what it cannot do (e.g., bind smooth surfaces), what it can withstand (e.g., water), when it will break (e.g., when exposed to acetone), and how it will fail (e.g., liquefy). These characteristics make components readily usable off the shelf.
  2. Each composition of components used in the bridge has well-defined characteristics, e.g., its intended purpose (e.g., buttress), what it can withstand (e.g., maximum load capacity), what is required of its components (e.g., rigidity), when it will fail (e.g., under lateral force), how it will fail (e.g., buckle), and how it should be constructed. These characteristics make compositions readily applicable off the shelf.
  3. Construction engineers are well-trained to install components and compositions to construct the bridge. Further, they have techniques and processes to test if installations will serve their purpose as intended by the design of the bridge.
  4. The bridge experiences various stressors, e.g., strong winds, ground tremors, vehicle movement, and even stresses due to the failure of components, e.g., a bolt coming loose. The influence of such stressors on the bridge is systematically considered and factored into the engineering of the bridge.
  5. Determining the exact influence of stressors in all possible situations on various components of a bridge at various points in the lifetime of the bridge is (almost) impossible. Likewise, it is impossible to determine exactly if and when a component can break. Hence, the engineering process overcompensates to make the bridge robust against such possibilities.
  6. Consequently, when a bolt comes loose, it does not take the entire bridge down because other bolts bear the extra load and keep the fastening intact (at least until a well-known load limit is reached). This can be viewed as limiting the scope of influence of a component. Both this and the overcompensation are possible only due to the ability to determine, quantify, combine, and reason about the scope (including magnitude) of influence of components on other components and on the entire bridge (at least in aggregate).

If the above observations are indeed true of physical engineering disciplines, then they seem to be serving those disciplines well because most concrete structures are still standing :) So, why aren’t we embracing these observations in software engineering?

I think we have tried to embrace these observations in software engineering but we haven’t got it right yet. Reusable abstractions (functions, objects, and components) are intended to serve as components (1 listed above) while design patterns and component frameworks are intended to serve as and enable compositions (2 listed above). However, most realizations of these concepts lack well-defined characteristics (e.g., performance, security) that would allow them to be used in an off-the-shelf manner. Consequently, it is hard to train engineers to use and deploy these realizations in a predictable manner (3 listed above). Further, it is hard to reason about these realizations and the systems built using these realizations (6 listed above). Consequently, it becomes harder to reason about how stressors can affect the systems (4 listed above) and how to overcompensate to ensure robustness (5 listed above).
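
To make "well-defined characteristics" a bit more concrete, here is a minimal Python sketch of a software component that states what it can do, what it cannot do, when it fails, and how it fails. The names `BoundedQueue`, `QueueFull`, and `capacity` are hypothetical and invented purely for illustration; this is one possible shape under those assumptions, not a recipe.

```python
from collections import deque


class QueueFull(Exception):
    """The documented failure mode: raised instead of growing without bound."""


class BoundedQueue:
    """Hypothetical component with explicit, checkable characteristics.

    Can do:     hold up to `capacity` items in FIFO order.
    Cannot do:  hold more than `capacity` items.
    Fails when: `put` is called on a full queue.
    Fails how:  raises QueueFull and leaves the existing contents untouched.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: deque = deque()

    def put(self, item) -> None:
        # Enforce the stated limit before changing any state.
        if len(self._items) >= self.capacity:
            raise QueueFull(f"limit of {self.capacity} items reached")
        self._items.append(item)

    def get(self):
        # Raises IndexError if the queue is empty.
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)
```

A caller who reads only the docstring knows the load limit and the failure behavior up front, much like reading the data sheet of a bolt. Characteristics such as performance or security are much harder to state this way, which is part of the problem described above.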

So, no, software engineering is not inherently harder than physical engineering disciplines.

As for the reasons for this acquired hardness, I suspect the concepts we use could be wrong, or the realizations of those concepts could be wrong, or both. However, my strongest suspicion is that we lack an understanding of the relations between concepts/realizations and of how a change in one concept/realization affects another. This is what we should tackle first: gain an understanding of the relations between concepts and realizations, as it will help us either fix the situation or show why our concepts and realizations are wrong and need fixing.

Can we do better?

While it may be hard, we can do better. This would require us to revisit, reconsider, and replace or redefine prevalent concepts and constructs used to describe, build, and reason about software.

Considering the recent changes in the software development community, we may already be doing better by exploring and adopting “alternative” programming concepts (e.g., immutability, purity, explicit ownership), containerization and microservices (e.g., isolation, explicit scopes, predictability), and web APIs (e.g., composability).
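
As a small illustration of how one of these concepts, immutability, limits the scope of influence of a change, consider the following Python sketch. `Config` and `with_timeout` are hypothetical names chosen for the example; the only library features assumed are the standard `dataclasses` module's `frozen=True` option and its `replace` function.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Config:
    """Hypothetical immutable settings object shared across a program."""
    host: str
    timeout_s: int


def with_timeout(cfg: Config, timeout_s: int) -> Config:
    # Return a modified copy; the original object is never changed.
    return replace(cfg, timeout_s=timeout_s)


shared = Config(host="example.org", timeout_s=30)
local = with_timeout(shared, 5)   # a local change, visible only to its holder

try:
    shared.timeout_s = 1          # in-place modification is rejected
except FrozenInstanceError:
    pass                          # other users of `shared` are unaffected
```

Because `shared` cannot be modified in place, a change made by one part of the program cannot silently reach the other parts that hold the same object, which is the software counterpart of a loose bolt not bringing down the bridge.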

While I am not sure if the current changes will suffice to build software like concrete structures, I am sure that better ways to consider, measure, predict, enable, and reason about the robustness and reliability of software will help a lot, especially in the face of both internal and external changes.

Postscript: I don’t know if Holzmann thinks we should continue building software like sand castles.