Reuse reviewed

The idea of software reuse has been recycled many times. At a NATO conference in 1968, “mass produced software components” were already being touted as the answer to the software crisis of the time. The idea evolved into the concept of the “software factory” and then the “software product line”, ways of organising development designed to facilitate and promote code reuse. As the variety and quality of open source software like Node.js and React continue to increase, reuse has become more important than ever before, with some corporations now choosing to model their internal reuse practices on those of open source (so-called “inner source”). Today, the first question we ask when faced with a programming task is “when will npm be back online?”

Let’s see what Seneca the Younger has to say on the topic…because, why not? Is not idea reuse at the core of software reuse? And reuse is a very old idea…

We should follow (people say) the example of the bees, who flit about and cull the flowers that are suitable for producing honey, and then arrange and assort in their cells all that they have brought in

A flower in the literary context means a motto worth reusing. The advice (collected from other people of course) is to collect first. Do Not Repeat.

Yet how many times have we rewritten — and not without cause — the same CRUD endpoints? How many different react-dropdowns can you find on npm? Why?

Now let’s listen to Donald Knuth, who might have a bone to pick with Seneca:

You’ll never convince me that reusable code isn’t mostly a menace

Software reuse is a bottomless topic, with research groups dedicated to it, and a fat volume released every year by the International Conference on Software Reuse. This research exists because reuse does not provide something for nothing. Reuse is not a panacea but a drug, addictive and dangerous. Problems occur in three main areas:

  • Making code reusable
  • Adapting and integrating reusable code
  • Advertising and finding reusable code

Reuse divides into at least two major categories with overlapping but distinct problem sets: component-based reuse and framework reuse. Our focus here will be on component-based reuse. We’ll examine each of the three points above, but only scratch the surface.

Making a component reusable

Making a component reusable is hard. To put some flesh on that assertion, let’s look at some of the methods and challenges we have encountered in creating a reusable article-submission platform for the biosciences journal eLife Sciences. YLD has been working on this project (which is still in early stages) along with the Collaborative Knowledge Foundation (Coko) and other publishers interested in adapting the software for their own requirements.

Frontend components

We have chosen React because it enables us to efficiently and declaratively compose front end components, but we cannot do so if the requirements we are implementing have no patterns in their structure. The process of enabling reuse therefore begins at the requirement sourcing and design phase with a commitment to atomic design. As Lucretius, another Roman, tells us:

Many atoms common to many things are mixed in things in many ways; thus the various things are nourished by things that are various

Scoped CSS (CSS modules or CSS-in-JS) allows us to package styles with our components, but makes it more complicated to override those styles for custom theming. We will probably deploy the provider pattern to inject global overrides anywhere into the component tree, as, for example, styled-components has done. But to provide total control we need to specify extension points on the styles of each sub-component, or else rely on nested selectors. Maybe we try to decouple ourselves from a specific CSS-in-JS implementation by injecting the styled components themselves, but now we have to deal with the possibility that our component will be used in a way outside its scope.
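To make the extension-point problem concrete, here is a minimal, framework-free sketch. Every name in it (`DropdownStyles`, `ThemeOverrides`, `resolveStyles`, the `root`/`menu`/`item` slots) is hypothetical and not taken from our codebase or from any CSS-in-JS library; it only illustrates that a theme can reach exactly those sub-components for which the author has exposed a named slot.

```typescript
// A style rule is a plain map of CSS property to value (illustrative only).
type StyleRule = Record<string, string>;

// Hypothetical style slots for a dropdown; each slot is one extension point.
interface DropdownStyles {
  root: StyleRule;
  menu: StyleRule;
  item: StyleRule;
}

// A theme may override any subset of slots, and any subset of properties in a slot.
type ThemeOverrides = { [K in keyof DropdownStyles]?: StyleRule };

const defaultDropdownStyles: DropdownStyles = {
  root: { display: "inline-block" },
  menu: { border: "1px solid #ccc", padding: "4px" },
  item: { color: "#222", padding: "2px 8px" },
};

// Merge theme overrides over the defaults, slot by slot, without mutating either.
// Anything the component author did not expose as a slot cannot be themed here.
function resolveStyles(
  defaults: DropdownStyles,
  overrides: ThemeOverrides = {}
): DropdownStyles {
  return {
    root: { ...defaults.root, ...overrides.root },
    menu: { ...defaults.menu, ...overrides.menu },
    item: { ...defaults.item, ...overrides.item },
  };
}

// A consumer changes only the item colour and inherits everything else.
const journalTheme: ThemeOverrides = { item: { color: "#0a5" } };
const themedStyles = resolveStyles(defaultDropdownStyles, journalTheme);
```

The cost is visible even at this size: every new sub-component means another slot to name, document and keep stable.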

We need to share data as well as views, so the next level of reusability is generated by taking responsibility for handling state out of our components and putting the state into a globally accessible store (Redux). Now we introduce special higher-order components (“containers”) for publishing and subscribing to this store and compose these with different “presentational” components to get different results.
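The container idea can be sketched without React or Redux at all. What follows is an assumption-laden toy, not the real API of either library: `createTinyStore`, `withSubmissionCount` and the two views are hypothetical names, and a “component” is reduced to a function that returns a string of markup.

```typescript
// A tiny publish/subscribe store, in the spirit of Redux but not its real API.
interface TinyStore<S, A> {
  getState(): S;
  dispatch(action: A): void;
  subscribe(listener: () => void): void;
}

function createTinyStore<S, A>(
  reducer: (state: S, action: A) => S,
  initialState: S
): TinyStore<S, A> {
  let state = initialState;
  const listeners: Array<() => void> = [];
  return {
    getState: () => state,
    dispatch(action) {
      state = reducer(state, action);
      // Notify subscribers (in a real UI, containers would re-render here).
      listeners.forEach((listener) => listener());
    },
    subscribe(listener) {
      listeners.push(listener);
    },
  };
}

type SubmissionState = { submissions: number };
type SubmissionAction = { type: "submission/added" };

const submissionStore = createTinyStore<SubmissionState, SubmissionAction>(
  (state, action) =>
    action.type === "submission/added"
      ? { submissions: state.submissions + 1 }
      : state,
  { submissions: 0 }
);

// A presentational component: props in, markup out, no knowledge of the store.
type CountView = (props: { count: number }) => string;

// The "container": reads from the store and feeds whichever view it wraps.
function withSubmissionCount(view: CountView): () => string {
  return () => view({ count: submissionStore.getState().submissions });
}

const badgeView: CountView = ({ count }) => `<span class="badge">${count}</span>`;
const sentenceView: CountView = ({ count }) => `${count} submissions received`;

// One container, two presentations.
const SubmissionBadge = withSubmissionCount(badgeView);
const SubmissionSentence = withSubmissionCount(sentenceView);
```

Note that `withSubmissionCount` is hard-wired to one store and one slice of state, which is exactly the kind of coupling that limits how far such containers can be reused.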

Soon, though, we realise that many of our state-management components are actually not reusable at all because behaviour and display are still too closely intertwined. Maybe (we have not actually done this) we refactor our higher-order components to be more atomic by restricting them to add a single prop. We need a lot of boilerplate to avoid prop name clashes with the components they wrap and we have to manage the other pitfalls of higher-order components. Perhaps we start to write our stateful components with injectable render functions instead (function as child, render prop). At some point it emerges that if we want to be able to reuse any of these stateful components in the same application, we need to namespace their actions (redux-doghouse).
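The namespacing problem is easy to show in miniature. This sketch does not use redux-doghouse or reproduce its API; `createToggleReducer`, `pageReducer` and the menu names are hypothetical. It assumes the simplest possible scheme, a string prefix on the action type, so that two instances of the same stateful component can live in one store without reacting to each other’s actions.

```typescript
type ToggleState = { open: boolean };
type ToggleAction = { type: string };

// Build a reducer that responds only to actions carrying its own namespace prefix.
function createToggleReducer(namespace: string) {
  return (state: ToggleState, action: ToggleAction): ToggleState =>
    action.type === `${namespace}/toggle` ? { open: !state.open } : state;
}

// Two instances of the same reusable behaviour, distinguished only by namespace.
const authorMenuReducer = createToggleReducer("authorMenu");
const journalMenuReducer = createToggleReducer("journalMenu");

type PageState = { authorMenu: ToggleState; journalMenu: ToggleState };

const initialPageState: PageState = {
  authorMenu: { open: false },
  journalMenu: { open: false },
};

// Every action is offered to every instance; the prefix decides who reacts.
function pageReducer(state: PageState, action: ToggleAction): PageState {
  return {
    authorMenu: authorMenuReducer(state.authorMenu, action),
    journalMenu: journalMenuReducer(state.journalMenu, action),
  };
}

const afterAuthorToggle = pageReducer(initialPageState, { type: "authorMenu/toggle" });
```

Without the prefix, a single `toggle` action would flip both menus at once.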

All this abstraction, dependency-injection and recomposition creates complexity, and the closer we get to the model of the atomists, the more complex things become. Is it any wonder that in a work entitled Anti-Lucretius it was said of “the father of atoms” that:

As if drunk on error, he fell on his own sword

?

Backend components

We would be limited in our ability to reuse frontend components across different applications if we could not reuse the backend code as well, which means that also needs to be extensible. Document stores to the rescue! We will give our server a rudimentary data schema consisting of collections and objects, and allow users to define any fields on these they want. Unsurprisingly, this strategy makes for a painfully generic API. Perhaps our users can add a GraphQL layer to bend it to their needs, but at that point the extra labour has probably exhausted the benefits of reuse. Moreover, we still have all the downsides of working with unstructured data. Finally, if the assumption is that each application will be defining its own arbitrary data types, it is very unlikely that the library of components reusable between projects will grow over time.

The solution we are settling on (still up for discussion) is a modularised SQL schema. Schema components will be able to define migrations from other schema components, adding, for example, a single table with a relationship to a table that already exists. Each schema component will therefore depend on one or several others if the data is related. In turn, backend components adding routes to the server will depend on schema components. Modifications to schema component dependencies should be fine as long as primary keys are not modified, since these constitute, as it were, the component’s API: these are what other components will depend on explicitly. To handle any other modifications we can simply run all migrations in the root dependency, then all migrations in components that depend on it, and so on. If the need arises to modify a primary key in a dependency, it will no longer be possible to run migrations in the order of the dependency tree and ensuring they are carried out in the correct order will become a headache. We hope this need will not arise, but it might!
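Running migrations “in the root dependency, then in the components that depend on it, and so on” amounts to a topological sort of the schema components. The following is a minimal sketch of that ordering step only, under the assumption that each component declares its dependencies by id; the `SchemaComponent` shape, the component names and the SQL strings are all hypothetical, and this is not the implementation we have settled on.

```typescript
// Hypothetical description of a schema component and what it depends on.
interface SchemaComponent {
  id: string;
  dependsOn: string[];
  migrations: string[]; // SQL statements, illustrative only
}

// Return component ids ordered so that every dependency comes before its dependants
// (a depth-first topological sort). Throws on unknown or circular dependencies.
function migrationOrder(components: SchemaComponent[]): string[] {
  const byId = new Map<string, SchemaComponent>();
  for (const component of components) byId.set(component.id, component);

  const done = new Set<string>();
  const inProgress = new Set<string>();
  const order: string[] = [];

  function visit(id: string): void {
    if (done.has(id)) return;
    if (inProgress.has(id)) throw new Error(`Circular schema dependency at "${id}"`);
    const component = byId.get(id);
    if (!component) throw new Error(`Unknown schema component "${id}"`);
    inProgress.add(id);
    for (const dependency of component.dependsOn) visit(dependency);
    inProgress.delete(id);
    done.add(id);
    order.push(id);
  }

  for (const component of components) visit(component.id);
  return order;
}

const exampleComponents: SchemaComponent[] = [
  {
    id: "reviews",
    dependsOn: ["manuscripts", "users"],
    migrations: ["CREATE TABLE reviews (id serial PRIMARY KEY, manuscript_id int, reviewer_id int)"],
  },
  {
    id: "manuscripts",
    dependsOn: ["users"],
    migrations: ["CREATE TABLE manuscripts (id serial PRIMARY KEY, author_id int)"],
  },
  { id: "users", dependsOn: [], migrations: ["CREATE TABLE users (id serial PRIMARY KEY)"] },
];

const exampleOrder = migrationOrder(exampleComponents); // users, manuscripts, reviews
```

This ordering is only valid while primary keys stay put. Once a dependency has to change a key that its dependants reference, migrations from different components must interleave and a per-component ordering like this one is no longer enough.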

If you have a better way to do it, answer this stackoverflow question, why don’t you?

Adapting and integrating a reusable component

Adapting, integrating and keeping a reusable component updated obviously takes time. The application as a whole will contain more foreign vocabulary. Bad dependencies and brittle couplings may creep in. Too much reuse, or reuse of the wrong sort, may even lead to reuse fatigue and make it difficult for developers to maintain a personal stake in the project (factory syndrome).

Dependency

A Rob Pike apothegm:

Dependency hygiene trumps code reuse

Dependencies make both feature development and ongoing maintenance more difficult. Regarding feature development, a system composed of many reusable packages is more difficult to change because often multiple releases must be made to introduce a single feature. Lerna helps us cope with this problem, but since sometimes features bridge multiple monorepos we have been unable to eradicate it entirely. In addition, the more dependencies in your application, the more likely something outside of your control will go wrong, the more likely you will be to unearth hidden conflicts, and the more time you will spend simply updating your code to catch up with new releases rather than building features. The Ariane rocket crash (mentioned before on this blog) was caused by a module reused from an earlier version, which proved unable to cope with the higher speeds of the newer rocket.

The icing on the cake is that the task of keeping dependencies up to date will go on to consume large amounts of time indefinitely. The left-pad incident will not be the last event of its kind. There are tools such as nsp and Greenkeeper which can help automate aspects of this task, but no amount of tooling will obviate the extra work that maintaining dependencies implies.

The costs and benefits of dependency must be weighed anew for each problem, but two cases spring to mind in which it is very likely that introducing shared dependencies is a bad idea. Code reuse in these cases is in fact a common antipattern. The first is the overzealous test helper, which instead of helping you write tests, tries to generate them for you. It pays to keep tests fairly repetitious so that the reasons for failures are clear and modifications are easier to adapt to. The second case is the microservices “framework”. Code reuse between microservices should be minimized, otherwise bugs will propagate through the system and a fleet of microservices will behave more like an unusually expensive and complicated monolith.

Factory syndrome

One aspect of reusing a component that is perhaps less talked about is the potential psychological and personal impact of depending more on libraries and writing less original code. Japan’s Software Factories is a study of attempts to adopt the software factory methodology in the USA and Japan in the 1970s and 80s. It highlights several US examples in which software reuse programs failed because developers did not cooperate with what they perceived as a shift in their status from master-craftsman to factory drone (Japanese workers were apparently more tolerant of this transition).

The concern is not a trivial one and continues to find expression today (e.g. stackoverflow, stackexchange). Repetition is a prerequisite for learning, as the Roman poet Ovid opined:

A droplet carves out a stone

But as high-level languages and frameworks have become more important and software reuse has increased, programmers have become increasingly able to ignore the fundamentals of how their programs work. Whether one views the increasing abstraction with which we operate as a liberation or an alienation, we can probably agree that the wrong kind of reuse has the potential to deskill and demotivate. Consumers of reusable components need to be able to choose their own components and they need to be empowered to learn about and make changes to them. This will only occur if the creators of frameworks and reusable components assume that their users are as curious and intelligent as they are.

Advertising and finding a reusable component

Not only must someone be able to find your reusable component, the component’s documentation must be clear about exactly what it does and how it can be adapted to their circumstances before they will consider using it. This communication burden is huge on both sides, and means that, according to an unfortunately well-corroborated empirical law, most components developed with care to be reusable are not in fact likely to be reused.

Empirically, the frequency of component reuse obeys a “power law” similar to Pareto’s principle and to Zipf’s law, which describes the usage frequency of vocabulary items in natural languages. That is, the most frequently reused component will be reused about twice as often as the next most frequently reused component, about three times as often as the third most reused component, and so on. This steep decline means in any given domain we can expect a few components to be reused very heavily, and the vast majority to be reused little or not at all.
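A little arithmetic shows how steep this decline is. The sketch below assumes the idealised form of the law, in which reuse frequency is exactly proportional to 1/rank; real ecosystems fit this only approximately, and the figure of 1,000 components is an arbitrary illustration rather than a measurement. Under that assumption, the share of all reuse captured by the top k of N components is the ratio of two harmonic numbers, H(k) / H(N).

```typescript
// Harmonic number H(n) = 1 + 1/2 + ... + 1/n.
function harmonic(n: number): number {
  let sum = 0;
  for (let rank = 1; rank <= n; rank++) sum += 1 / rank;
  return sum;
}

// Fraction of all reuse going to the `top` most reused of `total` components,
// assuming frequency is exactly proportional to 1/rank (idealised Zipf's law).
function zipfShare(top: number, total: number): number {
  return harmonic(top) / harmonic(total);
}

// With 1,000 components, the top ten alone account for roughly 39% of all reuse.
const shareOfTopTen = zipfShare(10, 1000);
```

Put differently, in this idealised ecosystem 1% of the components do well over a third of the work.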

Power laws crop up in many situations (citation counts, file sizes, website hit counts, city populations) and there is no agreed-upon theoretical explanation for them (nor, probably, just a single mechanism at play in any given example). There are a couple of explanations based on communication dynamics which seem particularly pertinent to the case of software reuse, however. The first is that because of the way information propagates through social networks, the popularity of a component is likely to grow in proportion to its current level of popularity: the more reused a component is, the more people are aware of it, contributing to it, and sharing it, and so the better its purpose is understood and the more new users it reaches. This powerful preferential attachment process ensures a few components absorb most of the communication bandwidth at the expense of everything else.
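The rich-get-richer dynamic can be simulated in a few lines. The sketch below follows Simon’s model, one standard formalisation of preferential attachment, and is not drawn from the papers on package ecosystems: at each step either a brand-new component appears (with probability `newComponentRate`) or an existing component gains a use, chosen with probability proportional to the uses it already has. The function names, the seeded random generator and the parameters are all assumptions made for illustration.

```typescript
// Small seeded pseudo-random generator (a linear congruential generator),
// used only so that a simulation run can be repeated exactly.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Simulate reuse under preferential attachment and return use counts per
// component, sorted from most used to least used.
function simulateReuse(
  steps: number,
  newComponentRate: number,
  random: () => number
): number[] {
  const counts: number[] = [1]; // counts[i] = number of uses of component i
  const uses: number[] = [0]; // one entry per use, holding the component index

  for (let step = 0; step < steps; step++) {
    if (random() < newComponentRate) {
      // A new component is published and gets its first use.
      counts.push(1);
      uses.push(counts.length - 1);
    } else {
      // Picking a uniformly random past use selects a component with
      // probability proportional to its current number of uses.
      const chosen = uses[Math.floor(random() * uses.length)];
      counts[chosen] += 1;
      uses.push(chosen);
    }
  }
  return counts.sort((a, b) => b - a);
}

const simulatedCounts = simulateReuse(10000, 0.1, makeRandom(42));
```

Models of this kind are known to produce heavy-tailed distributions, which is the qualitative pattern described above: a few early or lucky components accumulate most of the uses while the long tail stays close to one.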

The second explanation, formulated for Zipf’s law specifically, derives the law as a minimum of the cost function of communication, where cost is traded off between signal usage and communicative ambiguity. Higher signal usage cost corresponds to longer word length; higher communicative ambiguity cost is incurred by reusing the same words for different ideas. The steep decline in usage frequency characteristic of the law is said to be a result of the fact that there are exponentially more possible long words than short words, so most words are too costly to use often, although, in order to minimise ambiguity, they are part of the vocabulary. The cost function associated with communicating a reusable component from producer to consumer involves a similar tradeoff between signal usage and, let’s say, “functional” ambiguity: that is, between how hard it is to communicate the component and how well the component fits the user’s requirements. The higher the entropy of a domain, the more components will be required to address specific requirements without functional ambiguity, and so the higher the signal usage cost will be: not simply because there are more components to sift through, but because the more there are, the harder it is to succinctly describe and understand the differences between them (take a look at CSS-in-JS implementations!).

Now we confront another ancient truth. One that the philosopher G. W. Leibniz expressed for us during modernity’s first Information Technology crisis:

The indefinite multitude of authors will shortly expose them all to the danger of general oblivion

This reasoning about communication overhead applies equally to the cost of integrating a component. The more components there are to wire together, the more boilerplate will have to be provided per lines of code reused. Thus, the inherent entropy of the domain (i.e. the number of distinct possible requirements within it) places a hard limit on the possibility of reuse, just as the entropy of human life has necessitated vocabularies that could not be contained within the walls of Babel. At some point it becomes cheaper to write your own component than to select correctly from the mass.

Conclusion

Reuse is a performance-enhancing drug that must be taken with care. The key advice when planning for reuse is therefore to focus on “vertical” reuse as much as possible: aim to have your component reused only in contexts highly similar to your own. This advice, of course, applies to componentized, atomic code, which has been the focus of this article. The rules for frameworks are perhaps different. But what does it take to make a framework as universally applicable as a good motto, like that of the Pythagoreans:

Hold all things in common

? Perhaps that other, and less appealing, Pythagorean motto, ipse dixit:

The master spoke it

Further reading

Power laws: this paper presents evidence for power laws describing reuse frequency in several package ecosystems including CPAN, Ruby, Windows, FreeBSD and Eclipse. Another paper discovers the law in three different Unix OSes.

For more detailed essays on reuse, besides the series of books by the ICSR, check out this gitbook or, more thorough still, C.R.U.I.S.E.

There are many somewhat repetitious blogposts warning against reuse (in addition to this one): one, two, three, four, five, six, seven, eight, nine, ten, eleven. Practice what you preach, eh?

Published by Sam Galson (Software Engineer at YLD) on YLD Engineering Blog