Code modularity is like democracy — respect & enforce it, or it will lithify

Jesse Paquette
Nov 21

Let’s face it — any democratic government, at any static point in time, is not the most efficient system. There are unnecessary procedures, coalition-building, votes, elections, and paperwork to file. It would be much more efficient to cut all the red tape and establish simpler, fixed processes and chains of command — i.e. to lithify the system, letting it harden like sediment into stone.

It’s worth noting that most businesses are lithified and not at all democratic except perhaps at the Board of Directors / shareholder level.

But from a governance perspective, democracy is better. Why? Because it is more adaptable to change over time.

Society changes, technology changes, transformative events happen — democratic governments can adapt to these changes (relatively) quickly.

Bill Kunstler: “So, how do you overthrow or dismember, as you say, your government peacefully?”
Abbie Hoffman: “In this country, we do it every four years.”
- From The Trial of the Chicago 7, Aaron Sorkin

However, the tenets of democracy are at great risk during a crisis — the temptation to lithify government and adopt fascism is especially acute in times of war and civil unrest. At those times, democracy is only preserved via respect and enforcement of the rights of the people.

The software industry experiences perpetual change at a much faster rate than other industries. One can generally trust that the structure and tooling of a factory will be sufficient for at least 15–30 years, but the software you’ve built will probably only suffice for 5 years, max.

You should be planning right now for how your software systems (and your organization!) will need to change next year.

If you’re not prepared, it’s going to cost a lot more than you expect.

A popular term for the future cost of adapting your software systems to change is technical debt. Send this article to your CFO if they don’t understand why the budget will continue, and likely increase, for a software system that was supposed to be “done”.

The current revolution in cloud services — e.g. Amazon Web Services, Microsoft Azure, Google Cloud Platform — is one significant change happening right now. But that’s not all — we’re still experiencing the scaling problems of Big Data & IoT, the transition to web, mobile & social applications, and for some especially lithified industries — I’m looking at you, Healthcare — we’re still adapting to the internet.

This is why modularity is so necessary in software systems — the data, technologies & user needs of yesterday will be significantly different tomorrow. Modularity provides a useful design pattern for anticipating and adapting to these changes. (details below)

However, much like the example of democratic governments above, if you consider any software system at any static point in time, modularity appears to be inefficient, unnecessary overhead. Short-sighted leaders and developers will push for lithified alternatives as cheaper and more efficient. Of course in the long term — and, again, in the software industry, that’s as soon as next year — lithified systems are very expensive to upgrade.

The modularity of software systems, therefore, must be respected and enforced — otherwise, like democracy, it will disappear in favor of short-term, lithified solutions — especially during times of crisis and urgency.

Modularity essentially refers to relatively isolated, higher-order components within a software system. A simple example is a database — it manages the structure, storage and input/output of data in a relatively isolated component from the rest of the system.

I say “relatively isolated”, because in order to work, modular components must interact with each other via code — and each component must contain sufficient information about other components in order to function properly.

If implemented well, modularity enables efficient adaptation of software to changes in requirements over time. Individual components could be redesigned and potentially replaced without significantly impacting other components.

Modularity enables separation of concerns, which, in turn, enables division of labor. To continue with the database example, coders working on other parts of a software system don’t need to work on the database code.

Modularity is facilitated via interface abstraction. The other components of a software system interact with the database via the database’s SQL interface, not the lower-level code or data structures within.
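To make that concrete, here is a minimal sketch in Python, with a hypothetical UserStore interface standing in for the database boundary. Callers depend only on the interface, so the storage engine behind it can change freely:

```python
from abc import ABC, abstractmethod
import sqlite3

class UserStore(ABC):
    """The interface the rest of the system codes against."""
    @abstractmethod
    def get_user(self, user_id: int) -> dict: ...

class SqliteUserStore(UserStore):
    """One interchangeable implementation. Swapping this out for
    Postgres (or anything else) doesn't touch any calling code."""
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def get_user(self, user_id: int) -> dict:
        row = self._conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]}

class InMemoryUserStore(UserStore):
    """A test double, possible only because callers know the
    interface, not the database internals."""
    def __init__(self, users: dict):
        self._users = users

    def get_user(self, user_id: int) -> dict:
        return self._users[user_id]
```

When the schema or storage engine changes, only the implementation behind the interface changes with it.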

Software libraries are modules that you can import into your system, and typically function within a single coding language — e.g. there are thousands of available libraries for Python, Java, etc. Software libraries can be developed within your system/organization, or they can be imported from software vendors, or from open-source communities.

Software frameworks provide a useful suite of “out of the box” software functions for well established tasks — e.g. user session management. Unlike software libraries, which plug-in to your code, you run your code within a software framework. Angular is a good example.
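The distinction is inversion of control: your code calls a library, but a framework calls your code. A minimal sketch using Flask, a Python web framework, shows the same inversion the author describes with Angular:

```python
from flask import Flask

app = Flask(__name__)

# The framework routes incoming requests to this handler; you never
# call it yourself. Your code runs *within* Flask, not vice versa.
@app.route("/health")
def health():
    return {"status": "ok"}

if __name__ == "__main__":
    app.run()  # hand control to the framework
```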

Low-code templates offer a way to configure state within one or more components from a safe distance with minimized code complexity — typically the settable low-code options are all well tested and documented. NGINX web servers and Docker containers use low-code templates, for example.
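A rough sketch of the pattern in Python (the schema and option names here are hypothetical): the component exposes a small set of validated, documented settings, and configuration never reaches into its internals:

```python
import json

# A hypothetical low-code template schema: every settable option is
# declared, typed, and defaulted up front.
SCHEMA = {
    "port": (int, 8080),
    "workers": (int, 4),
    "compression": (bool, True),
}

def load_config(path: str) -> dict:
    """Load a JSON template and validate it against the schema,
    so misconfiguration fails loudly at startup."""
    with open(path) as f:
        raw = json.load(f)
    config = {}
    for key, (expected_type, default) in SCHEMA.items():
        value = raw.get(key, default)
        if not isinstance(value, expected_type):
            raise TypeError(f"{key} must be {expected_type.__name__}")
        config[key] = value
    return config
```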

Pro-code plugins, similar to software frameworks, enable a developer to write small units of custom business logic code that integrate within a narrow context of a larger component. Unlike software frameworks, pro-code plugins often involve execution of code in a different language than the component. For example, Tag.bio allows a developer to specify R or Python plugins for execution on data in a very specific context of a Java component.
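This isn’t Tag.bio’s actual plugin API, just a sketch of the shape the pattern can take: the host hands the plugin a narrow, serialized context and gets a result back, with the plugin running in its own process and language runtime:

```python
import json
import subprocess

def run_plugin(script_path: str, data: dict) -> dict:
    """Execute a user-supplied Python (or R) script in its own
    process, passing data in and results out as JSON. The plugin
    sees only this narrow contract, never the host's internals."""
    result = subprocess.run(
        ["python", script_path],   # or ["Rscript", script_path]
        input=json.dumps(data),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```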

Cloud services and containers have emerged recently as extremely convenient modular components within the modern software stack. Many organizations have spent a good amount of time/money over the last 5 years transforming their systems to utilize this new tech.

So, you may be thinking — “we’re using Docker containers in AWS, we have a well-designed data lake, and we use lots of open source libraries and modern software frameworks — so our system is therefore modular and protected from lithification, right?”

Not likely.

There are plenty of articles out there about how teams produce bad software — e.g. poor leadership, poor teams, poor processes, etc. This article is not meant to be another one of those. I’m going to get more specific now about situations where modularity converts to lithification.

First, let’s discuss some of the positives of lithification. At some point in every codebase, the elegant abstraction has to stop and actual work has to get done. Information needs to be modeled within data structures, business logic and algorithms need to operate on those data structures, and processes need to be parallelized/synchronized. At low levels of the system — i.e. within modular components — lithified code can offer significant advantages in performance: faster computation and lower memory consumption.

For example, a machine learning algorithm can operate much more efficiently on data in an explicit structure. The algorithm can then be encoded to jump straight to specific locations in the structure to read data (and potentially change it). This prevents the system from creating unnecessary transient or permanent copies of the data, and it prevents the algorithm from spending cycles traversing data it doesn’t need in order to get to the right content.
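A rough illustration in Python with NumPy (actual timings will vary by data and workload): the same computation over flexible, generic containers versus an explicit, fixed structure:

```python
import numpy as np

n = 1_000_000

# Flexible but slow: generic containers must be traversed and
# unpacked row by row, with a dict lookup per field per row.
records = [{"x": float(i), "y": 2.0 * i} for i in range(n)]
total = sum(r["x"] * r["y"] for r in records)

# Lithified: the same data in one contiguous array. The computation
# jumps straight to known offsets, with no per-row lookups and no
# transient copies of the data.
matrix = np.empty((n, 2))
matrix[:, 0] = np.arange(n, dtype=np.float64)
matrix[:, 1] = 2.0 * matrix[:, 0]
total_fast = matrix[:, 0] @ matrix[:, 1]  # single pass, dot product
```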

A hallmark call for positive lithification is when a developer says “we should re-write this component in C to improve memory use/performance”.

Lithification can also be a positive situation when multiple components are consolidated into a single coding language or codebase. This enables efficiency of development (and hiring) — developers of the unified codebase can speak the same language and automatically test their updates within a language-specific testing framework — e.g. JUnit.

Lithification also occurs unnecessarily — typically via negligence and haste.

Some examples of these anti-patterns:

  • A developer realizes that their component can connect to and query the database directly, bypassing the database controller module (see the sketch after this list).
  • The codebase is heavily dependent on SQL queries to function, resulting in a situation where the database schema can never change without breaking code everywhere.
  • All data is required to live in a single database, i.e. a “data lake”. New data types are contorted to fit into the omnibus schema. Scaling problems on one side of the data lake create problems when working with data on the other side.
  • Data is over-modeled within objects that are directly used everywhere in the system, without any interfaces.
  • An open-source software library is abused to do something unintended — eventually hacks and workarounds arise to accommodate the ill-fitting module.
  • Too many incompatible open-source libraries are used in conjunction, resulting in hacks and workarounds to stitch them all together, or dependency hell is caused by libraries having conflicting sub-dependencies on other libraries.
  • A decision to use an inappropriate software framework produces a situation where coders spend more time figuring out how to write hacks or workarounds within the framework than they would spend writing elegant solutions outside of the framework.
  • Similarly, a decision to use a specific cloud service provider (e.g. AWS) results in utilization of too many “native” features of that specific provider — preventing portability to a different one (e.g. Azure).
  • Components within the system are designed with unnecessary “awareness” of where they fit within the system — e.g. with knowledge of the database, or the user authentication system, or specific hard-coded URIs to other systems.
  • Use of reflection between components. Some folks like reflection — not me. Besides the security risk, invoking lower-level code from outside a component will prevent it from being able to change.
  • Poor review processes around developer or third-party contributions to the codebase.
  • Strict, hasty deadlines for delivery of a software project.
  • A failure to spend time and money on refactoring and testing.
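To make the first anti-pattern concrete, a sketch with hypothetical names: the shortcut works today, but it welds the calling component to the schema, the credentials, and the physical location of the data:

```python
import sqlite3

# Anti-pattern: a component reaches past the controller module and
# queries the database directly. It now silently depends on the
# schema and the storage location; change either and this breaks.
def get_order_total(order_id: int) -> float:
    conn = sqlite3.connect("/var/data/app.db")  # hard-coded location
    row = conn.execute(
        "SELECT total FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    conn.close()
    return row[0]

# Modular alternative: go through the controller's interface, so the
# schema and storage engine can change behind it.
def get_order_total_modular(orders, order_id: int) -> float:
    return orders.get_order(order_id).total
```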

Am I guilty of some of the above? Maybe...

So how do you respect and enforce modularity? It takes deliberate effort at every phase of a project:

  1. In advance of the project — dedicate enough time and the right people to designing a modular, adaptable architecture with respect to current and anticipated future needs — i.e. respect modularity.
  2. Building the project — establish culture, process, and technology to enforce modularity and seriously identify/question/fix all unnecessary lithification. Establish rigorous testing mechanisms within components and between components.
  3. Maintaining the project — allow time to be spent refactoring components in the system to adapt to changes in technology and system requirements. Question previous decisions to utilize software libraries and frameworks that require too many hacks/workarounds, or are incompatible with future needs.

Some tips from my experience.

We’ve put a lot of thought into modular design and lithification at Tag.bio, and we’ve made some key decisions over the last 8 years.

  • Data sources shouldn’t all be lumped into the same omnibus schema and served from the same component — i.e. a data lake. This is an anti-pattern of lithification. The Decentralized Data Mesh architectural pattern, with domain-designed data products, is far more modular, efficient and scalable.
  • We have lithified a major component for fast, in-memory computation — the dataset hypervisor, a.k.a. the “Flux Capacitor”, or the “FC”. This is the engine that serves up data and applications for all of the instances of data products within our platform. Lithifying the FC into a single Java codebase and executable makes it fast, gives it an optimized memory footprint, and keeps it easily containerized and deployable to our Kubernetes clusters. Those containers also contain R/Python environments and libraries for running pro-code plugin algorithms.
  • However, we’ve established some important rules for the FC to enforce modularity (see the sketch after this list). An FC instance must not know about any external components of the system, or where it lives. It only knows about governed data that’s mapped in — via low-code JSON, not Java code — and the apps/algorithms designed for that data — via low-code JSON and pro-code R/Python plugins. User authentication and useful data artifacts, for example, are managed by other components of the system, which connect to each FC instance via its RESTful (“Smart”) API.
  • Customer-developed and third-party-developed components can thus communicate with the API of any Tag.bio data product just as easily as the rest of our system — with proper authentication and authorization, of course.
  • The Tag.bio platform is cloud-agnostic. Because we eschew native cloud functionality wherever possible, our system can be deployed as a turnkey, lithified Kubernetes cluster in any customer cloud platform.
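None of this is the FC’s actual code, just a sketch of the rule described above: a component whose entire view of the world is injected via low-code configuration, with no hard-coded knowledge of its neighbors:

```python
import json

class DataProductEngine:
    """Sketch of a component that doesn't know where it lives.
    Everything it needs (data mappings, app definitions) arrives
    via injected config; it holds no URLs or credentials for other
    components, and callers reach it only through its API."""

    def __init__(self, config_path: str):
        with open(config_path) as f:
            config = json.load(f)
        self.data_mappings = config["data"]  # governed data, mapped in
        self.apps = config["apps"]           # app/algorithm definitions

    def handle_request(self, app_name: str, params: dict) -> dict:
        # Authentication happened upstream, in another component.
        # This engine just serves data and runs the requested app.
        app = self.apps[app_name]
        return {"app": app_name, "definition": app, "params": params}
```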

If you have any comments or questions, please let me know.

Tag.bio — Your data. Your questions. Your answers.


Tag.bio is a San Francisco, CA startup solving the last mile problem in data analysis for Healthcare and Life Sciences — with a distributed data mesh architecture, a domain-native user experience, full reproducibility, automated cloud orchestration, and enterprise-grade security.
