Fractal Engineering — A lens for simplifying complex software systems
Viewing software engineering concepts through a fractal lens can simplify complex systems. Here we discuss this as a tool for designing system architecture. We start with a rough description of fractals, follow with some applications, and end with points on what this means for innovation.
A fractal is a shape in which inspecting an edge reveals a ‘similar’ version of the shape previously seen. Increasing the magnification on this edge reveals similar shapes again and again, in an infinite formation.
Fractal geometry is closely related to chaos theory. The observation is that wildly complex systems have underlying patterns that can be researched and understood. Functions with slightly different input parameters produce results that look similar at first but ultimately diverge widely. Benoit Mandelbrot, who gave fractals their name, would refer to this similarity within systems as having a “degree of order”.
Fractals appear often in nature. They allow mathematicians to theorise about systems previously considered ‘random’. Mandelbrot was initially theorising about the British coastline when he made his discovery: any section of the coastline under magnification reveals another, similar part of the coast. Other examples are branching trees, the patterns fire creates as it burns, and, most often referenced, snowflakes.
Taking this concept of bringing order to the seemingly unordered, we can apply the same reasoning to software system architecture. This allows us to reduce large complex systems into simple concepts that repeat themselves at different scales. A deviation from the theory is that fractals possess an infinite perimeter: as the magnification increases, more versions of the self are revealed. As we are constrained by living in the real world, we cannot continue to infinity, but we will do our best.
Eventually, software engineering is reduced to a few concepts that we apply up and down the stack. This document aims to offer the fractal lens as a tool, rather than establish a dogma, so we will look at a few applications and not provide an exhaustive list.
A simple example is how redundancy is often approached for resilience. We often deploy an application into a cluster (as a series of containers or VMs); with routing, the application stays accessible if one VM is taken offline. Zooming out, we apply the same pattern across availability zones (in AWS terminology): we now have clusters of clusters within a region, and if we lose an AZ the application remains accessible. Zooming out further, we look at multi-region deployments for our application, providing an extremely high level of resilience.
To allow ourselves to zoom out further, we begin to discuss multi-cloud approaches, for when the core cloud infrastructure begins to fail. This may well not be required for every business; however, this fractal method of working through the problem forces teams to at least consider it. We lessen the possibility of blind spots.
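The zoom levels above can be sketched as one recursive rule: a group is available if it is up and at least one of its members is available. This is only a minimal illustration of the repeating pattern, not a model of any real cloud API; the `Node` class, the `build_region` helper, and the names `region-1` and `region-2` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One magnification level: a VM, an availability zone, a region, a cloud."""
    name: str
    healthy: bool = True  # False models losing this whole level at once
    children: List["Node"] = field(default_factory=list)

    def available(self) -> bool:
        # The same rule applies at every scale: this level must be up, and
        # if it groups other levels, at least one of them must be available.
        if not self.healthy:
            return False
        if not self.children:
            return True
        return any(child.available() for child in self.children)


def build_region(name: str, zones: int = 2, vms_per_zone: int = 2) -> Node:
    """Build a hypothetical region containing zones, each containing VMs."""
    return Node(name, children=[
        Node(f"{name}-az{z}", children=[
            Node(f"{name}-az{z}-vm{v}") for v in range(vms_per_zone)
        ])
        for z in range(zones)
    ])


app = Node("app", children=[build_region("region-1"), build_region("region-2")])

# Lose a single VM, then a whole availability zone: the app stays reachable.
app.children[0].children[0].children[0].healthy = False
app.children[0].children[0].healthy = False
print(app.available())
```

Adding a multi-cloud level would mean wrapping `app` in one more `Node`, with no change to the rule itself, which is the point of the fractal view.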
It is also possible to zoom in on this example, although not as far. Starting at the VM level, we can zoom into the process level on the box. Depending on the application, it may be required to run a process per core, and/or to wrap that process in something like Upstart. This means that if the process dies on the box, it may be restarted quicker than the load-balancer will remove it from the live group.
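As a rough sketch of redundancy at the process level, the loop below restarts a command whenever it exits with a failure. It is a stand-in for what a supervisor such as Upstart does, not Upstart's own configuration format; the `supervise` function and its parameters are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import time
from typing import List


def supervise(cmd: List[str], max_restarts: int = 3, backoff_s: float = 0.1) -> int:
    """Run cmd, restarting it each time it exits with a non-zero code.

    Returns the number of restarts performed. Stops when the process exits
    cleanly (code 0) or once max_restarts restarts have been used up.
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(cmd).returncode
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(backoff_s)  # brief pause so a crash loop does not spin


# A child process that always fails is restarted until the limit is reached.
failing = [sys.executable, "-c", "raise SystemExit(1)"]
print(supervise(failing, max_restarts=2, backoff_s=0))
```

A real supervisor would also handle signals, logging, and restart-rate limits; the sketch only shows that the "replace the failed unit" idea from the cluster level reappears inside a single box.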
We can apply this tool to other concepts within software engineering, for example testing methodology. On one end of the magnification you may have people testing critical systems, running power surges through their systems and eyeing the output. This is (probably?) as close to the metal as it will be possible to go. Zooming out, we can take a technique like fault injection and other classes of ‘damaged-hardware’ testing, flipping compiled instructions or corrupting memory locations. Out further, we eventually arrive at testing compiled code through unit tests, testing the smallest amount of code we can imagine. From here we move through the traditional testing pyramid (component, functional) until eventually we’re testing against production (which is the only environment that is like-production).
If we zoom out as far as possible with testing, we are looking at our system within its live ecosystem being used by real users. Here approaches such as A/B testing, performance monitoring as testing, and surveys taken by actual users are invaluable. Applying the fractal lens here shows us that testing is much broader than the pyramid, and engineers should not consider waiting for bug tickets as part of their testing process.
Monitoring seems like another good candidate for applying the fractal lens. To start, we take the smallest useful monitoring we can conceive. We can start with recording VM-level metrics like disk space, memory usage, and system log files. Zooming out slightly, our VM’s purpose is to run our application, so we consider recording variable-level metrics (made easy with something like OverOps, but also possible with simple trace logging from within the application). Zooming out once more, we record CPU time for performance or call counts on specific methods (think APMs, New Relic etc.).
Application-level health checks may be the next level: broad yes-no responses without detail, as we have already recorded the detail at lower levels. ‘Detailed health checks’ are therefore rejected by the fractal approach, as they conflate magnification levels.
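A broad yes-no health check might look like the sketch below. The `health` function, the check names, and the HTTP-style status pair are assumptions made for illustration; the only idea taken from the text is that the response carries no detail, while the detail goes to a lower level (here, a log line).

```python
import logging
from typing import Callable, Dict, Tuple

log = logging.getLogger("health")


def health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Return an HTTP-style (status, body) pair with no per-check detail."""
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            passed = False
        if not passed:
            # The detail belongs at a lower magnification level (logs and
            # metrics), not in the health check response itself.
            log.warning("health check failed: %s", name)
            return 503, "unhealthy"
    return 200, "ok"


# Hypothetical checks: callers only ever see a yes or a no.
print(health({"database": lambda: True, "queue": lambda: True}))
```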
Moving further back, external monitoring tools should be applied to the origin servers (in the case of a website or API). We eventually move back to monitoring CDN exit node speeds from locations around the globe on different types of network, which is a service that some companies already offer. This informs the use of more than one CDN if coverage is weaker in a target region.
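One way to use the lens in practice is to list the magnification levels and check which ones have no monitoring at all. The sketch below assumes the levels named in the paragraphs above; the `LEVELS` list, the `uncovered` function, and the monitor names are hypothetical.

```python
from typing import Dict, List

# Magnification levels for monitoring, from most zoomed in to most zoomed out.
LEVELS: List[str] = ["vm", "variable", "method", "application", "origin", "cdn"]


def uncovered(levels: List[str], monitors: Dict[str, str]) -> List[str]:
    """Return the levels that no monitor is assigned to, in zoom order.

    monitors maps a monitor name to the level it observes.
    """
    covered = set(monitors.values())
    return [level for level in levels if level not in covered]


# A hypothetical inventory of what a team already monitors.
current = {
    "disk_space": "vm",
    "memory_usage": "vm",
    "health_check": "application",
}
print(uncovered(LEVELS, current))
```

The gaps in the output are the blind spots: not every level has to be filled, but each one should at least have been considered.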
Hopefully these few examples show how considering a system as a series of concepts organised into fractals is useful to start discussion and build well-rounded systems. Other areas that seem interesting to me are how the method applies to security engineering (the scale from: OS driver-level to social engineering prevention), alerting (and the implications for escalation policy), and caching (CPU L1 to CDNs).
It must be said that I do not propose this as the way to build systems, more as a way to understand what we already have. In fact, employing fractal engineering and only scaling the same ideas up and down will hinder innovation, as everything becomes just ‘good enough’. This should be considered a baseline: working software is still the first port of call for an engineer, but we shouldn’t let it hinder creativity.
We can use the knowledge of these fractal patterns to inform our understanding of innovation through negation. Anything that is truly innovative will not fit an existing pattern within our systems. It will be out-of-step with what we have, and will take time for us to remap our thinking.
An example that moves perpendicular to existing patterns is the shift to serverless function-driven platforms, and away from cluster mindsets. This moves away from the application process and the multi-(VM | container) * availability zone requirement (although it is still bound by region, a constraint that I imagine will eventually be removed).
This takes a familiar concept, that we require our systems to be resilient, and provides a solution that is not a magnification of an implementation happening on a different level.
New concepts with new implementations are more obviously considered innovative, so we don’t discuss them here; however, this does imply that new concepts with old implementations can exist. This would be the rediscovery of dated technology and repurposing it for new reasons. Perhaps Amazon’s Mechanical Turk, the resurgence of the actor model, and countless startup pivots fit this category.
This short essay has offered fractal engineering as a broad unifying method for a few schools of thought in software engineering. Classifying implementations or techniques under conceptual tags reveals patterns, which we can use to alert us to gaps or weaknesses within our system design, or to drive us to more creative action by defining innovation. More generally, it is a helpful addition to our toolkit and should make us more rounded engineers.