IT Systems that Scale
IT systems are unnecessarily complicated because of the way they evolved. Once upon a time, storage, computation and communications were expensive, and early computing systems were optimised to make the most of these scarce physical resources. Moore’s Law has since made hardware cheap, yet we still build large systems using techniques developed when it was expensive. Those techniques, optimised for small systems, do not scale to large systems with millions of applications, billions of people, and trillions of objects in the Internet of Things.
Today, storage, computation and communications are inexpensive. Our constraints are different: we need to handle much greater complexity and to scale, integrate and reuse large, complex systems.
One way to scale is to add a new layer to IT systems that increases their functionality, reliability and security. Such a layer should enhance existing systems without changing them. Somewhat like the prefrontal cortex of the brain, it should coordinate applications and data to extend capability without altering what already exists.
Semantics through the use of data
In early systems, it made sense to have one place to store each item of data and to have a single source of truth. When systems were small, it was useful to embed semantics in the names of objects and in their data values. (Figure 1)
Alternatively, we can give data meaning when an application uses it. (Figure 2).
Systems become complicated when we have many applications accessing the same data items. (Figure 3)
The meaning of data can change depending on the application. When many applications access the same data item, a change in meaning made for one application can break the others, and it becomes difficult to add new meanings and uses for the data.
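The point can be made concrete with a small sketch (the applications and the dollars/units interpretation are illustrative, not from the original text): the same raw value carries a different meaning for each application that reads it.

```python
# Sketch: one shared data item, with its meaning supplied by whichever
# application reads it. Both application names are hypothetical.
shared_record = {"value": 1100}  # a bare number with no intrinsic meaning

def billing_app(record):
    # The billing application interprets the value as cents.
    return f"${record['value'] / 100:.2f}"

def inventory_app(record):
    # The inventory application interprets the same value as a stock count.
    return f"{record['value']} units in stock"

print(billing_app(shared_record))    # "$11.00"
print(inventory_app(shared_record))  # "1100 units in stock"
```

If the billing application redefines the value to mean whole dollars, the inventory application's reading silently becomes wrong, which is exactly the interaction problem described above.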
Complexity falls if we isolate data by giving each application its own space for the same data item. (Figure 4)
Here each application has its own storage space for each data item, and the application using the data gives each item its meaning. Isolating the data reduces problems from unforeseen interactions. It modularises the system and makes it easy to reuse an application in a different IT system. Instead of reusing data, we reuse applications across different systems. The data used with an application is tied to that application and has no meaning independent of it.
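As a minimal sketch of this isolation (the store, application names and temperature readings are illustrative assumptions), each application writes the "same" data item under its own key, so no application can disturb another's copy:

```python
# Sketch: a store keyed by (application, item), so each application
# has its own space for the same data item.
store = {}

def put(app, item, value):
    store[(app, item)] = value

def get(app, item):
    return store[(app, item)]

# Two hypothetical applications each record a "temperature",
# each in its own silo with its own interpretation.
put("hvac", "temperature", 21.5)     # degrees Celsius, as hvac reads it
put("weather", "temperature", 70.7)  # degrees Fahrenheit, as weather reads it

# Neither application's write affects the other's copy.
assert get("hvac", "temperature") == 21.5
assert get("weather", "temperature") == 70.7
```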
This approach makes for secure systems. An intruder who examines the stored data cannot tell what it means without access to the application that operates on it. Massive data breaches therefore become less likely, and systems can quarantine breaches and isolate intruders.
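A rough sketch of why stored data is opaque without its application (the packing format and field names are assumptions for illustration, not a prescribed scheme):

```python
import struct

# Sketch: the application packs its data into bytes whose layout only
# the application knows; only its own decoder restores the meaning.
def encode(reading_celsius, sensor_id):
    # Big-endian: 2-byte sensor id followed by a 4-byte float reading.
    return struct.pack(">Hf", sensor_id, reading_celsius)

def decode(blob):
    sensor_id, reading = struct.unpack(">Hf", blob)
    return {"sensor": sensor_id, "celsius": reading}

blob = encode(21.5, sensor_id=7)
print(blob)          # an intruder sees only bytes: b'\x00\x07A\xac\x00\x00'
print(decode(blob))  # the owning application recovers the meaning
```

Without the `decode` function, the bytes could equally be a price, a timestamp or an identifier; the meaning lives in the application, not in the store.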
Client Server and Distributed Systems
Most computer systems are Client Server systems. (Figure 5)
We do this to reduce the amount of storage used for programs and the computational load on the system as a whole. This approach puts the meaning of the data with the data. Control of the data then confers control over its use, which makes data mining and data extraction viable business models. Distributing the data means it must be identical wherever it is stored: a distributed ledger only works if we know the data elements are the same in every ledger. We achieve this with technologies like blockchain, but the computation needed to keep every instance of the data consistent grows steeply as instances multiply.
By separating the meaning from the data, we can distribute systems by distributing the applications. (Figure 6) Distribute each application as multiple copies, distribute the storage, and we get modular systems. More importantly, we distribute control over meaning; while this does not stop data mining and third-party exploitation of data, it provides for alternative business models.
Now the system scales, because all we need to guarantee is that every copy of each application, the thing that gives its data items meaning, is identical.
Once we have this structure, we can deploy the same application across multiple entities. We can represent the connections as in Figure 7.
Figure 8 shows a truly distributed system, connected by the same applications, rather than by data.
Within a distributed system, all entities have equal standing. New applications with new meaning for data can be introduced incrementally without changing the existing systems. Different organisations and people may use the same applications, with the data protected in silos. The meaning of the data is only known when we can retrieve it using the application which placed it there.
To ensure that the applications A, B and C are the same everywhere, we make copies of the code and give those copies to the entities. We allow application A, for example, to communicate with another instance of application A only if we can prove that each instance is a clone of every other A.
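One simple way to sketch this clone check (an illustrative assumption; a real deployment would use signed builds or remote attestation rather than a bare hash exchange) is to compare a cryptographic fingerprint of the application code before allowing two instances to talk:

```python
import hashlib

# Sketch: two instances of application A may communicate only if they
# can show they run identical code, here via a SHA-256 fingerprint.
def code_fingerprint(code_bytes):
    return hashlib.sha256(code_bytes).hexdigest()

def may_communicate(my_code, peer_fingerprint):
    # Accept the peer only if its code fingerprint matches our own.
    return code_fingerprint(my_code) == peer_fingerprint

app_a    = b"def provide_meaning(item): ..."
clone    = b"def provide_meaning(item): ..."
tampered = b"def provide_meaning(item): leak(item)"

assert may_communicate(app_a, code_fingerprint(clone))        # true clone
assert not may_communicate(app_a, code_fingerprint(tampered)) # rejected
```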
With this new structure, the required amount of computation, storage and communications may be one or two orders of magnitude greater than with a client/server application. However, the computation required for a given task does not grow with the number of entities, applications or data items in the system. Taking meaning out of the data and putting it into the execution of applications is scalable, maintainable, secure, easily extended and, for humans, private.
Importantly, it opens the way for alternative business models in which the applications that give data meaning are paid for the work they do. In contrast, today’s dominant business model monetises the data itself: third-party controllers of data charge rent for access to it.
