Embrace Complexity — Part 3: Cloud

Tony Seale
12 min read · Apr 29, 2022


How to connect data in a decentralised private cloud


Connecting the Networks

In Part One of this article, we saw how the combined forces of data, cloud, and AI are driving forward a transition from the Industrial Age into the Information Age. It raised the outrageous yet exciting possibility that each of these forces is just a different type of ‘network’ and that, because networks are built around connectivity, the three forces can be combined into one unified network.

Part Two grounded the first of Part One’s extravagant claims by providing us with the first practical tool: the Graph Adapter, which allows us to take lots of individual and isolated data tables and turn them into a single, easy-to-read network.

Part Three now moves on to the next stage: it will show you how you can link all your networks together by giving each piece of data its own unique address in the cloud.

This is our first outrageous unification; it unifies the data network with the cloud network. This unification liberates data from the database or application in which it was born and allows it to mingle and socialise with all the other interesting and exciting pieces of data that exist within your organisation.

This unification allows different teams to independently produce data that has integration built into it from the start. Each team produces their part of a wider, distributed network that can span the entire organisation.

The Power of the Cloud

The Graph Adapters give us network-shaped data that treats connections as first-class citizens but, on their own, they cannot connect different databases together because the data still has to be physically stored in one place. On the cloud, it does not matter where the data is physically stored.

For example, with Spotify, you no longer need to keep your songs on your PC and transfer them to your phone when you want to listen to them. As long as there is an internet connection, you can listen to all the songs you want, so who cares where they are physically stored?

The web of computers that makes up the internet is clearly already network-shaped and now, thanks to the Graph Adapter, so is our data. Because they are both networks, we can unify the two by embedding the cloud’s networking capability directly into the structure of the data itself.

This sounds complex but it is not. Remember how we split our rectangular, box-shaped data into three-part statements? To unify data and cloud, we simply give each piece of data a clickable address (just like we give each webpage a clickable address on the web). Let’s look at that with the example data that we have been working with:
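As a concrete sketch (in Python, using rdflib), here is one way those URL-addressed statements might look. The abc.org paths, the order number and the ‘placedOrder’ predicate are purely illustrative, following the example data rather than prescribing a real scheme:

```python
# Sketch only: the abc.org URL scheme, the order number and the
# placedOrder predicate are illustrative, not a real schema.
from rdflib import Graph, URIRef

g = Graph()

# The three-part statement "ben placed an order", with each part
# given its own resolvable address in the cloud.
g.add((
    URIRef("https://abc.org/person/ben"),          # subject
    URIRef("https://abc.org/model/placedOrder"),   # predicate
    URIRef("https://abc.org/order/123"),           # object
))

print(g.serialize(format="turtle"))
```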

As you see, each piece of data gets its own URL; the unification is as simple as that. Each piece of data in the data network is now also a clickable address in the cloud network.

Just as with our Spotify songs and YouTube videos, it no longer matters where a piece of data is physically stored, because it has a resolvable address on the cloud.

This means that on the unified data-cloud-network we can link two data items that are stored physically in totally different systems simply by using their network addresses.

The first part of the address https://abc.org/person could take us to a computer in France that holds the ‘person’ data; the second part (ben) is then handed over to that computer to deal with. Likewise, the first part of the address https://abc.org/order could take us to a computer in Japan that has the ‘order’ data on it, and so on ad infinitum.

In this way, you can leave all your existing data where it is and use a network of HTTP servers to form a thin layer over the top of all your existing systems. The HTTP web servers drive cloud support right down into the very structure of the data itself.
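To make that concrete, here is a minimal sketch of what one node in that thin HTTP layer could look like, written in Python with Flask and rdflib. The abc.org addressing and the legacy lookup function are assumptions standing in for whatever system already owns the ‘person’ data; this is an illustration of the pattern, not a definitive implementation.

```python
# Sketch of a thin HTTP layer over an existing system.
# The abc.org addresses and the legacy lookup are assumptions.
from flask import Flask, Response
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import FOAF

app = Flask(__name__)

def fetch_person_from_legacy_store(person_id: str) -> dict:
    # Placeholder: in reality this would query whatever database or
    # application already owns the 'person' data.
    return {"id": person_id, "name": person_id.capitalize()}

@app.route("/person/<person_id>")
def person(person_id):
    # Resolve the second part of the address (e.g. 'ben') against the
    # existing system and hand back a small graph fragment.
    row = fetch_person_from_legacy_store(person_id)
    subject = URIRef(f"https://abc.org/person/{row['id']}")
    g = Graph()
    g.add((subject, FOAF.name, Literal(row["name"])))
    return Response(g.serialize(format="turtle"), mimetype="text/turtle")

if __name__ == "__main__":
    app.run(port=8080)
```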

To make the published ‘data pieces’ easily accessible across the organisation, it is necessary to build or buy some software that is equivalent to a search engine. This ‘search engine’ crawls over all your published datasets and builds an index of all the available data pieces (rather like Google does on the web). This means that you can easily find the URL of any piece of data that you are looking for; for example, you could search for a person by using their email address or by using their first and last name.
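A hedged sketch of how such an index might be used follows; the /lookup endpoint, its parameters and the email address are all hypothetical, standing in for whatever search software you build or buy:

```python
# Hypothetical lookup call: the endpoint, parameters and email address
# are placeholders for your organisation's own 'data search engine'.
import requests

resp = requests.get(
    "https://abc.org/lookup",
    params={"type": "Person", "email": "ben@abc.org"},
    timeout=10,
)
resp.raise_for_status()

# The index returns the unique URLs of the matching data pieces,
# e.g. ["https://abc.org/person/ben"].
print(resp.json())
```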

Publishers can use this service to easily find the unique URLs that they want to link themselves to. This turbocharges the Graph Adapters and allows different publishers to all connect to one another.

You will also need many graph databases to consume the various bits of the network that all the different teams are interested in. The very essence of this approach is that your organisation’s data no longer needs to be confined to one giant central store. Instead, the publishers and consumers are all reading from and writing to one universal distributed network. Indeed, we need not think of publishers and consumers as being separate at all.

We have what is called a peer-to-peer network where each node in the network will publish some ‘network-fragments’ into the cloud and download other fragments from the cloud.

Singing from the Same Song Sheet

In Part Two we introduced the notion of modelling abstract concepts like ‘Person’, ‘Product’ and ‘Order’ to enable more generalised queries without getting caught up in messy details such as the names of the individual people or the specific products that they ordered. In the data network, we treat the model the same as any other data piece, so we use the same trick to share it: each abstract concept is given its own unique address in the cloud. You can think of it as a three-stage process:

  • Take the words that describe and underpin the working of your organisation (for example, ‘tracks’, ‘trains’ and ‘passengers’ for a railway operator, and ‘pupils’, ‘books’ and ‘lessons’ for a school)
  • Get very precise about the definition of exactly what those words mean
  • And publish those definitions on the internal cloud

Let’s ground this idea by returning to our worked example: we have the three-part data statements, and now each part of the statement is also a clickable network address. So now, if I want to know what ‘placed order’ really means, I can click on it.
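In code, ‘clicking on it’ is just an HTTP request. A minimal sketch, assuming a hypothetical https://abc.org/model/placedOrder address and that the published definition comes back as Turtle with an rdfs:comment:

```python
# Sketch: dereference the (hypothetical) 'placed order' concept address
# and read back its published definition.
import requests
from rdflib import Graph, URIRef
from rdflib.namespace import RDFS

concept = URIRef("https://abc.org/model/placedOrder")

resp = requests.get(str(concept), headers={"Accept": "text/turtle"}, timeout=10)
resp.raise_for_status()

g = Graph()
g.parse(data=resp.text, format="turtle")

# Print whatever human-readable definition the model publishes.
for definition in g.objects(concept, RDFS.comment):
    print(definition)
```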

It is worth pausing here to call out the value of a healthy tension between order and disorder, centralisation and decentralisation, harmony and freedom. To strike the optimal sweet spot between the two, the model must be centrally controlled whilst also being extensible (bearing in mind that it is vital that any extensions are carefully monitored and regulated). This is similar to many other situations: a game of football operates under a strict set of rules, but players can express their genius within those rules; markets provide a legal and regulatory framework, but individual companies can innovate within that framework.

Mathematically, these forces are referred to as ‘differentiation’ and ‘integration’ — the balancing point between them is sometimes referred to as the ‘edge of chaos’ — and it is on the edge of chaos that all the fun things happen!

When enough of the organisation’s data has been mapped to these shared concepts, they begin to form an interface layer over the messy details of the underlying applications and databases supplying that information. An interface layer is one of the oldest tricks in the computer programmer’s tool kit and it turns out to work very well for data too.

This interface layer is much simpler to understand than the spaghetti-like morass of databases and tables that hold the data. Consequently, this conceptual data interface can be understood by all members of the organisation and not just the IT department.

When the conceptual data interface is in place, most people no longer need to bother with the painful intricacies of the underlying IT infrastructure to get hold of the data they need.

There is another hugely significant advantage to having an organisation-wide ‘data interface’. An interface of this nature gives you the freedom to work on the technology below it without breaking everything else. So, for example, an application supplying the ‘person’ data on an old mainframe could be transferred to a new system without it being necessary to rewire all the connections from the downstream systems that use that ‘person’ data. It is rather like an aeroplane undergoing significant maintenance and repair whilst continuing to glide seamlessly through the sky. This is a glimpse into one of the many commanding ways in which a networked organisation is much more agile than its industrialised counterparts. It hints at the emergent way that the network can rise above the organisation’s underlying complexity to form a simpler, high-level layer of abstraction.

“Something is complex if it contains a great deal of information that has high utility, while something that contains a lot of useless or meaningless information is simply complicated” Steve Grand

Your Organisation’s Social Data Network

At the end of the day, organisations are made up of people; each organisation is a miraculous entanglement of free-minded individuals collaborating on a shared common purpose. But oftentimes collaborating and sharing data within an organisation feels much harder than it should.

Social networks like Facebook and LinkedIn allow individuals to connect in a way that forms self-organising, collaborative communities. In so doing, these platforms reduce peer-to-peer friction. When we network our data within an organisation, we essentially give the same advantage to both those producing and those consuming data. Networks allow people to collaborate at scale, and networkification liberates human collaboration around internal data within an organisation.

At the moment, organisations collect data together in a central store, such as a data lake, and it is from there that one central team attempts to integrate the data for reporting. The responsibility for data integration is the exclusive domain of a small team of data aggregators.

Decentralised data networks flip the current integration paradigm on its head, because in a peer-to-peer network on the cloud, it is the application teams (who, let’s face it, know their data better than anyone else) who take responsibility for publishing pre-integrated fragments of data to the larger network. Moreover, it is the business teams (who, again let’s be honest, have the deepest knowledge of the questions that need to be asked) who can pick the fragments that they are interested in and load them into their own personal store to query.

Therefore, in a decentralised model, there is no ‘man in the middle’ and quality data is everyone’s business. That is because the network’s organic structure allows people to self-organise and subdivide into bespoke clusters that link and enrich specific little pockets of the wider organisational graph.

This is the very self-organising, anti-entropic feedback loop that we began searching for in Part One. It is an Internal Data Market made from a peer-to-peer network of publishers and consumers.

To make the Internal Data Market work, each application team must comply with a few simple standards (a minimal publisher sketch follows this list):

· They must structure the key data that they wish to share as network fragments (see Part Two)

· They must publish those fragments on the internal cloud

· They should try to conform to a common shared model

· They should try to link their data to the data in the rest of the network
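Here is a minimal publisher sketch covering those four standards, in Python with rdflib and requests. The /fragments publishing endpoint, the URL scheme and the model terms are assumptions for illustration, not a prescribed API:

```python
# Publisher sketch: structure key data as a network fragment, conform to
# the shared model, link to other teams' data and publish it internally.
# The endpoint, URL scheme and model terms are illustrative assumptions.
import requests
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import RDF, RDFS

g = Graph()
ben = URIRef("https://abc.org/person/ben")

g.add((ben, RDF.type, URIRef("https://abc.org/model/Person")))  # shared model concept
g.add((ben, RDFS.label, Literal("Ben")))
g.add((ben, URIRef("https://abc.org/model/placedOrder"),
       URIRef("https://abc.org/order/123")))                    # link into another team's data

requests.put(
    "https://abc.org/fragments/crm/people",   # hypothetical internal publishing endpoint
    data=g.serialize(format="turtle"),
    headers={"Content-Type": "text/turtle"},
    timeout=10,
).raise_for_status()
```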

When it comes to the data consumer, we need the following (a matching consumer sketch follows this list):

· A well-organised catalogue to easily discover the various data fragments that have been published

· A way to select and retrieve those fragments and keep them up to date

· A way to run queries that will drill into all the pre-integrated data pieces you have selected
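And a matching consumer sketch: pull a couple of published fragments into a local graph and run a query across the pre-integrated result. The fragment URLs are illustrative placeholders, and a local rdflib graph stands in for whichever graph store a team actually uses:

```python
# Consumer sketch: fetch published fragments, merge them locally and
# query across them. The fragment URLs are illustrative placeholders.
from rdflib import Graph

g = Graph()
for fragment in (
    "https://abc.org/fragments/crm/people",
    "https://abc.org/fragments/sales/orders",
):
    g.parse(fragment, format="turtle")   # rdflib fetches each URL directly

results = g.query("""
    PREFIX model: <https://abc.org/model/>
    SELECT ?person ?order
    WHERE { ?person model:placedOrder ?order . }
""")

for person, order in results:
    print(person, order)
```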

Data as a Service

We have now come a long way in this series of articles:

· We have identified the three main forces driving the transition into the Information Age as Data, Cloud and AI

· We’ve shown how all data can be transformed into a network-shape and we have demonstrated how the data-network can be collapsed into the cloud-network so that they both become the same thing

· And finally, we have seen how this data-cloud network can enable a decentralised, looping, organic and emergent process that is theoretically capable of linking all information within an organisation

But perhaps you’re still thinking ‘so what? How is the bottom line affected by unifying data and cloud into one network? Who cares if all my organisational data is connected?’

Thinking about other networks may help you get an intuition of the kind of advantages that an organisation can realise through networkification. What’s the advantage of everyone being able to speak to each other on one telephone network? What’s the advantage of having all our documents connected on the Internet? What’s the advantage of connecting to all your colleagues on LinkedIn?

In each case, so-called ‘network effects’ reduce the cost and friction of ‘getting stuff done’. With a telephone network, I no longer need to travel miles to meet up with someone to have a chat with them. With the Internet, I no longer need to travel to some obscure library to discover information. And in the case of LinkedIn, I can widely share articles such as the one you are reading now with the simple push of a button.

In each case the reduction in cost and friction frees us up to experiment: we can call up a friend on a whim, search for some information just out of casual interest, and post a crazy idea just to see if anyone else is interested.

The same network effects come into play with your data. At the moment, if someone has an innovative idea about optimising a process or thoughts about a new product line or musings about new ways of mitigating risk, then the cost of obtaining the data (finding all the places where it is stored, contacting the IT teams, getting the necessary permissions, connecting it, cleaning it and making sense of it) is prohibitively high.

There is simply no way of knowing just how many innovative ideas are lost because it is just too expensive to experiment with our data. The unified data-cloud network vastly reduces the time and cost of simple data analysis. The opportunity to experiment and uncover completely game-changing ideas that are hidden within the connective tissue of your organisation’s data becomes real.

The bottom line is that the organisations with networks such as these are the innovators and the ones that will start to adapt at a bewildering speed. Consequently, they will be the ones who will lead the way and thrive in the unfolding technological revolution we are living through.

In the next article we will show how the knowledge embedded in the data-cloud network can be used to teach an organisation’s AI, but for now, let’s concentrate on what we have in front of us: our second concrete tool, which can publish and consume pre-integrated network fragments in a peer-to-peer network that spans the entire organisation. Some people call this tool a ‘Data Product’, others call it a ‘Data Service’, and others still a ‘Pod’ or ‘Node’.

Tool Number Two: The Data Service

The data service is a specialisation of an existing and well-established architectural pattern called a microservice. An individual data service can use the Graph Adapters to publish graph fragments into the data-cloud. A data service can also query data from the distributed graph and has some proactive local caching to ensure good performance. The data services combine to form a peer-to-peer network, or what is coming to be known as a data mesh.
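As a rough shape (not a definitive implementation), a data service could be little more than the publisher and consumer sketches above wrapped into one component, with a local cache in between; the class and method names here are purely illustrative:

```python
# Illustrative shape of a data service: it publishes its own fragments,
# consumes fragments from its peers and keeps a local cache for speed.
from rdflib import Graph

class DataService:
    def __init__(self, publish_url: str):
        self.publish_url = publish_url   # where this service publishes its fragment
        self.cache = Graph()             # proactive local cache of peer fragments

    def publish(self, fragment: Graph) -> None:
        # In practice: serialize the fragment and PUT it to the internal
        # cloud, as in the publisher sketch above.
        ...

    def consume(self, peer_fragment_url: str) -> None:
        # Fetch a peer's fragment and keep a cached copy so that local
        # queries stay fast even if the peer is slow or unavailable.
        self.cache.parse(peer_fragment_url, format="turtle")

    def query(self, sparql: str):
        # Run queries against the locally cached slice of the wider network.
        return self.cache.query(sparql)
```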

(There are some special data services worth highlighting: the Data Catalogue Service keeps track of all the datasets within the organisation; the URL Lookup Service has an index of all the URLs; the Schema Service shares all your models; and the Root Service acts as a reverse proxy, routing all calls to the correct underlying data service.)

Part Four: AI
