Executable (computational) graphs & diagrams
June 2024 release of Nasdanika Core Graph has introduced a new approach to constructing executable/compute graphs and diagrams. This story provides an overview of the approach and its possible applications. It starts with introducing graph concepts and terminology.
Table of Contents
· What’s in it for me?
· What is a graph
∘ Hello, World!
∘ Computing circumference
· Executable/Computational Graphs
· Metaphors
∘ PCB
∘ Cubicle
∘ City
· API
∘ Graph
∘ Processors & Configs
∘ Connection processors and pass-through connections
∘ Capability processor factory
· Diagrams
· Conclusion
What’s in it for me?
Before delving into the WHAT and HOW, let’s take a look at the WHY!
The approach explained in this story helps me to maintain 31 GitHub repository with 123 Java modules containing 2.5K Java files with 415K lines of code, as well as 27 web sites with about five thousand pages. Generated module graph helps me to understand dependencies between Nasdanika modules. Full module graph also contains external dependencies. I use generated class diagrams and graphs to quickly grasp relationships between model elements. All of this allows me to efficiently navigate between levels of abstraction, and manage complexity.
I also generate sites from diagrams, which are graphs: Nasdanika documentation site, architecture documentation sites such as Internet Banking System. While the Internet Banking System is a demo, I used the approach to document and communicate real-life applications with multiple runtime environments and dozens of components.
So, the approach explained here may help you to increase your personal efficiency and effectiveness, and efficiency and effectiveness of your team or organization.
What is a graph
Wikipedia defines a graph as:
a structure amounting to a set of objects in which some pairs of the objects are in some sense “related”. The objects are represented by abstractions called vertices (also called nodes or points) and each of the related pairs of vertices is called an edge (also called link or line)
Nasdanika’s term for vertex is Node and for edge is Connection. Connections are directional.
Many real life things can be represented as a graph, and may have multiple graph representations used for different purposes. Let’s take a look at a couple of examples!
Hello, World!
Let’s take a look at the classic “Hello, World!” and how it can be represented as a graph for different purposes. The diagram below shows a graph of character nodes connected by “Next” connections.
Instead of characters we can use tokens representing more than one character as shown below:
- 9906 — “Hello”
- 11 — “,”
- 4435 — “ World”
- 0 — “!”
And we can use words:
We can modify the words graph by introducing a dictionary. With the dictionary we can count occurrences of each word and navigate to that occurrence in the text (graph):
In the above graph we have two types of nodes:
- Dictionary entry
- Word in text
And two types of connections:
- Next connection between words
- A connection between a word and a dictionary entry
Then we can extend our dictionary with semantic connections — “Hello” is similar to “Hi”, and “World” is similar to “Universe”. Similarity may have a value, e.g. a number between -1 and 1 with positive values for similar concepts (synonyms) and negative numbers for opposite concepts (antonyms).
We can take it even further by generating an embedding and adding it to a Hierarchical navigable small world (HNSW) index, so we can perform similarity searches. HNSW is a graph-based search technique.
Computing circumference
In this section we’ll take a look at a graph representing computation of the circumference of a circle with a given radius:
r = 2.718
C = 2πr
The above computation can be represented as the below graph:
In this graph we have nodes of 6 different types. Some nodes types support values:
- Constant/Literal contains numeric value
- Variable contains name of the variable
- Variable reference contains name of the variable
There are 3 types of connections:
- Containment — statement list (Solution) contains statements
- Operand — assignment and multiplication have (contain) operands
- Variable reference
Executable/Computational Graphs
In an executable graph elements (nodes and connections) collaborate to perform some actions. In a computational graph the actions are computations of some result. In this story we will use executable and computational interchangeably.
Nasdanika uses 3-layered approach to build computational graphs:
- Value/data
- Graph element — node/connection
- Processor
For our circumference graph the value layer contains numeric literals and variable names. Values may have its own internal structure. For example, a PDF document contains pages, articles, paragraphs, lines and words. Azure Document Intelligence model also contains coordinates of text blocks.
We may say that the value level would typically contain “raw” data obtained from some external data source. That data needs to be enriched with additional information in order to compute result/perform actions, this is why we need the graph level on top of it. For example, we may need to cross-reference GitLab Contributor with a person in the company directory. Or we may need to link PDF or OCR documents paragraphs to named entities in those paragraphs.
In our case the variable reference node needs to be connected to the variable assignment node.
Finally, processors “sit” on top of the nodes and perform actions/computations. As we have seen in the “Hello, World!” section we may have different graphs on top of data/values. We may also have different processors on top of graphs. In our computational example we may have:
- Synchronous (pull) processors — a call to multiply would multiply the operands and return the result.
- Reactive (push) processors — client code would subscribe to the multiply processor and receive multiplication results via the subscription. The multiply processor would in turn subscribe to the operand processors and perform computations when a new value is published to it.
- Asynchronous (pull) processors — a call to multiply would return a CompletionStage and the result will be obtained via one of .then methods.
We may also use different data types for our computations — Float, Double, BigDecimal. Depending on the data type we’ll get different precision and performance.
Metaphors
Before we delve into low level mechanics of node wiring and processor configuration, let’s take a look at a few metaphors. Metaphors are a great communication tool, IMO, and different metaphors would resonate with different people. Pick a metaphor which resonates with you!
PCB
When I was a teenager I spent a lot of time building different electronic devices — amplifiers, radio receivers, computers. So my brain was wired to think in terms of signal propagation and transformation. Maybe this was the defined factor for designing the Nasdanika Graph API the way it is.
In this metaphor there is no value layer:
- The graph layer is a PCB
- Pass-through connections are wires
- Connection processors are 2-pin components such as capacitors, diodes and resistors
- Node processors are CPU’s, RAM, PCI cards such as video cards
Cubicle
In this metaphor a cubicle is a node connected to other nodes with a computer and a phone. A person working in that cubicle is a processor — they don’t need to worry about connectivity — it is provided by the node (cubicle)
City
In this metaphor roads are connections, buildings are nodes and people /organizations working/operating in those building are processors.
We may extend the metaphor by introducing waterways and saying that roads are bi-directional, but waterways are unidirectional — things go only downstream.
API
This section explains core concepts and classes/interfaces of the graph API.
Graph
The graph API has 3 interfaces:
- Element — super-interface for Node and Connection below. Elements may have internal structure which can be navigated by accepting a visitor.
- Node — extends Element and may have incoming and outgoing connections.
- Connection — extends Element and has source and target nodes.
There are object (value) flavors of node and connection:
as well as a number of specializations, e.g. EObjectNode.
There are different ways to construct graphs from data/values, consult documentation, sources and examples for details.
Processors & Configs
Previously it was mentioned that the approach to executable graphs has 3 layers — data, graph (nodes & connections), and processors. There is actually one more layer — config.
It was not mentioned because it is a helper layer taking care of establishing inter-processor communications given the fact that when, say, the Multiplication processor is created in the above diagram, the π Literal processor may not yet exist.
Using our PCB metaphor you may thing of config as a socket. Or, using the cubicle metaphor, it is a phone and a phone number — Multiplication can call the π Literal and wait for it to answer, which might not be right away. In the API it is implemented as a call-back — Multiplication would be notified when the π Literal becomes available.
There are 3 types of configs corresponding to 3 types of graph elements:
Using configs processors obtain endpoints to communicate with other processors and register handlers so other processors can communicate with them. Endpoints and handlers are objects of a certain type. Endpoints may be of the same type as handlers or of different type.
If an endpoint type is different from a handler type you may think of it metaphorically as an adapter. Say, VGA -> HDMI.
For example, in a distributed system an endpoint method may have data argument of type InputStream. That stream would be saved to some storage (say committed to Git) and a handler method corresponding to the endpoint method would have an argument of type URI or URL.
So, handlers and endpoints can be thought of as connectors (VGA/HDMI/…) with method corresponding to pins.
Connection processors and pass-through connections
Connections may have processors in the same way as nodes. You may think of a connection processor as a processor for a node with one incoming “connection” — source node, and one outgoing “connection” — target node.
Let’s introduce one more metaphor to exemplify — an electronic circuit as shown below.
If we wanted to create a simiulator for this circuit using a graph with processors we could represent wires as “pass-through” connections with no processors:
Capacitors and resistors could be represented either as nodes or as connections with processors:
Capability processor factory
As with graph elements, there is a number of ways to create processors for elements. June 2024 release introduces a new approach which leverages the Nasdanika Capability framework.
With this approach processors are provided via capability providers and client code uses CapabilityProcessorFactory to collect processors from capability providers.
This approach provides high level of decoupling, flexibility and reuse — the same factory can be used for multiple graphs and the same graph may leverage different factories by specifying different requirements and by specifying different dependencies in pom.xml.
Diagrams
Graphs are typically represented as diagrams. Diagrams are graphs, Nasdanika Drawio API is built on top of the Graph API.
However, diagram graphs AS-IS are typically not very useful because they reflect low-level diagram structure. In order to diagram a problem domain of interest you’d need to “map” diagram elements to the elements of your problem domain. There is a book on the subject — Beyond Diagrams. Core concepts are available in the free version, you’d need to pay for the full version only if you decide to use the approach explained in the book and want to accelerate getting up to speed.
One example of diagram mapping is the Internet Banking System.
In my opinion a great value proposition of executable diagrams is What You See is What You Get — diagrams can be progressively elaborated to make them executable. The same diagrams may be used to generate documentation and report system health by highlighting diagram elements using, say, runtime metrics or development metrics — code coverage, number of vulnerabilities, … This is drastically different from the “traditional” approach — draw a diagram, code something. Whether that something aligns to the diagram — who knows?
Conclusion
The goal of this story is to nudge you in the direction of “thinking in graphs”, thinking non-linearly. While text is a graph, it is a very simple form of a graph. AI gets better and better at dealing with text. How can humans stay competitive? By thinking on a high-level in graphs and using diagrams, not text, to grasp high-level concepts. Lower-level linear “thinking” can be delegated to AI. For example, there might be a node processor implementation which uses GenAI to generate code. GenAI can also be used for summarization using RAG on top of graphs and text similarity along with graph proximity — number of hops, connection “strength”.
The Nasdanika approach to graph execution provides flexibility in graph and processor creation and allows to bind decisions at appropriate points in development process — not too soon, not too late.
There is a lot more that I’d like to explain in this story, e.g. deployment of executable graphs as Maven modules, federated graphs built from multiple sub-graphs, … However, it would turn this story into a book. So I’d rather add chapters to the Beyond Diagrams book, when I get a chance.
When I started writing this story I created compute graph diagrams just for illustration purposes. Then I’ve realized that they can be used to perform actual computations. I’ve created Compute Graph GitHub repository for this effort — you may watch it if you are interested.
The above case with compute graph diagrams illustrates one of important points of the Beyond Diagrams book — a diagram/graph may be created without a backing model/notation and that notation may be added later, even elicited from a set of related graphs/diagrams. Such diagrams may be created by non-technical people and then technical people may create domain models from them, processors, reusable shape libraries, execution using Nasdanika CLI, graph inheritance, …
One of my former colleagues once said that he wanted “PowerPoint to code” solution. Well, it is quite possible PowerPoint -> Drawio -> Domain model -> Generated code or executable diagrams.