HyperGraph Transport Protocol

vito
Jun 10, 2020


I’ve received a lot of questions about the unannounced Splunk x Constellation partnership the community has dug up, mostly speculation on how the two interact and what the partnership entails. To explain my theories around that, some fundamental understanding of the HyperGraph Network is needed. This article will provide that.

Please note that I do not know all the “nitty gritty”, low-level details of HGTP, nor will I attempt a technical deep dive. I will merely try to give you a high-level understanding of how Constellation’s HyperGraph Transport Protocol connects to commonly used tech stacks, and of the value it brings to the organizations adopting this technology.

First, consider a client-server communication…

WTF?

Let me explain. Whenever you visit a website, your web browser is basically displaying a file stored on another computer. You know this already. Good. That’s a client-server communication. Your browser is the client, and the computer housing the file (index.html) is the server. Your browser communicates with that server in order to present the contents of that file. It looks something like this.

Client requests contents of the file index.html (GET). Server responds with the contents of that file (OK)

This is basically how all communication on a network is done. Of course, there are both software and hardware layers to this to ensure that the communication works as expected. But it is triggered from the application layer, more specifically the web browser. After you have entered a URL in the address field and hit Enter on your keyboard, the browser will initiate an HTTP GET request to the server IP “hiding” behind the URL that you just entered. google.com resolves to 216.58.207.238, for instance. You get the idea.
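To make that concrete, here’s a minimal Python sketch (Python is my choice for the examples in this article) of that same name-to-IP lookup. The address you get back will vary by region and over time:

```python
import socket

# Ask the OS resolver to translate the domain name into the IP address
# the browser will actually connect to.
ip = socket.gethostbyname("google.com")
print(ip)  # e.g. 216.58.207.238
```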

That HTTP GET payload will get sent through the layers of the OSI model (you don’t have to care about that) and eventually out on the internet, before arriving at the server housing the index.html file.

The server will receive the request (HTTP GET) and hopefully respond OK with a network packet also containing the contents of index.html, so that your browser can draw up the website that you’ve requested. The web browser does exactly this: it interprets the HTML markup (the contents of index.html) and converts it into the colors and text that make up the website you requested.
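For illustration, the server side can be as small as this sketch using Python’s standard library (port 8000 is an arbitrary choice). It serves files from the current directory, so a GET for index.html is answered with that file’s contents and an OK status, just like the exchange described above:

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Serve files from the current directory. A GET for /index.html is answered
# with "200 OK" followed by the contents of that file.
HTTPServer(("", 8000), SimpleHTTPRequestHandler).serve_forever()
```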

To summarize: The client sends a request to a server. The server receives it and sends back the requested contents.
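The client side is just as small. This sketch sends a GET request to example.com (a public test domain) and prints the status code plus the first bytes of the returned HTML:

```python
import urllib.request

# Send an HTTP GET request and read the server's response.
with urllib.request.urlopen("http://example.com/") as response:
    print(response.status)    # 200, i.e. "OK"
    print(response.read(80))  # the first bytes of the HTML contents
```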

Great! We understand a client-server communication.

A server, a client and a transport medium are required in order to establish a client-server communication. Depending on the types of applications and data used in this communication, these mediums may vary. It could be a web server (server) and a web browser (client) communicating over the internet, in which case the transport medium is fiber, with a transport protocol called TCP (transport) negotiating the communication using software logic between the client and the server. The TCP implementation lives within the OS itself.

That logic is called a three-way handshake and is basically:
1. The client sends a connection request to the server. (SYN)
2. The server acknowledges that request and responds. (SYN+ACK)
3. The client acknowledges that it has received the response from the server. (ACK)

Now the handshake between the client and server is complete, and a TCP connection is established between the two. The request (HTTP GET) can then be sent from the client, the server will acknowledge it, and the website will be served to the client. Finally, TCP will close the connection (FIN+ACK).

This is the full back and forth communication to send an HTTP request and get a response.
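You can watch this lifecycle play out from a few lines of Python. In the sketch below, connect() is where the OS performs the three-way handshake, and close() is where the FIN+ACK teardown happens. This is a simplified example against example.com, not how a browser is actually implemented:

```python
import socket

# connect() triggers the three-way handshake (SYN, SYN+ACK, ACK) in the OS.
sock = socket.create_connection(("example.com", 80))

# With the TCP connection established, send the HTTP GET request.
sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")

# Read the response: status line, headers, then the HTML contents.
response = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    response += chunk

# close() tears the connection down (FIN+ACK under the hood).
sock.close()
print(response[:100])
```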

This type of communication is used in almost all software, the only difference being the types of clients, servers and transport mediums they opt for. Data is passed between processes on a single operating system, between software on different computers on your home network, and between servers and clients on the internet. In this case we’ve looked into how TCP works to ensure reliable data transmission. It is suitable for HTTP requests, but for realtime video or voice, not so much.

Software shares data with other software, but problems arise when there are millions of clients retrieving and sending data to millions of servers, especially when it comes to large quantities of data. Consider the image below.

This poses a ton of complexities for software engineers and ops personnel. So far, we’ve only covered the absolute bare minimum of a single client-server communication. In a production landscape, you’d have thousands of terabytes of data being processed, sent, managed and encrypted, all in real time, across thousands of different mediums.

Software engineers and ops personnel need to ensure that all this data stays secure, timely and intact, with zero downtime. It also requires software engineers to produce code for every individual client, server and transport medium that one specific application will be leveraging. Some of these are RPC, REST, gRPC, WebSockets, raw sockets, YAML, JSON, YANG, gNMI, XML, etc.

There could be hundreds of unique combinations of data formats, data structures and protocols at play. All requiring explicit code implementation. Across thousands of unique clients and servers. Deployed on millions of computers.
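As a tiny illustration of what “explicit code implementation” means in practice, here’s the same made-up record serialized to two of the formats above. Every client-server pair has to agree on one of these, and someone has to write (and maintain) the code for each:

```python
import json
import xml.etree.ElementTree as ET

event = {"user": "42", "action": "login"}  # a made-up record

# The same data, one code path per format the receiving end expects.
as_json = json.dumps(event)

root = ET.Element("event")
for key, value in event.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)  # {"user": "42", "action": "login"}
print(as_xml)   # <event><user>42</user><action>login</action></event>
```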

This is a problem that practically all companies deal with today. There are solutions, such as Kafka, that address this exact problem. Kafka has been adopted by the likes of Netflix and LinkedIn (LinkedIn actually built it) because it enables them to use it as a message bus (transport medium) for communication across all of their servers and clients. In other words, they went from this:

To this:

Kafka is an amazing piece of software. It truly is. But it needs to be managed by the company adopting the technology. That comes at the cost of operational expenses, and it obviously cannot ensure data immutability if the system has been compromised.
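For a feel of what “message bus” means in code, here’s a minimal sketch assuming the kafka-python client library and a broker running at localhost:9092 (the topic name is made up). One process publishes messages to a topic; any number of others consume them:

```python
from kafka import KafkaConsumer, KafkaProducer

# Producer side: publish a message to the "clickstream" topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("clickstream", b'{"user": "42", "page": "/home"}')
producer.flush()

# Consumer side (typically a separate process): read messages off the topic.
consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)
    break  # just show the first message for this sketch
```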

To leverage Kafka, you first need to deploy a Kafka cluster using (most likely) costly cloud compute resources (like AWS, GCE or Azure). Additionally, you need to ensure the availability of the Kafka cluster and patch security vulnerabilities in Kafka and its associated packages and libraries over time. You also need to manage security across all your data pipelines (client-server connections).

In other words — you need to invest time and resources to make sure it doesn’t break and the data doesn’t get spoofed by a malicious actor.

The HyperGraph Transport Protocol addresses this exact use-case, with the major differentiators from Kafka being:

  1. Guaranteed data immutability over the internet.
  2. High Availability. As long as there are nodes on the network, the network will never go down.
  3. No in-house ops personnel needed to guard the data and network.

This is just the HyperGraph Transport Protocol, or HyperGraph Network. It’s a fundamental and, frankly, revolutionary piece of technology. But it’s just the foundational layer. Even more value will be introduced with the applications built on top of HGTP by the Constellation team and others.

Next up, I will write an article on how Splunk can benefit from using HGTP. Thank you for reading!
