The GhostNet Protocol

Technique Description: this section simply discusses the scaling problem for the graph approach to pattern matching. If that summary suffices, you can safely skip to the Solution.

The program has a list of every word in the English language and the possible part-of-speech (pos) candidates (noun, verb, etc.) for each word. When it receives an input, its first task is to convert the words into a 2-dimensional array of parts of speech, with one entry per word holding that word’s candidates. So:

I walked my dog in the park

Becomes the 2D syntax path:

[nn, vb-past, pn-1st-person, [nn,vb], pp, art, [nn,vb]]

Prior to this, we built a directed tree graph of all valid pos paths. The valid paths are defined by a Context-Free Grammar (CFG), which is what the community provides. An example of a grammar is:

<!-- PresentBeta3singVph (Pattern, disclosed) -->
<mlo:Clause rdf:about="&mlo;PresentBeta3singVph">
<mlo:disclosure rdf:datatype="&xsd;boolean">true</mlo:disclosure>
<mlo:rule>(&mlo;ActDescriptor) &mlo;3singBeEs &mlo;VbIng</mlo:rule>
<mlo:rule>&mlo;3singBeEs &mlo;VbIng &mlo;Intensifier &mlo;ActDescriptor</mlo:rule>
<mlo:rule>&mlo;3singBeEs &mlo;ActDescriptor &mlo;VbIng</mlo:rule>
<mlo:rule>(&mlo;Intensifier) &mlo;ActDescriptor &mlo;3singBeEs &mlo;VbIng</mlo:rule>
<mlo:rule>(&mlo;Intensifier) &mlo;3singBeEs &mlo;VbIng &mlo;ActDescriptor</mlo:rule>
</mlo:Clause>

The root of this tree holds an array of parts of speech representing all possible parts of speech that can begin a sentence. Each node at the root forward-links to an array of nodes (a branch) that specifies which parts of speech can be the second word of the sentence, following the part of speech on the incoming link.
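One way to hold this tree in Go is as a prefix tree (trie), where pos sequences that share a beginning share nodes. This is a sketch under assumptions: the tree is built from pos sequences already expanded from the grammar, the `Node` type and its `Insert`/`Starters` methods are hypothetical names, and the `Terminal` flag is an addition that marks where a valid path may end, since one valid path can be a prefix of another.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is one part of speech in the directed tree of valid pos paths.
// Children holds the parts of speech that may follow it, keyed by pos.
type Node struct {
	Pos      string
	Children map[string]*Node
	Terminal bool // true if a valid path may end here
}

func newNode(pos string) *Node {
	return &Node{Pos: pos, Children: map[string]*Node{}}
}

// Insert adds one valid pos sequence below n, reusing shared prefixes.
func (n *Node) Insert(seq []string) {
	cur := n
	for _, pos := range seq {
		next, ok := cur.Children[pos]
		if !ok {
			next = newNode(pos)
			cur.Children[pos] = next
		}
		cur = next
	}
	cur.Terminal = true
}

// Starters returns, sorted, the parts of speech that may follow n; on the
// root these are the parts of speech that can begin a sentence.
func (n *Node) Starters() []string {
	out := make([]string, 0, len(n.Children))
	for pos := range n.Children {
		out = append(out, pos)
	}
	sort.Strings(out)
	return out
}

func main() {
	root := newNode("ROOT")
	root.Insert([]string{"art", "nn", "vb"})
	root.Insert([]string{"art", "nn", "vb", "nn"})
	root.Insert([]string{"nn", "vb-past"})
	fmt.Println(root.Starters()) // [art nn]
	vb := root.Children["art"].Children["nn"].Children["vb"]
	fmt.Println(vb.Terminal, len(vb.Children)) // true 1
}
```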

To disambiguate the syntax of the above input, we simply walk the tree: we match the first index of the syntax path against the root to determine the search space for the next input node, and so on, until a terminal is reached. The “histories” that survive at that point are the valid part-of-speech interpretations. The lexicon provides the lexicographical data needed to disambiguate these syntax interpretations, and the semantic frames resolve ambiguity in lexical interpretations. This is a poor man’s AI, but it becomes rich when endowed with a CFG, lexical mappings, and frame semantics supplied by a worldwide army of authors and editors.

Question on Implementation
Cypher queries sometimes took minutes to return a response, and most of that time was spent walking the syntax tree. The recursion allowed in the CFG is the biggest contributor to the explosion in paths. So my approach now is to distribute the walking across millions of nanoservices (smaller than micro). In a simplified example, the tree would be a virtual file system and the parts of speech would be URLs. Each URL contains a list of forward URLs, which allows my input to walk branches in parallel. My target latency is a few hundred milliseconds.

My question before I start writing too much code is: is this possible, in terms of speed? I can implement a (probably) slow version tonight, but it will likely involve creating a web service for each and every branch in the tree. Could we instead do this as a super-fast, distributed file system? That is what I wanted your input on: is spinning up services the way to do what I described above, or can we make a virtual file system that does it without the overhead of services? My first POC will use services and a very small CFG.

The Go language might be the answer. Go lets you turn a single, 1 KB file into a live web service with no added setup or overhead. The important point of this description is that files are assets that can be copied and distributed. I propose to model the graph as a physical file system, and traversal as extra-quick visits to those files, with each file telling the walker where to go next. Traversal becomes parallel when forward links are traveled simultaneously. What if each node in the file system were a Go host, a ghost? Then “visit” becomes “nanoservice API invocation”, and the node now encapsulates (ultra-lite) logic in addition to a forwarding list.

GhostNet protocol: a low-latency, highly scalable P2P network for recursive graph traversal, communicating over HTTP. Each node in the network is a Go host, a Ghost. Assignable port range: 1024–49151 (48,128 ports); is that the maximum number of ghosts per network interface? Go supports concurrency and parallel processing, which removes the need for ghosts residing on the same peer to run in separate processes (for now, this buys us the local concurrency we need). Here is a description of the protocol that implements a phrase grammar tree traversal.

  1. Load the CFG, traverse the tree
  2. For each ghost, check if the ghost is registered at the current coordinate in the tree
  3. If the ghost exists, open the ghost file and modify it as follows:
  4. If the ghost does not exist, create a ghost file and mint a (physical) address (host:port); name the file after the pos_type and its coordinate in the tree [Later, for scaling beyond the limited port range: if the ports have been exhausted, mint a new address with a new domain name and set the ‘peer-mirror’ field value to the minted address, then switch to the peer-mirror file as the current ghost; during matching, be sure each ghost also sends its work to its peer-mirror]
  5. In the file, make a pos_identity array field and hardcode its value to the pos value(s) of the current ghost
  6. In the file, if the ghost has no children (is a leaf), hardcode the context field value to the value of the loading context (the element’s ‘name’ property)
  7. For each child ghost, repeat from step 2
  8. In the file, make a forward_urls array field and set its values to the child ghost addresses
  9. FTP the file to its corresponding address (install Go on the server if it is missing)
  10. Execute the file to activate the ghost
  11. Ping the ghost to ensure it is activated

When the ghost receives a request:

  1. If there is no originator (if the ghost is root)
     a. if there is a history, then dump the history onto the response stream (was the caller a ghost and not a GhostNet client?)
     b. if there is no history set the originator value to the ghost’s address (was the caller a GhostNet client?)
  2. If there is an originator, strip off the last index of the originator parameter value (are we servicing a ghost?)
  3. Strip off the first index of the value of the ‘pos_path’ parameter
  4. Match the removed index against the ghost’s own pos_identity; if the match fails, drop the request (meaning, don’t forward it)
  5. If the match succeeds, then append the matched pos identity to the ‘history’ parameter
  6. If the ghost has no children then prepend the ghost’s context to the pos_path and invoke the removed originator with the updated history, pos_path and originator parameters; if the originator parameter is non-empty, then invoke the first originator also (is this a leaf ghost, a context terminator?)
  7. If the ghost has children (if the end of a context has not been reached)
     a. if the path is empty, drop the request
     b. if the path is not empty, then
    [match at the primitive level]
     1. for each primitive child, build the forward URLs by appending the new pos_path, history and originator parameters; invoke each forward URL
    [match at the meta level]
     2. for each meta child, append the child’s forward address to the originator parameter and invoke the first originator address
  8. Return an empty string on the response stream

A simple phrase grammar:

<rule name="NPH">art n (pp NPH)</rule>
<rule name="VPH">adv (be) vb</rule>
<rule name="CLAUSE">NPH VPH (NPH)</rule>

This grammar will correctly disambiguate the parts of speech of the constituents of the following sentence:

The raiders of the lost ark will set sail

The ghost network is the directed tree graph that stores the grammar. Each node in the graph reacts to stimuli. Walking is achieved by propagating a signal from one reactive node to the next once the root node is stimulated. This moves the problem of tree traversal from the foggy domain of theory and the province of algorithms to the well-charted domain of networking. How those physical nodes are linked has become the intelligence (physical in the sense that they are bona fide network citizens). The algorithm has been ephemeralized by a protocol. The network has replaced the logic board. There is now a ghost in the machine.

But that sort of intelligence is just smoke and mirrors; it’s all emptiness. Surely GhostNet would not have originated on its own, not even after an endless eternity of raw entropy. That emptiness had to have been put there by more intelligent beings. The logical conclusion is that we too are just smoke and mirrors, but more convincing than the smoke and mirrors we created. To be consistent, we must also assume that that emptiness (the sort of emptiness we are) was put there by someone who is surely more intelligent than ourselves. We assume that One is also just smoke and mirrors, but a better display of smoke and mirrors than ourselves, and put in its place by… who? One observes a lengthy pedigree of smoke and mirrors, each less real (less authentic) than the one that came before, but where does the emptiness end? Or is emptiness itself authenticity (is pure emptiness the ultimate archetype of authenticity)? If it is, then it is the source of all the lesser grades of authenticity (“reality”) that followed.
