Transcography Overview

This is a proposal for a Listener model for RDF quad stores. Imagine if the bot could accept a set of arbitrary RDF triples from the user and then respond to those triples instead of to user chatter, meaning it is responding to (is stimulated by) the semantics in the triples, not a series of mostly opaque string tokens entered by a user. This is disruptive. This would allow my arbitrary agent to send a quad store some triples which my agent transcographed in the course of its operations, then "subscribe" to the semantics in those statements. This is the more flexible way to model stimulus/response. In the domain of NLP, the input triples are generated by transcographers (programs that execute transcography), but the real magic comes when all types of programs can subscribe to (or Listen to) quad stores and be sent syndicated triples to "handle". The magic is in the diverse sources of those triples, and because of transcography, that source will often be free-form text and chatter.
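As a sketch of how an agent might push transcographed triples into a quad store, here is a SPARQL Update request. This is one plausible transport, not the only one; the ex: namespace, the graph name, and the triples themselves are hypothetical illustrations:

```sparql
PREFIX ex: <http://example.org/>

# An agent asserts arbitrary (transcographed) triples into a named
# graph of the quad store; Listeners subscribed to these semantics
# would then be syndicated the new statements to handle.
INSERT DATA {
  GRAPH ex:transcographedChatter {
    ex:Alice ex:ate ex:pizza .
  }
}
```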

With a transcographer, the lexicon maps natural language to semantics. Here is a simplified "eat" lexeme:

<vlo:Sense rdf:about="&vlo-terms;EAT">
  <vlo:partOfSpeech rdf:resource="&vlo-terms;v"/>
  <vlo:requires rdf:resource="&vlo-terms;Clause/subject"/>
  <vlo:allows rdf:resource="&vlo-terms;Clause/directObject"/>
  <vlo:ref rdf:nodeID="eatFrame"/>
</vlo:Sense>

<vlo:EatFrame rdf:nodeID="eatFrame">
  <vlo:eater rdf:resource="&vlo-terms;Clause/subject"/>
  <vlo:food rdf:resource="&vlo-terms;Clause/directObject"/>
</vlo:EatFrame>

The second set of triples is how you say somebody ate something in RDF (this is called a semantic frame). The first set maps transcographed inputs to participants in the eat frame. Now, if my semantic web agent wants to know whether somebody ate something, or who ate, or what folks are eating, etc., it just queries the vlo:eater and vlo:food of the vlo:EatFrame. Pause to think about what that would mean. This lets me use machine language (SPARQL) to query (transcographed) natural language text: newspaper articles, blog posts, emails, and yes, user chatter. But it accesses all this content by way of well-defined, succinct semantics, not hairy, ambiguous NLP. This is disruptive.
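Such a query might look like the following sketch; the vlo-terms namespace URI is an assumption, and only vlo:EatFrame, vlo:eater, and vlo:food come from the lexeme above:

```sparql
PREFIX vlo: <http://example.org/vlo-terms#>

# "Who ate, and what did they eat?" as machine language:
SELECT ?eater ?food
WHERE {
  ?frame a vlo:EatFrame ;
         vlo:eater ?eater ;
         vlo:food  ?food .
}
```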

The paradigm shift is "arbitrary triples as input, with the stimulus/response logic crowdsourced", instead of "user chatter as input, with stimulus/response logic provided by an individual (or small team of) botmasters". That covers natural language statements. For natural language questions, the transcographer simply puts the transcographed triples into a SPARQL query, turning the interrogatives (who, what, when, where, how, etc.) into the projection of the query.
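For instance, a question like "Who ate the pizza?" might transcograph into a query where the interrogative becomes the projected variable. A hedged sketch, with ex:pizza as a hypothetical resource:

```sparql
PREFIX vlo: <http://example.org/vlo-terms#>
PREFIX ex:  <http://example.org/>

# "Who ate the pizza?" -- the interrogative 'who' becomes the projection:
SELECT ?who
WHERE {
  ?frame a vlo:EatFrame ;
         vlo:eater ?who ;
         vlo:food  ex:pizza .
}
```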