“Figuring out” vs “Telling”

Dean Allemang
5 min read · Mar 3, 2023

When I was a grad student studying Artificial Intelligence back in the 80’s, Expert Systems were all the rage. Our “Holy Grail” was to create a system that could do something difficult, like medical diagnosis. Given a bunch of symptoms and signs, figure out a diagnosis for a patient. The point was to have a computer figure out something we didn’t already know.

A much more popular topic in those days was “databases”. We weren’t specific in those days about whether we were talking about database design, construction or use; that was all still being worked out. As an AIer, I wasn’t interested in databases; after all, all they are doing is telling you stuff that you already told them.

But then I found myself being impressed with so-called “tell and ask” systems, where you could “tell” the system something like

:Shakespeare :wrote :Hamlet .

then “ask” something like

?who :wrote :Hamlet .

and get the answer “Shakespeare”. But you could also ask

:Shakespeare :wrote ?what .

or even

:Shakespeare ?didWhatTo :Hamlet .

and you’d get “Hamlet” as an answer for the first question, and “wrote” as an answer for the second question. Isn’t this just telling us what we already told the computer? Where is the intelligence?
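To make the "tell and ask" idea concrete, here is a minimal sketch in Python. The names `tell` and `ask` and the in-memory set are my own illustration, not the API of any particular system; `None` plays the role of a `?variable`.

```python
# A toy "tell and ask" store. tell/ask are illustrative names, not a real API.
facts = set()

def tell(subject, predicate, obj):
    """Record one (subject, predicate, object) triple."""
    facts.add((subject, predicate, obj))

def ask(subject=None, predicate=None, obj=None):
    """Return every stored triple matching the pattern; None acts as a variable."""
    pattern = (subject, predicate, obj)
    return [
        triple for triple in facts
        if all(p is None or p == t for p, t in zip(pattern, triple))
    ]

tell("Shakespeare", "wrote", "Hamlet")

# ?who :wrote :Hamlet .
who = [s for s, _, _ in ask(predicate="wrote", obj="Hamlet")]
# :Shakespeare :wrote ?what .
what = [o for _, _, o in ask(subject="Shakespeare", predicate="wrote")]
# :Shakespeare ?didWhatTo :Hamlet .
did_what = [p for _, p, _ in ask(subject="Shakespeare", obj="Hamlet")]
```

Nothing here goes beyond lookup: every answer is a piece of a triple that was told to the store, which is exactly what makes the "where is the intelligence?" question fair.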

If we got even more data, couldn’t we answer questions like “Who was married to a playwright who wrote a play about the royal family of Denmark?”

?who :married ?playwright .
?playwright :wrote ?somePlay .
?somePlay a :Play ; :setIn :Denmark ; :about ?Character .
?Character :nationality :Danish ; :occupation ?someOccupation .
?someOccupation oneOf :King, :Queen, :Prince, :Princess, etc.

I might have never thought of Anne Hathaway (Shakespeare’s wife, not the actress) in this way. Is the machine just “telling me what I already told it”? Or is it “figuring out something new?”
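A query like this one is a chain of patterns that share variables, and answering it means finding bindings that satisfy all of them at once. The sketch below shows one naive way to do that in Python. The facts, the `type` predicate standing in for `a`, and the `ROYAL` set standing in for the `oneOf` list are all toy assumptions made up for illustration, and a real query engine would be far more efficient.

```python
# Hypothetical toy data, invented for this example.
facts = {
    ("AnneHathaway", "married", "Shakespeare"),
    ("Shakespeare", "wrote", "Hamlet"),
    ("Hamlet", "type", "Play"),
    ("Hamlet", "setIn", "Denmark"),
    ("Hamlet", "about", "PrinceHamlet"),
    ("PrinceHamlet", "nationality", "Danish"),
    ("PrinceHamlet", "occupation", "Prince"),
    ("Marlowe", "wrote", "Tamburlaine"),
    ("Tamburlaine", "type", "Play"),
}
ROYAL = {"King", "Queen", "Prince", "Princess"}

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def query(patterns, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in facts:
        extended = match(first, triple, binding)
        if extended is not None:
            yield from query(rest, extended)

patterns = [
    ("?who", "married", "?playwright"),
    ("?playwright", "wrote", "?somePlay"),
    ("?somePlay", "type", "Play"),
    ("?somePlay", "setIn", "Denmark"),
    ("?somePlay", "about", "?character"),
    ("?character", "nationality", "Danish"),
    ("?character", "occupation", "?occupation"),
]
answers = {b["?who"] for b in query(patterns) if b["?occupation"] in ROYAL}
```

Every fact used in the answer was told to the store, yet no single fact says "Anne Hathaway married the author of a play about Danish royalty"; the answer only appears when the patterns are joined.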

The situation gets even messier if the information we’re working on comes from different sources. Suppose one source tells us about the play called “Hamlet”, but another one tells us about Shakespeare’s family. How do we bring that data together, so that we can answer this question (regardless of whether this constitutes “figuring out” new information, or “telling me” something I already knew)?
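One way to picture the problem is sketched below, under loud assumptions: the two sources, their `playdb:` and `people:` prefixes, and the `same_as` mapping are all hypothetical. The point is only that merging triples is easy once somebody has recorded that two identifiers denote the same thing.

```python
# Two hypothetical sources that name Shakespeare differently.
plays_source = {
    ("playdb:Shakespeare", "wrote", "playdb:Hamlet"),
}
family_source = {
    ("people:AnneHathaway", "married", "people:WilliamShakespeare"),
}

# A recorded link saying two identifiers denote the same person (assumed here).
same_as = {"people:WilliamShakespeare": "playdb:Shakespeare"}

def canonical(term):
    """Map an identifier to its agreed-upon form, if a link was recorded."""
    return same_as.get(term, term)

# Merging is just set union once identifiers have been reconciled.
merged = {
    (canonical(s), p, canonical(o))
    for s, p, o in plays_source | family_source
}

spouses_of_hamlet_author = {
    s for s, p, o in merged
    if p == "married" and (o, "wrote", "playdb:Hamlet") in merged
}
```

Without the `same_as` entry the union still works, but the question has no answer, because nothing connects the person in one source to the playwright in the other.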

[Image: abstract image of a sphere seen from different vantage points. Caption: Figuring Out vs Being Told]

Nowadays, I often hear a Semantic Web enthusiast say that the Semantic Web is interesting because it allows for inferences, that is, the inferential system can tell you things that you didn’t tell it. If you tell your inference-enabled semantic web database

:Socrates a :Man .
:Man rdfs:subClassOf :Mortal .

then if you ask

?who a :Mortal .

“Socrates” will be among the answers.
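The inference at work here can be sketched as a small forward-chaining loop. This is a toy in plain Python that applies only the one subclass rule (if x is a C, and C is a subclass of D, then x is a D); it is not how any particular triple store implements RDFS, and the function name `infer_types` is my own.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "a"

facts = {
    (":Socrates", TYPE, ":Man"),
    (":Man", SUBCLASS_OF, ":Mortal"),
}

def infer_types(triples):
    """Add (x a D) whenever (x a C) and (C subClassOf D), until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p1, c) in list(inferred):
            if p1 != TYPE:
                continue
            for (c2, p2, d) in list(inferred):
                if p2 == SUBCLASS_OF and c2 == c and (x, TYPE, d) not in inferred:
                    inferred.add((x, TYPE, d))
                    changed = True
    return inferred

closure = infer_types(facts)

# ?who a :Mortal .
mortals = sorted(s for s, p, o in closure if p == TYPE and o == ":Mortal")
```

The triple `:Socrates a :Mortal` was never told to the store; it is in `closure` only because the rule produced it.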

This kind of “figuring out” is based on logic, and so it doesn’t seem particularly innovative (after all, logic is centuries old), but it does help you make sure that your datasets are consistent, either internally or even with one another.

Over the course of my career, my attitude has changed. When I was into Expert Systems, I thought that a system that just repeated back what we told it couldn’t be interesting; to be interesting, it had to come up with something new. Now I am pretty excited about systems that can store knowledge — why the change?

Here’s an example of something I think is interesting, where the machine (or more specifically, the web) is just repeating something someone told it. In the Life Sciences data community, there are a lot of collections of data that describe chemicals that are important to biology. One of them is UniProt, a catalog of proteins. There’s an interesting chemical whose UniProt index is P01308. This is a pretty well-known and interesting chemical, so it isn’t surprising that it appears in other chemical indices, such as ChEMBL, where it has the index 5881. You can see this cross-reference on the display page for UniProt or ChEMBL. If you paste the following query into the SPARQL endpoint for UniProt

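# Find the ChEMBL cross-references recorded for UniProt entry P01308
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  <http://purl.uniprot.org/uniprot/P01308> rdfs:seeAlso ?o .
  ?o <http://purl.uniprot.org/core/database> <http://purl.uniprot.org/database/ChEMBL> .
}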

You’ll find a link between UniProt’s P01308 and ChEMBL’s 5881.

But how did we know that these two chemical references actually refer to the same chemical? We could observe that they have the same common name, i.e., Insulin. We could have a look at their chemical structure; we could look at the trade names, and what conditions they are prescribed for. We could run a machine learning algorithm that suggests that these are the same. These are all good ways to figure out that two things are the same. But here’s the point: once we do that, how do we write it down, so that nobody else has to do it again? And how do we help someone trust this information, so that they can confidently use it again? If we make everyone in the world figure it out again, we aren’t advancing human knowledge, and indeed, we are going against the whole idea of the scientific method, where new results build upon old ones.

This is the beauty of the Semantic Web: no matter how you figure something out, whether that’s using a machine algorithm, or a human agent, or a human being using an algorithm to do research, we need a way to record the fact, make an attribution to that fact, and assign some trust to it. The Semantic Web (in particular, RDF) lets us do the first part of this (recording the fact). The rest, sharing attribution and assigning trust, is inconvenient to do with plain RDF, but is made much easier by RDF-star.
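The core idea of making statements about a statement can be sketched in a few lines of Python: the recorded fact is itself used as the subject of further triples. Everything under the `ex:` prefix below (the curator, the confidence score, the predicate names) is hypothetical and invented for illustration, and the `uniprot:` and `chembl:` prefixes are shorthand of my own; this imitates the shape of the idea, not the actual RDF-star syntax or any library.

```python
# The fact we want to record once, so nobody has to figure it out again.
same_chemical = ("uniprot:P01308", "rdfs:seeAlso", "chembl:5881")

# Statements whose subject is the fact itself (all ex: values are hypothetical).
annotations = {
    (same_chemical, "ex:assertedBy", "ex:SomeCurator"),  # attribution
    (same_chemical, "ex:confidence", 0.9),               # trust score
}

def about(triple):
    """Collect everything that has been said about a given triple."""
    return {p: o for s, p, o in annotations if s == triple}

def is_trusted(triple, threshold=0.8):
    """Treat a fact as usable only if its recorded confidence is high enough."""
    return about(triple).get("ex:confidence", 0.0) >= threshold

provenance = about(same_chemical)
```

A consumer of the data can then decide for themselves: `is_trusted(same_chemical)` passes with the default threshold, while a fact with no recorded confidence does not.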

So there you have it; back in the 80’s, we were fighting over the different ways one might “figure things out”, and even how those would compete with some human being “figuring it out”. But one of the lessons we learned from the World Wide Web is that information is a lot more useful when it is shared, and the more widely it is shared, the more useful it can be. In addition to having ways to “figure things out”, we need a way to “tell” what we’ve figured out, and to tell that in a way that other people can find it, access it, interoperate bits of it, and re-use it. The beauty of FAIR data is that we can bring together all the wonderful ways we have for “figuring things out” by making it possible for them all to “tell” us what they learned.

Dean Allemang

Mathematician/computer scientist, my passion is sharing data on a massive scale. Author of Semantic Web for the Working Ontologist.