1. On the role and the whatabouts of Ontology

Robert Engels
Data & AI Masters
8 min read · Mar 15, 2022

And why it is relevant for discussions on Knowledge Graphs.

Intro on why there are ontologies

As technology for automating information-intensive tasks matures and organisations strive to become data-powered, data volumes grow at a vast pace nearly everywhere. This data is not necessarily in a shape in which it can be found, understood and reused by everyone (anyone?) in the organisation, let alone integrated, merged or aligned with other data.

Enter the headache of many a CxO, Enterprise (Information) Architect, Data Engineer/Scientist and actual user of data. Many solutions have been proposed, some have been implemented, and several technology providers claim to have built the holy grail, as do consultancies. But in reality the issue of how to get a good and common understanding of data with varying content, format, origin, type, whereabouts, access rights and rights-of-existence is not universally solvable. And maybe it never will be. But what we can do is use existing mechanisms for information publication, sharing, merging, alignment and analytics and let you pick those which cover your needs. Despite the increasing complexity caused by the globalisation of data-creating pipelines and the growing number of suppliers and platforms, there still are good solutions out there for staying in control of your data ecosystem. But, as so often, without a clear end goal in mind it will be hard to build a good strategy for such an ecosystem.

So, in order to build a long-term vision and an accompanying strategy for your Data Ecosystem that holds over time yet stays flexible, a bit of insight into “what's there” might be useful, so you can decide which level of attack is the right one for you. Surely you have heard about vocabularies, thesauri, knowledge graphs and maybe even about semantics and ontology. But what does it all mean? And how is it related? What should you choose? In this article series we will try to disambiguate and explain some of these concepts and their relevance for dealing with information in today's data ecosystems.

The internet, if regarded as a network of connected computers with storage capability, in principle enables information publishing and sharing. But data stored in this network is not necessarily easy to interpret or merge, because it varies widely in content, originates from different domains, uses different technologies and formats, and serves many (unrelated?) goals as its “raison d'être”. To make sense of many data sets that may or may not be interconnected, we would like a mechanism that can take concepts from the different domains and connect them to each other. If, in addition, we can do this in a manner that interconnects the many data sets distributed over the internet's many “nodes”, we will have taken a good step closer to our goal. To this end, a concept we might want to explore is that of a graph.

Graphs are quite intuitive. They consist of nodes and links which combine into a network, not unlike the way the human mind represents things. You will probably have some idea of concepts and the relations between them, and of how to use them to interpret and understand a particular context. In such a way you can find a course of action, participate in dialogues on a specific topic or even provide context for AI model execution and explanation. Graphs can connect information from different domains or areas with each other, and they might be distributed virtually and physically. One strong advantage of graph-based systems is that a graph can easily be extended in a live system or solution, usually without a noticeable impact on performance. Other advantages (depending on the type of graph solution) can be the ease of fully automated alignment of graphs maintained in different systems, federated querying, result merging and so on.
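To make the nodes-and-links idea a bit more tangible, here is a minimal Python sketch. It assumes a graph stored as a plain set of (source, link, target) triples, which is an illustrative choice and not the data model of any particular graph product. The names `sales`, `shipping` and `neighbours`, as well as all the data, are made up for this example.

```python
# A graph as a set of (source, link, target) triples.
# All names and data below are invented for illustration.
Triple = tuple[str, str, str]

# Two small graphs, as they might be maintained in two different domains.
sales: set[Triple] = {
    ("Customer42", "ordered", "Order7"),
    ("Order7", "contains", "ProductA"),
}
shipping: set[Triple] = {
    ("Order7", "shippedBy", "CarrierX"),
}


def neighbours(graph: set[Triple], node: str) -> set[tuple[str, str]]:
    """Return (link, target) pairs for every link leaving `node`."""
    return {(link, target) for source, link, target in graph if source == node}


# Merging the two graphs is a set union; the shared node "Order7" connects them.
merged = sales | shipping

# Extending the graph needs no schema change: just add another triple.
merged.add(("ProductA", "madeBy", "SupplierY"))

print(sorted(neighbours(merged, "Order7")))
```

Because both graphs use the same name for the shared node, the union connects them without any extra work. In real systems, agreeing on such shared identifiers is the hard part.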

But how do we represent information in such graphs? What are these various types of modeling? Let's explore some of the philosophical issues of ontology before we proceed to their usage for knowledge graphs.

Already in antiquity, philosophers like Plato, Socrates and Aristotle had many deep discussions on how knowledge and meaning should be represented. One of the questions they discussed intensively is how knowledge is structured so that its meaning can be conveyed and its context understood between actors. These discussions laid the groundwork for what we now call ontology. Ontology, in its philosophical sense, is “a branch of metaphysics concerned with the nature and relations of being”. This sounds rather theoretical, but we do want to do similar things within a computer, do we not? We need some way to make sure information that is stored in a computer can be retrieved, understood, shared, merged, related, published and maybe reused by other programs, probably running on other computers. Therefore, and not surprisingly, the concept of ontology also popped up in computer science. One of the early researchers in this field was Tom R. Gruber, a computer scientist who has worked on information representation and cognitive intelligence since the 80s. Gruber came up with what is maybe the most cited definition of ontology in computer science (Knowledge Acquisition, 5(2), 1993):

“an ontology is an explicit specification of a conceptualization” (Tom Gruber, 1993; often paraphrased as “a formal specification of a conceptualisation”)

Now chew on this for a bit. It is quite a neat interpretation of the Platonic discussions, and it puts the rather spongy philosophical concept forward as a specification. And specifications are things computer science likes! It also implies that a digital ontology should use some kind of underlying formal semantics, as various logical frameworks can offer. By using such a formal specification of (a part of) the world around you, you can describe a machine-readable conceptualisation and share it between computers and programs. That is not bad, but isn't this overly complex? Yes and no, and it depends 100% on your use case. So being aware of what you need (now and in the future) and of the implications of the choices you make (and thereby the possibilities you exclude) is of the essence. In some cases a full-blown use of ontology in all its might is definitely too complex; in other cases you will not be able to “live without it”. It is therefore not surprising that there is a smooth spectrum of “formal specifications of a conceptualisation”, from very little or no formal semantics towards a fully specified, formalised semantic ontology.

Figure 1: conceptual frameworks with increasing semantics and logical expressiveness.

Let's consider the possibilities (cf. figure 1). Simple and easy, folksonomies are basically nothing more than a bag of user-defined annotations or tags which can be used to retrieve something. Think for example of hashtag annotations like “#folksonomy #nosemantics #easy”. Users can add whatever they want to the bag, with no restrictions imposed. Social media platforms are typical heavy users of this. Freedom and no restrictions, everybody wants that!

Everybody? Certainly not. If everyone can tag what they want, how do you make sure you can find all the things you need? If you search for #boat, but people have tagged with #ship or #vessel, you will not find them. So archivists and others defined lists. Often an experienced archivist curates such lists. Lists are predefined: you can choose from them, but not change or add to them (at least not at will).
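The difference between free tagging and a curated list can be sketched in a few lines of Python. This is a toy illustration only: the function names `tag_freely` and `tag_from_list`, the list contents and the photo names are all hypothetical.

```python
from collections import defaultdict

# Folksonomy: any tag is accepted, nothing is checked.
folksonomy: dict[str, set[str]] = defaultdict(set)


def tag_freely(item: str, tag: str) -> None:
    """Accept whatever tag the user comes up with."""
    folksonomy[tag.lower()].add(item)


# Controlled list: only terms curated by an archivist are accepted.
CONTROLLED_LIST = {"boat", "car", "bicycle"}
catalogue: dict[str, set[str]] = defaultdict(set)


def tag_from_list(item: str, term: str) -> None:
    """Accept only terms that appear in the curated list."""
    if term not in CONTROLLED_LIST:
        raise ValueError(f"'{term}' is not in the controlled list")
    catalogue[term].add(item)


# Free tagging: three users describe similar photos with three different tags.
tag_freely("photo1", "boat")
tag_freely("photo2", "ship")
tag_freely("photo3", "vessel")
print(sorted(folksonomy["boat"]))  # only photo1 is found

# Controlled list: everyone has to pick the same term, so all photos are found.
for photo in ("photo1", "photo2", "photo3"):
    tag_from_list(photo, "boat")
print(sorted(catalogue["boat"]))
```

The price of the second approach is visible in the `ValueError`: a user who insists on “ship” is simply refused until the curator extends the list.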

Does that solve the problem then? Sometimes, but not always. Why not give someone the freedom to search for #boat and retrieve #ship and #vessel at the same time? Enter “synonym networks” like WordNet (Princeton.edu). They offer a nice network for a vocabulary, relating synonyms, meronyms and antonyms. Now you can search for a specific word and, by resolving all its synonyms, get results for the whole set. And when used in combination with an ontology, you can get a wider context around those synonyms too. Great!
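Synonym resolution of this kind can be sketched as a simple query expansion. The synonym table below is hand-written for this example and is not taken from WordNet; `expand`, `search` and the tagged photos are likewise hypothetical.

```python
# A toy synonym network; the entries are hand-written, not taken from WordNet.
SYNONYMS: dict[str, set[str]] = {
    "boat": {"ship", "vessel"},
    "ship": {"boat", "vessel"},
    "vessel": {"boat", "ship"},
}

# Items with the tags users happened to give them (invented data).
TAGGED: dict[str, set[str]] = {
    "photo1": {"boat"},
    "photo2": {"ship"},
    "photo3": {"vessel"},
    "photo4": {"car"},
}


def expand(term: str) -> set[str]:
    """Return the term together with all of its known synonyms."""
    return {term} | SYNONYMS.get(term, set())


def search(term: str) -> list[str]:
    """Find items tagged with the term or with any of its synonyms."""
    wanted = expand(term)
    return sorted(item for item, tags in TAGGED.items() if tags & wanted)


print(search("boat"))  # finds photo1, photo2 and photo3, but not photo4
```

Users keep their freedom to tag as they like; the synonym network does the reconciliation at query time.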

Adding set theory to the game provides ways to define class membership, find set/subset relationships and order them as taxonomies. For example, the well-known (and sometimes irritating, but always useful) fruit fly (Drosophila melanogaster) belongs to the taxonomic order Diptera, the family Drosophilidae and the genus Drosophila.

Figure 2: Biological taxonomies even contain Kingdoms! And Drosophila melanogaster is a very productive member of one.

In that sense all flies that are “Drosophila melanogaster” are “Drosophila”, but not all “Drosophila” are “Drosophila melanogaster”! Think set/subset and Venn diagrams. Taxonomies are used for classification systems such as placing and finding books in a library, the earliest implementation of the Yahoo search (or better: retrieval?) pages, engineering products, parts catalogues and many more. So a little semantics (set/subset) can have quite an impact.
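The set/subset reasoning behind a taxonomy can be sketched as a chain of parent links that is followed upwards. The chain below is abbreviated (intermediate ranks are omitted), and the names `PARENT`, `ancestors` and `is_a` are assumptions made for this illustration.

```python
# Each taxon points to its parent, i.e. the larger set it is a subset of.
# The chain is abbreviated: intermediate ranks are left out.
PARENT: dict[str, str] = {
    "Drosophila melanogaster": "Drosophila",
    "Drosophila": "Diptera",
    "Diptera": "Insecta",
    "Insecta": "Animalia",
}


def ancestors(taxon: str) -> list[str]:
    """Walk up the parent links and collect every enclosing taxon."""
    chain = []
    while taxon in PARENT:
        taxon = PARENT[taxon]
        chain.append(taxon)
    return chain


def is_a(taxon: str, candidate: str) -> bool:
    """True if every member of `taxon` is also a member of `candidate`."""
    return taxon == candidate or candidate in ancestors(taxon)


print(is_a("Drosophila melanogaster", "Drosophila"))  # True
print(is_a("Drosophila", "Drosophila melanogaster"))  # False
print(ancestors("Drosophila melanogaster"))
```

The subset relation is transitive, which is why one parent link per taxon is enough to answer membership questions at any level.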

A thesaurus consists of much of the above: synonyms, antonyms, taxonomy. In addition it provides concept definitions, and as such it can really become the basis for a good semantic definition of relations in a domain of knowledge.

But there are many more types of relations to be found in the real world. All the conceptual frameworks mentioned so far are restricted to specific relation types and definitions. Just imagine you could define all types of relations, and relate these to each other in turn. That you could publish what types of relations there are, which properties they can have and which types of concepts can be used in a relation. All based on logic, thereby defining a world that can be automatically processed, aligned, shared and reasoned upon. Then you enter the world of Ontology. It is important to realize that an ontology describes the world abstracted from facts, i.e. there might be a description of dogs befriended with humans, but no mention of Pluto befriended with Ola Nordmann.
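A small Python sketch can show what “publishing which types of concepts can be used in a relation” might look like. It is a strong simplification: real ontology languages such as OWL have richer and subtly different semantics, and every name below (`SUBCLASS_OF`, `Relation`, `relation_allows`, the classes and relations) is a hypothetical chosen for this example. Note that the model contains classes and relation types only, and no individual dog or human.

```python
from dataclasses import dataclass

# Class hierarchy of the toy ontology (child -> parent). No individuals here.
SUBCLASS_OF: dict[str, str] = {"Dog": "Animal", "Human": "Animal"}


@dataclass(frozen=True)
class Relation:
    """A relation type with the classes allowed on either side of it."""
    name: str
    domain_class: str  # class the subject must belong to
    range_class: str   # class the object must belong to


RELATIONS: dict[str, Relation] = {
    "befriendedWith": Relation("befriendedWith", "Dog", "Human"),
    "eats": Relation("eats", "Animal", "Food"),
}


def is_subclass(cls: str, ancestor: str) -> bool:
    """True if `cls` equals `ancestor` or lies below it in the hierarchy."""
    while cls != ancestor and cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
    return cls == ancestor


def relation_allows(name: str, subject_class: str, object_class: str) -> bool:
    """Check on model level whether a relation may link the two classes."""
    rel = RELATIONS[name]
    return (is_subclass(subject_class, rel.domain_class)
            and is_subclass(object_class, rel.range_class))


print(relation_allows("befriendedWith", "Dog", "Human"))  # True
print(relation_allows("befriendedWith", "Human", "Dog"))  # False
print(relation_allows("eats", "Dog", "Food"))             # True, a Dog is an Animal
```

Everything here is a statement about kinds of things. Whether any particular dog is actually befriended with any particular human is a fact, and facts live outside the ontology.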

Figure 3: Ontology can connect the world together. Even if we do not agree.

Sometimes semantic models contain both the ontology and the factual knowledge in one large, combined model with semantics added to concepts, links and facts. In such cases the term Knowledge Graph is often used, with the Google Knowledge Graph as a great example of one. The label Knowledge Graph is currently used widely (as often happens with trending words) and can refer to many of the frameworks mentioned above, with or without formal semantics, and with or without a clear separation between model and facts. That is not a problem, just something you have to be aware of.
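As a rough illustration of a combined model, the sketch below keeps model statements and facts side by side in one set of triples and then separates them again. The predicate names (`subClassOf`, `domain`, `range`, `type`) are loosely inspired by common semantic-web vocabulary but are used here as plain strings, and the helper `types_of` is a hypothetical written for this example.

```python
# One set of triples holding both the model and the facts (invented data).
knowledge_graph = {
    # model level: statements about kinds of things
    ("Dog", "subClassOf", "Animal"),
    ("befriendedWith", "domain", "Dog"),
    ("befriendedWith", "range", "Human"),
    # fact level: statements about individuals
    ("Pluto", "type", "Dog"),
    ("Ola Nordmann", "type", "Human"),
    ("Pluto", "befriendedWith", "Ola Nordmann"),
}

# Predicates that, in this toy setup, mark a triple as part of the model.
MODEL_PREDICATES = {"subClassOf", "domain", "range"}

model = {t for t in knowledge_graph if t[1] in MODEL_PREDICATES}
facts = knowledge_graph - model


def types_of(individual: str) -> set[str]:
    """Collect the asserted types of an individual plus all their superclasses."""
    found = {o for s, p, o in facts if s == individual and p == "type"}
    frontier = set(found)
    while frontier:
        cls = frontier.pop()
        parents = {o for s, p, o in model if s == cls and p == "subClassOf"}
        frontier |= parents - found
        found |= parents
    return found


print(len(model), len(facts))
print(sorted(types_of("Pluto")))  # the model tells us Pluto is also an Animal
```

Nobody stated that Pluto is an animal; that follows from combining one fact with one model statement, which is the kind of benefit a clear model brings to the facts around it.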

Finally, back to the concept of Ontology, which is essential in modern architectural patterns in order to ensure data quality, governance, findability, interoperability, accessibility and reusability. Bringing holistic approaches to organisation, information architecture, technology, frameworks and patterns like Data Mesh and Data Fabric together with Ontology seems to be essential for the success of both.

So now you have some insight into all the possibilities for defining information models, schemes, semantics and frameworks. And you are set to build a data and information ecosystem for your organisation or customer. What to do? How do you proceed in practice? Stay tuned for the next blog, where we dive deeper and work through a practical guide on how and when to use what.

Next: practical guide to how and when to use which concept definition schema

Robert Engels
Data & AI Masters

Broad interest in topics like Semantics, Knowledge Representation, Reasoning, Machine Learning and putting it together in intelligible ways.