Bob van Luijt
Published in SeMI Technologies
Apr 20, 2021

Taxonomies, ontologies, and schemas. How do they relate to Weaviate?

Because Weaviate has a graph-like data model, people often ask questions about how Weaviate deals with taxonomies, ontologies, and schemas. And to make things even more complicated, Weaviate is adding terminology like vectorizers and contextionary to the mix.

Confusing? Fret not! It’s actually quite simple…

Taxonomies, ontologies, and schemas

  1. A taxonomy has a hierarchy (e.g., an elephant is of the order Proboscidea, which is of the class Mammalia and of the kingdom Animalia).
  2. An ontology distinguishes concepts and their relationships (e.g., an elephant with the name Alice that lives in a zoo that is located in Amsterdam).
  3. Ontologies focus more on the semantic relationships whereas schemas focus more on the data structure (e.g., the data class Elephant that has the data properties: name and livesIn).
  • Because Weaviate is a database (or a vector search engine to be more ontologically correct), it uses a schema.
  • Rule of thumb I — any taxonomy or ontology you have can be translated into a Weaviate schema.
  • Good to know — Weaviate’s schema structure is inspired by the Semantic Web and RDF. That means that classes have capitalized first characters and properties have lowercased first characters.
  • Rule of thumb II — almost any graph ontology can be translated into a Weaviate schema. We are even looking at fancier things like RDF2Vec, but that’s for later…
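
To make the rules of thumb a bit more concrete, here is a minimal sketch of how the elephant example might be written down as a schema, using plain Python dictionaries. The `Zoo` and `City` classes, the `locatedIn` property, and the `check_naming` helper are assumptions made up for illustration; the dictionary layout is only meant to approximate a Weaviate schema, so check the documentation before relying on it.

```python
# Hypothetical schema for the elephant example. Zoo, City, and locatedIn
# are illustrative assumptions, not taken from a real dataset.
schema = {
    "classes": [
        {
            "class": "Elephant",
            "properties": [
                {"name": "name", "dataType": ["string"]},
                # A property can point at another class (the ontology's relationship).
                {"name": "livesIn", "dataType": ["Zoo"]},
            ],
        },
        {
            "class": "Zoo",
            "properties": [
                {"name": "name", "dataType": ["string"]},
                {"name": "locatedIn", "dataType": ["City"]},
            ],
        },
        {
            "class": "City",
            "properties": [{"name": "name", "dataType": ["string"]}],
        },
    ]
}


def check_naming(schema):
    """Return the names that break the convention:
    classes start with an uppercase letter, properties with a lowercase one."""
    problems = []
    for cls in schema["classes"]:
        if not cls["class"][0].isupper():
            problems.append(cls["class"])
        for prop in cls["properties"]:
            if not prop["name"][0].islower():
                problems.append(cls["class"] + "." + prop["name"])
    return problems


print(check_naming(schema))  # prints []
```

The hierarchy of a taxonomy and the relationships of an ontology both end up as classes that reference each other through properties, which is why the translation tends to be straightforward.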

Classes and properties in Weaviate

Now that we know Weaviate works with a schema, you need to know one more thing: it uses an RDF-inspired class/property structure.

  • A class describes a worldly thing (e.g., Elephant, Document, Email, Animal, Photo, House, Rocketship, etc.).
  • A class can have one or more properties (e.g., a Document has a filename, an Email a subject, an Animal a name, etc.).
  • Properties have a data type, e.g., a name is a string, a datum a date, a price a float, etc.

When you run a clean Weaviate installation, the first thing you need to do is create a schema. The demo dataset in the Weaviate documentation contains Publishers and Articles that you can check out as an example; you can also inspect the schema of the demo dataset.

  • Good to know I — Weaviate ❤️ GraphQL. Therefore, the way GraphQL is structured ❤️ Weaviate.
  • Good to know II — inside Weaviate (like in any other database), you can create any class and any property you like.

Sometimes users create super simple schemas, for example only the class Document with the properties title and content. Sometimes users create more complex schemas. You can create whatever schema your use case needs.

When you are done creating the schema, you can add data to Weaviate. These individual data objects are called… data objects 😎
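
As a rough sketch of how a data object relates to its schema, the snippet below checks an object against the simple Document class (title and content). The `validate_object` helper, the `PYTHON_TYPES` mapping, and the object layout are assumptions for illustration; this is not how Weaviate itself validates data.

```python
# The simple schema from the text: a Document with a title and content.
document_class = {
    "class": "Document",
    "properties": [
        {"name": "title", "dataType": ["string"]},
        {"name": "content", "dataType": ["text"]},
    ],
}

# Hypothetical mapping from schema data types to Python types.
PYTHON_TYPES = {"string": str, "text": str}


def validate_object(obj, class_def):
    """Return a list of error messages; an empty list means the object fits."""
    errors = []
    if obj.get("class") != class_def["class"]:
        errors.append("expected class " + class_def["class"])
    # Look up each property's declared data type by name.
    allowed = {p["name"]: p["dataType"][0] for p in class_def["properties"]}
    for name, value in obj.get("properties", {}).items():
        if name not in allowed:
            errors.append("unknown property: " + name)
        elif not isinstance(value, PYTHON_TYPES[allowed[name]]):
            errors.append("wrong type for " + name)
    return errors


data_object = {
    "class": "Document",
    "properties": {"title": "Fret not", "content": "Schemas are quite simple."},
}
print(validate_object(data_object, document_class))  # prints []
```

An object with a property the class does not define, or with a value of the wrong type, would come back with an error message instead of an empty list.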

Vectorizing data objects

What makes Weaviate unique is that every data object is accompanied by a vector representation. You can set the vector yourself or you can use one of the modules (we have an ever-growing number of them available, or you can create your own).

For example, the following query shows all vector representations for the objects in our news article dataset that represent the news outlets (e.g., The Economist, New York Times, etc). You can run the query here yourself.
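
A query of this kind might look roughly like the sketch below, which builds the GraphQL text in Python. The class name `Publication`, the `name` property, and the `_additional { vector }` field are assumptions about the demo dataset and the GraphQL API, and `build_vector_query` is a hypothetical helper; verify the names against your own schema.

```python
import json


def build_vector_query(class_name, properties):
    """Build a GraphQL Get query that also asks for each object's vector."""
    fields = " ".join(properties)
    return f"{{ Get {{ {class_name} {{ {fields} _additional {{ vector }} }} }} }}"


# Assumed class and property names for the news outlets.
query = build_vector_query("Publication", ["name"])

# GraphQL queries are typically sent as a JSON body with a "query" key.
payload = json.dumps({"query": query})

print(query)  # prints { Get { Publication { name _additional { vector } } } }
```

The `payload` string is what you would post to a GraphQL endpoint; the endpoint itself is left out here because it depends on your installation.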

  • Good to know — there is no difference between data objects and graph nodes in Weaviate; they are the same thing. So you can interchange “node” with “data object” depending on how you like to use it.

Vector searches on data objects

Because the vector representations function as coordinates in a space, you can do nearest neighbor searches or automatic classifications! This is the juicy part of Weaviate! The gif below shows example queries. You can also try these queries out yourself on our news article demo dataset.
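
To show what "coordinates in a space" buys you, here is a minimal brute-force nearest neighbor sketch in plain Python using cosine similarity. The three-dimensional vectors are made-up toy values, "Example Fashion Weekly" is a hypothetical outlet, and the loop only illustrates the idea; it says nothing about how Weaviate implements search internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vector, objects, k=2):
    """Return the names of the k objects whose vectors are closest to the query."""
    ranked = sorted(
        objects,
        key=lambda o: cosine_similarity(query_vector, o["vector"]),
        reverse=True,
    )
    return [o["name"] for o in ranked[:k]]


# Toy data objects with made-up vectors.
objects = [
    {"name": "The Economist", "vector": [0.9, 0.1, 0.0]},
    {"name": "New York Times", "vector": [0.8, 0.3, 0.1]},
    {"name": "Example Fashion Weekly", "vector": [0.1, 0.2, 0.9]},
]

print(nearest_neighbors([1.0, 0.0, 0.0], objects, k=2))
# prints ['The Economist', 'New York Times']
```

Classification can be read the same way: an unlabeled object takes the label of whatever sits closest to it in the space.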
