The “Ontology” in Machine Learning

Published in Multimodal Data Training, Feb 10, 2023

In machine learning, Ontology is increasingly used to supply ML models with scenario knowledge and to support similarity analysis.

In traditional label-based definitions, objects are isolated: the label set scales poorly, labels can be duplicated, and the relations between objects cannot be expressed. In Ontology-based definitions, objects no longer exist in isolation, and labeling the relations between them enables functions such as scenario search, Ontology fusion, and Ontology recommendation.

Xtreme1, the world’s first open-source multisensory training data platform, introduced Ontology into its platform to abstract the definition of AI problems away from individual model requirements. An Ontology can be reused and extended to build a knowledge base for AI algorithms, which accelerates model development.

What is Ontology?

An Ontology is a structured way of describing things in the world. It consists of three elements:

Class: a type, label, or abstract category that instances belong to;

Relation: a connection between instances, which can be directed or undirected. For example, in an autonomous driving scenario, cars may have “parallel” (undirected) or “overtaking” (directed) relations;

Properties: the attributes of a class instance or a relation. For example, the attributes of a “car” may include “color” and “window open/close,” and the attributes of a “pedestrian” may include “gender” or “mask on/off.”
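The three elements above map naturally onto a small data model. The sketch below is a minimal, hypothetical illustration: the names `OntologyClass`, `RelationType`, and `Ontology`, and the allowed property values, are assumptions made for this example and are not Xtreme1's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical data model for illustration only; not Xtreme1's schema.

@dataclass
class OntologyClass:
    name: str                                   # e.g. "car", "pedestrian"
    # property name -> list of allowed values
    properties: dict[str, list[str]] = field(default_factory=dict)

@dataclass
class RelationType:
    name: str                                   # e.g. "overtaking"
    source: str                                 # class name of the subject
    target: str                                 # class name of the object
    directed: bool = True                       # "parallel" would be undirected
    properties: dict[str, list[str]] = field(default_factory=dict)

@dataclass
class Ontology:
    classes: dict[str, OntologyClass] = field(default_factory=dict)
    relations: dict[str, RelationType] = field(default_factory=dict)

    def add_class(self, cls: OntologyClass) -> None:
        self.classes[cls.name] = cls

    def add_relation(self, rel: RelationType) -> None:
        # A relation may only connect classes that already exist in the ontology.
        for end in (rel.source, rel.target):
            if end not in self.classes:
                raise ValueError(f"unknown class: {end}")
        self.relations[rel.name] = rel

    def validate(self, cls_name: str, attrs: dict[str, str]) -> bool:
        """Check that every attribute of an instance is a known property with an allowed value."""
        allowed = self.classes[cls_name].properties
        return all(k in allowed and v in allowed[k] for k, v in attrs.items())

onto = Ontology()
onto.add_class(OntologyClass("car", {"color": ["red", "white", "black"],
                                     "window": ["open", "closed"]}))
onto.add_class(OntologyClass("pedestrian", {"mask": ["on", "off"]}))
onto.add_relation(RelationType("parallel", "car", "car", directed=False))
onto.add_relation(RelationType("overtaking", "car", "car", directed=True))

print(onto.validate("car", {"color": "red", "window": "open"}))  # True
print(onto.validate("pedestrian", {"mask": "maybe"}))           # False
```

Keeping the allowed values on the class is one possible design choice; it lets a single definition be reused to validate annotations across datasets.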

Autonomous driving is one of the most promising and challenging research topics for AI companies and the automotive industry. Mainstream autonomous vehicles today are equipped with highly sensitive sensors such as cameras, LiDAR, and radar. Although perception systems built on these sensors can already identify specific objects accurately, such as a car or a no-turn traffic sign, the vehicle cannot understand the meaning of its driving environment without a comprehensive understanding of the scenario. A machine-friendly knowledge representation is therefore needed to bridge the gap between perceiving the driving environment and processing knowledge about it.

What is the role of Ontology in data curation?

After classes and properties are defined in the Ontology Center, users can easily search for scenarios such as “Change Lane”. The Ontology Center can also deduce new annotations from rules among classes, properties, and relations. As the amount of Ontology data grows, the Ontology Center can also recommend better-performing Ontology models for different domains.

2.1 Scenario Search

Scenario search solves the problem of how to define and find data that occurs in a specific scenario.

Traditionally, data is curated through a hierarchy of labels. These labels are often too general to pinpoint specific problems in a data scenario. In addition, label-based definitions keep objects isolated, so they can neither prevent duplication nor show connections with other objects.

The Scenario Search function defines objects through classes and properties, and defines scenarios through the relations and properties between objects. This makes it easy to define and find scenarios such as lane changes, parking, turning, and runway incursions.
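To make the idea concrete, here is a minimal sketch of scenario search, assuming annotations are stored per frame as objects (a class plus properties) and relations between them. The `moves_into` relation and the `lane` class used to express a lane change are hypothetical, as are all type and function names; the article does not specify how Xtreme1 encodes scenarios.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Obj:
    obj_id: str                  # annotation id within a frame
    cls: str                     # ontology class, e.g. "car"
    props: dict[str, str] = field(default_factory=dict)

@dataclass
class Rel:
    kind: str                    # relation type, e.g. "overtaking"
    src: str                     # obj_id of the subject
    dst: str                     # obj_id of the target

@dataclass
class Frame:
    frame_id: str
    objects: list[Obj]
    relations: list[Rel]

@dataclass
class ScenarioPattern:
    """A scenario as one relation between two classes, with optional property constraints."""
    relation: str
    src_cls: str
    dst_cls: str
    src_props: dict[str, str] = field(default_factory=dict)
    dst_props: dict[str, str] = field(default_factory=dict)

def _matches(obj: Obj, cls: str, props: dict[str, str]) -> bool:
    # The class must match and every required property must have the required value.
    return obj.cls == cls and all(obj.props.get(k) == v for k, v in props.items())

def scenario_search(frames: list[Frame], pattern: ScenarioPattern) -> list[str]:
    """Return ids of frames containing at least one relation that fits the pattern."""
    hits = []
    for frame in frames:
        by_id = {o.obj_id: o for o in frame.objects}
        for rel in frame.relations:
            if rel.kind != pattern.relation:
                continue
            src, dst = by_id.get(rel.src), by_id.get(rel.dst)
            if (src is not None and dst is not None
                    and _matches(src, pattern.src_cls, pattern.src_props)
                    and _matches(dst, pattern.dst_cls, pattern.dst_props)):
                hits.append(frame.frame_id)
                break  # one matching relation is enough to select the frame
    return hits

# Hypothetical encoding: a lane change is a car that "moves_into" a lane.
frames = [
    Frame("f1", [Obj("c1", "car", {"turn_signal": "left"}), Obj("l2", "lane")],
          [Rel("moves_into", "c1", "l2")]),
    Frame("f2", [Obj("c1", "car"), Obj("c2", "car")], [Rel("parallel", "c1", "c2")]),
    Frame("f3", [Obj("p1", "pedestrian"), Obj("l1", "lane")], [Rel("moves_into", "p1", "l1")]),
]
change_lane = ScenarioPattern("moves_into", src_cls="car", dst_cls="lane")
print(json.dumps({"scenario": "Change Lane", "frames": scenario_search(frames, change_lane)}))
# {"scenario": "Change Lane", "frames": ["f1"]}
```

Frame `f3` is excluded because its subject is a pedestrian, which is exactly the kind of distinction a flat "lane change" label cannot make. Matching a single relation is the simplest case; richer scenarios could combine several patterns per frame.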

2.2 Ontology Reasoning

During annotation and quality control (QC), new labeling results can be inferred, and problematic annotations detected, from rules among classes, properties, and relations. For example, in an autonomous driving scenario, the red, green, and yellow lights belong to the same traffic light. If the red light’s state is “On,” it can be inferred that the green and yellow lights are not lit. If both the red and green lights are labeled “On,” the labeling result is likely problematic.

2.3 Ontology Disambiguation

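During annotation, it is common to encounter different definitions of the same thing within one data batch. Ontology fusion, which merges the classes defined in datasets with those in the Ontology Center, can help users resolve these inconsistencies.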
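One simple way to picture fusion is mapping each dataset's class names onto canonical classes in a shared ontology. The sketch below uses a hand-written alias table; the table `CENTER`, the normalization rule, and the function `fuse` are illustrative assumptions, not Xtreme1's fusion algorithm.

```python
# Hypothetical canonical classes in an Ontology Center, each with known aliases.
CENTER = {
    "car": {"car", "vehicle", "automobile", "sedan"},
    "pedestrian": {"pedestrian", "person", "walker"},
    "traffic_light": {"traffic_light", "traffic light", "signal"},
}

def normalize(name: str) -> str:
    # Case-fold and unify separators so "Traffic-Light" and "traffic light" compare equal.
    return name.strip().lower().replace("-", "_").replace(" ", "_")

def fuse(dataset_classes: list[str], center: dict[str, set[str]] = CENTER) -> tuple[dict[str, str], list[str]]:
    """Map dataset class names to canonical classes; return (mapping, unmapped names)."""
    alias_to_canonical = {normalize(alias): canon
                          for canon, aliases in center.items() for alias in aliases}
    mapping, unmapped = {}, []
    for name in dataset_classes:
        canon = alias_to_canonical.get(normalize(name))
        if canon is None:
            unmapped.append(name)  # needs a human decision or a new class
        else:
            mapping[name] = canon
    return mapping, unmapped

mapping, unmapped = fuse(["Car", "Vehicle", "person", "Traffic-Light", "cone"])
print(mapping)   # {'Car': 'car', 'Vehicle': 'car', 'person': 'pedestrian', 'Traffic-Light': 'traffic_light'}
print(unmapped)  # ['cone']
```

Returning the unmapped names, rather than guessing, leaves genuinely new concepts for a human to add to the shared ontology.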

2.4 Ontology Recommendation

In the SaaS version of the open-source Xtreme1, once data accumulates to the petabyte (PB) scale, higher-performing Ontology models can be provided for common model needs in various domains, making customized solutions easier to build.
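The article does not say how recommendations are ranked. As a stand-in, the sketch below scores hypothetical Ontology templates by Jaccard similarity between their class sets and a user's classes. The template names and contents are invented for illustration, and a real recommender would presumably also weigh model performance, which this sketch ignores.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical templates; a real system would draw these from accumulated Ontology data.
TEMPLATES = {
    "urban_driving": {"car", "pedestrian", "cyclist", "traffic_light", "lane"},
    "highway_driving": {"car", "truck", "lane", "guardrail"},
    "parking_lot": {"car", "parking_space", "pedestrian"},
}

def recommend(user_classes: set[str], templates: dict[str, set[str]] = TEMPLATES,
              k: int = 2) -> list[tuple[str, float]]:
    """Return the k templates whose class sets overlap most with the user's classes."""
    scored = [(name, jaccard(set(user_classes), classes)) for name, classes in templates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

print(recommend({"car", "lane", "truck"}))
# [('highway_driving', 0.75), ('urban_driving', 0.3333333333333333)]
```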

[Figure: Core Ontologies for Safe Autonomous Driving]

Highlights of Xtreme1 v0.5.5:

· The new Ontology Center, for managing Ontologies and data across datasets and for refining scenario-based industry templates and solutions for model training;

· Create, read, update, and delete (CRUD) operations on classes and classifications in the Ontology Center;

· Ontology Fusion between classes in datasets and those in the Ontology Center;

· Export and import of Ontologies in the Ontology Center and datasets;

· Copying of classes and/or classifications from the Ontology Center;

· Push/pull of classes between datasets and Ontologies;

· Scenario search across datasets for the same data type;

· Export of search results as a JSON file or a new dataset.

Planned features in future versions of Xtreme1 include:

· Annotating relations and searching by classes and scenarios;

· Property search (in scenario search) by classes, relations and/or properties.

Website | Xtreme1.io

Docs | docs.xtreme1.io

GitHub Repo | github.com/xtreme1-io/xtreme1

Slack | xtreme1io.slack.com

Reference:

Core Ontologies for Safe Autonomous Driving: https://ceur-ws.org/Vol-1486/paper_9.pdf

* Not supported in the current version (v0.5.5). Stay tuned for updates.