Semi-automatic generation of a Reliable Knowledge Graph for Space Mission Design with Grakn

Francesco Murdaca
Oct 3, 2018 · 6 min read

New challenge
In the digital era, the amount of data generated and stored keeps growing every fraction of a second. Every domain is affected by this phenomenon, including the space domain. Specifically, these stored data include datasheets of space parts, final reports of past missions, web data, textbooks, publications, and so on.

These data are an invaluable source of information and knowledge but, despite their volume and availability, they are underutilized. Data sources can be of three types: structured, semi-structured and unstructured. The last type requires more effort to handle because it must be converted into machine-readable data before software can use it. The data should be stored in an intelligent way to allow knowledge management, knowledge reuse and knowledge discovery across several sources.

Knowledge graphs are an efficient medium to carry out those tasks.

The manual construction of a knowledge graph is a complex and time-consuming process. The new challenge is to create the knowledge graph automatically or semi-automatically.

This is the main focus of this article and it is currently the research topic of Francesco Murdaca within the Design Engineering Assistant (DEA) project [1], [2], [3]. The project aims to enhance the productivity of human experts by providing them with new insights from the large amount of data accumulated in the field of space mission design.

The project started in January 2018 at the University of Strathclyde, within the Intelligent Computational Engineering (ICE) lab, under the supervision of Dr. Annalisa Riccardi. It is shared with another PhD student, Audrey Berquand, who is in charge of the front-end side. The project is carried out in collaboration with the European Space Agency (ESA) and industrial partners: Airbus, RHEA and satsearch.

Illustration by Kim Carney / Fred Hutch

SMART-DOG (Strathclyde Mechanical and Aerospace Research Toolbox for Domain Ontology Generation)

smart-dog is a framework for the semi-automatic generation of a knowledge graph, supporting both its development and its validation. The entire framework is built on top of Grakn, the intelligent database. Through its API, Grakn provides the interface for creating the knowledge graph using the Graql syntax and the Grakn data model, and the Graql reasoner makes it possible to validate the knowledge graph.

Integration of smart-dog in the development of an intelligent system [3]

In the frame of the DEA project, the factors to be considered for the back-end part are:

  • Source of data and user requirements
  • Data modelling
  • Rules
  • Inference engine

The lifecycle for the development of the back-end part consists of three high-level phases:

  • Definition Phase: statement of the requirements coming from the data sources and the user, and selection of the technologies and language for the data modelling.
  • Implementation Phase: once the sources have been identified, the generation and population of the knowledge graph can start. Because the process is iterative, each stage shall be verified. The rules are introduced and tested in this phase.
  • Validation Phase: validation of the system for its final integration.

Once the definition phase is concluded, construction of the data model starts. This is the most critical and time-consuming task, and it is where smart-dog comes into play. The process of building the knowledge graph is iterative. smart-dog’s architecture, shown below, is modular because of the different algorithms adopted; its main modules are [3]:

  • Raw Text Extraction Module: extracts the raw text from several formats (e.g. .pdf, .html, .docx, .pptx).
  • Natural Language Processing (NLP) Module: applies NLP techniques to the raw text.
  • Context Identification Module: serves two main purposes, understanding the domain context of the documents and acting as a filter that keeps out-of-domain sources from being introduced.
  • Ontology Learning (OL) Module: applies OL techniques ([4], [5]) to generate the Knowledge Graph structure.
  • Ontology Population Module: performs information extraction to populate the Knowledge Graph.
  • Grakn Interface Module: the interface to Grakn through its API.
  • Validation Module: performs integration tests to validate the results provided by the Knowledge Graph.
smart-dog modules
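
To make the modular idea concrete, here is a minimal Python sketch of a pipeline built from interchangeable stages. It is not smart-dog's code: the `Document` class, the stage functions, the `DOMAIN_TERMS` vocabulary and the two-term threshold are all assumptions made for illustration, and each stage is a deliberately trivial stand-in for the real technique.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Document:
    """Carries one source through the pipeline (illustrative fields only)."""
    raw_text: str
    tokens: List[str] = field(default_factory=list)
    in_domain: bool = False
    concepts: Set[str] = field(default_factory=set)


# Hypothetical domain vocabulary used by the context filter.
DOMAIN_TERMS = {"satellite", "orbit", "payload", "thruster", "mission"}


def nlp_stage(doc: Document) -> Document:
    # Stand-in for real NLP: lowercase and split into alphabetic tokens.
    doc.tokens = re.findall(r"[a-z]+", doc.raw_text.lower())
    return doc


def context_stage(doc: Document) -> Document:
    # Flag the document as in-domain if it mentions at least two domain terms.
    doc.in_domain = len(DOMAIN_TERMS & set(doc.tokens)) >= 2
    return doc


def ontology_learning_stage(doc: Document) -> Document:
    # Stand-in for ontology learning: keep domain terms as candidate concepts,
    # but only for documents that passed the context filter.
    if doc.in_domain:
        doc.concepts = DOMAIN_TERMS & set(doc.tokens)
    return doc


# Stages run in order; swapping an algorithm means swapping one function.
PIPELINE: List[Callable[[Document], Document]] = [
    nlp_stage,
    context_stage,
    ontology_learning_stage,
]


def run_pipeline(text: str) -> Document:
    doc = Document(raw_text=text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc
```

Calling `run_pipeline` on a sentence about a satellite payload in orbit would mark it as in-domain and keep those terms as candidate concepts, while an unrelated text would be filtered out with no concepts attached.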

Choice of Grakn
Knowledge graphs can be used in several applications and for different purposes, for example, expert systems. The creation of intelligent systems really starts at the database!

Four main components of an expert system have been considered:

  • Knowledge Graph (KG): contains the structured knowledge arising from the data model created for the specific domain.
  • Database of Rules: stores the rules extracted for the data, which are used by the inference engine.
  • Inference Engine (IE): reasons over the data, using the information in the knowledge graph and in the database of rules, to extract the requested information and allow the discovery of new insights.
  • User Interface (UI): a general term for a component that changes depending on the application.
Components selected for the expert system [3]
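
The interplay between a knowledge graph, a rule and an inference engine can be sketched in a few lines of Python. This is a toy illustration, not Grakn's reasoner: facts are plain triples, the only rule is transitivity of one relation, and the part names in the example are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical facts standing in for a tiny knowledge graph.
FACTS: Set[Triple] = {
    ("star tracker", "part_of", "AOCS"),
    ("AOCS", "part_of", "spacecraft"),
}


def infer_transitive(facts: Set[Triple], relation: str) -> Set[Triple]:
    """Forward-chain one rule until nothing new is derived:
    (a, relation, b) and (b, relation, c) imply (a, relation, c)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        snapshot = list(known)
        for a, r1, b in snapshot:
            for c, r2, d in snapshot:
                if r1 == relation and r2 == relation and b == c:
                    derived = (a, relation, d)
                    if derived not in known:
                        known.add(derived)
                        changed = True
    return known


def subjects_of(facts: Set[Triple], relation: str, obj: str) -> List[str]:
    """Answer a simple query: which subjects stand in `relation` to `obj`?"""
    return sorted(s for s, r, o in facts if r == relation and o == obj)
```

Querying the raw facts for parts of the spacecraft returns only the directly stated one; querying the output of `infer_transitive` also returns the part that is reachable through the rule, which is the kind of implicit knowledge an inference engine is meant to surface.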

The choice of Grakn followed a technical analysis of its impact on the project (timeline and success). In general, Grakn fulfils our purpose of building an intelligent system on top of an intelligent database to gather insights, make recommendations and allow knowledge discovery. Grakn is also fast to learn and to use. It covers all the main components required to build intelligent systems in a single technology, and it eases the automation of knowledge graph construction through the Graql data model.

Grakn selection for the expert system
Grakn basic schema for the Space Mission Design Knowledge Graph. Purple nodes are entities, green nodes are relationships.
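
As a rough idea of how schema generation can be automated, the Python sketch below renders a Graql-style `define` statement from a plain dictionary. The entity, role and relationship names are hypothetical and do not come from the actual Space Mission Design schema. The keywords (`sub entity`, `sub relationship`, `plays`, `relates`) are assumed from Grakn 1.x-era Graql and should be checked against the installed version. The sketch only builds a string; it does not call any Grakn client.

```python
from typing import Dict, List

# Hypothetical schema description (names are illustrative only).
ENTITIES: Dict[str, List[str]] = {      # entity -> roles it plays
    "mission": ["owner"],
    "spacecraft": ["asset"],
}
RELATIONSHIPS: Dict[str, List[str]] = {  # relationship -> roles it relates
    "operates": ["owner", "asset"],
}


def render_schema(entities: Dict[str, List[str]],
                  relationships: Dict[str, List[str]]) -> str:
    """Render a Graql-style `define` statement as text (assumed 1.x syntax)."""
    lines = ["define"]
    for name, roles in entities.items():
        plays = "".join(f", plays {role}" for role in roles)
        lines.append(f"{name} sub entity{plays};")
    for name, roles in relationships.items():
        relates = "".join(f", relates {role}" for role in roles)
        lines.append(f"{name} sub relationship{relates};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_schema(ENTITIES, RELATIONSHIPS))
```

Keeping the schema as data and rendering it in one place means that concepts proposed by an ontology learning step can be reviewed before anything is written to the database.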

DEA overview

smart-dog’s final output will be the space mission design knowledge graph, stored in Grakn and ready to be reused through the Graql reasoner and query language.

However, smart-dog is only one complementary part of the DEA project. The other fundamental part is not described in detail in this article; my colleague Audrey will cover it in a future article, and it will rely on Grakn as well. Her part is called smart-squid, the smart query interface of the DEA. smart-squid will:

  • manage the different levels of uncertainty in the information and queries
  • analyse the context of user queries to extract information from the Space Mission Design Knowledge Graph
  • post-process the extracted outputs with additional analytics
  • provide the most reliable and useful type of output for the user, depending on the design phase considered, together with recommendations and related information to boost knowledge discovery
  • provide a human-machine feedback loop to capture tacit knowledge and allow experts to continuously contribute to the DEA learning process

Audrey will also work on the integration of structured data into the Knowledge Graph. These data will be extracted from the mission design environment tools.


  • [2] F. Murdaca, A. Berquand, K. Kumar, A. Riccardi, T. Soares, S. Gerené, N. Brauer: “Knowledge-Based Information Extraction from Datasheets of Space Parts”, SECESA 2018.
  • [3] A. Berquand, F. Murdaca, A. Riccardi, T. Soares, S. Gerené, N. Brauer, K. Kumar: “Towards an Artificial Intelligence based Design Engineering Assistant for the Early Design of Space Missions”, IAC 2018.
  • [4] A. Al-Arfaj, A. Al-Salman: “Ontology Construction from Text: Challenges and Trends”, International Journal of Artificial Intelligence and Expert Systems (IJAE), 2015.
  • [5] S. Staab, R. Studer: “Handbook on Ontologies”, Springer 2015.


We will come back in the future with new articles to show updates on the DEA project, and more specifically on smart-dog and smart-squid. If you are interested in discovering more about the DEA project and discussing it with us, don’t hesitate to contact us.


PhD student
the Intelligent Computational Engineering Laboratory
Aerospace Centre of Excellence
Department of Mechanical & Aerospace Engineering
Strathclyde University
James Weir Building, 75 Montrose Street
G1 1XJ Glasgow
