AI and Natural Language Processing and Understanding for Space Applications at ESA

Part II: Answering questions about the design of space missions and spacecraft concepts

Andrés García-Silva
6 min read · Nov 2, 2022

By José Manuel Gómez-Pérez, Andrés García-Silva, Rosemarie Leone, Mirko Albani, Moritz Fontaine, Charles Poncet, Leopold Summerer, Alessandro Donati, Ilaria Roma, Stefano Scaglioni

Image source: esa.int

This post is a brief overview of a paper currently under review at a journal (see preprint here), where we describe joint work between ESA and expert.ai to bring recent advances in NLP to the space domain.

We have split the post into several parts.

Introduction

CDF (Concurrent Design Facility) studies establish the technical, programmatic, and economic feasibility of ESA’s endeavors ahead of industrial development. A typical CDF report is a long (200 to 300 pages) English-language document covering a wide variety of technical topics.

Finding specific pieces of information in such documents using traditional search engines is cumbersome and error-prone. We therefore propose to build a question answering system (SpaceQA) that answers questions about facts in CDF studies.

QA systems that answer a natural language question against a collection of text documents are known as open-domain QA systems. Driven by recent advances in neural reading comprehension, open-domain QA has evolved rapidly: complex pipelines have been replaced by approaches that combine information retrieval for passage retrieval with neural reading comprehension for answer extraction.

Analysis

This case study falls in the category of comprehension-related problems. To increase the user’s confidence in the system, it needs some ability to justify why a particular answer is proposed for a given question. As we will see, we address this by showing the user the confidence score the question answering model assigns to an answer together with a visualization of the document context where the answer appears, which supports the plausibility of the answer.

In this case, we have a labeled dataset available for the question answering task (SQuAD). However, SQuAD is not domain-specific. Similarly, the state of the art in NLP offers (general-purpose) pre-trained language models that can be fine-tuned on such data.

To model this problem following a knowledge-based approach, e.g. as a rule-based production system, we would need to anticipate all the questions that a potential user could pose to the system, which is not feasible. Given the combination of all these factors, we opt for a machine learning-based approach with transformer language models at its core.

SpaceQA system architecture

SpaceQA adopts a two-stage retriever-reader architecture consisting of: i) a passage retriever that finds, in a collection of CDF reports, the passages that may contain an answer to the question, and ii) a neural reader that extracts the answer from those candidate passages.

For the retriever, we evaluate different methods, including traditional sparse vector-space methods based on TF-IDF, BM25, or cosine similarity, as well as dense representations produced by bi-encoders such as Dense Passage Retrieval (DPR), ColBERT, and CoCondenser.
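To make the contrast between the two retrieval families concrete, here is a minimal sketch (not the actual SpaceQA code) that scores the same passages with a sparse method (BM25, via the rank_bm25 package) and a dense bi-encoder (via sentence-transformers). The passages, question, and model name are illustrative placeholders, not the systems evaluated in the paper.

```python
# Minimal sketch contrasting sparse (BM25) and dense (bi-encoder) retrieval.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

passages = [
    "The propulsion subsystem uses a bipropellant engine for orbit insertion.",
    "The thermal control subsystem relies on multi-layer insulation.",
    "Power is provided by two deployable solar arrays.",
]
question = "How is power generated on the spacecraft?"

# Sparse retrieval: lexical overlap between question and passage tokens.
bm25 = BM25Okapi([p.lower().split() for p in passages])
sparse_scores = bm25.get_scores(question.lower().split())

# Dense retrieval: cosine similarity between bi-encoder embeddings.
encoder = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")  # placeholder model
emb_passages = encoder.encode(passages, convert_to_tensor=True)
emb_question = encoder.encode(question, convert_to_tensor=True)
dense_scores = util.cos_sim(emb_question, emb_passages)[0]

for passage, s, d in zip(passages, sparse_scores, dense_scores):
    print(f"BM25={s:.2f}  dense={d:.2f}  {passage}")
```

Note how the dense model can match "power generated" to "solar arrays" even without lexical overlap, which is precisely the motivation for evaluating dense retrievers alongside sparse ones.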

The reader is based on state-of-the-art reading comprehension models built on modern transformer architectures. Once the retriever has identified a reduced set of top-k candidate passages, the reader attempts to spot the answer to the question as a text span within any of those passages, assigning a score to each extracted candidate span and ranking the resulting set of potential answers.

Our first candidate reader model is based on a RoBERTa model fine-tuned on SQuAD2.0. Since RoBERTa was pre-trained on a general-purpose corpus, there could be a vocabulary mismatch between the questions about space mission design in our evaluation dataset and the knowledge encoded in the language model, which could affect performance. To bridge that potential gap, we also evaluate SpaceRoBERTa, a version of RoBERTa pre-trained on documents from space science and engineering.
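For illustration, an extractive reader of this kind can be loaded in a few lines with the Hugging Face transformers library. The checkpoint below, deepset/roberta-base-squad2, is a publicly available RoBERTa model fine-tuned on SQuAD2.0, used here as a stand-in for the models evaluated in the paper; the question and context are made up.

```python
# Sketch of an extractive reader: given a question and a retrieved passage,
# the model returns the most likely answer spans with confidence scores.
from transformers import pipeline

reader = pipeline(
    "question-answering",
    model="deepset/roberta-base-squad2",  # public SQuAD2.0 checkpoint
)

results = reader(
    question="What provides power to the spacecraft?",
    context="Power is provided by two deployable solar arrays sized for "
            "the worst-case eclipse at the end of the mission.",
    top_k=3,                        # return the three best candidate spans
    handle_impossible_answer=True,  # SQuAD2.0 models can abstain
)
# Each candidate: {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
print(results)
```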

We index the text of the passages extracted from the CDF reports in Elasticsearch. We also use FAISS, an efficient library for similarity search and clustering of dense vectors, to index the passages according to each retriever’s representation.
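For the dense retrievers, the FAISS side of the indexing looks roughly like the following sketch. The embedding dimension, the random stand-in vectors, and the choice of an exact inner-product index are illustrative assumptions, since each retriever (DPR, ColBERT, CoCondenser) produces its own representation.

```python
# Sketch of building a FAISS index over passage embeddings and querying it.
import faiss
import numpy as np

dim = 768  # embedding dimensionality (model-dependent)
passage_vectors = np.random.rand(1000, dim).astype("float32")  # stand-in embeddings

faiss.normalize_L2(passage_vectors)  # normalize so inner product = cosine
index = faiss.IndexFlatIP(dim)       # exact inner-product search
index.add(passage_vectors)

query_vector = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query_vector)
scores, passage_ids = index.search(query_vector, 10)  # top-10 passages
print(passage_ids[0], scores[0])
```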

Evaluation

To evaluate SpaceQA we use a manually crafted dataset of factual questions produced by ESA. The test set contains 60 questions with their answers and the corresponding paragraphs from CDF reports. Tables 1 and 2 show the evaluation results for the retrievers and readers.

We build an open-domain QA system using the retriever and reader that performed best in our evaluation: ColBERT as retriever and RoBERTa-base fine-tuned on SQuAD2.0 as reader.
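Conceptually, the assembled two-stage loop boils down to the sketch below, which wires a retriever and a reader together. The `retrieve` callable is a hypothetical stand-in for the Elasticsearch/FAISS retrieval component; the reader is the same public SQuAD2.0 checkpoint used in the earlier sketch.

```python
# Conceptual glue for the two-stage pipeline: retrieve top-k passages,
# then run the reader on each and rank the extracted spans by score.
from transformers import pipeline

reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

def answer_question(question, retrieve, passages, top_k=3):
    # Stage 1: retrieve the k passages most likely to contain the answer.
    candidate_ids = retrieve(question, top_k)
    # Stage 2: extract a candidate answer span (with score) per passage.
    candidates = [
        {**reader(question=question, context=passages[pid]), "passage_id": pid}
        for pid in candidate_ids
    ]
    # Rank all extracted spans by the reader's confidence.
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

# In the real system, `retrieve` would be backed by Elasticsearch or FAISS;
# here a trivial stand-in returns every passage id.
passages = ["Power is provided by two deployable solar arrays."]
print(answer_question("What provides power?", lambda q, k: range(len(passages)), passages))
```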

The SpaceQA system

Figure 1 shows a screenshot of the SpaceQA web application. When the user asks a question, the system displays the answer with the highest score and the passage it was extracted from. To support the plausibility of the answer produced by the model, we highlight the corresponding text span in the source passage, providing the user with its context. We also display and link the source document of the passage, along with the answer score, obtained by multiplying the probabilities of the span start and end tokens generated by the reader. If SpaceQA identifies other possible answers, they are displayed ranked by score under the option “other possible answers”.

Figure 1. SpaceQA screenshot
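For readers curious how such a score can be derived, here is a minimal sketch assuming a standard extractive QA head that outputs start and end logits per passage token; the toy logits below are made up.

```python
# Sketch: span score as the product of the start- and end-token
# probabilities, obtained by softmax over the reader's logits.
import torch

def span_score(start_logits, end_logits, start_idx, end_idx):
    start_probs = torch.softmax(start_logits, dim=-1)
    end_probs = torch.softmax(end_logits, dim=-1)
    return (start_probs[start_idx] * end_probs[end_idx]).item()

# Toy logits for a 5-token passage; the best span covers tokens 2..3.
start_logits = torch.tensor([0.1, 0.2, 4.0, 0.3, 0.1])
end_logits = torch.tensor([0.1, 0.1, 0.5, 3.5, 0.2])
print(span_score(start_logits, end_logits, 2, 3))  # ≈ 0.92 * 0.87 ≈ 0.80
```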

Table 3 shows some example questions and the answers SpaceQA produced for them. The system deals effectively with different types of wh-questions (what, which, where, when, why, and how), and provides appropriate answers in the form of instruments such as rockets, units of measure, descriptions, locations, objects, and time periods.

Nevertheless, there are questions for which the system provides no answer or a wrong one. While some of these are poorly specified questions lacking details that would help increase the reader’s confidence, others are simply not answered properly.

About expert.ai

Expert.ai is a leading company in applying artificial intelligence to text, with human-like understanding of context and intent.

We have 300+ proven deployments of natural language solutions across insurance, financial services, and media, leveraging our expert.ai Platform technology. Our platform and solutions are built with out-of-the-box knowledge models to make you ‘smarter from the start’ and get to production faster. Our Hybrid AI and natural language understanding (NLU) approach accelerates the development of highly accurate, custom, and easily explainable natural language solutions.

https://www.expert.ai/
