A Brief Introduction to Question Answering from Tables

What is QA from Tables?

Question answering from tables is a task that takes a question and a table (structured text data) as input and returns the answer to the question as output.

Fig 1: Example (question, table, answer) pair from the SQA dataset

QA from tables differs from general question answering over free-form text in that the model can use the structure of the tabular data to inform its response. For example, the model might need to find the highest value in a particular column or the row with the largest difference between two column values.
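These structured operations correspond to short programs over the table. A minimal plain-Python sketch (the table and questions here are hypothetical examples, not from any dataset):

```python
# A toy table: each row is a dict keyed by column name.
table = [
    {"city": "Boston",  "population": 675_647, "area_km2": 232},
    {"city": "Denver",  "population": 715_522, "area_km2": 401},
    {"city": "Seattle", "population": 737_015, "area_km2": 369},
]

# "Which city has the highest population?"
highest = max(table, key=lambda row: row["population"])

# "Which city has the largest difference between population and area?"
largest_diff = max(table, key=lambda row: row["population"] - row["area_km2"])

print(highest["city"])       # Seattle
print(largest_diff["city"])  # Seattle
```

A table-QA model must learn to perform operations like these implicitly, without being handed the program.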


At first glance, QA from tables might seem like a strangely narrow field of study. In truth, table-specific QA models represent a highly valuable and widely applicable branch of NLP research.

Models capable of answering questions over tables make possible Natural Language Interface to Database (NLIDB) systems, which replace complex structured queries (e.g., SQL queries) with queries in natural language. Such a system would allow business decision-makers to query databases without going through the costly and time-consuming process of involving technical intermediaries. NLIDB has the potential to dramatically improve access to the increasingly large stores of structured data collected by businesses and other organizations.
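Concretely, the target behavior of an NLIDB system is to map a question to an executable query. A sketch using Python's built-in sqlite3 module (the schema, question, and generated query are all hypothetical):

```python
import sqlite3

# A toy in-memory database standing in for a real business data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 120.0), ("South", 95.5), ("West", 210.25)])

# The NLIDB model's job: translate the question into this query.
question = "What is the total revenue in the West region?"
generated_sql = "SELECT SUM(revenue) FROM sales WHERE region = 'West'"

(total,) = conn.execute(generated_sql).fetchone()
print(total)  # 210.25
```

The hard part, of course, is the translation step in the middle; everything else is ordinary database machinery.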

QA from tables models can also be used as part of multi-modal question answering systems that consider various sources of information. Attempting to answer many questions about a Wikipedia page, for example, will be very difficult without an understanding of how to read the tables embedded in that page.

These two applications alone make QA from tables both a valuable standalone product and a key element of future holistic QA systems.

Problem Variations

There are a number of variations on the archetypal QA from tables system. Some models are designed to output structured queries, which are written in a query language like SQL and should produce the correct answer to the question when executed on the table. Other models are designed to be end-to-end; they directly produce a list of cells (and a possible aggregation operator like SUM) without an intermediate program step.
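The two output styles can be contrasted directly. A structured-query model emits a program to be executed against the table, while an end-to-end model emits cell coordinates plus an aggregation operator. Both outputs below are hypothetical but evaluate to the same answer:

```python
table = [["year", "medals"], ["2008", "10"], ["2012", "14"], ["2016", "12"]]

# Style 1: intermediate program, executed against the table afterwards.
structured_query_output = "SELECT SUM(medals) FROM t"

# Style 2: end-to-end output: selected cells plus an aggregation operator.
end_to_end_output = {
    "cells": [(1, 1), (2, 1), (3, 1)],  # (row, column) coordinates
    "aggregation": "SUM",
}

# Executing the end-to-end prediction directly:
values = [float(table[r][c]) for r, c in end_to_end_output["cells"]]
answer = sum(values) if end_to_end_output["aggregation"] == "SUM" else values
print(answer)  # 36.0
```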

Another distinction is made between weakly supervised models, which are trained only on the correct final answer, and strongly supervised models, which are allowed to see the gold-label program (or aggregation operators) that produced the final answer. Strongly supervised data is much more expensive to collect, as it requires human annotators to write structured queries for each question. Weakly supervised data is more plentiful but suffers from the problem of sparse rewards: unless the system gets the intermediate program exactly right, it is unlikely to produce anything close to the correct answer. Therefore it can be hard to know how the model should be updated during training.
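The distinction shows up directly in the training data. Under weak supervision only the final answer is available; under strong supervision the gold-label program is too. Both records below are illustrative, not drawn from a real dataset:

```python
weakly_supervised_example = {
    "question": "How many gold medals did Norway win?",
    "table_id": "olympics_2018",
    "answer": "14",  # only the final answer is observed
}

strongly_supervised_example = {
    "question": "How many gold medals did Norway win?",
    "table_id": "olympics_2018",
    "answer": "14",
    # The gold-label program a human annotator had to write:
    "program": "SELECT gold FROM t WHERE country = 'Norway'",
}
```

The extra `program` field is exactly what makes strongly supervised data expensive to collect, and what weakly supervised training must do without.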

Current State of the Art

The current state of the art in single-table QA is TAPAS, a model published by Google Research in 2020. TAPAS is a weakly supervised, end-to-end model: it selects cells and an aggregation operator directly (without an intermediate structured program) and is trained only on the correct final answer to each question. The system is based on the pre-trained BERT encoder, with several modifications.

The model takes as input the sequence <question> + [SEP] + <flattened table>. To preserve information about the table's structure after it has been flattened, additional embeddings are added on top of the standard BERT positional embeddings.

  • Column embeddings denote the column in which the cell initially resided.
  • Row embeddings denote the row in which the cell initially resided.
  • Rank embeddings denote the rank of the cell in its column if the column contains numerical data (for example, a rank of 2 indicates that a value is the second smallest value in its column).
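The flattened input and the extra embedding indices can both be computed from the table alone. A sketch of the idea (not Google's implementation; real TAPAS operates on subword tokens, and ids here are 1-based for table cells with 0 reserved elsewhere):

```python
def flatten(question, table):
    """Build the TAPAS-style input string: question, [SEP], then the
    table flattened row by row (schematic; real tokenization differs)."""
    flat = " ".join(cell for row in table for cell in row)
    return f"{question} [SEP] {flat}"

def table_embedding_ids(table):
    """table: list of rows (header excluded), each a list of cell strings.
    Returns per-cell (column_id, row_id, rank_id). rank_id is 0 for
    non-numeric columns, else the 1-based rank of the value within its
    column (1 = smallest), mirroring the rank embeddings described above."""
    n_rows, n_cols = len(table), len(table[0])
    ranks = {}
    for c in range(n_cols):
        try:
            values = [float(table[r][c]) for r in range(n_rows)]
        except ValueError:
            continue  # column is not fully numeric: no rank ids
        order = sorted(range(n_rows), key=lambda r: values[r])
        ranks[c] = {r: i + 1 for i, r in enumerate(order)}
    return [[(c + 1, r + 1, ranks.get(c, {}).get(r, 0))
             for c in range(n_cols)] for r in range(n_rows)]

ids = table_embedding_ids([["Norway", "14"], ["Germany", "14"], ["Canada", "11"]])
print(ids[2][1])  # (2, 3, 1): column 2, row 3, smallest value in its column
```

Each of these integer ids indexes a learned embedding table, and the resulting vectors are summed with the token and positional embeddings.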

Fig 2: TAPAS embeddings

The BERT model is additionally modified to include output layers that select table cells and an optional aggregation operator.

Fig 3: TAPAS architecture

This architecture has proven relatively successful, achieving state-of-the-art results on the WikiSQL and SQA datasets. It also demonstrates some success at transfer learning, with TAPAS models pretrained on the SQA dataset achieving significant improvements on the WikiTQ dataset.


Extensions

Recent research has explored a number of interesting extensions to the basic single-table QA problem. One such extension is Open Domain QA, in which tables are not given alongside questions. Instead, the model must select the appropriate table from a large corpus before answering the question.

Because it is prohibitively costly to run a QA model like TAPAS over every table in a large corpus, most open domain models follow a two-step approach: a retriever model first selects a subset of likely tables, and then a QA model is applied to each of those tables individually. In Open Domain Question Answering over Tables via Dense Retrieval, the authors present a retrieval method called Dense Table Retrieval (DTR). DTR uses TAPAS to encode both the question q and each (title, table) pair (Title(T), T), then selects the table vectors closest to the question vector. DTR is pretrained on a variant of the inverse cloze task, in which it tries to predict the Wikipedia text span that originally contained a masked table. It is then trained on (question, table) pairs from a table-specific subset of the NQ-Questions dataset.
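The retrieval step itself reduces to nearest-neighbor search in a shared embedding space. A sketch with toy vectors standing in for the learned TAPAS encodings (in the real system, both the question and the table vectors come from trained encoders):

```python
import numpy as np

def retrieve_top_k(question_vec, table_vecs, k=2):
    """Return indices of the k table vectors most similar to the
    question, scored by inner product as in dense retrieval."""
    scores = table_vecs @ question_vec
    return np.argsort(-scores)[:k], scores

# Toy 4-dimensional embeddings; DTR's are produced by the TAPAS encoder.
question_vec = np.array([0.1, 0.9, 0.0, 0.2])
table_vecs = np.array([
    [0.0, 0.1, 0.9, 0.3],   # table 0
    [0.2, 0.8, 0.1, 0.1],   # table 1: closest to the question
    [0.9, 0.0, 0.1, 0.0],   # table 2
])

top, scores = retrieve_top_k(question_vec, table_vecs, k=2)
print(top)  # [1 0]: table 1 ranked first
```

Only the tables surviving this cheap vector comparison are handed to the expensive QA model.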

Once a subset of tables has been selected by DTR, TAPAS is applied to each of them and the answer with the highest score is selected.

Another extension to the QA from tables problem concerns multimodal question answering: question answering over a variety of data modes, including free text, tables, and images. The model must combine information from these sources to produce the correct answer to the given question.

Fig 4: Example question from MMQA

The 2021 paper MultimodalQA: Complex Question Answering over Text, Tables and Images introduces MMQA, the first multimodal QA dataset. The dataset is built by automatically linking Wikipedia tables to associated images and free text, then generating composite questions from a set of standard formats (e.g., Compose(TableQ, TextQ)).

The authors also introduce a model for answering questions in this dataset. First, single-modality models are trained for images, tables, and text; the table QA model, for instance, is based on TAPAS. To combine these answers, they use a model called ImplicitDecomp. A classifier first predicts the question format (e.g., Compose(TableQ, TextQ)), and then ImplicitDecomp is repeatedly given the following inputs: (question, question type, hop number, context, answers from previous hops). Note that the question is never explicitly decomposed into parts, hence the "Implicit" in the name.
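The multi-hop procedure can be sketched as a loop: classify the question type, then run ImplicitDecomp once per hop, feeding each hop's answers back in. The `classify_type` and `implicit_decomp` callables below are hypothetical stand-ins for the paper's trained models:

```python
def answer_multimodal(question, context, classify_type, implicit_decomp):
    """Schematic hop loop for the MMQA-style pipeline."""
    question_type = classify_type(question)   # e.g. "Compose(TableQ, TextQ)"
    n_hops = question_type.count("Q")         # one hop per sub-question
    answers = []
    for hop in range(n_hops):
        # Each hop sees the full question plus prior hops' answers;
        # the question itself is never explicitly split into parts.
        answers = implicit_decomp(question, question_type, hop, context, answers)
    return answers

# Toy stand-ins to show the data flow:
fake_classify = lambda q: "Compose(TableQ, TextQ)"
def fake_decomp(q, qtype, hop, ctx, prev):
    return prev + [f"hop{hop}-answer"]

result = answer_multimodal("Who ...?", {}, fake_classify, fake_decomp)
print(result)  # ['hop0-answer', 'hop1-answer']
```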

This model scores 51.7% on the F1 metric over multimodal questions. This significantly underperforms the human baseline (90.1%) but represents a strong starting point on a highly complex task.


Conclusion

Question answering over tables is a highly promising area of research with the potential to revolutionize both human-database interaction and general-purpose QA. The current state of the art in single-table QA is based on the BERT encoder, with additional embeddings that encode the table structure. Further applications build on this system to produce models that can answer questions over an open domain of input tables or across multiple data modes.




Eli Sage-Martinson

Yale CS '22