
E20 : Chain-of-Table

Praveen Thenraj
Research Papers Summarized
4 min read · Apr 7, 2024


Dynamically planning table transformations, and carrying the transformed tables through the chain of reasoning steps, improves the reasoning abilities of LLMs over tables for question-answering tasks.

Paper Name : Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Paper URL : https://arxiv.org/abs/2401.04398

Authors : Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

Please find annotated paper here

Problem Statement :

  • Reasoning over text data requires understanding the semantics of a free-form question, whereas reasoning over tabular data additionally requires understanding the semantics of the semi-structured table.
  • Existing solutions like Chain-of-Thought (CoT) treat reasoning over tabular data as a generic reasoning problem, like text reasoning, while program-aided techniques generate SQL from the user query to retrieve responses from the table.
  • These solutions do not dynamically plan, dynamically modify the tables, or include the modified tables in the reasoning steps that generate the final answer.
  • Instead, they use few-shot examples either to do single-pass reasoning or to decompose the tables and perform reasoning based on pre-defined examples.

Solution :

  • Use dynamic planning to transform the table depending on the complexity of the question, rather than using predefined examples to decompose the table and generate reasoning steps.
  • Using the transformed tables as part of the chain of reasoning helps the model exhibit better reasoning capabilities than existing methods.
Chain-of-Table reasoning steps

Approach :

  • Assume a question Q is to be answered using a table T. The approach consists of three modules - dynamic planning, argument generation, and the final query.
  • Dynamic planning - Prompt an LLM with (an explanation of the operations, T, Q) to identify, from a pool of operations, the atomic operation to be performed on the table based on the complexity of the question and the required response.
  • The pool of atomic operations includes add_column, select_column, select_row, group_by, and sort_by.
  • The atomic operation identified is appended to the chain of operations to be used during further processing.
  • Argument generation - Prompt an LLM with (T, the identified atomic operation, Q) to identify the arguments required to execute the operation on the table and generate the transformed table.
  • The identified operation and arguments are used to transform the table from T to T'. The above two steps are iterated until the end token [E] is generated.
  • The final prompt to the LLM contains (T', Q), from which it generates the answer A.
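The iterative loop above can be sketched in a few lines of Python. This is a minimal toy illustration, not the paper's released code: the planner below is a hard-coded stand-in for the two LLM prompts (operation selection and argument generation), and the pandas implementations of the atomic operations are my own assumptions.

```python
# Minimal sketch of the Chain-of-Table loop on a toy table.
# plan_next_op() stands in for the LLM planner; in the real method an LLM
# picks the operation and its arguments at every step.
import pandas as pd

# Toy table T and question Q
T = pd.DataFrame({
    "Cyclist": ["A", "B", "C"],
    "Country": ["ITA", "ESP", "ITA"],
    "Rank": [1, 2, 3],
})
Q = "How many cyclists are from Italy?"

# Pool of atomic operations (illustrative pandas implementations)
def f_add_column(t, name, vals):
    t = t.copy(); t[name] = vals; return t
def f_select_row(t, rows):    return t.iloc[rows]
def f_select_column(t, cols): return t[cols]
def f_group_by(t, col):       return t.groupby(col).size().reset_index(name="Count")
def f_sort_by(t, col):        return t.sort_values(col)

OPS = {
    "add_column": f_add_column, "select_row": f_select_row,
    "select_column": f_select_column, "group_by": f_group_by,
    "sort_by": f_sort_by,
}

def plan_next_op(table, question, history):
    """Hard-coded stand-in for the LLM: returns (op_name, args) or ('[E]', None)."""
    if not history:                               # step 1: keep only the Country column
        return "select_column", (["Country"],)
    if history[-1][0] == "select_column":         # step 2: count rows per country
        return "group_by", ("Country",)
    return "[E]", None                            # end token: stop planning

def chain_of_table(table, question):
    history = []                                  # the growing chain of operations
    while True:
        op, args = plan_next_op(table, question, history)
        if op == "[E]":
            break
        table = OPS[op](table, *args)             # transform T -> T'
        history.append((op, args))
    return table                                  # T', handed to the LLM with Q

T_final = chain_of_table(T, Q)
print(T_final)   # two-column table: Country, Count
```

Reading the answer (2 cyclists from ITA) off the final two-row table is far easier for the LLM than reasoning over the raw table in one pass, which is the core intuition of the method.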

Experimental Setup :

  • Datasets evaluated - WikiTQ, FeTaQA, TabFact
    WikiTQ - table-based question answering dataset with short text-span answers
    FeTaQA - table-based question answering dataset with long free-form text answers
    TabFact - table-based fact verification dataset
  • Models evaluated - PaLM 2-S, GPT-3.5 (turbo-16k-0613), LLaMA 2-17B
  • Baselines considered - generic reasoning, program-aided reasoning
  • Methods tested under generic reasoning - End-to-End QA, Few-shot prompting, Chain-of-Thought prompting
  • Methods tested under program-aided reasoning - Text-to-SQL, Binder, Dater

Observations :

  • The Chain-of-Table technique outperformed all six generic reasoning and program-aided reasoning approaches on WikiTQ and TabFact, across all the LLMs used
PaLM 2 vs GPT-3.5 vs LLaMA 2 on the TabFact and WikiTQ datasets
  • Chain-of-Table also outperformed the other approaches on the FeTaQA dataset, but the margin was very small. This is attributed to the n-gram similarity used in metrics like ROUGE-1/2/L to measure performance
  • Observations show that most of the samples in WikiTQ and TabFact required 2-4 atomic operations to generate the final response
Length of operation chain Vs Data distribution
  • Results also show that the accuracy of Chain-of-Table on WikiTQ was better than Chain-of-Thought (generic reasoning) and Dater (program-aided reasoning) for all lengths of the operation chain. The gains were largest when the chain length was between 2 and 4
  • Observations also show that the performance of generic reasoning and program-aided techniques on longer tables (>4000 tokens) dropped drastically compared to their performance on smaller tables (<2000 tokens). Chain-of-Table also degraded on longer tables, but the reduction was not as drastic as for the other methods.

Conclusion:

  • Unlike text-based reasoning, where a single reasoning pass can elicit reasoning capabilities in LLMs, tabular reasoning is a complex task: the system has to capture the semantics of both the free-form query text and the semi-structured table.
  • Results clearly indicate that rather than static few-shot examples or static table-decomposition prompts, tabular reasoning approaches require dynamic planning of table operations based on the complexity of the question.
  • They also show that using the transformed table in the chain of reasoning steps helps the technique achieve better performance than other techniques.
