
E20 : Chain-of-Table

Praveen Thenraj
Research Papers Summarized
4 min read · Apr 7, 2024


Dynamically planning table transformations, and carrying the transformed tables through the chain of reasoning steps, improves the reasoning abilities of LLMs over tables for question-answering tasks.

Paper Name : Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Paper URL : https://arxiv.org/abs/2401.04398

Authors : Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

Please find annotated paper here

Problem Statement :

  • Reasoning over text data requires understanding the semantics of a free-form question, whereas reasoning over tabular data additionally requires understanding the semantics of the semi-structured table.
  • Existing solutions like Chain-of-Thought (CoT) treat reasoning over tabular data as a generic reasoning problem, like text reasoning, while program-aided techniques generate SQL from the user query to retrieve responses from the table.
  • These solutions do not dynamically plan, dynamically modify the tables, or include the modified tables in the reasoning steps that generate the final answer.
  • Instead, they use few-shot examples either to do single-pass reasoning or to decompose the tables and perform reasoning based on pre-defined examples.

Solution :

  • Use dynamic planning to transform the table depending on the complexity of the question, rather than using predefined examples to decompose the table and generate reasoning steps.
  • Using the transformed tables as part of the chain of reasoning helps the model exhibit better reasoning capabilities than existing methods.
Chain-of-Table reasoning steps

Approach :

  • Assume a question Q is to be answered using a table T. The approach consists of three modules - dynamic planning, argument generation, and the final query.
  • Dynamic planning - Prompt an LLM with (an explanation of the operations, T, Q) to identify, from a pool of operations, the atomic operation to be performed on the table based on the complexity of the question and the required response.
  • The pool of atomic operations includes add_column, select_column, select_row, group_by, and sort_by.
  • The atomic operation identified is appended to the chain of operations to be used during further processing.
  • Argument generation - Prompt an LLM with (T, the identified atomic operation, Q) to identify the arguments required to execute the operation on the table and generate the transformed table.
  • The identified operation and arguments are used to transform the table from T to T'. The above two steps are iterated until the end token [E] is generated.
  • The final prompt to the LLM contains (T', Q), from which it generates the answer A.
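The iterative loop above can be sketched in a few lines of Python. This is a minimal toy illustration, not the paper's released code: the planner below is a hard-coded stand-in for the two LLM prompts (operation selection and argument generation), and the pandas implementations of the atomic operations are my own assumptions.

```python
# Minimal sketch of the Chain-of-Table loop on a toy table.
# plan_next_op() stands in for the LLM planner; in the real method an LLM
# picks the operation and its arguments at every step.
import pandas as pd

# Toy table T and question Q
T = pd.DataFrame({
    "Cyclist": ["A", "B", "C"],
    "Country": ["ITA", "ESP", "ITA"],
    "Rank": [1, 2, 3],
})
Q = "How many cyclists are from Italy?"

# Pool of atomic operations (illustrative pandas implementations)
def f_add_column(t, name, vals):
    t = t.copy(); t[name] = vals; return t
def f_select_row(t, rows):    return t.iloc[rows]
def f_select_column(t, cols): return t[cols]
def f_group_by(t, col):       return t.groupby(col).size().reset_index(name="Count")
def f_sort_by(t, col):        return t.sort_values(col)

OPS = {
    "add_column": f_add_column, "select_row": f_select_row,
    "select_column": f_select_column, "group_by": f_group_by,
    "sort_by": f_sort_by,
}

def plan_next_op(table, question, history):
    """Hard-coded stand-in for the LLM: returns (op_name, args) or ('[E]', None)."""
    if not history:                               # step 1: keep only the Country column
        return "select_column", (["Country"],)
    if history[-1][0] == "select_column":         # step 2: count rows per country
        return "group_by", ("Country",)
    return "[E]", None                            # end token: stop planning

def chain_of_table(table, question):
    history = []                                  # the growing chain of operations
    while True:
        op, args = plan_next_op(table, question, history)
        if op == "[E]":
            break
        table = OPS[op](table, *args)             # transform T -> T'
        history.append((op, args))
    return table                                  # T', handed to the LLM with Q

T_final = chain_of_table(T, Q)
print(T_final)   # two-column table: Country, Count
```

Reading the answer (2 cyclists from ITA) off the final two-row table is far easier for the LLM than reasoning over the raw table in one pass, which is the core intuition of the method.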

Experimental Setup :

  • Datasets evaluated - WikiTQ, FeTaQA, TabFact
    WikiTQ - table-based question answering dataset with short text-span answers
    FeTaQA - table-based question answering dataset with long free-form text answers
    TabFact - table-based fact verification dataset
  • Models evaluated - PaLM 2-S, GPT-3.5 (turbo-16k-0613), LLaMA 2-17B
  • Baselines considered - generic reasoning, program-aided reasoning
  • Methods tested under generic reasoning - End-to-End QA, Few-shot prompting, Chain-of-Thought prompting
  • Methods tested under program-aided reasoning - Text-to-SQL, Binder, Dater

Observations :

  • The Chain-of-Table technique outperformed all six generic reasoning and program-aided reasoning approaches on WikiTQ and TabFact, across all the LLMs used
PaLM 2 vs GPT-3.5 vs LLaMA 2 on the TabFact and WikiTQ datasets
  • Chain-of-Table also outperformed the other approaches on the FeTaQA dataset, but the margin was very small. This is attributed to the n-gram similarity used in metrics like ROUGE-1/2/L to measure performance
  • Observations show that most of the samples in WikiTQ and TabFact required 2-4 atomic operations to generate the final response
Length of operation chain Vs Data distribution
  • Results also show that the accuracy of Chain-of-Table on WikiTQ was better than Chain-of-Thought (generic reasoning) and Dater (program-aided reasoning) for all lengths of the operation chain. The gains were largest when the chain length was between 2 and 4
  • Observations also show that the performance of generic reasoning and program-aided techniques on longer tables (>4000 tokens) dropped drastically compared to their performance on smaller tables (<2000 tokens). Chain-of-Table also degraded on longer tables, but the reduction was not as drastic as for the other methods.

Conclusion:

  • Unlike text-based reasoning, where a single reasoning pass can elicit reasoning capabilities in LLMs, tabular reasoning is a complex task: the system has to capture the semantics of both the free-form query text and the semi-structured table.
  • Results clearly indicate that rather than static few-shot examples or static table-decomposition prompts, tabular reasoning approaches require dynamic planning of table operations based on the complexity of the question.
  • They also show that using the transformed table in the chain of reasoning steps helps the technique achieve better performance than other techniques.
