Text2SQL — Part 4: State-Of-The-Art Models

Understanding State-Of-The-Art models in Text2SQL domain

Devshree Patel
VisionWizard
5 min read · Aug 2, 2020

--

In the last blog, we explored a few baseline models used in the Text2SQL domain. These architecturally simpler models are unable to capture the bigger picture in the Text2SQL task. However, they serve as useful building blocks for constructing and designing new, more complex models.

More complex models such as RAT-SQL combine several different concepts in order to solve the problems in the Text2SQL task.

1. Text2SQL — Part 1: Introduction

2. Text2SQL — Part 2: Datasets

3. Text2SQL — Part 3: Baseline Models

4. Text2SQL — Part 4: State-of-the-art Models

5. Text2SQL — Part 5: Final Insights

Recent models tackle the natural-language-to-SQL task using the SPIDER dataset. To learn more about SPIDER, refer to my second article in this series, on datasets.

Some state-of-the-art models covered in this blog:

  • IRNet
  • EditSQL
  • RAT-SQL
Figure 1: A challenging text-to-SQL task from the Spider dataset from [3]

1. IRNet

  • IRNet, a neural approach for complex and cross-domain text-to-SQL based on an intermediate representation, aims to address two main challenges in Text2SQL tasks.
  • These challenges are the mismatch between the intents expressed in natural language and SQL, and the difficulty of predicting columns caused by the large number of out-of-domain words.
  • Instead of synthesizing the SQL query end-to-end, IRNet decomposes the synthesis into three phases.
  • During the first phase, the schema linking process over a database schema and a question is performed.
  • IRNet employs SemQL domain-specific language, which serves as an intermediate representation between SQL and natural language.
Figure 2: An illustrative example of SemQL from [1]
  • The model includes a Natural Language (NL) encoder, a Schema Encoder and a Decoder.
Figure 3: An overview of the neural model proposed in [1]
  • Each component of the model performs a distinct function in accomplishing the Text2SQL task.
  • The NL encoder takes the natural language question as input and encodes it into embedding vectors. These embedding vectors are then used to construct hidden states with a bi-directional LSTM.
  • The schema encoder takes the database schema as input and outputs representations for its columns and tables.
  • Finally, the decoder synthesizes SemQL queries using a context-free grammar.
  • IRNet yields a significant improvement of 19.5% over previous benchmark models and achieves 46.7% accuracy on the SPIDER dataset.
  • Incorporating BERT into IRNet further improves performance substantially, yielding 54.7% accuracy.
Figure 4: F1 scores of component matching of SyntaxSQLNet, SyntaxSQLNet(BERT), IRNet and IRNet(BERT) on the test set from [1]
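The schema-linking phase described above can be sketched in a few lines. This is a toy version that only does exact n-gram matching of question spans against table and column names (IRNet's actual linking also handles partial matches and cell values, and all names below are hypothetical):

```python
# Toy schema linking: tag question n-grams that exactly match a
# table or column name, preferring longer spans.

def schema_link(question_tokens, tables, columns):
    """Return (start, end, type, name) links for matched spans."""
    links = []
    n = len(question_tokens)
    # Try longer n-grams first so multi-word names win over sub-spans.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            span = " ".join(question_tokens[start:start + length]).lower()
            if span in tables:
                links.append((start, start + length, "TABLE", span))
            elif span in columns:
                links.append((start, start + length, "COLUMN", span))
    return links

question = "show the name of each student".split()
links = schema_link(question, tables={"student"}, columns={"name"})
# Each link records (start, end, type, matched schema item).
```

The tagged spans are what the NL encoder and schema encoder later consume, so the quality of this step directly affects column prediction.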

2. EditSQL

  • EditSQL focuses on the cross-domain, context-dependent text-to-SQL task.
  • The authors exploit the fact that adjacent natural language questions are dependent and corresponding SQL queries overlap.
  • They utilise this dependency by editing the previously predicted query to improve the generation quality.
  • The editing mechanism treats SQL queries as token sequences and reuses previous generation results at the token level.
  • In order to deal with complex tables in different domains, an utterance-table encoder and a table-aware decoder are used to incorporate the context of the natural language and the schema.
Figure 5: The model architecture of EditSQL from [2]
  • The utterance-table encoder encodes user utterance and table schema. A bi-LSTM is used to encode tokens of utterances.
  • For each token, an attention-weighted average of the column header embeddings is used to attend to the most relevant columns.
Figure 6,7: An example of user utterance and column headers and Utterance Encoder from [2].
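The attention-weighted average above can be illustrated with a small numpy sketch. Dimensions and values are made up; EditSQL uses learned projections, which this toy version omits:

```python
# Toy utterance-to-column attention: weight each column-header
# embedding by its dot-product similarity to one utterance token,
# then take the softmax-weighted average.
import numpy as np

def attention_average(token_vec, column_vecs):
    scores = column_vecs @ token_vec            # one score per column
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over columns
    return weights @ column_vecs                # weighted average

rng = np.random.default_rng(0)
token = rng.normal(size=4)          # one utterance-token embedding
columns = rng.normal(size=(3, 4))   # 3 column headers, dim 4
ctx = attention_average(token, columns)
```

The resulting context vector is concatenated with the token's hidden state, which is how the encoder makes each utterance token "aware" of the schema.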
  • An attention layer is incorporated to capture the relationship between the table schema and utterance.
  • For capturing information across different utterances, an interaction-level decoder is built on top of the utterance-level encoder.
  • The last step is the generation of SQL queries by an LSTM decoder that incorporates the interaction history, table schema and user utterance.
Figure 8: Table Encoder from [2]
  • The model is evaluated on SParC dataset, a large scale cross-domain context-dependent semantic parsing dataset developed from SPIDER.
  • The model performs well on both SPIDER and SParC but fails to surpass the previous state-of-the-art model, IRNet.
  • The model achieves 32.9% accuracy in cross-domain Text2SQL generation. Additionally, utilising BERT embeddings yields a significant improvement in accuracy, achieving 53.4%.
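The token-level reuse idea above can be made concrete with a small sketch. EditSQL learns a neural switch between copying and generating; this toy version just marks, for a target query, which tokens could be copied from the previously predicted query (queries are illustrative):

```python
# Toy copy-vs-generate labelling: a token of the new query can be
# copied if it appears anywhere in the previous query's token set.

def copy_vs_generate(prev_query, new_query):
    prev_tokens = set(prev_query.split())
    actions = []
    for tok in new_query.split():
        actions.append(("COPY" if tok in prev_tokens else "GEN", tok))
    return actions

prev = "SELECT name FROM student WHERE age > 20"
new = "SELECT name FROM student WHERE age > 25"
actions = copy_vs_generate(prev, new)
# Only the changed literal needs to be generated from scratch.
```

Because adjacent questions in a conversation overlap heavily, most tokens fall into the COPY case, which is exactly the dependency EditSQL exploits.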

3. RAT-SQL

  • One of the main challenges in translating natural language questions into SQL queries is to generalize to unseen database schemas.
  • This generalisation depends on encoding the database relations in an accessible way and modelling the alignment between relevant database columns and their mentions in the question.
  • The proposed framework is based on a relation-aware self-attention mechanism that addresses schema encoding, feature representation and schema linking within a unified Text2SQL encoder.
  • To get a quick overview of the encoder-decoder structure of RAT-SQL, go through the flow chart below.
Figure 9: A flow chart of RAT-SQL model (Source: Author)
  • RAT-SQL yields a significant improvement of 8.7% over previous benchmark models and achieves 57.2% accuracy on the SPIDER dataset.
Figure 10: Accuracy on the Spider development and test sets, compared to the other approaches at the top of the dataset leaderboard as of May 1st, 2020 from [3]
  • Incorporating BERT into RAT-SQL shows a substantial improvement in performance and yields 65.6% accuracy.
Figure 11: Accuracy on the Spider development and test sets, by difficulty from [3]
  • Currently, RAT-SQL+BERT tops the leaderboard of the benchmark dataset, SPIDER!
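The relation-aware self-attention at the heart of RAT-SQL can be sketched as follows. The idea is that a learned relation embedding for each pair of items (e.g. "this column belongs to that table") is added to the keys and values before attention. This minimal numpy version drops the query/key/value projections and multiple heads of the real model, and all shapes are toy:

```python
# Toy relation-aware self-attention: attention scores and outputs
# both incorporate pairwise relation embeddings rel_k, rel_v.
import numpy as np

def relation_aware_attention(x, rel_k, rel_v):
    """x: (n, d) item embeddings (question tokens, columns, tables);
    rel_k, rel_v: (n, n, d) relation embeddings for every item pair."""
    n, d = x.shape
    out = np.zeros_like(x)
    for i in range(n):
        # score(i, j) = x_i . (x_j + rel_k[i, j]) / sqrt(d)
        scores = np.array([x[i] @ (x[j] + rel_k[i, j])
                           for j in range(n)]) / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()                               # softmax over j
        # out_i = sum_j w_j * (x_j + rel_v[i, j])
        out[i] = sum(w[j] * (x[j] + rel_v[i, j]) for j in range(n))
    return out

rng = np.random.default_rng(1)
n, d = 4, 8
y = relation_aware_attention(rng.normal(size=(n, d)),
                             rng.normal(size=(n, n, d)),
                             rng.normal(size=(n, n, d)))
```

By baking schema relations directly into attention, the same mechanism handles schema encoding and schema linking, which is what lets RAT-SQL generalise to unseen schemas.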
In this article, we explored a few state-of-the-art models in the field of Text2SQL. In the next part of the series, we will look at the well-known AllenNLP library along with some final insights.

References

[1] Guo, Jiaqi, et al. “Towards complex text-to-SQL in the cross-domain database with intermediate representation.” arXiv preprint arXiv:1905.08205 (2019).

[2] Zhang, Rui, et al. “Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions.” arXiv preprint arXiv:1909.00786 (2019).

[3] Wang, Bailin, et al. “Rat-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers.” arXiv preprint arXiv:1911.04942 (2019).
