QueryCraft: Fine-tuning to your Data for NL2SQL

Step 3. Deep dive into finetuning strategies of QueryCraft

Amit Khandelwal
Towards Generative AI
5 min read · Jun 10, 2024

--

In this blog post, we will delve into the process of fine-tuning LLMs specifically for Text-to-SQL tasks, introducing the fine-tuning component within the QueryCraft pipeline. We’ll cover what fine-tuning is, why it’s necessary, and the key parameters and settings involved in the process.

What is Fine-Tuning?

Fine-tuning refers to the process of taking a pre-trained language model and adapting it to perform a specific task or set of tasks by further training it on domain-specific data. Instead of training the model from scratch, fine-tuning leverages the knowledge already encoded in the pre-trained model, allowing for quicker and more efficient training.

Text-to-SQL is a challenging task in NLP that involves converting natural language questions into SQL queries, which are used to retrieve information from relational databases. Fine-tuning LLMs for Text-to-SQL tasks allows for more accurate and efficient query generation, enabling better interaction with databases and improved performance in applications like virtual assistants and data analysis tools.
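To make the task concrete, here is a toy training example. The column names (question, query, context) match the dataset format used later in this post; the specific schema and prompt template are illustrative assumptions, not QueryCraft's actual template.

```python
# One Text-to-SQL training example: the model sees a natural-language
# question plus the table schema (the "context") and must produce SQL.
example = {
    "question": "How many employees earn more than 50000?",
    "context": "CREATE TABLE employees (id INT, name TEXT, salary INT)",
    "query": "SELECT COUNT(*) FROM employees WHERE salary > 50000",
}

def build_prompt(ex):
    """Format one example into a training prompt (illustrative template)."""
    return (
        f"-- Schema:\n{ex['context']}\n"
        f"-- Question: {ex['question']}\n"
        f"-- SQL: {ex['query']}"
    )
```

A fine-tuned model is then expected to complete the `-- SQL:` line for unseen questions over the same schema.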

Parameter Settings for QueryCraft Fine-Tuning

(Figure: the QueryCraft pipeline)

We use the Hugging Face Transformers’ Trainer class to fine-tune any of its pre-trained models on our dataset. Training can be configured through several parameters of the Trainer class; the key ones we use in our implementation are explained below.

1. Data collators

Data collators are objects that will form a batch by using a list of dataset elements as input. These elements are of the same type as the elements of train_dataset or eval_dataset.

To be able to build batches, data collators may apply some processing (like padding). Some of them (like DataCollatorForLanguageModeling) also apply some random data augmentation (like random masking) on the formed batch. In our framework, the user can select any of the three data collators:

DefaultDataCollator: A simple data collator that collates batches of dict-like objects and performs special handling for potential keys named:

  • label: handles a single value (int or float) per object
  • label_ids: handles a list of values per object

Does not do any additional preprocessing: property names of the input object will be used as corresponding inputs to the model.

DataCollatorForSeq2Seq: Data collator that will dynamically pad the inputs received, as well as the labels.

DataCollatorForLanguageModeling: Data collator used for language modeling. Inputs are dynamically padded to the maximum length of a batch if they are not all of the same length.
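The dynamic-padding behavior described above can be sketched in plain Python. This is a minimal illustration of what DataCollatorForSeq2Seq does, not its actual implementation: inputs are padded to the longest sequence in the batch, and labels are padded with -100 (the index Hugging Face loss functions ignore).

```python
PAD_TOKEN_ID = 0      # padding id for inputs (an assumption; tokenizer-specific)
LABEL_PAD_ID = -100   # ignored index for labels, as in Hugging Face

def collate_seq2seq(features):
    """Minimal sketch of dynamic padding: pad every sequence in the
    batch to the longest sequence in that same batch."""
    max_in = max(len(f["input_ids"]) for f in features)
    max_lab = max(len(f["labels"]) for f in features)
    batch = {"input_ids": [], "labels": []}
    for f in features:
        batch["input_ids"].append(
            f["input_ids"] + [PAD_TOKEN_ID] * (max_in - len(f["input_ids"])))
        batch["labels"].append(
            f["labels"] + [LABEL_PAD_ID] * (max_lab - len(f["labels"])))
    return batch

# Two examples of different lengths collated into one rectangular batch.
batch = collate_seq2seq([
    {"input_ids": [5, 6, 7], "labels": [8]},
    {"input_ids": [5], "labels": [8, 9]},
])
```

Because padding is computed per batch rather than per dataset, short batches waste no memory on padding up to a global maximum length.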

2. Gradient Accumulation

Gradient accumulation is a technique used during training in neural networks to overcome memory constraints, particularly when dealing with large batch sizes or limited GPU memory. Instead of updating the model’s parameters after processing each batch, gradient accumulation involves accumulating gradients across multiple batches before performing a single parameter update.

By accumulating gradients over multiple batches, gradient accumulation allows for more stable updates, especially when dealing with large batch sizes that may not fit into GPU memory. This enables training with larger effective batch sizes without requiring additional memory.

The number of batches over which gradients are accumulated, known as the gradient accumulation step, is a hyperparameter that can be adjusted based on memory constraints and performance considerations. Typically, larger accumulation steps lead to more efficient memory usage but may result in slower convergence due to less frequent parameter updates.
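The mechanics can be shown with a toy one-parameter model. This is a hand-rolled sketch of the idea, not Trainer code: each micro-batch gradient is divided by the accumulation step count so that the summed gradient matches the gradient of one batch that is ACCUM_STEPS times larger, and the parameter is updated only once per accumulation cycle.

```python
ACCUM_STEPS = 4  # gradients from 4 micro-batches ≈ one 4x-larger batch

def grad(w, batch):
    """Mean-squared-error gradient for a 1-D linear model y ≈ w * x."""
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

def train(w, micro_batches, lr=0.01):
    accum = 0.0
    for step, batch in enumerate(micro_batches, start=1):
        # Scale each micro-batch gradient so the running sum equals the
        # gradient of one combined batch of ACCUM_STEPS micro-batches.
        accum += grad(w, batch) / ACCUM_STEPS
        if step % ACCUM_STEPS == 0:
            w -= lr * accum   # single parameter update per cycle
            accum = 0.0
    return w
```

Running this over four micro-batches of one example each performs exactly one update, identical (for equal-sized micro-batches) to a single step on the full batch.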

3. LoRA Parameters

LoRA (Low-Rank Adaptation) introduces several key parameters and concepts crucial for fine-tuning large language models (LLMs) efficiently. Let’s delve into each of these:

  1. Rank (r): The rank parameter determines the dimensionality of the low-rank approximation used in the decomposition of weight matrices. Higher rank values result in a more accurate representation but may increase computational complexity and memory requirements. Selecting an appropriate rank involves balancing model performance with computational constraints.
  2. Dropout: Dropout is a regularization technique used to prevent overfitting during training by randomly setting a fraction of input units to zero. It helps improve model generalization by encouraging robustness and reducing reliance on specific features. The dropout rate determines the proportion of units to drop during each training iteration.
  3. Alpha (α): Alpha is the scaling factor applied to the low-rank update: the learned delta is scaled by α/r before being added to the frozen weights. It controls how strongly the adaptation influences the base model’s behavior. A common starting point is to set alpha equal to or a small multiple of the rank, then adjust based on the specific task, dataset, and model architecture.
  4. Target Modules: Target modules specify the components of the pre-trained model that are fine-tuned during the training process. Depending on the task and desired level of adaptation, target modules may include certain layers or parameters while keeping others frozen. Target module selection is crucial for optimal performance and efficiency in fine-tuning LLMs.
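How rank and alpha interact can be illustrated with a plain-Python sketch of the LoRA update (not PEFT library code): a frozen d×k weight receives a delta equal to (α/r)·B·A, where B is d×r and A is r×k, so only r·(d+k) values are trained. Since B is initialized to zero, the adapted model starts out identical to the base model.

```python
import random

def matmul(X, Y):
    """Plain-list matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_delta(B, A, alpha, r):
    """LoRA weight update: delta_W = (alpha / r) * B @ A."""
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

# Toy shapes: a 4x4 frozen weight adapted with rank r = 2.
r, alpha, d, k = 2, 16, 4, 4
B = [[0.0] * r for _ in range(d)]                              # zero-initialized
A = [[random.gauss(0, 0.02) for _ in range(k)] for _ in range(r)]
delta = lora_delta(B, A, alpha, r)   # all-zero at init: base weights unchanged
```

Here 16 trainable values (B and A) stand in for the 16-value dense delta; at realistic dimensions the savings are dramatic, e.g. r=8 on a 4096×4096 matrix trains ~65K values instead of ~16.8M.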

QueryCraft Implementation & Preset for NL2SQL

The QueryCraft framework presets the above parameters to a configuration well suited to the NL2SQL task, so that non-expert data scientists do not need to tune them by hand.

The finetune service in QueryCraft Repository provides an easy way to finetune LLMs.

Finetune Service

This service provides a way to finetune a pre-trained LLM on a given dataset. Our implementation supports the PEFT-based LoRA and QLoRA techniques for finetuning.

To use the finetune service, the user needs to bring:

  1. A training dataset in the form of a CSV file with the columns question, query, and context.
  2. A prompt template in the form of a text file.
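The expected CSV layout can be sketched as follows; the two rows are hypothetical examples written for illustration, not taken from an actual QueryCraft dataset.

```python
import csv
import io

# Hypothetical two-row training file in the format the finetune
# service expects: columns named question, query, and context.
CSV_TEXT = """\
question,query,context
How many singers do we have?,SELECT COUNT(*) FROM singer,CREATE TABLE singer (id INT)
List all stadium names.,SELECT name FROM stadium,CREATE TABLE stadium (name TEXT)
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
```

Each row pairs a natural-language question with its gold SQL query, while the context column carries the schema the prompt template injects alongside the question.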

In conclusion, fine-tuning large language models (LLMs) for text-to-SQL tasks holds significant promise for enhancing the accuracy and efficiency of database query generation. By leveraging techniques such as LoRA and QLoRA, one can finetune LLMs in low-compute environments.

Furthermore, the careful selection of hyperparameters such as rank, dropout, learning rate, and target modules is crucial for achieving optimal results during fine-tuning. These parameters influence the model’s ability to learn from the data while mitigating issues such as overfitting and optimization instability.

Ready to take your NL2SQL models to the next level? Explore QueryCraft’s pipeline.

Follow Towards Generative AI for more on the latest advancements in AI.
