Week 5 — FlashCards

İlkim Aydoğan
AIN311 Fall 2022 Projects
2 min read · Dec 25, 2022

Hi, we are a two-student group trying to create an ML model for our AIN311 course.

This is the fifth blog post of our project. Stay tuned for a new post every Sunday.

You can find the fourth week's post here.

In last week's blog post, we explained the basic BERT model without its complicated mathematical formulas. This week we started implementing our model and, somewhat to our own surprise, changed our project's path a little: we decided to use T5 transformers for our model.

T5 (Text-To-Text Transfer Transformer)

T5 is a state-of-the-art natural language processing model developed by Google. One of the reasons we decided to use T5 is that it is capable of performing a wide range of language tasks, including translation, summarization, question-answering, and text generation. It is based on the transformer architecture and uses a unified text-to-text format, which allows it to be trained on a large, diverse dataset and perform well on a variety of tasks.

T5 has achieved impressive results and has the potential to improve the performance of natural language processing systems in a variety of applications.
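To make the text-to-text idea concrete, here is a minimal sketch of how a flashcard-style question-generation example could be framed for T5. The task prefix and the exact input/output strings are illustrative choices of ours, not something the model prescribes.

# In T5's text-to-text format, both the input and the target are plain strings.
context = "The mitochondria is the organelle that produces most of the cell's ATP."
answer = "mitochondria"

# Input: a task prefix plus the answer and context (the prefix wording is our own choice).
model_input = f"generate question: answer: {answer} context: {context}"

# Target: the text the model should learn to generate.
target = "Which organelle produces most of the cell's ATP?"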

Why use T5 (BERT vs. T5)

Even though BERT is more specifically suited to the task at hand, we decided to use T5 because:

  1. T5 is a larger model than BERT, with significantly more parameters. Although these parameters make it more computationally expensive to train and deploy, they also allow T5 to handle a wider range of tasks and achieve better results on many of them. For the sake of performance, we are willing to give up some training cost efficiency.
  2. Both models use self-attention mechanisms to process input sequences and generate output sequences, and both are pre-trained with a masked language modeling objective. Since T5 is the newer transformer and shows better performance on related benchmarks, we decided it is the more fitting model for our project.

The Implementation

We implemented our model using the Hugging Face Transformers library and then fine-tuned it.
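As a rough sketch of what this looks like in code (the "t5-base" checkpoint and the example pair are assumptions for illustration, not necessarily what we used):

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load a pre-trained T5 checkpoint; "t5-base" is an assumed choice here.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Tokenize one illustrative input/target pair.
inputs = tokenizer(
    "generate question: answer: mitochondria context: The mitochondria is the organelle that produces most of the cell's ATP.",
    return_tensors="pt", truncation=True)
labels = tokenizer("Which organelle produces most of the cell's ATP?",
                   return_tensors="pt", truncation=True).input_ids

# A forward pass with labels returns the training loss.
outputs = model(**inputs, labels=labels)
print(outputs.loss)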

Our training arguments are as follows.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./models",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # effective batch size of 4 x 16 = 64 per device
    learning_rate=1e-4,
    num_train_epochs=7,
    logging_steps=100,
    run_name="flash-cards",
    evaluation_strategy="steps",
    save_steps=500,
    report_to="wandb",  # log metrics to Weights & Biases
    push_to_hub=True,  # push checkpoints to the Hugging Face Hub
    push_to_hub_model_id="flash-cards",
)
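For context, these arguments are handed to a Trainer roughly as in the sketch below; train_dataset and eval_dataset stand in for our tokenized splits, so treat the exact wiring as an assumption rather than our exact code.

from transformers import Trainer, DataCollatorForSeq2Seq

# model and tokenizer are the T5 objects loaded earlier;
# train_dataset / eval_dataset are placeholders for our tokenized splits.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),  # pads inputs and labels per batch
)

trainer.train()  # fine-tune, evaluating and checkpointing at the configured steps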

Work Plan

This week we implemented T5 and fine-tuned it.

Next week, we will explain our evaluation metrics. See you.

İlkim İclal Aydoğan

Görkem Kola
