Recipe: Practical Transformers

Aggregate Intellect
Nov 9, 2020

Creators: Suhas Pai, Nabila Abraham

Objective: Enable users to fine-tune transformers for their own NLP tasks

Audience Level: Intermediate

Main Concept: The Transformer architecture is the main concept you will learn through the following resources

Background Concept: You need to know the attention mechanism in order to learn the main concept

Subsequent Concept: Once you know the main concept, you can learn Longformer, DistilBERT, and Transformer-XL

Resources

You should go through the following resources in the order that is provided:

1. The Illustrated Transformer

Type: Blog post; Theory, Main Concept

Estimated time commitment: 60 mins

Why is this a good resource: This blog introduces an intuitive way to visualize transformers and the Query, Key and Value matrices

How to use this resource: Read the entire blog post and pay attention to the visuals; a small attention sketch follows the link below

Instructor: Jay Alammar

Link: http://jalammar.github.io/illustrated-transformer/
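To make the Query, Key and Value picture from the blog post concrete, here is a minimal sketch of scaled dot-product attention in PyTorch. The function and the toy tensor shapes are illustrative assumptions, not code from the post.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value, mask=None):
    """Compute attention outputs from Query, Key and Value matrices."""
    d_k = query.size(-1)
    # Similarity between each query and every key, scaled by sqrt(d_k)
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention distribution over keys
    return weights @ value               # weighted sum of value vectors

# Toy example: one sequence of 4 tokens with 8-dimensional embeddings
q = k = v = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 4, 8])
```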

2. [Transformer] Attention is All You Need

Type: Video; Theory, Main Concept

Estimated time commitment: 60 mins

Why is this a good resource: This video is a group discussion of transformer networks that should help you resolve the most frequently encountered points of confusion about this topic

How to use this resource: Watch the entire video

Speaker: Joe Palermo

Date Created: Oct 2018

Link: https://ai.science/e/transformer-networks-attention-is-all-you-need--2018-10-22

3. PyTorch Transformers Tutorials

Type: GitHub Repo; Implementation, Main Concept

Estimated time commitment: 90 mins

Why is this a good resource: This is an easy-to-understand code tutorial implementing Transformers

How to use this resource: Run the notebooks; a minimal fine-tuning sketch follows the link below

Language: Python

Creator: Abhi Mishra

Computational Resource Need: Medium

Repo Last Updated: 2020

Link: https://github.com/abhimishra91/transformers-tutorials
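If you want a feel for what the notebooks do before running them, here is a minimal fine-tuning sketch, assuming the Hugging Face transformers library that the repo builds on; the checkpoint name, toy data, and hyperparameters are illustrative, not taken from the repo.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"  # assumed pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labelled data standing in for your own NLP task
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                           # a few illustrative training steps
    outputs = model(**batch, labels=labels)  # forward pass returns the loss
    outputs.loss.backward()                  # backprop through the transformer
    optimizer.step()
    optimizer.zero_grad()
```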

4. BERT: State of the Art NLP Model, Explained

Type: Blog Post; Application, Subsequent Concept

Estimated time commitment: 20 mins

Why is this a good resource: This blog post outlines some of the most important use cases that leverage BERT. It also provides some good estimates of the compute needed to use BERT in real life

How to use this resource: Read the post for the different use cases of BERT; a short usage sketch follows the link below

Author: Rani Horev

Link: https://www.kdnuggets.com/2018/12/bert-sota-nlp-model-explained.html
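As a hint of how little code those use cases require, here is a short sketch that loads a pre-trained BERT through the Hugging Face pipeline API; the task and example sentence are illustrative assumptions, not taken from the post.

```python
from transformers import pipeline

# Load a pre-trained BERT checkpoint for masked-word prediction
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Ask the model to fill in the [MASK] token and show its top guesses
for prediction in fill_mask("Transformers are [MASK] for NLP tasks."):
    print(prediction["token_str"], round(prediction["score"], 3))
```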
