My Arty Writer Uncle Was Unfazed by Transformers, Till Yesterday

Ed Springer
ThoughtGym
Mar 26, 2023


A jargon-free conversation about models, transformers, attention, and more

Photo by Pawel Czerwinski on Unsplash

My uncle is a writer and a voracious reader, as writers are.

He has been asking me about the “new tech toy that would make writers redundant”.

“It feels like your Artificial Intelligence is finally coming home to roost”, he said.

Indeed. He had been comfortably numb to Machine Learning and Computer Vision use cases.

Till now.

Models as single-purpose utilities

“To start with, transformers are pre-trained models.

“Let’s say there were only two languages (A and B) spoken in the world. There would be two models: one to translate from A to B, and another to translate from B to A.”

What are models, though?

“Think of a model as an expert in one thing. Its entire life has been devoted to getting continuously better at one goal: given input in a format it understands, it gives you the best possible outcome in a pre-agreed format.”

Ah ok. I get it. All that a “translation model” can do is translate, and get increasingly better at it.

“Almost. All that an A-to-B translation model can do is translate from language A to B, and get increasingly better at it.

“There may be similar models for rewording, simplifying, summarising, or editing.”
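For the technically curious, here is a minimal sketch of what using such a single-purpose, pre-trained model can look like in code. It assumes the Hugging Face transformers library and its publicly available Helsinki-NLP English-to-French and French-to-English models; none of this is part of the conversation above, just one illustration of the idea.

```python
# A minimal sketch, assuming the Hugging Face "transformers" library is installed
# (pip install transformers sentencepiece). The model names are publicly available
# translation models, used here purely as an illustration.
from transformers import pipeline

# A pre-trained, single-purpose model: all it can do is translate A (English) to B (French).
translate_a_to_b = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")

result = translate_a_to_b("Last Sunday, I went to a wedding and had lunch there.")
print(result[0]["translation_text"])

# Going from B back to A needs a different single-purpose model.
translate_b_to_a = pipeline("translation_fr_to_en", model="Helsinki-NLP/opus-mt-fr-en")
print(translate_b_to_a(result[0]["translation_text"])[0]["translation_text"])
```

Notice that no training happens here: the models arrive already trained, and each one does exactly one job.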

Pre-training

What does pre-trained mean?

“It just means that these models have been given a substantial amount of text to read, from which they build their knowledge base. Pre-training helps these models understand structure, semantics, words, word associations, and synonyms.”

Hmm. He nodded admiringly.

“Pre-training also enables the models to respond faster.”

Why are they called “Transformers”?

That explains models and pre-training. But why are they called transformers? Uncle asked.

“Transformers transform text or image information (for now) into other forms of representation. The representation may take the form of restructured text, a description read off an image, a summary, a translation, and so on.”

“There have been ways of transforming text or image information into other representations before, but they were not as effective.”

What do you mean?

Uncle speaks another language (let’s say Language F) that belongs to the Indo-European family. So I started from there.

“Let’s say we had to translate the text below into Language F. How would the word structure look in Language F?”

“Last Sunday, I went to a wedding and had lunch there”.

Uncle thought for a moment, then spelled it out slowly, word for word.

Last Sunday, I a wedding went, after that, I from there lunch ate.

“Do you see the issue? If words are processed strictly one after another, word for word, the different word order gets lost and the result may not be useful or accurate. This is one of the things that constrained language models before Transformers.”

Of course. What is new about Transformers, then? Do they consume multiple words at a time? Uncle asked.

“Transformers use a technique called Self-Attention to process information.”

Self-Attention

Self-attention? Er, what?

“Self-attention is very similar to the attention we pay when we read. How we attend differs depending on whether we are doing a word-for-word translation or a summarisation. With Self-Attention, the input words are not processed one after another; the model weighs different stretches of words against each other, based on the processing objective.

“The model mathematically figures out which parts of a sequence of words are important while processing a particular word in that same sequence.”
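For readers who want to see that sentence in numbers rather than words, here is a toy sketch of the core calculation (often called scaled dot-product attention). The vectors are made up, and the learned projections and multiple attention heads of a real Transformer are left out; it is only meant to show each word weighing every other word in the sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    # Turn raw scores into weights that are positive and sum to 1.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Toy scaled dot-product self-attention.

    x: (sequence_length, d) array, one row per word's vector.
    A real Transformer first projects x into separate query/key/value
    matrices with learned weights; this sketch skips that step.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)        # how relevant each word is to every other word
    weights = softmax(scores, axis=-1)   # attention weights: each row sums to 1
    return weights @ x, weights          # each word becomes a weighted mix of all the words

# Five "words", each represented by a 4-dimensional vector of made-up numbers.
rng = np.random.default_rng(0)
sentence = rng.standard_normal((5, 4))
output, weights = self_attention(sentence)

# Row i shows how much attention word i pays to every word in the sequence,
# including words that come after it; nothing is forced to be strictly sequential.
print(weights.round(2))
```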

I could see my Uncle deep in thought: partly relieved that he had had a successful career as a writer without the threat posed by AI, and partly pondering what he would be, if not a writer.
