Understanding BERT Transformer: Attention isn’t all you need

A parsing/composition framework for understanding Transformers

Damien Sileo
Feb 26, 2019 · 9 min read

Why BERT matters


A framework for language understanding: parsing/composition

A constituency-based parse tree of the sentence “Bart watched a squirrel with binoculars”
Representations of “white wine” and “white cat” in a two-dimensional semantic space (with color dimensions)
Another constituency-based parse tree of the sentence “Bart watched a squirrel with binoculars”
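
The sentence in these captions is a classic attachment ambiguity: either Bart uses the binoculars, or the squirrel has them. As a rough sketch, the two readings can be written as nested tuples. The exact bracketings and labels below are assumptions made for illustration, and `leaves` and `bracket` are hypothetical helper names.

```python
# Two constituency parses of the same sentence, written as nested tuples:
# (label, child, child, ...), where a leaf is a plain string.
# The bracketings are illustrative assumptions, not the output of a parser.

# Reading 1: the prepositional phrase attaches to the verb phrase
# (Bart uses binoculars to watch).
PARSE_VP = (
    "S",
    ("NP", "Bart"),
    ("VP",
        ("V", "watched"),
        ("NP", ("Det", "a"), ("N", "squirrel")),
        ("PP", ("P", "with"), ("NP", ("N", "binoculars")))),
)

# Reading 2: the prepositional phrase attaches to the noun phrase
# (the squirrel has the binoculars).
PARSE_NP = (
    "S",
    ("NP", "Bart"),
    ("VP",
        ("V", "watched"),
        ("NP",
            ("NP", ("Det", "a"), ("N", "squirrel")),
            ("PP", ("P", "with"), ("NP", ("N", "binoculars"))))),
)


def leaves(tree):
    """Return the words of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def bracket(tree):
    """Render a tree in labelled bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + tree[0] + " " + " ".join(bracket(c) for c in tree[1:]) + "]"


print(bracket(PARSE_VP))
print(bracket(PARSE_NP))
```

Both trees cover exactly the same words in the same order. Only the grouping differs, and the grouping is what decides the meaning.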

How BERT implements parsing/composition

A transformer block, seen as successive parsing and composition steps
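
To make the two steps concrete, here is a minimal NumPy sketch of one block, read as "attention decides what to combine, the feed-forward layer combines it". It is heavily simplified compared to BERT, and the simplifications are assumptions of this sketch: a single head, no layer normalization, no output projection, ReLU instead of GELU, and random weights. The function names and `random_params` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_step(X, Wq, Wk, Wv):
    """'Parsing' step: each token decides how much to take from every token."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights, weights @ V


def feed_forward_step(H, W1, b1, W2, b2):
    """'Composition' step: the same two-layer network applied at each position."""
    return np.maximum(0.0, H @ W1 + b1) @ W2 + b2


def transformer_block(X, p):
    """One simplified block: attention, then feed-forward, each with a residual."""
    weights, attended = attention_step(X, p["Wq"], p["Wk"], p["Wv"])
    H = X + attended
    out = H + feed_forward_step(H, p["W1"], p["b1"], p["W2"], p["b2"])
    return weights, out


def random_params(rng, d_model, d_ff, scale=0.1):
    """Random (untrained) weights, only to make the sketch runnable."""
    return {
        "Wq": rng.normal(scale=scale, size=(d_model, d_model)),
        "Wk": rng.normal(scale=scale, size=(d_model, d_model)),
        "Wv": rng.normal(scale=scale, size=(d_model, d_model)),
        "W1": rng.normal(scale=scale, size=(d_model, d_ff)),
        "b1": np.zeros(d_ff),
        "W2": rng.normal(scale=scale, size=(d_ff, d_model)),
        "b2": np.zeros(d_model),
    }


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))  # 6 tokens, 8-dimensional embeddings
weights, out = transformer_block(X, random_params(rng, d_model=8, d_ff=16))
print(weights.shape, out.shape)
```

The attention weights form a tokens-by-tokens matrix, which is what the attention visualizations in this article display for a given layer and head.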

Attention as a parsing step

Visualization of attention values on layer 0 head #1, for the token “it”.
Visualization of attention values on layer 2 head #1, which seems to pair related tokens
Visualization of attention values on layer 3 head #11, where some tokens seem to attend to specific central words (e.g. have, keep)
Visualization of attention values on layer 5 head #6, where combinations seem to be more focused: (we, have), (if, we), (keep, up), (get, angry)
How several layers of attention can represent tree structures
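
A toy way to see this is to stack a few "hard" attention patterns and track which input tokens end up contributing to each position. The patterns below are hand-written assumptions, not attention values read from BERT, and the 50/50 residual mixing in `mix` is an arbitrary simplification. `hard_attention`, `mix` and `contributions` are hypothetical names.

```python
import numpy as np

TOKENS = ["Bart", "watched", "a", "squirrel", "with", "binoculars"]


def hard_attention(targets_by_token, n):
    """Row i is the attention distribution of token i over all tokens.
    Tokens with no listed target simply attend to themselves."""
    A = np.zeros((n, n))
    for src, targets in targets_by_token.items():
        for t in targets:
            A[src, t] = 1.0 / len(targets)
    for i in range(n):
        if A[i].sum() == 0:
            A[i, i] = 1.0
    return A


def mix(A):
    """Residual connection: keep half of the token, take half from attention."""
    return 0.5 * np.eye(len(A)) + 0.5 * A


def contributions(layers):
    """Entry [i, j]: share of input token j in position i after all layers."""
    total = np.eye(len(layers[0]))
    for A in layers:
        total = mix(A) @ total
    return total


n = len(TOKENS)
# Layer 1: words pair up inside small phrases ("a squirrel", "with binoculars").
layer1 = hard_attention({2: [3], 3: [2], 4: [5], 5: [4]}, n)
# Layer 2: the verb attends to the two phrases it governs.
layer2 = hard_attention({1: [3, 5]}, n)
# Layer 3: the subject and the verb attend to each other.
layer3 = hard_attention({0: [1], 1: [0]}, n)

after_two = contributions([layer1, layer2])
after_three = contributions([layer1, layer2, layer3])
print(dict(zip(TOKENS, after_two[1])))    # what "watched" has absorbed
print(dict(zip(TOKENS, after_three[0])))  # what "Bart" has absorbed
```

After two layers, the position of "watched" mixes in every word of the verb phrase but nothing from "Bart". After the third layer, the position of "Bart" has a share of every word in the sentence. Each layer only links neighbours in the tree, yet the stack covers the whole tree.
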
A more realistic look at attention values in BERT
Coreference resolution occurring in head #0 of layer 6
Each word attends to all other words in the sentence. This might allow a rough contextualization of each word.
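
A small sketch of why such a flat pattern gives only a rough contextualization: when every token gets the same attention score, the output at each position is just the average of all token vectors. The random vectors and the `attend` helper are assumptions made for illustration.

```python
import numpy as np


def attend(scores, values):
    """Softmax over each row of scores, then a weighted sum of the values."""
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights, weights @ values


rng = np.random.default_rng(0)
values = rng.normal(size=(6, 4))  # 6 tokens, 4-dimensional vectors
flat_scores = np.zeros((6, 6))    # every token scores every token equally

weights, contextualized = attend(flat_scores, values)
print(weights[0])                 # uniform attention over the 6 tokens
print(np.allclose(contextualized, values.mean(axis=0)))
```

Every position receives the same sentence-level average, so a head like this carries the overall context but no structure.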

The composition phase

How attention heads could pave the way for specific compositions, such as adjective/noun
Disambiguation as a composition
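
The "white wine" versus "white cat" example can be turned into a toy composition function. The sketch below is a tiny feed-forward network with hand-picked weights (nothing is learned). The input features, the two output dimensions (whiteness, yellowness) and all the numbers are assumptions chosen for illustration, and `encode` and `compose` are hypothetical names.

```python
import numpy as np

# Hypothetical binary input features, one per word.
FEATURES = ["white", "wine", "cat"]


def encode(words):
    return np.array([1.0 if f in words else 0.0 for f in FEATURES])


# First layer: each hidden unit fires only for one specific word pair.
W1 = np.array([[1.0, 1.0],   # "white" feeds both hidden units
               [1.0, 0.0],   # "wine" feeds unit 0
               [0.0, 1.0]])  # "cat" feeds unit 1
b1 = np.array([-1.0, -1.0])  # a unit needs both of its words to be active

# Second layer: map each detected pair to (whiteness, yellowness).
W2 = np.array([[0.3, 0.7],   # white + wine -> pale yellow
               [1.0, 0.0]])  # white + cat  -> actually white


def compose(words):
    hidden = np.maximum(0.0, encode(words) @ W1 + b1)  # ReLU
    return hidden @ W2


print(compose(["white", "wine"]))
print(compose(["white", "cat"]))
```

The same word "white" lands in different places of the color space depending on the noun it is paired with. A purely additive composition could not do that, since "white" would contribute the same vector in both phrases. The nonlinearity is what makes the difference here.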

Wrapping up

References

synapse_dev

Synapse Développement — Chatbots/NLP experts
