TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

What is Query, Key, and Value (QKV) in the Transformer Architecture and Why Are They Used?

An analysis of the intuition behind the notions of Key, Query, and Value in the Transformer architecture and why they are used.

Ebrahim Pichka
Published in TDS Archive
10 min read · Oct 5, 2023

Image by author — generated by Midjourney

TL;DR

QKV is used to mimic a search procedure that finds the pairwise similarity measures between tokens. These similarity measures then act as weights in a weighted average of the tokens’ meanings, which makes each token contextualized.

Query acts as the part of a token’s meaning that is compared against the other tokens. Key acts as the part of all other tokens’ meanings that the Query is compared against. This comparison is done by taking the dot product of their vectors, which gives the pairwise similarity measures; these are then turned into pairwise weights by normalizing (i.e. Softmax). Value is the part of a token’s meaning that is combined in the end using the resulting weights.
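To make the steps above concrete, here is a minimal sketch in plain Python. The toy Q, K, and V numbers are made up for illustration, and the function names are my own. The division by the square root of the key dimension is not mentioned above; it is the scaling used in the standard scaled dot-product attention formulation, added here as an assumption.

```python
import math

def softmax(scores):
    """Normalize a list of scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Return one contextualized vector per query, plus the weights used."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Pairwise similarity between this query and every key.
        # The sqrt(d_k) scaling is an assumption taken from the
        # standard formulation, not from the text above.
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        # Softmax turns similarities into weights.
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy Q, K, V for three tokens (illustrative numbers only).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

contextualized, weights = attention(Q, K, V)
```

Each row of `weights` sums to 1, so each vector in `contextualized` is a blend of the rows of `V`, dominated by the tokens whose Key is most similar to that token’s Query.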

Lastly, you might have noticed me describing each of QKV as “part of a token’s meaning”. By that, I mean each of Q, K, and V is obtained by a linear projection of a token’s initial embedding. Hence, each can be seen as a different part of the meaning extracted from the embedding. And by employing multi-head attention, we allow multiple different
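As a rough sketch of how Q, K, and V might be obtained from an embedding, the snippet below projects each token’s embedding with three separate weight matrices. Everything here is hypothetical: the dimensions, the embedding values, and the names `W_q`, `W_k`, and `W_v`. The weights are random placeholders, whereas in a real model they would be learned during training. Using one set of matrices per head follows the standard multi-head setup, which is an assumption on my part.

```python
import random

def project(x, W):
    """Linearly project vector x (length d_model) with W (d_model x d_head)."""
    return [
        sum(x[i] * W[i][j] for i in range(len(x)))
        for j in range(len(W[0]))
    ]

def random_matrix(rows, cols, rng):
    """Placeholder weights; a trained model would learn these values."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
d_model, d_head, num_heads = 4, 2, 2  # hypothetical sizes

# Hypothetical initial embeddings for three tokens.
embeddings = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.1, 0.0, 0.2],
    [0.3, 0.3, 0.9, 0.1],
]

# One (W_q, W_k, W_v) triple per head, so each head can extract
# a different part of the embedding.
heads = [
    {name: random_matrix(d_model, d_head, rng) for name in ("W_q", "W_k", "W_v")}
    for _ in range(num_heads)
]

projected = []
for head in heads:
    projected.append({
        "Q": [project(x, head["W_q"]) for x in embeddings],
        "K": [project(x, head["W_k"]) for x in embeddings],
        "V": [project(x, head["W_v"]) for x in embeddings],
    })
```

Each entry of `projected` holds the Q, K, and V vectors for one head. Because every head has its own matrices, the same embedding yields different Query, Key, and Value vectors in each head.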


Written by Ebrahim Pichka

Engineering Graduate Student & Research Assistant, interested in ML, and Optimization. https://epichka.com/