Journey to BERT — Before the BERT Era

Goknur Ercan
Sahibinden Technology
3 min read · Jan 29, 2024

This is the first part of the BERT algorithm article; it continues with Journey to BERT — After the BERT Era.

BERT (Bidirectional Encoder Representations from Transformers) is an open-source natural language processing (NLP) model developed by Google. In order to understand BERT and its capabilities, we need to consider the problem it addresses and the earlier approaches to solving it. Since I work as a software engineer focused on SEO, I will explain it from the perspective of SEO and search engines (such as Google).

Natural Language Processing and Earlier Approaches

Researchers and developers have been trying to understand human language and communication for many years. There are many approaches to capturing the semantic and contextual meaning behind a simple piece of text. For example, Google and other search engines try to understand the user's query and find the web pages relevant to it. Google holds information on billions of web pages and must pick the relevant ones for each query.

The most basic approach is to count the query terms inside a web page. For each term in the query, we count how many times it occurs in the page. If the terms appear more often in a page, we can conclude that the page is more relevant to the query. This idea is called term frequency (TF).
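
As a minimal sketch of this idea (the pages and the query below are made-up examples, not real search data), counting query terms in Python could look like this:

```python
from collections import Counter

def term_frequency_score(query, document):
    """Score a document by counting how often each query term occurs in it."""
    doc_terms = Counter(document.lower().split())
    return sum(doc_terms[term] for term in query.lower().split())

# Toy "web pages" and a user query (illustrative only)
pages = {
    "page1": "the bank raised the interest rate on every savings account",
    "page2": "we walked along the river bank and watched the sunset",
    "page3": "open a new bank account online in minutes with our bank",
}
query = "bank account"

# Rank the pages by the raw term frequency of the query terms
ranking = sorted(pages.items(),
                 key=lambda item: term_frequency_score(query, item[1]),
                 reverse=True)
for name, text in ranking:
    print(name, term_frequency_score(query, text))
```

Here page3 wins simply because "bank" and "account" occur in it most often, which is exactly the behaviour term frequency rewards.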

However, some words appear in most web pages (e.g. "is", "a", "the"), and the user query might include such words. A more elegant approach, TF-IDF (term frequency - inverse document frequency), addresses this problem. The inverse document frequency measures how important each term is for the whole collection (in our case the entire web): a term that occurs in almost every document carries very little weight.
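
A rough sketch of the same idea with IDF weighting, again on made-up documents, could look like this; note how "the" contributes nothing to the score because it appears on every page:

```python
import math
from collections import Counter

# Toy document collection (illustrative only, standing in for the whole web)
docs = {
    "page1": "the cat sat on the mat",
    "page2": "open a bank account online with the bank",
    "page3": "we walked along the river in the evening",
}

def idf(term):
    """Inverse document frequency: the rarer a term is across the collection,
    the higher its weight; a term found in every document gets weight 0."""
    containing = sum(1 for text in docs.values() if term in text.split())
    if containing == 0:
        return 0.0
    return math.log(len(docs) / containing)

def tf_idf_score(query, doc_text):
    """Sum of tf * idf over the query terms for a single document."""
    tf = Counter(doc_text.split())
    return sum(tf[term] * idf(term) for term in query.split())

query = "the bank account"
for name, text in docs.items():
    print(name, round(tf_idf_score(query, text), 3))
# "the" occurs in every page, so only "bank" and "account" matter; page2 wins.
```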

TF and TF-IDF can be considered naïve approaches. Although these methods are still in use, there were several intermediate steps between them and BERT. To keep things simple, we can jump straight to word embeddings.

Word Embeddings

Word embeddings were a ground-breaking methodology for NLP. The most famous word embedding models are word2vec and GloVe. The idea behind word embeddings is to find a representation for each word in a vector space, in other words, to represent each word numerically so that we can perform calculations on it.

Word Embeddings: Explaining Words in Vector Space

If we check the example representations above, we can see that the vector from "man" to "woman" is roughly the same as the vector from "king" to "queen". This allows us to do arithmetic with words. If we have the vector (numerical) representation of "Ankara" and we want to find the capital of Russia, we can apply the following equation:

"Ankara" - "Turkey" + "Russia"

The vector we find after this calculation will be close to the vector for the word "Moscow".
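
As an illustrative sketch, the gensim library can run this exact analogy on publicly available GloVe vectors; "glove-wiki-gigaword-100" is the identifier used by gensim's downloader, and the nearest neighbours may vary slightly depending on which vectors you load:

```python
import gensim.downloader as api

# Download pretrained GloVe vectors on first run (roughly 100+ MB).
vectors = api.load("glove-wiki-gigaword-100")

# vector("ankara") - vector("turkey") + vector("russia") should land near "moscow"
result = vectors.most_similar(positive=["ankara", "russia"],
                              negative=["turkey"], topn=3)
print(result)  # expected to list "moscow" among the closest words
```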

Google and other search engines used this approach to improve search functionality until Transformers became popular.

Transformers

The problem with word embeddings is that the vector representation of a word does not depend on the context of the sentence. For example, the word "bank" has the same embedding in both "bank account" and "river bank", even though its meaning is completely different. Transformers (again introduced by Google) solve this problem by also taking the surrounding context into account. BERT uses Transformers to solve this huge problem for search engines.
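
To see this contextuality in practice, here is a hedged sketch using the Hugging Face transformers library and the bert-base-uncased checkpoint; the two sentences are my own toy examples:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    """Return BERT's contextual embedding for the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

v1 = bank_vector("i opened a bank account yesterday")
v2 = bank_vector("we sat on the river bank and fished")

# With static word embeddings this similarity would be exactly 1.0;
# with BERT the two "bank" vectors differ because their contexts differ.
print(float(torch.cosine_similarity(v1, v2, dim=0)))
```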

References

1- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv. https://arxiv.org/abs/1810.04805

2- Hugging Face: The AI community building the future. https://huggingface.co/

3- Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv. https://arxiv.org/abs/1301.3781

4- Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. Proceedings of EMNLP 2014. https://aclanthology.org/D14-1162/
