TTEs vs. Traditional Embeddings: What’s the Difference?

Mastan Pasha Shaik
asurion-product-development
5 min read · Aug 16, 2023

Word embeddings are a machine learning technique for representing words as vectors, which lets computers capture the meaning of words and the relationships between them. There are two main types of word embeddings: TTEs and traditional embeddings.

TTEs (Transformer-based Token Embeddings)

TTEs are trained on a large corpus of text using a transformer neural network architecture. This allows them to capture more complex relationships between words than traditional embeddings. TTEs are also contextualized, which means that they take into account the context in which a word is used. This makes them more accurate for tasks that require understanding the context of a word, such as natural language inference.
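
To make contextualization concrete, here is a minimal sketch (it assumes the Hugging Face transformers package with its TensorFlow classes and the bert-base-uncased checkpoint, neither of which appears elsewhere in this post) showing that a transformer gives the same word different vectors in different sentences:

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModel.from_pretrained("bert-base-uncased")

def token_vector(sentence, word):
    # Return the contextual embedding of `word` within `sentence`
    inputs = tokenizer(sentence, return_tensors="tf")
    outputs = model(**inputs)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].numpy().tolist())
    return outputs.last_hidden_state[0, tokens.index(word)]

v1 = token_vector("the river bank was flooded", "bank")
v2 = token_vector("she deposited cash at the bank", "bank")

# The two vectors differ because the surrounding context differs
similarity = -tf.keras.losses.cosine_similarity(v1, v2, axis=0)
print(float(similarity))  # noticeably below 1.0

A traditional (static) embedding would return the identical vector for "bank" in both sentences.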

Example:

The word “king” can be represented as a TTE vector with 100 dimensions. Each dimension of the vector represents a different aspect of the meaning of the word “king.” For example, one dimension might represent the concept of royalty, another dimension might represent the concept of maleness, and another dimension might represent the concept of authority.

The TTE vector for the word “king” would be close to the TTE vectors for other words that are related to royalty, such as “queen” and “prince.” It would also be close to the TTE vectors for words that are related to maleness, such as “man” and “boy.” And it would be close to the TTE vectors for words that are related to authority, such as “boss” and “leader.”
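
"Close" here means a small angle between the vectors, usually measured with cosine similarity. The toy sketch below uses made-up four-dimensional vectors (real TTE vectors have hundreds of dimensions, and these numbers are purely illustrative) to show how that closeness is computed:

import numpy as np

def cosine(a, b):
    # Cosine similarity: near 1.0 when vectors point the same way, lower when they do not
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical vectors for illustration only, not taken from a trained model
king = np.array([0.9, 0.8, 0.7, 0.1])
queen = np.array([0.9, 0.1, 0.7, 0.8])
pizza = np.array([0.0, -0.5, 0.1, -0.4])

print(cosine(king, queen))  # relatively high: related words sit close together
print(cosine(king, pizza))  # much lower: unrelated words sit far apart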

Here is an example of how to train a TTE model with TensorFlow.js:

import * as tf from "@tensorflow/tfjs-node";
import * as fs from "fs";

// Load the text data (one example per line)
const lines = fs
  .readFileSync("data/text.txt", "utf8")
  .split("\n")
  .filter((line) => line.trim().length > 0);

// Tokenize the text data. TensorFlow.js has no built-in text tokenizer,
// so build a simple word-to-integer-ID vocabulary capped at 10,000 words
// (ID 0 is reserved for padding and out-of-vocabulary words).
const vocab = new Map();
const sequences = lines.map((line) =>
  line
    .toLowerCase()
    .split(/\s+/)
    .map((word) => {
      if (!vocab.has(word) && vocab.size < 9999) vocab.set(word, vocab.size + 1);
      return vocab.get(word) || 0;
    })
);

// Pad every sequence to a fixed length so the data can be batched into tensors
const maxLen = 50;
const padded = sequences.map((seq) =>
  seq.slice(0, maxLen).concat(Array(Math.max(0, maxLen - seq.length)).fill(0))
);
const xs = tf.tensor2d(padded, [padded.length, maxLen]);

// Placeholder labels over 100 classes, for illustration only;
// in practice the labels come from your dataset
const ys = tf.oneHot(tf.tensor1d(lines.map((_, i) => i % 100), "int32"), 100);

// Create the TTE model: embedding -> LSTM -> dense softmax
const model = tf.sequential({
  layers: [
    tf.layers.embedding({ inputDim: 10000, outputDim: 128, inputLength: maxLen }),
    tf.layers.lstm({ units: 128 }),
    tf.layers.dense({ units: 100, activation: "softmax" }),
  ],
});

// Train the TTE model
model.compile({
  optimizer: "adam",
  loss: "categoricalCrossentropy",
  metrics: ["accuracy"],
});
await model.fit(xs, ys, { epochs: 10 });

// Save the TTE model (TensorFlow.js writes a model.json file plus weight files)
await model.save("file://./model");

This code first loads the text data from a file called "data/text.txt". Then it tokenizes the text by building a simple word-to-integer-ID vocabulary (TensorFlow.js has no built-in tokenizer), so each word is converted to a unique integer ID, and it pads every sequence to a fixed length so the data can be turned into tensors.

Next, the code creates the TTE model. For simplicity this example uses an LSTM rather than a full transformer, and the model consists of three layers:

  • An embedding layer that converts the integer IDs to 128-dimensional vectors.
  • An LSTM layer that learns long-term dependencies in the text data.
  • A dense layer that outputs a probability distribution over 100 classes.

The code then trains the TTE model on the prepared tensors for 10 epochs. After the model is trained, it is saved to a folder called "model"; TensorFlow.js writes a model.json file plus weight files rather than a single .h5 file.

Traditional Embeddings

Traditional embeddings are trained on a smaller corpus of text using a simpler neural network architecture, so they cannot capture relationships between words that are as complex as those captured by TTEs. Traditional embeddings are also static, which means that they do not take into account the context in which a word is used. This makes them less accurate for tasks that require understanding the context of a word.
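
A classic example of a traditional embedding is word2vec. Here is a minimal training sketch (it assumes the gensim library and a tiny made-up corpus, neither of which appears elsewhere in this post):

from gensim.models import Word2Vec

# A tiny illustrative corpus: each document is a list of tokens
sentences = [
    ["the", "king", "ruled", "the", "kingdom"],
    ["the", "queen", "ruled", "beside", "the", "king"],
    ["the", "boss", "has", "authority", "over", "the", "team"],
]

# Train static 100-dimensional vectors with a shallow (single hidden layer) network
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=50)

# Every occurrence of "king" now maps to one fixed vector
print(model.wv["king"].shape)          # (100,)
print(model.wv.most_similar("king"))   # nearest neighbors within this tiny corpus

Because the network is shallow and the vectors never change after training, models like this are cheap to train and query, but they cannot adapt a word's vector to the sentence it appears in.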

Traditional Embedding Example:

The word “king” can also be represented as a traditional embedding vector with 100 dimensions. However, the meaning of the word “king” is not captured as richly by the traditional embedding vector as by the TTE vector, because the traditional embedding is typically trained on a smaller corpus of text and does not take into account the context in which the word is used.

The traditional embedding vector for the word “king” would be close to the traditional embedding vectors for words that share its root or appear in similar contexts, such as “kingly” and “kingdom.” It would also be close to the traditional embedding vectors for words that are related to power, such as “authority” and “control.”
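
You can inspect this kind of neighborhood yourself by querying pretrained 100-dimensional GloVe vectors. This sketch assumes gensim's downloader and its bundled "glove-wiki-gigaword-100" vectors, which are not part of this post's stack:

import gensim.downloader as api

# Download (on first use) and load pretrained 100-dimensional GloVe vectors
glove = api.load("glove-wiki-gigaword-100")

# "king" always gets this single static vector, regardless of the sentence around it
print(glove["king"].shape)                  # (100,)

# Its nearest neighbors are other royalty- and power-related words
print(glove.most_similar("king", topn=5))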

Here is an example of how to use pretrained traditional embeddings with TensorFlow in Python:

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the pretrained embeddings from TensorFlow Hub as a Keras layer
embeddings = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder/4",
    input_shape=[], dtype=tf.string, trainable=False,
)

# Create a text classification model
model = tf.keras.Sequential([
    embeddings,
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Tiny placeholder dataset for illustration; real texts and labels come from your data
train_texts = np.array(["great phone", "battery died fast", "love the camera"])
train_labels = tf.keras.utils.to_categorical([0, 1, 0], num_classes=10)

# Train the text classification model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_texts, train_labels, epochs=10)

# Save the text classification model (SavedModel format handles the hub layer)
model.save("model")

This code first loads the pretrained Universal Sentence Encoder from TensorFlow Hub. The encoder maps each input string to a single 512-dimensional vector that represents its meaning.

Next, the code creates a text classification model. The text classification model consists of two layers:

  • An embedding layer that uses the pretrained embeddings to represent the text data.
  • A dense layer that outputs a probability distribution over 10 classes.

The code then trains the text classification model on the example texts for 10 epochs. After the model is trained, it is saved in TensorFlow's SavedModel format to a directory called "model".

This is just a simple example of how to use pretrained traditional embeddings with TensorFlow in Python. There are many other ways to use pretrained embeddings, and the right approach depends on the task you are trying to solve.

Which One is Better?

TTEs are a more powerful approach to word representation than traditional embeddings, but they also require more training data and computational resources. For this reason, traditional embeddings are still often used when data or compute is limited and top accuracy is not critical, such as simple text classification.

In general, TTEs are a good choice for tasks that require understanding the context of a word, such as natural language inference and question answering. Traditional embeddings are a good choice for lighter-weight tasks, such as text classification and sentiment analysis.

As the examples above suggest, the TTE vector for the word “king” is more accurate and informative than the traditional embedding vector, because it is trained on a larger corpus of text and takes into account the context in which the word is used.

Here is a table that summarizes the key differences between TTEs and traditional embeddings:

Feature | TTEs | Traditional embeddings
Architecture | Transformer | Simpler, shallow networks
Context | Contextual (vector depends on the sentence) | Static (one vector per word)
Training corpus | Very large | Smaller
Compute cost | High | Lower
Typical tasks | Natural language inference, question answering | Text classification, sentiment analysis

I hope this blog post has been helpful! Let me know your thoughts in the comments.
