NLP Looks Scary Until You Understand Bag of Words

Mohitkumar Mahto — Fri, 15 May 2026 12:34:51 GMT

When I first started learning NLP, one of the easiest text representation techniques I came across was Bag of Words (BoW).

The idea is pretty simple:
Instead of understanding the meaning of a sentence, Bag of Words only focuses on which words are present and how many times they appear.

It completely ignores grammar and word order.

Simple Definition

Bag of Words is a text representation technique where each sentence (or document) is converted into a numerical vector based on the frequency or presence of each vocabulary word in it.

Let’s Understand with an Example

Suppose we have these sentences:

S1 = "Cats are cute"
S2 = "Dogs are cute"
S3 = "Cats and dogs play"

Text Preprocessing

Usually, before applying BoW, we do some preprocessing:

Convert all words to lowercase
Remove stopwords if needed
Apply stemming/lemmatization

After lowercasing and removing stopwords (stemming/lemmatization is skipped here so the words stay recognizable):

S1 = "cats cute"
S2 = "dogs cute"
S3 = "cats dogs play"
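The preprocessing steps above can be sketched in plain Python. The stopword set below is a tiny hand-picked assumption for this example (real projects usually take one from an NLP library), and stemming is left out so the output matches the sentences above.

```python
# Tiny illustrative stopword list (an assumption for this example only)
STOPWORDS = {"are", "and", "is", "the", "a"}

def preprocess(sentence):
    """Lowercase, split on whitespace, and drop stopwords."""
    tokens = sentence.lower().split()
    return [t for t in tokens if t not in STOPWORDS]

sentences = ["Cats are cute", "Dogs are cute", "Cats and dogs play"]
cleaned = [preprocess(s) for s in sentences]
print(cleaned)
# [['cats', 'cute'], ['dogs', 'cute'], ['cats', 'dogs', 'play']]
```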

Create Vocabulary

Now collect all unique words.

["cats", "cute", "dogs", "play"]
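One possible way to build this vocabulary in Python is to gather every token into a set and sort it. Sorting is a choice made here so the column order is reproducible; it is not required by Bag of Words itself.

```python
# Tokenized sentences after preprocessing
cleaned = [["cats", "cute"], ["dogs", "cute"], ["cats", "dogs", "play"]]

# A set removes duplicates; sorting gives a stable column order
vocabulary = sorted({word for tokens in cleaned for word in tokens})
print(vocabulary)  # ['cats', 'cute', 'dogs', 'play']
```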

Create Vectors

Now represent each sentence using the vocabulary.

Sentence  cats  cute  dogs  play
S1        1     1     0     0
S2        0     1     1     0
S3        1     0     1     1

This is called a Binary Bag of Words because values are either 1 or 0.

1 → word is present
0 → word is absent
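A minimal sketch of binary Bag of Words, assuming the sentences are already tokenized. The function name binary_bow is made up for this example.

```python
vocabulary = ["cats", "cute", "dogs", "play"]
cleaned = {
    "S1": ["cats", "cute"],
    "S2": ["dogs", "cute"],
    "S3": ["cats", "dogs", "play"],
}

def binary_bow(tokens, vocabulary):
    """Return 1 for each vocabulary word present in tokens, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

vectors = {name: binary_bow(tokens, vocabulary) for name, tokens in cleaned.items()}
for name, vector in vectors.items():
    print(name, vector)
# S1 [1, 1, 0, 0]
# S2 [0, 1, 1, 0]
# S3 [1, 0, 1, 1]
```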

Frequency Based Bag of Words

Instead of only checking presence, we can also store frequency.

Example:

Sentence = "cat cat dog"

Vocabulary:

["cat", "dog"]

Vector:

[2, 1]

Because:

“cat” appears 2 times
“dog” appears 1 time
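The same idea with counts instead of presence flags could look like this, using collections.Counter from the standard library. The function name count_bow is an assumption for this sketch.

```python
from collections import Counter

def count_bow(tokens, vocabulary):
    """Return how many times each vocabulary word occurs in tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vector = count_bow("cat cat dog".split(), ["cat", "dog"])
print(vector)  # [2, 1]
```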

Advantages of Bag of Words

Simple and easy to understand
Works well for basic text classification tasks
Converts text into fixed-size numerical vectors

Limitations of Bag of Words

1. Ignores Word Order

These two sentences get exactly the same representation, because they contain the same words the same number of times:

"dog bites man"
"man bites dog"

But their meanings are completely different.
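A quick check in Python makes this concrete. This is a small sketch with a helper named count_bow, which is an assumption for the example.

```python
from collections import Counter

def count_bow(tokens, vocabulary):
    """Return how many times each vocabulary word occurs in tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

a = "dog bites man".split()
b = "man bites dog".split()
vocabulary = sorted(set(a) | set(b))  # ['bites', 'dog', 'man']

vec_a = count_bow(a, vocabulary)
vec_b = count_bow(b, vocabulary)
print(vec_a, vec_b, vec_a == vec_b)  # [1, 1, 1] [1, 1, 1] True
```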

2. Sparse Matrix Problem

If the vocabulary becomes huge, every vector gets one entry per vocabulary word, so vectors become very large and are mostly filled with zeros.
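To get a feel for the numbers, here is a rough sketch. The vocabulary size and sentence length are hypothetical values picked for illustration, and sparse_bow is a made-up helper that shows one common workaround: storing only the non-zero entries.

```python
from collections import Counter

# Hypothetical sizes, chosen only to illustrate the scale
vocab_size = 50_000
distinct_words_in_sentence = 20
zero_fraction = 1 - distinct_words_in_sentence / vocab_size
print(f"{zero_fraction:.2%} of the entries are zero")  # 99.96% of the entries are zero

def sparse_bow(tokens, word_to_index):
    """Keep only non-zero counts as {column index: count}."""
    counts = Counter(t for t in tokens if t in word_to_index)
    return {word_to_index[w]: c for w, c in counts.items()}

word_to_index = {"cats": 0, "cute": 1, "dogs": 2, "play": 3}
sparse = sparse_bow(["cats", "cute"], word_to_index)
print(sparse)  # {0: 1, 1: 1}
```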

3. No Semantic Meaning

BoW cannot understand relationships between words.

For example:

“car” and “vehicle” are related
But BoW treats them as completely different words
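One way to see this is to compare the two words with cosine similarity, a common measure of how close two vectors are. The two-word vocabulary below is a toy assumption for this sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy vocabulary: ["car", "vehicle"]
car = [1, 0]      # BoW vector for the text "car"
vehicle = [0, 1]  # BoW vector for the text "vehicle"

similarity = cosine(car, vehicle)
print(similarity)  # 0.0
```

A similarity of 0.0 means the two vectors share nothing, even though the words are closely related.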

4. Out Of Vocabulary (OOV)

If a new word appears during testing that was not present during training, it has no position in the vocabulary, so it is simply dropped from the vector and the model never sees it.
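A small sketch of what happens to an unseen word, reusing the vocabulary from the example earlier. The word "purr" and the helper count_bow are assumptions for illustration.

```python
from collections import Counter

vocabulary = ["cats", "cute", "dogs", "play"]  # built from training sentences

def count_bow(tokens, vocabulary):
    """Return how many times each vocabulary word occurs in tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

test_tokens = ["cats", "purr"]  # "purr" was never seen during training
vector = count_bow(test_tokens, vocabulary)
oov_words = [t for t in test_tokens if t not in vocabulary]

print(vector)     # [1, 0, 0, 0]
print(oov_words)  # ['purr']
```

The vector is identical to the one for the single word "cats", so the information carried by "purr" is lost.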

Final Thoughts

Bag of Words may look basic today, but it was one of the foundational ideas in NLP.

Understanding BoW makes it much easier to learn advanced techniques later like:

TF-IDF
Word2Vec
GloVe
FastText
Embeddings

Sometimes the simplest ideas are the best place to start 🚀

One Hot Encoding

Mohitkumar Mahto — Fri, 15 May 2026 07:03:40 GMT

Simple Definition

One Hot Encoding is a technique used to convert words or categories into binary vectors where only one position is 1 and all other positions are 0.

Let’s Understand with a Simple Example

Suppose we have these sentences:

D1 = “I love pizza”
D2 = “I love burgers”
D3 = “Pizza is tasty”

Step 1: Create Vocabulary

First, collect all unique words from the sentences, treating “Pizza” and “pizza” as the same word.

["I", "love", "pizza", "burgers", "is", "tasty"]

Vocabulary size = 6

Step 2: Assign One-Hot Vectors

Word     Vector
I        [1 0 0 0 0 0]
love     [0 1 0 0 0 0]
pizza    [0 0 1 0 0 0]
burgers  [0 0 0 1 0 0]
is       [0 0 0 0 1 0]
tasty    [0 0 0 0 0 1]

Each word gets its own unique binary vector.
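A minimal sketch of this step in Python. The function name one_hot is made up for this example.

```python
vocabulary = ["I", "love", "pizza", "burgers", "is", "tasty"]

def one_hot(word, vocabulary):
    """Return a vector of zeros with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

table = {word: one_hot(word, vocabulary) for word in vocabulary}
print(table["pizza"])  # [0, 0, 1, 0, 0, 0]
```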

Step 3: Encode Sentences

D1 = “I love pizza”

I      -> [1 0 0 0 0 0]
love   -> [0 1 0 0 0 0]
pizza  -> [0 0 1 0 0 0]

Encoded form:

[
 [1 0 0 0 0 0],
 [0 1 0 0 0 0],
 [0 0 1 0 0 0]
]

D2 = “I love burgers”

[
 [1 0 0 0 0 0],
 [0 1 0 0 0 0],
 [0 0 0 1 0 0]
]

D3 = “Pizza is tasty”

[
 [0 0 1 0 0 0],
 [0 0 0 0 1 0],
 [0 0 0 0 0 1]
]
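Encoding a whole sentence could be sketched as below. The lookup is lowercased so that “Pizza” and “pizza” share one index; that is an assumption made here to match the vocabulary above, which lists “pizza” only once.

```python
vocabulary = ["I", "love", "pizza", "burgers", "is", "tasty"]

# Lowercased lookup: word -> position in the vocabulary
word_to_index = {word.lower(): i for i, word in enumerate(vocabulary)}

def encode_sentence(sentence):
    """Return one one-hot vector per word, in sentence order."""
    encoded = []
    for word in sentence.lower().split():
        vector = [0] * len(vocabulary)
        vector[word_to_index[word]] = 1
        encoded.append(vector)
    return encoded

d3 = encode_sentence("Pizza is tasty")
for row in d3:
    print(row)
# [0, 0, 1, 0, 0, 0]
# [0, 0, 0, 0, 1, 0]
# [0, 0, 0, 0, 0, 1]
```

Note that the result has one row per word, so its size grows with the length of the sentence.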

Limitations of One Hot Encoding

Vectors become very large when vocabulary size increases.
It does not understand word meaning.
“Pizza” and “Burger” are treated as equally different as “Pizza” and “Car”.
No semantic relationship exists between words.
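The "equally different" point can be checked by measuring the distance between vectors. The three-word vocabulary below is a small hypothetical one chosen for this sketch.

```python
import math

vocabulary = ["pizza", "burger", "car"]  # hypothetical mini vocabulary

def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d_food = distance(one_hot("pizza"), one_hot("burger"))
d_other = distance(one_hot("pizza"), one_hot("car"))
print(d_food == d_other)  # True
```

Any two different one-hot vectors differ in exactly two positions, so every pair of words ends up the same distance apart.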

What Came After One Hot Encoding?

To solve these limitations, more advanced techniques were introduced:

Word2Vec
GloVe
FastText
Word Embeddings

These methods learn dense vectors in which words used in similar contexts end up close together, so they capture semantic meaning and relationships between words.
