Understanding Text Chunking and Overlapping with an Example

NoCode AI
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨
4 min read · May 25, 2023


In Natural Language Processing (NLP), it’s common to break large texts into smaller pieces, or “chunks,” for easier processing. Adjacent chunks are often made to overlap so that context spanning a chunk boundary isn’t lost. This article explains both concepts, using Steve Jobs’ famous speech introducing the iPhone as an example.

Photo by Patrick Tomasso on Unsplash

What is Chunking and Overlapping?

Chunking is the process of dividing a text into smaller pieces, usually to make it more manageable for computation. The “chunk size” refers to how many characters are included in each chunk.
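As a minimal sketch of the idea, the snippet below splits a string into consecutive, non-overlapping chunks of a fixed character count. The function name `chunk_text` and the sample string are illustrative assumptions, not part of any particular library.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters.

    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text chunk_size characters at a time;
    # the final chunk may be shorter than chunk_size.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = chunk_text("abcdefghij", chunk_size=4)
print(chunks)  # ['abcd', 'efgh', 'ij']
```

Note that the last chunk is shorter because the text length is not a multiple of the chunk size.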

Overlapping is the practice of letting adjacent chunks share some text: the end of one chunk is repeated at the start of the next. The “chunk overlap” is the number of characters that two adjacent chunks have in common.
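One simple way to picture overlap is a sliding window: each new chunk starts `chunk_size - chunk_overlap` characters after the previous one. The sketch below assumes purely character-based splitting; the function name `chunk_with_overlap` and the sample string are hypothetical, and real text splitters may also take separators or word boundaries into account.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters, where
    adjacent chunks share chunk_overlap characters.

    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of repeated characters is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

In the output, the last two characters of each chunk reappear as the first two characters of the next one, which is exactly the overlap of 2.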

Let’s see how this works using an example text.

The Example Text

Our text is an excerpt from Steve Jobs’ 2007 iPhone launch speech:

“This is a day I’ve been looking forward to for two-and-a-half years. Every once in a while, a revolutionary product comes along that changes everything. And Apple has been — well, first of all, one’s very fortunate if you get to work on just one of these in your career. Apple’s
