What is the inverted index in Elasticsearch?

Sujatha Mudadla
2 min read · Nov 26, 2023

In Elasticsearch, the inverted index is a core component that enables efficient and fast full-text search.

  1. Tokenization: When you index a document in Elasticsearch, each text field is processed by an analyzer. The analyzer's tokenizer breaks the text into individual units called tokens. These tokens are usually words, but they can be other units depending on the tokenizer and the language analyzer in use. Token filters then normalize the tokens, for example by lowercasing them. Common words like “a,” “an,” and “the” (known as stop words) can also be removed at this stage, although the default standard analyzer keeps them.
  2. Inverted Index Creation: The inverted index is a mapping structure that stores the relationship between terms (tokens) and the documents they appear in. It “inverts” the normal structure of storing documents with their terms. Instead, it stores a list of documents for each term.
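
The tokenization step described above can be sketched in a few lines of Python. This is a simplified stand-in for an Elasticsearch analyzer, not its actual implementation: the `analyze` function, the regular expression, and the `STOP_WORDS` set are all assumptions made for illustration.

```python
import re

# Hypothetical stop word list for illustration; real analyzers define their own.
STOP_WORDS = {"a", "an", "the"}


def analyze(text, remove_stop_words=False):
    """Lowercase the text, split it into word tokens, optionally drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


print(analyze("The quick brown fox"))
# ['the', 'quick', 'brown', 'fox']
print(analyze("The quick brown fox", remove_stop_words=True))
# ['quick', 'brown', 'fox']
```

Stop word removal is a flag here because it is a configuration choice rather than a fixed part of tokenization.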

For example, if you have two documents:

Document 1: “The quick brown fox”
Document 2: “A quick brown dog”

With the default standard analyzer, which lowercases every token and keeps stop words, the inverted index might look like:

Term  | Documents
----- | ---------
the   | 1
quick | 1, 2
brown | 1, 2
fox   | 1
a     | 2
dog   | 2
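
A minimal sketch of how such a mapping could be built for the two example documents is shown below. It assumes a toy `analyze` helper that lowercases and splits on word characters (as the standard analyzer lowercases terms), and a hypothetical `build_inverted_index` function; Elasticsearch's on-disk structures in Lucene are far more compact than a Python dictionary.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {1: "The quick brown fox", 2: "A quick brown dog"}
index = build_inverted_index(docs)

for term, doc_ids in index.items():
    print(f"{term} | {', '.join(map(str, doc_ids))}")
# the | 1
# quick | 1, 2
# brown | 1, 2
# fox | 1
# a | 2
# dog | 2
```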

This way, when you search for a term, Elasticsearch can quickly look up the term in the inverted index and retrieve the list of documents that contain it.

  3. Term Frequency and Document Frequency: The inverted index can also store additional information, such as the term frequency (how often a term appears in a document) and the document frequency (how many documents contain a specific term). Both values feed into relevance scoring, including Elasticsearch's default BM25 similarity.
  4. Posting Lists: The lists of documents for each term are often referred to as posting lists. These lists contain the document IDs or pointers to the documents that contain the respective term.
  5. Query Processing: When you perform a search query, Elasticsearch analyzes the query text, looks up the resulting terms in the inverted index, and retrieves the relevant posting lists. For a query with several terms, the lists are then combined: their intersection gives the documents that contain every term (AND), while their union gives the documents that contain at least one term (OR, the default for a match query).
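
The frequency, posting list, and query processing points above can be sketched together. This is an illustrative model only: `build_postings`, `document_frequency`, and `search` are hypothetical names, the postings are plain dictionaries of `{doc_id: term_frequency}`, and no relevance scoring is performed.

```python
import re
from collections import Counter, defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_postings(docs):
    """Map each term to a posting list of {doc_id: term_frequency}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(analyze(text)).items():
            postings[term][doc_id] = tf
    return postings


def document_frequency(postings, term):
    """Number of documents whose posting list entry exists for the term."""
    return len(postings.get(term, {}))


def search(postings, query, operator="or"):
    """Return the sorted IDs of documents matching the analyzed query terms."""
    doc_sets = [set(postings.get(term, {})) for term in analyze(query)]
    if not doc_sets:
        return []
    if operator == "and":
        matches = set.intersection(*doc_sets)  # documents containing every term
    else:
        matches = set.union(*doc_sets)  # documents containing at least one term
    return sorted(matches)


docs = {1: "The quick brown fox", 2: "A quick brown dog"}
postings = build_postings(docs)

print(document_frequency(postings, "quick"))  # 2
print(postings["fox"])                        # {1: 1}
print(search(postings, "quick fox", "and"))   # [1]
print(search(postings, "fox dog", "or"))      # [1, 2]
print(search(postings, "fox dog", "and"))     # []
```

Whether the lists are intersected or merged depends on the query: requiring every term corresponds to an AND, while matching any term corresponds to an OR.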

The use of an inverted index allows Elasticsearch to perform complex search operations quickly and efficiently, making it a powerful tool for handling large volumes of textual data and providing relevant search results in near real-time.

Sujatha Mudadla

M.Tech (Computer Science), B.Tech (Computer Science). I scored in the 96th percentile in GATE Computer Science. Mobile Developer and Data Scientist.