A practical use of an inverted Index

Tinus Wagner
Mar 29, 2018 · 4 min read

It’s amazing how much development you can do without ever really touching computer science topics. But our job as engineers (generally speaking), is to break complexity down into simple, manageable chunks. learning is no different. so I’ve taken it upon myself to at least peek into topics when I cross paths with them.

Recently I’ve been working with ElasticSearch quite a bit and come across inverted indexes.

Warning!

I’m absolutely amazed by the level of detail ElasticSearch’s documentation goes into on the topic, and I won’t be doing nearly as good a job of explaining the concept. Rather my approach is to skim the surface, and find a practical use for a concept that grabs my fancy. Definitely dig into the documentation for full-text search and the use of inverted indexes.

Where to start

We’ll the full-text search feature of ElasticSearch is summarised as consisting of two aspects:

Relevance
The ability to rank results by how relevant they are to the given query (…)

Analysis
The process of converting a block of text into distinct, normalized tokens (see
Analysis and Analyzers) in order to (a) create an inverted index and (b) query the inverted index.

Inverted Indexing — What is it?

An inverted index is a method of breaking down documents/data/text into a list of unique items(a process sometimes referred to as tokenization), and for each respective item, creating a reference to its source. For example taking the sentences below we could craft an index list:

This sentence is our example text. it is mostly unique

An individual crafted our exemplary sentence.

Now as you can see if we were to search for the term sentence we’d be able to take these logical steps:

Consider this as the bare minimum example of the concept, just enough to lead you into our example if you know nothing about the topic.

if you want to see how much work goes into making this concept function as well as it does in ElasticSearch, there’s a great series called Elastic Search from the bottom up.

Our Example — Yaaaay Code

Now let’s apply the concept to an example

  • We have some sort of API we can use to make queries.

The API allows us to pass a string as our query and an array as the attributes to look in for the specified string. So let’s say our request is:

  • A collection is returned along with a field that tries to calculate the likelihood of match, for our search term, across all attributes provided.
  • We decide for whatever reason that a match found in one attribute might be more important than another.

For example, articles with the topic of animals will in all likelihood return the most accurate results for a user looking for articles about animals, descriptions fields will in all likelihood return the least accurate.

  • So we decide to rank or weight the priority of our fields, so matches found in more important fields are ranked higher. In the end we create an index list for this:

Let’s do this

Ok so there’s a few things we’ll need to do here, and I’m going to go through them step-by-step, and function by function, then we’ll tie it all together:

  1. Determine attribute where match was found in
  • Map each key from our object to an array using Object.keys
  • Use array.find to search through our object and return the key/attribute where match was found.

2. Take that key/attribute and lookup it’s weighting in our lookup table.

  • Now that we have the key we dive into our lookup_table to find the respective weight.

3. Apply the weighting to our results

4. Sort our matches from highest to lowest

Annnnd All together…

Or on codepen for your pleasure.

A practical use of an inverse index to apply a weighting to a collection of results!

Yell out if you spot any mistakes or inconsistencies!

I’m constantly learning, feel free to be a part of that process :)

Thanks for reading!

Source Material & More to read

Tinus Wagner

Written by

Developer, Producer, Drummer, Synth lover.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade