Making your search not suck with Elasticsearch — Part 6: Totally irrelevant

Jul 19, 2017

This is part 6 of a multi-part series. In this series I will be explaining important concepts in Elasticsearch and using a demo app I built to demonstrate these concepts. You can follow me for updates as they come out over the next few weeks. If you’d like to start at the beginning click here. You can also find a full index of the series at the end of this post.

In my last post I showed how we could use phrase matching to improve our search results. More specifically, we used phrase matching to make Elasticsearch rank documents whose content closely resembles our query phrase above those that just happen to contain the words in our query. In our example we wanted a document containing the exact phrase “star wars” to be ranked over one containing the phrase “star of wars”, and we were able to do so very easily by using a phrase match query instead of a plain old match query. While that did fix our immediate problem, it still didn’t quite get us the results that you and I expect when searching for “star wars” in a database of movies. Today we’re going to talk about why that is.

Be warned: this part is a little longer and gets a little more into the weeds than some of the other parts. It might seem a bit boring and wonkish, but understanding exactly how Elasticsearch calculates relevance is important for solving our problem. It’s also generally important for debugging Elasticsearch results.

So how does Elasticsearch calculate relevance scores?

At a high level, it’s actually pretty simple. Elasticsearch uses three factors to calculate relevance:

How many term matches there are in the document (term frequency)
How common those matched terms are (inverse document frequency)
How long the field is (field-length norm)

It might not be obvious yet why those factors matter for search relevance. That’s okay. We’ll break it down.

First is term frequency, and this is probably the most obvious. Term frequency is just the number of times the query term appears in the document. For example, if you search for “star” and the document contains the word “star” 3 times, then the term frequency for that document is 3. Easy, right?

Second is inverse document frequency. This is a little more complicated but still pretty straightforward. Inverse document frequency basically represents how rare the term is across the entire index: the more documents a term appears in, the lower its inverse document frequency. Why does that matter? Well, the idea is we want to give more weight to terms that are more rare and less weight to terms that are more common. For example, if a document contains the word “hippopotamus” that’s probably more significant than if the document contains the word “and”. Makes sense, right?
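To make that intuition concrete, here’s a minimal Python sketch of one common textbook way to compute inverse document frequency, log(N / df). The function name, the toy index, and the exact formula are my own assumptions for illustration; the formula Elasticsearch actually uses differs in its details.

```python
import math

def inverse_document_frequency(term, documents):
    """Return log(N / df) for a term over a list of tokenized documents."""
    total_docs = len(documents)
    docs_with_term = sum(1 for doc in documents if term in doc)
    if docs_with_term == 0:
        return 0.0  # term never appears, so it cannot contribute to a score
    return math.log(total_docs / docs_with_term)

# Hypothetical toy index: each document is a list of lowercase terms.
toy_index = [
    ["the", "hippopotamus", "and", "the", "bird"],
    ["star", "and", "moon"],
    ["cats", "and", "dogs"],
    ["sun", "and", "rain"],
]

idf_rare = inverse_document_frequency("hippopotamus", toy_index)  # in 1 of 4 docs
idf_common = inverse_document_frequency("and", toy_index)         # in 4 of 4 docs
print(idf_rare, idf_common)
```

In this toy index “and” shows up in every document, so it gets no weight at all, while “hippopotamus” shows up in only one and gets the highest weight.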

Last is field-length norm. It sounds scary but it just represents how long the field is. The idea here is that the longer the field is, the less weight a match in it should carry. For example, if you have a name field and a description field, the name field is probably going to be less than ten words long whereas the description could be hundreds of words long. You probably want to weight the shorter name field higher.

As I said these concepts are actually very simple. To calculate relevance you combine them like so:

relevance = (tf * idf) / norm

If you look at the structure of this formula you can see how we apply these intuitions in mathematical form. If the term frequency (tf) is greater, the relevance will be higher. If the inverse document frequency (idf) is greater, the relevance will be higher. And last, if the field-length norm (norm) is greater, the relevance will be lower.
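Here’s the same simplified formula as a tiny Python sketch, just to show each factor pushing the score in the direction described. The function name and the sample numbers are made up for illustration, and this is the approximation from this post, not Elasticsearch’s real scoring code.

```python
def relevance(tf, idf, norm):
    """Simplified score from the text: (tf * idf) / norm."""
    if norm <= 0:
        raise ValueError("norm must be positive")
    return (tf * idf) / norm

# Hypothetical numbers, chosen only to vary one factor at a time.
baseline = relevance(tf=3, idf=1.0, norm=100)
more_matches = relevance(tf=6, idf=1.0, norm=100)   # higher tf   -> higher score
rarer_term = relevance(tf=3, idf=2.0, norm=100)     # higher idf  -> higher score
longer_field = relevance(tf=3, idf=1.0, norm=200)   # higher norm -> lower score

print(baseline, more_matches, rarer_term, longer_field)
```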

How can we apply this to our search results?

Below I have pasted the full contents of the first search result. This is just for reference. I don’t expect you to read it; the parts that matter are the occurrences of “star” and “wars”.

Saving Star Wars
Though a Star Wars fan as a child, life holds no magic or adventure for Woody Garrison. Divorced and working two jobs to pay medical bills for his terminally ill son, Star Wars is now just a movie. Only at the request of his son does he set off with his childhood buddy, Hank, on a quest to find filmmaker George Lucas and convince him to continue making Star Wars movies. Through a series of mishaps, Woody and Hank accidentally kidnap Lucas and allow the script for Episode III to fall into the hands of an unbalanced fan, a murderous producer, and a certain Dark Lord-portraying actor. To his surprise, Woody finds himself in the middle of an adventure with the fate of the Star Wars movie-making empire hanging in the balance.

Let’s analyze this using the three factors we’ve just learned: term frequency, inverse document frequency, and field-length norm.

First we’ll start by looking at term frequency. We have a total of 10 term matches in the content of our first search result. Loosely speaking we could say the term frequency for this result is 10. Technically we should count for each term separately (5 matches for “star” and 5 matches for “wars”) but we’ll set that aside.

Next we’ll look at the field-length norm. Remember that field-length norm is just the number of terms in the field. Running the content through a word counter I get 135 total words. Again, technically we should count the name and plot summary separately but we’re just doing a high level analysis so we’ll say our field-length norm is 135.
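If you’d rather not count by hand, here’s a rough sketch of the counting in Python. It’s a crude stand-in for what Elasticsearch’s analysis does, not the real thing: the `tokenize` helper, the regex, and the shortened sample text are all my own assumptions for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word-like tokens (a crude analyzer)."""
    return re.findall(r"\w+", text.lower())

# Shortened, hypothetical sample: just the title and first sentence of the result.
sample = (
    "Saving Star Wars. Though a Star Wars fan as a child, "
    "life holds no magic or adventure for Woody Garrison."
)
query_terms = ["star", "wars"]

tokens = tokenize(sample)
counts = Counter(tokens)

per_term = {term: counts[term] for term in query_terms}  # matches per query term
tf = sum(per_term.values())   # the loose "total matches" number used in the text
field_length = len(tokens)    # the loose field-length norm used in the text

print(per_term, tf, field_length)
```

Running the same counting over the full content of a result gives the kind of rough tf and field-length numbers used in this post, though a different tokenizer can shift the word count a little.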

Lastly, there’s inverse document frequency. For simplicity we’ll pretend that the words “star” and “wars” are roughly equally common and therefore have roughly the same idf. That lets us ignore idf’s contribution to the relevance score and just call it 1.

Using the formula above to calculate the relevance score looks like this:

relevance = (tf * idf) / norm
relevance = (10 * 1) / 135
relevance = 0.074

This is obviously a massive approximation but it’ll be good enough for our purposes. Let’s take a look at the second result:

The Making of Star Wars
The special was hosted by C-3PO and R2-D2.{{cite news}} A voiceover narration was additionally supplied by William Conrad. It features behind-the-scenes footage from Star Wars, and interviews with writer/director George Lucas, producer Gary Kurtz, and castmembers Mark Hamill, Carrie Fisher, Harrison Ford, and Alec Guinness.{{cite news}} Premiering four months after the release of the film, the special was the first Star Wars documentary ever made. It is also notable for showing footage not seen in the film, particularly the scene between Han Solo and Jabba the Hutt in its original form in which Jabba was played by Irish actor Declan Mulholland. A brief glimpse of another deleted scene between Luke Skywalker and Biggs Darklighter on Tatooine is also included.

If we apply the same procedure we applied to the first result we get a term frequency of 6, a field-length norm of 125, and again we ignore the inverse document frequency and call it 1. Plug that into our equation and we get:

relevance = (tf * idf) / norm
relevance = (6 * 1) / 125
relevance = 0.048

So using our fuzzy math we see the first result (“Saving Star Wars”) gets a relevance score of 0.074 and the second result (“The Making of Star Wars”) gets a relevance score of 0.048. Since 0.074 is greater than 0.048, “Saving Star Wars” is returned above “The Making of Star Wars”. This is definitely not exactly how Elasticsearch calculates relevance, but it is a good high-level way of understanding why one document is ranked over another. If you want to understand in more detail, or you need to debug a specific query, you can use the Explain API. It’s a little involved so I don’t want to talk about it right now, but maybe I’ll cover it more in a future blog post.

Alright, just to drive the point home let’s take a look at one of the actual movies:

Star Wars Episode VI: Return of the Jedi
Luke Skywalker initiates a plan to rescue Han Solo from the crime lord Jabba the Hutt with the help of Princess Leia, Lando Calrissian, Chewbacca, C-3PO and R2-D2. Leia infiltrates Jabba's palace on Tatooine disguised as a bounty hunter and releases Han from a block of carbonite, but she is captured and enslaved. Luke arrives soon afterward and allows himself to be captured. After Luke survives a battle with the Rancor, Jabba sentences Luke and Han to be executed by the Sarlacc. Luke breaks free and a large battle erupts, during which Leia strangles Jabba to death, Han knocks Boba Fett into the gaping maw of the Sarlacc and Luke destroys Jabba's sail barge. While Han and Leia meet with the other Rebels, Luke returns to Dagobah, only to find that Yoda is dying. With his last breaths, Yoda confirms that Darth Vader is Luke's father; he also mentions "another Skywalker". The spirit of Obi-Wan Kenobi reveals that the "other Skywalker" Yoda spoke of is Luke's twin sister, who Luke discovers is Leia. Obi-Wan then tells Luke that he must confront Vader again to defeat the Empire. The Rebel Alliance learns that the Empire has been constructing a new Death Star, and hatches a plan to destroy it. Han leads a strike team to destroy the battle station's shield generator on the forest moon of Endor, allowing a squadron of starfighters to enter the incomplete superstructure and destroy the station from within. The strike team, with Luke in tow, travels to Endor in an Imperial shuttle; Vader senses Luke's presence on the shuttle, but lets them through so that they will be ambushed by the Imperial forces lying in wait on Endor. Sensing Vader's presence, Luke fears he is endangering the mission. On Endor, Luke and his companions encounter a tribe of Ewoks and form a partnership with them. Later, Luke confesses to Leia that she is his sister, that Vader is their father and that he is leaving to confront him.
Luke surrenders to Imperial troops, so that they will bring him to Vader. He unsuccessfully tries to convince Vader to turn from the dark side of the Force, but Vader takes Luke to the Death Star to meet Emperor Palpatine, his Sith master and leader of the Empire. Luke learns that the Death Star is fully operational and set to destroy the Rebellion. On Endor, the Rebels are captured by Imperial forces, but a surprise counterattack by the Ewoks allows the Rebels to launch an attack. Meanwhile, Lando leads the Rebel fleet in the Millennium Falcon to the Death Star, only to find the station's shield is still up and the Imperial fleet waiting for them. Palpatine tempts Luke to give in to his anger and join the dark side, and Luke and Vader engage in a lightsaber duel. Vader discovers that Luke has a sister, and threatens to turn her to the dark side. Luke snaps and attacks Vader, severing his father's right hand. Palpatine entreats Luke to kill Vader and take his place; Luke steps back from the brink and refuses, declaring himself a Jedi. Palpatine attacks him with Force lightning. Unable to watch his son suffer, Vader turns on Palpatine and throws him down a reactor shaft to his death, dooming himself to die in the process. With his dying breaths, the redeemed Anakin Skywalker asks Luke to remove his mask so he can look on his son, just for once, with his own eyes instead of through the mask, and tell Leia that there was good in him after all.  On Endor, the strike team, with the help of the Ewoks, defeats the Imperial forces and destroys the shield generator, allowing the Rebel fleet to launch a final assault on the Death Star. Lando leads the remaining ships into the station's core and destroys the main reactor. Luke escapes on Palpatine's Imperial shuttle with his father's body before the Death Star explodes, while Lando escapes in the Falcon. 
On Endor, Han tells Leia that he knows she loves Luke and offers to step aside; she tells him that Luke is her brother and kisses him. That evening, Luke returns to Endor and cremates his father's body and armor on a funeral pyre. As the Rebels celebrate the end of the Empire, Luke sees the spirits of Obi-Wan, Yoda and Anakin Skywalker watching over them.

Before we even do our full analysis you’ll probably notice a couple of things just by glancing at it. First, it’s a lot longer, and second, there aren’t that many matches. When you scan over it, the matches look pretty sparse. In fact, sparseness/density of query terms is a good way of eyeballing and thinking about relevance. Let’s apply the same procedure and see what approximate relevance score we get:

relevance = (tf * idf) / norm
relevance = (8 * 1) / 730
relevance = 0.01

So we see the relevance score for one of the actual movies is quite low in relation to our first two results.
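To see all three results side by side, here’s a small Python sketch that plugs the rough numbers from this post into the simplified formula and sorts by score. The term counts and word counts are the hand-counted approximations from above, and idf is fixed at 1 as before; none of this is what Elasticsearch actually runs internally.

```python
# (title, approximate term matches, approximate word count), taken from the post.
results = [
    ("Star Wars Episode VI: Return of the Jedi", 8, 730),
    ("The Making of Star Wars", 6, 125),
    ("Saving Star Wars", 10, 135),
]

IDF = 1.0  # simplification from the post: treat idf as a constant

# Score each result with the simplified formula (tf * idf) / norm.
scored = [(title, (tf * IDF) / norm) for title, tf, norm in results]

# Highest score first, the same order a search engine would return them in.
ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)

for title, score in ranked:
    print(f"{score:.3f}  {title}")
```

Sorted this way, “Saving Star Wars” comes out on top and the actual movie lands at the bottom, which matches what we saw in the search results.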

What does it all mean?

This is great but what story is all this number crunching trying to tell us? Well, the short story is this:

Elasticsearch only knows about the textual relevance of the content. It has no concept of cultural relevance.

It doesn’t care that Star Wars is one of the highest-grossing film series of all time. It doesn’t care whether it’s a bunch of post-it notes, David Bowie lyrics, or the Declaration of Independence. To Elasticsearch it’s all just words, man.

So how do we make our search results better? We use machine learning of course! I know that probably sounds really lofty and intimidating but I promise it’s actually pretty simple. I’ll talk about that in the next installment of this series “Making your search not suck with Elasticsearch — Part 7: Machines that learn”

The “Making your search not suck with Elasticsearch” series:
Part 1: What is an index?
Part 2: Elasticsearch is not magic
Part 3: Analysis Paralysis
Part 4: Overanalyzing it
Part 5: Are we still doing phrasing?
Part 6: Totally irrelevant
Part 7: Machines that learn

Written by Alex Denton