Understanding Calculation of TF-IDF by Example

Jerry An
Published in Analytics Vidhya, Mar 17, 2020


Photo by ThisisEngineering RAEng on Unsplash

TF-IDF (term frequency-inverse document frequency) is a statistical measure that evaluates how relevant a word is to a document in a collection of documents.

It plays an important role in information retrieval and text mining.

A survey conducted in 2015 found that 83% of text-based recommender systems in digital libraries use TF-IDF.

However, many people don’t truly understand how it is calculated. Here we use a small example to walk through the calculation step by step.

Formula

TF-IDF formula

Explanation:

  • Term Frequency: the number of times that term t appears in document d
  • Document Frequency: the number of documents in which term t appears
  • n: the total number of documents in the collection
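The formula image is not reproduced here. A common textbook form that is consistent with the three definitions above is shown below; treat it as an assumption, since the original image may use a variant (for example, adding 1 to the IDF term or smoothing the document frequency).

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \log \frac{n}{\mathrm{df}(t)}
```

Under this form, a term that appears in every document has idf(t) = log 1 = 0, so it contributes nothing to any document's score, while rare terms are weighted up.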

Example

Step 1: Prepare two documents

documents = [
    "The quick brown fox jumped over the lazy dog's back",
    "Now is the time for all good men to come to the aid of their party"
]
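To make the definitions concrete, here is a minimal from-scratch sketch that computes TF-IDF for these two documents. It assumes the plain form tf(t, d) × ln(n / df(t)) with raw counts and a natural log, plus a simple lowercase tokenizer; `tokenize` and `tf_idf` are hypothetical helpers written for illustration, not code from this article. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and L2 normalization by default, so their numbers will differ from this sketch.

```python
import math
import re
from collections import Counter

documents = [
    "The quick brown fox jumped over the lazy dog's back",
    "Now is the time for all good men to come to the aid of their party",
]

def tokenize(text):
    # Assumption: lowercase, then keep runs of letters/apostrophes as terms,
    # so "The" and "the" are one term and "dog's" stays a single token.
    return re.findall(r"[a-z']+", text.lower())

def tf_idf(docs):
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)  # n: number of documents
    # Term frequency: raw count of each term t in each document d
    tfs = [Counter(tokens) for tokens in tokenized]
    # Document frequency: number of documents that contain term t
    df = Counter(term for tf in tfs for term in tf)
    # Assumed IDF variant: natural log of n / df(t), no smoothing
    idf = {term: math.log(n / count) for term, count in df.items()}
    # TF-IDF: tf(t, d) * idf(t) for every term in every document
    return [{term: count * idf[term] for term, count in tf.items()} for tf in tfs]

scores = tf_idf(documents)
for i, doc_scores in enumerate(scores, start=1):
    print(f"Document {i}:")
    # Highest-scoring terms first, ties broken alphabetically
    for term, score in sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"  {term:<6} {score:.4f}")
```

With this variant, "the" appears in both documents, so its IDF is ln(2/2) = 0 and it scores 0 everywhere; "to" appears twice in the second document only, so it scores 2 × ln 2 ≈ 1.386, and every other term scores ln 2 ≈ 0.693.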
