Understanding Calculation of TF-IDF by Example
3 min read · Mar 17, 2020
TF-IDF (term frequency-inverse document frequency) is a statistical measure that evaluates how relevant a word is to a document in a collection of documents.
It plays an important role in information retrieval and text mining.
A survey conducted in 2015 found that 83% of text-based recommender systems in digital libraries use TF-IDF.
However, many people don’t truly understand the basic calculation. Here we will use an example to demonstrate the calculation step by step.
Formula
Explanation:
- Term Frequency: the number of times that term t appears in document d
- Document Frequency: the number of documents in which term t appears
- n: the total number of documents in the collection
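Written with the quantities defined above, a common textbook form of TF-IDF multiplies a term's frequency in a document by its inverse document frequency. The exact IDF variant shown here is an assumption, because implementations differ. For example, scikit-learn's TfidfVectorizer uses ln((1 + n) / (1 + df(t))) + 1 by default, or ln(n / df(t)) + 1 with smooth_idf=False, and then L2-normalizes each document vector.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \log \frac{n}{\mathrm{df}(t)}
```

In this form, a term that appears in every document gets idf = log(n / n) = 0, so its TF-IDF score is 0 no matter how often it occurs.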
Example
Step 1: Prepare two documents
documents = [
"The quick brown fox jumped over the lazy dog's back",
"Now is the time for all good men to come to the aid of their party"
]
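As a rough preview of what the calculation does with these two documents, here is a minimal sketch in plain Python. It assumes the textbook formula tf(t, d) × ln(n / df(t)) with raw counts and no normalization. It also uses a simplistic, hypothetical tokenizer that lowercases the text and keeps runs of letters and apostrophes. The helper names tokenize and tf_idf are illustrative only. They are not taken from the article or from any library.

```python
import math
import re
from collections import Counter

documents = [
    "The quick brown fox jumped over the lazy dog's back",
    "Now is the time for all good men to come to the aid of their party",
]

def tokenize(text):
    """Hypothetical tokenizer: lowercase, then keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * ln(n / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)  # total number of documents
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of each term in this document
        scores.append(
            {term: count * math.log(n / df[term]) for term, count in tf.items()}
        )
    return scores

scores = tf_idf(documents)
```

With these inputs, "the" scores 0 in both documents because it occurs in both (ln(2 / 2) = 0). "fox" appears once, only in the first document, so it scores ln 2 ≈ 0.693. "to" appears twice in the second document, so it scores 2 ln 2.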