What is the difference between CountVectorizer, HashingVectorizer & TfidfVectorizer?

Crystal X
Published in Geek Culture
6 min read · Aug 20, 2021

In my last several posts I have been discussing sklearn’s functions for natural language processing, or NLP, because these algorithms occupy a niche in machine learning that is not heavily represented in the Kaggle competitions I am able to enter. As a result, I have identified a weakness in this area of machine learning that needs to be remedied. NLP covers various types of programs, such as text classification, question-answering systems, recommendation engines, and chatbots. My most recent post on the subject of NLP can be found here: https://medium.com/geekculture/how-sklearns-countvectorizer-and-tfidftransformer-compares-with-tfidfvectorizer-a42a2d6d15a2

In my most recent post I discussed how sklearn’s TfidfVectorizer performs the same tasks as both CountVectorizer and TfidfTransformer together. In this post I will endeavour to discuss how HashingVectorizer can perform the same tasks as CountVectorizer.
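To make that equivalence concrete, here is a minimal sketch on a small made-up corpus (the three sentences are purely illustrative). It assumes a recent scikit-learn with default settings on every vectorizer; under those defaults, the two-step CountVectorizer plus TfidfTransformer route and the one-step TfidfVectorizer route should produce the same matrix.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# A tiny toy corpus, made up for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]

# Route 1: raw term counts first, then TF-IDF weighting as a second step.
counts = CountVectorizer().fit_transform(corpus)
two_step = TfidfTransformer().fit_transform(counts)

# Route 2: TfidfVectorizer performs both steps inside one object.
one_step = TfidfVectorizer().fit_transform(corpus)

# With default settings on both sides, the two matrices should match.
print(two_step.shape)                                    # (3, 12)
print(np.allclose(two_step.toarray(), one_step.toarray()))  # True
```

Both routes build the same alphabetically ordered vocabulary of 12 terms here, so the columns line up and the comparison is element by element.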

Although HashingVectorizer plays a similar role to CountVectorizer, there are some differences that need to be addressed. HashingVectorizer converts a collection of text documents to a matrix of token occurrences. This text vectorizer implementation uses the hashing trick to find the…
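A rough sketch of how the two differ in practice, again on a made-up toy corpus. The settings `n_features=2**10`, `norm=None` and `alternate_sign=False` are my own assumptions for this comparison: HashingVectorizer's defaults are `n_features=2**20`, `norm='l2'` and `alternate_sign=True`, so without these overrides its cells would not hold plain counts.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

# Toy corpus, made up for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]

# CountVectorizer learns a vocabulary (token -> column index) during fit.
count_vec = CountVectorizer()
X_count = count_vec.fit_transform(corpus)

# HashingVectorizer stores no vocabulary: each token's column comes from a
# hash of the token, so transform() can be called without fitting.
# norm=None and alternate_sign=False (assumed here) keep plain counts.
hash_vec = HashingVectorizer(n_features=2**10, norm=None, alternate_sign=False)
X_hash = hash_vec.transform(corpus)

print(X_count.shape)                     # (3, 12): one column per learned term
print(X_hash.shape)                      # (3, 1024): width fixed up front
print(hasattr(hash_vec, "vocabulary_"))  # False: no column-to-word mapping

# Every token still adds 1 to some column, so per-document totals agree
# even if two tokens happen to collide in the same hashed column.
print(np.array_equal(X_count.sum(axis=1), X_hash.sum(axis=1)))  # True
```

The trade-off this illustrates: CountVectorizer can tell you which word each column represents, while HashingVectorizer keeps memory fixed and needs no fitting pass, at the cost of not being able to map columns back to words and of occasional hash collisions that merge two tokens into one column.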

I have over five decades of experience in the world of work, in fast food, the military, business, non-profits, and the healthcare sector.