Exploring Different Keyword Extractors — Statistical Approaches

Ishan Shrivastava
May 11 · 11 min read

Introduction

Statistical Approaches

TF
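To make the term-frequency idea concrete, here is a minimal sketch. It assumes a simple regex tokenizer and the "count divided by document length" definition of TF; the function name `term_frequencies` is made up for this illustration, and other normalizations (raw counts, log scaling) are just as common.

```python
import re
from collections import Counter

def term_frequencies(text):
    """Return each lowercase token's count divided by the total token count."""
    # Assumption: a crude tokenizer that keeps letters, digits and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: n / total for term, n in counts.items()}

tf = term_frequencies("the cat sat on the mat")
# "the" occurs 2 times out of 6 tokens, so it gets the highest TF here.
```

Note that the most frequent term in this toy sentence is a stopword, which is the usual weakness of ranking keywords by TF alone.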

TF-IDF
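A small sketch of TF-IDF over a toy corpus may help here. It assumes the plain textbook variant, TF as count over document length and IDF as ln(N / df), with no smoothing; libraries usually apply smoothing and normalization on top of this, so their numbers will differ. The name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (n / len(doc)) * math.log(n_docs / df[term])
            for term, n in counts.items()
        })
    return scores

docs = [["apple", "banana", "apple"], ["banana", "cherry"]]
scores = tf_idf(docs)
# "banana" appears in every document, so its IDF (and its score) is zero.
```

The point of the IDF factor is visible even at this scale: a term shared by all documents is pushed to zero, while terms specific to one document keep a positive score.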

YAKE (Yet Another Keyword Extractor)

1. Text Pre-processing and candidate term identification

2. Feature Extraction

2.1 TCase (Casing)
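The casing feature can be sketched as below. This assumes one commonly cited formulation, max(TF(U(t)), TF(A(t))) / (1 + ln(TF(t))), where U counts occurrences starting with a capital letter and A counts acronym occurrences; descriptions of YAKE differ slightly on the denominator, and the function and parameter names are made up for this illustration.

```python
import math

def t_case(tf_upper, tf_acronym, tf_total):
    """Casing feature under the assumed formulation.

    tf_upper:   occurrences starting with a capital letter
    tf_acronym: occurrences written entirely in capitals
    tf_total:   all occurrences of the term (must be >= 1)
    """
    return max(tf_upper, tf_acronym) / (1.0 + math.log(tf_total))

never_capitalized = t_case(0, 0, 5)
always_capitalized = t_case(1, 0, 1)
```

A term that is never capitalized gets 0, and the value grows as a larger share of its occurrences are capitalized or acronyms.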

2.2 TPos (Term Position)
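As a rough illustration of the position feature, the sketch below assumes the form ln(ln(3 + Median(Sen_t))), where Sen_t is the list of sentence positions in which the term occurs. The constant 3 and the 0-based indexing are assumptions (some descriptions use a different constant), and `t_position` is a hypothetical name.

```python
import math
from statistics import median

def t_position(sentence_indices):
    """Position feature: small for terms that tend to occur early in the text.

    sentence_indices: 0-based indices of the sentences containing the term.
    """
    return math.log(math.log(3.0 + median(sentence_indices)))

early = t_position([0, 1, 2])
late = t_position([40, 41, 42])
```

The double logarithm flattens the curve, so the difference between sentence 0 and sentence 5 matters much more than the difference between sentence 40 and sentence 45.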

2.3 TFNorm (Term Frequency Normalization)
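A minimal sketch of the normalization, assuming the form TF(t) / (MeanTF + 1 * sigma), where the mean and standard deviation are taken over the frequencies of the document's terms. Using the population standard deviation and the name `tf_norm` are choices made for this illustration.

```python
from statistics import mean, pstdev

def tf_norm(tf_term, all_tfs):
    """Term frequency divided by (mean + one standard deviation) of all TFs.

    all_tfs: raw frequencies of the terms in the document (non-empty).
    """
    return tf_term / (mean(all_tfs) + pstdev(all_tfs))

uniform = tf_norm(2, [2, 2, 2])
```

Dividing by mean plus one standard deviation keeps very frequent terms from dominating purely because the document is long.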

2.4 TRel (Term Related To Context)

2.5 TSentence (Term Different Sentence)
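This feature can be sketched as the fraction of sentences in which a term appears. The sketch assumes sentences are already tokenized into lists of lowercase terms; `t_sentence` is a hypothetical name.

```python
def t_sentence(sentences, term):
    """Fraction of sentences that contain the term at least once.

    sentences: non-empty list of token lists.
    """
    containing = sum(1 for sentence in sentences if term in sentence)
    return containing / len(sentences)

sentences = [["keyword", "extraction"], ["keyword", "ranking"], ["summary"]]
spread = t_sentence(sentences, "keyword")
```

A term spread across many sentences scores close to 1, while a term confined to a single sentence scores close to 0.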

3. Computing Term Score
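One way to combine the five features into a single term score is sketched below. It assumes the combination S(t) = (TRel * TPos) / (TCase + TFNorm / TRel + TSentence / TRel), under which a lower score means a more important term. The function and argument names are made up for this illustration, and the feature values in the example are arbitrary numbers, not results from a real document.

```python
def term_score(t_case, t_pos, tf_norm, t_rel, t_sentence):
    """Combine the five features; lower values indicate more important terms."""
    return (t_rel * t_pos) / (t_case + tf_norm / t_rel + t_sentence / t_rel)

# Arbitrary feature values, chosen only to show the arithmetic:
# (2.0 * 0.5) / (1.0 + 1.0 / 2.0 + 1.0 / 2.0) = 1.0 / 2.0
score = term_score(t_case=1.0, t_pos=0.5, tf_norm=1.0, t_rel=2.0, t_sentence=1.0)
```

Casing, frequency and sentence spread sit in the denominator, so they pull the score down, while a late position and a high TRel (which, under this reading, signals a stopword-like term that co-occurs with many different neighbours) push it up.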

4. n-gram generation and computing candidate keyword score
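For multi-word candidates, the sketch below assumes the score S(kw) = prod(S(t)) / (TF(kw) * (1 + sum(S(t)))), taken over the term scores of the words in the n-gram. As with single terms, lower is better. `keyword_score` and its parameters are hypothetical names, and `math.prod` needs Python 3.8 or newer.

```python
import math

def keyword_score(term_scores, kw_frequency):
    """Score an n-gram from the scores of its terms and its own frequency.

    term_scores:  S(t) for each term in the candidate keyword
    kw_frequency: number of times the whole n-gram occurs (>= 1)
    """
    return math.prod(term_scores) / (kw_frequency * (1.0 + sum(term_scores)))

# 0.5 * 0.5 = 0.25 on top, 1 * (1 + 1.0) = 2.0 below.
once = keyword_score([0.5, 0.5], kw_frequency=1)
twice = keyword_score([0.5, 0.5], kw_frequency=2)
```

Dividing by the n-gram's own frequency rewards phrases that recur, and the (1 + sum) term offsets the fact that multiplying more term scores below 1 would otherwise favor longer n-grams automatically.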

5. Data deduplication and ranking
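The deduplication step can be sketched as a greedy filter over candidates sorted by score. This version uses `difflib.SequenceMatcher` from the standard library as the string similarity measure and 0.8 as the cutoff; both are assumptions made for illustration (edit-distance based measures are a common alternative), and the candidate scores are invented.

```python
from difflib import SequenceMatcher

def deduplicate(scored_keywords, threshold=0.8, top_n=10):
    """Keep the best-scoring keywords, skipping near-duplicates of kept ones.

    scored_keywords: iterable of (keyword, score) pairs, lower score is better.
    """
    kept = []
    for keyword, score in sorted(scored_keywords, key=lambda pair: pair[1]):
        is_duplicate = any(
            SequenceMatcher(None, keyword, other).ratio() >= threshold
            for other, _ in kept
        )
        if not is_duplicate:
            kept.append((keyword, score))
        if len(kept) == top_n:
            break
    return kept

candidates = [
    ("keyword extractions", 0.03),
    ("neural network", 0.05),
    ("keyword extraction", 0.02),
]
ranked = deduplicate(candidates)
```

Because the candidates are visited in ascending score order, the better-scoring variant of a near-duplicate pair is always the one that survives.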

Feature Importance

Summary


gumgum-tech

Thoughts from the GumGum tech team
