TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Implementing Various NLP Text Representation in Python

7 min read · May 16, 2022


Image by Amador Loureiro on Unsplash

Natural language processing (NLP) is a branch of machine learning that deals with language and semantics. A machine learns the semantics of words through training, just as in typical machine learning. The difficulty is that almost all commonly used machine learning models accept only numeric inputs, so to train a model on text data we have to represent each text as a numeric vector. This article demonstrates some simple numeric text representations and how to implement them in Python.
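To make the idea of "text as a numeric vector" concrete, here is a minimal sketch that counts how often each vocabulary word occurs in a text. The two sample texts and the helper names `build_vocabulary` and `to_count_vector` are made up for illustration; they are not the article's dataset or code, and this is only one of several possible representations.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map every distinct token to a column index, in alphabetical order."""
    tokens = sorted({token for text in texts for token in text.split()})
    return {token: index for index, token in enumerate(tokens)}


def to_count_vector(text, vocabulary):
    """Return a list with one count per vocabulary word."""
    counts = Counter(text.split())
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:  # ignore words outside the vocabulary
            vector[vocabulary[token]] = count
    return vector


# Hypothetical, already-preprocessed texts (not the article's data).
texts = ["good subject good lecturer", "hard subject"]

vocabulary = build_vocabulary(texts)
vectors = [to_count_vector(text, vocabulary) for text in texts]

print(vocabulary)  # {'good': 0, 'hard': 1, 'lecturer': 2, 'subject': 3}
print(vectors)     # [[2, 0, 1, 1], [0, 1, 0, 1]]
```

Each text becomes a fixed-length list of numbers, which is the kind of input a typical machine learning model can accept.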

For this tutorial, we will use a small dataset of reviews of a university subject. I have already preprocessed the data by removing stop words, removing punctuation, and lemmatising. All of the texts are fictitious.

Let’s begin with the most straightforward representation. First, we load the data with pandas:

import pandas as pd
df = pd.read_csv("data.csv")


Written by Gerry Ongko

Data Science and Analytics | Machine Learning | FinTech and Markets | Equity Trustees
