Steps to Classify Documents in R

Text Analytics with R

Ibtissam Makdoun
Feb 26, 2022

Classification is another text mining technique. In data science, classification is a branch of supervised machine learning. Its goal is to assign each document or entity to one of a set of predefined classes. A classification algorithm uses the feature variables available in a labelled dataset to build a model that identifies the class given by a target variable in that dataset.

The trained model is then used to predict the class of new data: it estimates the target variable from the feature variables available in the new data.

In classification, we split the data into a training set and a test set. The training data is used to build the model, and the test data is used to measure the model's accuracy.
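A minimal base-R sketch of such a split, assuming the labelled documents are stored in a data frame with one row per document. The helper names split_train_test() and accuracy(), the 70/30 ratio and the toy data are illustrative choices, not the author's code.

```r
# Hypothetical helpers: a reproducible random train/test split and an accuracy metric.
split_train_test <- function(data, train_frac = 0.7, seed = 123) {
  set.seed(seed)                                   # make the random split reproducible
  n <- nrow(data)
  train_idx <- sample(seq_len(n), size = floor(train_frac * n))
  test_idx <- setdiff(seq_len(n), train_idx)       # every row not used for training
  list(train = data[train_idx, , drop = FALSE],
       test  = data[test_idx, , drop = FALSE])
}

# Share of test documents whose predicted class equals the true class
accuracy <- function(predicted, actual) {
  mean(as.character(predicted) == as.character(actual))
}

# Toy labelled data set (invented for illustration)
docs <- data.frame(
  text  = paste("document", 1:10),
  class = rep(c("A", "B"), 5),
  stringsAsFactors = FALSE
)
parts <- split_train_test(docs)
nrow(parts$train)   # 7 rows for training
nrow(parts$test)    # 3 rows for testing
accuracy(c("A", "B", "A"), c("A", "B", "B"))   # 2 of 3 correct
```

Fixing the seed means different models can be compared on exactly the same test documents.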

How can we use classification for text mining?

In text mining, the words in a document become the feature variables. In addition, each document in the training set must be tagged with a class. We use these features and classes to build the model. Most classification algorithms require numeric feature values, which is why we convert the text into numbers using the TF-IDF (term frequency-inverse document frequency) technique. In this tutorial, we will follow a series of steps to build a classifier that sorts courses into three categories using the Naive Bayes algorithm.
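The pipeline described above (words as features, tagged training documents, TF-IDF weighting, Naive Bayes) can be sketched in base R without extra packages. This is a hypothetical illustration, not the tutorial's own code: the course descriptions, the three category names and the helpers tokenize(), term_counts(), tf_idf(), train_nb() and predict_nb() are invented here, and the sketch applies multinomial Naive Bayes with Laplace smoothing directly to TF-IDF weights, which is one common variant.

```r
# Split lower-cased text into words, dropping empty strings left by the split
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words[nchar(words) > 0]
}

# Document-term count matrix over a fixed vocabulary (unknown words are ignored)
term_counts <- function(texts, vocab) {
  m <- matrix(0, nrow = length(texts), ncol = length(vocab),
              dimnames = list(NULL, vocab))
  for (i in seq_along(texts)) {
    m[i, ] <- as.numeric(table(factor(tokenize(texts[i]), levels = vocab)))
  }
  m
}

# TF-IDF: term frequency within each document times inverse document frequency
tf_idf <- function(counts, idf) {
  tf <- counts / pmax(rowSums(counts), 1)   # avoid dividing by zero for empty docs
  sweep(tf, 2, idf, `*`)
}

# Multinomial Naive Bayes on TF-IDF weights with Laplace smoothing alpha
train_nb <- function(X, y, alpha = 1) {
  w <- rowsum(X, y) + alpha                 # classes x terms, classes sorted
  classes <- rownames(w)
  list(classes   = classes,
       log_prior = log(as.vector(table(y)[classes]) / length(y)),
       log_lik   = log(w / rowSums(w)))
}

# Pick the class with the highest log prior + weighted log likelihood
predict_nb <- function(model, X) {
  scores <- X %*% t(model$log_lik)          # documents x classes
  scores <- sweep(scores, 2, model$log_prior, `+`)
  model$classes[max.col(scores, ties.method = "first")]
}

# Toy training set: each course description is tagged with one of three classes
train_text <- c("learn python programming and write code",
                "code functions loops and debugging in java",
                "probability distributions and hypothesis testing",
                "regression models and statistical inference",
                "color typography and layout for visual design",
                "sketching user interfaces and visual layout")
train_class <- c("programming", "programming", "statistics",
                 "statistics", "design", "design")

vocab <- sort(unique(unlist(lapply(train_text, tokenize))))
train_counts <- term_counts(train_text, vocab)
idf <- log(nrow(train_counts) / colSums(train_counts > 0))   # IDF from training docs only
model <- train_nb(tf_idf(train_counts, idf), train_class)

test_text <- c("write code in python",
               "hypothesis testing with regression",
               "visual layout and typography")
pred <- predict_nb(model, tf_idf(term_counts(test_text, vocab), idf))
print(pred)   # expected: "programming" "statistics" "design"
```

The vocabulary and IDF values come from the training documents only and are reused for the test documents, so no information from the test set leaks into the model. A word such as "and", which appears in every training document, gets an IDF of zero and therefore carries no weight.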

