A Step-by-Step Guide to Feature Engineering on Textual Data - NLP

Harshmeet Singh Chandhok
Published in Data And Beyond
6 min read · Jan 9, 2023

Photo by Amador Loureiro on Unsplash

“Good features are not born, they are engineered.”

— Kaggle Grandmaster and Data Scientist, Dr. Ben Hamner

⚙️ Feature engineering is the process of selecting and creating the most relevant and useful features to feed into a machine learning model. It is a crucial step in the machine learning workflow that can significantly affect a model's performance, its complexity, and its ability to generalize to new data. By carefully selecting and constructing the input features, it is possible to improve the model's accuracy and reduce the risk of overfitting.

Image by Author

Twitter is one of the major sources of text data. Tweets are a rich source of information for building machine learning models for tasks such as sentiment analysis and topic classification. Before we can train a model on tweets, however, we need to turn the raw text into features. In this blog post, we'll look at the different types of features that can be extracted from tweets and how to extract them in Python.
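As a minimal sketch of what such extraction can look like, the function below turns one raw tweet string into a few simple, hand-crafted numeric features using only Python's standard library. The feature set, the regular expressions (`HASHTAG_RE`, `MENTION_RE`, `URL_RE`) and the function name `extract_tweet_features` are our own illustrative assumptions, not necessarily the features covered in the rest of this post, and the patterns are deliberately simplified rather than a complete model of Twitter's syntax.

```python
import re

# Simplified, illustrative patterns (assumptions, not Twitter's official grammar).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def extract_tweet_features(tweet: str) -> dict:
    """Return a dict of simple numeric features for a single tweet."""
    words = tweet.split()

    # Count URLs first, then blank them out so that a URL fragment
    # such as "page#section" is not mistaken for a hashtag.
    urls = URL_RE.findall(tweet)
    text_without_urls = URL_RE.sub(" ", tweet)

    return {
        "char_count": len(tweet),
        "word_count": len(words),
        # Guard against division by zero for empty tweets.
        "avg_word_length": (
            sum(len(w) for w in words) / len(words) if words else 0.0
        ),
        "hashtag_count": len(HASHTAG_RE.findall(text_without_urls)),
        "mention_count": len(MENTION_RE.findall(text_without_urls)),
        "url_count": len(urls),
        # Words written entirely in capitals (ignoring one-letter words like "I").
        "uppercase_word_count": sum(1 for w in words if w.isupper() and len(w) > 1),
        "exclamation_count": tweet.count("!"),
    }


if __name__ == "__main__":
    # Hypothetical example tweets, made up for illustration.
    tweets = [
        "Loving this #NLP tutorial by @data_fan! Read it: https://example.com/post",
        "Feature engineering is SO underrated",
    ]
    for t in tweets:
        print(extract_tweet_features(t))
```

Because every tweet is mapped to the same fixed set of keys, a list of these dictionaries can be loaded into a table (for example, a pandas DataFrame) and used as model input alongside other text features.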

1. Text
