🔥Lit or Arson? Disaster Tweet Classification Part Two: Starting Feature Engineering & Selection

7 min read · Apr 27, 2020

Recap of Part 1

In Part 1 we:

Looked at the Kaggle problem
Imported the data using Pandas
Looked at high-level statistics of our dataset
Checked for class imbalances
Took a close look at the Keyword feature

Plan for Part 2

Now we’ll:

Look at the Location feature
Look at tweet character and token length
Use the Mann-Whitney U test for feature selection

Part 1: Data Exploration

Part 2: Starting Feature Engineering & Selection — You’re here!

Location exploration

Next, we’ll look at the Location feature, much as we did for Keyword in Part 1.

# Value counts
train_df['location'].value_counts(dropna=False)
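
As a rough sketch of where this kind of exploration can go next, the snippet below measures how often location is missing and how the share of disaster tweets varies by location. It runs on a tiny made-up DataFrame rather than the real train_df, and it assumes the label column is called target with 1 meaning a real disaster; toy_df, missing_share and location_summary are names invented for this illustration.

```python
import pandas as pd

# Toy stand-in for train_df; these rows are made up for illustration only.
# Assumption: the label column is named "target" (1 = real disaster).
toy_df = pd.DataFrame({
    "location": ["USA", "USA", None, "London", "USA", None],
    "target": [1, 0, 1, 0, 1, 0],
})

# Share of tweets that have no location at all
missing_share = toy_df["location"].isna().mean()

# Give missing locations an explicit label so they form their own group
location_filled = toy_df["location"].fillna("(missing)")

# Per-location tweet count and disaster rate (mean of the 0/1 target)
location_summary = (
    toy_df.groupby(location_filled)["target"]
    .agg(tweets="count", disaster_rate="mean")
    .sort_values("tweets", ascending=False)
)

print(f"Missing location share: {missing_share:.2f}")
print(location_summary)
```

On the real data, a location with only a handful of tweets will give a noisy disaster rate, so it's worth reading the rate alongside the tweet count.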

Written by Alex Lau

Data scientist, cat foster father, D&D wannabe — California
