Recap of Part 1
In Part 1 we:
- Looked at the Kaggle problem
- Imported the data using Pandas
- Looked at high-level statistics of our dataset
- Checked for class imbalances
- Took a close look at the Keyword feature
Plan for Part 2
Now we’ll:
- Look at the Location feature
- Look at tweet character and token length
- Use the Mann-Whitney U test for feature selection
Part 2: Starting Feature Engineering & Selection — You’re here!
Location exploration
Next, we'll look at the Location feature in much the same way as we did for Keyword in Part 1.
# Value counts
train_df['location'].value_counts(dropna=False)
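Beyond raw value counts, a few summary numbers can help show how usable a free-text column like this is. The sketch below is only illustrative: it uses a tiny made-up DataFrame (toy_df, not the Kaggle data), and the whitespace and lowercase normalisation step is an assumption on my part rather than something done in this series.

```python
import pandas as pd

# Toy stand-in for train_df (hypothetical rows, not the Kaggle data)
toy_df = pd.DataFrame({
    "location": ["USA", None, "London", "USA", None, "usa "],
    "target": [1, 0, 1, 0, 1, 1],
})

# Raw counts, keeping missing values as their own bucket
location_counts = toy_df["location"].value_counts(dropna=False)

# Number and share of rows with no location at all
n_missing = toy_df["location"].isna().sum()
missing_share = toy_df["location"].isna().mean()

# Number of distinct non-missing locations, as entered
n_unique = toy_df["location"].nunique()

# Assumed clean-up: trim whitespace and lowercase so "USA" and "usa " collapse
normalised = toy_df["location"].str.strip().str.lower()
n_unique_normalised = normalised.nunique()

print(location_counts)
print(f"Missing: {n_missing} rows ({missing_share:.1%})")
print(f"Unique locations: {n_unique} raw, {n_unique_normalised} after normalising")
```

Swapping toy_df for train_df should give the same summary for the real data, assuming the column is named location as in the snippet above.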