Great post, clean code and interesting issue! I’m in love now ^^
Seriously, it’s fascinating that the city didn’t add to the prediction… maybe cities are too balanced as categories, in terms of the rich/poor professionals grouped under them.
NB: I’ve had improved results on a similar set of features, but on a different task, when I added a column of job families… Also, imho, additional bag-of-words counts could be computed on text segments other than those in the original data. Since language is a normative “shared good” of sorts, the counts observed on a larger set of text segments should converge to the relative importance of each word or expression… so they could serve as a fitted lexicon that you add to the model as a second-step transform (as if weighting each word’s overall importance in the language before weighting its role in the task results).
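To make the lexicon idea a bit more concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration only: the function names `fit_lexicon` and `weight_counts`, the toy background sentences, and the choice of a smoothed inverse document frequency as the "overall importance" score (that is just one possible weighting, not necessarily the best one).

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer; a real pipeline would do more.
    return text.lower().split()


def fit_lexicon(background_docs):
    """Learn one weight per word from a larger, task-independent corpus.

    The weight is a smoothed inverse document frequency: words that show up
    in every background document get 1.0, rarer words get more.
    Returns the lexicon plus a default weight for words never seen.
    """
    n_docs = len(background_docs)
    doc_freq = Counter()
    for doc in background_docs:
        doc_freq.update(set(tokenize(doc)))  # count each word once per doc
    lexicon = {
        word: math.log((1 + n_docs) / (1 + df)) + 1
        for word, df in doc_freq.items()
    }
    unseen_weight = math.log(1 + n_docs) + 1  # same formula with df = 0
    return lexicon, unseen_weight


def weight_counts(text, lexicon, unseen_weight):
    """Second-step transform: raw bag-of-words counts times lexicon weights."""
    counts = Counter(tokenize(text))
    return {w: c * lexicon.get(w, unseen_weight) for w, c in counts.items()}


# Hypothetical background corpus, unrelated to the task's own data.
background = [
    "the engineer wrote the code",
    "the nurse joined the hospital",
    "the manager approved the budget",
]
lexicon, unseen_weight = fit_lexicon(background)

# Weighted features for one (made-up) task document.
features = weight_counts("the senior engineer", lexicon, unseen_weight)
```

The resulting `features` dict could then be fed to the model in place of (or next to) the raw counts, so the model only has to learn each word's role in the task and not its general commonness.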
Also, you could use gensim for this kind of work if you want to avoid the overhead of sklearn (gensim itself still depends on numpy and scipy, though).
Cheers, and thank you for this excellent post!