Man Makes Fire — How To Find Important Words in Text

A 30 Day Writing Challenge

Josh Sephton
3 min read · Apr 2, 2017

It’s Day 2 of my 30 Day Writing Challenge in the run-up to starting my new job building a machine learning team. I’m going to have a huge number of spoken and written documents available to me. Today, let’s look at how we can reduce the number of words without losing the meaning.

Much of the English language is redundant. It’s full of niceties to help prose flow together without giving extra detail. These extra words are going to get in the way as we start to use our data to build machine learning models.

Consider the sentence “the man successfully makes a fire”. The same meaning can be conveyed with “man makes fire”. You’ll sound like a Neanderthal if you actually talk like that, but we’ve not lost the gist of the message by picking out only the important words.

At the highest level, we want to find the words in a particular document which are rare across the other documents. This means that we’ll immediately discount words like “the”, “and”, and “I”. We’ll also give less importance to words which appear in every document, that is, words which are common in our domain.
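To make that idea concrete, here is a minimal sketch in Python. It assumes a TF-IDF style score (how often a word appears in one document, scaled down by how many documents contain it), which is one common way to do this and not necessarily the exact technique I go on to describe. The three-sentence corpus and the names `tokenize` and `score_words` are made up for illustration, and the corpus is deliberately chosen so that “successfully” shows up everywhere.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def score_words(docs, doc_index):
    """Score each word in docs[doc_index] with a TF-IDF style weight."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each word?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    target = tokenized[doc_index]
    term_freq = Counter(target)

    # A word found in every document gets log(1) = 0, so it scores zero.
    return {
        word: (count / len(target)) * math.log(n_docs / doc_freq[word])
        for word, count in term_freq.items()
    }


# Hypothetical toy corpus, not real data.
docs = [
    "the man successfully makes a fire",
    "the team successfully ships a product",
    "the cat successfully catches a mouse",
]

scores = score_words(docs, 0)

# Keep only the words that carry some weight, in their original order.
keep = [word for word in tokenize(docs[0]) if scores[word] > 0]
print(" ".join(keep))
```

With this corpus, “the”, “a”, and “successfully” appear in all three documents and score zero, so only “man”, “makes”, and “fire” survive. On real data the cut-off won’t be this clean, and you’d pick a threshold or keep the top few words instead of testing for anything above zero.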

I haven’t invented this technique. I didn’t even find this technique through my own research. I attended Canvas Conference in…

Josh Sephton

Founder of Pritchatts Consulting Ltd., making companies more profitable by making their data work for them.