🌳📖💻#4:😶 — POS-Deletion

Unconditionally Universal Speeches

Martin Breuss
Dec 21, 2016 · 8 min read

Look at the thing, check out the code, read below if you’re interested : )

inspiring JFK-essentials

People speak a lot.

Political speeches, for example, tend to be lengthy (but honestly, everyone’s speeches are).

So today, playing with the linguistic concept of Language Universals, I wrote some code that weeds through speeches, taking out everything except nouns and verbs.
Reading the speech afterwards makes for a maybe pensive, maybe revealing, but most probably just ten-seconds-of-fun digest of some past US president's mumblings.

Hope you’ll enjoy : )

What are Language Universals

Linguistics defines two types of Language Universals for natural human languages: unconditional ones and conditional ones.

And actually it seems their difference is smartly explained in the derivation and semantics of the two words. (Oh, those linguists… 😉 )

While conditional Language Universals only hold if some condition is met (e.g. "if a language has inflection, it usually also has derivation"), unconditional Language Universals are true without any further prerequisites.

In my code I will focus on one of the unconditional LUs, namely:

Every language has nouns and verbs.

Ok. Easy. So, let's think this through…

DISCLAIMER: I’m just self-studying (and having fun), so THE REST OF THESE MUSINGS FROM HERE DOWNWARDS ARE NOTHING BUT SELF-MADE MIND-GAMES FOR FUN AND PROGRAMMING PRACTICE. Just keep that in mind. Also, if you have a comment: feedback and corrections are very welcome!

For today's project I assumed that nouns and verbs are the essential parts of speech (POS) in every language, since they are common to all of them.

Next I wanted to see what would happen to my understanding of a text when stripping away all those "non-essential" POS.

Ready for some process talk? Here we go!

Getting to know NLTK

NLTK comes with plenty of pre-loaded corpora, and after watching an introduction I got to work on the presidential speeches.

I aimed to practice two basic concepts of NLP:

  • Tokenization and
  • Part-of-speech (POS) tagging

NLTK has great wrappers for both, so each can be done in just a few lines of code (check out tokenize_text() and tag_POS()).

After segmenting the text using word and sentence tokenizers, I attached the POS information to each word. And that’s where the serious preprocessing ends and the fields open up for applying stupid ideas. 😁

I went ahead and substituted all non-nouns/verbs with inconspicuous dots.
. <- yep, like that one.
Nouns and verbs were allowed to stay.
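A minimal sketch of that substitution step, assuming the tagger emits Penn Treebank tags (the default tag set of nltk.pos_tag), where every noun tag starts with "NN" and every verb tag with "VB". The names keep_nouns_and_verbs, ESSENTIAL_PREFIXES and PLACEHOLDER are hypothetical.

```python
from typing import List, Tuple

# Penn Treebank noun tags (NN, NNS, NNP, NNPS) all start with "NN";
# verb tags (VB, VBD, VBG, VBN, VBP, VBZ) all start with "VB".
ESSENTIAL_PREFIXES = ("NN", "VB")
PLACEHOLDER = "."


def keep_nouns_and_verbs(tagged_sentence: List[Tuple[str, str]]) -> List[str]:
    """Keep nouns and verbs, replace every other token with a dot."""
    return [
        word if tag.startswith(ESSENTIAL_PREFIXES) else PLACEHOLDER
        for word, tag in tagged_sentence
    ]


if __name__ == "__main__":
    tagged = [("The", "DT"), ("economy", "NN"), ("grows", "VBZ"),
              ("quickly", "RB"), (".", ".")]
    print(keep_nouns_and_verbs(tagged))  # ['.', 'economy', 'grows', '.', '.']
```

With this prefix rule, modal verbs such as "can" or "will" (tagged MD) count as non-essential and turn into dots too; whether that matches the original filter is an open assumption.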

Getting to know presidents

Then I stitched the speech back together. Here's what the beginning of JFK's 1962 speech looks like in its essence:

PRESIDENT JOHN F. KENNEDY . ANNUAL ADDRESS TO A JOINT SESSION OF CONGRESS ON THE STATE . THE UNION . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
. Mister Sam . Rayburn . . .
. . House . . . . . . . . .
. . . Congress . . Constitution . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . State . . Union . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . North . . South . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . ECONOMY . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . Mr. Khrushchev . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . Congress . . . First . . Manpower Training . Development Act . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Second . . Youth Employment Opportunities Act . . . . . . . . . . . . Americans . . . . . . . . . . . . . . . . Americans . . . . . . . . . Third . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .
. . . First . Presidential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Second . Presidential . . . . . . . . . . . . . . . Federal . . . . . . . Third . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Congress . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . World War II .
. . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . Government . . . . . . . . . . . . . . . . . . . . Federal Pay Reform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Federal Budget .
. . . . . . . . . . . . . . . . . . . . . First . . . . . . . . . . . . . Secondly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Third . . . . . . . . . . . . . . . . . . . . . . . . . . .
GETTING AMERICA MOVING . . . . . . . . . . . Budget .
. . . . . . . . . . . . . . .
. A . America . . . . . America . . . . . America . . .
. . . . . . . . . . . . . . . . .
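Stitching the filtered tokens back into readable text can be as simple as joining them with spaces. This sketch assumes one output line per sentence; the grouping behind the excerpt above may well differ (e.g. per paragraph), and stitch() is a hypothetical name.

```python
from typing import List


def stitch(sentences: List[List[str]]) -> str:
    """Join each sentence's tokens with spaces and put every sentence on its own line."""
    return "\n".join(" ".join(tokens) for tokens in sentences)


if __name__ == "__main__":
    dotted = [["PRESIDENT", "JOHN", "F.", "KENNEDY", "."], [".", "Mister", "Sam", "."]]
    print(stitch(dotted))
```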

I actually found it surprisingly interesting to look at different presidents' speeches after running them through this cloze deletion pipeline.

Quite a lot of meaning survives after erasing everything but nouns and verbs. Even in the example above, where only nouns are left over, you can still make out the general topic and mood of the speech.

Check out the “full” essential version live on AWS.

It's also an example of what the auto-generated pages from my code look like. Nice parchment, eh?
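Pages like that could be generated along these lines. This is a hypothetical sketch: the HTML template, the colors standing in for the parchment look, and the names render_page() and write_page() are all assumptions, since the post doesn't show how its pages are built. It escapes the text with html.escape and puts each non-empty line of the dotted speech into its own paragraph.

```python
import html
from pathlib import Path

# Hypothetical template; the colors only approximate a "parchment" look.
# CSS braces are doubled so str.format leaves them alone.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{title}</title>
<style>
  body {{ background: #f4e4bc; color: #3b2f1e; font-family: Georgia, serif;
         max-width: 50em; margin: 2em auto; line-height: 1.6; }}
</style>
</head>
<body>
<h1>{title}</h1>
{paragraphs}
</body>
</html>
"""


def render_page(title: str, dotted_text: str) -> str:
    """Wrap each non-empty line of the dotted speech in a <p>, escaping HTML characters."""
    paragraphs = "\n".join(
        f"<p>{html.escape(line)}</p>"
        for line in dotted_text.splitlines()
        if line.strip()
    )
    return PAGE_TEMPLATE.format(title=html.escape(title), paragraphs=paragraphs)


def write_page(path: Path, title: str, dotted_text: str) -> None:
    """Render the page and save it as a UTF-8 HTML file."""
    path.write_text(render_page(title, dotted_text), encoding="utf-8")
```

Usage: `write_page(Path("kennedy-1962.html"), "JFK 1962", dotted_speech)`, where the file name and `dotted_speech` are placeholders for your own output.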

Of course, missing words open up a hallway of doors to misinterpretation, so better not use this to claim anything substantial. Then again, so does an abundance of words. So I guess we're just looping back to the fact that languages are messy. 😉


That’s it for today. If you feel like pushing some speeches to go for a run and have them return dotted and exhausted, please go ahead and fetch the code.

Let me know if you find something fun or exciting after filtering for POS!
