Fake Data Science

I’m stuck at an airport ✈️ and also learning NLP, so, just for the fun of it, I made this today while reading about NLTK: The Flight Oracle.

Two takeaways:

1. These graphs are interesting

They are extremely easy to create if you follow the NLTK book. Basically:

text.dispersion_plot(["catch", "flight", "tomorrow", "stay", "another", "day"])

The graph shows, for each word, the positions at which it appears in the text (in NLP-speak: the word’s tokens at their offsets in the running text of the corpus).
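To make "positions in the text" concrete, here is a minimal sketch in plain Python of the data such a plot is built from. It is not NLTK's implementation: the helper name `word_offsets`, the whitespace tokenization, the exact-match comparison, and the sample sentence are all my own assumptions for illustration.

```python
def word_offsets(tokens, targets):
    """Map each target word to the token positions where it occurs.

    Hypothetical helper for illustration; matching is exact (case-sensitive).
    """
    offsets = {word: [] for word in targets}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets


# Made-up sample sentence, not the text used for The Flight Oracle.
sample = "I will catch the flight tomorrow or stay another day and catch another flight"
tokens = sample.split()  # naive whitespace tokenization (an assumption)

offsets = word_offsets(
    tokens, ["catch", "flight", "tomorrow", "stay", "another", "day"]
)
for word, positions in offsets.items():
    print(f"{word:>10}: {positions}")
```

Each list holds the x-coordinates of one row of the plot: every occurrence of a word becomes a tick at its offset in the token stream.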

There are a ton of seriously interesting things that can be done with that. However:

2. I only made a pseudo-analysis, and it made me think

I didn’t intend to do a proper analysis in this quick airport hack, and I really enjoy playing around with a concept without being too serious about it.

But after finishing and looking at it, graphs and text and code together (just scroll through it, don’t read it, which you may have done anyway!), I thought about how easy it is to (seemingly) correlate completely unrelated things, draw utterly unfounded conclusions, and still make it look like something.
I was reminded of this book (which I had never read but have now skimmed, smiling a bit and also feeling put off by the obviously different times it was written in).


I like education for everyone, and also tools that are easy to use.
There’s a lot of data out there, and a lot of powerful tools that don’t require much knowledge or rigor to use.

So I hope that the literacy and attention span of, well, everyone will positively correlate with the ease of creating sound-looking analyses.

And now I’m just messing with you 😜

(but not really)