🌳📖💻#7:⚗💬 — Language Creativity
How often do we write the same sentence multiple times?
Inspired by Chomsky’s dismissal* of another linguist's approach to assessing the probability that a given sentence occurs, I built a small project yesterday.
His argument: the probability of any given sentence would nearly always be close to zero, because most sentences get created on the go. They have never existed in that exact constellation before.
For me this is an extremely interesting aspect of language! Why are we able to create an endless number of new sentences and still understand each other (more or less)?
Chomsky reasons that related sentences must share a common deep structure, one that can generate multiple surface structures. These surface structures are digitally infinite, meaning that a limited set of words can produce an unlimited number of meaningful expressions.
So I made an attempt to check up on that while practicing some NLP. Using NLTK, I wrote some code that looks into 18 different literary texts, picks a random sentence from each, and compares it to the rest of the text. It keeps track of the doubles (or multiples) it finds and prints them for inspection.
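The core idea can be sketched roughly like this. It is a minimal illustration, not the original project code: the function name `count_occurrences` and the toy sentences are made up for the example, and sentences are assumed to arrive as lists of tokens (the shape NLTK's corpus readers return).

```python
import random


def count_occurrences(sentences, rng=random):
    """Pick one random sentence and count how often it appears in the text.

    `sentences` is a list of token lists. A count of 1 means the sentence
    is unique: it only matches itself.
    """
    # Join the tokens so sentences can be compared as plain strings.
    joined = [" ".join(tokens) for tokens in sentences]
    picked = rng.choice(joined)
    return picked, joined.count(picked)


# Hypothetical stand-in for a tokenized text (not taken from a real book).
toy_text = [
    ["said", "the", "little", "Jackal", "."],
    ["The", "river", "was", "wide", "."],
    ["said", "the", "little", "Jackal", "."],
]

picked, n = count_occurrences(toy_text)
print(f"{picked!r} occurs {n} time(s)")
```

To try this on real texts, one option (an assumption about the setup, since the post does not name the corpus) is NLTK's Gutenberg corpus: after downloading the corpus data, loop over `nltk.corpus.gutenberg.fileids()` and pass `nltk.corpus.gutenberg.sents(fileid)` to the function for each text.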
You can run the code a few times and check the random results!
Usually there are zero doubles.
I’ve also encountered some exceptions, such as the 17 occurrences of ‘said the little Jackal .’ in a book by Bryant. For me, though, this is an example of intentional repetition, a literary device that I think draws its power precisely from the fact that such repetition usually does not occur.
The fact that we don’t repeat ourselves much on the surface is stunning, given that as members of humanity we live such similar lives and must therefore think in similar concepts. Language Creativity has interesting implications for both linguistics and NLP.
It also sheds light on why plagiarism analysis can indeed be very effective.
*Just in case someone wonders: the ideas sprang from reading this book (p. 24), which is very outdated. I found it among my mum's study materials 😅. Still, I think there are interesting sparks for thoughts and projects to be found everywhere!