Stylometry and the right to anonymity

Enrique Dans
Aug 7, 2013

The recent revelation that J. K. Rowling, the best-selling British author of all time, has published a crime novel, The Cuckoo’s Calling, under a pseudonym has prompted me to read up on stylometry, the study of linguistic style, usually applied to written language. Although Rowling was outed after information from her law firm was leaked on Twitter, a matter she eventually settled by accepting an apology and a donation to charity, The Sunday Times only published the story after applying stylometric techniques to verify it.

Anonymity is regarded by many people as a fundamental right, and in many cases is guaranteed by law to protect freedom of expression from reprisals, pressure, or censorship. In the case of the authorship of a book, anonymity or the use of a pseudonym allows for creative freedom and the possibility of a work being judged on its merits, and has been used by many writers over the years.

Stylometry uses a variety of analytical techniques to identify the main characteristics of a text. Measuring how often an author uses certain classes of words (articles, pronouns, conjunctions, auxiliary verbs, interjections, etc.) and then examining those frequencies through principal component analysis yields a fingerprint that can help identify an author. Other techniques involve neural networks, genetic algorithms, or the analysis of word associations in search of recognizable patterns.
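To make the idea of a word-frequency "fingerprint" concrete, here is a minimal sketch in Python. It is a simplified stand-in, not the principal component analysis mentioned above: it builds a relative-frequency profile over a small, hypothetical list of function words and attributes an unknown text to whichever candidate author's profile is closest by cosine similarity. The word list, the `profile`/`cosine`/`attribute` names and the sample texts are all illustrative assumptions; real studies use hundreds of features and far longer texts.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical, deliberately tiny feature set; real studies use hundreds of words.
FUNCTION_WORDS = ["the", "a", "an", "and", "but", "or", "of", "in", "to",
                  "he", "she", "it", "they", "is", "was", "have", "had", "oh"]

def profile(text):
    """Relative frequency of each function word among all tokens in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def attribute(unknown, candidates):
    """Return the candidate whose sample text has the most similar profile."""
    target = profile(unknown)
    return max(candidates, key=lambda name: cosine(target, profile(candidates[name])))

candidates = {
    "A": "the cat and the dog ran to the park and the man was in the car",
    "B": "she said it was a day of rain but it had a feel of oh so calm",
}
unknown = "the boy and the girl went to the shop and the woman was in the house"
print(attribute(unknown, candidates))
```

Here the unknown sentence shares author A's habits (heavy use of "the" and "and"), so it is attributed to A even though the two texts have no content words in common, which is precisely why function words make a useful fingerprint.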

Stylometry can also be an important weapon for governments that want to analyze information on the internet. Being able to identify a suspected activist from his or her writing, attribute particular texts, store the stylistic fingerprints of everyone based on what they write online, or use these techniques as evidence in a trial only strengthens the surveillance state we now live under.

In response, students at Drexel University’s Privacy, Security and Automation Lab have developed JStylo-Anonymouth, software that analyzes a text and suggests changes to prevent its key characteristics from being identified through stylometry (described in greater detail in this study). In short, it is a kind of “reverse stylometry” for people who wish to remain anonymous.
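The general idea of "reverse stylometry" can be illustrated with a short Python sketch. It does not reproduce Anonymouth's actual algorithm or feature set; it simply assumes a hypothetical approach in which a draft's function-word rates are compared with the average profile of some reference texts by other writers, and the words with the largest gaps are flagged so the author can use them more or less often. The function names, the word list and the `top_n` parameter are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical feature set; a real anonymization tool tracks far more features.
FUNCTION_WORDS = ["the", "a", "and", "but", "of", "in", "to", "it", "is", "was"]

def rates(text):
    """Map each function word to its share of all tokens in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {w: counts[w] / total for w in FUNCTION_WORDS}

def suggest_changes(draft, reference_texts, top_n=3):
    """Suggest function words to use more or less so the draft moves
    toward the average style of the reference texts."""
    if not reference_texts:
        raise ValueError("need at least one reference text")
    mine = rates(draft)
    profiles = [rates(t) for t in reference_texts]
    average = {w: sum(p[w] for p in profiles) / len(profiles) for w in FUNCTION_WORDS}
    # Rank words by how far the draft deviates from the reference average.
    ranked = sorted(FUNCTION_WORDS, key=lambda w: abs(mine[w] - average[w]), reverse=True)
    suggestions = []
    for w in ranked[:top_n]:
        diff = mine[w] - average[w]
        if diff == 0:
            continue  # no deviation, nothing to suggest
        suggestions.append((w, "use less" if diff > 0 else "use more", round(diff, 3)))
    return suggestions

print(suggest_changes("and and and the", ["the cat sat", "the dog ran"]))
```

In this toy case the draft overuses "and" and slightly underuses "the" relative to the references, so the sketch suggests using "and" less and "the" more; rewriting along those lines would blur the fingerprint that a classifier like the one in the previous sketch relies on.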

The subject has captured my imagination and raised a number of questions. I imagine, for example, that stylistic traits would be lost in translation, or that translation would introduce new ones that might be completely different. I also wonder what would happen with texts co-written by two or more authors who reach shared conclusions: whenever I have been involved in drafting documents for a campaign, I have always worked with several other people. The idea that our writing carries a fingerprint that could be used to identify us with reasonable precision is, to say the least, intriguing, as is the possible use of stylometry by government agencies that operate on the fringes of power. This is definitely a subject that deserves much greater investigation.

Enrique Dans

Professor of Innovation at IE Business School and blogger (in English here and in Spanish at enriquedans.com)