How Your Texts Reveal More About Your Personality Than You Think (And More Than You Might Like)

And Why That Challenges Our Current Understanding of Data Protection

Imagine a world in which every word you write could be one word too much. In which every sentence you type could reveal something about you that you’d rather leave concealed.

A TED talk by Mariano Sigman unsparingly depicts the likelihood that such a world could be the world of tomorrow. Using the example of reliably predicting the onset of schizophrenia in at-risk patients, the neuroscientist introduces a rather simple yet efficient method for forecasting individuals’ future mental health, based on nothing more than an analysis of the semantics of their speech. While Sigman focuses his talk on forecasting the development of schizophrenia, researchers elsewhere have found that such methods can be used to generate knowledge about individuals’ personality and character traits more generally. An article in Scientific American, which accessibly presents some of the possibilities of language analysis that have received academic attention so far, lists not only gender and age but also suicide risk, depression and personal reactions to trauma as variables that can already be detected through an assessment of an individual’s writing style.
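The style-analysis research alluded to here often starts from surprisingly simple signals: the relative frequency of function words such as pronouns, articles and negations, which the literature links to traits like age, gender or mood. As a toy-level sketch only (the word lists and the `style_profile` helper are invented for illustration; this is not Sigman's actual method):

```python
import re
from collections import Counter

# Toy illustration: style-analysis research often counts function words
# (pronouns, articles, negations) whose relative frequencies have been
# linked to traits such as age, gender or mood. These word lists are
# deliberately tiny and for demonstration only.
FUNCTION_WORDS = {
    "pronouns": {"i", "me", "my", "we", "you", "he", "she", "they"},
    "articles": {"a", "an", "the"},
    "negations": {"no", "not", "never", "none"},
}

def style_profile(text):
    """Return the relative frequency of each function-word category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty input
    counts = Counter()
    for w in words:
        for category, vocab in FUNCTION_WORDS.items():
            if w in vocab:
                counts[category] += 1
    return {cat: counts[cat] / total for cat in FUNCTION_WORDS}

profile = style_profile("I never thought my words would say so much about me.")
```

Real systems use far richer feature sets and trained models, but the basic move is the same: turn free text into a numeric profile that can be compared against known populations.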

Sigman ends his presentation on an overly optimistic note, leaving the audience with the vision of a future that sees a “different form of mental health, based on objective, quantitative and automated analysis of the words we write, of the words we say.” Firstly, I would like to emphasize that my reaction is by no means intended to disparage Sigman’s presentation, which I found greatly thought-provoking. Thought-provoking in a good way: I am convinced that open communication between those who create and understand the fancy algorithm-stuff and those who sit in armchairs and think about the code’s bigger consequences is one of the crucial prerequisites of a global technological development that allows humankind its humanness, society its values and every individual their right to dignity.

But although I thus appreciate Sigman’s insights into the land of fading boundaries in the department of psychological speech analysis, I had to listen to his last sentence again and again. I understand that ending a presentation on a positive, hopeful and stimulating note might support both the persuasiveness of the message and the mood of the crowd. But I can’t help being left in confusion as to what I, what anyone, is supposed to expect from the forecast of a “different form of mental health”. Simply, what is that?! If what Sigman means falls anywhere close to the assumptions I was able to come up with, I am left slightly confused about the trusting hope and optimism of the last words with which he leaves the stage.

Of course there’s an appeal to using (physically) non-invasive methods such as a simple text or speech analysis to reliably detect mental health hazards and thus enable accurate prophylactic responses before any harm is done. Especially when recalling that in the US alone the annual costs associated with mental health disorders mount as high as $100 billion, the prospect of such early diagnoses and the preventive treatments they enable will not only raise hopes of improving the lives of those concerned but also elicit pleased smiles on the faces of those delighted at the thought of the dollars saved.

Furthermore, the technique of analyzing text or speech patterns has long since become everyday business in various other scenarios: algorithms that are asked to identify or verify the author of a text, spam filters making use of natural-language-processing methods, or Wikipedia employing similar methods in the fight against vandalism on its website. And although I had heard of all of this before, I can feel a resentment building up in my chest when I think about the consequences the widespread use of text-based psycho-profiling algorithms such as the one described by Sigman could have.
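The spam filters and vandalism detectors mentioned above mostly rest on straightforward statistics over word counts. A minimal sketch of the naive-Bayes idea behind many of them (the training snippets and helper names are invented for illustration, not taken from any particular system):

```python
import math
from collections import Counter

# Minimal sketch of a naive-Bayes text classifier of the kind used in
# many spam filters. Training data is invented for illustration.
def train(labeled_texts):
    """labeled_texts: list of (label, text). Returns per-label word counts."""
    counts = {}
    for label, text in labeled_texts:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label whose word distribution best explains the text
    (uniform class priors assumed, Laplace smoothing for unseen words)."""
    best_label, best_score = None, -math.inf
    for label, words in counts.items():
        total = sum(words.values())
        vocab = len(words)
        score = sum(
            math.log((words[w] + 1) / (total + vocab))
            for w in text.lower().split()
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([
    ("spam", "win money now claim your free prize"),
    ("ham", "meeting notes attached see you tomorrow"),
])
```

Swap the labels “spam”/“ham” for candidate authors and the very same machinery becomes a crude authorship-attribution tool, which is exactly why the boundary between harmless filtering and profiling is so thin.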

These consequences go further than the usual concerns associated with the buzzwords of big data, algorithms or profiling; further than the typical fear of discrimination if sensitive information about your (mental) health were to fall into the hands of your insurer, your boss or even ‘only’ a friend who might (subconsciously) start to look at you in a different light from that moment on. We’ve all heard those scary stories, and although their frequent circulation makes them not a pinch less serious, text-analyzing algorithms, implemented across an increasingly broad spectrum of social settings and yielding progressively more personal insights, will actually go further.

If it were possible to expose deeply personal insights about an individual, such as their emotional well-being or future medical risks, simply by running an algorithm over a text of theirs, then… Well, when did YOU publish your last text on this website? Or a longer Facebook post? When did you send your last unencrypted email? And didn’t you have to write a cover letter for that job application?

The truth is, if text analysis becomes sophisticated enough to reveal personal details in split seconds through an instant assessment of any longer text document of yours, we must admit that we have reached the point where the idea (or illusion) of a personal right to control your own private information has ultimately gone out the window. I am not going to tap into the discussion of the extent to which current legal approaches to regulating the use and administration of personal data are already doomed anyhow; but to give a rough description of the effects you may envision: if every longer piece you write entails an inventory of sensitive information about you which can be read by those with the right zeros and ones… protection of personal data and a right to informational self-determination will not be eroded; it will be blown up.

This might sound dramatic. But maybe some drama is necessary here and there to awaken that generation of data-disseminating social media marionettes who, benumbed by the flippancy and casualness with which private information is nowadays sent through the fiber-optic cables looping around this world, still believe that their personal encounter with the topic of privacy protection ended the moment they placed sticky tape over their webcam lens. Of course this is not about YOU. This is about the fact that it becomes ever more difficult to control your own informational footprint in a world where bits and bytes of it easily slip away in the general flow of data, which seeps across and beyond the boundaries of basically all areas of an individual’s existence. This is about the fact that, let’s face it, in light of the increasing use of natural-language-processing technologies and of analyzing algorithms that can generate sensitive insights about us from any substantial text of ours, even those most conscious of their digital traces will have to admit that protecting yourself from the grasp of such examinations seems rather hopeless.

Too essential is the use of writing in everyday communication, the expression of ideas and the sharing of knowledge; too much text of ours is out there already; too much additional material we will necessarily produce every single day in the future, online and offline. Capturing thoughts, knowledge and experience through writing has been a major achievement of humanity, one which allows us to communicate across distance and time, pass on information and thus “stand on the shoulders of giants” in the quest for continual innovation and improvement. No one would want to limit themselves in their use of writing, although that seems to be the only way to escape the mind-reading, text-analyzing algorithms of the future.
Nor SHOULD anyone limit themselves in their use of writing; before we know it, the algorithms will have advanced further, no longer limited to written text but capable of analyzing spoken words in real time. And then? Are we going to limit our use of language? Is the only way out a move towards a society in which humans can only communicate through texts or words that have been generated by anonymizing algorithms, which take the intended message of the author, cut out any hint of originality and transform it into a smooth, universal expression of its essence? Standardization of all communication, devitalizing the soil of individual expression which could nourish practices of intrusive text analysis? But where is the future of literature in such a scenario? What happens to passionate blogging, provoking journalism or the art of old-fashioned letter writing?

I intentionally exaggerate here, and I am aware of that. But what I hope to illustrate is that clasping on to our understanding of personal data protection despite the constantly new winds blowing over from the ocean of technological development will not work in the long run, since it might eventually demand an end to the information age as we know it. Those breezes of innovation which rattle at the walls of our legal and ethical constructions won’t die off. Right now it seems as if we were haphazardly trying to control the draft by building dikes in front of our door, but maybe we should consider changing the structure of our building instead, replacing some of those legal or ethical ideals with more flexible elements. Possibly we could even reuse some of the elements we replace, recycle them to build something new, something like a windmill in the front yard, ready to capture the strong current of air blowing in from the ocean of innovation. I do not ask that we abandon our values and ideals related to the protection of personal data and privacy, but I am afraid that as long as people have the right to speak and write and to consent to sharing their personal data with whomever they want, the current approaches to data protection, built on individual self-determination, will fail.

Technology keeps advancing, and more and more things will become a possible source of sensitive personal data. Legally speaking, sensitive personal information is a special category of personal information comprising those facts whose disclosure is likely to lead to major negative consequences for the individual concerned. Think of topics like political affiliation, sexual orientation or information concerning your health. Because it is so sensitive, this type of information has to be handled with extraordinary care, and (at least in the EU) it is, for example, not allowed to be processed without explicit consent from the person it concerns.
So, leaving out all those emotions and ideological remarks for just a moment: if more and more, and eventually every snippet, of what we do, write and say as we walk our way through this world leaves behind a trace of sensitive personal information in the hands of those who have the means to generate such insights (with time such technology will get cheaper, so that eventually the ability to process such data traces will be accessible to nearly everyone), well, how is that supposed to work bureaucratically? And who is able to monitor that any longer?

My plea for the need of reconsidering the entire structure of data protection as we currently know it is not build on resentment or the wish to abandon those values and ideals it is built upon. Rather the opposite. I am afraid that clinching on to an old model which is based on the idea that each individual should and sufficiently could be the guardian of their own personal data, a model which is not flexible enough to cope with the developments on a long term, will lead to a gradual erosion of exactly those values. Thus, an open, international discussion about alternative approaches which allows the consideration of completely different angles, allows a rethinking of the concepts everyone is throwing around and taking for granted – such an discussion seems to me the most honest, the most painless and the most promising solution to the challenge of those storms that will inevitably sweep over the land of law and ethics from the ocean of technological innovation.