Coursera and keystroke biometrics

MOOC provider Coursera claims it can identify test takers uniquely through its patented keystroke biometrics system. I look under the hood.

Paul-Olivier Dehaye
PersonalData.IO
Feb 13, 2016

As allowed by the Safe Harbor transatlantic data protection agreement, I asked MOOC provider Coursera for access to my personal data.

PNG pictures of the author, each approximately 50 KB in size

After some back and forth, I received copies of the mugshot pictures and typing samples taken prior to the tests. These are routinely collected by Coursera to make sure the same person is consistently taking tests. In this case the data was collected around July 2015 as I was taking the course Internet Giants: The Law and Economics of Media Platforms, offered by the University of Chicago (or rather by Prof. Randal Picker). Take the course: it is excellent, even if its silence on privacy issues is deafening.

The pictures are each about 50 KB. I received only four, two of them duplicates, despite Coursera having taken many more. I doubt these would be all that useful for identifying someone reliably, given the low quality and the poor framing (I was toying around to see how resilient the system would be). I have heard that Coursera has since given up on taking pictures to identify students for each test.

More interesting were the keystroke dynamics data. Coursera has a patent for this, and contracts with a German company called KeyTrac.

Biometric credentials tend to be very sensitive information, as they are irrevocable. Keystroke biometrics might initially sound innocuous, but they in fact have the potential to be very sensitive. Potentially any website could identify you if it knows your dwell time for each key (how long you hold it down) and your flight time between each pair of keys (the gap between consecutive keystrokes). Heck, the exact lengths and gaps when typing Morse code were already enough 150 years ago to give telegraph operators unique signatures!
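To make dwell and flight times concrete, here is a minimal Python sketch. The `Keystroke` type, the function names and the timestamps are all made up for illustration, and flight time is taken here as release-to-next-press, which is only one of several conventions in use.

```python
from typing import NamedTuple


class Keystroke(NamedTuple):
    key: str
    down: int  # press timestamp in ms (illustrative value)
    up: int    # release timestamp in ms (illustrative value)


def dwell_times(strokes):
    """How long each key was held down."""
    return [(s.key, s.up - s.down) for s in strokes]


def flight_times(strokes):
    """Gap between releasing one key and pressing the next.

    Assumption: release-to-press convention. The value can be
    negative when the next key goes down before the previous
    one comes up.
    """
    return [(a.key + b.key, b.down - a.up)
            for a, b in zip(strokes, strokes[1:])]


# Hypothetical timings for typing "cat"
strokes = [
    Keystroke("c", 0, 90),
    Keystroke("a", 150, 230),
    Keystroke("t", 300, 370),
]
print(dwell_times(strokes))   # [('c', 90), ('a', 80), ('t', 70)]
print(flight_times(strokes))  # [('ca', 60), ('at', 70)]
```

A profile built from many such measurements (for instance the mean and variance per key and per key pair) is the kind of signature a site could compare against.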

Coursera didn’t provide column headers. That’s not nice.

The data is here. It consists of a CSV file (originally an Excel spreadsheet), i.e. a simple table. Coursera did not provide headers, so it is hard to guess what some of the columns contain. One column is clear though, and contains the core data (column K). The idea is that the student has to type a sample sentence: I certify this submission as my own original work completed in accordance with the Coursera Honor Code. Coursera logs every key press and release, and these all appear in that one column.

I certify I am almost done with typing my typing sample.

Here is an example entry, stripped of the datestamp at its beginning (in Medium’s font the letter o and the number 0 look exactly the same; only the number appears in the entry):

0d73|8u73|2224dSPACE|40uSPACE|960d67|40u67|1401d69|47u69|705d82|63u82|577d84|47u84|665d73|39u73|761d70|55u70|576d89|64u89|857dSPACE|63uSPACE|865d84|47u84|585d72|39u72|553d73|31u73|1889d83|55u83|609dSPACE|55uSPACE|1417d83|47u83|729d85|71u85|432d66|48u66|504d77|63u77|394d73|63u73|712d83|72u83|377d83|63u83|857d73|79u73|336d79|64u79|672d78|56u78|1920dSPACE|56uSPACE|816d65|56u65|384d83|56u83|1024dSPACE|32uSPACE|1911d77|41u77|576d89|63u89|432dSPACE|48uSPACE|1193d79|56u79|664d87|64u87|640d78|48u78|576dSPACE|64uSPACE|568d79|56u79|567d82|32u82|657d73|48u73|432d71|71u71|450d73|62u73|385d78|64u78|560d65|56u65|648d76|47u76|393dSPACE|63uSPACE|1553d87|56u87|680d79|39u79|505d82|55u82|537d75|55u75|393dSPACE|55uSPACE|385d67|54u67|570d79|64u79|232d77|63u77|401d80|64u80|184d76|55u76|512d69|80u69|296d84|80u84|209d69|79u69|161d68|71u68|897dSPACE|63uSPACE|729d73|47u73|249d78|39u78|273dSPACE|63uSPACE|536d65|64u65|401d67|79u67|137d67|71u67|585d79|55u79|449d82|63u82|153d68|71u68|209d65|63u65|497d78|63u78|313d67|87u67|161d69|80u69|360dSPACE|87uSPACE|641d87|79u87|497d73|55u73|297d84|79u84|337d72|70u72|434dSPACE|71uSPACE|545d84|63u84|201d72|63u72|448d69|64u69|337dSPACE|71uSPACE|225d67|63u67|544d79|48u79|233d85|55u85|336d82|40u82|200d83|72u83|169d69|79u69|192d82|72u82|201d65|62u65|345dSPACE|64uSPACE|296d72|64u72|376d79|56u79|273d78|63u78|368d79|56u79|425d82|46u82|345dSPACE|72uSPACE|209d67|47u67|505d79|47u79|472d68|72u68|105d69|71u69

The character | clearly acts as a separator. Within each blob, the middle character is always either a “d” or a “u”, for down and up. So these blobs translate directly into key presses and releases. The first part of a blob is timing information; the last part indicates which key is involved, as an ASCII code or as plain text for special keys (SHIFT, SPACE, BACKSPACE, TAB). The first six blobs are 0d73|8u73|2224dSPACE|40uSPACE|960d67|40u67. Together these blobs indicate that I pressed down on ASCII character 73 (“I”) at time 0, released it 8 units of time later, pressed the space bar 2224 units of time after that, released it 40 units later, then typed “C”, etc. Reading on, one progressively recovers I certify…

What I entered was hugely inconsistent in its timing (on purpose), but it still got me through. On some of the other entries that Coursera accepted (which are strangely not among the ones Coursera gave me), I was uncannily consistent thanks to a Chrome extension called Keyboard Privacy, developed by Paul Moore and Per Thorsheim. The extension offers the option to increase or decrease the noise associated with your typing. Possibly it maintains a buffer of pressed keys to delay entry, which would let you shift the mean and variance of your dwell and flight times and render yourself anonymous and/or interchangeable.
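The entry format described above (a time delta, then “d” or “u”, then a key code or key name, with | between blobs) is simple enough to decode in a few lines. The following Python sketch is only my reading of the format, not Coursera's or KeyTrac's code: the names `KeyEvent`, `parse_entry`, `recover_text` and `timeline` are invented here, numeric codes are assumed to be ASCII, and each time value is assumed to be a delta since the previous event, in unknown units.

```python
import re
from itertools import accumulate
from typing import NamedTuple


class KeyEvent(NamedTuple):
    delta: int   # time since the previous event (assumed; units unknown)
    action: str  # "d" = key down, "u" = key up
    key: str     # decoded character, or the special-key name as logged


# <digits><d|u><digits or NAME>, e.g. "2224dSPACE" or "8u73"
BLOB = re.compile(r"(\d+)([du])(\w+)")


def parse_entry(entry: str) -> list[KeyEvent]:
    """Split one logged entry into key events."""
    events = []
    for blob in entry.split("|"):
        match = BLOB.fullmatch(blob)
        if match is None:
            raise ValueError(f"unrecognised blob: {blob!r}")
        delta, action, code = match.groups()
        # Assumption: numeric codes are ASCII; names such as SPACE stay as-is.
        key = chr(int(code)) if code.isdigit() else code
        events.append(KeyEvent(int(delta), action, key))
    return events


def recover_text(events: list[KeyEvent]) -> str:
    """Rebuild the typed text from the key-down events only."""
    chars: list[str] = []
    for event in events:
        if event.action != "d":
            continue
        if event.key == "SPACE":
            chars.append(" ")
        elif event.key == "BACKSPACE":
            if chars:
                chars.pop()
        elif len(event.key) == 1:
            chars.append(event.key)
        # Other special keys (SHIFT, TAB, ...) are ignored in this sketch.
    return "".join(chars)


def timeline(events: list[KeyEvent]) -> list[int]:
    """Running total of the deltas, i.e. time since the first event."""
    return list(accumulate(e.delta for e in events))


sample = "0d73|8u73|2224dSPACE|40uSPACE|960d67|40u67"
events = parse_entry(sample)
print(recover_text(events))  # I C
print(timeline(events))      # [0, 8, 2232, 2272, 3232, 3272]
```

Letters are logged with their uppercase codes, so this sketch returns uppercase text; recovering case would require tracking the SHIFT events as well.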

If one wanted to cheat on Coursera, it would be very easy indeed.

If you enjoyed this post please press the recommend button so more people can as well!

Published in Higher Education Revolution

Paul-Olivier Dehaye
PersonalData.IO

Mathematician. Co-founder of PersonalData.IO. Free society by bridging ideas. #bigdata and its #ethics, citizen science