Coursera and keystroke biometrics
MOOC provider Coursera claims it can identify test takers uniquely through its patented keystroke biometrics system. I look under the hood.
As allowed by the Safe Harbor transatlantic data protection agreement, I asked MOOC provider Coursera for access to my personal data.
After some back and forth, I received copies of the mugshot pictures and typing samples taken prior to the tests. These are routinely collected by Coursera to make sure the same person is consistently taking tests. In this case the data was collected around July 2015 as I was taking the course Internet Giants: The Law and Economics of Media Platforms, offered by the University of Chicago (or rather by Prof. Randal Picker). Take the course, it is excellent, even if its silence is deafening on privacy issues.
The pictures are about 50 KB each. I received only four, two of which are duplicates, even though Coursera took many more. I doubt these would be all that useful for identifying someone reliably, given the low quality and the poor framing (I was toying around to see how resilient the system would be). I have heard that Coursera has since given up on taking pictures to identify students for each test.
Biometric credentials tend to be very sensitive information, as they are irrevocable. Keystroke biometrics might initially sound innocuous, but they can in fact be very sensitive. Potentially any website could identify you if it knows your dwell time for each key (how long you hold the key down) and your flight time between each pair of keys (the gap between releasing one key and pressing the next). Heck, the exact lengths and gaps when typing Morse code were already enough 150 years ago to give telegraph operators unique signatures!
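To make those two quantities concrete, here is a minimal Python sketch. The function name, the tuple layout and the timings are all hypothetical, made up for illustration; this is not Coursera's method, only the plain definitions of dwell and flight time applied to a list of keystrokes.

```python
def dwell_and_flight(strokes):
    """Compute dwell and flight times.

    strokes: list of (key, press_time, release_time), ordered by press time.
    Returns (dwell, flight) where
      dwell  = [(key, release - press), ...]
      flight = [(previous_key, next_key, next_press - previous_release), ...]
    """
    dwell = [(key, release - press) for key, press, release in strokes]
    flight = [
        (prev[0], nxt[0], nxt[1] - prev[2])
        for prev, nxt in zip(strokes, strokes[1:])
    ]
    return dwell, flight


# Hypothetical timings in milliseconds, invented for this example.
sample = [("h", 0, 95), ("i", 180, 260), ("!", 410, 480)]
dwell, flight = dwell_and_flight(sample)
print(dwell)   # [('h', 95), ('i', 80), ('!', 70)]
print(flight)  # [('h', 'i', 85), ('i', '!', 150)]
```

A profile built from many such samples (for instance the mean and variance of each dwell and flight time) is the kind of signature a keystroke biometrics system could plausibly compare against.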
The data is here. It consists of a CSV file (originally an Excel spreadsheet), i.e. a simple table. Coursera did not provide headers, so it is hard to guess what some of the columns are. One column is clear though, and contains the core data (column K). The idea is that the student has to type a sample sentence: I certify this submission as my own original work completed in accordance with the Coursera Honor Code. Coursera logs all the key presses and releases, which appear in one column.
Here is an example entry, stripped of the datestamp at its beginning (Medium renders the letter o and the number 0 identically; only the number appears in the data):
The character | clearly acts as a separator. Within each blob, the middle character is always either a “d” or a “u”, for down and up, so the blobs translate directly into key presses and releases. The first part of a blob is timing information; the last part indicates which key is involved, as an ASCII code or as plain text for special keys (SHIFT, SPACE, BACKSPACE, TAB). The first six blobs are 0d73|8u73|2224dSPACE|40uSPACE|960d67|40u67. Together these blobs indicate that I pressed down on ASCII character 73 (“I”) at time 0, released it 8 units of time later, pressed the space bar 2224 units later, released it 40 units later, then pressed “C”, etc. Continuing this way, one progressively recovers I certify…
What I entered was hugely inconsistent in its timing (on purpose), but it still got me through. On some of the other entries which Coursera accepted (which are strangely not among the ones Coursera has given me), I was uncannily consistent thanks to a Chrome extension called Keyboard Privacy, developed by Paul Moore and Per Thorsheim. The extension offers the option to increase or decrease the noise associated with your typing. Possibly it maintains a buffer of pressed keys to delay entry; you can thereby shift the mean and variance of your dwell and flight times, and render yourself anonymous and/or interchangeable.
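The blob format described above is simple enough to decode in a few lines. The following Python sketch parses the six blobs quoted in the post. The function names are mine, and two things are assumptions rather than anything Coursera has documented: that each time value is relative to the previous event (which is how I read the data), and that numeric codes can be mapped to characters with chr(). The unit of time is unknown, and letter case is not recovered.

```python
import re

# <digits><d or u><key>, e.g. "2224dSPACE" or "8u73"
BLOB_RE = re.compile(r"^(\d+)([du])(.+)$")

SAMPLE = "0d73|8u73|2224dSPACE|40uSPACE|960d67|40u67"


def parse_blobs(entry):
    """Split an entry into (delta, direction, key) tuples."""
    events = []
    for blob in entry.split("|"):
        match = BLOB_RE.match(blob)
        if match is None:
            raise ValueError(f"unrecognised blob: {blob!r}")
        delta, direction, key = match.groups()
        events.append((int(delta), direction, key))
    return events


def to_absolute(events):
    """Turn relative deltas into running timestamps (assumes deltas are relative)."""
    clock = 0
    absolute = []
    for delta, direction, key in events:
        clock += delta
        absolute.append((clock, direction, key))
    return absolute


def typed_keys(events):
    """Rebuild the typed keys from key-down events only."""
    out = []
    for _, direction, key in events:
        if direction != "d":
            continue
        if key == "SPACE":
            out.append(" ")
        elif key == "BACKSPACE":
            if out:
                out.pop()
        elif key.isdigit():
            out.append(chr(int(key)))
        # SHIFT, TAB and anything unrecognised are skipped.
    return "".join(out)


events = parse_blobs(SAMPLE)
print(to_absolute(events)[:3])  # [(0, 'd', '73'), (8, 'u', '73'), (2232, 'd', 'SPACE')]
print(typed_keys(events))       # I C
```

Pairing each "d" event with the matching "u" event in the absolute timeline gives the dwell time per key, and the gaps between a "u" and the next "d" give the flight times.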
If one wanted to cheat on Coursera, it would be very easy indeed.
Published in Higher Education Revolution