Releasing the Balabit Mouse Dynamics Challenge Data Set

Unsupervised Blog
Balabit Unsupervised
Aug 26, 2016

Winning the gold medal at the Olympics is probably the single greatest achievement one can reach in sports. In data science, however, we believe there is a challenge even greater than winning a competition: designing an engaging one.

For a couple of months earlier this year, we were busy integrating behavioral biometric capabilities into our User Behavior Analysis product. To detect unauthorized use of user accounts, we employ methods that analyze mouse and keystroke dynamics. This way, even if attackers try to mimic the owner of the account they hacked, they will fail, because no one is really capable of imitating the mouse and keyboard usage of a particular person.

Developing such algorithms was something that we truly enjoyed. We also discovered that biometric authentication is an active area of research pursued by professionals in academia and in the security industry. Sharing our excitement and data with the community seemed to be a perfect opportunity for fulfilling our dream of designing a data science challenge. This is how the idea of the Balabit Mouse Dynamics Challenge was born.

We formulated the task and provided the data set for solving it. The goal of the challenge was to protect a set of users from the unauthorized usage of their accounts by learning the characteristics of how they use their mouse. From March to May, several teams tried their best to estimate the anomalousness of test audit trails based on the trail files provided for training the models.
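
To make "estimating the anomalousness of a test trail" a little more concrete, here is a minimal sketch. It is not the method used in the product or by any contestant. It assumes each trail has already been parsed into a list of (timestamp in seconds, x, y) samples, which is a simplification of the real files, and the function names `mean_speed` and `anomaly_score` are made up for illustration. It scores a test trail by how far its average cursor speed deviates from the speeds seen in one user's training trails.

```python
import math
from statistics import mean, stdev


def mean_speed(trail):
    """Average cursor speed (pixels per second) of a trail.

    `trail` is assumed to be a list of (timestamp_seconds, x, y) tuples,
    sorted by time. This layout is an assumption for the sketch.
    """
    dist = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(trail, trail[1:]):
        dist += math.hypot(x1 - x0, y1 - y0)
    duration = trail[-1][0] - trail[0][0]
    return dist / duration if duration > 0 else 0.0


def anomaly_score(train_trails, test_trail):
    """Absolute z-score of the test trail's speed against the training trails.

    Needs at least two training trails. Higher means "less like the owner".
    """
    speeds = [mean_speed(t) for t in train_trails]
    mu, sigma = mean(speeds), stdev(speeds)
    test_speed = mean_speed(test_trail)
    if sigma == 0:
        return 0.0 if test_speed == mu else math.inf
    return abs(test_speed - mu) / sigma


# Tiny synthetic example (not real challenge data).
train = [
    [(0.0, 0, 0), (1.0, 100, 0), (2.0, 200, 0)],
    [(0.0, 0, 0), (1.0, 110, 0), (2.0, 220, 0)],
    [(0.0, 0, 0), (1.0, 90, 0), (2.0, 180, 0)],
]
owner_like = [(0.0, 0, 0), (1.0, 105, 0)]
intruder_like = [(0.0, 0, 0), (1.0, 300, 400)]

print(anomaly_score(train, owner_like))
print(anomaly_score(train, intruder_like))
```

A single feature like this is far too weak for real use; a serious model would combine many features (speed, acceleration, curvature, click timing and so on), but the shape of the problem stays the same: learn from a user's training trails, then assign each test trail a score.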

For our cursor movement data set and a more comprehensive description, please visit https://github.com/balabit/Mouse-Dynamics-Challenge.

We hope you will have as much fun exploring our data as we did. You can even evaluate your solution against a part of the labels of the test trails.
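
As a rough illustration of such an evaluation, the sketch below computes the area under the ROC curve (AUC) from anomaly scores and the available labels. Treating AUC as the metric is an assumption here, as are the session names and the convention that label 1 marks an illegal trail and 0 a legal one; check the repository's description for the actual file layout and scoring rules.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen illegal trail (label 1)
    receives a higher anomaly score than a randomly chosen legal trail
    (label 0); ties count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one trail of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores, keyed by made-up session names.
known_labels = {"session_a": 0, "session_b": 1, "session_c": 0, "session_d": 1}
my_scores = {"session_a": 0.1, "session_b": 0.9, "session_c": 0.4, "session_d": 0.3}

# Only sessions with a known label can be evaluated.
names = sorted(known_labels)
auc = roc_auc([known_labels[n] for n in names], [my_scores[n] for n in names])
print(auc)  # 0.75
```

The pairwise form is quadratic in the number of trails, which is fine for a quick check; for larger sets a rank-based implementation or an existing library routine would be the more usual choice.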

Did we succeed in delivering an exciting challenge? We received favorable feedback from the contestants. We managed to set up the task so that it was demanding but, judging by the performance of the models, not impossible. So far no one has reported any kind of information leakage that would lead to an unintended significant improvement in performance (which obviously does not imply that there is none). On the whole, we believe it is a decent piece of work that we are happy to share. Enjoy!

Originally published at www.balabit.com on August 26, 2016 by Árpád Fülöp.
