CADY: Compromised Account Detector for Yammer
By: Apoorva Dornadula (Microsoft Software Engineering Intern)
The number of social media users increases steadily from year to year. In 2017, there were around 2.46 billion social media users worldwide (an increase of roughly 0.18 billion users from 2016) [1]. Corporate professionals have followed this trend over the past couple of years by adopting platforms such as Yammer. Yammer is an enterprise social networking platform for companies of any size that allows employees to collaborate on projects, discuss ideas, and connect with other employees [2].
With social media use growing in environments that hold personal data and sensitive corporate information, the consequences of a compromised account can be dire. On Yammer, a compromised account can be used to propagate spam within the company, impersonate high-level employees to spread malicious messages, or be enlisted in a cybercrime network. These risks call for a way to detect when an account has been compromised. For my internship project this summer, I decided to tackle the challenge of detecting account compromise on Yammer from user behavior, using machine learning.
I approach this problem by studying users’ social behavior on Yammer through a tool I developed called CADY (Compromised Account Detector for Yammer). CADY uses streams of a user’s Yammer activity to determine whether a new activity stream is suspicious for that user. A variety of introversive and extroversive behaviors are included as features to capture both the passive and active tendencies of users. No real customer data is used. CADY uses logistic regression and a multilayer perceptron (a basic neural network) to determine whether an incoming stream of behaviors belongs to the user it claims to originate from. CADY achieves an accuracy of around 92%, with a false positive rate as low as 3% for some of its classifiers.
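To make the classification step and the two reported metrics concrete, here is a minimal sketch in plain Python. It is not CADY's actual code: the three features, the synthetic data generator, the choice of "stream not produced by the owner" as the positive class, and every function name are assumptions made for illustration. It trains a small logistic regression with batch gradient descent and then computes accuracy and false positive rate on held-out synthetic streams.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_streams(n, rng):
    """Generate synthetic activity streams (no real data).

    Each row is a hypothetical feature vector (posts written, likes given,
    messages read), expressed as a deviation from the owner's usual behavior.
    Label 1 = stream not produced by the owner (assumed positive class),
    label 0 = genuine stream.
    """
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = (2.0, 2.0, -2.0) if label else (0.0, 0.0, 0.0)
        X.append([rng.gauss(c, 1.0) for c in center])
        y.append(label)
    return X, y


def train_logistic_regression(X, y, lr=0.3, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (suspicious) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy_and_fpr(y_true, y_pred):
    """Accuracy = correct / total; FPR = genuine streams flagged / all genuine."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    negatives = sum(1 for t in y_true if t == 0)
    fpr = false_pos / negatives if negatives else 0.0
    return correct / len(y_true), fpr


def run_demo(seed=0):
    """Train on one synthetic sample and evaluate on another."""
    rng = random.Random(seed)
    X_train, y_train = make_streams(200, rng)
    X_test, y_test = make_streams(200, rng)
    w, b = train_logistic_regression(X_train, y_train)
    y_pred = [predict(w, b, x) for x in X_test]
    return accuracy_and_fpr(y_test, y_pred)


if __name__ == "__main__":
    acc, fpr = run_demo()
    print(f"accuracy={acc:.3f}  false_positive_rate={fpr:.3f}")
```

The false positive rate is tracked separately from accuracy because, under the assumed labeling, a false positive means a legitimate user's activity is flagged as suspicious. The numbers this sketch prints depend entirely on the synthetic data and say nothing about CADY's reported results.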
CADY’s classifiers are implemented using TensorFlow [3] and CNTK (Microsoft Cognitive Toolkit) [4] in order to compare the performance of the two platforms. Although CADY is designed to detect account compromise on Yammer, many of the features and behaviors studied may be relevant to other social media platforms.
A research paper with more information about CADY is available at https://web.stanford.edu/~apoorvad/#portfolio.
1. Number of social media users world-wide from 2010 to 2021: https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/
2. Microsoft Products Yammer: https://products.office.com/en-us/yammer/yammer-overview
3. TensorFlow — An open-source software library for Machine Intelligence: https://www.tensorflow.org/
4. The Microsoft Cognitive Toolkit: https://www.microsoft.com/en-us/cognitive-toolkit/