Using digital data to shed light on team satisfaction and other questions about large organizations
For several decades sociologists have speculated that the performance of firms and other organizations depends as much on the networks of information flow between employees as on the formal structure of the organization [1, 2].
This argument makes intuitive sense, but until recently it has been extremely difficult to test with data. Historically, employee data have been collected mostly in the form of surveys, which are still the gold standard for assessing opinions but reveal little about behavior, such as who talks to whom. Surveys are also expensive and time-consuming to conduct, making them unsuitable for frequent, comprehensive snapshots of the state of a large organization.
Thanks to the growing ubiquity of productivity software, however, this picture is beginning to change. Email logs, web-based calendars, and co-authorship of online documents all generate digital traces that can be used as proxies for social networks and their associated information flows. In turn, these network and activity data have the potential to shed new light on old questions about the performance of teams, divisions, and even entire organizations.
Recognizing this opportunity, my colleagues Jake Hofman, Christian Perez, Justin Rao, Amit Sharma, Hanna Wallach, and I — in collaboration with Office 365 and Microsoft’s HR Business Insights unit — have embarked on a long-term project: the Organizational Spectroscope.
The Organizational Spectroscope combines digital communication data, such as email metadata (e.g., time stamps and headers), with more traditional data sources, such as job titles, office locations, and employee satisfaction surveys. These data sources are combined only in ways that respect privacy and ethical considerations. We then use a variety of statistical modeling techniques to predict and explain outcomes of interest to employees, HR, and management.
Predicting team satisfaction
To illustrate the potential of these new data and methods, we analyzed the aggregate email activity patterns of teams of US-based Microsoft employees to predict their responses to an annual employee satisfaction survey. To protect individual employee privacy, only email metadata was used (i.e., no content) and all identifiers were encrypted. Email activity and survey responses were aggregated to the manager level; only managers with at least five direct reports were included, and only these aggregated results were analyzed. Our predictions therefore apply only to teams of employees who share the same manager, not to individuals.
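The aggregation step described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the column names, the 9-to-5 working-hours window, and the use of a truncated SHA-256 hash to stand in for identifier encryption are all assumptions made for the example.

```python
import hashlib
import pandas as pd

# Hypothetical email-metadata log: sender and timestamp only, no content.
emails = pd.DataFrame({
    "sender": ["alice", "bob", "carol", "alice", "dave"],
    "sent_at": pd.to_datetime([
        "2016-03-01 09:15", "2016-03-01 22:30", "2016-03-02 10:05",
        "2016-03-02 21:45", "2016-03-02 11:20",
    ]),
})

# Hypothetical org chart mapping each employee to a manager.
org = pd.DataFrame({
    "employee": ["alice", "bob", "carol", "dave"],
    "manager":  ["m1", "m1", "m1", "m2"],
})

def pseudonymize(name: str) -> str:
    """One-way hash standing in for identifier encryption."""
    return hashlib.sha256(name.encode()).hexdigest()[:12]

# Join email log to org chart, flag out-of-hours messages (assumed 9-17
# working hours), and aggregate to the manager (i.e., team) level.
df = emails.merge(org, left_on="sender", right_on="employee")
df["out_of_hours"] = ~df["sent_at"].dt.hour.between(9, 17)
df["manager"] = df["manager"].map(pseudonymize)

team_stats = df.groupby("manager").agg(
    n_emails=("sender", "size"),
    frac_out_of_hours=("out_of_hours", "mean"),
)
print(team_stats)
```

Only the pseudonymized, team-level aggregates (here, `team_stats`) would be carried forward into any analysis; the row-level log is discarded.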
We focused on three survey questions: did teams have confidence in the overall effectiveness of their managers, did they think that different groups across the company collaborated effectively, and were they satisfied with their own work-life balance?
We started by examining the data and found that the vast majority of teams were pretty happy. Although this result is encouraging, as a practical matter HR managers are less interested in the large majority of happy teams than in identifying the small minority of unhappy teams. After all, it is the latter group on which HR needs to focus its resources. Rather than trying to predict the satisfaction level of every team, therefore, we focused on predicting just the teams in the bottom 15%, i.e., the least satisfied teams.
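Framing the problem this way turns a prediction of satisfaction levels into a binary classification task: label the bottom 15% of teams and try to identify them. A minimal sketch, with synthetic scores standing in for the real survey data:

```python
import numpy as np

# Hypothetical manager-level satisfaction scores (higher = more satisfied);
# synthetic data for illustration only.
rng = np.random.default_rng(0)
scores = rng.normal(loc=70, scale=10, size=1000)

# Teams below the 15th percentile become the positive class to predict.
threshold = np.quantile(scores, 0.15)
low_satisfaction = scores < threshold

print(f"threshold = {threshold:.1f}, flagged teams = {low_satisfaction.sum()}")
```

The resulting `low_satisfaction` labels are what a classifier is then trained to predict from the email activity features.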
We considered two statistical models: first, a simple logistic regression model of the type that is widely used in quantitative social science; and second, a more complicated model from machine learning called a random forest [3]. Although random forests are generally less interpretable than standard regression models, making them less well suited to the explanation tasks typically found in the social sciences, they can capture nonlinearities and heterogeneous effects that linear models ignore, and therefore often perform better at prediction tasks.
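The kind of comparison described above can be sketched with scikit-learn. The data here are synthetic and deliberately contain a nonlinear interaction between two features, so they are only an illustration of why a random forest can out-predict a linear model, not a reproduction of the actual team-level features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for team-level features: the "unhappy" label depends
# on a nonlinear interaction between two features, which no linear
# decision boundary can capture.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] * X[:, 1] > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [
    ("logistic regression", LogisticRegression()),
    ("random forest", RandomForestClassifier(random_state=0)),
]:
    model.fit(X_tr, y_tr)
    # Precision: of the teams the model flags, how many are truly unhappy?
    precision = precision_score(y_te, model.predict(X_te), zero_division=0)
    print(f"{name}: precision = {precision:.2f}")
```

Precision is the relevant metric here because HR acts on the teams the model flags, so the cost of a false positive is wasted attention on a happy team.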
In our case the random forest performed much better: when it predicted that a team was in the bottom 15%, it was correct (across all three questions) between 80% and 93% of the time; in contrast, the linear model was correct at best 27% of the time. Critically, a "baseline" model that used only data on respondents' position and level in the company, with no email activity features, performed between 20 and 40 percentage points worse. In other words, the email activity data added large and significant value over and above the kind of data that HR managers already have (see Table 1).
Table 1. Precision of the full random forest model (column 3) compared with a standard logistic regression model (column 1) and with a random forest that excludes email activity but includes other features such as job category (engineer, product manager, sales, etc.) and level (column 2).
Table 1 also shows the particular features of email activity that were most predictive of low satisfaction. For work-life balance, it was the fraction of emails sent out of working hours: more is worse. For managerial satisfaction, it was manager response time: slower is worse. And for perceptions of company-wide collaboration, it was the size of the manager's email network: smaller is worse.
At first glance these findings may seem unsurprising, but this reaction misses the point. To see why, consider work-life balance. Although it makes sense that sending an unusual volume of email outside of normal working hours would correspond to low satisfaction, it would have made equal sense that satisfaction was related to the overall volume of email sent or received, or to the relative distribution of email over days of the week. But none of these other factors were useful for predicting low satisfaction. The point, therefore, is not so much about finding results that are surprising and counterintuitive, but about ruling out all the plausible, intuitive explanations that are not in fact correct.
Another non-obvious finding is that different types of teams had different thresholds for what counted as a “bad” volume of out-of-hours email. The number of out-of-hours emails that predicted an unhappy sales team, for example, was different from that of an unhappy engineering team. Again, this result isn’t surprising (once you know it), but it would have been difficult to guess in advance. This result also highlights the advantages of using a complicated model over a simple one: although in general we believe that, all else equal, simple models are better, when effects are highly context-dependent, complex models can shine.
Lessons for managers and for science
Insights like these are of immediate interest to both employees and managers. In particular, because predictions based on email sending behavior can be made in real time, HR can obtain more timely feedback than surveys allow. Moreover, modern statistical modeling approaches such as ours can help managers in complex situations where many different factors could be at play, for example by showing which of many plausible explanations are supported by the evidence, and by cautioning against "one size fits all" solutions. Finally, employees could also benefit from tools that help them quantify their work activity, in the same way that personal fitness trackers help them quantify physical activity.
More generally, our results show how the combination of novel sources of digital data and modern machine learning methods — that is, computational social science — can yield insights that would not be available with traditional data sources and methods. Over time, we hope to expand this approach from the specific case of predicting team satisfaction to a much wider range of questions regarding teams, divisions, and even entire organizations.
Finally, it is worth emphasizing that deriving these kinds of insights requires a lot of care. To perform our analysis, we combined three datasets (email activity, the org chart, and survey results) that were collected in different ways at different times by different people. Joining these datasets in a manner that respected privacy and ethical concerns required significant effort and cooperation across teams, which in turn required us to clearly specify, and justify, our substantive research questions and goals. Likewise, the realization that we needed to focus on only the least satisfied teams required us to think carefully about the structure of the data and about our research questions. For all the excitement about "big data," in other words, computational social science works well only when powerful computation is matched with careful social science.
1. Burns, T. and Stalker, G.M., The Management of Innovation. London: Tavistock Publications, 1961.
2. Lawrence, P.R. and Lorsch, J.W., Organization and Environment: Managing Differentiation and Integration. Boston: Division of Research, Graduate School of Business Administration, Harvard University, 1967.
3. Breiman, L., Random forests. Machine Learning, 2001. 45(1): 5–32.