3 Pitfalls of Predictive People Analytics

Ansaro · Published in Ansaro Blog · 5 min read · Sep 4, 2017

Our first move when starting Ansaro was to interview over 100 executives about their people analytics experiences. Most came from organizations that collected rich workforce data. Most came from organizations that had declared a deep commitment to using analytics to make a bottom-line impact. And yet, while some of these early efforts were dazzling successes, many ended in failure. This post is our attempt to synthesize 3 themes we heard time and time again about why people analytics efforts fail — and what can be done to avoid failure.

Predicting human behavior is tough. Predicting human behavior in the workplace is even tougher. It’s not something that can be accomplished overnight. It’s critical to define the problem, set realistic expectations, and have a plan for turning data science into behavioral change. We’ve seen a lot of projects that were doomed from the start because they failed to get these launch conditions right.

But when predictive people analytics is executed well, the payoff is huge. It goes far beyond dashboards and reports. The most successful people analytics projects provide tools that help employees make better day-to-day decisions. Better decisions around hiring, coaching, promotion, compensation, staffing… and the list goes on. And the results of these efforts are felt both in top-line growth and bottom-line profitability.

Pitfall 1 — Predicting the wrong outcomes

The single most glaring pitfall for a predictive people analytics effort is to focus on predicting outcomes that are easily measured — but which are not actually meaningful.

Performance ratings are the most common example of an easy-to-use-but-hard-to-trust measure. Many organizations suffer from the Lake Wobegon effect — where most of the employees are rated above average (social scientists refer to this as the illusory superiority bias). Employees at these organizations know that when four out of five people are rated “top bucket”, being top bucket doesn’t mean much. An accurate predictive algorithm will simply predict that four out of five employees will be top bucket — predictions that are meaningless.
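To make that concrete, here is a minimal sketch using scikit-learn, with made-up data: when 80% of labels are “top bucket”, a baseline that always predicts the majority label looks accurate while carrying no information.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)

# Made-up features and inflated ratings: 4 out of 5 employees are "top bucket".
X = rng.normal(size=(1000, 5))
y = rng.choice(["top", "not_top"], size=1000, p=[0.8, 0.2])

# A baseline that always predicts the majority label...
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)

# ...scores ~0.80 "accuracy" while telling us nothing about any employee.
print(baseline.score(X, y))
```

Any real model trained on labels like these will converge toward the same useless answer.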

Organizations that start to build analytical tools on top of data that their employees don’t trust are building on quicksand. This isn’t just a problem with performance reviews, which are notoriously subjective. Often, data that appears objective at first glance turns out to have a large subjective component upon closer inspection. Take quota achievement in sales organizations: many account executives argue that factors like territory size, lead quality, and pricing discretion mean their quota can’t be compared to anyone else’s.

What’s an organization to do? First, focus on outcomes that are more objective, even if they are recorded in systems that are harder to access. But don’t waste time searching for the perfectly objective outcome, because it doesn’t exist. Second, use multiple outcomes rather than focusing on a single one. Again, that often means more analytical and data engineering work, but it lets you understand tradeoffs across metrics — because no single metric perfectly measures any workplace outcome. Third, normalize outcomes for the subjective factors that you can control for. For example, if you’re predicting quota achievement, normalizing by territory size will increase the reliability of the predictive algorithm.
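As a rough illustration of that last point, here is a minimal pandas sketch (the table and column names are hypothetical) of normalizing quota achievement by territory size before feeding it to a model:

```python
import pandas as pd

# Hypothetical sales data; column names are illustrative.
reps = pd.DataFrame({
    "rep":            ["a", "b", "c", "d"],
    "quota_achieved": [1.2e6, 0.9e6, 2.1e6, 0.7e6],  # dollars closed
    "territory_size": [400, 250, 900, 200],           # addressable accounts
})

# Raw achievement rewards big territories; per-account achievement is a
# fairer outcome for a predictive model to target.
reps["achieved_per_account"] = reps["quota_achieved"] / reps["territory_size"]
print(reps.sort_values("achieved_per_account", ascending=False))
```

The same pattern applies to lead quality, pricing discretion, or any other factor you can measure: divide it out (or regress it out) before modeling.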

Pitfall 2 — Working in silos

In most enterprises, workforce data is scattered across systems:

  • Pre-hire data in an ATS (and plug-ins like assessment and work sample systems)
  • Direct productivity data in an ERP and CRM
  • Tenure and position data in an HRIS
  • Salary and benefits data in a Compensation/Payroll system
  • Performance reviews in a Performance Management system
  • Skill and certification data in an LMS
  • … and many others

Most people analytics initiatives start and end within just one system. That’s the easiest approach, but the least fruitful in the long term. Figuring out how to drive better workforce outcomes almost always requires crossing silos:

  • To make better hiring decisions, we need to understand how tenure and productivity are associated with pre-hire experiences and attributes — so we should link ATS, ERP, and HRIS data (a minimal join sketch follows this list).
  • To make better compensation decisions, we need to understand how pay changes are associated with retention and performance — so we should link compensation data with HRIS and Performance Review data.
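In the ideal case, a shared employee ID makes these joins straightforward. Here is a minimal pandas sketch, with hypothetical extracts from an ATS and an HRIS (real exports will differ):

```python
import pandas as pd

# Hypothetical system extracts; tables and columns are illustrative.
ats  = pd.DataFrame({"employee_id": [1, 2, 3],
                     "source": ["referral", "job_board", "referral"]})
hris = pd.DataFrame({"employee_id": [1, 2, 3],
                     "tenure_months": [26, 7, 31]})

# Join on the shared ID, then relate pre-hire attributes to outcomes.
joined = ats.merge(hris, on="employee_id", how="inner")
joined["stayed_one_year"] = joined["tenure_months"] >= 12
print(joined.groupby("source")["stayed_one_year"].mean())
```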

The technical challenges in bridging data across systems are real. Much of the time, there is no unique identifier for an individual’s data that persists across systems. And for large enterprises where there may be 10 people with the same name, there isn’t an obvious alternative to an ID number.

But with a bit of work, “fuzzy matching” on multiple variables — name, location, position — can link employees across systems with high (but not perfect) confidence. And since we’re building predictive systems, which are inherently probabilistic, reliably but imperfectly linked data can work just fine.
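A minimal sketch of the idea, using only Python’s standard library (the field weights and the acceptance threshold are assumptions to tune against your own data; dedicated libraries like recordlinkage or rapidfuzz go further):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Simple string similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict) -> float:
    # Weighted blend across fields; the weights are illustrative guesses.
    return (0.5 * similarity(rec_a["name"], rec_b["name"]) +
            0.3 * similarity(rec_a["location"], rec_b["location"]) +
            0.2 * similarity(rec_a["position"], rec_b["position"]))

ats_rec  = {"name": "Jon Smith", "location": "Chicago, IL",
            "position": "Account Exec"}
hris_rec = {"name": "John Smith", "location": "Chicago",
            "position": "Account Executive"}

# Accept the link only above a threshold chosen by inspecting real matches.
print(f"{match_score(ats_rec, hris_rec):.2f}")
```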

We’ve seen these linkage efforts pay big dividends in practice:

  • By mapping tenure to job applicant data, a US bank doubled its ability to predict which applicants would stay over a year in high-turnover roles.
  • By connecting position and calendar metadata, a global beverage company identified multiple quick wins to improve its meeting efficiency and decision-making processes.
  • By joining performance/potential ratings, pay, promotion, and external compensation benchmarking data, a European investment bank was able to use pay raises to measurably improve retention for the first time.

Pitfall 3 — Only thinking about rows and columns

Most BI and visualization tools only work with data that is already formatted in rows and columns (what we’ll refer to as “structured” data). As a result, many managers unconsciously limit themselves to structured data when they think about what can be used to predict workforce outcomes. But there is a lot of valuable unstructured data that goes beyond what can be sliced and diced in Excel:

  • Cover letters and writing samples: natural language processing (NLP) can be used to identify common themes and assess writing style across thousands of documents, allowing enterprises to understand what topics and styles are correlated with outcomes.
  • Hiring manager notes: NLP can be used to identify discrepancies in decision-making (“Candidate X seems dishonest” and “I recommend hiring Candidate X”), which can help companies figure out who their best interviewers are and why they outperform.
  • Communications and calendar metadata: social network analysis can help us understand the strength of employees’ ties within a company, based on who they meet and communicate with — and changes in employees’ networks have direct implications for productivity and retention (see the sketch after this list).
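To make that last bullet concrete, here is a minimal sketch using the networkx library (the meeting counts are invented):

```python
import networkx as nx

# Hypothetical meeting metadata: one weighted edge per pair of colleagues.
meetings = [("ana", "bo", 12), ("ana", "chen", 7), ("bo", "chen", 2),
            ("chen", "dee", 9), ("dee", "ana", 1)]

G = nx.Graph()
G.add_weighted_edges_from(meetings)

# Degree centrality is a crude proxy for how connected each employee is;
# tracking how it changes over time is where the retention signal lives.
print(nx.degree_centrality(G))
```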

A few years ago, the tools and expertise to tap these data types were rare. Today, there are a variety of tools and frameworks, many of them free and/or open source, that can be used to manipulate and analyze unstructured data at scale.
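For instance, a few lines of scikit-learn can pull rough themes out of a pile of documents. A toy sketch (the snippets below are made up, and a real corpus would run to thousands of documents):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "I led a team that shipped a data pipeline under deadline.",
    "Customer relationships are where I add the most value.",
    "I enjoy mentoring junior analysts and building team culture.",
    "I rebuilt our reporting pipeline and cut its refresh time in half.",
]

# Vectorize the text, then factorize into 2 rough "themes".
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)
nmf = NMF(n_components=2, random_state=0).fit(X)

# The top-weighted words give each theme a human-readable label.
terms = tfidf.get_feature_names_out()
for i, component in enumerate(nmf.components_):
    top = [terms[j] for j in component.argsort()[-4:][::-1]]
    print(f"theme {i}: {top}")
```

Correlating per-document theme weights with hiring or performance outcomes is then an ordinary modeling exercise.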

All of this comes back to one idea: it pays to put in the effort to avoid these pitfalls when building predictive tools for people analytics.
