Am I seriously asking this question?
The evidence on the business case for diversity appears to be overwhelming. A famous 2020 McKinsey report, entitled “Diversity Wins”, concludes that “the business case for gender and ethnic diversity in top teams is stronger than ever”. A 2021 study commissioned (but not conducted) by the Financial Reporting Council, a UK regulator, finds that “Higher levels of gender diversity of FTSE 350 boards positively correlate with better future financial performance.”
Given these results, many people think asking the question is a waste of time. One Chair, interviewed in the FRC study, raged that “There have been enough reports … statistics and … evidence-based research to stop talking about it and get on with it.” Another viewed the evidence as so overwhelming that he tells executive search firms “I don’t want to see any men. I don’t care if they’re Jesus Christ. I don’t want to see them.”
But a new study has uncovered an inconvenient truth. It’s documented a strong negative relationship between gender diversity and shareholder value. The authors found that, over the last 20 years, the proportion of women on FTSE 100 boards improved by 26 percentage points, compared to 15 for the S&P 500. But total shareholder return for the FTSE 100 has been only +136% over the same period, versus 221% for the S&P 500. Commenting on the findings, the lead researcher said: “This is the clearest evidence yet that having more women on boards destroys shareholder value. Every additional 10 percentage points of female representation costs 77% in cumulative shareholder returns. Attaining gender parity on boards would reduce the value of UK companies by a cumulative £2.3 trillion.”
How do we react to that new study?
We immediately want to tear it apart, because we don’t like the findings. And it’s not hard to invalidate. There are dozens of other potential reasons why the S&P 500 might have outperformed the FTSE 100, which have nothing to do with gender diversity — known as omitted variables. Yet when a report’s findings are to our liking, we accept it uncritically, share it widely, and quote it enthusiastically. If it claims that diverse companies perform better, we conclude that diversity must be the reason for the superior performance, and forget about those pesky omitted variables.
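The omitted-variables problem is easy to see in a toy simulation. The sketch below uses made-up numbers (not data from any study): an unobserved factor drives both “diversity” and “returns”, so the two correlate strongly even though neither causes the other.

```python
import random

random.seed(0)

# Toy illustration of omitted-variable bias. An unobserved factor
# (say, sector mix) drives both board diversity and returns.
# Diversity has NO causal effect on returns here, yet the two correlate.
n = 1000
sector = [random.gauss(0, 1) for _ in range(n)]         # omitted variable
diversity = [s + random.gauss(0, 1) for s in sector]    # correlated with sector
returns = [2 * s + random.gauss(0, 1) for s in sector]  # driven only by sector

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Strongly positive correlation despite zero causal effect
print(round(corr(diversity, returns), 2))
```

A regression of returns on diversity alone would report a large, highly significant coefficient; only controlling for the omitted factor reveals that the true effect is zero.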
This is an example of confirmation bias. We lap up any study confirming what we’d like to be true, and reject anything that contradicts it. Indeed, confirmation bias is deeply rooted within us. Three University of Southern California neuroscientists hooked up students to an MRI scanner, and found that when they heard evidence that contradicted their political beliefs, their amygdala was triggered. That’s the same part of the brain that’s activated when a tiger attacks you and induces a “fight-or-flight” response. People respond to opposition — ironically, to diverse viewpoints — as if they’ve been attacked by a tiger.
The new study is indeed garbage, and fortunately it’s a fictional one (I thank Tom Gosling for coming up with the parody). But we should apply the same discernment to all studies, including those whose conclusions we like. We often use the phrase “Research shows that …” to imply that something is gospel. But the mere fact that research claims something is meaningless, because there’s huge variation in the quality of research. As a result, it’s nearly always possible to hand-pick a paper to show anything you’d like to show.
The McKinsey and FRC studies have been widely quoted despite being deeply flawed. For example, the McKinsey study has been shown to be irreplicable even with their chosen performance measure (EBIT) and preferred methodology. Moreover, there is no link between diversity and other performance measures — gross margin, return on assets, return on equity, sales growth, or total shareholder return — or when using more established methodologies (e.g. considering all the data, rather than only the top and bottom quarter of diversity). Note that this study is on ethnic diversity, so I have a strong personal interest in its results being true, but they are not.
The FRC study, conducted by the London Business School Leadership Institute and the consultancy SQW, is even more problematic. Not only does it make basic methodological errors — for example, ignoring dividends when calculating shareholder returns — but the reporting of their results is disingenuous. The Executive Summary claims that “Higher levels of gender diversity of FTSE 350 boards positively correlate with better future financial performance (as measured by EBITDA margin).” But when you look at the actual results, they run 90 regressions relating diversity to EBITDA margin, of which not a single one is significant. The claim is strongly contradicted by the authors’ own analysis.
The main body claims that “These results suggest that gender-diverse boards are more effective than those without women” (emphasis in original). This claim comes just after the EBITDA margin and stock return results. There are 90 EBITDA tests of which 0 are significant, and 90 stock return tests of which 7 are significantly positive and 2 are significantly negative. Combining “these results” together, they find 7 positive and 2 negative results out of 180.
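To put those counts in context, here is a back-of-the-envelope calculation (assuming, purely for illustration, a 5% significance threshold and independent tests — neither of which the report guarantees): even if diversity had no effect at all, around 4 or 5 of 90 tests would come up “significant” by luck alone.

```python
from math import comb

# If diversity truly had no effect and the 90 stock-return tests were
# independent, each has a 5% chance of a "significant" result by chance.
# (Illustrative assumptions: 5% threshold, independent tests.)
n_tests, alpha = 90, 0.05

expected_false_positives = n_tests * alpha
print(expected_false_positives)  # spurious "significant" results expected

# Probability of seeing 7 or more significant results purely by chance,
# from the Binomial(90, 0.05) distribution:
p_7_or_more = sum(
    comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)
    for k in range(7, n_tests + 1)
)
print(round(p_7_or_more, 2))  # well above any conventional significance level
```

On these assumptions, 7 significant results out of 90 is entirely unremarkable, which is why a handful of scattered positives cannot support the headline claim.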
Not only do the authors claim a correlation when there is none, but they argue their study goes beyond correlation to document causation — that greater diversity causes better firm performance. They use the phrase “effect(s)/impact(s) of (gender) diversity” 38 times, and contrast their report with previous research that could not document causality. The foreword by the CEO of the FRC states “There has been a good deal of research about the business case for diversity. Often a correlation is found, but not necessarily full causality … I am pleased to see the analysis from this research builds the case for diversity across the board.” The Executive Summary claims “These results are significant because, for decades, researchers have largely failed to confirm any causal link.”
But despite the results being non-existent, many articles about the report (by organisations I greatly respect) believed the authors’ claim and popularised it. Examples follow below:
· “Diverse boards lead to better corporate culture and performance” (title of FRC’s press release announcing its report)
· “Boardroom diversity improves financial performance” (title of Minerva Analytics article)
· “The effort to diversify boards pays benefits in terms of boardroom culture and performance” (Linklaters)
· “The case for diverse boards was given further clout after new research published by the FRC supported the thesis that it leads to better corporate culture and performance” (opening sentence of ICAEW article)
The Delusion of Rigorous Research
Why have so many people taken the study’s claims at face value when they are clearly contradicted by the results? One reason could indeed be confirmation bias. A second could be the “delusion of rigorous research”. This is a phrase coined by Professor Phil Rosenzweig, in his excellent book The Halo Effect, to describe how authors exaggerate the sophistication of their methodology to browbeat the reader into thinking that the study must be correct. Indeed, an unusual amount of space is devoted to describing what should be standard regressions, including going beyond the Hausman (1978) specification test (which is unnecessary in a standard regression to begin with) to using the Mundlak (1978) test. But it doesn’t matter what specification test you use if you don’t include dividends in your measure of stock returns, or you misreport your results.
The authors also emphasise the rigour of their research. The foreword by the LBS Leadership Institute promises “The Leadership Institute at London Business School treated this opportunity accordingly, with the rigour and care of our best scholarly research.” The foreword by SQW argues: “The research design and analysis used in this report is both innovative and rigorous.” The foreword by the Dean of LBS claims “The academic rigour with which data was collected and analysed yields new insights on the impact of diversity and how to make diversity work. We all stand to learn from the authors’ methodology and findings.”
But we do not learn from a methodology that involves misreporting one’s results. With such strong claims of rigour, made so prominently by a reputed research institution, it is not surprising that people have accepted the conclusions uncritically.
To the authors’ credit, they do include caveats in the paper that not all their findings are statistically significant, and that some results are correlations rather than causation. However, buying gifts for your spouse on some days does not make up for mistreatment on other days. All claims quoted in this article are made in prominent places, such as the Foreword and Executive Summary, or in bold. Not every reader will have time to read the entire 132-page report and see the caveats, which is why articles by serious institutions took the claim at face value.
The Broader Literature
I have analysed one particular paper to highlight how easy it is to completely misrepresent findings on a topic that people feel strongly about. It is to my chagrin that the report was co-authored by an institute at my own employer, but the scientific process is about being equally discerning about all research, regardless of who wrote it. Just as diverse views by Eugene Fama and Richard Thaler at the University of Chicago have significantly illuminated the debate on whether markets are efficient or irrational, I hope that diverse views from LBS help shed light on the important topic of diversity.
But what does the broader literature find? The diversity movement should not be hamstrung by one flawed paper. The best way to find scientific consensus on any issue is to survey the literature, taking into account the quality of each paper rather than simply counting the number of papers that find a result. This is why it is particularly valuable for leading academics to conduct such surveys. Professor Katherine Klein of Wharton summarises the academic consensus in a non-technical article, concluding:
Research conducted by consulting firms and financial institutions is not as rigorous as peer-reviewed academic research. Here, I dig into the findings of rigorous, peer-reviewed studies of the relationship between board gender diversity and company performance. Spoiler alert: Rigorous, peer-reviewed studies suggest that companies do not perform better when they have women on the board. Nor do they perform worse. Depending on which meta-analysis you read, board gender diversity either has a very weak relationship with board performance or no relationship at all.
A newer survey by Professor Jesse Fried of Harvard, on Nasdaq’s recent diversity rules, concludes:
While Nasdaq claims these rules will benefit investors, the empirical evidence provides little support for the claim that gender or ethnic diversity in the boardroom increases shareholder value. In fact, rigorous scholarship — much of it by leading female economists — suggests that increasing board diversity can actually lead to lower share prices. Adoption of Nasdaq’s proposed rules would thus generate substantial risks for investors.
If the evidence is so weak, why might it be that claims of a “business case for diversity” are so widespread? As Professor Alice Eagly of Northwestern explained in her Presidential Address to the Society for the Psychological Study of Social Issues, entitled “When Passionate Advocates Meet Research on Diversity, Does the Honest Broker Stand a Chance?”:
From advocacy and policy perspectives, there is an obvious appeal in simple, straightforward claims that diversity in groups and organizations produces performance gains. Given this appeal, simplistic renditions of scientific findings on diversity continue to find favor among diversity’s advocates and the legions of practitioners and consultants engaged in helping organizations meet their diversity goals. Presented as if they were evidence-based findings, broad claims about the advantages of diversity for group and organizational performance appear regularly in promotional materials of consultants and advocates.
Is There A Business Case For Diversity and Inclusion?
So do the consistent scientific findings of no positive link between diversity and firm performance mean that we should stop trying to pursue diversity?
Absolutely not. The crux is that many of these studies don’t actually measure diversity. Most arguments for diversity argue that what matters is diversity of thought rather than demographic characteristics, since it’s the former that creates a breadth of perspectives and guards against groupthink. Indeed, a recent discussion paper by the Financial Conduct Authority (another UK regulator) and the Bank of England writes:
“With respect to diversity, we focus on ‘diversity of thought’, also called ‘cognitive diversity’. … We propose to define diversity of thought as bringing together a range of different styles of thinking among members of a group. Factors that could lead to diverse thinking could include, but not limited, to different perspectives, abilities, knowledge, attitudes, information styles, and demographic characteristics, or any combination of these.”
Thus, there may well be a business case for diversity, but existing studies haven’t found one due to blunt classifications based on only gender and ethnicity. This is a very narrow measure of true cognitive diversity, so these results have no implications for the value of diversity of thought.
Moreover, what matters isn’t just to recruit people with a broad set of characteristics (be they gender, ethnicity, or socioeconomic, regional, or experiential background), but to make them feel valued and encourage them to contribute their diverse perspectives. This is why most initiatives stress the importance of diversity and inclusion. Not only do crude measures such as ethnicity statistics ignore the myriad other components of diversity, but they completely fail to capture inclusion.
While I am unaware of any study that directly measures D&I and links it to firm performance, there is suggestive evidence that such a link may exist. One of my own studies shows that the 100 Best Companies to Work For in America delivered total shareholder returns that beat their peers by 2.3–3.8%/year over 1984–2011 (89–184% cumulative). While the Best Companies List measures employee satisfaction in general, rather than D&I in particular, several of the five dimensions it captures (credibility, fairness, respect, pride, and camaraderie) are linked to D&I.
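The cumulative figures quoted above follow from compounding the annual outperformance. A quick check (treating 1984–2011 as 28 years, which matches the numbers in the study):

```python
# Compound 2.3%/year and 3.8%/year annual outperformance over 1984-2011
# to verify the quoted 89-184% cumulative range.
years = 2011 - 1984 + 1  # 28 years inclusive

low = (1.023 ** years - 1) * 100   # cumulative % at 2.3%/year
high = (1.038 ** years - 1) * 100  # cumulative % at 3.8%/year
print(round(low), round(high))  # → 89 184
```

The arithmetic checks out: a seemingly modest 2.3–3.8% annual edge compounds into roughly 89–184% over nearly three decades.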
Interestingly, this study was recently independently replicated by other researchers. They found that the results continue to hold for the original 1984–2011 period, even after controlling for new risk factors and firm characteristics discovered after my study. More importantly, they showed that the results also hold in the decade following its publication. This is surprising, because the returns to a trading strategy typically fall significantly after appearing in a scientific journal, because investors start exploiting the strategy. Given the emphasis on D&I (and ESG more broadly), it seems strange that the returns to the Best Companies remain strong. This may be because investors are focusing on blunt diversity metrics, given the claims of many studies, rather than true D&I.
Is There A Case For Diversity?
The misrepresentation of the business case for diversity is particularly disappointing since it may be that no business case is needed at all. Even without a business case for diversity, there are strong moral and ethical cases. Some people argue that you should choose the best person for the job, regardless of characteristics. However, others believe that, due to systemic and chronic discrimination against minorities, companies have a role to play in levelling up by actively recruiting under-represented groups. Perhaps doing so might not maximise profits, but many shareholders and stakeholders are willing to accept that trade-off — just as consumers buy organic food, despite its greater cost, due to non-financial considerations.
Moreover, study-based arguments for diversity are problematic because they relegate dimensions of diversity for which no study exists. I know of no rigorous evidence on the business case for hiring people with disabilities, but again there is a strong moral and ethical case. Making more money is not the only reason to pursue an initiative.
The Bigger Picture
Beyond the specific issue of whether there’s a business case for diversity, we can learn broader lessons on how to evaluate research from this example. Here are a few; for a more detailed discussion, please see my cut-out-and-keep guide on Evaluating Research.
1. Check whether a study has been published in a top peer-reviewed journal. The peer review process involves the world’s experts on a topic scrutinising a paper. This prevents methodological errors, such as omitting dividends when calculating stock returns, and claims not backed up by the results. (Neither the McKinsey nor the LBS/SQW study was published in a peer-reviewed journal.) There is a vast range in the stringency of reviewing standards, so journal quality is an important issue. The Financial Times has a list of the top 50 journals in business.
2. Look beyond headlines to the actual results. This involves not only checking that the results are what the headlines claim, but also how the authors measured the variables of interest. If a study concludes that X improves Y, verify how it measured X and Y. This is particularly important for papers on ESG topics — they’re hard to measure, so the value of a study hinges on how it measured the key variables. For example, a claim that corporate culture boosts firm performance should not be believed without first checking how the study measured corporate culture.
3. Beware of “A study by X university”. Academic studies are often referenced as “A study by Oxford University” or similar wording. However, universities don’t release studies; people do. A company only publishes a study after substantial internal review, so it indeed makes sense to refer to a “McKinsey study” (however, it still won’t have been externally peer reviewed). In contrast, anyone at a university can release a study themselves without any internal approval. Sometimes the authors may not be scientific researchers (e.g. professors or associate/assistant professors) but adjunct lecturers whose main role is teaching. This doesn’t mean the research is definitely wrong, but does mean that “A study by X university” shouldn’t be automatically afforded the mantle of academic rigour. Several people who believed the headlines of the LBS/SQW study told me that, since it was an “LBS study”, they thought they could trust it. What’s more relevant is whether the study has been published in a top peer-reviewed journal.
4. Beware confirmation bias. While doing the above checks should not be laborious, they still take some time. It’s not feasible to do them for every study we read. It’s more important to be sceptical when we’d like the results to be true (as our own confirmation bias might be kicking in) or the conclusions go with public opinion (as the authors may have been tempted to draw these conclusions to make their study popular).
5. Beware researcher bias. Although it may not seem like it, researchers are humans too. They are subject to the same biases as the rest of us. They have strong incentives to claim results that people want to be true so that their study will be widely cited and shared. Such a bias is particularly problematic when the research is used as a guide to policy. The Financial Reporting Council should be lauded for commissioning academics to conduct a study on the very important topic of diversity (rather than, say, relying exclusively on internal analysis). However, giving the LBS Leadership Institute, SQW, and the Dean of LBS each an opportunity to write three separate forewords increased the incentive to claim eye-catching conclusions. In contrast, when the Department for Business, Energy and Industrial Strategy commissioned PwC and me to conduct studies into the alleged misuse of share buybacks, and the potential distortive effect of executive pay on investment, we were only included in the acknowledgements each time. This reduced any incentive to misrepresent the results, and is good practice that should be followed for a study commissioned by policymakers. On the other hand, last year’s study conducted by EY for the European Commission on Sustainable Corporate Governance had EY’s logo on the front cover, and has been widely criticised for containing fundamental flaws.