The Certain Uncertainty of University Rankings

Richard Harris
12 min read · Jun 21, 2019


(Draft version — a work in progress. Last updated June 24)

Introduction

Soon after finishing my PhD, I wrote a book on geodemographics, the subject of my thesis. Geodemographic classifications work on the idea that you can type in a postcode and out will come a classification of the neighbourhood in which you live: a neighbourhood described as “Townhouse cosmopolitans”, for example (see the ACORN structure). As someone who most readily identifies as a critical realist and also as a quantitative geographer, I find these classifications fascinating: they are not ‘made up’, in that they are based on measured characteristics of people and places, yet they are also clearly subjective and partial: the end product of many analytical decisions, each of which could have been made differently and each of which could have produced a different representation of what they seek to measure.

League tables of University Rankings provoke the same curiosity within me. In fact, they have a lot in common. Both begin with a concept that sounds like it ought to be measurable but is, in practice, amorphous and ambiguous: the idea of a neighbourhood type for geodemographics; the idea of ‘quality’ or ‘excellence’ for a University. Both need to gather data to try to measure that concept: census or consumer data for geodemographics; National Student Survey data, citation scores, research income, …, for Universities. Both then need to take all these data and somehow reduce them to a single measurement: a category type for geodemographics; an overall score for Universities. And, finally, both are sold as products to make money for their developers (for example, by selling the data through consultancy with relevant businesses).

The process of taking a concept or theory, seeking to make it measurable, collecting the data to do so, and then looking at the results is, of course, not unique to these businesses; it is the heart of what I do as a quantitative social scientist. However, social scientists are taught to be critical and to be open. When I look at measuring ethnic segregation, for example, I am aware that there are many ways the idea of segregation can be conceptualised and therefore many ways it can be measured; and that, even if we agree on what measurement is appropriate for a particular context and study, the measures of ethnicity themselves (e.g. White British, Asian Bangladeshi, etc.) are problematic in terms of who and what they include, and who and what they do not. Consequently, I do not pretend that my results are definitive. Instead, they contribute to debate and are, quite rightly, open to critique from others who can identify where the analysis is lacking, and the consequences of it being so: not only for the analysis but more particularly for the people that the data represent. I hope my measurements are informative and contribute to social debate but I do not claim certitude.

The nature of uncertainty

The idea of uncertainty is made explicit in statistics through ideas such as confidence intervals. When, for example, a polling firm surveys current support for Brexit from a sample of the population (approximately 42% at the time of writing), it does not just give the ‘point estimate’ (here, the value of 42%); it also gives an interval within which the ‘true value’ is likely to lie (typically 2 or 3 percentage points either side of the point estimate for electoral surveys). That interval recognises that the data are uncertain; in this example, because those asked are only a sample of all those who could have been. Uncertainty can creep into data in other ways too: through the wording of a question, for example, or the vagueness of what is measured; or because the process of taking something that is not itself clearly defined, somehow measuring it and then producing a summary score is inherently subjective, bordering on arbitrary. For the sort of ‘World University Rankings’ produced by the THES and others, such as the QS World University Rankings, the greatest source of uncertainty is that there is no agreement on what is being measured or how it should be measured, so what results is ultimately a matter of choices that could have been made differently.
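
To make the polling example concrete, here is a back-of-the-envelope calculation. It is a sketch only: the survey’s actual sample size is not given above, so a typical figure of 1,000 respondents is assumed. It shows where a margin of roughly three percentage points comes from.

```python
import math

p_hat = 0.42  # point estimate: 42% support
n = 1_000     # assumed sample size, for illustration only

# 95% confidence interval for a sample proportion: p ± 1.96 * sqrt(p(1 - p) / n)
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"{p_hat:.0%} ± {margin:.1%}")  # roughly 42% ± 3.1%
```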

Such rankings are not without meaning. They clearly do measure something about Universities. Taking the THES’s World University Rankings as an example, their methodology shows that they use data about Teaching, Research, Citations, International Outlook and Industry Income. In many respects, it would be fine if they stopped there, simply making those data available for others to use and interpret as they wished. It would not resolve the problem that those data are themselves uncertain: the THES data include, for example, an annual Academic Reputation survey, which, being a survey, will contain sampling error (and what exactly is academic reputation, anyway? Is it not somewhat circular: Oxbridge has a good reputation because it has a good reputation?). But at least the users could be mindful of such considerations. Unfortunately, they do not stop there. Instead, they use the data to produce “The Times Higher Education World University Rankings 2019 [of] more than 1,250 universities, making it our biggest international league table to date.”

You cannot rank multidimensional data!

As my friend and Professor of Geocomputation, Chris Brunsdon, sometimes reminds people, you cannot rank multidimensional data (i.e. a data set containing more than one variable). However, that does not stop people from trying! The best you can do is to use what will always be an arbitrary function to translate (combine) the variables into a single score. The easiest way to do this is to add the variables together, but doing so assumes that each should count equally in your measurement of the University. The THES does not do this; instead, it gives a weight of 30% to its Teaching score, 30% to Research, 30% to Citations, 7.5% to International Outlook and 2.5% to Industry Income. The obvious question to ask is: who decided upon these weights and what happens if you change them? The obvious answer, well demonstrated by Chris Brink in his book The Soul of a University, is that the overall score will change and so, potentially, will the ‘league table’ of University Rankings.
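
To illustrate the point, here is a minimal sketch. Only the weights come from the THES methodology; the two universities and their pillar scores are invented for the example. It shows how a change of weights can reverse the order of two institutions.

```python
# Pillars and weights as described in the THES methodology.
pillars = ["Teaching", "Research", "Citations", "International Outlook", "Industry Income"]
thes_weights = [0.30, 0.30, 0.30, 0.075, 0.025]
equal_weights = [0.20] * 5  # an alternative: every pillar counts equally

# Hypothetical pillar scores out of 100, invented for illustration.
universities = {
    "University A": [70, 70, 70, 95, 95],
    "University B": [75, 75, 75, 55, 55],
}

def composite(scores, weights):
    """Weighted sum of the pillar scores."""
    return sum(s * w for s, w in zip(scores, weights))

for name, scores in universities.items():
    print(name,
          round(composite(scores, thes_weights), 1),   # A: 72.5, B: 73.0 -> B ranks higher
          round(composite(scores, equal_weights), 1))  # A: 80.0, B: 67.0 -> A ranks higher
```

Neither weighting is ‘wrong’; the order of the two institutions simply depends on a choice.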

A thought experiment

Imagine you gathered 10,000 people in a room. Call them experts, if you like. All broadly agree with the THES weightings described above but not exactly so. Instead, there is random variation from person to person. The question is, allowing for this variation, would it much affect the 2019 THES rankings of Universities, using the data published (to their credit) on their website?

Yes, as shown below. Along the x-axis I have plotted each University’s actual published rank position, and along the y-axis the range of ranks produced under the thought exercise, excluding the most extreme outcomes (the most unusual 5 per cent of cases). Only the first 250 Universities are shown: although the THES provides scores for far more, by grouping the rankings beyond that point it seems to tacitly acknowledge that they become unstable (and, if so, it is right: they do).

What the chart shows is that the very highest ranked Universities generally remain so even with some random changes to the weightings. This is not surprising: ‘the best’ will be all-rounders, scoring highly on each metric and therefore on the overall score, but even for them there is some change in position. The further down the rankings we go, the more unstable they become: the average range is about 26 places. For my own institution, Bristol, currently ranked 78th, the chart implies that it could reasonably be considered to rank anywhere between about 68th and 94th. Bear in mind that these are under-estimates of the actual uncertainty of the data and of the ranked scores because they take for granted that the THES’s weightings are correct overall and that there are no errors in the data or data processing that would add to the uncertainty. (Note also that these are not confidence intervals in the classic sense; they show the effects of randomly re-weighting the variables around the THES’s own weighting, not the effects of sampling.)
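
For readers curious how such an exercise might be coded, here is a minimal sketch under stated assumptions: the pillar scores are supposed to sit in a pandas DataFrame with hypothetical column names, and the size of the random disagreement between ‘experts’ (noise_sd) is a choice of mine, not a figure from the analysis above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical column names; the published THES pillar scores would need to be
# downloaded and tidied into this shape first.
PILLARS = ["teaching", "research", "citations", "international", "industry"]
THES_WEIGHTS = np.array([0.30, 0.30, 0.30, 0.075, 0.025])

def simulated_rank_intervals(scores: pd.DataFrame,
                             n_experts: int = 10_000,
                             noise_sd: float = 0.05) -> pd.DataFrame:
    """For each simulated 'expert', perturb the THES weights with random noise,
    recompute every university's composite score and rank, and then summarise
    the middle 95% of the ranks each university receives."""
    X = scores[PILLARS].to_numpy()
    ranks = np.empty((n_experts, len(scores)))
    for i in range(n_experts):
        w = THES_WEIGHTS + rng.normal(0, noise_sd, size=len(PILLARS))
        w = np.clip(w, 0, None)
        w = w / w.sum()                                  # keep weights non-negative, summing to 1
        composite = X @ w
        ranks[i] = (-composite).argsort().argsort() + 1  # rank 1 = highest composite score
    return pd.DataFrame({"lower": np.percentile(ranks, 2.5, axis=0),
                         "upper": np.percentile(ranks, 97.5, axis=0)},
                        index=scores.index)
```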

We can have more fun by asking what happens if those 10,000 experts do not agree at all. The answer is below. The rankings are now all over the place! That is not at all surprising, and no doubt many of these imagined rankings arise from particular combinations of variable weighting that most would consider absurd. However, it also makes an important point: the actual, published rankings are only certain if everyone agrees on the choice of variables used, the quality of the data, and the weights attached to them. If they do not, then lots of rankings are possible — many more than are shown here — each of which can raise or lower a University’s position, and with little concrete basis to say that this is or is not the University’s correct ranked position.
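
A sketch of how that complete disagreement might be modelled (again, an assumption on my part about how to represent it): instead of perturbing the THES weights, each simulated expert draws a weighting uniformly at random from all the non-negative weightings that sum to one.

```python
# Drop-in replacement for the weight line in the sketch above:
# every valid combination of weights is now equally likely.
w = rng.dirichlet(np.ones(len(PILLARS)))
```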

A call to debate

As a quantitative geographer, I value data and the information and knowledge that can be drawn from them, even if that knowledge is subjective (as, I suspect, it always will be). Whilst I share the concerns of others who see the public role of Universities, as places of critical enquiry and knowledge formation, being eroded by a managerial culture that sometimes seems more concerned with dubious metrics of outcomes than with more rounded understandings of what Universities should be, I am not necessarily against the use of data to evaluate strengths and weaknesses or to make comparisons between institutions. I am, however, concerned about poor comparisons and the University equivalent of ‘teaching to the test’: trying to game the system to maximise a ranking that has no widespread agreement of its worth.

There is a clear parallel between these league tables and those used elsewhere, including for schools. And the same sorts of criticisms apply. It is notable that for schools there has been a move away from league-table-style data towards providing a range of information that can be looked at in the round. School data are, however, provided by the DfE, which is accountable to the taxpayer in a way that the publishers of University Rankings are not. And, returning to geodemographics, there has been a move to make these formerly black-box classifications more open.

Yesterday, I called on the THES to convene a free conference or workshop, drawing together a range of practitioners and critics to discuss these types of rankings and their impact, both good and bad, upon people and places. For me, this is something of a litmus test: if the THES and others believe that these sorts of rankings are useful and beneficial to the sector then let’s talk about it. Let’s be open on all sides to listening. They can contact me at rich.harris@bris.ac.uk.

Touching a nerve

Since writing the above (originally in a slightly more strident form, for which I am happy to apologise), I have been contacted on Twitter and by email by one of the THES’s journalists. I am grateful to them for replying, although it is clear that I touched a nerve, as their response was somewhat personal, criticising me as a person rather than engaging more fully with the actual topic.

To be fair, perhaps I was being impatient about how quickly they might respond, and I do understand why they might feel got at and under the spotlight on Twitter. However, rankings are imposed on people by outside agencies that have decided that they know what it means to define the best Universities. It is hardly surprising that those on the receiving end of these decisions can feel aggrieved when they have not been consulted about them; and when the effects of those rankings are viewed as problematic, it is no wonder that people respond critically.

The idea of a free (or not-for-profit) conference is important because it is the free transfer and critique of knowledge that characterises academia, and is why much of the work academics do (in the form of peer review or organising conferences, for example) is done for free. More critically, many academics have little money to travel and often end up paying for conferences from their own salaries even though attending should reasonably be regarded as part of the job. As soon as a fee is imposed, it immediately creates issues of who is in the privileged position of being able to attend.

It was suggested that if I set out all the costings of a conference then the THES might, perhaps, consider it. That is a better offer than none, but it does push to one side any leadership that they (and others) should have in developing understanding of the consequences, good or bad, of the sorts of rankings they promote. It is worth keeping in mind that these rankings make money from the labour of others: it is academics whose outputs are measured and capitalised upon in the form of selling these rankings, and it is academics who are affected by whatever changes to, or cultures within, Universities the rankings affect and/or create. It would be somewhat perverse if those who did not ask for these rankings were asked to pay and/or to do the groundwork to better understand their consequences. That seems an oddly inverted view of where the responsibility should lie. And if not a conference, perhaps a special issue of the THES? I would be happy to (freely) offer suggestions for fair-minded contributors to the debate.

The problem, I suspect, is two-fold. First, the THES and others have a business model that needs these rankings, and therefore it is not in their financial interest to critique them. They may well be in the position of declining print sales and the recognition that selling data and consultancy provides the greater revenue stream, even if it risks harming the very sector that they are selling services about. Still, there is no reason to vilify them or to troll them on Twitter. That was by no means my intention and I apologise if it was the impression I gave. Actually, I think the THES has the potential and influence to guide and shape something beneficial to the academic community, but it needs to be engaged with and listening to the grassroots, and prepared to challenge the culture of managerialism and data metrics that it, itself, is a part of.

Second, a gap and a distrust have emerged between those on the ground who are affected by these types of rankings (in terms of how they shape the current and future direction of Universities, and the working practices and expectations placed upon staff), and those who promote and use them but who are not actually the ones at the coal-face of research and teaching. The THES is not responsible for this gap but it does have an opportunity to help close it.

All of this comes down to saying that dialogue and conversation are important. No act of quantification is neutral. It will empower some but not others. It will reflect particular values and objectives that are shared by some but also rejected by others. It will have pros as well as cons. At a time of disquiet within Universities and amongst University staff that is set within a wider debate about who and what Universities are for, to open a conversation about what is or is not measurable in terms of the characteristics of ‘the best’ Universities, and about how those measures impact upon people and places, is essential.

There is no reason for the providers of these rankings to be defensive about any of this, and I hope they will not be. Instead, they have a great opportunity to be a part of the debate. One way forward, for example, would be to develop a code of conduct setting out good practice when it comes to the collection, communication and comparability of University data. At a minimum, I would suggest that it involves being upfront about the uncertainty of rankings (a simple change is to provide an interval for each institution, an upper and lower bound rather than a precise rank, and/or to allow the data to be sorted by the user’s preferred criteria), working harder to measure real differences between Universities, and being far more consultative about what is measured, why and for whom.
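
As a sketch of what that first suggestion could look like in practice (reusing the hypothetical simulation from earlier, so the names and columns are illustrative rather than anything the THES publishes), an interval and a user-sortable table are only a few lines of code:

```python
# Publish a rank range rather than a single rank (hypothetical data).
intervals = simulated_rank_intervals(scores).round().astype(int)
intervals["published_rank"] = intervals["lower"].astype(str) + "-" + intervals["upper"].astype(str)

# Let readers rank institutions by the pillar they themselves care about.
scores.sort_values("teaching", ascending=False)
```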

Footnote

Encouraged by the response to this article (from the academic community, at least), I have had a go at writing some draft framing principles for good practice in the creation and application of University Rankings. It is intended as a collaborative document to which others can contribute: you can do so at bit.ly/2Rv4W0i. The INORMS Research Evaluation Working Group has also been consulting on a list of criteria for fair and responsible university rankings. See: https://inorms.net/wp-content/uploads/2019/05/fair-ranking-consultation-text-1.pdf
