The dangers of assuming data are ‘open’ when using them to measure performance and success.

Alick Deacon
Open Knowledge in HE
5 min read · May 28, 2020

My job is almost, but not quite, unique. Across the UK, there are just five universities with an active post like mine, and six others that have run such a post in the past. So what is it, and why is it so rare? As an Innovations & Partnerships Fellow, I manage business engagement, research commercialisation and impact in the University of Manchester’s Physics & Astronomy Department. The post is part-funded by public money (from the Science & Technology Facilities Council), with the rest made up by the University. When the role was first created in 2014, it was on a trial basis: my contract was initially for one year, with an extension to a second year if successful. That I am still here, six years later, might suggest that the answer to the question of what success looks like is self-evident. However, it is incumbent on me to demonstrate success, and to do so I am reliant on various forms of open data.

Performance Metrics

I report my success using measures including:

  • research income (particularly from industry),
  • academic engagement with industry,
  • number of companies engaged,
  • number of patents & spin-outs generated.

To show progress, I can highlight historical trends, compare departments within my own institution, and compare my institution with other universities. And that is where the need for open data comes in. It is easy for me to collect data covering the period since my role began, but knowing what happened before, or what goes on in other departments or even other institutions, requires access to data that others have recorded. So how open are these data?

FAIR Data

If we consider the FAIR principles of open data (Findable, Accessible, Interoperable and Reusable), the first challenge is in finding and accessing data on the above metrics. Internally, we use the Pure system to record all research activities, whether publicly or privately funded. By default, my access was restricted to data from my own department, which is not very open at all. But it took only a simple request to the Pure administration team to have this access expanded to cover the whole Faculty of Science & Engineering. This allows me to review all research projects across nine departments going back over 20 years: ample data to make historical and inter-departmental comparisons.

However, closer scrutiny of the dataset itself led me to a simple conclusion: the usefulness of the data depends on the person who recorded it, and more specifically on that person’s knowledge of the relevant context. For example, we can label data on a research project with metadata to show that it is ‘knowledge exchange’ (or KE). However, the staff who input the data only include this flag for awards that pass through the University’s central KE team. This misses a vast number of projects, including industrial PhD studentships, awards made from research council ‘innovations and partnerships’ schemes, consultancy, and projects fully funded by industry.

Another way to filter records is to look only at those involving an industry partner. However, this is also unreliable. Many records list only the companies that contribute cash to the research, missing those that are purely beneficiaries or that make solely in-kind or intellectual contributions. Conversely, the system lists some projects with multiple industrial funders as separate entries, meaning a simple filter on external partners would lead to double counting.

All of this suggests that Pure data are neither interoperable nor reusable. Because I work to support setting up KE projects, I can identify which records to count and which are duplicate entries. However, this is a time-consuming, manual process. As no equivalent position exists for other departments, comparisons across disciplines are impossible.

To demonstrate these effects, I include here a plot I have collated from the data on Pure. It shows the historical trend in academic engagement in KE in Physics & Astronomy since 2008, comparing the figures returned under the application of the KE filter with my own assessment. The number of projects returned by the filter is, at most, 15% of that found by the manual sifting.
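To make concrete what that manual sifting involves, below is a minimal sketch of the filtering and de-duplication required, written in Python with pandas. The column names (project_id, ke_flag, partner) are hypothetical stand-ins rather than the real Pure export schema, and the logic only approximates the judgement calls described above.

```python
import pandas as pd

# Hypothetical export from Pure: one row per project-partner pairing.
# Column names are illustrative only; a real export will differ.
projects = pd.read_csv("pure_export.csv")

# Naive count: trust the central 'knowledge exchange' flag.
# This misses industrial PhD studentships, consultancy, research council
# 'innovations and partnerships' awards, and fully industry-funded work
# that never passed through the central KE team.
ke_flagged = projects[projects["ke_flag"] == True]

# Broader count: keep any record that names an external partner,
# whatever the contribution (cash, in-kind, or intellectual).
with_partner = projects[projects["partner"].notna()]

# Projects with several industrial funders appear as separate entries,
# so collapse to one row per project before counting.
deduplicated = with_partner.drop_duplicates(subset="project_id")

print(f"KE flag only:               {len(ke_flagged)} records")
print(f"Any partner, de-duplicated: {deduplicated.shape[0]} projects")
```

Even a script like this only automates the mechanical part; deciding whether a company listed as a beneficiary genuinely counts as engagement still needs context that the record itself does not carry.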

The situation when comparing different institutions is even harder. Data on research grants are available on the UKRI Gateway to Research portal, but there are limitations to their usefulness too. Filters exist for a range of different KE schemes, including ‘Collaborative R&D’ and ‘CR&D Bilateral’, but applying them effectively requires expert knowledge of how each scheme is defined. Furthermore, by definition the site only includes UKRI-funded grants, not private agreements between HEIs and external partners.
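Gateway to Research does at least expose its records programmatically, so cross-institution counts can be scripted rather than clicked through. The sketch below assumes the public GtR API; the endpoint, the version header and the ‘totalSize’ field are my assumptions and should be checked against the current API documentation.

```python
import requests

# Assumed endpoint and media type for the public Gateway to Research API;
# verify both against the current GtR API documentation.
BASE_URL = "https://gtr.ukri.org/gtr/api/projects"
HEADERS = {"Accept": "application/vnd.rcuk.gtr.json-v7"}


def count_projects(query: str) -> int:
    """Return the total number of GtR projects matching a free-text query."""
    response = requests.get(
        BASE_URL,
        headers=HEADERS,
        params={"q": query, "s": 10, "p": 1},  # small page; only the count matters
        timeout=30,
    )
    response.raise_for_status()
    # 'totalSize' is assumed to be the field carrying the overall hit count.
    return response.json().get("totalSize", 0)


if __name__ == "__main__":
    print(count_projects("knowledge exchange"))
```

Even with the data in hand, the harder problem remains the same as with Pure: the scheme labels only mean something to someone who already knows how each scheme is defined.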
Perhaps the most obvious place to look for open data on KE activities is the Higher Education Statistics Agency (HESA). Even here, data are limited. Research grant and contract data are either given as a total figure for all income, or broken down by public funding body; the granularity does not go down to the level of grant type. The Higher Education Business and Community Interaction record (HE-BCI) is designed to measure ‘interactions between UK HEPs and business and the wider community’. However, this also has shortcomings, in that it:

  • ‘collects information regarding the whole HEP rather than any constituent team or function’;
  • includes all activities intended to have ‘direct social benefits’, and therefore covers outreach and public engagement as well as business-focussed KE;
  • relies on staff at institutions providing accurate ‘informative responses’ (i.e. it has potential for human error, subjectivity and bias, leading to inconsistencies between institutions).

Conclusions & final thoughts

Accuracy of measurement is held in high regard in academia, particularly in scientific disciplines. It is dangerous, therefore, when we attempt to measure performance using data sets that aim to be open and complete, but fall short of FAIR principles. The Knowledge Exchange Framework (or KEF) could change this and provide a mechanism for institutions to compare KE activity consistently. However, given the commercial secrecy employed in REF submissions, it seems likely that KEF returns will feature bulk headline figures (number of partnerships, total industrial income, etc.), rather than details of individual awards.
