Decoding Data: Navigating the Art and Science of Data Collection for DTBC’s Birmingham EdTech Project

Published in

Data, Tech & Black Communities

5 min read · Nov 15, 2023

Data collection is such a nuanced activity. It requires a methodical approach, but there’s also an art to it. It requires vision (based on a high-level objective), iterative execution, as well as artistic foresight to represent the collected data in the best way possible for those who need to make use of it.

Data, Tech, & Black Communities (DTBC) is a group of diverse Black/Black heritage people working together to ensure data and data-driven technologies (e.g. AI) enhance Black lives rather than curtail or surveil them. We are made up of leaders of grassroots organisations serving Black communities, activists, technologists and practitioners. Our interest in technology is broad, but we focus primarily on the uses of data and data-centric technologies in the spheres of health, education, crime & justice, and employment & enterprise in the UK. If you’d like to learn more about EdTech, I would highly recommend reading this article by DTBC. It does a great job explaining the landscape of EdTech adoption and some government policies and funding initiatives in UK schools.

Are we asking the right questions? — Why? What? Where?

Working as a Data Analyst on DTBC’s Birmingham-focused EdTech project, one of my tasks was to collate demographic data about Black and Black heritage children (BBH) in Birmingham. My approach to any data collation exercise is first to ponder the project’s objective. This guides me when deciding which data elements or features are most useful. So, in my mind map for this task, I asked myself three basic questions: why do I need to collect demographic data on BBH people in Birmingham? What can this information actually tell me (and the rest of the project team), and where can I access it?

‘Why’ is relatively straightforward to answer. DTBC’s EdTech project “aims to identify and review EdTech tools used in Birmingham schools, evaluate the claim that the technology enhances student performance and ensure that the technology does not curtail or surveil BBH pupils in Birmingham”. And why Birmingham? The short answer is that it was selected during the planning phase because we don’t believe London truly represents most BBH people’s experiences in the UK. The population census data shows that 49.3% of all Black* people in the country live in London, followed by the West Midlands, where 11.2% live. Delving further into the data reveals that 155,559 BBH** people make up 13.59% of Birmingham’s population of 1,144,922.
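As a quick sanity check, the Birmingham share quoted above can be reproduced from the two headline counts. This is only a minimal sketch: the two figures come from the text, while the function and variable names are my own and purely illustrative.

```python
# Figures as quoted in this post (ONS Census 2021).
BBH_POPULATION = 155_559
BIRMINGHAM_POPULATION = 1_144_922


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to two decimal places."""
    return round(part / whole * 100, 2)


bbh_share = share_percent(BBH_POPULATION, BIRMINGHAM_POPULATION)
print(f"BBH share of Birmingham's population: {bbh_share}%")
```

Recomputing a published percentage from its underlying counts is a cheap way to catch transcription errors before the figure is reused elsewhere.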

While London has the highest concentration of BBH people, we believe it would give an unrepresentative picture of BBH people’s experiences nationwide, because London is that much of an outlier. During the project planning phase, the project stewards selected Birmingham because London and the South East enjoy a disproportionate share of national resources, and they believed that this would be reflected in school funding, too. To this end, they felt that schools in Birmingham are more representative of those in the rest of the country.

My analysis of Birmingham’s demographic data supports their hypothesis. Birmingham balances a relatively representative sample with a deep culture of diversity and community. It has a diverse ethnic make-up, with 51% of its population belonging to Black, Asian and other minoritised ethnic groups. It is also the local authority with the largest concentration of BBH people in England and Wales, according to the ONS census data from 2021. Ultimately, this is why Birmingham is the ideal location for this study.

Deciding what data you need

Now that we understand the ‘why’, let’s move on to the ‘what’. What data is relevant, and what can it tell us? What level of detail is sufficient? These questions are just some of the many that relate directly to how we manage the data collection process. Answers to these questions are informed by our understanding of project objectives and how and where we try to source the information we need. This project is focused on EdTech and BBH children, so understanding the landscape in which the schools within our scope (primary and secondary schools in Birmingham) operate is pretty important. In practice, that means understanding things such as which schools have the highest concentration of BBH pupils, what areas these schools are located in, and what levels of deprivation are in these areas. The answers to these ‘what’ questions help to contextualise other ‘what’ questions, including the details of what sort of EdTech tools and applications are being used in these schools and what, if any, discernible impact they have had on BBH pupils’ attainment levels.
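To make the first of these ‘what’ questions concrete, here is a minimal sketch of ranking schools by their concentration of BBH pupils. The school names, wards and pupil counts are invented for illustration, and the `School` class is a hypothetical structure rather than the format of any real dataset.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    ward: str
    total_pupils: int
    bbh_pupils: int

    @property
    def bbh_share(self) -> float:
        """Proportion of pupils who are BBH (0.0 if the school has no pupils)."""
        if self.total_pupils == 0:
            return 0.0
        return self.bbh_pupils / self.total_pupils


# Hypothetical records; real values would come from a school-level dataset.
schools = [
    School("School A", "Ward X", 400, 120),
    School("School B", "Ward Y", 900, 90),
    School("School C", "Ward X", 250, 100),
]

# Highest concentration of BBH pupils first.
ranked = sorted(schools, key=lambda s: s.bbh_share, reverse=True)
for school in ranked:
    print(f"{school.name} ({school.ward}): {school.bbh_share:.1%}")
```

Ranking by share rather than by raw count stops large schools from dominating the list simply because they have more pupils overall.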

Finding the data you need

Now for the ‘where’ portion of the process. When trying to find the data that allows us to answer the ‘what’ questions, open data is obviously preferable. Still, that doesn’t mean that this data is easy to find or that it’s in a readily usable format. There’s also no guarantee that the data you need is collected at all! This latter point is made really well by people like Mimi Onuoha in her project, The Library of Missing Datasets, and by organisations like the Centre for Public Data through their Missing Numbers campaign. This FT article also does an excellent job of highlighting the small ‘p’ politics that shape these data collection decisions and the very real impact they have on policy and scrutiny:

“Open data is what the government wants you to know. Freedom of information requests are for what they don’t want you to know. The things you can’t FOI because they don’t collect them are what they really don’t want you to know” — Anna Powell-Smith.

The Office for National Statistics (ONS) population census data is a well-known open and public dataset, so it was an obvious starting point for sourcing the demographic data we were interested in. However, real-world data is often messy and incomplete, so validating the data from one organisation against multiple sources is crucial to ensure completeness and accuracy. It was necessary to augment the ONS data with data sourced from the Department for Education (DfE), such as the School Census data. This dataset gave us the list of Birmingham schools by ward and their ethnic group breakdown. Meanwhile, datasets like the English Indices of Deprivation (IoD) 2019 contained socioeconomic information about deprivation in small geographic units known as Lower Super Output Areas (LSOAs). Drawing on the contents of this dataset, we were able to sort and categorise the schools based on deprivation levels. This helped us answer one of the ‘what’ questions by allowing us to evaluate the relationship between schools with high concentrations of BBH pupils and the level of deprivation in the area.
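The linking step described above might look something like the sketch below. It is not the project's actual pipeline: the records, the placeholder LSOA codes and the three deprivation bands are all invented for illustration, and it assumes each school has already been matched to an LSOA. The one detail taken from the IoD itself is that deciles run from 1 (most deprived 10% of areas) to 10 (least deprived).

```python
from collections import defaultdict

# Hypothetical school records: (school name, LSOA code, BBH pupil share).
schools = [
    ("School A", "LSOA_1", 0.30),
    ("School B", "LSOA_2", 0.10),
    ("School C", "LSOA_1", 0.40),
    ("School D", "LSOA_3", 0.25),
]

# Hypothetical lookup: LSOA code -> deprivation decile (1 = most deprived).
imd_decile = {"LSOA_1": 1, "LSOA_2": 7}


def deprivation_band(decile: int) -> str:
    """Collapse deciles into three illustrative bands (an assumption, not an IoD convention)."""
    if decile <= 3:
        return "high deprivation"
    if decile <= 7:
        return "medium deprivation"
    return "low deprivation"


shares_by_band = defaultdict(list)
unmatched = []  # schools whose LSOA is missing from the lookup
for name, lsoa, bbh_share in schools:
    decile = imd_decile.get(lsoa)
    if decile is None:
        unmatched.append(name)
        continue
    shares_by_band[deprivation_band(decile)].append(bbh_share)

# Mean BBH share per band gives a first, rough look at the relationship.
mean_share = {band: sum(v) / len(v) for band, v in shares_by_band.items()}
for band, value in mean_share.items():
    print(f"{band}: mean BBH share {value:.1%}")
print("Schools with no deprivation match:", unmatched)
```

Keeping a list of unmatched schools, rather than silently dropping them, makes gaps between datasets visible, which matters when the sources come from different organisations.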

I hope this blog post has provided some context for data collection and insight into how to approach it. In a subsequent blog post, I will discuss the intricacies of data collection during the first phase of this project.

I would like to thank Ade, Tracey and Mwenza, the project stewards for DTBC’s Birmingham-focused EdTech project, for reviewing and editing this blog post.

N.B.: BBH** people are those who fall under either of two ONS ethnic group classifications: (1) Black, Black British, Black Welsh, Caribbean or African, and (2) Mixed or Multiple ethnic groups: White and Black African/Caribbean. In contrast, Black* people are those in the first of these two groups only.

Written by Gabriel Olabode