Increasing the representation of women in FT journalism

Stephanie Pieri
FT Product & Technology
Nov 16, 2021

The FT focuses on two of the most widely covered subjects in news: business and economic affairs. These same subjects are also where women are the least visible. Shockingly, women account for only 16% of the sources used in news on politics and government, with the figure rising to a still rather measly 21% in business and economy. A study spanning 20 years and including 114 countries found that women comprise only 24% of the people we read, see and hear in all news coverage.

[Figure: changes in the representation of women in news over time, from 18% in 2000 to 24% in 2015. UN Women; source: The Global Media Monitoring Project]

Interestingly, this figure is comparable with the percentage of FT readers who are women, which currently sits at a low 18%. Underrepresentation of women in journalism has been shown to lead to disengaged women readers, as the content can be irrelevant to their experiences and circumstances. A failure to include a wide range of diverse perspectives, especially from women who make up half of society, can result in an inaccurate and incomplete narrative.

One initiative looking to tackle this particular inequality is the BBC’s 50:50 Project, which “is committed to supporting organisations to consistently create journalism and media content that has an equal representation of our world”. The project has expanded beyond the BBC, with more than 100 organisations in 26 countries, including the FT, now taking part. The current focus is to increase representation of women in journalism and media content, but there are future plans to apply the same methodology to increase representation of other marginalised groups.

Ros Atkins, BBC News presenter and 50:50 founder, encourages self-monitoring, where journalists are responsible for tracking the gender ratio of sources used in their reports. This data-driven methodology not only collates and exposes the data, but gives journalists an awareness of any glaring disparities which they can then work to improve. This behaviour change would inevitably drive a shift towards a culture in which seeking fair representation is deeply rooted in the production process of a piece of journalism, as opposed to it being an afterthought.

At the FT, we decided to embed the gender ratio within the editing pane of Spark, our homegrown CMS (Content Management System), which is used by journalists to write, edit and publish content. The ratio would update as changes are made to an article. We wanted to draw journalists’ attention to imbalances and give them an opportunity to immediately rectify them. We had a solution in mind, with just one remaining question. How do we do it?

Research began on the accuracy and reliability of a variety of open-source tools that attempt to decipher the gender of sources used within a block of text. Eventually, after much deliberation, we decided to opt for GenderMeme, which had been used in a previous FT project. GenderMeme uses CoreNLP, which itself uses statistical, deep-learning and rule-based Natural Language Processing (NLP). Text is passed to CoreNLP, in which parsing, part-of-speech tagging, named-entity recognition, coreference resolution and quote detection are executed. On identifying a quote, contextual information is then analysed (pronouns, honorifics, etc.), and the technology attempts to arrive at an accurate gender, applying it to the source.
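
To make the idea of contextual cues concrete, here is a minimal sketch in Python. It is not GenderMeme's or CoreNLP's actual logic, which this article does not describe in detail: the cue lists and the `classify_source` function are hypothetical, and the approach is a simple count of pronouns and honorifics in the text surrounding a quote.

```python
import re
from collections import Counter

# Hypothetical cue lists, chosen for illustration only.
# They are an assumption, not the rules GenderMeme actually uses.
WOMAN_CUES = {"she", "her", "hers", "ms", "mrs", "miss"}
MAN_CUES = {"he", "him", "his", "mr"}


def classify_source(context: str) -> str:
    """Guess a source's gender from pronouns and honorifics near a quote."""
    # Split into lowercase whole words so "this" never matches "his".
    tokens = re.findall(r"[a-z]+", context.lower())
    counts = Counter()
    for token in tokens:
        if token in WOMAN_CUES:
            counts["woman"] += 1
        elif token in MAN_CUES:
            counts["man"] += 1
    if counts["woman"] > counts["man"]:
        return "woman"
    if counts["man"] > counts["woman"]:
        return "man"
    # No cues, or a tie: leave the source unclassified.
    return "unknown"


print(classify_source('"We expect growth," said Ms Lee. She declined to comment further.'))
print(classify_source("The analyst said rates will rise."))
```

A real pipeline would first link each pronoun to the right person, which is what coreference resolution is for. This sketch skips that step, so it would be misled by a paragraph that mentions several people.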

GenderMeme has impressive accuracy in terms of classifying a source as ‘man’ or ‘woman’, but what about those who identify as non-binary? We wanted to ensure the ratio was truly representative, and that we were able to override any incorrect classifications. We decided to add functionality where journalists could manually ‘re-classify’ the gender of a source, and our ratio went from woman:man to woman:man:non-binary:unknown.
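
As an illustration of how manual re-classification could feed into the four-part ratio, the sketch below merges automatic labels with journalist overrides and counts each category. The function name, the data shapes and the example sources are all assumptions made for this example, not Spark's real implementation.

```python
from collections import Counter

# The four categories in the ratio described above.
CATEGORIES = ("woman", "man", "non-binary", "unknown")


def gender_ratio(automatic: dict, overrides: dict) -> dict:
    """Count sources per category, letting manual overrides win."""
    for source, label in overrides.items():
        if label not in CATEGORIES:
            raise ValueError(f"Unknown category for {source}: {label}")
    # Later keys replace earlier ones, so an override beats the automatic label.
    final_labels = {**automatic, **overrides}
    counts = Counter(final_labels.values())
    return {category: counts[category] for category in CATEGORIES}


# Hypothetical sources, for illustration only.
automatic = {"Source A": "woman", "Source B": "man", "Source C": "man"}
overrides = {"Source C": "non-binary"}
print(gender_ratio(automatic, overrides))
```

Keeping the automatic labels and the overrides in separate mappings means a correction can be undone without re-running the analysis.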

Performance, unsurprisingly, was our biggest issue: initially, CoreNLP took between 1 and 2 minutes to return classifications when analysing a small paragraph. We meticulously removed components from our version of CoreNLP, as if playing a game of Operation, and reduced the time to 1–10 seconds. This, alongside adding a load balancer with auto scaling for when memory and CPU limits were reached, gave us a version we could work with. The feature was then deployed to a pilot group.

Over the last six months, 991 quotes have been analysed. The combined data shows that only 231 (23%) of the sources were women. As the data accumulates over time, we’ll be able to see if this figure increases, giving greater clarity on the following: will making a gender imbalance visible prove to be an engine of change? I look forward to finding out.
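
For readers who want to check the arithmetic, the 23% figure follows directly from the two counts above. The helper name `share_percent` is just an illustrative choice.

```python
def share_percent(part: int, total: int) -> int:
    """Return part as a whole-number percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total)


# 231 women among 991 analysed quotes, as reported above.
print(share_percent(231, 991))  # 23
```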
