Blog series: More policies and initiatives need to support responsible AI practices in the media

Anna Schjøtt
AI Media Observatory
Anton Grabolle / Better Images of AI / AI Architecture / CC-BY 4.0

Technology-focused regulations often play a preventative role, limiting or prohibiting unwanted behaviours to prevent societal harm, and this is also the case with AI regulation. However, it is equally important to facilitate and stimulate responsible research and development of AI, a role the EU is beginning to fulfil via a range of initiatives, though this support remains limited, particularly for media organisations.

Authors: Rasa Bocyte, Netherlands Institute for Sound & Vision, Noémie Krack and Lidia Dutkiewicz, CiTiP KU Leuven and Anna Schjøtt Hansen, University of Amsterdam. Thanks to Natali Helberger, University of Amsterdam, for reviewing and providing excellent feedback.

This article is the second part of a blog series discussing how the AI Act and the Digital Services Act (DSA) address the needs of the media sector. You can find the other blog post under the AI Media Observatory.

When regulating technologies, legislators have different levers — they can regulate how the technology can be used, but they can also be a “facilitator of markets, a buyer of resources, and a producer of capabilities” (Ferrari, 2023).

These alternative roles of legislators as ‘stimulators’ will be crucial to supporting responsible research and development of AI. Research has illustrated how precarious innovation funding in the media sector creates problematic dependencies on big tech, leading to self-censorship, and does not offer long-term sustainability for the projects that are developed.

In the last few years, researchers and actors engaged in what can be called ‘algorithmic accountability reporting’ have also increasingly experienced significant limitations in conducting research on societally important AI systems, such as those that drive platforms like Instagram and Facebook. Algorithmic accountability reporters, such as AlgorithmWatch, have even experienced threats and faced lawsuits because of investigations conducted on these platforms. This threatens both academic freedom and the ability of journalism to act as a democratic watchdog and hold big tech accountable.

In the first part of this blog series, we dived into the EU regulation of AI transparency and discussed whether the new digital legislation sufficiently supports the development of good practices around it. Now we turn to questions about how the emerging EU legislation manages to facilitate both responsible AI research and development.

Supporting research and stimulating responsible development
The need for policies that support AI researchers and industry in their efforts both to develop new AI solutions and to hold accountable the actors who deploy AI systems across societal sectors was considered key amongst the stakeholders who participated in six workshops organised by AI4Media.

Concretely, we found that there was a need for policies that would support:

  • Access to datasets and system APIs (Application Programming Interfaces) for research and investigative purposes
  • The production of open and shareable datasets
  • Long-term and sustainable funding schemes (both for research and industry innovation)
  • Organisational upskilling and guidelines on responsible practices
  • Public-private collaborations

While the EU has taken important steps to address these needs, among others in the DSA, we show how the current legislation fails to create meaningful access to datasets and system APIs for AI research and accountability reporting. Furthermore, there are only very few initiatives that begin to address the need for open and shareable datasets and sustainable funding — and some do not yet apply to the media sector.

Access to datasets and system APIs for research and investigative purposes

Access to datasets and system APIs is essential for investigative research by media organisations, as it enables journalists to hold platforms accountable for the workings of their systems and to showcase the potential harms those systems cause. The TikTok investigation by the Wall Street Journal showed how the platform’s algorithm surfaced highly problematic content to teens (see also the full report by Within). Similarly, the Facebook Files, based on internal Facebook documents, research reports, and drafts of presentations, have been a central source of insight into the platform’s problematic effects on society.

Equally, access to, for example, social media data can be tremendously important for media organisations that are developing their own AI solutions — particularly in countries with smaller languages where it can be difficult to gain access to large quantities of data in the local language. This particular aspect was addressed in a workshop on the practical challenges of using AI for content moderation.

This particular need has been addressed in Article 40 of the DSA (see note 1), which now requires Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs) (see note 2) to give access to their datasets and APIs for research purposes.

While it’s a great step forward, the well-intended provision has received some criticism.

Three central points of criticism

First, it requires that researchers are ‘vetted’, meaning they have to be affiliated with a research organisation. Although Article 40(12) also mentions “those affiliated to not-for-profit bodies, organisations and associations”, the vetting criteria risk leaving out researchers working in the media sector, for example, investigative journalists.

Since the European Commission’s delegated act has yet to be published in its final form, and much feedback has been received, there is hope that the scope of ‘vetted researcher’ could be extended to include other actors. Until then, however, media organisations might need to develop workarounds that still allow them to gain access. For instance, journalists might enter into project-based temporary contracts with academic organisations to gain vetted status. Equally, the provision does not address access for training purposes, which can seriously hinder research into AI solutions that, for example, address important language gaps.

Second, the procedure to become vetted is complex and burdensome. VLOPs can also exploit this to their advantage and be selective about the access conditions they set out and the types of datasets they make available.

At this point, we are still waiting for Member States to appoint their respective Digital Services Coordinators and for the adoption of a forthcoming delegated act that will lay down the specific conditions under which VLOPs and VLOSEs are to provide data (see the FAQ), so it remains too early to fully understand the impact of the legislation. However, there is a risk that the burden will fall on researchers and that the procedure might compromise values around independent research: it appears that platforms might be able to restrict the geographical scope of access, require an extensive description of the research project and methodology, ask researchers to agree to their terms and conditions, and request a courtesy copy of the research before publication.

Third, access requests are only eligible for research centred on understanding the systemic risks posed by the platforms and their risk mitigation measures. This, together with the fact that the researcher must already demonstrate that the ‘expected results of that research’ will contribute to an analysis of systemic risk, could exclude exploratory research aimed at identifying new risks and formulating new research hypotheses. Importantly, systemic risks are very broadly defined in the regulation (see note 3), which could enable a wider scope of research to fall under this umbrella. Ultimately, we will have to keep an eye on how data access for systemic risk research is interpreted.

Image from X (formerly known as Twitter) illustrating the frustration amongst researchers.

Transparency provisions as an alternative route to access

Another way to potentially gain access to data for research is through the transparency reports of intermediary service providers and the yearly reports on systemic risk assessment and management (see Art. 24 & 35(2)).

One of the requirements of the DSA is that VLOPs, such as Meta, X, and TikTok, need to assess and mitigate any potential “systemic risks” that could arise from the design and functioning of their AI-driven services. This includes both the systems used for content moderation and, for example, the recommender systems that filter users’ feeds.

A systemic risk in this context could entail any form of negative effect on fundamental rights, including the freedom and pluralism of the media. One example could be the systematic removal of media organisations’ profiles or the blocking of journalists’ accounts, which, as Al Jazeera has reported, is happening to editors from two Palestinian news publications (see also this article). The data on systemic risks identified in these reports (pertaining to risks at both Member State and EU level) and the best practices used for mitigation will provide an additional data source for researchers, but a lot depends on how extensive the content featured in these reports turns out to be.

Organisational upskilling and guidelines on responsible practices

Our pilot policy recommendations on the use of AI in the media sector already reflected on the need for upskilling in the media industry. Our findings showed that one of the factors hindering the adoption of AI in the media industry is the lack of relevant skills among media professionals and the difficulty of recruiting AI experts. In particular, the inability to offer salaries competitive with big tech companies makes recruitment highly difficult, deepening AI divides between sectors, but also within the media sector itself, particularly for small newsrooms and those located in weaker economies.

Neither the DSA nor the AI Act has any direct programmes oriented towards supporting upskilling in the sector or providing solutions to the skewed labour market.

The DSA does, however, expect online platforms to “guarantee sufficient human and financial resources” (recital 43) and VLOPs to have sufficient human resources, including content moderation personnel, their training, and local expertise (recital 87). The Digital Services Coordinators (i.e. the enforcers of the DSA) shall have the “necessary number of staff and experts with specialised skills” (recital 111, art. 50).

Similarly, Article 4 in the AI Act requires providers and deployers of AI systems to ensure, to their best extent, a sufficient level of AI literacy of their staff and other persons dealing with AI systems on their behalf. However, exactly what that entails, and what ‘extent’ of expertise is expected from media organisations using AI, is not clear (see note 4).

Importantly, the EU has launched programmes and initiatives such as the AI Skills Strategy for Europe, ARISA (AI Skills Needs Analysis) and media-specific calls such as ‘Fostering European Media Talents and Skills’. The business cluster under the MEDIA strand of Creative Europe also aims to promote business innovation, scalability and talents across the European audiovisual industry’s value chain, thereby trying to support this need outside of legal instruments.

Supporting the production of open and shareable datasets

If the European media ecosystem wants to introduce responsible and robust AI innovations, open and shareable datasets are a must. As the EC study on Improving access to and reuse of research results, publications and data for scientific purposes shows, researchers and research organisations have to bear the burden of complexity and legal uncertainty in data access and reuse for research purposes.

Despite the importance of producing and sharing responsible datasets, the current legal landscape remains very fragmented and complex and neither the AI Act nor the DSA addresses this policy need directly. Some even argue that “trade secrecy” around algorithms will be a significant barrier to effectively enforcing the DSA.

Outside the direct legislative landscape, we do see great potential in the EU-funded initiative Common European Data Spaces, which could make it possible to create shareable datasets. In the media sector, TEMS (Trusted European Media Data Space) is a cross-border project tasked with building and running a data space for the media industry. This backbone infrastructure could provide a potential avenue for supporting the needs of the media sector.

Another avenue towards open datasets is open source (see note 5). Companies releasing open-source AI generally disclose more information about the system’s architecture, datasets, and training methods, making more data and documentation available. However, it has been argued that models claiming to be open source, such as Llama 2, Falcon, or Mistral, are being released without access to their training datasets or even basic information about them (so-called ‘openwashing’).

Due to the limited transparency requirements in the AI Act regarding datasets used for training open-source models, it is not expected that this particular legislation will support the creation of a marketplace of ethical open-source datasets.

Read also: Blog series: AI regulation is overlooking the need for third-party transparency in the media sector

Sustainable funding schemes & public-private collaborations

AI innovation in the media sector, particularly in public organisations, often relies on competition-based financing for projects; the Horizon Europe funding programme is a prime example. While these project-based initiatives, which enable both innovative new research and the concrete implementation of AI solutions in the media sector, are essential, they should be guided by a long-term strategy that prioritises sustainability over ‘quick fixes’.

More generally, the media sector could benefit from more sustainable funding and support mechanisms with a long-term vision, which would also act as a counterweight to the short-term funding provided by a variety of commercial actors, such as Google’s Digital News Innovation Fund (DNI).

If we look at the different EU funding mechanisms available for the media sector, they are currently almost exclusively project-based. This means that it is up to each organisation to figure out how to fund the maintenance and long-term operation of the AI services it develops through these projects. Furthermore, it becomes each organisation’s responsibility to determine how its projects can contribute to the broader ecosystem of AI development in the media.

There are some emerging initiatives, such as the new AI innovation package and the introduction of regulatory sandboxes, that could begin to address the need for sustainable innovation funding and continue to stress the importance of public-private collaborations. Similarly, MediaInvest is the European Commission’s equity investment instrument, which helps to bridge the financial gap in the audiovisual sector by stimulating more investment.

However, these also have short-term horizons and unfortunately do not contain specific provisions for the media sector. The GenAI4EU initiative, funded under the AI innovation package, will, for example, support startups and SMEs in 14 industrial ecosystems, but media is not one of them, with the exception of public service media, which falls into the category covering all public sectors.

Where to from here?

While we see some good initial steps, it is clear that access to data, open datasets, and insight into the workings of and choices behind big tech’s algorithmic decisions remain problematic, and that the conditions for responsible research and innovation in the media sector continue to be challenging. Clarifications from policymakers through delegated acts and guidance from the AI Office are necessary, so we will look towards these to fully grasp the effects and potential implications for the media sector. In the next blog post, we will explore how the AI Act and the DSA are trying to mitigate emerging AI divides and growing power imbalances.

Notes for clarification

  1. See: DSA, Recital 97: This Regulation therefore provides a framework for compelling access to data from very large online platforms and very large online search engines to vetted researchers affiliated to a research organisation within the meaning of Article 2 of Directive (EU) 2019/790, which may include, for the purpose of this Regulation, civil society organisations that are conducting scientific research with the primary goal of supporting their public interest mission.
  2. For the full list of designated VLOPS see here: https://digital-strategy.ec.europa.eu/en/policies/list-designated-vlops-and-vloses
  3. Systemic risks include: (a) the dissemination of illegal content through their services; (b) any actual or foreseeable negative effects for the exercise of fundamental rights, in particular the fundamental rights to human dignity enshrined in Article 1 of the Charter, to respect for private and family life enshrined in Article 7 of the Charter, to the protection of personal data enshrined in Article 8 of the Charter, to freedom of expression and information, including the freedom and pluralism of the media, enshrined in Article 11 of the Charter, to non-discrimination enshrined in Article 21 of the Charter, to respect for the rights of the child enshrined in Article 24 of the Charter and to a high level of consumer protection enshrined in Article 38 of the Charter; (c) any actual or foreseeable negative effects on civic discourse and electoral processes, and public security; (d) any actual or foreseeable negative effects in relation to gender-based violence, the protection of public health and minors and serious negative consequences to the person’s physical and mental well-being.
  4. AI literacy is defined in art. 3(56) AI Act as “skills, knowledge and understanding that allows providers, users and affected persons, taking into account their respective rights and obligations in the context of this Regulation, to make an informed deployment of AI systems, as well as to gain awareness about the opportunities and risks of AI and possible harm it can cause”.
  5. Open source AI refers to the use of open source AI components (e.g. AI model documentation, training data) that are under Open Source (OS) licenses, i.e. licenses that comply with the open source definition.


Anna Schjøtt
AI Media Observatory

Technological anthropologist and PhD candidate at the University of Amsterdam, working on the politics of designing AI for the media and cultural sectors.