Digital Authenticity: Provenance and Verification in AI-Generated Media

By Numbers, published in OvertheBlock
13 min read, Jan 10, 2024

This article is the second issue of a collection stemming from a collaboration between Overtheblock.io and Numbers Protocol. The post highlights the role that Distributed Ledger Technologies (DLTs) and Artificial Intelligence may play within the media & entertainment industry towards the attainment of more trustworthy information diffusion processes over social media.

Digital authenticity is a crucial aspect of AI-generated media. As AI-generated content becomes more pervasive, understanding how to verify and trace the origins of such media is vital. This topic could appeal to both creators and consumers of digital content, sparking conversations around trust and integrity in digital media.

Imagine browsing the web and encountering an article claiming to reveal a shocking truth about a political leader. The article is accompanied by a photo that shows the leader in a compromising situation. You are intrigued and outraged by the story, but you also wonder: is this real or fake?

This scenario is not far-fetched in the age of AI-generated media. With the advent of powerful machine learning models that can create realistic text, images, and other media from scratch, the distinction between fact and fiction becomes blurred. AI-generated media can offer a boundless canvas for creativity, but it can also seriously threaten trust, integrity, and credibility in the digital realm.

A survey by Statista found that 42% of marketers worldwide trusted AI to carry out content creation activities in 2022, while 38% trusted AI to carry out content curation activities. This suggests that AI-generated media is becoming more prevalent and influential in the media and entertainment industry.

The chart below illustrates the rapid evolution of AI systems over the past two decades, showing their progress in language and image recognition. The chart uses a normalized scale on which each system's initial performance is set to -100 and human performance to 0. On that scale, AI systems have advanced from -100 to matching or exceeding human performance on several benchmarks, a result that seemed out of reach a decade ago.

Figure 1: The language and image recognition capabilities of AI systems. Source: Our World in Data

How can we ensure that the media we consume is authentic and trustworthy? How can we trace the origins and history of the content we encounter? How can we prevent the spread of misinformation and deception in the age of AI?

These are some of the questions that this article aims to answer. We will explore the concepts of provenance and verification in AI-generated media, and how they can help us establish and maintain digital authenticity.

The Growing Predicament

AI-generated media refers to any type of media content created or modified by artificial intelligence. This includes text, images, audio, video, and more. AI-generated media can be produced by various methods, such as generative adversarial networks (GANs), variational autoencoders (VAEs), transformers, and others.

Figure 2: An AI-generated image. The prompt used to generate this is: “Create a compelling visual representation of a scenario of two horses fighting”.

AI-generated media has many positive applications, such as enhancing artistic expression, generating novel content, improving accessibility, and more. However, it also has many negative implications, such as creating deepfakes, spreading misinformation, manipulating public opinion, infringing intellectual property rights, etc.

According to a recent article by the BBC, generative AI can produce new text, images, and other media by running a machine learning model fed by billions of existing bits of content from across the web and elsewhere. It is now possible to input a few lines of descriptive text (a “prompt”) and have tools like Stable Diffusion or Midjourney create an image with amazing fidelity and visual style. Many casual observers would not be able to tell whether an AI generated it.

The problem of AI-generated media is technical but also ethical and social. As AI becomes more capable of generating realistic and convincing media content, it becomes harder for humans to discern what is real and what is fake. This can erode our trust in the information we receive and the sources we rely on. It can also undermine our sense of reality and identity.

Therefore, we must develop ways to verify and authenticate the media content we encounter on the web. We need to be able to trace the provenance and verify the integrity of the content we consume. We need to be able to distinguish between genuine and counterfeit content.

The Role of Provenance and Verification

Provenance and verification are two key concepts that can help us achieve digital authenticity in AI-generated media. Provenance refers to the origin and history of a piece of content, while verification refers to the assessment of its authenticity and integrity.

Provenance involves tracing the source, creation process, ownership, and distribution of a piece of content. It answers questions such as: who created this content? When was it created? How was it created? Who owns it? Who has access to it? How has it been modified or shared?

Verification involves checking whether a piece of content is authentic or not. It answers questions such as: is this content original or copied? Is this content real or fake? Is this content accurate or inaccurate? Is this content consistent or inconsistent?

The Content Authenticity Initiative (CAI) is a coalition of technology companies, media organizations, and academic institutions that aims to develop an open standard for provenance and verification in digital media. The CAI proposes to embed metadata in digital media files that contain details about the provenance of the content, such as the source, creation process, ownership, and distribution.

Provenance and verification are complementary processes that can help us establish trust and transparency in AI-generated media. By knowing the provenance of a piece of content, we can verify its authenticity more easily. By verifying the authenticity of a piece of content, we can confirm its provenance more reliably.

Provenance: Tracing the Origins of AI-Generated Content

Provenance is a term that originates from the art world, where it refers to the documented history of an artwork, such as its origin, ownership, and changes over time. Provenance helps to establish the authenticity, value, and significance of an artwork.

In the context of AI-generated media, provenance refers to the information that describes the origin and history of a piece of digital content, such as its source, creation process, ownership, and distribution. Provenance helps to establish the authenticity, integrity, and credibility of a piece of digital content.

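For example, consider an image generated by a machine-learning model based on a text prompt. Its provenance data could include the prompt and the model used to generate it, the time of creation, the identity of the creator or owner, and a record of where the image has been shared.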

These are some examples of provenance data that can be associated with a piece of AI-generated content. However, provenance data can vary depending on the type, purpose, and context of the content.
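To make the idea concrete, the image example can be sketched as a small data structure. This is only an illustration: the class name, the field names, and the sample values are hypothetical and do not follow any particular standard. The fields simply mirror the four aspects of provenance discussed above (source, creation process, ownership, and distribution).

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance record for one AI-generated image."""
    source: str              # who or what produced the content
    creation_process: dict   # e.g. the model name and the text prompt
    owner: str               # current rights holder
    created_at: str          # ISO 8601 timestamp of creation
    distribution: list = field(default_factory=list)  # where it was shared

    def to_json(self) -> str:
        # Sorted keys give a stable serialization of the record.
        return json.dumps(asdict(self), sort_keys=True)


# All values below are made up for illustration.
record = ProvenanceRecord(
    source="example-image-model",
    creation_process={"model": "example-image-model",
                      "prompt": "two horses fighting"},
    owner="alice@example.org",
    created_at="2024-01-10T09:00:00+00:00",
)
record.distribution.append("https://example.org/gallery/horses")
print(record.to_json())
```

A record like this is only as trustworthy as the way it is stored, which is why the techniques for recording and preserving it matter.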

The Significance of Tracing Content Origins

Tracing the origins of AI-generated content is significant for several reasons. First, it can help to verify the authenticity and integrity of the content. By knowing where a piece of content comes from, how it was created, and who is responsible for it, we can assess whether it is genuine or fake, accurate or inaccurate, consistent or inconsistent.

Second, it can help to protect the rights and interests of the content creators and owners. By knowing who owns a piece of content, how it was licensed, and how it was distributed, we can respect their intellectual property rights, acknowledge their contributions, and reward their efforts.

Third, it can help to enhance the quality and value of the content. By knowing how a piece of content was created, what techniques and tools were used, and what feedback or ratings were received, we can improve our understanding, appreciation, and enjoyment of the content.

Fourth, it can help to foster trust and transparency in the digital realm. By knowing who we are interacting with online, what information we receive or share online, and how we influence or are influenced by others online, we can establish more honest and ethical relationships with other users and stakeholders.

Techniques for Recording and Preserving Provenance Data

Recording and preserving provenance data for AI-generated content is not a trivial task. It requires a combination of technical and social solutions that can ensure that provenance data is accurate, complete, consistent, accessible, and secure.

Some of the techniques that can be used for recording and preserving provenance data are:

  • Metadata: This includes descriptive information about digital media, like title, author, format, and more, and can also contain provenance details like source and ownership.
  • Watermarks: These marks, either visible or invisible, are embedded in digital media to signify origin or ownership, and can carry provenance data.
  • Digital Signatures: Cryptographic methods that affirm the identity of the signer and the integrity of digital files. These signatures can include provenance information and are verifiable using public-key cryptography.
  • Blockchain: A secure, transparent ledger system that creates unalterable records. It’s used for managing digital assets and recording provenance data, such as the creation process and ownership of AI-generated content.

These methods, which are neither exclusive nor exhaustive, can be used individually or in combination for optimal results in recording and preserving provenance data for AI-generated content.
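As an illustration of the invisible watermark idea, the sketch below hides a short tag in the least significant bit of each pixel byte. It is a toy under stated assumptions: the "image" is a plain sequence of grayscale byte values rather than a real file format, the function names are hypothetical, and production watermarking schemes are considerably more robust than this.

```python
def embed_watermark(pixels: bytes, message: bytes) -> bytes:
    """Hide `message` in the least significant bit of each pixel byte."""
    # Unpack the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this image")
    marked = bytearray(pixels)
    for index, bit in enumerate(bits):
        # Clear the lowest bit of the pixel, then set it to the message bit.
        marked[index] = (marked[index] & 0xFE) | bit
    return bytes(marked)


def extract_watermark(pixels: bytes, length: int) -> bytes:
    """Read `length` bytes back from the least significant bits."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for byte in pixels[start:start + 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


pixels = bytes(range(256))   # stand-in for raw grayscale pixel data
tag = b"owner:alice"         # hypothetical ownership tag
marked = embed_watermark(pixels, tag)
recovered = extract_watermark(marked, len(tag))
print(recovered == tag)
```

Each pixel value changes by at most one, so the mark is invisible to the eye. It is also fragile: anything that rewrites the low-order bits, such as lossy re-encoding, destroys it, which is one reason watermarks are usually combined with the other techniques in the list.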

Verification: Ensuring the Authenticity of AI-Generated Media

Verification is the process of checking whether a piece of digital content is authentic. It involves assessing the accuracy, consistency, and integrity of the content, and it addresses the questions raised earlier: whether the content is original or copied, real or fake, accurate or inaccurate.

Verification is important for several reasons. First, it can help to prevent the spread of misinformation and deception in the digital realm. Misinformation and deception can have serious consequences for individuals, organizations, and society at large. They can affect our beliefs, opinions, decisions, and actions. They can undermine our trust in information sources and authorities. They can also influence our political, social, and economic outcomes.

Second, it can help to protect the rights and interests of the content creators and owners. Verification can help to identify and expose plagiarism, infringement, and manipulation of digital content. Verification can also help to enforce accountability and responsibility for the content that is created and shared online.

Third, it can help to enhance the quality and value of the content. Verification can help to ensure that the content we consume is reliable, credible, and trustworthy. Verification can also help to improve our understanding, appreciation, and enjoyment of the content.

Authenticity Assessment Techniques

The process of authenticity assessment involves the application of various verification methods to digital content. This task can be undertaken by humans, machines, or a combination of both, with the approach tailored to the content’s nature, intent, and situational context.

Key techniques employed in authenticity assessment include:

  • Human Judgment: This involves leveraging human perception, cognition, and intuition to determine the authenticity of digital content. It encompasses assessing various elements like visual cues, linguistic style, context, and the credibility of the source. Additionally, it may include consulting experts or authorities with relevant expertise.
  • Machine Learning: Utilizing artificial intelligence algorithms, this method analyzes and categorizes digital content based on distinct features and patterns. Techniques employed range from deep learning to natural language processing and computer vision. Machine learning also involves training models using extensive datasets, which may be labeled or unlabeled.
  • Blockchain Technology: As a secure, transparent distributed ledger system, blockchain aids in authenticating digital content. It does so by creating unalterable and verifiable records that provide detailed provenance information, including the content’s source, its creation process, ownership details, and distribution history.

These methods represent some of the primary techniques for authenticity assessment. They are not mutually exclusive and can be enhanced or supplemented with other methods to optimize effectiveness.

The Role of Blockchain Technology in Verification

Blockchain technology plays a significant role in the verification of AI-generated media. Blockchain technology offers several advantages over traditional methods for verification, such as metadata, watermarks, and digital signatures. Some of these advantages are:

  • Decentralization: Blockchain technology operates on a peer-to-peer network without relying on a central authority or intermediary to validate transactions or records. This reduces the risk of corruption, censorship, or manipulation by malicious actors.
  • Immutability: Blockchain technology uses cryptography to create records that cannot be changed or tampered with once added to the ledger. This ensures that the provenance information of digital content remains intact and consistent over time.
  • Verifiability: Blockchain technology allows anyone with access to the ledger to verify the authenticity and integrity of digital content by checking its associated records. This enhances transparency and accountability in the digital realm.
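The immutability and verifiability properties can be illustrated with a minimal hash chain, in which every record stores the hash of the previous one. This is a single-process sketch, not a blockchain: there is no peer-to-peer network and no consensus, and the function names and record fields are hypothetical. It only shows why a change to an earlier record is detectable by anyone who re-checks the chain.

```python
import hashlib
import json


def block_hash(index: int, data: dict, prev_hash: str) -> str:
    """SHA-256 over a stable JSON serialization of the block contents."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, data: dict) -> None:
    """Append a record that commits to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})


def is_valid(chain: list) -> bool:
    """Recompute every hash and check that each block links to the one before."""
    prev_hash = "0" * 64
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, block["data"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"asset": "horses.png", "event": "created", "by": "alice"})
append_block(chain, {"asset": "horses.png", "event": "licensed", "to": "bob"})
valid_before = is_valid(chain)      # True: the chain is untouched

chain[0]["data"]["by"] = "mallory"  # tamper with the first record
valid_after = is_valid(chain)       # False: the stored hash no longer matches
print(valid_before, valid_after)
```

In a real distributed ledger, many independent nodes hold copies of the chain and agree on new entries, so an attacker cannot simply recompute the hashes on one machine and present the result as the true history.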

Blockchain technology can support the verification of AI-generated media in several ways. One is to create and manage unique digital assets on the web using protocols such as Numbers Protocol. On-chain provenance records can also complement other efforts, such as detection platforms like Sensity, which identify AI-generated media, and the reporting and fact-checking of outlets such as MIT Technology Review.

The Symphony of Traditional and Blockchain Approaches

Achieving provenance and verification in AI-generated media presents challenges, as traditional methods can be inadequate on their own. MIT Technology Review has highlighted that simply watermarking AI-generated content does not suffice to establish trust online, since an ill-intentioned actor can readily remove or alter watermarks. Similarly, metadata is susceptible to manipulation, and digital signatures can be undermined when signing keys are compromised.

In digital media provenance, several innovative solutions have been developed to tackle the challenges of authenticity and verification, particularly in AI-generated content. These solutions employ a range of techniques, from traditional methods to advanced technologies:

  • Content Authenticity Initiative (CAI): Initiated by Adobe, CAI aims to combat misinformation by attaching provenance data to digital content, enhancing trust and transparency.
  • Project Origin: This initiative, backed by a coalition of news organizations and tech companies, focuses on verifying the source and history of news content through digital watermarking.
  • WITNESS: Specializing in human rights contexts, WITNESS works on technology tools that help verify the authenticity of digital content.
  • TruePic: Concentrating on image and video verification, TruePic utilizes controlled capture technology to authenticate content at the point of creation.
  • Everledger: Known for tracing the provenance of high-value assets like diamonds, Everledger applies blockchain technology to ensure the traceability and authenticity of digital media.
  • Starling Framework for Data Integrity: Developed by the USC Shoah Foundation and Stanford, this framework uses blockchain and decentralized web protocols to maintain the integrity of digital data.
  • Numbers Protocol: Distinct in its approach, Numbers Protocol offers a blockchain-based platform designed to preserve the authenticity and integrity of digital content. It integrates a multi-layered system that includes ownership, content provenance, creator signature, and on-chain records.

Numbers Protocol packages each piece of content in a multi-layered container with embedded ownership, content provenance, creator signature, and on-chain records, and anchors it in a decentralized network. This network empowers content creators and consumers to verify the authenticity and ownership of their digital materials. Additionally, it offers a comprehensive toolkit, Capture, for registering and retrieving images and videos within its network.

At the core of its technology is the creation of a unique identifier for each digital asset, known as a Proof-of-Existence (PoE). The PoE encompasses critical information, including the asset’s hash value, timestamp, and the identity of the digital asset’s owner. The immutable and verifiable nature of a PoE ensures that it is resistant to alteration or tampering, allowing for straightforward validation by any party.
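The three ingredients named above (hash value, timestamp, and owner identity) can be sketched in a few lines. This is an illustrative assumption about what such a record might look like, not Numbers Protocol's actual schema or API: the function names and field names are hypothetical, and nothing here is written to a blockchain.

```python
import hashlib
from datetime import datetime, timezone


def create_poe(asset_bytes: bytes, owner: str) -> dict:
    """Build a hypothetical proof-of-existence record for an asset."""
    return {
        "asset_hash": hashlib.sha256(asset_bytes).hexdigest(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "owner": owner,
    }


def matches_poe(asset_bytes: bytes, poe: dict) -> bool:
    """Check whether the given bytes are the asset the record refers to."""
    return hashlib.sha256(asset_bytes).hexdigest() == poe["asset_hash"]


original = b"raw bytes of an image file"   # stand-in for real file contents
poe = create_poe(original, owner="alice@example.org")

ok_original = matches_poe(original, poe)        # True: same bytes, same hash
ok_edited = matches_poe(original + b"!", poe)   # False: any change alters the hash
print(ok_original, ok_edited)
```

Validation needs only the file and the record, which is what makes checking straightforward for any party. The record itself must be stored somewhere tamper-resistant for the check to be meaningful, which is the role the on-chain ledger plays.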

By capturing essential details such as the source, creation process, ownership, and distribution of each digital asset and securely recording this provenance data on the blockchain, the system promotes transparency and integrity. This approach ensures that the origins and history of digital assets are traceable and verifiable.

Real-World Applications

Providing an immutable record of ownership and provenance for digital content is significant. According to the 2023 Special 301 Report by the Office of the United States Trade Representative, counterfeiting and piracy result in substantial financial losses for rights holders, legitimate businesses, and governments, as well as risks to consumer health and safety, privacy, and security.

By utilizing the immutable provenance data provided by Numbers Protocol:

  1. Creators can assert their ownership and rights over their digital assets, thus preventing unauthorized use or distribution.
  2. Users can combat misinformation and deception in the digital realm by relying on a verifiable source of authentic information. A 2018 MIT study found that false news spread about six times faster than true news on Twitter (now X).
  3. Industries such as art, media, and advertising can enhance transparency and accountability by adopting a secure and transparent method to track the provenance and ownership of digital assets. A survey by PwC reveals that 45% of consumers are willing to pay a premium for products authenticated through blockchain.

Conclusion

In the age of AI-generated media, the quest for digital authenticity has become a pressing concern. The rise of AI-generated content blurs the line between fact and fiction, demanding effective methods for verifying and tracing the origins of digital media. This article emphasizes the importance of provenance and verification, with the Content Authenticity Initiative (CAI) leading the charge in creating an open standard for embedding metadata in digital media files. Among Web3 solutions, Numbers Protocol offers an innovative approach to ensuring digital authenticity by providing a secure and transparent framework for tracking the origin and ownership of digital assets.

Provenance, the record of an AI-generated content’s origin and history, is crucial for verifying authenticity, protecting creators’ rights, and enhancing content quality. Verification, the process of assessing content’s accuracy and consistency, combats misinformation and ensures the credibility of digital media. Blockchain technology, notably through Numbers Protocol, plays a pivotal role in this journey, offering decentralization, immutability, and verifiability, thus revolutionizing how we interact with and manage digital content in a world increasingly shaped by AI-generated media.

Please cite as: Galason, E., Yan, S. (2024), Digital Authenticity: Provenance and Verification in AI-Generated Media, Overtheblock Innovation Observatory, retrievable at link.

OverTheBlock is an initiative of the LINKS Foundation carried out by a team of innovation researchers under the directorship of Enrico Ferro. The aim is to promote a broader awareness of the opportunities offered by the advent of exponential technologies in reshaping how we conduct business and govern society.

Numbers (https://numbersprotocol.io): Decentralized Photo Network for Web 3.0, for creating community, value, and trust in digital media.