Provenance as a Service: The Rise of Verification Platforms in Digital Media
This article is the third issue in a collection stemming from a collaboration between Overtheblock.io and Numbers Protocol. It highlights the role that Distributed Ledger Technologies (DLTs) and Artificial Intelligence may play in the media and entertainment industry in making the spread of information over social media more trustworthy.
Introduction
We live in an era of information overload, constantly bombarded with digital content from many sources and platforms. Not all of this content is intentionally misleading, but a significant proportion is misinformation or disinformation, made and spread to fool and manipulate us. It is a growing problem with serious real-world implications for our society, democracy and economy.
According to a recent report by the World Economic Forum, misinformation and disinformation are among the top global risks in terms of likelihood and impact. The report estimates that the annual cost of fake news to the global economy is about $78 billion. Moreover, the report warns that the emergence of new technologies, such as artificial intelligence (AI), can exacerbate the problem by enabling the creation and dissemination of more sophisticated and realistic forms of digital deception, such as deepfakes and other synthetic media produced with techniques like generative adversarial networks (GANs).
This problem of misinformation and disinformation is not only widespread but also persistent. A study published in PLOS ONE found that over 1.3 million COVID-19 related news articles from unverified sources were published in 36 countries and 24 languages between January 2020 and December 2022. Another study by Statista revealed that 67% of Americans believe that fake news causes a great deal of confusion, and 38.2% of Americans admit that they have accidentally shared fake news.
To illustrate the problem, we can look more closely at the results of a study published in PLOS ONE, which analysed 58,625 COVID-19 related articles from unverified sources in 36 countries and 24 languages. The study identified 12 common themes, or super narratives, that were used to spread false or misleading information about the pandemic, such as fearmongering, conspiracy theories, and miracle cures.
Fig 2 of that study shows the distribution and evolution of these 12 super narratives over time, from January 2020 to December 2022, and highlights ten significant peaks that correspond to major events or developments in the pandemic, such as the first cases, the lockdowns, and the vaccines.
One of the most prominent and persistent super narratives was fearmongering, which aimed to create panic and anxiety among the public by exaggerating or fabricating the risks and impacts of the novel coronavirus. According to Fig 2, fearmongering peaked in the early months of the pandemic, when the virus was new and unknown and people were uncertain and scared about its consequences. It then declined in the spring of 2020, as the weather improved, cases dropped, and people became more familiar with the situation.
Interestingly, some of the articles that were classified as fearmongering at the time of publication turned out, in hindsight, to underestimate the severity of the pandemic. For example, some articles predicted that the pandemic would cause up to 4.4 million deaths worldwide, which seemed a very high and alarming number in February 2020. By the end of 2022, however, the global death toll had surpassed 6.7 million, so the reality was even worse than that worst-case scenario.
This example shows how misinformation and disinformation can be dynamic and context-dependent, and how they can affect our perception and understanding of the world. It also shows why we need provenance and verification to help us distinguish between fact and fiction, and to ensure that the media we consume is authentic and trustworthy.
How can we cope with this challenge? How can we ensure that the media we consume is authentic and trustworthy? How can we verify the source, origin, and history of the digital content we encounter? These are the main questions that this article aims to address. The main objective of this article is to explore the concepts of provenance and verification in AI-generated media, and how they can help us establish and maintain digital authenticity. Provenance refers to the information that describes the creation, modification, and ownership of a digital asset, while verification refers to the process of validating the accuracy, integrity, and credibility of a digital asset.
Provenance and Verification in AI-Generated Media
Provenance and verification are two key concepts that can help us achieve digital authenticity in AI-generated media. Provenance refers to the origin and history of a piece of content, while verification refers to the assessment of its authenticity and integrity. In other words, provenance answers the question of where the content came from and how it was created or modified, while verification answers the question of whether the content is true or false, and whether it has been tampered with or not.
Provenance and verification are important for digital authenticity because they can help us establish trust and confidence in the media we consume and produce. By knowing the provenance and verification of a piece of content, we can evaluate its quality, reliability, and credibility, and make informed decisions about how to use it, share it, or cite it. Moreover, provenance and verification can also help us protect our intellectual property rights, prevent plagiarism, and avoid legal disputes or ethical issues.
Challenges and limitations of existing methods and tools for verifying and tracing the origins and history of digital content
Existing methods and tools for verifying and tracing the origins and history of digital content can be broadly classified into two categories: human-based and technology-based.
1. Human-based methods and tools rely on the expertise and judgment of human agents, such as journalists, fact-checkers, editors, or analysts, to verify and trace the content. They typically use various sources of information, such as official records, eyewitness accounts, expert opinions, or online databases, to cross-check and corroborate the content.
Several challenges and limitations hinder effective and efficient verification and tracing of digital content with human-based methods and tools:
i. Human fact-checkers: human fact-checkers are often the first line of defense against misinformation and disinformation, but they also face many difficulties and drawbacks. For instance, human fact-checkers are limited by their time, resources, and expertise, and they may not be able to keep up with the volume and speed of AI-generated media. Moreover, human fact-checkers may be biased, inconsistent, or inaccurate in their judgments, and they may not have access to the original sources or metadata of the content they are verifying.
ii. Reverse image search: reverse image search is a technique that allows us to find similar or identical images on the web by uploading an image or entering its URL. This can help us verify the provenance and authenticity of an image by comparing it with other sources. However, reverse image search also has some limitations, such as:
- It may not be able to detect subtle or sophisticated manipulations or alterations of an image, such as cropping, resizing, filtering, or adding or removing objects or people.
- It may not be able to find the original or earliest source of an image, especially if the image has been widely circulated or modified by different users or platforms.
- It may not be able to distinguish between real and fake images, especially if the fake images are generated by advanced AI models that can create realistic and convincing images from scratch.
iii. Metadata analysis: metadata analysis is a technique that involves examining the data that is embedded or attached to a digital file, such as the date, time, location, camera, software, or author of the file. This can help us verify the provenance and authenticity of a file by checking its consistency and validity. However, metadata analysis also has some limitations, such as:
- It may not be available or accessible for every file, especially if the file has been compressed, encrypted, or stripped of its metadata by the user or the platform.
- It may not be reliable or trustworthy, especially if the metadata has been forged, altered, or deleted by the user or the platform.
- It may not be sufficient or conclusive, especially if the metadata does not provide enough or relevant information about the file or its content.
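The consistency checks described under metadata analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the field names (`captured_at`, `modified_at`, `camera`, `software`) and the `check_metadata` helper are hypothetical and do not follow any real EXIF schema or library. It also shows the limitation noted above: a file stripped of its metadata yields only "missing field" findings, which is inconclusive rather than proof of manipulation.

```python
from datetime import datetime

# Hypothetical metadata layout; field names are illustrative assumptions only.
def check_metadata(meta, claimed_date, required=("captured_at", "camera", "software")):
    """Return a list of findings; an empty list means no inconsistency was detected."""
    findings = []
    # Availability check: stripped or incomplete metadata cannot confirm anything.
    for field in required:
        if not meta.get(field):
            findings.append(f"missing field: {field}")
    captured = meta.get("captured_at")
    modified = meta.get("modified_at")
    if captured:
        captured_dt = datetime.fromisoformat(captured)
        # A photo cannot document an event that happened before it was taken.
        if captured_dt.date() > claimed_date.date():
            findings.append("capture date is later than the claimed event date")
        # A file cannot be modified before it was captured.
        if modified and datetime.fromisoformat(modified) < captured_dt:
            findings.append("modification time precedes capture time")
    return findings

claimed = datetime(2022, 3, 2)
stripped = {"captured_at": None, "camera": None, "software": None}
suspicious = {
    "captured_at": "2022-03-05T10:00:00",
    "modified_at": "2022-03-01T09:00:00",
    "camera": "ExampleCam",
    "software": "ExampleEditor",
}
consistent = {
    "captured_at": "2022-03-02T08:30:00",
    "modified_at": "2022-03-02T09:00:00",
    "camera": "ExampleCam",
    "software": "ExampleEditor",
}

for name, meta in [("stripped", stripped), ("suspicious", suspicious), ("consistent", consistent)]:
    print(name, check_metadata(meta, claimed))
```

Note that the "consistent" record passes every check even though nothing proves its fields were not forged, which is exactly why metadata analysis alone is not conclusive.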
2. Technology-based methods and tools rely on the capabilities and algorithms of computer systems, such as artificial intelligence, machine learning, or blockchain, to verify and trace the content. They use various techniques of data processing, analysis, and comparison, such as pattern recognition, feature extraction, or hashing, to identify and verify the content. Some examples of technology-based methods and tools are:
- Artificial intelligence (AI): The use of computer systems that can perform tasks that normally require human intelligence, such as natural language processing, computer vision, or speech recognition, to analyze and verify the content.
- Machine learning (ML): The use of computer systems that can learn from data and improve their performance without explicit programming, such as neural networks, deep learning, or reinforcement learning, to detect and verify the content.
- Blockchain: The use of a distributed ledger that records and stores transactions in a secure, transparent, and immutable way, using cryptographic methods and consensus protocols, to verify and trace the content. Blockchain can create and preserve a unique and tamper-proof digital fingerprint for each piece of content, and enable anyone to access and verify its provenance and history.
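The "digital fingerprint" mentioned above is typically a cryptographic hash of the content. The sketch below uses SHA-256 from Python's standard library as an assumed choice of hash function; the `fingerprint` and `matches` helpers and the sample byte strings are illustrative, not part of any specific platform.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()

def matches(content: bytes, registered_digest: str) -> bool:
    """Check whether content still corresponds to a previously registered digest."""
    return fingerprint(content) == registered_digest

# Stand-ins for real file bytes (for a file: open(path, "rb").read()).
original = b"example image bytes"
edited = b"example image bytes."  # a single added byte

registered = fingerprint(original)  # the value that would be written to a ledger

print(matches(original, registered))  # unchanged content matches
print(matches(edited, registered))    # any change produces a different digest
```

Only the digest needs to be recorded on a ledger, not the content itself. The trade-off is that an exact hash flags every change, including harmless ones such as re-compression, so it confirms integrity rather than visual similarity.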
These challenges and limitations suggest that there is a need for a new and better way of verifying and tracing the provenance of AI-generated media. This is where the concept of Provenance as a Service (PaaS) comes in.
The concept of Provenance as a Service (PaaS)
Provenance as a Service (PaaS) is a novel approach that leverages blockchain technology and decentralized storage to create and preserve immutable records of digital content creation and modification. PaaS can be defined as:
A service that provides a secure, transparent, and verifiable way of storing and accessing the provenance data of any digital asset, such as text, image, audio, video, or other media, using blockchain technology and decentralized storage.
PaaS can enhance trust, integrity, and credibility in digital media by providing transparent and verifiable information about the content’s source, authorship, ownership, and changes. PaaS can offer several benefits and advantages, such as:
- Immutability: PaaS can ensure that the provenance data of a digital asset is immutable, meaning that it cannot be changed, deleted, or tampered with by anyone, even by the owner or the creator of the asset. This can prevent fraud, manipulation, or plagiarism of the asset, and provide a permanent and indisputable record of its origin and history.
- Transparency: PaaS can ensure that the provenance data of a digital asset is transparent, meaning that it is publicly available and accessible by anyone, without any intermediaries or gatekeepers. This can increase the accountability and responsibility of the creators and consumers of the asset, and enable them to verify and validate the authenticity and integrity of the asset.
- Verifiability: PaaS can ensure that the provenance data of a digital asset is verifiable, meaning that it can be easily and quickly checked and confirmed by anyone, using cryptographic methods and algorithms. This can reduce the reliance and dependence on human fact-checkers or third-party platforms, and improve the efficiency and accuracy of the verification process.
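To make the immutability and verifiability properties more concrete, here is a minimal sketch of a hash-chained provenance log in Python. It is not how any particular PaaS is implemented: the record fields, the `append_event` and `verify_log` helpers, and the all-zero genesis value are assumptions for illustration. An in-memory list is of course editable; the point is that each record commits to the previous one, so any later edit becomes detectable. On a real ledger, the consensus protocol is what prevents the rewrite in the first place.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous record"

def record_hash(body: dict) -> str:
    """Hash a record body deterministically (sorted keys give a stable encoding)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_event(log: list, action: str, actor: str, content_digest: str) -> dict:
    """Append a provenance record that links to the hash of the previous record."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"action": action, "actor": actor, "content": content_digest, "prev": prev}
    entry = dict(body, hash=record_hash(body))
    log.append(entry)
    return entry

def verify_log(log: list) -> bool:
    """Recompute every hash and link; return False if any record was altered."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or record_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_event(log, "created", "alice", hashlib.sha256(b"photo v1").hexdigest())
append_event(log, "cropped", "bob", hashlib.sha256(b"photo v2").hexdigest())
print(verify_log(log))       # the untouched history verifies

log[0]["actor"] = "mallory"  # rewrite who created the asset
print(verify_log(log))       # the stored hash no longer matches the record
```

Anyone holding the log can run `verify_log` without trusting the party that produced it, which is the sense in which the process reduces reliance on intermediaries.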
PaaS is a promising and powerful solution for achieving digital authenticity in AI-generated media. However, PaaS is not a magic bullet that can solve all the problems and challenges of misinformation and disinformation. PaaS still requires the collaboration and cooperation of various stakeholders, such as content creators, consumers, developers, platforms, regulators, and educators, to ensure that the provenance data is accurate, complete, and relevant, and that the users are aware, informed, and educated about the importance and implications of provenance and verification in AI-generated media. PaaS is not a substitute for critical thinking and media literacy, but rather a complement and a catalyst for them.
A Case Study of PaaS
One example of Provenance as a Service (PaaS) in action is Numbers Protocol, a decentralized photo network that provides content verification for AI-powered companies and creativity tools. Numbers Protocol aims to create and maintain the authenticity of the photos and videos available on the internet, and to build a network in which photos and videos are fully traceable and ethically sourced.
Numbers Protocol works by using blockchain technology and decentralized storage to create and preserve immutable records of digital content creation and modification. Numbers Protocol offers three main features and benefits for its users, namely:
- Capture Cam: a mobile app that instantly uploads digital images onto the blockchain, creating a unique and tamper-proof digital fingerprint for each photo. The Capture app also allows users to edit, filter, and enhance their photos using AI-powered tools, such as face swap, background removal, and style transfer. The Capture app records every change made to the photo and stores it on the blockchain, ensuring that the provenance and verification of the photo are always available and accessible.
- Numbers API: a web service that allows developers to integrate Numbers Protocol into their own applications and platforms, enabling them to access and verify the provenance data of any photo on the network. The Numbers API also allows developers to create and customize their own verification rules and criteria, such as the minimum number of sources, the maximum number of edits, or the required level of quality or resolution. The Numbers API can help developers enhance the trust and credibility of their applications and platforms and provide their users with more transparency and control over their digital content.
- Numbers Marketplace: a platform that connects content creators and consumers, allowing them to monetize their photos and videos through smart contracts and tokenization. The Numbers Marketplace enables content creators to sell or license their photos and videos to consumers, who can use them for various purposes, such as journalism, social media, e-commerce, art, etc. The Numbers Marketplace also ensures that the content creators and consumers can agree on the terms and conditions of the transaction, such as the price, the duration, the usage rights, and the royalties. The Numbers Marketplace can help content creators and consumers to create and exchange value fairly and securely and to benefit from the authenticity and quality of their digital content.
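The custom verification rules mentioned for the Numbers API (minimum number of sources, maximum number of edits, required resolution) can be pictured with a short Python sketch. This does not call or reproduce the real Numbers API: the `VerificationRules` class, the `evaluate` function, and the layout of the `asset` dictionary are all hypothetical, chosen only to show how such criteria could be applied to provenance data once it has been retrieved.

```python
from dataclasses import dataclass

@dataclass
class VerificationRules:
    """Hypothetical rule set; names and defaults are illustrative assumptions."""
    min_sources: int = 1
    max_edits: int = 5
    min_width: int = 0
    min_height: int = 0

def evaluate(asset: dict, rules: VerificationRules) -> list:
    """Return the list of rules the asset fails; an empty list means it passes."""
    failures = []
    if len(asset.get("sources", [])) < rules.min_sources:
        failures.append(f"fewer than {rules.min_sources} independent sources")
    if len(asset.get("edits", [])) > rules.max_edits:
        failures.append(f"more than {rules.max_edits} recorded edits")
    width, height = asset.get("resolution", (0, 0))
    if width < rules.min_width or height < rules.min_height:
        failures.append("resolution below the required minimum")
    return failures

# Assumed shape of provenance data for one photo.
asset = {
    "sources": ["agency-feed"],
    "edits": ["crop", "resize", "filter"],
    "resolution": (1920, 1080),
}

strict = VerificationRules(min_sources=2, max_edits=2, min_width=1280, min_height=720)
lenient = VerificationRules()

print(evaluate(asset, strict))   # fails the source and edit-count rules
print(evaluate(asset, lenient))  # passes with the default thresholds
```

Returning the list of failed rules, rather than a single pass/fail flag, lets an application tell its users why a photo was not accepted.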
Numbers Protocol can be applied in various domains and scenarios, where provenance and verification of digital content are essential and valuable. Some of the use cases and examples are:
- Journalism: Numbers Protocol can help journalists and media outlets to verify the authenticity and integrity of the photos and videos they use or produce and to avoid the risks of misinformation and disinformation. Numbers Protocol can also help journalists and media outlets to protect their intellectual property rights, and to monetize their content through the Numbers Marketplace.
- Social media: Numbers Protocol can help social media users and influencers to create and share authentic and trustworthy photos and videos, and to enhance their reputation and credibility. Numbers Protocol can also help social media users and influencers to edit and enhance their photos and videos using AI-powered tools, and to monetize their content through the Numbers Marketplace.
- E-commerce: Numbers Protocol can help e-commerce sellers and buyers to verify the authenticity and quality of the products and services they offer or purchase, and to avoid the problems of fraud and counterfeiting. Numbers Protocol can also help e-commerce sellers and buyers to create and exchange value through the Numbers Marketplace, and to benefit from the provenance and verification of their photos and videos.
- Art: Numbers Protocol can help artists and collectors to verify the authenticity and originality of the artworks they create or acquire, and to avoid the issues of plagiarism and forgery. Numbers Protocol can also help artists and collectors to create and exchange value through the Numbers Marketplace, and to benefit from the provenance and verification of their photos and videos.
Conclusion
In this article, we have addressed the problem of misinformation and disinformation in the digital age, and explored the concepts of provenance and verification in AI-generated media. We have defined provenance as the information that describes the creation, modification, and ownership of a digital asset, and verification as the process of validating the accuracy, integrity, and credibility of a digital asset. We have also explained why provenance and verification are important for digital authenticity, and how they can help us establish trust and confidence in the media we consume and produce.
We have discussed the challenges and limitations of existing methods and tools for verifying and tracing the origins and history of digital content, such as human fact-checkers, reverse image search, metadata analysis, etc. We have introduced the concept of Provenance as a Service (PaaS), and how it leverages blockchain technology and decentralized storage to create and preserve immutable records of digital content creation and modification. We have presented a case study of Numbers Protocol, a decentralized photo network that provides content verification for AI-powered companies and creativity tools.
We have emphasized the significance and implications of PaaS and Numbers Protocol for the media and entertainment industry, and how they can help combat misinformation and disinformation, and foster innovative and ethical ways of content creation and consumption. We have provided some use cases and examples of how Numbers Protocol can be applied in various domains and scenarios, such as journalism, social media, e-commerce, art, etc.
Let’s consider some recommendations and suggestions for future research and development in this field, and highlight some of the challenges and opportunities that lie ahead. Some of them are:
- Developing and adopting common standards and protocols for PaaS, such as the Content Authenticity Initiative (CAI), which aims to provide a framework for certifying the origin and integrity of digital content.
- Enhancing and expanding the features and functionalities of Numbers Protocol, for example by adding support for more types of media, such as audio and video, and by integrating more AI-powered tools, such as face detection, object recognition, and sentiment analysis.
- Educating and empowering the users and stakeholders of PaaS and Numbers Protocol, such as content creators, consumers, developers, platforms, regulators, and educators, to raise their awareness, understanding, and appreciation of the importance and implications of provenance and verification in AI-generated media.
- Exploring and experimenting with new ways of creating and consuming digital content, for example by using PaaS and Numbers Protocol to generate and verify creative and artistic expressions such as poems, stories, and songs.
We believe that PaaS and Numbers Protocol are not only innovative and powerful technologies, but also visionary and transformative movements that can shape and improve the future of the media and entertainment industry, and of society at large.
As the famous quote by George Orwell goes, “Who controls the past controls the future. Who controls the present controls the past.” We believe that PaaS can help us control our own past, present, and future, by enabling us to create and consume authentic and trustworthy digital content, and by empowering us to be the masters of our own media.
Please cite as: Galason, E., Yan, S. (2024) Provenance as a Service: The Rise of Verification Platforms in Digital Media, Overtheblock Innovation Observatory, retrievable at link
OverTheBlock is a LINKS Foundation initiative carried out by a team of innovation researchers under the directorship of Enrico Ferro. The aim is to promote a broader awareness of the opportunities offered by the advent of exponential technologies in reshaping how we conduct business and govern society.
References:
- Fake news worldwide — statistics & facts — World Trends in Freedom of Expression and Media Development
- Statistics & Facts about Fake News | Statista
- Where False Information Is Posing the Biggest Threat
- Trend analysis of COVID-19 mis/disinformation narratives–A 3-year study
- https://www.statista.com/
- Digital Authenticity: Provenance and Verification in AI-Generated Media
- Authenticating AI-Generated Content
- What happens if there is no provenance of digital media in the Generative AI era