Truth-Telling Technology: Establishing Content Authenticity with C2PA

Karen Kilroy
AIDA User Group
Feb 12, 2024


When you browse the web, it is common to wonder if an edited image is spoofing you or whether a credited author really wrote an article.

Now, with AI on the scene, you may also wonder whether a story or poem was even written by a human. You might not be sure whether a video created by AI used an artist’s content… or if the artist was paid. You might also wonder if AI is using your content without paying you.

As a reflection of these times, Merriam-Webster’s 2023 Word of the Year is authentic. Depending on the context, authentic means either “not false or imitation” or “true to its own personality, spirit or character.” For content to be considered authentic, we must know we can trust it.

Authenticity and Trust Logos

One way to help people establish confidence in the authenticity of content is to use a recognizable and memorable trust logo, an insignia provided by a reputable source indicating that the content has passed a particular validation. You may have seen trust logos at the checkouts of online stores, where they reassure shoppers that an official organization has vetted the vendor.

Similarly, trust logos can be displayed alongside AI-generated content, allowing users to trace provenance to the level of detail that gives them comfort. Providing a way to prove provenance, that is, tracking and tracing each step of creating and modifying content, is a key factor in defending against legal risks involving AI.

C2PA: Truth-Telling Technology

As of last week, special truth-telling metadata is embedded inside every file produced by DALL-E: OpenAI began attaching C2PA credentials to files generated by its image-creation AI. The C2PA credentials are symbolized by the CR logo.

The following image shows the CR logo overlaid several times on a DALL-E-produced file. The file is being inspected using a C2PA verifier at https://contentcredentials.org/verify. Focusing on the top thumbnail shows that ChatGPT generated the image, and focusing on the bottom thumbnail shows that the image originated from DALL-E via OpenAI-API.

The Content Authenticity Initiative verifier displays the CR logo and C2PA credentials for a file produced by OpenAI DALL-E using the Juke Joint GPT. The verifier shows the summary of the file’s contents, the process used to create the file, the app or device used, the AI tool used, and any actions that took place.

This means that if someone tries to pass a DALL-E image off as their original work, someone else could determine otherwise simply by inspecting the file and examining its C2PA credentials.

Upon Further Examination

Digging deeper into the DALL-E-produced image file, you will find a C2PA manifest: JSON-based data that provides more detailed information in the form of assertions. These assertions can contain various statements, including a "do not use for AI training" flag and a list of ingredients, which are the files used to create the asset.
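As a rough illustration, the assertion portion of a manifest can look something like this. Assertion labels such as `c2pa.training-mining` and `c2pa.ingredient` follow the public C2PA specification, but the overall structure is simplified here and the values are made up:

```json
{
  "claim_generator": "Example-App/1.0",
  "assertions": [
    {
      "label": "c2pa.training-mining",
      "data": {
        "entries": {
          "c2pa.ai_training":  { "use": "notAllowed" },
          "c2pa.ai_inference": { "use": "notAllowed" },
          "c2pa.data_mining":  { "use": "notAllowed" }
        }
      }
    },
    {
      "label": "c2pa.ingredient",
      "data": { "title": "source-photo.jpg" }
    }
  ]
}
```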

To see these hidden manifests, you can use a C2PA command line tool or the point-and-click interface that I designed and built along with Walmart Senior Data Scientist Ethan Kuehl: File Baby.

File Baby helps you not only detect and inspect the manifests from any file that has one but also helps you store and share files with C2PA credentials, such as Adobe or Microsoft files. Additionally, File Baby helps you to create content credentials for your older work or other files without C2PA credentials. Using File Baby, you can build and share galleries of music, art, video, text, and prompts, all with C2PA credentials intact.

All The Cool Kids Are Joining C2PA, and You Can Too

Also last week, just days after OpenAI revealed its C2PA implementation, Google announced that it has joined C2PA as a member of its steering committee.

This technology, developed by the Coalition for Content Provenance and Authenticity (where the "C2PA" comes from), helps to prove whether or not an image is authentic by narrowing down its source. C2PA is led by Adobe, BBC, Microsoft, Publicis Groupe, Sony, Truepic, and now Google, with many other organizations participating. C2PA's membership roster also includes smaller organizations like Friends of Justin, a non-profit that Ethan and I co-founded along with Orson Weems, which is aimed at improving interactions between humans and AI.

C2PA develops the technical specification, while a broader community, the Content Authenticity Initiative (CAI), builds open-source tooling around it and promotes adoption. Anyone interested in adopting the standard can apply to join the CAI and, once approved, receive a banner to display on their site. File Baby's banner is shown here.

The Content Authenticity Initiative partner logo for File Baby, the software I wrote to automate the C2PA functionality.

C2PA was conceived five years ago in response to the onslaught of fake news during US elections and the lack of any way to prove authenticity. With the explosion of interest in generative AI, C2PA is now critical in brand new ways.

Why is C2PA Important to AI?

C2PA is particularly useful for AI because generative models like DALL-E produce very real-looking images. There has to be a way to tell whether something was made by a human or an AI, for many reasons, from simply remembering which photos are real to defending against accusations of plagiarism.

C2PA manifests establish file authenticity through user-viewable content provenance: a step-by-step history of a piece of content such as a picture, a song, or a video. That history is critical in helping people decide whom and what to trust.

AI improves by ingesting new content and training on it. If a model does not receive a reasonable diet of human-generated content, it can degrade quickly, a phenomenon known as model collapse.

This means there is a risk of having your content available online for everyone to see because “everyone” includes AI. The AI may use your content as training data and, unbeknownst to you, generate works in your style for people across the globe, with no attribution or payment to you.

C2PA and File Baby provide a safer way to share your content by taking special steps to prevent AI from using it as training data. C2PA manifests let you encode instructions specifying whether a file may be used for AI inference (generating predictions from real data), AI training (building a model from a data set), or data mining (discovering patterns and relationships in data).
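A small sketch of how software could read those instructions out of a manifest. The assertion label and entry names follow the public C2PA specification, but the manifest structure is simplified and the helper function is hypothetical, not part of any official tooling:

```python
import json

# Illustrative manifest fragment. The label "c2pa.training-mining" and the
# entry names come from the C2PA specification; values are made up.
manifest = json.loads("""
{
  "assertions": [
    {
      "label": "c2pa.training-mining",
      "data": {
        "entries": {
          "c2pa.ai_training":  {"use": "notAllowed"},
          "c2pa.ai_inference": {"use": "notAllowed"},
          "c2pa.data_mining":  {"use": "notAllowed"}
        }
      }
    }
  ]
}
""")

def permitted_uses(manifest: dict) -> dict:
    """Return the declared use policy for each AI-related entry,
    or an empty dict if the manifest declares no policy."""
    for assertion in manifest.get("assertions", []):
        if assertion.get("label") == "c2pa.training-mining":
            entries = assertion["data"]["entries"]
            return {name: entry["use"] for name, entry in entries.items()}
    return {}

print(permitted_uses(manifest))
```

A crawler that honors these declarations would check the policy before adding a file to a training set; a missing assertion means no policy was declared, not permission.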

Since it can be established that you own your content and AI is not allowed to consume it freely, there is an emerging opportunity to make extra income by selling your content to AI models. Considering the heavy commitment among tech giants to C2PA and the ongoing demand for fresh training data, compensation for content creators is the logical next step.

Content Authenticity Initiative

The Content Authenticity Initiative (CAI) created the CR logo, which symbolizes that a file contains C2PA credentials.

The next figure shows an example image on the left before being signed by a valid content credentials signer at File Baby; on the right, the image has been signed and is displayed in a viewer showing the CR logo. A deeper explanation appears in a popup when the visitor hovers over the CR logo. One such viewer is the C2PA Google Chrome extension by Digimarc.

A before-and-after image of C2PA content credentials being applied to a file. The CR is displayed dynamically when the file is loaded into certain verifiers or inspected with Google Chrome using the Digimarc C2PA browser extension.

Validating Content & Detecting Fakes

C2PA validates content and detects fakes in several ways: assertions about how the content was created (by a person or an AI) and how it was edited, a list of the original ingredients, a link back to a validated identity of the content's owner, and a cryptographic hash that matches only the image in question.
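The hash check at the heart of this is simple to illustrate. This is a toy sketch using in-memory bytes in place of an image file; real C2PA hard bindings hash specific byte ranges of the asset and sign the result, which is more involved:

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Hex digest of the content; any change to the bytes changes it."""
    return hashlib.sha256(data).hexdigest()

original = b"...image bytes..."
recorded = sha256_digest(original)  # value a signer would store in the manifest

# Later, anyone can recompute the hash and compare it to the recorded one.
tampered = original + b" edited"
print(sha256_digest(original) == recorded)  # True: untouched file matches
print(sha256_digest(tampered) == recorded)  # False: tampering is detected
```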

For example, the animation below shows a file that has been signed by the Friends of Justin organization. The claim is validated using CAI’s verifier.

This shows an animation of dropping a file into the content credentials verifier and seeing its manifest. The displayed file is an original photograph by me, uploaded into File Baby and signed by Friends of Justin. Note the live link to my LinkedIn account; that could also be linked to Instagram or Behance.

The validation process indicates whether a file has undergone tampering, so you can tell right away if a bad actor has changed a face in a picture or added someone who wasn't really there.

Another Truth-Telling Technology: Blockchain

File Baby offers blockchain-ready cryptographic proofs that add an optional layer of tamper-evident transparency to C2PA file provenance. One area where blockchain can really complement C2PA is in the creation of a trust network. Blockchain technology is well suited to trust networks because it provides a ready-made, pluggable framework into which other organizations, along with their individual technologies, can connect. Blockchain networks are replicated to each member, which keeps everyone honest and helps the governance of the AI stay true to its original intent.

Blockchain networks can also be structured to allow smart contracts to pre-govern certain interactions among group members, breaking trust down into discrete, checkable points and providing groups with a pre-made workflow and a quick way to evaluate trust. Zero-knowledge proofs, another feature of blockchain networks, allow members of a trusted network to prove certain information to the group without revealing the underlying secrets.
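The tamper-evidence idea can be sketched as a hash chain: each provenance event carries the hash of the previous entry, so rewriting history anywhere breaks every link after it. This is a toy model for illustration, not File Baby's or any blockchain's actual implementation:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the start of the chain

def link(prev_hash: str, event: dict) -> dict:
    """Append a provenance event, binding it to the previous entry's hash."""
    record = {"prev": prev_hash, "event": event}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

def verify(chain: list) -> bool:
    """Recompute every link; any edited event or broken link fails."""
    prev = GENESIS
    for record in chain:
        body = {"prev": record["prev"], "event": record["event"]}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if record["prev"] != prev or digest != record["hash"]:
            return False
        prev = record["hash"]
    return True

chain = [link(GENESIS, {"action": "created", "by": "camera"})]
chain.append(link(chain[-1]["hash"], {"action": "edited", "by": "photo app"}))
print(verify(chain))  # True: history is intact

chain[0]["event"]["by"] = "attacker"  # rewrite history...
print(verify(chain))  # False: the tampering is evident
```

Distributing copies of such a chain to every member is what makes a blockchain network's shared history hard to rewrite quietly.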

To read more about creating trackable, traceable, transparent, tamper-evident AI, see the book I co-authored, Blockchain Tethered AI (O'Reilly, 2023).

A promotional slide for Blockchain Tethered AI (O’Reilly, 2023), a book I co-authored, explains in detail how to create an AI and Machine Learning workflow that helps to keep AI models accountable and even reversible. Includes code.

Conclusion

The best time to establish the authenticity of content is at the moment of its creation, so that every modification along the way can be recorded. Truth-telling technologies like C2PA and blockchain are built for this purpose. C2PA credentials are most effective when the files containing them can be shared from a system like File Baby that keeps the credentials intact and can match them up with the files again if they become separated. Using these technologies, there is always a clear audit trail: file provenance that goes back to the beginning of every file's life, so there is never any question as to what is authentic.



Karen Kilroy is the author of the O'Reilly publications Blockchain Tethered AI, AI and the Law, and Blockchain-as-a-Service; Full Stack Prompt Engineer; and CEO of File Baby.