Advances in technology have made it much easier to fake an image, whether with editing software like Photoshop or with AI techniques like Deepfake¹. At the same time, tampering has become much harder for people to spot. Sharing a photo confidently in untrusted environments, such as the Internet, is more challenging than ever before. On the other hand, advances in blockchain technology make it possible to tackle a growing number of trust-related issues. In this post, I propose an open-source app called ProofableImage that tackles the tampering problem by building trust into an image with blockchain.
Although many interested parties have already attempted to use blockchain to solve the image trust problem² ³ ⁴, ProofableImage was built with the hope of demonstrating a more fundamental and general approach to building trust into any digital asset, not just images. The app is built on top of Proofable, the free proving framework we have created here at ProvenDB. It allows users to efficiently create digital certificates for all of their data at once in a single blockchain transaction, or incrementally with multiple transactions, and it supports various anchor types: Bitcoin, Ethereum, Hedera, and more. ProofableImage is written in Go and uses the Proofable Go SDK. You can find its full source code here.
Chain of Trust
In my previous post, I proposed an app called ProvenLogs and explained how to use blockchain to build trust into logs. Both apps share the same theory of provenance. When a piece of information (data) is anchored to a blockchain using a Merkle tree, we can work backward along the corresponding Merkle path to assert that the data has existed since the time of the blockchain transaction. The Merkle path itself becomes a certificate for the data. We can generate such a certificate to prove that the image data has existed since a given timestamp. Then we can use the certificate to verify an arbitrary image and see whether someone has tampered with it. We can also use the timestamp to verify the certificate itself: when we trust the timestamp, we can check the validity of the certificate and then the image. In this way, the surface area for potential forgery attacks is reduced to a block. If we also remember and trust the transaction hash of the certificate, the attack surface is reduced to that transaction. The timestamp and transaction hash are the seed information, like first principles in philosophy, on which we can base our trust. In this way, a chain of trust is formed from the blockchain seeds to our image contents, as in Figure 1.
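To make the idea of working backward along a Merkle path more concrete, here is a minimal sketch. It assumes a plain binary SHA-256 Merkle tree, which is a simplification and not Proofable's actual trie or certificate format. The names ProofStep, verifyPath, and buildExample are hypothetical and exist only for this illustration.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// ProofStep is one sibling hash on the path from a leaf up to the root.
// Hypothetical type for illustration; it is not part of the Proofable SDK.
type ProofStep struct {
	Sibling []byte
	Left    bool // true when the sibling sits to the left of the running hash
}

// hashPair hashes the concatenation of a left and a right child.
func hashPair(left, right []byte) []byte {
	h := sha256.New()
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// verifyPath recomputes the root from the data and its Merkle path, then
// compares it with the root that was anchored to the blockchain.
func verifyPath(data []byte, path []ProofStep, root []byte) bool {
	sum := sha256.Sum256(data)
	cur := sum[:]
	for _, step := range path {
		if step.Left {
			cur = hashPair(step.Sibling, cur)
		} else {
			cur = hashPair(cur, step.Sibling)
		}
	}
	return bytes.Equal(cur, root)
}

// buildExample builds a four-leaf tree and returns its root together with
// the Merkle path for the leaf at index 2.
func buildExample(leaves [4][]byte) (root []byte, path []ProofStep) {
	var h [4][]byte
	for i, leaf := range leaves {
		sum := sha256.Sum256(leaf)
		h[i] = sum[:]
	}
	n01 := hashPair(h[0], h[1])
	n23 := hashPair(h[2], h[3])
	root = hashPair(n01, n23)
	path = []ProofStep{
		{Sibling: h[3], Left: false}, // sibling leaf on the right
		{Sibling: n01, Left: true},   // sibling subtree on the left
	}
	return root, path
}

func main() {
	leaves := [4][]byte{[]byte("box-0"), []byte("box-1"), []byte("box-2"), []byte("box-3")}
	root, path := buildExample(leaves)

	fmt.Println(verifyPath([]byte("box-2"), path, root))          // true
	fmt.Println(verifyPath([]byte("box-2 tampered"), path, root)) // false
}
```

Only the root needs to be trusted: any change to the data or to a sibling hash on the path produces a different root, so the check fails.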
Unlike the simple Chainpoint-like Merkle tree in ProvenLogs, which can only prove a single hash of a piece of data, we introduced a new Merkle trie in Proofable, which can prove many pieces of arbitrary data simultaneously. The trie is like a key-value store, where users can either hash their data themselves and put the hashes in, or put the data in directly and let the trie manage and hash the data for them. After we put the data as key-values into a trie, the trie can efficiently (O(log(n))) derive the Merkle root, which we can then anchor to a blockchain to generate a certificate. As shown in Figure 1, the same certificate can prove not only the image as a whole but also all the pixel boxes and metadata underneath the image. The hierarchy is a bit like a file system with files and folders, but expressed in key-values. Interestingly, Proofable also provides an API to split a standalone sub-certificate out of an existing certificate to independently prove a subset of the items (key-values) in the hierarchy, e.g., to prove just the head of Pineapple (my cat) 😛.
Another distinction between ProvenLogs and ProofableImage is that the former stores the data and certificate in the cloud (ProvenDB) for the user while the latter exports the data and certificate, letting the user take care of the storage. The user can choose whatever storage option they prefer, such as storing locally, in their Dropbox, or simply sharing them with their friends.
With the trie in mind, we can now leverage it to prove an image to a blockchain and generate a certificate for the image. The certificate can be used not only to tell whether the image has been changed but also to tell which areas of the image have been changed, just like a diff tool. To create the certificate, we first divide the image into pixel boxes, as shown in Figure 2. The origin is at the top-left corner of the image, and we identify each box by its X and Y. Then we put each pixel box into a trie with the key as {"X": x_value, "Y": y_value} and the value as the hash of that pixel box. Once we have the trie, we can anchor it to a blockchain with a simple API call from Proofable. Then we can export the certificate (trie), which contains the trie data and the Merkle path.
Once we have the certificate, we can test any image against it. To verify an image, we first import the certificate back into the app. Then we invoke the verification API, which emits the expected key-value stream in the lexicographical order of the keys. We can then check each key-value pair by comparing the expected hash with the actual hash of the same pixel box in the selected image. Although we directly consume the key-value stream from the certificate to check each pixel box without relying on the key order, we could also create another stream from the selected image and efficiently compare (diff) it with the certificate stream on the fly. If we did that, we would need to make sure the selected image stream is also ordered lexicographically by key. An example of this can be found here in Proofable CLI.
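The on-the-fly comparison of two key-ordered streams can be sketched as a merge-style walk. This is only an illustration under simplifying assumptions: it uses in-memory slices instead of real streams, and the types KV and Diff and the function diffSorted are hypothetical names, not the Proofable API.

```go
package main

import "fmt"

// KV is one key-value pair, e.g. a pixel box key and its hash.
// Hypothetical type for illustration only.
type KV struct {
	Key, Value string
}

// Diff describes one difference between the certificate and the image.
type Diff struct {
	Key  string
	Kind string // "changed", "missing" (certificate only), or "extra" (image only)
}

// diffSorted walks two slices in lockstep. Both must be sorted
// lexicographically by key, which is why the key order matters.
func diffSorted(cert, img []KV) []Diff {
	var out []Diff
	i, j := 0, 0
	for i < len(cert) && j < len(img) {
		switch {
		case cert[i].Key < img[j].Key:
			out = append(out, Diff{Key: cert[i].Key, Kind: "missing"})
			i++
		case cert[i].Key > img[j].Key:
			out = append(out, Diff{Key: img[j].Key, Kind: "extra"})
			j++
		default:
			if cert[i].Value != img[j].Value {
				out = append(out, Diff{Key: cert[i].Key, Kind: "changed"})
			}
			i++
			j++
		}
	}
	// Whatever is left on either side has no counterpart on the other.
	for ; i < len(cert); i++ {
		out = append(out, Diff{Key: cert[i].Key, Kind: "missing"})
	}
	for ; j < len(img); j++ {
		out = append(out, Diff{Key: img[j].Key, Kind: "extra"})
	}
	return out
}

func main() {
	cert := []KV{
		{`{"X":0,"Y":0}`, "aa"},
		{`{"X":0,"Y":1}`, "bb"},
		{`{"X":1,"Y":0}`, "cc"},
	}
	img := []KV{
		{`{"X":0,"Y":0}`, "aa"},
		{`{"X":0,"Y":1}`, "zz"}, // edited box
		{`{"X":1,"Y":0}`, "cc"},
		{`{"X":1,"Y":1}`, "dd"}, // box outside the original image
	}
	for _, d := range diffSorted(cert, img) {
		fmt.Println(d.Kind, d.Key)
	}
}
```

Because both inputs are visited once and in order, the comparison needs only the current pair from each side in memory, which is what makes a streaming diff attractive for large images.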
In this post, we use the term trie interchangeably with the term certificate to represent a self-contained structure that contains all the key-values and Merkle paths, which can be visualized as in Figure 3. The screenshot shows just a small fraction of the center area of the actual trie Graphviz Dot Graph. The trie looks quite flat because the number of trie nodes increases exponentially with the depth, which is consistent with the O(log(n)) efficiency. Please read through this if you would like to know more about the trie concept.
Build Trust into an Image
Now you understand that Proofable is an API that helps us form a chain of trust from a blockchain to our data, and that ProofableImage is built on top of it. Let’s walk through an example of using ProofableImage to build trust into an image of my Pineapple, and let me explain some implementation considerations along the way.
Download ProofableImage binary
First, you need to download a pre-built ProofableImage binary for your operating system. To make distribution easier, I created the following scripts to help you download a binary into your current working directory.
For macOS and Linux
Copy, paste, and run the following bash command in a terminal:
bash -c "$(eval "$(if [[ $(command -v curl) ]]; then echo "curl -fsSL"; else echo "wget -qO-"; fi) https://raw.githubusercontent.com/SouthbankSoftware/proofable-image/master/install.sh")"
For Windows
Copy, paste and run the following PowerShell command in a PowerShell prompt:
& ([ScriptBlock]::Create((New-Object Net.WebClient).DownloadString('https://raw.githubusercontent.com/SouthbankSoftware/proofable-image/master/install.ps1')))
Of course, you can also build the binary from the Go code. Please refer to this documentation.
Create an access token
When using ProofableImage for the first time, you will be asked to sign up or sign in to ProvenDB with your Google, GitHub, Facebook, or email account, so that a free access token can be generated and saved locally for you. ProofableImage can then pick up the token to access the Proofable API service in subsequent interactions.
Create an image certificate
If you have already cloned the repo, the image of my Pineapple is located at images/pineapple.png. We can then use the following command to create an image certificate for it:
./proofable-image -output-dot-graph images/pineapple.png
Upon success, the certificate will be created beside the image at images/pineapple.png.imgcert. In the console where you ran ProofableImage, you will see some details about the blockchain transaction that anchors our image. When the transaction is confirmed, the chain of trust is formed at the transaction timestamp. An image viewer window will also pop up to show my Pineapple.
During creation, we first set the metadata key-value to store the original image size. Then we use stream processing to handle the pixel boxes one by one. For each pixel box, we first retrieve its pixel data, then use its location as the key and its hash as the value. After that, we write the key-value to the stream and move on to the next pixel box. This process lets us handle large images while keeping memory consumption low when creating the trie in Figure 3. The -output-dot-graph option is used to output the trie’s Graphviz Dot Graph, which can be opened in a viewer like the VSCode Graphviz Preview.
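The box-by-box processing described above can be sketched with Go's standard image package. Several details here are assumptions for illustration only: the 32-pixel box size, the use of box indices for X and Y, the helper names forEachBox, hashBox, and demoImage, and SHA-256 as a stand-in for the hash function that ProofableImage actually applies to each box.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"image"
	"image/color"
)

const boxSize = 32 // hypothetical box size in pixels

// forEachBox visits the pixel boxes row by row from the top-left origin and
// calls emit with one key-value at a time, so no full list is kept in memory.
func forEachBox(img image.Image, size int, emit func(key, value string)) {
	b := img.Bounds()
	for y := 0; y*size < b.Dy(); y++ {
		for x := 0; x*size < b.Dx(); x++ {
			// Boxes on the right and bottom edges are clipped to the image.
			rect := image.Rect(x*size, y*size, (x+1)*size, (y+1)*size).Add(b.Min).Intersect(b)
			emit(fmt.Sprintf(`{"X":%d,"Y":%d}`, x, y), hashBox(img, rect))
		}
	}
}

// hashBox hashes the pixels inside r. SHA-256 is only a placeholder here.
func hashBox(img image.Image, r image.Rectangle) string {
	h := sha256.New()
	for y := r.Min.Y; y < r.Max.Y; y++ {
		for x := r.Min.X; x < r.Max.X; x++ {
			c := color.NRGBAModel.Convert(img.At(x, y)).(color.NRGBA)
			h.Write([]byte{c.R, c.G, c.B, c.A})
		}
	}
	return hex.EncodeToString(h.Sum(nil))
}

// demoImage creates a small gradient image so the example needs no file.
func demoImage(w, h int) *image.NRGBA {
	img := image.NewNRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetNRGBA(x, y, color.NRGBA{R: uint8(x), G: uint8(y), B: 0, A: 255})
		}
	}
	return img
}

func main() {
	count := 0
	forEachBox(demoImage(70, 40), boxSize, func(key, value string) {
		count++
		fmt.Println(key, value[:8]) // print a short hash prefix only
	})
	fmt.Println("boxes:", count) // a 70x40 image yields 3x2 = 6 boxes
}
```

In a real app, the emit callback would write each key-value to the trie stream instead of printing it.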
In the metadata key-value, \x00@META/imageSize is the key and {"X":1440, "Y":1080} is the value. We prefix the key with an invisible null character \x00 to ensure the metadata is always the first key-value emitted when verifying the certificate later on. The original image size is used to handle the situation where the testing image is larger than the original image, in which case we mark the overflow area as mismatches.
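The effect of the null-character prefix is easy to check with a plain byte-wise sort. The sketch below assumes that the lexicographical order of the keys is a byte-wise comparison, which is how Go compares strings; the helper name sortedKeys is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// metaKey is the metadata key from the post, prefixed with a null character.
const metaKey = "\x00@META/imageSize"

// sortedKeys returns a copy of keys in byte-wise lexicographical order.
func sortedKeys(keys []string) []string {
	out := append([]string(nil), keys...)
	sort.Strings(out)
	return out
}

func main() {
	keys := []string{
		`{"X":1,"Y":0}`,
		`{"X":0,"Y":0}`,
		metaKey,
		`{"X":0,"Y":1}`,
	}
	// %q makes the otherwise invisible \x00 prefix visible.
	for _, k := range sortedKeys(keys) {
		fmt.Printf("%q\n", k)
	}
}
```

Since \x00 is the smallest possible byte, the metadata key sorts before every pixel box key, all of which start with the { character.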
By default, ProofableImage anchors to Hedera Mainnet because it is cheap and fast. You can use the -anchor-type option to anchor to another blockchain, including Ethereum or Bitcoin. Please refer to this list for all available anchor types in Proofable.
Verify an image against the certificate
With the image certificate, we can now test it on another copy of the image and detect tampering. Let’s manipulate images/pineapple.png in an editing app by drawing a green circle around Pineapple, as in Figure 0. Then save the edited image to images/pineapple.tst.png, and run the following command to verify it:
./proofable-image -imgcert-path images/pineapple.png.imgcert images/pineapple.tst.png
If we don’t specify the -imgcert-path option, ProofableImage will by default try to use the certificate at images/pineapple.tst.png.imgcert and create a new one if it does not exist.
During verification, the Proofable API first verifies the certificate itself by checking the anchor details and the Merkle path from the blockchain transaction to the certificate’s image trie containing the image key-values, which is the path from the root to the first light blue box in Figure 3. The seeds mentioned in the Chain of Trust section ensure that the transaction itself is correct and carries our root hash. Then the Proofable API emits all the key-values contained in the certificate in the lexicographical order of the keys. Client apps should use this stream to further verify the Merkle paths, i.e., from the first light blue box down to the leaves in Figure 3. For each key-value emitted, ProofableImage compares the hash in the value with the hash of the same pixel box in the testing image. Finally, the image viewer that pops up visualizes the differences as the red boxes in Figure 0.
When comparing two pixel boxes, we hash each box with an algorithm called perceptual hashing instead of a cryptographic hash function like MD5 or SHA-3. The reason lies in the nature of image compression: changing one area of an image can result in changes to other areas of the image. A cryptographic hash function produces a totally different hash value even if a single pixel is only slightly brighter because of the reduced color palette after compression, which creates lots of false positives. This is even worse when modifying a JPEG image, whose lossy compression alters the image more. Perceptual hashing, on the other hand, can tolerate such imperceptible differences caused by color redistribution, and it provides us with a distance value to measure the difference instead of a simple match or mismatch. In the right screenshot of Figure 0, each mismatched pixel box has a distance value that indicates how different it is from the original one. You can use the -distance-tolerance option to specify a higher tolerance for situations like verifying JPEG images.
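The contrast between the two kinds of hashing can be illustrated with a tiny example. The sketch below uses average hashing over an 8x8 grayscale box with a Hamming distance, which is one simple perceptual hashing scheme and not necessarily the one ProofableImage uses. The names averageHash, distance, and gradientBox are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// averageHash computes a 64-bit hash of an 8x8 grayscale box:
// bit i is set when pixel i is brighter than the mean of the box.
func averageHash(px [64]uint8) uint64 {
	sum := 0
	for _, p := range px {
		sum += int(p)
	}
	mean := sum / 64
	var h uint64
	for i, p := range px {
		if int(p) > mean {
			h |= 1 << uint(i)
		}
	}
	return h
}

// distance is the Hamming distance: the number of differing bits.
func distance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// gradientBox returns a box whose brightness rises from 0 to 252.
func gradientBox() [64]uint8 {
	var px [64]uint8
	for i := range px {
		px[i] = uint8(i * 4)
	}
	return px
}

func main() {
	orig := gradientBox()

	brighter := orig
	brighter[10]++ // one pixel becomes slightly brighter, as after recompression

	edited := orig
	for i := 0; i < 16; i++ {
		edited[i] = 255 // a visible edit: the darkest quarter is painted white
	}

	// The cryptographic hashes differ even for the imperceptible change.
	fmt.Println(sha256.Sum256(orig[:]) == sha256.Sum256(brighter[:])) // false

	// The perceptual distance stays at zero for the imperceptible change
	// and becomes large for the visible edit.
	fmt.Println(distance(averageHash(orig), averageHash(brighter))) // 0
	fmt.Println(distance(averageHash(orig), averageHash(edited)))
}
```

A tolerance then becomes a simple threshold on the distance: boxes whose distance is at or below the threshold count as matches, and the rest are flagged.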
The image viewer is built with an immediate mode GUI library called Gio, which supports efficient drawing on all major operating systems. I found it quite easy and straightforward to build performance-critical features in Gio, such as zooming in and out by dragging and resizing the window.
What’s Next
In this post, I demonstrated how Proofable can be used to create ProofableImage, an app that builds a chain of trust into your image. You can use the same idea to build trust into your own applications with data held in cloud stores, databases, filesystems, or data streams. The data anchored can be digital media, intellectual property, legal or accounting documents, public records, and much more. The possibilities are endless! Please give it a try and let us know your opinion.
Last but not least, we should never stop pushing things to their extreme states 😜. There are lots of things we can do to make ProofableImage perfect.
- Cloud storage for easy image storing and sharing
- Embed more image metadata such as location or even whole EXIF entries
- Connect ProofableImage to a camera and generate a certificate for each photo taken on the fly
- Reduce the size of the certificate to fit in a QR code for easy physical sharing. For example, in Figure 3, the hash values generated by the perceptual hashing have the same prefix and suffix, which could be omitted when marshaling and unmarshaling
- Support transfer of ownership by embedding a smart contract
- Combine a consensus algorithm with the certificate to authenticate data replication
- Support creating a sub-certificate out of an existing certificate to prove a sub-area of the image
[1]: Wikipedia. Deepfake. https://en.wikipedia.org/wiki/Deepfake
[2]: Photochain. https://photochain.io/
[3]: Mathies D. (Jan 25 2018). How a blockchain-based digital photo notary is fighting fraud and fake news. https://www.digitaltrends.com/photography/truepic-blochain-image-verification/
[4]: Coindesk. (Feb 21 2020). JPEG on the Blockchain: Image Format Creator Believes Tech Can Fight Copyright Theft. https://www.coindesk.com/jpeg-on-the-blockchain-image-format-creator-believes-tech-can-fight-copyright-theft