What Makes a Good Watermark?

Durability. Invisibility. Reliability. Scalability.

TrufoAI
Trufo
3 min read · Feb 20, 2024


There are four primary characteristics of a good watermark. Of course, depending on the purpose of the watermark, the relative importance and specific details of each characteristic may differ, but a good watermark will have a healthy balance of all four.

Durability

The first characteristic is how well the watermark stays computer-readable after the content is edited or distributed. This property, also known as robustness, is the main advantage of watermarks over metadata solutions. Metadata is easily lost (e.g. under the default iPhone image export setting) and easily changed (it can be edited directly), whereas a good watermark embeds hard-to-alter data directly into the content data.

This durability should be calibrated against both benign and adversarial actions.

Against mundane benign alterations, such as compression or cropping, the watermark should survive intact as often as possible. Otherwise, since much shared content goes through such alterations, the watermark would rarely be useful.

Against targeted adversarial attacks, the priority is built-in protection against forgery. Preventing watermark removal is hard to guarantee (though removal can be made quite tedious, which matters a great deal for preventing piracy). Preventing the watermark data from being changed, or a fake watermark from being created, is achievable with cryptography.
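As a minimal sketch of this kind of forgery protection, assume a hypothetical scheme in which the embedded payload carries a truncated HMAC tag. The key handling, the payload layout, the tag length, and the function names below are all illustrative assumptions, not a description of any particular product.

```python
import hashlib
import hmac
from typing import Optional

TAG_BYTES = 8  # assumed tag length; longer tags make forgery harder


def seal_payload(payload: bytes, key: bytes) -> bytes:
    """Append a truncated HMAC-SHA256 tag to the payload before embedding."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_BYTES]
    return payload + tag


def open_payload(sealed: bytes, key: bytes) -> Optional[bytes]:
    """Return the payload if its tag verifies, otherwise None."""
    if len(sealed) < TAG_BYTES:
        return None
    payload, tag = sealed[:-TAG_BYTES], sealed[-TAG_BYTES:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_BYTES]
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return payload if hmac.compare_digest(tag, expected) else None
```

An attacker who edits the payload, or invents one from scratch, cannot produce a matching tag without the key. HMAC is symmetric, so anyone who can verify can also forge; a publicly decodable watermark would more plausibly use a digital signature so that verifiers never hold the signing key.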

Invisibility

Watermarks should also be largely invisible, or imperceptible to the human eye. This generally means that the application of the watermark should not change the underlying content too much, which poses challenges because the magnitude of the watermark signal must be much lower than the noise of the content data.
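One common, if crude, way to quantify "does not change the content too much" is the peak signal-to-noise ratio (PSNR) between the original and the watermarked pixels. The sketch below assumes 8-bit pixel values flattened into a sequence; the function name is illustrative, and this is not presented as the metric any specific watermark is tuned against.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], watermarked: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(watermarked) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between corresponding pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical content, no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means a smaller change. PSNR weighs every pixel change equally, though, so it says nothing about where in the image a change is easiest for a viewer to notice.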

Good watermarks will factor in how human vision is processed and how content data is stored to maximize the invisibility of the watermark.

Invisibility can also be targeted. For example, our patterned watermark is designed to be completely invisible in 95% of the content, but convey information in a non-disruptive manner in the remaining 5%.

Lastly, in the context of generated images, a different type of invisibility is possible, because there is no original content to stay true to. Watermarks that directly intervene in the content generation process can change the end result significantly (because the content itself is changed) without the watermark effect being visible.

Reliability

For any good signal, both the false positive rate and the false negative rate should be low. If a watermark exists, the data should be decodable, and the data decoded should be correct. If a watermark does not exist, decoding should not return any data.
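These two rates can be measured directly on a labeled test set. In the sketch below, each trial is an assumed pair of booleans: whether the item really carried a watermark, and whether the decoder returned data (for watermarked items, the correct data). The function name and the trial format are illustrative.

```python
from typing import Iterable, Tuple


def error_rates(trials: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) over labeled trials.

    Each trial is (has_watermark, decoded):
      - has_watermark: the item was actually watermarked
      - decoded: the decoder returned data (and, if watermarked, the right data)
    """
    false_pos = false_neg = positives = negatives = 0
    for has_watermark, decoded in trials:
        if has_watermark:
            positives += 1
            false_neg += not decoded  # watermark present but not recovered
        else:
            negatives += 1
            false_pos += decoded  # data "found" where none was embedded
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr
```

For example, four watermarked items with one missed and five clean items with one spurious decode give a false positive rate of 0.2 and a false negative rate of 0.25.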

This is one of the main weaknesses of detection models that aim to use AI to detect AI. Their accuracy is low, to the point that OpenAI withdrew its AI-text detection product, and they provide no guarantee against future AI models.

Cryptography, on the other hand, provides provable confidence. By incorporating cryptographic tools into watermarks, a similar style of confidence can be attained.
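To give a rough sense of what that confidence looks like in numbers, suppose (as an idealized assumption) that the decoder accepts a payload only when a t-bit cryptographic tag verifies, and that unwatermarked content passes that check with probability 2^-t per attempt. The chance of at least one false accept over many decodes can then be sketched as follows; the function name and the model are illustrative.

```python
import math


def false_accept_probability(tag_bits: int, num_decodes: int = 1) -> float:
    """Chance of at least one false accept across independent decode attempts.

    Assumes each attempt on unwatermarked content passes a tag check with
    probability 2**-tag_bits (an idealized model of a t-bit tag).
    """
    if tag_bits < 1 or num_decodes < 0:
        raise ValueError("tag_bits must be >= 1 and num_decodes >= 0")
    p_single = 2.0 ** -tag_bits
    # 1 - (1 - p)^n, computed with log1p/expm1 so tiny p does not round to 0.
    return -math.expm1(num_decodes * math.log1p(-p_single))
```

Under this model, a 64-bit tag checked against a billion unwatermarked items yields a false accept with probability on the order of 5e-11. A learned detector cannot offer a bound of this kind.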

As an aside, while digital content is usually presented in a clean format, more physical interactions, such as the scanning of physical objects or whatever we may end up with in AR/VR, require consistency in more freeform settings.

Scalability

Content generation is widespread, so a watermark needs to be effective at a large scale too. This means that the encoding and decoding processes should be fast and accessible. Specifically, the time and memory cost of compute should be similar to that of standard image processing or video streaming functions: under 100 ms for an HD image.
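A simple way to check an encoder or decoder against such a budget is to time it over a few runs and compare the median to the target. The sketch below uses only the standard library; the helper names are illustrative, and the callable passed in stands for whatever encode or decode routine is being measured.

```python
import time
from typing import Callable

BUDGET_MS = 100.0  # target from the text: under 100 ms for an HD image


def median_latency_ms(fn: Callable[[], object], runs: int = 5) -> float:
    """Median wall-clock time of fn() in milliseconds over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]


def within_budget(fn: Callable[[], object], budget_ms: float = BUDGET_MS, runs: int = 5) -> bool:
    """True if the median latency of fn() is below the budget."""
    return median_latency_ms(fn, runs) < budget_ms
```

The median is used rather than the mean so that a single slow run, such as a cold cache, does not dominate the result.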

The decoding side is especially important, because when decoding in a public setting, it is not known in advance which watermark, if any, is in the content. In a more fragmented setting, where many watermarking schemes coexist, each one has to be tried separately, and the user’s patience dwindles quickly.

Lastly, data storage is limited. Some watermarking schemes require a copy of the original content for decoding. Others decode fine, but require a copy of the original content for authentication. Since digital content tends to be large, and there is a lot of it, storing originals quickly becomes impractical.
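A back-of-the-envelope calculation shows how fast this adds up. The numbers below are assumptions chosen for illustration only: uncompressed 1920×1080 RGB images at 8 bits per channel, and an archive of one billion items.

```python
def raw_image_bytes(width: int, height: int, channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Size in bytes of one uncompressed image."""
    return width * height * channels * bytes_per_channel


def archive_bytes(num_items: int, bytes_per_item: int) -> int:
    """Total storage needed to keep one original per watermarked item."""
    return num_items * bytes_per_item


per_image = raw_image_bytes(1920, 1080)      # 6,220,800 bytes, about 6.2 MB
total = archive_bytes(10**9, per_image)      # about 6.2e15 bytes, roughly 6.2 PB
```

Compressed storage would shrink the figure, but it still grows linearly with the amount of content, which is why schemes that decode and authenticate without the original scale better.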
