Implementing a Dynamic Live Video Watermarking Pipeline

Arthur Knoepflin
Published in TrackIt
Sep 17, 2024

Video watermarking plays a crucial role in safeguarding digital content by adding a layer of protection against unauthorized distribution and piracy. The utility of watermarking extends to a wide range of applications, from copyright protection for media creators to enhancing content security in the ever-expanding digital landscape.

Watermarks serve a dual purpose: they act as a deterrent against unauthorized content sharing and, when necessary, provide a means to trace the origin of a leak. By imprinting user IDs and source IP addresses onto an image or video, watermarks make it possible to identify the responsible party. Their presence also discourages leaks in the first place, since watermarked content is less enjoyable to watch and therefore less attractive to redistribute.

The following sections describe the process of building a dynamic live video watermarking pipeline that is cost-efficient, browser-compatible, and suited to the evolving demands of today’s media distribution.

Workflow Diagram for HLS and DASH Pipelines

Architecture Diagram for the Watermarking Pipeline

Tools and Components

  • FFmpeg: Open-source multimedia framework used to format the source media and add watermarks to video frames using the filter_complex system.
  • FFprobe: Scans video files to extract playback metadata such as keyframe positions.
  • AWS Lambda: Generates virtual DASH and HLS manifests and runs FFmpeg.
  • Amazon API Gateway: Manages API routing and facilitates communication between components.
  • Amazon CloudFront: Caches generated videos, optimizing content delivery.
  • Amazon DynamoDB: Stores playback session and tracking information.

Ensuring Browser Compatibility

One of the foremost priorities when designing the video watermarking pipeline is ensuring compatibility with web browsers. This is a pivotal requirement as media support can vary widely from one browser to another and across different environments. To achieve this compatibility, the playback solution is broken down into three distinct layers:

  • Streaming Protocols: HLS, DASH, MSS, RTMP, and WebRTC.
  • Media Container: MP4, MPEG-TS, WebM, etc.
  • Media Codec: H.264, H.265, AV1, VP9, etc.

An additional requirement that influenced the pipeline design is the need for Digital Rights Management (DRM). This security measure ensures that only authorized clients can access and view the content.

The DRM landscape is complex and offers multiple solutions, each with its own platform and software restrictions. At the time of writing, there is no single DRM system that is compatible with every browser and operating system. To bridge this compatibility gap, a two-fold DRM approach was adopted:

  • Widevine: For Edge, Chrome, and Firefox on Windows, Linux, and macOS
  • FairPlay: For Safari on macOS

It is important to note that while these DRM solutions offer robust security, they come with certain limitations: FairPlay works exclusively with HLS, while Widevine offers only marginal support for HLS.

To achieve comprehensive DRM coverage across all browsers and platforms, both FairPlay and Widevine were implemented, which in turn meant supporting both of the chosen streaming formats: HLS and MPEG-DASH.
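
As an illustration of this two-fold approach, the sketch below shows how a backend might pair each browser with a key system and streaming format. The user-agent heuristic and the PlaybackTarget structure are assumptions made for illustration, not part of the actual pipeline.

```python
# Hypothetical sketch: pairing browsers with a DRM key system and
# streaming format. The user-agent heuristic is an assumption.
from dataclasses import dataclass

@dataclass
class PlaybackTarget:
    key_system: str  # EME key-system identifier
    protocol: str    # streaming format to serve

def select_drm(user_agent: str) -> PlaybackTarget:
    """FairPlay + HLS for Safari; Widevine + MPEG-DASH elsewhere."""
    ua = user_agent.lower()
    is_safari = "safari" in ua and "chrome" not in ua and "edg" not in ua
    if is_safari:
        return PlaybackTarget("com.apple.fps.1_0", "HLS")
    return PlaybackTarget("com.widevine.alpha", "MPEG-DASH")
```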

Media Codec

In the video watermarking pipeline, one of the core components is the codec, a contraction of COder-DECoder. It serves as the innermost part of the system, and understanding its role is pivotal.

At its essence, a codec is a sophisticated piece of software designed to fulfill a crucial role: encoding and decoding streams of pictures or audio. Its primary objectives are to compress the media file size for optimal storage space, and then decompress the original sequence of images or audio upon delivery while preserving as much of the original image and sound quality as possible.

It is important to emphasize that creating a codec is a highly intricate endeavor, often taking years of development to reach maturity. Given this complexity and the resource-intensive nature of codec development, creating a new codec was not a viable option; the simpler path was to leverage an existing codec with a proven track record.

At the time of writing, the most extensively supported codec is H.264. This codec made its debut in 2003 and was one of the first codecs to be integrated into web browsers. H.264 boasts a significant advantage over its successors (such as H.265 and VP9) in that it demands fewer CPU resources for encoding content. This attribute is of paramount importance, since the watermarking pipeline discussed in this article requires real-time re-encoding of media.

Container

A container plays a pivotal role in the watermarking pipeline, acting as the transport layer that encapsulates multiple video and audio tracks, ancillary data, metadata, and synchronization information within a single file.

Two container formats were considered: MP4 and MPEG-TS (Transport Stream). Each of these container formats offers its own unique advantages.

MPEG-TS, historically designed for audio and video broadcasting, was initially considered since it is the default container for HLS (HTTP Live Streaming). However, the watermarking pipeline also requires MPEG-DASH for Digital Rights Management (DRM) purposes. This introduces a significant challenge, as MPEG-TS would need to function within the DASH framework; while technically feasible, this integration poses compatibility concerns.

To circumvent these issues, a different approach was adopted. An extension of the format known as Fragmented MP4 (fMP4) was chosen. The advantage offered by this format is that DASH has been intentionally designed to work seamlessly with fMP4, ensuring effortless integration. Furthermore, thanks to the emergence of the Common Media Application Format (CMAF), a specification allowing for the use of a single media container and codec for various streaming protocols, fMP4 is now also a part of HLS.
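
As a concrete illustration, the snippet below shows one way to produce fMP4 output with FFmpeg’s fragmentation flags. The file names and encoder settings are assumptions for illustration, not the pipeline’s actual invocation.

```python
# Hypothetical sketch: producing fragmented MP4 (fMP4) with FFmpeg.
# File names and encoder settings are illustrative assumptions.
import subprocess

subprocess.run(
    [
        "ffmpeg", "-i", "input.mp4",
        "-c:v", "libx264", "-c:a", "aac",
        # Start a new fragment at each keyframe and write a minimal
        # 'moov' box up front, as fragmented MP4 requires.
        "-movflags", "frag_keyframe+empty_moov+default_base_moof",
        "fragmented.mp4",
    ],
    check=True,
)
```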

Pipeline Workflow

Step 1: Preparing the Files

Every video file is transcoded into a lower-quality version known as the Mezzanine Proxy. This transcoding also creates keyframes at fixed intervals, allowing for efficient video segment generation. From a technical standpoint, this is not an essential part of the pipeline.

However, it was advantageous to incorporate this process during the file re-encoding, taking full advantage of the workflow. The rationale behind this decision was to ensure consistent keyframe placement, as the absence of such uniformity would result in variable segment durations, which could complicate content delivery in the DASH framework. FFprobe is used to determine keyframe positions, and this information is stored in DynamoDB.
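
A rough sketch of this preparation step, assuming a two-second keyframe interval and illustrative file names, might look as follows:

```python
# Hypothetical sketch of the preparation step: transcode a mezzanine
# proxy with fixed-interval keyframes, then list keyframe timestamps.
# The 2-second interval and file names are illustrative assumptions.
import subprocess

# Force a keyframe every 2 seconds so every segment has the same duration.
subprocess.run(
    [
        "ffmpeg", "-i", "source.mp4",
        "-c:v", "libx264", "-force_key_frames", "expr:gte(t,n_forced*2)",
        "-c:a", "aac", "proxy.mp4",
    ],
    check=True,
)

# Ask FFprobe for the presentation timestamp and flags of each video
# packet; keyframe packets carry a 'K' flag.
out = subprocess.run(
    [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "packet=pts_time,flags", "-of", "csv=p=0",
        "proxy.mp4",
    ],
    capture_output=True, text=True, check=True,
).stdout

keyframes = []
for line in out.splitlines():
    pts, _, flags = line.partition(",")
    if "K" in flags and pts not in ("", "N/A"):
        keyframes.append(float(pts))

print(keyframes)  # e.g. [0.0, 2.0, 4.0, ...], persisted to DynamoDB
```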

Step 2: Requesting Playback

To initiate a playback session, the front end calls an endpoint that verifies the availability and type of the requested track. The watermark configuration is fetched from DynamoDB and copied to the session table, guaranteeing that the session’s configuration remains consistent throughout the playback session even if the source watermark configuration changes.

A unique identifier and playback URLs are generated and provided to the client, with one URL for the DASH manifest and another for the HLS master file.
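
A minimal sketch of such a session endpoint is shown below; the table names, attribute layout, and URL scheme are all assumptions made for illustration.

```python
# Hypothetical sketch of session creation. Table names, attributes,
# and the URL scheme are illustrative assumptions.
import uuid
import boto3

dynamodb = boto3.resource("dynamodb")
config_table = dynamodb.Table("WatermarkConfig")    # assumed name
session_table = dynamodb.Table("PlaybackSessions")  # assumed name

def create_session(track_id: str, user_id: str, source_ip: str) -> dict:
    # Snapshot the watermark configuration so that later edits to the
    # source config cannot affect an in-flight session.
    config = config_table.get_item(Key={"trackId": track_id})["Item"]
    session_id = str(uuid.uuid4())
    session_table.put_item(Item={
        "sessionId": session_id,
        "trackId": track_id,
        "userId": user_id,
        "sourceIp": source_ip,
        "watermark": config["watermark"],
    })
    base = f"https://cdn.example.com/play/{track_id}"  # assumed URL scheme
    return {
        "dash": f"{base}/manifest.mpd?sessionId={session_id}",
        "hls": f"{base}/master.m3u8?sessionId={session_id}",
    }
```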

Step 3: Generating the Manifests

The system generates DASH and HLS manifests on-demand. This process leverages our knowledge of keyframe positions to predict segment durations. For HLS, a “master” manifest pointing to different rendition groups, audio, and subtitles, as well as a “stream” manifest listing all segments for a given track, are created. For DASH, all information is contained in a single file, simplifying the process.

These manifests are not stored; they are streamed directly to the client each time the Lambda function is invoked. To facilitate the retrieval of track and watermark configurations for each incoming call, a mandatory ‘sessionId’ query parameter is introduced. This parameter ensures that the correct configuration is fetched from DynamoDB, given that the endpoint is the same across all playback sessions.
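
To show how the stored keyframe positions translate into segment durations, here is a rough sketch of HLS media-playlist generation; the segment URL pattern and tag values are assumptions for illustration.

```python
# Hypothetical sketch: derive segment durations from the keyframe
# timestamps stored at preparation time and emit an HLS playlist.
# The URL pattern and tag values are illustrative assumptions.
def build_hls_manifest(keyframes: list[float], total_duration: float,
                       session_id: str) -> str:
    # Segments start at keyframes; the last one runs to the end.
    bounds = keyframes + [total_duration]
    durations = [end - start for start, end in zip(bounds, bounds[1:])]
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{int(max(durations)) + 1}",
        f'#EXT-X-MAP:URI="init.mp4?sessionId={session_id}"',
    ]
    for i, duration in enumerate(durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"segment_{i}.m4s?sessionId={session_id}")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)

print(build_hls_manifest([0.0, 2.0, 4.0], 6.0, "abc-123"))
```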

Step 4: Generating the Segments

Video segments are generated based on the segment number and session ID. FFmpeg is used to create these segments, with a complex filtergraph (filter_complex) built from the watermarking configuration stored in the database.

Additionally, the frame size and bitrate are adjusted as part of the adaptive bitrate logic, and the resulting video segments are streamed to the client. A custom script sets segment-specific values, such as the decode time and segment number, in the MP4 header. For audio segments, static CMFC files using the AAC codec are generated with MediaConvert and streamed directly to the client.
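
To make this step concrete, the sketch below renders a single watermarked segment: it seeks to the segment’s start keyframe, burns in a user ID and IP address with drawtext, scales and re-encodes the video, and streams an fMP4 fragment to stdout. The overlay text, position, and encoder settings are all illustrative assumptions.

```python
# Hypothetical sketch of per-segment generation. Overlay content,
# position, and encoder settings are illustrative assumptions.
import subprocess

def render_segment(start: float, duration: float, user_id: str,
                   source_ip: str, width: int, bitrate: str) -> bytes:
    # Build the watermark filtergraph from the session's configuration.
    # drawtext assumes an FFmpeg build with libfreetype/fontconfig.
    filtergraph = (
        f"scale={width}:-2,"
        f"drawtext=text='{user_id} {source_ip}':"
        "fontcolor=white@0.4:fontsize=24:x=20:y=20"
    )
    proc = subprocess.run(
        [
            "ffmpeg",
            "-ss", str(start), "-t", str(duration),  # seek to a keyframe
            "-i", "proxy.mp4",
            "-filter_complex", filtergraph,
            "-c:v", "libx264", "-preset", "veryfast", "-b:v", bitrate,
            "-an",  # audio is served as separate static CMFC segments
            "-movflags", "frag_keyframe+empty_moov+default_base_moof",
            "-f", "mp4", "pipe:1",  # stream the fragment to stdout
        ],
        capture_output=True, check=True,
    )
    return proc.stdout  # raw fMP4 bytes, returned to the client

segment = render_segment(4.0, 2.0, "user-42", "203.0.113.7", 1280, "3M")
```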

Conclusion

Building a dynamic live video watermarking pipeline that meets specific requirements is a complex yet essential task. The solution presented in this article leverages a variety of tools and components to create an efficient, browser-compatible system for adding watermarks to video content.

About TrackIt

TrackIt is an international AWS cloud consulting, systems integration, and software development firm headquartered in Marina del Rey, CA.

We have built our reputation on helping media companies architect and implement cost-effective, reliable, and scalable Media & Entertainment workflows in the cloud. These include streaming and on-demand video solutions, media asset management, and archiving, incorporating the latest AI technology to build bespoke media solutions tailored to customer requirements.

Cloud-native software development is at the foundation of what we do. We specialize in Application Modernization, Containerization, Infrastructure as Code, and event-driven serverless architectures by leveraging the latest AWS services. Along with our Managed Services offerings, which provide 24/7 cloud infrastructure maintenance and support, we are able to provide complete solutions for the media industry.
