Streamlining Metadata Extraction with the MDDF Parser

Thomy Lorenzatti
Published in TrackIt
Jun 26, 2023

Introduction

In the media and entertainment industry, metadata refers to data describing various aspects of the media content assets such as movies, TV shows, and music. It includes details such as title, genre, cast, crew, release dates, and other technical specifications.

Metadata plays a crucial role in content discovery, recommendation systems, and efficient content management. It ensures that content assets are properly identified, categorized, and searchable, enabling efficient content organization, rights management, and improved user experiences.

Understanding MDDF, MMC, and MEC

The MovieLabs Digital Distribution Framework (MDDF) is a comprehensive and standardized metadata structure developed by MovieLabs. MDDF was developed to help automate digital workflows and improve supply chain efficiency. It focuses on improving the core aspects of online distribution, including identification and asset delivery.

The MDDF metadata structure consists of various components including core descriptive elements, technical details, and rights information. These components provide a comprehensive representation of the content assets and ensure that essential information is captured and communicated.

MovieLabs has also developed two additional metadata components alongside the MDDF, namely the Media Manifest Core (MMC) and the Media Entertainment Core (MEC). The MMC serves as a foundational specification for organizing and describing content assets, providing a standardized approach for capturing essential metadata. It focuses on core descriptive elements, technical details, and rights information. MEC builds upon the MMC and extends its capabilities by encompassing a broader range of metadata elements, including creative and business-oriented information to facilitate comprehensive metadata representation.

The MDDF Parser

The MDDF Parser is a software tool specifically developed to extract, interpret, and parse metadata according to the MDDF specifications. It automates the process of extracting valuable information from diverse sources, making it easily accessible for content management and distribution.

Overview of MDDF Parser Functionality and Capabilities

The MDDF Parser is a TypeScript library that parses XML data from MEC and MMC files. It follows the XML schema definitions provided by MovieLabs and converts the XML files into objects typed by TypeScript interfaces, which lets developers access and manipulate the data more readily.
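
To make the idea of a typed view over MEC data more concrete, here is a minimal sketch. The `LocalizedInfo` interface, the `parseLocalizedInfo` function, and the simplified XML fragment are assumptions made for illustration and are not the library's actual API. The sketch also uses regular expressions only to stay dependency-free; a real parser would rely on a full XML parser and on interfaces generated from the MovieLabs schemas.

```typescript
// Hypothetical, simplified shape for one MEC LocalizedInfo element.
// Interfaces derived from the MovieLabs XSDs would be far richer.
interface LocalizedInfo {
  language: string;
  titleDisplayUnlimited: string;
}

// Simplified MEC-like fragment (illustrative only, not a complete MEC file).
const sampleXml = `
<mdmec:Basic ContentID="md:cid:org:example:0001">
  <md:LocalizedInfo language="en-US">
    <md:TitleDisplayUnlimited>Example Movie</md:TitleDisplayUnlimited>
  </md:LocalizedInfo>
</mdmec:Basic>`;

// Toy extractor: sufficient for the fragment above, not a general XML parser.
function parseLocalizedInfo(xml: string): LocalizedInfo | null {
  const block =
    /<md:LocalizedInfo\s+language="([^"]*)"\s*>([\s\S]*?)<\/md:LocalizedInfo>/.exec(xml);
  if (block === null) return null;

  const title =
    /<md:TitleDisplayUnlimited>([\s\S]*?)<\/md:TitleDisplayUnlimited>/.exec(block[2] ?? "");
  if (title === null) return null;

  return {
    language: block[1] ?? "",
    titleDisplayUnlimited: (title[1] ?? "").trim(),
  };
}

// The result is a plain object whose fields are checked by the compiler.
const info = parseLocalizedInfo(sampleXml);
```

Once the data is in a typed object like `info`, a misspelled field name or a wrong type is caught at compile time rather than at run time.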

Explanation of the Parsing Process

The MDDF Parser employs advanced parsing algorithms to analyze input data and extract pertinent metadata based on MDDF specifications. By interpreting both structured and unstructured data sources, it converts them into a standardized format that adheres to the MDDF schema.

Supported File Formats

The parser is compatible with a wide range of file formats including XML, JSON, and CSV, which are commonly used for storing metadata associated with popular media file formats.

Use Cases for the MDDF Parser in Content Management and Distribution

The MDDF Parser can be utilized in various content management and distribution scenarios. Applications include efficient content ingestion, automated metadata extraction, content cataloging, content rights management, and optimized content discovery.

Key Features of the MDDF Parser

Comprehensive Metadata Extraction

The MDDF Parser excels at extracting comprehensive metadata from various sources, capturing essential information about content assets, such as title, cast, crew, genre, release dates, and technical specifications.

Adaptability to Changing MDDF Versions and Specifications

With a focus on flexibility, the MDDF Parser is built to accommodate new versions and specifications of the MDDF framework. It is designed to remain compatible as industry standards evolve, helping content providers stay up to date with metadata requirements.

Versatility Across Media Types

From movies to TV shows, documentaries, and beyond, the MDDF Parser offers broad support for various media formats. It enhances content management efficiency by catering to the unique metadata demands associated with each media type.

Benefits of the MDDF Parser

Improved Efficiency in Content Ingestion and Metadata Processing

By automating the metadata extraction process, the MDDF Parser significantly reduces the manual effort and time required for content ingestion. It ensures accurate and consistent metadata structure, minimizing errors and improving content management.

Enhanced Accuracy and Consistency of Metadata Across Platforms

The MDDF Parser ensures the consistency and accuracy of metadata across different platforms and services. It eliminates discrepancies, improves content identification, and enhances user experiences by providing reliable and standardized information.

Installation and Setup

The MDDF Parser library can be found on the MDDF GitHub repository. Detailed installation instructions, dependencies, and configuration guidelines are provided there.

Users are advised to follow established best practices to ensure seamless integration of the MDDF Parser into their existing content management workflows. This includes defining clear objectives, considering data quality and validation processes, and implementing error handling and reporting mechanisms.
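
As one possible way to approach the validation and error-reporting practices mentioned above, the following sketch checks a parsed record and collects every issue it finds. The `TitleRecord` and `ValidationIssue` types and the `validateTitleRecord` function are hypothetical names introduced for this example; they are not part of the MDDF Parser.

```typescript
// Hypothetical record produced by an upstream parsing step.
interface TitleRecord {
  contentId: string;
  language: string;
  title: string;
}

interface ValidationIssue {
  field: keyof TitleRecord;
  message: string;
}

// Collect every problem instead of throwing on the first one,
// so that a single report can be produced per record.
function validateTitleRecord(record: TitleRecord): ValidationIssue[] {
  const issues: ValidationIssue[] = [];

  if (record.contentId.trim() === "") {
    issues.push({ field: "contentId", message: "ContentID is empty" });
  }
  // Loose check for tags such as "en" or "en-US"; not a full BCP 47 validator.
  if (!/^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$/.test(record.language)) {
    issues.push({
      field: "language",
      message: `Unexpected language tag: "${record.language}"`,
    });
  }
  if (record.title.trim() === "") {
    issues.push({ field: "title", message: "Title is empty" });
  }
  return issues;
}

const report = validateTitleRecord({
  contentId: "md:cid:org:example:0001",
  language: "en-US",
  title: "Example Movie",
});
```

Returning a list rather than throwing keeps ingestion running when a record is only partially valid and leaves the decision about how to report the issues to the calling workflow.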

Usage Example

The screenshot above displays a test script utilizing a sample MEC file. The script identifies the ‘Current Title’ of the movie and asks the user to enter a desired replacement.

The script uses the TypeScript library to edit the title within the XML file. Note that during the XML -> TypeScript object -> XML round trip, only the title is altered; the rest of the file is left untouched.
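
The property that only the title changes can be illustrated with a small standalone sketch. This is not the test script described above and does not use the library's API: `escapeXmlText` and `replaceTitle` are hypothetical helpers, the XML is a simplified MEC-like fragment, and the edit is done directly on the string rather than through a typed object.

```typescript
// Escape characters that are not allowed verbatim in XML text content.
function escapeXmlText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Replace the text of the first <md:TitleDisplayUnlimited> element.
// Everything outside that element's text is returned unchanged.
function replaceTitle(xml: string, newTitle: string): string {
  const pattern =
    /(<md:TitleDisplayUnlimited>)([\s\S]*?)(<\/md:TitleDisplayUnlimited>)/;
  if (!pattern.test(xml)) {
    throw new Error("No <md:TitleDisplayUnlimited> element found");
  }
  // A replacer function avoids special treatment of "$" in the new title.
  return xml.replace(
    pattern,
    (_match: string, open: string, _oldTitle: string, close: string) =>
      open + escapeXmlText(newTitle) + close,
  );
}

// Simplified fragment (illustrative only).
const original = `<md:LocalizedInfo language="en-US">
  <md:TitleDisplayUnlimited>Current Title</md:TitleDisplayUnlimited>
  <md:Summary190>A short summary.</md:Summary190>
</md:LocalizedInfo>`;

const updated = replaceTitle(original, "Replacement Title");
```

Comparing `original` and `updated` line by line shows a difference only on the title line, which is the behavior the round trip is expected to preserve.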

Conclusion

The MDDF Parser serves as a powerful tool for extracting, parsing, and managing metadata according to the MovieLabs Digital Distribution Framework. By leveraging its capabilities, content providers can streamline content ingestion, improve metadata accuracy, and facilitate seamless content distribution.

About TrackIt

TrackIt is an Amazon Web Services Advanced Tier Services Partner based in Marina del Rey, CA, specializing in cloud management, consulting, and software development solutions.

TrackIt specializes in Modern Software Development, DevOps, Infrastructure-As-Code, Serverless, CI/CD, and Containerization with specialized expertise in Media & Entertainment workflows, High-Performance Computing environments, and data storage.

In addition to providing cloud management, consulting, and modern software development services, TrackIt also provides an open-source AWS cost management tool that allows users to optimize their costs and resources on AWS.
