Image of VisCommentator from (Z. Chen et al., 2022)

The Future of Sports Visualizations: VisCommentator

Published in VisUMD, Oct 28, 2023

The world of sports is buzzing with the rise of visualizations, thanks to their versatile applications and their knack for simplifying game insights and statistics. Graphics such as ball tracking and trajectory overlays, seen in everything from soccer to table tennis, not only enrich the viewing experience but also reveal the game’s intricacies. However, integrating these visuals into sports videos is no walk in the park. Sports analysts excel at deciphering the game, but video editing might not be their forte. As a result, bringing these visuals to life usually requires close collaboration between sports analysts and skilled video editors.

VisCommentator: A Game-Changer for Sports Analysts

Enter “VisCommentator”. Spawned from comprehensive research, this tool is the answer to seamless visual data integration into sports videos. Using state-of-the-art machine learning, VisCommentator transforms raw table tennis footage into enriched visual tales. But it’s more than just a tool. It empowers sports analysts, allowing them to interact directly with the visuals and choose which data points to emphasize. The system’s brilliance is further illuminated by its visualization recommendation engine, which tailors visuals based on narrative flow and user choices. And the best part? Its data-driven design makes it versatile enough for other racket sports, given the right data models.

Behind the Scenes: The Research

The birth of “VisCommentator” was no accident. Researchers examined 233 augmented sports videos to understand the nuances of creating augmented content. They distilled their findings into two levels and four questions that guide the augmentation process, so that the visuals enhance rather than overshadow the viewing experience.

Element Level:

What data is suitable for augmenting a sports video?
How are these data visually presented?

Clip Level:

How should the data be organized for varied narrative intents?
How should visuals be temporally arranged concerning the raw video?

Fig: Steps involved in the creation of VisCommentator.

Research Goals & Strategies

With their extensive groundwork, the researchers set three ambitious goals:

Effortless data extraction from sports videos.
Enabling analysts to engage directly with video data.
Offering visualization suggestions driven by their effectiveness and narrative flow.

To achieve these, they employed distinct strategies:

Goal 1: To extract deeper insights from table tennis videos, the researchers used a bottom-up method. They employed deep learning models, such as ResNet-50 and TTNet, to capture details like ball movement, table positioning, and player postures in real time. While TTNet detected ball and table positions, BodyPix was instrumental in identifying players and their actions. This object-level data made it possible to derive quantities like ball velocity and player movement. Beyond the object-specific data, they identified critical events, like ball bounces and player strokes, by analyzing the interactions between the ball and the player. On a tactical level, rather than relying solely on computer vision models, they combined rule-based methods and expert insights to infer player strategies and potential game outcomes. The system cannot automatically capture every tactical detail, but it stays flexible by allowing the integration of external data for a richer analysis.
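To make the step from tracked positions to derived data more concrete, here is a minimal sketch of how ball speed and bounce events could be computed from a ball track. It is an illustration only, not the method used in VisCommentator: the `BallSample` type, the function names, and the "local maximum of y" bounce heuristic are all assumptions, and the sample positions are made up.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class BallSample:
    """One tracked ball position (hypothetical format, not the paper's)."""
    t: float  # timestamp in seconds
    x: float  # horizontal image position in pixels
    y: float  # vertical image position in pixels (grows downward)


def ball_speeds(track):
    """Finite-difference speed in pixels/second between consecutive samples."""
    speeds = []
    for prev, cur in zip(track, track[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        speeds.append(hypot(cur.x - prev.x, cur.y - prev.y) / dt)
    return speeds


def bounce_indices(track):
    """Indices where the ball stops descending and starts rising.

    In image coordinates y grows downward, so a bounce appears as a
    local maximum of y. This is a simple assumed heuristic.
    """
    hits = []
    for i in range(1, len(track) - 1):
        if track[i - 1].y < track[i].y and track[i].y > track[i + 1].y:
            hits.append(i)
    return hits


# Made-up samples at 50 frames per second.
track = [
    BallSample(0.00, 100.0, 200.0),
    BallSample(0.02, 103.0, 204.0),
    BallSample(0.04, 106.0, 208.0),
    BallSample(0.06, 109.0, 204.0),
]

print(ball_speeds(track))     # roughly 250 pixels/second per step
print(bounce_indices(track))  # [2]
```

A real pipeline would also need to convert pixels to table coordinates and smooth noisy detections, which this sketch leaves out.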

Goals 2 & 3: To make the analyst’s job easier, the researchers enhanced videos with visual cues. Events were color-coded on a timeline for easy navigation, and users could interact directly with elements like players or the ball. VisCommentator’s recommendation engine suggests the best visuals based on the story’s flow. To achieve this, they crafted a model based on a conditional probability distribution, represented as p=f((d,v)∣O). In simpler terms, this formula gives the probability p that a particular visual v is a good match for specific data d within a chosen story sequence O, so that the recommended visual fits both the data and the narrative. They even used a two-track rendering system for precise visual timings, especially for non-linear stories.
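One simple way to picture a model of the form p=f((d,v)∣O) is as a table of counts: for each story sequence O, count how often each data and visual pair (d, v) appears in example videos, then normalize. The sketch below does exactly that with add-one smoothing. It is an assumption-laden illustration, not the model from the paper: the `VisualRecommender` class, the smoothing choice, and the data, visual, and order names are all hypothetical.

```python
from collections import Counter, defaultdict


class VisualRecommender:
    """Estimates p((d, v) | O) from example counts (illustrative only)."""

    def __init__(self, data_types, visual_types):
        # Every data/visual combination the recommender can score.
        self.pairs = [(d, v) for d in data_types for v in visual_types]
        # Maps a narrative order O to a Counter over (d, v) pairs.
        self.counts = defaultdict(Counter)

    def observe(self, order, data, visual):
        """Record one example where `visual` showed `data` under `order`."""
        self.counts[order][(data, visual)] += 1

    def probability(self, order, data, visual):
        """Add-one smoothed estimate of p((data, visual) | order)."""
        seen = self.counts[order]
        total = sum(seen.values()) + len(self.pairs)
        return (seen[(data, visual)] + 1) / total

    def recommend(self, order, data):
        """Return the most probable visual for `data` under `order`."""
        candidates = [v for d, v in self.pairs if d == data]
        return max(candidates, key=lambda v: self.probability(order, data, v))


# Hypothetical labels; the paper's actual categories may differ.
rec = VisualRecommender(
    data_types=["ball_trajectory", "player_position"],
    visual_types=["line", "heatmap", "label"],
)
rec.observe("linear", "ball_trajectory", "line")
rec.observe("linear", "ball_trajectory", "line")
rec.observe("linear", "player_position", "heatmap")
rec.observe("flashback", "ball_trajectory", "label")

print(rec.recommend("linear", "ball_trajectory"))     # line
print(rec.recommend("flashback", "ball_trajectory"))  # label
```

The smoothing keeps unseen pairs from getting zero probability, so the recommender can still rank visuals for a story sequence it has few examples of.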

Fig: Image showing how predictions are made based on ball and person tracking.

Fig: Flow of steps involved in creating Augmented Video in VisCommentator.

Technical Backbone

VisCommentator is built on a browser/server architecture. The browser end, powered by HTML, CSS, and JavaScript, provides the user interface, and uses tools like OffscreenCanvas to keep video enhancements smooth. Meanwhile, the server side, built with Node.js and TypeScript, relies on libraries like PyTorch and TensorFlow.js for the system’s deep learning capabilities.

Validation & Feedback

When tested with seven sports analysts, the system received glowing reviews. They found it easy to learn (scoring 6.00) and user-friendly (scoring 6.86). Standout features included the system’s ability to pull data (scoring 6.29), its interactive interface (scoring 6.86), and smart visualization suggestions (scoring 6.57). Overall, users seemed quite satisfied with what VisCommentator had to offer.

Room for Growth

Yet, like all pioneering ventures, VisCommentator isn’t without its limitations. Because the work drew on a small pool of sports experts, its findings are largely qualitative. While it effectively captures prevalent patterns, it may not cover all possible scenarios. How a broader audience will receive the tool remains an exciting frontier for exploration.

References:

Z. Chen et al., “Augmenting Sports Videos with VisCommentator,” in IEEE Transactions on Visualization and Computer Graphics, vol. 28, no. 1, pp. 824–834, Jan. 2022, doi: 10.1109/TVCG.2021.3114806.