Epic-Kitchens | Largest Egocentric Video Dataset Gets New Baselines

Synced
May 6 · 4 min read

Cooking shows have moved beyond unglamorous narrations like “bring three litres of water to the boil” or even “dice the kiwi.” These days, cooking is performance: dynamic, dramatic, and designed to engage not only the palate but all the senses. Award-winning chef Emeril Lagasse sums it up with his trademark catchphrase: “BAM!”

Cooking, along with everything else people do and say in kitchens, is the focus of the EPIC-KITCHENS dataset. Introduced in 2018, this collection of annotated first-person videos of individuals cooking and interacting with objects in their kitchens has enabled AI researchers to explore a variety of challenges in video understanding.

In a new paper, researchers from the University of Bristol, the University of Toronto and the University of Catania explain how they created Epic-Kitchens and introduce new baselines that emphasize the multimodal nature of the largest such egocentric video benchmark.


Unlike previous action classification benchmarks, whose videos tend to be short or recorded in scripted environments, Epic-Kitchens was created to capture unscripted, natural interactions in everyday scenarios, whether one grills chicken with the gusto of a Lagasse or bakes cookies like a grandma.

The researchers note that the recordings also show the multitasking that home chefs naturally perform, like washing a few dishes during the cooking process. “Such parallel-goal interactions have not been captured in existing datasets, making this both a more realistic as well as a more challenging set of recordings.”


The researchers instructed 32 participants covering 10 nationalities and five languages to record their kitchen time for at least three consecutive days using a head-mounted GoPro camera.

The participants then watched their videos and narrated the actions they had performed, generating “coarse annotation” speech data. The researchers note that recent attempts at annotating images with speech have produced speed-ups of up to 15x on ImageNet. They also believe participants can describe their actions better than independent observers, simply because they were the ones performing the actions.

Some issues emerged, such as synonyms in the participants’ free-text annotations: different people said “put”, “place”, “put down”, “put back”, “leave”, or “return” when describing similar object-placing actions. The researchers grouped such annotations into classes to minimize semantic overlap and to accommodate common approaches to multiclass detection and recognition, where each example is assumed to belong to exactly one class.
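The grouping step can be sketched as a simple mapping from raw free-text verbs to canonical classes. This is an illustrative Python sketch only; the verb lists and class names below are made up for the example, not the dataset’s actual vocabulary.

```python
# Hypothetical verb classes collapsing synonymous free-text annotations.
# The groupings here are illustrative, not EPIC-KITCHENS' real class list.
VERB_CLASSES = {
    "put": ["put", "place", "put down", "put back", "leave", "return"],
    "take": ["take", "grab", "pick up", "get"],
    "wash": ["wash", "rinse", "clean"],
}

# Invert the mapping so each raw verb resolves to exactly one class,
# matching the single-label assumption of multiclass recognition.
RAW_TO_CLASS = {raw: cls for cls, raws in VERB_CLASSES.items() for raw in raws}

def canonical_verb(raw: str) -> str:
    """Map a free-text verb to its class; unseen verbs pass through as-is."""
    normalized = raw.strip().lower()
    return RAW_TO_CLASS.get(normalized, normalized)

print(canonical_verb("Put back"))  # put
print(canonical_verb("rinse"))     # wash
```

Keeping the mapping many-to-one is what guarantees each annotated segment lands in a single class, which standard multiclass recognition pipelines require.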

The resulting dataset features 55 hours of video (11.5M frames), with a total of 39.6K action segments and 454.2K labelled object bounding boxes.


The Epic-Kitchens researchers chose three challenges for testing: object detection, action recognition, and action anticipation, which they say form the basis for a higher-level understanding of the participants’ actions and intentions.

The team evaluated several existing methods to demonstrate how challenging Epic-Kitchens is and to identify shortcomings in current SOTA approaches. Object detection results using Faster R-CNN showed that objects in Epic-Kitchens are generally harder to detect than those in most other current datasets. The team also noted the importance of explicit temporal modelling in action recognition: models that incorporate temporal modelling in their architecture showed improved accuracy on, for example, verb classification.
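Object detection benchmarks like this one typically score a predicted box as correct when its overlap with a ground-truth box of the same class exceeds an intersection-over-union (IoU) threshold. The snippet below is a minimal sketch of that criterion, assuming boxes in (x1, y1, x2, y2) format and the classic 0.5 threshold from PASCAL VOC; it is not the paper’s exact evaluation code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct(pred_box, gt_box, threshold=0.5):
    """A detection counts as a true positive if it overlaps enough."""
    return iou(pred_box, gt_box) >= threshold

# Two 10x10 boxes offset by 5 pixels overlap in a 5x5 region:
# IoU = 25 / (100 + 100 - 25), about 0.143, well below the 0.5 threshold.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Small, cluttered, and frequently occluded kitchen objects tend to produce exactly these low-IoU matches, which is one reason detection scores on egocentric footage lag behind those on curated datasets.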

The paper The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines is on arXiv.

Journalist: Fangyu Cai | Editor: Michael Sarazen

Thinking of contributing to Synced Review? Synced’s new column Share My Research welcomes scholars to share their own research breakthroughs with global AI enthusiasts.


We know you don’t want to miss any story. Subscribe to our popular Synced Global AI Weekly to get weekly AI updates.


Need a comprehensive review of the past, present and future of modern AI research development? Trends of AI Technology Development Report is out!

2018 Fortune Global 500 Public Company AI Adaptivity Report is out!
Purchase a Kindle-formatted report on Amazon.
Apply for Insight Partner Program to get a complimentary full PDF report.


Synced

AI Technology & Industry Review — syncedreview.com | Newsletter: http://bit.ly/2IYL6Y2 | Share My Research http://bit.ly/2TrUPMI | Twitter: @Synced_Global

SyncedReview

We produce professional, authoritative, and thought-provoking content relating to artificial intelligence, machine intelligence, emerging technologies and industrial insights.

