At Google I/O this year I used ML to build an automated video processing pipeline that detects what's in a video and automatically hides the parts you would find scary.
Background
Whilst watching some movies with some friends, we discovered that one of our friends has a deathly fear of snakes, so we had to tell them exactly when the snakes were going to appear, and sometimes we got this wrong 😬 (watching Harry Potter was a poor choice in hindsight).
Solution
I built an automated video processing pipeline using the Video Intelligence API that would scan through a video and detect everything in it. Then, if it detected something that matched the phobia we had told the system about, it would use the Transcoder API to insert an overlay and hide it from view. So the solution was based on these two APIs:
Video Intelligence API
A powerful video analysis API on Google Cloud Platform. It uses machine learning to detect what is in a video, e.g. faces, people, objects, text, etc.
If you want to see all that it can do you can check out this interactive demo. Also check out the docs here.
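To give a feel for what this step looks like, here is a minimal sketch of running label detection with the Python client. The bucket and file names are placeholders, not the ones from my pipeline:

```python
# Rough sketch: run label detection on a video in Cloud Storage and print
# each detected label with the time segments where it appears.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://my-input-bucket/test_nature.mp4",  # placeholder
    }
)
result = operation.result(timeout=300)

# Each label annotation carries the segments of the video it was seen in.
for annotation in result.annotation_results[0].segment_label_annotations:
    for segment in annotation.segments:
        start = segment.segment.start_time_offset.total_seconds()
        end = segment.segment.end_time_offset.total_seconds()
        print(f"{annotation.entity.description}: {start:.1f}s - {end:.1f}s")
```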
Transcoder API
A scalable, cloud-based video transcoding API on Google Cloud Platform that lets you perform complex video transcoding jobs. Some of its features include:
- generating different bit rates and formats
- generating thumbnails
- inserting overlays
- inserting ad breaks
See more details on the API in the docs.
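And here is roughly what kicking off a Transcoder job with a full-screen static overlay looks like in Python. This is a trimmed-down sketch (single video stream, no audio, placeholder project, bucket and image names), not the exact config from my pipeline:

```python
# Rough sketch: create a Transcoder job that overlays an image over the
# whole frame from 5s to 10s of the output video.
from google.cloud.video import transcoder_v1
from google.cloud.video.transcoder_v1.services.transcoder_service import (
    TranscoderServiceClient,
)
from google.protobuf import duration_pb2 as duration

client = TranscoderServiceClient()
parent = "projects/my-project/locations/us-central1"  # placeholder project

job = transcoder_v1.types.Job(
    input_uri="gs://my-input-bucket/test_nature.mp4",   # placeholder
    output_uri="gs://my-output-bucket/processed/",      # placeholder
    config=transcoder_v1.types.JobConfig(
        elementary_streams=[
            transcoder_v1.types.ElementaryStream(
                key="video-stream0",
                video_stream=transcoder_v1.types.VideoStream(
                    h264=transcoder_v1.types.VideoStream.H264CodecSettings(
                        height_pixels=360,
                        width_pixels=640,
                        bitrate_bps=550000,
                        frame_rate=30,
                    )
                ),
            )
        ],
        mux_streams=[
            transcoder_v1.types.MuxStream(
                key="sd",
                container="mp4",
                elementary_streams=["video-stream0"],
            )
        ],
        overlays=[
            transcoder_v1.types.Overlay(
                # Image stretched over the whole frame (x=1, y=1 means
                # full output width and height).
                image=transcoder_v1.types.Overlay.Image(
                    uri="gs://my-assets-bucket/not-scary.jpg",  # placeholder
                    resolution=transcoder_v1.types.Overlay.NormalizedCoordinate(
                        x=1, y=1
                    ),
                    alpha=1,
                ),
                animations=[
                    # Show the overlay at 5s and remove it again at 10s.
                    transcoder_v1.types.Overlay.Animation(
                        animation_static=transcoder_v1.types.Overlay.AnimationStatic(
                            xy=transcoder_v1.types.Overlay.NormalizedCoordinate(
                                x=0, y=0
                            ),
                            start_time_offset=duration.Duration(seconds=5),
                        )
                    ),
                    transcoder_v1.types.Overlay.Animation(
                        animation_end=transcoder_v1.types.Overlay.AnimationEnd(
                            start_time_offset=duration.Duration(seconds=10),
                        )
                    ),
                ],
            )
        ],
    ),
)

response = client.create_job(parent=parent, job=job)
print("Created job:", response.name)
```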
Connecting these together
The system pipeline uses the Label Detection feature of the Video Intelligence API (see a visualisation of this feature here) to detect what's in each scene of the video and return a list of labels with time segments. If any of those labels match the phobia we are scared of (e.g. snakes, swans, bread, birds), we then use the Transcoder API to inject a full-screen overlay on the video for those time segments, hiding them from view.
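The glue between the two looks something like the sketch below: take the label annotations that come back from the Video Intelligence API, keep only the segments whose label mentions the phobia, and turn each one into a pair of overlay animations (show the overlay at the start of the segment, hide it at the end). The function name and the timestamp rounding are my own illustration, not necessarily what's in the repo:

```python
# Rough sketch: build a full-screen Transcoder overlay whose animations
# cover every time segment where a scary label was detected.
from google.cloud.video import transcoder_v1
from google.protobuf import duration_pb2 as duration

def build_scary_overlay(annotation_results, phobia, overlay_image_uri):
    animations = []
    for annotation in annotation_results.segment_label_annotations:
        label = annotation.entity.description.lower()
        if phobia not in label:
            continue  # not the thing we're scared of
        for segment in annotation.segments:
            # Round outwards a little so the overlay fully covers the segment.
            start = int(segment.segment.start_time_offset.total_seconds())
            end = int(segment.segment.end_time_offset.total_seconds()) + 1
            animations.append(
                transcoder_v1.types.Overlay.Animation(
                    animation_static=transcoder_v1.types.Overlay.AnimationStatic(
                        xy=transcoder_v1.types.Overlay.NormalizedCoordinate(x=0, y=0),
                        start_time_offset=duration.Duration(seconds=start),
                    )
                )
            )
            animations.append(
                transcoder_v1.types.Overlay.Animation(
                    animation_end=transcoder_v1.types.Overlay.AnimationEnd(
                        start_time_offset=duration.Duration(seconds=end),
                    )
                )
            )
    return transcoder_v1.types.Overlay(
        image=transcoder_v1.types.Overlay.Image(
            uri=overlay_image_uri,
            resolution=transcoder_v1.types.Overlay.NormalizedCoordinate(x=1, y=1),
            alpha=1,
        ),
        animations=animations,
    )
```

The returned Overlay then just slots into the overlays list of the Transcoder job config from the earlier sketch.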
To get these APIs to run in a scalable way I used a chain of Cloud Storage buckets and Cloud Functions that are automatically triggered when a new video is uploaded.
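As a rough idea of what one link in that chain looks like: a background Cloud Function deployed with a google.storage.object.finalize trigger on the input bucket, which kicks off the label detection and writes the results to another bucket for the next function to pick up. The function and bucket names here are made up for illustration; the real code is in the repo:

```python
# Rough sketch of the first Cloud Function in the chain, triggered
# whenever a new video lands in the input Cloud Storage bucket.
from google.cloud import videointelligence

def on_video_uploaded(event, context):
    """Background function for google.storage.object.finalize events."""
    input_uri = f"gs://{event['bucket']}/{event['name']}"

    client = videointelligence.VideoIntelligenceServiceClient()
    # Write the annotation JSON to a second bucket; another Cloud Function
    # triggered on that bucket then starts the Transcoder job.
    client.annotate_video(
        request={
            "features": [videointelligence.Feature.LABEL_DETECTION],
            "input_uri": input_uri,
            "output_uri": f"gs://my-labels-bucket/{event['name']}.json",
        }
    )
```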
Tricky challenge
In order to tell the system what my phobia/phobias were without building some fancy UI, I came up with the idea of just putting the "see no evil" emoji 🙈 in the uploaded file name, followed by the phobia. The Cloud Function code then reads the filename and extracts the phobia from it. See the specific code in the GitHub repo.
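The filename parsing itself is tiny. Something along these lines (the helper name is mine, and the real repo code may differ slightly):

```python
# Rough sketch: pull the phobia out of the uploaded file name,
# e.g. "test_nature🙈swan.mp4" -> "swan".
import os

PHOBIA_MARKER = "🙈"  # the "see no evil" emoji used in the file name

def extract_phobia(filename):
    stem = os.path.splitext(filename)[0]              # drop the ".mp4"
    if PHOBIA_MARKER not in stem:
        return None                                   # nothing to hide
    return stem.split(PHOBIA_MARKER, 1)[1].lower()    # text after the emoji

print(extract_phobia("test_nature🙈swan.mp4"))  # -> "swan"
```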
Output
Finally I would just drag and drop a scary video into the input storage bucket with a filename like test_nature🙈swan.mp4 …
… and in less time than it would take to watch the video, the Video Intelligence API has already scanned it and the Transcoder API has already generated the new non-scary video. Check out the I/O video; that demo segment was all real time!
Future work
In the future I want to use the same kind of pipeline to solve video processing problems surrounding sports analysis and sports video production, e.g. highlights generation. You can see the kind of things I do with sport in this episode of Dale Markowitz's Making with ML series:
Takeaways
Building an ML-powered video production pipeline can be as simple as plugging a couple of APIs together.
Follow me and ask questions on Twitter: ZackAkil!