At Google I/O this year I used ML to build an automated video processing pipeline that detects what's in a video and automatically hides the parts you would find scary.
Background
Whilst watching some movies with some friends, we discovered that one of our friends has a deathly fear of snakes, so we had to tell them exactly when the snakes were going to appear, and sometimes we got this wrong 😬 (watching Harry Potter was a poor choice in hindsight).
Solution
I built an automated video processing pipeline using the Video Intelligence API that would scan through a video and detect everything in it. Then, if it detected something that matched the phobia we had told the system about, it would use the Transcoder API to insert an overlay and hide it from view. So the solution was based on these two APIs:
Video Intelligence API
A powerful video analysis API on Google Cloud Platform. It uses machine learning to detect what is in a video, e.g. faces, people, objects, text, etc.
If you want to see all that it can do you can check out this interactive demo. Also check out the docs here.
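To give a feel for what this step looks like, here is a minimal sketch of running label detection with the Python client. The bucket and file names are placeholders, not the ones from my pipeline:

```python
# Rough sketch: run label detection on a video in Cloud Storage and print
# each detected label with the time segments where it appears.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://my-input-bucket/test_nature.mp4",  # placeholder
    }
)
result = operation.result(timeout=300)

# Each label annotation carries the segments of the video it was seen in.
for annotation in result.annotation_results[0].segment_label_annotations:
    for segment in annotation.segments:
        start = segment.segment.start_time_offset.total_seconds()
        end = segment.segment.end_time_offset.total_seconds()
        print(f"{annotation.entity.description}: {start:.1f}s - {end:.1f}s")
```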
Transcoder API
A scalable, cloud-based video transcoding API on Google Cloud Platform that lets you perform complex video transcoding jobs. Some of its features include:
- generating different bit rates and formats
- generating thumbnails
- inserting overlays
- inserting ad breaks
See more details on the API in the docs.
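And here is roughly what kicking off a Transcoder job with a full-screen static overlay looks like in Python. This is a trimmed-down sketch (single video stream, no audio, placeholder project, bucket and image names), not the exact config from my pipeline:

```python
# Rough sketch: create a Transcoder job that overlays an image over the
# whole frame from 5s to 10s of the output video.
from google.cloud.video import transcoder_v1
from google.cloud.video.transcoder_v1.services.transcoder_service import (
    TranscoderServiceClient,
)
from google.protobuf import duration_pb2 as duration

client = TranscoderServiceClient()
parent = "projects/my-project/locations/us-central1"  # placeholder project

job = transcoder_v1.types.Job(
    input_uri="gs://my-input-bucket/test_nature.mp4",   # placeholder
    output_uri="gs://my-output-bucket/processed/",      # placeholder
    config=transcoder_v1.types.JobConfig(
        elementary_streams=[
            transcoder_v1.types.ElementaryStream(
                key="video-stream0",
                video_stream=transcoder_v1.types.VideoStream(
                    h264=transcoder_v1.types.VideoStream.H264CodecSettings(
                        height_pixels=360,
                        width_pixels=640,
                        bitrate_bps=550000,
                        frame_rate=30,
                    )
                ),
            )
        ],
        mux_streams=[
            transcoder_v1.types.MuxStream(
                key="sd",
                container="mp4",
                elementary_streams=["video-stream0"],
            )
        ],
        overlays=[
            transcoder_v1.types.Overlay(
                # Image stretched over the whole frame (x=1, y=1 means
                # full output width and height).
                image=transcoder_v1.types.Overlay.Image(
                    uri="gs://my-assets-bucket/not-scary.jpg",  # placeholder
                    resolution=transcoder_v1.types.Overlay.NormalizedCoordinate(
                        x=1, y=1
                    ),
                    alpha=1,
                ),
                animations=[
                    # Show the overlay at 5s and remove it again at 10s.
                    transcoder_v1.types.Overlay.Animation(
                        animation_static=transcoder_v1.types.Overlay.AnimationStatic(
                            xy=transcoder_v1.types.Overlay.NormalizedCoordinate(
                                x=0, y=0
                            ),
                            start_time_offset=duration.Duration(seconds=5),
                        )
                    ),
                    transcoder_v1.types.Overlay.Animation(
                        animation_end=transcoder_v1.types.Overlay.AnimationEnd(
                            start_time_offset=duration.Duration(seconds=10),
                        )
                    ),
                ],
            )
        ],
    ),
)

response = client.create_job(parent=parent, job=job)
print("Created job:", response.name)
```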
Connecting these together
The system pipeline uses the Label Detection feature of the Video Intelligence API (see a visualisation of this feature here) to detect what's in each scene of the video and return a list of labels with time segments. If any of those labels match the phobia we are scared of (e.g. snakes, swans, bread, birds), we then use the Transcoder API to inject a full-screen overlay on the video for those time segments, hiding them from view.
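The glue between the two looks something like the sketch below: take the label annotations that come back from the Video Intelligence API, keep only the segments whose label mentions the phobia, and turn each one into a pair of overlay animations (show the overlay at the start of the segment, hide it at the end). The function name and the timestamp rounding are my own illustration, not necessarily what's in the repo:

```python
# Rough sketch: build a full-screen Transcoder overlay whose animations
# cover every time segment where a scary label was detected.
from google.cloud.video import transcoder_v1
from google.protobuf import duration_pb2 as duration

def build_scary_overlay(annotation_results, phobia, overlay_image_uri):
    animations = []
    for annotation in annotation_results.segment_label_annotations:
        label = annotation.entity.description.lower()
        if phobia not in label:
            continue  # not the thing we're scared of
        for segment in annotation.segments:
            # Round outwards a little so the overlay fully covers the segment.
            start = int(segment.segment.start_time_offset.total_seconds())
            end = int(segment.segment.end_time_offset.total_seconds()) + 1
            animations.append(
                transcoder_v1.types.Overlay.Animation(
                    animation_static=transcoder_v1.types.Overlay.AnimationStatic(
                        xy=transcoder_v1.types.Overlay.NormalizedCoordinate(x=0, y=0),
                        start_time_offset=duration.Duration(seconds=start),
                    )
                )
            )
            animations.append(
                transcoder_v1.types.Overlay.Animation(
                    animation_end=transcoder_v1.types.Overlay.AnimationEnd(
                        start_time_offset=duration.Duration(seconds=end),
                    )
                )
            )
    return transcoder_v1.types.Overlay(
        image=transcoder_v1.types.Overlay.Image(
            uri=overlay_image_uri,
            resolution=transcoder_v1.types.Overlay.NormalizedCoordinate(x=1, y=1),
            alpha=1,
        ),
        animations=animations,
    )
```

The returned Overlay then just slots into the overlays list of the Transcoder job config from the earlier sketch.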
To get these APIs to run in a scalable way I used a chain of Cloud Storage buckets and Cloud Functions that are automatically triggered when a new video is uploaded.
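As a rough idea of what one link in that chain looks like: a background Cloud Function deployed with a google.storage.object.finalize trigger on the input bucket, which kicks off the label detection and writes the results to another bucket for the next function to pick up. The function and bucket names here are made up for illustration; the real code is in the repo:

```python
# Rough sketch of the first Cloud Function in the chain, triggered
# whenever a new video lands in the input Cloud Storage bucket.
from google.cloud import videointelligence

def on_video_uploaded(event, context):
    """Background function for google.storage.object.finalize events."""
    input_uri = f"gs://{event['bucket']}/{event['name']}"

    client = videointelligence.VideoIntelligenceServiceClient()
    # Write the annotation JSON to a second bucket; another Cloud Function
    # triggered on that bucket then starts the Transcoder job.
    client.annotate_video(
        request={
            "features": [videointelligence.Feature.LABEL_DETECTION],
            "input_uri": input_uri,
            "output_uri": f"gs://my-labels-bucket/{event['name']}.json",
        }
    )
```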
Tricky challenge
In order to tell the system what my phobia/phobias were without building some fancy UI, I came up with the idea of just putting the "see no evil" emoji 🙈 in the uploaded file name, followed by the phobia. The Cloud Function code then reads the filename and extracts the phobia from it. See the specific code in the GitHub repo.
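The filename parsing itself is tiny. Something along these lines (the helper name is mine, and the real repo code may differ slightly):

```python
# Rough sketch: pull the phobia out of the uploaded file name,
# e.g. "test_nature🙈swan.mp4" -> "swan".
import os

PHOBIA_MARKER = "🙈"  # the "see no evil" emoji used in the file name

def extract_phobia(filename):
    stem = os.path.splitext(filename)[0]              # drop the ".mp4"
    if PHOBIA_MARKER not in stem:
        return None                                   # nothing to hide
    return stem.split(PHOBIA_MARKER, 1)[1].lower()    # text after the emoji

print(extract_phobia("test_nature🙈swan.mp4"))  # -> "swan"
```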
Output
Finally I would just drag and drop a scary video into the input storage bucket with a filename like test_nature🙈swan.mp4 …
… and in less time than it would take to watch the video, the Video Intelligence API has already scanned it and the Transcoder API has already generated the new non-scary video. Check out the I/O video; that demo segment was all real time!
Future work
In the future I want to use the same kind of pipeline to solve video processing problems surrounding sports analysis and sports video production, e.g. highlights generation. You can see the kind of things I do with sport in this episode of Dale Markowitz's Making with ML series:
Takeaways
Building an ML-powered video production pipeline can be as simple as plugging a couple of APIs together.
Follow me and ask questions on Twitter: ZackAkil!