The Cube
Creating a real-time bullet time rig with 12 BlackMagic cameras, FFMPEG & NodeJS.
Overview:
A recent project we completed with Hovercraft Studio for Nike and Jordan was a 30ft x 30ft cube lined on the inside with LED panels from floor to ceiling, creating an interactive experience that captured users on a real-time bullet time rig and produced a shareable piece of video content featuring them.
The room was controlled via a Kinect that detected a user's position and made the entire space react to their position and movement. The idea was to create a feeling of defying gravity, and part of that was capturing the user floating in mid air in a frozen moment before the video continued playing out.
The Kinect and walls were handled by Justin Gitlin from Mode Set, with help from Legwork's motion team on graphics. The camera rig setup was done by Jasper Gray from Futuristic Films, with myself doing the programming for capturing the videos, editing them and delivering them to the user.
The final product was this:
The technical challenges on this project were massive, and a lot of prototyping had to be done in a very short period given the overall timeline: roughly 9 days from first meeting to flying to LA for the install.
With that, here is how I made a real-time bullet time rig that captured, edited and rendered a final product ready for delivery in 30 seconds.
TL;DR: There were a lot of technical challenges and learnings throughout this project, a lot of batch scripts and minor tweaking to get things just right, and a lot of "in theory this should work", but the final product was pretty awesome and ensured every user ended up with a great piece of content featuring them.
Hardware Overview:
3 PowerSpec G416 desktop computers; Intel Core i7-6700K 4.0GHz; Microsoft Windows 10 Pro 64-bit; G.Skill 16GB DDR4-3000 RAM
3 BlackMagic DeckLink Duo 2 capture cards (one per machine, four SDI inputs each)
12 BlackMagic Micro Cinema cameras
12 BlackMagic Micro HDMI to SDI converter boxes
12 SDI Cables
12 HDMI Cables
Software:
NodeJS
FFMPEG
BlackMagic SDK
Building & Prototypes:
The whole experience was intended to create a moment of gravity-defying, time-stopping video that made the user feel and look like a pro, so coming up with a camera array that could accomplish this was critical. We had used BlackMagics on similar projects in the past, but I had never worked with 12 at the same time, consuming and processing 12 live video feeds in real time.
The first step was confirming that the hardware would support multiple video feeds at 1080p. The DeckLink Duo 2 has 4 duplexing I/O channels that can each be set to input or output individually. So in theory, if we could get multiple streams happening, we could capture that data and use it.
BlackMagic has an SDK written in C++, but with our timeline we didn't have time to write a full interface against it. Luckily, FFMPEG has direct support for DeckLink built in, which meant I could spin up multiple FFPLAY windows, stream independent channels, and do what I wanted with the data.
The following opens an FFPLAY window with a live feed from the DeckLink.
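The exact command isn't reproduced here, but it was roughly this; the device name is whatever FFMPEG reports for your card, so treat it as an assumption:

    ffplay -f decklink -i "DeckLink Duo (1)@12"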
One thing that was missing from the documentation on both the FFMPEG and BlackMagic side, and took a bit to understand, was the formats and how to access the ports.
In the above example we are specifying port 1@12. To understand what the 12 means on port 1, we have to list the available options from FFMPEG.
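Listing the supported modes for an input is done with the decklink demuxer's list_formats option (again, the device name here is an assumption):

    ffmpeg -f decklink -list_formats 1 -i "DeckLink Duo (1)"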
When this runs it will list out the different video modes that are available from the camera. In this instance we are getting 1920x1080 at 29.97fps.
You NEED to specify this, otherwise you will not get any data and will be left scratching your head over why you can see the camera but can't get a feed out of it. The camera also needs to be set to this same format, so on the camera I had it set to ProRes LT 1920x1080 at 29.97fps.
The next big test was to do this with multiple cameras at the same time, write the videos to the computer, and check whether there was any latency between them or any other weirdness.
So I created 2 batch files:
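The originals aren't shown here, but each one was a single FFMPEG capture command along these lines; the device names, format index, clip length and encoder settings are assumptions:

    :: capture-1.bat -- record a few seconds from the first DeckLink input
    ffmpeg -f decklink -i "DeckLink Duo (1)@12" -t 4 -c:v libx264 -preset ultrafast -y camera-1.mp4

    :: capture-2.bat -- same thing for the second input
    ffmpeg -f decklink -i "DeckLink Duo (2)@12" -t 4 -c:v libx264 -preset ultrafast -y camera-2.mp4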
Running both would record video from the two cameras and save it as camera-1.mp4 & camera-2.mp4.
Another interesting thing was the port numbering in relation to the cameras.
Port 1 = camera 1
Port 2 = camera 3
Port 3 = camera 2
Port 4 = camera 4
I am not 100% sure as to why this is the case, but it was again an interesting discovery that was not very well documented.
After this test it was confirmed that everything did in fact work! The captures came through as HD streams and everything was great. The file size was interesting: each video was about 1GB, so with 4 videos captured per machine we would be dealing with 4GB of video, times 3 machines, for around 12GB of real-time video per capture.
The machines handled this well, but of course it comes with delays to render and transfer time, so we would need to down-sample the videos a bit to make things more stable and less abusive on the network and computers.
After confirming the prototypes we moved on to scaling the whole thing up, buying 2 more machines, 11 more cameras and the associated hardware, including the 12 converter boxes that take HDMI and turn it into SDI.
The new machines were set up with the base software, and a local network was created so all of the computers could talk to each other, with the master machine also able to reach out to the internet.
This also allowed all three machines to be controlled with 1 monitor and keyboard.
All three machines lived in a roughly 2ft-wide space between the outer wall and the LED panels, dubbed "the troll hole".
The machines sat in there and were controlled over remote desktop to do pulls from GitHub.
The next step was working out how to trigger all of the machines to start capture at the same time. The capture command came from the Kinect app, which watched for a user to cross a certain line; from that point we timed when the capture should start to ensure everyone would get a solid shot and final video.
For the initial prototype I went with a master batch file that called a capture batch file on all 3 machines, the idea being that this would get kicked off and each machine would just do its part.
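One way that could have been wired up is sketched below; the machine names, share paths and the idea of calling the scripts over UNC paths are all assumptions, since the original file isn't shown:

    :: master.bat -- first-pass trigger for all three machines
    start "" cmd /c \\CUBE-2\capture\capture.bat
    start "" cmd /c \\CUBE-3\capture\capture.bat
    call capture.bat
    :: note: a batch file launched from a UNC path like this still executes on the
    :: calling machine, it just reads the script over the network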
Something to know about this process is that it isn't always 100% reliable. There could be latency between the calls, and while the master file appeared to run everything, the data was feeding directly to that machine rather than to the individual machines, so the feeds got very slow and dropped frames became a very big issue.
The master machine was already running an express app with sockets to listen to the Kinect for when to start capture, so I wound up rolling that same app out to the other 2 machines and found that having each machine listen for the socket message itself was much more efficient, even though it meant having 3 servers running.
Using the forever node module was a big help as well, as it kept the servers running and restarted them if there was a failure at any point.
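The listener that ran on every machine would have looked something like the following; socket.io, the event name and the port are assumptions on my part, but the shape matches the description above:

    // server.js -- per-machine capture listener (sketch)
    const express = require('express');
    const http = require('http');
    const socketio = require('socket.io');

    const app = express();
    const server = http.createServer(app);
    const io = socketio(server);

    io.on('connection', (socket) => {
      // the Kinect app emits one message; every machine reacts to it locally
      socket.on('start-capture', () => {
        startCapture(); // runs this machine's capture batch file, sketched further down
      });
    });

    function startCapture() {
      // see the child_process sketch below
    }

    server.listen(3000);

Each instance was then kept alive with forever (e.g. forever start server.js).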
The next thing that had to happen was for the machines to start capture and then when complete transfer those captures to the master machine to be edited and rendered into our final clip.
For this I wound up down-sampling the videos to 720p, which greatly reduced file size while still keeping enough quality for the final product. At 1080p we were dealing with around 500MB of data after compression, which was a tad big to depend on transfers always going through, on top of the time involved; sizing down brought us closer to 100MB.
Each camera needed to be tuned to a specific focus point. For this we built an enclosure that would house the cameras and all associated cables and power.
The capture batch files looked something like this:
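The exact files aren't reproduced here, but per machine it was essentially one FFMPEG capture per DeckLink port, writing straight to disk; the device names, format index and 4-second duration are assumptions:

    :: capture.bat -- grab raw video from all four local cameras at once
    :: (stream-copy the uncompressed frames into an MKV container, no encoding yet)
    start "" ffmpeg -f decklink -i "DeckLink Duo (1)@12" -t 4 -an -c:v copy -y camera-1.mkv
    start "" ffmpeg -f decklink -i "DeckLink Duo (2)@12" -t 4 -an -c:v copy -y camera-2.mkv
    start "" ffmpeg -f decklink -i "DeckLink Duo (3)@12" -t 4 -an -c:v copy -y camera-3.mkv
    ffmpeg -f decklink -i "DeckLink Duo (4)@12" -t 4 -an -c:v copy -y camera-4.mkv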
Something interesting here is the use of the .mkv extension, which acts as a container for our raw data: the video being captured comes off the camera raw, so its size is pretty large to start with, and rather than running any compression or codecs on it right away, it was faster to just save it raw into that file.
These batch files were called within the node app using the child_process module.
The app would listen for the start-capture message and begin the capture process; it would also block any other capture requests that slipped through by accident, so only 1 capture could happen at a time. This ensured the assets being created were all part of the same capture.
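A stripped-down version of that logic, assuming capture.bat and transfer.bat wrapper scripts, might look like this:

    // sketch of the capture handler using child_process
    const { exec } = require('child_process');

    let capturing = false;

    function startCapture() {
      if (capturing) return;            // ignore any duplicate triggers mid-capture
      capturing = true;
      exec('capture.bat', (err) => {    // record from this machine's four cameras
        if (err) console.error('capture failed:', err);
        exec('transfer.bat', () => {    // down-sample and hand the files to the master
          capturing = false;
        });
      });
    }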
After the capture was complete, the app would call transfer, which ran the raw files through a quick down-sample and applied an mp4 codec.
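Per file, the transfer step would have amounted to something like the following; the scaling and encoder settings, the 1-second trim and the master machine's share name are assumptions:

    :: transfer.bat -- down-sample one raw capture and push it to the master
    :: (-ss 1 skips the first second of buffer frames, see the note on dropped frames later on)
    ffmpeg -ss 1 -i camera-1.mkv -vf scale=-2:720 -c:v libx264 -preset veryfast -crf 23 -y camera-1.mp4
    copy /y camera-1.mp4 \\CUBE-MASTER\captures\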
On the master machine I listened for the other machines to call back with "capture complete", which also meant their data had finished transferring, at which point I had all of the captured data on the master machine ready for editing and rendering.
This kicked off the render batch file, which was really just a master manifest that called the different operations sequentially:
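The manifest itself isn't shown, but it boiled down to a list of call statements with pauses in between; the script names are assumptions:

    :: render.bat -- master manifest, run on the master machine
    call stills.bat
    timeout /t 2 /nobreak
    call sequence.bat
    timeout /t 2 /nobreak
    call final.bat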
Note the timeouts here: they were added to avoid any kind of hiccup that might happen while executing the scripts.
The first thing we needed was to render a still image from each video at a specific timestamp; this is how the bullet time effect was achieved. The images were then turned into a numbered sequence (1.jpg, 2.jpg and so on), which was turned into another mp4. Then the final video was rendered.
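Those two steps map to FFMPEG commands along these lines; the freeze timestamp and the sweep frame rate are assumptions:

    :: stills.bat -- pull one frame from each camera at the freeze moment
    if not exist stills mkdir stills
    ffmpeg -ss 00:00:02.0 -i camera-1.mp4 -frames:v 1 -y stills\1.jpg
    ffmpeg -ss 00:00:02.0 -i camera-2.mp4 -frames:v 1 -y stills\2.jpg
    :: ...and so on through camera-12...

    :: sequence.bat -- turn the numbered stills into a short clip
    :: (-r matches the camera clips' frame rate so the final concat is straightforward)
    ffmpeg -framerate 12 -i stills\%%d.jpg -r 30000/1001 -c:v libx264 -pix_fmt yuv420p -y sequence.mp4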
The final render consisted of taking video from cameras 1 and 12 and mixing the image sequence in between them.
This was pretty straightforward: I took the start and end buffer videos for the final piece and created temporary streams of those, as well as of the sequence video and cameras 1 & 12.
I needed to offset the start time of camera 12 by 1.3 seconds so that it would pick up right where the last frame of the sequence left off.
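Put together, the final assembly was a single FFMPEG concat; the intro/outro buffer clip names and the assumption that camera 1's clip already ends on the freeze frame are mine, while the 1.3-second offset on camera 12 comes from the setup described above:

    :: final.bat -- stitch buffer + camera 1 + still sweep + camera 12 + buffer
    ffmpeg -i intro.mp4 -i camera-1.mp4 -i sequence.mp4 -ss 1.3 -i camera-12.mp4 -i outro.mp4 ^
      -filter_complex "[0:v][1:v][2:v][3:v][4:v]concat=n=5:v=1:a=0[v]" ^
      -map "[v]" -c:v libx264 -pix_fmt yuv420p -y final.mp4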
Another interesting wrinkle was an incredibly slight latency between the master machine's capture and the slaves', so the master had to be adjusted so that cameras 1-4 started .3 seconds ahead of the slaves.
Something that should be noted about doing live feed captures from BlackMagics via the DeckLinks is that there is a required buffer time to avoid dropped frames. I captured 4 seconds of video but really only used 3 seconds; the first second of every video was cut because the initial capture period has a tendency to serve up dropped frames.
After the capture and render were complete I needed to serve up a form for the user to enter their information and get the video emailed to them. The entire process from capture to exit was around 30 seconds, so by the time they left the cube and walked up to an iPad, it was waiting with a gif preview and a form ready to send.
The render process was around 30-45 seconds total round trip from transfer to final product. This varied because of the transfer process between machines, which would sometimes hit lag or hiccups (this is again why file size was kept down as much as possible).
The form needed to point at our local servers, but it was running on an iPad, which, with our network setup, wasn't working great: I needed a static name to access the server consistently, and that wasn't always happening with Windows. I wound up using ngrok with a custom subdomain so the iPads could reach it.
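With an ngrok reserved subdomain, pointing the iPads at the master's express app is a one-liner; the subdomain and port here are made up:

    ngrok http -subdomain=thecube 3000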
One thing that didn't work over ngrok was our sockets, so I wound up polling from the iPad to check the current state of the render and when things were complete.
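The iPad-side polling was just a timer hitting a status endpoint; the endpoint, response shape and interval below are all assumptions:

    // sketch of the polling loop running in the iPad's browser
    var poll = setInterval(function () {
      fetch('https://thecube.ngrok.io/status')
        .then(function (res) { return res.json(); })
        .then(function (status) {
          if (status.state === 'complete') {
            clearInterval(poll);
            document.querySelector('#preview').src = status.gifUrl; // swap in the gif preview
          }
        });
    }, 2000);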
With all of the pieces connected, along with the actual cube graphics and interactions, the experience was something like this:
So that's how we made a real-time bullet time rig with BlackMagic cameras, FFMPEG & NodeJS. It was a massive team effort that created an awesome final product. 10/10 would do again.
Bonus video: (I take my T-rex costume with me everywhere I go and obviously needed to capture this)