What would it take to make your own video footage at 360 degrees?
The answer: a bunch of cameras and some open source software.
First we need to talk about image stitching. Unsurprisingly, OpenCV ships a great sample implementation for this, and it often gives impressively seamless results. But since we want to produce video, not static images, we next need to talk about camera synchronization on the Raspberry Pi, so that all cameras capture frames at the same time. And finally: how do we turn all of this into a 360-degree video?
Stitching is the process of merging multiple overlapping images in a seamless way, so that it looks like one single image.
Stitching, the simple way
Step 1: Take a picture, rotate the camera, take another picture. Be careful to keep some overlap.
Step 2: Run a feature algorithm to find the keypoints of each picture. The colorful little circles are the keypoints. There are more than 2000 in each picture here.
Step 3: Match the keypoints which are similar. There are 580 matches here, mostly on the sunflower.
Step 4: Compute the homography matrix and warp one picture to take the point of view of the other one. Here is the right picture (2) from the point of view of the left picture (1).
Step 5: Display the picture side-by-side.
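Step 4 deserves a small numeric illustration. A homography is a 3×3 matrix that maps pixel coordinates of one image into the other through a projective transform: multiply, then divide by the third coordinate. A minimal sketch (the matrix below is an illustrative pure translation, not a real calibration result):

```python
# Applying a 3x3 homography H to a pixel (x, y):
# [x', y', w]^T = H . [x, y, 1]^T, then divide by w (perspective divide).

def apply_homography(H, x, y):
    """Map pixel (x, y) through homography H (3x3 nested list)."""
    xp = H[0][0] * x + H[0][1] * y + H[0][2]
    yp = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xp / w, yp / w

# A pure horizontal translation of 100 px is a valid (degenerate) homography:
H_shift = [[1, 0, 100],
           [0, 1, 0],
           [0, 0, 1]]
print(apply_homography(H_shift, 50, 20))  # -> (150.0, 20.0)
```

In the real pipeline, OpenCV estimates H from the matched keypoints (e.g., with `cv2.findHomography`) and warps the whole image with it.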
Rotation vs Translation
Something important to know: if the camera is only rotated and never translated, stitching will be great at any distance, e.g., turning a camera on a tripod for static images like in the sunflower example in the previous section.
The problems start when the camera is translated: stitching is then only correct at one given distance. Since multiple recording cameras cannot physically occupy the same point, camera translation is unavoidable.
See the example below. If we align the mountain summit with the flower top, any translation of the camera will move the background, and the pictures will not overlap as nicely as they do under pure rotation.
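This parallax effect is easy to quantify: a camera translated by some baseline sees a point at a given distance shift by roughly `focal_px * baseline / distance` pixels. A quick estimate with the RPi camera v2 full sensor width (the 10 cm baseline and the distances are illustrative assumptions):

```python
import math

# Rough parallax estimate: a camera translated by `baseline` metres sees a
# point at distance `depth` shift by about f_px * baseline / depth pixels.

def focal_px(image_width_px, hfov_deg):
    """Focal length in pixels from sensor width and horizontal field of view."""
    return image_width_px / (2 * math.tan(math.radians(hfov_deg / 2)))

f = focal_px(3280, 62)          # RPi camera v2 full sensor width, 62 deg HFOV

def parallax_px(baseline_m, depth_m):
    return f * baseline_m / depth_m

# 10 cm between camera centres (illustrative):
print(round(parallax_px(0.10, 2.0)))    # nearby flower: a shift of ~100+ px
print(round(parallax_px(0.10, 500.0)))  # distant mountain: around a pixel
```

So aligning the distant background necessarily misaligns nearby objects by a very visible amount, and vice versa.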
Fortunately, OpenCV’s sample stitching implementation already comes with a mitigation for this: seam masks with GraphCut. This algorithm was initially designed to create seamless textures. A misalignment is very visible when it crosses a contrasted line. GraphCut will attempt to minimize this.
For instance in this example, GraphCut avoided a cut through the highly contrasted rooftop over blue sky (on the right). Similarly on the left, the cut favored a dark area in the trees where misalignment is less visible.
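The full GraphCut formulation solves a 2D min-cut, but the core idea can be sketched with a much simpler 1D dynamic-programming seam (as in seam carving): pick, in the overlap region, a top-to-bottom cut that minimises the mismatch between the two images, so the seam runs through low-contrast, well-aligned areas. The function below is a simplified stand-in, not what OpenCV's seam finder actually implements:

```python
def find_seam(cost):
    """cost[row][col]: per-pixel mismatch in the overlap region.
    Returns one column index per row, moving at most one column
    left or right between consecutive rows."""
    rows, cols = len(cost), len(cost[0])
    acc = [cost[0][:]]                      # accumulated cost, top to bottom
    for r in range(1, rows):
        prev = acc[-1]
        acc.append([cost[r][c] + min(prev[max(c - 1, 0):min(c + 2, cols)])
                    for c in range(cols)])
    # Backtrack from the cheapest bottom cell.
    seam = [min(range(cols), key=acc[-1].__getitem__)]
    for r in range(rows - 2, -1, -1):
        c = seam[-1]
        seam.append(min(range(max(c - 1, 0), min(c + 2, cols)),
                        key=acc[r].__getitem__))
    return seam[::-1]

# The seam follows the cheap (well-aligned, low-contrast) column:
print(find_seam([[5, 0, 5],
                 [5, 0, 5],
                 [5, 0, 5]]))   # -> [1, 1, 1]
```

GraphCut generalises this to arbitrary 2D cuts, which is what allows it to route the seam around the rooftop in the example above.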
Some specs: the RPi camera v2 has a horizontal field of view of 62 degrees. We use eight cameras for the 360° view. Each Raspberry Pi encodes the video stream in H264 in hardware and stores it on its SD card. At the end of the recording session, all eight streams are collected and processed on another computer. The Raspberry Pi can capture and encode at a maximum of 42 fps with the full sensor. See the RPi camera module docs.
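A quick sanity check that eight 62-degree cameras actually cover the full circle, and by how much adjacent views overlap:

```python
# Coverage of eight cameras spread evenly around a circle.
hfov_deg = 62
n_cameras = 8
total = n_cameras * hfov_deg        # combined field of view: 496 degrees
spacing = 360 / n_cameras           # 45 degrees between optical axes
overlap = hfov_deg - spacing        # degrees shared by each adjacent pair
print(total, spacing, overlap)      # -> 496 45.0 17.0
```

The 17° of overlap between neighbouring cameras is what gives the feature matcher enough shared content to work with.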
To carry the camera wheel around, it was mounted on a bike cart, together with a battery and inverter for power supply. Yes, compactness was clearly not a priority here…
For stitching to work while moving, all eight cameras need to capture a frame at the same time; they cannot run on independent clocks. At 40 fps, the frame period is 25 ms (the time between two consecutive frames). In the worst case, capture times would differ by 12.5 ms, a delay visible in the stitched video even at low speed.
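The arithmetic behind these numbers, plus what the worst case means at roughly cycling speed (the speed value is an illustrative assumption):

```python
# Frame period and worst-case desynchronisation for free-running cameras.
fps = 40
frame_period_ms = 1000 / fps          # 25.0 ms between consecutive frames
worst_case_ms = frame_period_ms / 2   # 12.5 ms: two clocks maximally out of phase

speed_m_s = 5                         # e.g. a bike cart at ~18 km/h (assumed)
offset_m = speed_m_s * worst_case_ms / 1000
print(frame_period_ms, worst_case_ms, offset_m)   # -> 25.0 12.5 0.0625
```

A ~6 cm apparent displacement between adjacent views is far more than the sub-pixel alignment stitching needs.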
I modified the camera tool raspivid to add a software PLL in the video-frame CPU callback. A phase-locked loop (PLL) synchronizes two cyclic events (i.e., clocks).
The software PLL changes the camera framerate at runtime to align frame capture time on the Linux system clock.
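A toy model of the idea: on every frame callback, measure the phase error between the frame's capture timestamp and the nearest tick of the system clock, then nudge the requested frame period to cancel it. The loop gain and numbers below are illustrative, not the values used in the modified raspivid:

```python
TARGET = 25.0          # ms, desired frame period (40 fps)
KP = 0.3               # proportional gain of the loop (illustrative)

def phase_error(t, period=TARGET):
    """Signed distance (ms) from timestamp t to the nearest clock tick."""
    e = t % period
    return e - period if e > period / 2 else e

def simulate(start_offset_ms, n_frames=100):
    """Camera starts out of phase; each frame, shorten or lengthen the
    next frame period proportionally to the measured phase error."""
    t = start_offset_ms
    for _ in range(n_frames):
        err = phase_error(t)
        t += TARGET - KP * err      # catch up (or slow down) a little
    return abs(phase_error(t))

print(simulate(11.0))   # residual phase error: essentially zero
```

Each iteration multiplies the phase error by (1 − KP), so capture times converge geometrically onto the system-clock ticks; once all eight system clocks agree, all eight cameras fire together.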
All eight system clocks are synchronized over Ethernet using PTP (Precision Time Protocol). Even though the Raspberry Pi Ethernet lacks the dedicated hardware for high-accuracy PTP clocks (hardware timestamping), it still often achieves clock synchronization well under 1 ms using PTP in software mode. While recording the video, the network is only used for PTP; no other communication takes place.
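The core of PTP fits in two lines. From the four timestamps of a sync exchange (t1: master sends Sync, t2: slave receives it, t3: slave sends Delay_Req, t4: master receives it), the slave estimates its clock offset, assuming a symmetric network path:

```python
def ptp_estimate(t1, t2, t3, t4):
    """Standard PTP offset/delay estimate from one sync exchange (ms)."""
    offset = ((t2 - t1) - (t4 - t3)) / 2   # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way network delay
    return offset, delay

# Example: slave clock 3 ms ahead of the master, 1 ms one-way delay:
print(ptp_estimate(t1=100.0, t2=104.0, t3=110.0, t4=108.0))  # -> (3.0, 1.0)
```

Software timestamping adds jitter to t2 and t4, which is why the Pi only reaches sub-millisecond rather than sub-microsecond accuracy.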
Step 1: Record video
Power on, let PTP stabilize (might take a few minutes), start recording on each Raspberry Pi.
Step 2: Align frames
Copy the video streams, split each video into individual JPEG files along with capture-timing information in a text file (i.e., the modified raspivid PTS file).
Find the matching frames.
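A sketch of this matching step: given per-camera capture timestamps (from the PTS files), pair each reference frame with the closest frame of another camera, and mark it missing when no frame is close enough. Function names and the tolerance are illustrative, not from the actual pipeline:

```python
from bisect import bisect_left

def nearest(ts, t):
    """Index of the timestamp in the sorted list ts closest to t."""
    i = bisect_left(ts, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(ts)]
    return min(candidates, key=lambda j: abs(ts[j] - t))

def match_frames(reference_ts, other_ts, tol_ms=3.0):
    """For each reference frame, the index of the closest frame from the
    other camera, or None if it is more than tol_ms away (out of sync)."""
    matched = []
    for t in reference_ts:
        j = nearest(other_ts, t)
        matched.append(j if abs(other_ts[j] - t) <= tol_ms else None)
    return matched

cam0 = [0.0, 25.0, 50.0, 75.0]        # timestamps in ms
cam1 = [0.4, 25.2, 62.0, 75.1]        # third frame drifted / dropped
print(match_frames(cam0, cam1))       # -> [0, 1, None, 3]
```

The `None` slots are what later become the solid-white frames mentioned in the final video.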
Step 3: Calibrate transformation
Run OpenCV stitching_detailed on some matching frames to find the transformation matrices. Depending on the feature algorithm (e.g., SURF, ORB, …) and on the details visible in the overlapping area of the frames, matching may or may not succeed.
In case of success, each frame has two matrices: the camera matrix (aka “K”), which encodes the camera characteristics (e.g., focal length, aspect ratio), and the rotation matrix (aka “R”), which encodes the camera rotation.
As explained at the beginning of this article, stitching assumes a pure rotation, which is not the case in real life. Each set of matching frames in the stream will therefore yield different matrices; only one set of matrices is selected for the next step.
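K and R are all that is needed because, for a purely rotating camera, the homography between two views factors as H = K2 · R · K1⁻¹. A minimal numeric check with plain 3×3 lists (the focal length and rotation below are illustrative values, not a calibration result):

```python
import math

def matmul(A, B):
    """3x3 matrix product on nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inv_K(K):
    """Inverse of a camera matrix of the form [[f,0,cx],[0,f,cy],[0,0,1]]."""
    f, cx, cy = K[0][0], K[0][2], K[1][2]
    return [[1/f, 0, -cx/f], [0, 1/f, -cy/f], [0, 0, 1]]

def rot_y(deg):
    """Rotation about the vertical axis (panning the camera)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def warp(H, x, y):
    """Apply homography H to pixel (x, y) with perspective divide."""
    v = [H[r][0] * x + H[r][1] * y + H[r][2] for r in range(3)]
    return v[0] / v[2], v[1] / v[2]

# Same K for both cameras (illustrative ~RPi v2 full-sensor values):
K = [[2729.0, 0, 1640.0], [0, 2729.0, 1232.0], [0, 0, 1]]
H = matmul(K, matmul(rot_y(45), inv_K(K)))   # warp between two panned views

# The image centre of camera 1 lands f*tan(45 deg) = 2729 px to the side:
print(warp(H, 1640.0, 1232.0))
```

In stitching_detailed the per-camera K and R feed a warper that projects every view onto a common surface, which amounts to the same factorisation.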
Step 4: Stitch frames
Rebuild stitching_detailed with the matrices selected in the previous step as hardcoded transformation, and stitch all the video frames.
Step 5: Encode video
Assemble all stitched JPEG files into a video again and encode it back to H264.
The 360 video, finally
The result is a very wide video stream. Note that out-of-sync frames are replaced by solid white instead of showing the delayed frame.
If this video is too wide to be displayed on your screen, here is a split version:
A big limitation of the video presented here is that stitching runs independently on each frame, without taking the previous frame into account:
- The GraphCut algorithm chooses a different path every time and video looks glitchy.
- Matrix calibration is done only once for all frames.
With a few improvements in stitching and clock synchronization, plus the addition of a GPS receiver, Raspberry Pi street view is just around the corner…
See further developments about the street view idea in a new article:
An idea to make your own Google Street view
Scanning the streets might not need an expensive setup.
raspivid modified by Inatech:
Look in the releases section for the pre-compiled binary raspivid-inatech.
Vivek Kwatra, Arno Schödl, Irfan Essa, Greg Turk, and Aaron Bobick. Graphcut textures: image and video synthesis using graph cuts. In ACM Transactions on Graphics (ToG), volume 22, pages 277–286. ACM, 2003. https://dl.acm.org/doi/10.1145/882262.882264
OpenCV stitching_detailed, source code: https://github.com/opencv/opencv/blob/master/samples/cpp/stitching_detailed.cpp
Original Raspberry Pi raspivid, source code: https://github.com/raspberrypi/userland/blob/master/host_applications/linux/apps/raspicam/RaspiVid.c
To address the issue of glitchy-looking GraphCut seams, the paper below suggests a promising spatial-temporal content-preserving warping algorithm:
To address the issue of PTP clock synchronization, see this deeper analysis in an additional article: