Human eyes, while excellent visual sensors in most situations, have a limited range of what they can detect. For instance, a person’s pulse can technically be seen at their wrist; each pulse through the artery physically moves the skin. That motion is invisible to us simply because it is too small for human eyes to resolve.
Sound waves likewise cause every object they hit to vibrate, producing micro-scale motion that is also too small for the human eye to detect. Instead, we make up for the limited sensitivity of sight with other senses, such as touch and hearing.
But it turns out that where our eyes fail us, software can open up a new avenue for understanding and dissecting reality:
Recently, a group of researchers led by Abe Davis at MIT developed a technique that recovers sound from a silent video. The technique takes advantage of the aforementioned micromotion in objects. Sound causes objects to vibrate at a very small scale, and the frequency and intensity of that vibration depend directly on the sound wave and on the material properties of the vibrating object itself. The group dubbed this technology visual vibrometry, because scientists like taking ordinary words and putting -metry at the end to make it sciencey.
The group was able to play a song to a plant, record the plant’s vibration with a high-speed, high-resolution camera, and recover the sound from the video alone:
They call this the passive recovery of sound from video:
omg
The group also used this technology to play music through a pair of earphones, film the earphones, and recover the music from the video. Apparently the quality of the recovered song was so good that Shazam could identify it.
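If you want a feel for how “motion becomes sound” in code, here is a minimal Python sketch. To be clear, this is not the group’s actual pipeline (which tracks phase variations in a complex steerable pyramid and denoises carefully); it uses crude mean frame-to-frame intensity differences as the motion signal, and the input file name `plant_highspeed.mp4` is hypothetical. The core idea it illustrates is real, though: the per-frame micro-motion signal, sampled at the camera’s frame rate, is the sound.

```python
# Toy sketch: recover a rough audio track from tiny frame-to-frame changes
# in a high-speed video. The real system uses phase-based motion analysis;
# averaging intensity differences is a crude stand-in for illustration.
import numpy as np
import cv2                      # pip install opencv-python
from scipy.io import wavfile    # pip install scipy

VIDEO_PATH = "plant_highspeed.mp4"   # hypothetical input clip

cap = cv2.VideoCapture(VIDEO_PATH)
fps = cap.get(cv2.CAP_PROP_FPS)      # frame rate doubles as audio sample rate

prev, signal = None, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    if prev is not None:
        # Mean signed difference between consecutive frames acts as a
        # (very noisy) proxy for the object's global micro-motion.
        signal.append(np.mean(gray - prev))
    prev = gray
cap.release()

# Remove DC offset, normalize, and write out at the video's frame rate.
audio = np.asarray(signal, dtype=np.float32)
audio -= audio.mean()
audio /= (np.abs(audio).max() + 1e-9)
wavfile.write("recovered.wav", int(fps), audio)
```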
It goes without saying that there are limitations; the recording device must be relatively still and capture enough resolution and frame rate to resolve the tiny motion. But if visual cues for sound can be extracted at such a fine scale using software, what other modalities can we augment with the same principle?
But wait, there’s more!
More recently, the same group extended their visual vibrometry technique: by observing an object’s motion in video, they can model how that object would move in new, hypothetical situations.
From videos as short as five seconds, the group samples an object’s motion to estimate its material properties. That knowledge is then used to simulate a hypothetical force applied to the object and generate new motion directly in the video.
This is different from previous CGI simulations, where motion could only be predicted for simulated objects with known properties. Davis’ work lets us predict how a real object in the real world will respond to new situations.
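Here is a toy one-dimensional Python sketch of that “observe, then simulate” idea. It is not the group’s method (which builds modal bases in image space from the video); it just fits a single damped-oscillator mode to an observed motion trace, then drives the fitted model with a made-up new force. The frame rate, the synthetic trace, and the “poke” are all invented for illustration.

```python
# Toy 1-D sketch: fit a damped harmonic oscillator to a short observed
# motion trace, then simulate its response to a hypothetical new force.
import numpy as np
from scipy.optimize import curve_fit

def damped_oscillation(t, amp, freq, damping, phase):
    """Free response of a single damped vibration mode."""
    return amp * np.exp(-damping * t) * np.cos(2 * np.pi * freq * t + phase)

# --- "Observation" phase: a ~5 s motion trace sampled from video frames ---
fps = 240.0                                    # assumed high-speed frame rate
t = np.arange(0, 5.0, 1.0 / fps)
observed = damped_oscillation(t, 1.0, 3.2, 0.8, 0.3)   # stand-in for real data
observed += 0.05 * np.random.randn(t.size)              # camera noise

# Estimate the object's effective frequency and damping from the trace.
(amp, freq, damping, phase), _ = curve_fit(
    damped_oscillation, t, observed, p0=[1.0, 3.0, 1.0, 0.0])

# --- Simulation phase: apply a new, hypothetical force to the fitted model ---
omega = 2 * np.pi * freq
dt = 1.0 / fps
x, v = 0.0, 0.0
response = []
for k in range(int(2.0 * fps)):                # simulate 2 s of new motion
    force = 1.0 if k < 10 else 0.0             # a short "poke" at t = 0
    a = force - 2 * damping * v - omega**2 * x  # unit-mass damped oscillator
    v += a * dt
    x += v * dt
    response.append(x)

print(f"estimated frequency: {freq:.2f} Hz, damping: {damping:.2f} 1/s")
```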
It’s not hard to imagine this technology being used to make augmented reality feel much more ‘real’, maybe to the point where constructed reality and actual reality become indistinguishable to us puny humans.
The future is basically Black Mirror waiting to happen, guys.
// This post was inspired by this TED talk. (Watch it, it’s so cool)
// Find me, get to know me at my personal site!