Using Deep Learning to turn an iPhone into a 3D camera

Arthur Swiniarski
5 min read · Sep 7, 2016

This piece is an example of how easily Neural Networks can be used today as building blocks of products.

TL;DR: 5 steps

The article

A few months ago I saw this post trending on Hacker News:

It explained how its authors used 3D movies, along with their 2D versions, to train a convolutional Neural Network (CNN) to convert a 2D picture or video into a 3D one.

How the CNN works in a nutshell.

What a great idea! But how could we use the resulting “Deep3D” model?

3D

To create 3D, your brain uses two slightly different images of what is in front of it. The small space between your two eyes gives those images a different perspective on the scene, allowing your brain to create 3D.

Your brain at work.

So in order for a video to be seen in 3D, you have to send a different image to each of the viewer’s eyes.

Here is what the Neural Network is capable of:

Left Eye Image - Right Eye Image. Not the same, look closely!
Combined in a GIF.

Usually, 3D content is created using a 3D camera: a special camera that mimics the space between your eyes with the space between its two lenses.

3D Camera.

Not anymore!

Watching 3D

Once you have 3D content you need to find a way to show each part, left and right, only to the eye that needs to see it.

This can be done in a number of ways, such as with blue and red glasses.

Blue and Red glasses at work.

Here we are going to let users enjoy the 3D content they create with their phone, on that same phone.

We will use Apple’s built-in 3D engine, SceneKit, to create a viewer app and use it to watch our content in a Google Cardboard.

Google Cardboard

Here is a representation of a SceneKit 3D Scene:

A SceneKit Scene, its camera, and the camera (the cube) field of view (blue).

It is just a plain 3D space (you can picture a sphere along with X, Y, Z coordinates) in which you have a virtual camera that acts as the eye of the user, and where you can add any virtual object.

You map the movements of the camera to the movements of the phone’s gyroscope, and you have a 3D space to explore on your phone.
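As a minimal sketch of that mapping (the ExplorableSceneController name is mine, not something from SceneKit or the VR Toolkit), you can feed Core Motion’s device attitude straight into the camera node:

```swift
import SceneKit
import CoreMotion

// Minimal sketch: a SceneKit scene whose virtual camera follows the
// phone's orientation. `ExplorableSceneController` is a hypothetical
// name for illustration only.
final class ExplorableSceneController {
    let scene = SCNScene()
    let cameraNode = SCNNode()
    private let motionManager = CMMotionManager()

    init() {
        // The virtual camera acting as the eye of the user, at the origin.
        cameraNode.camera = SCNCamera()
        cameraNode.position = SCNVector3(0, 0, 0)
        scene.rootNode.addChildNode(cameraNode)
    }

    func startTracking() {
        guard motionManager.isDeviceMotionAvailable else { return }
        motionManager.deviceMotionUpdateInterval = 1.0 / 60.0
        motionManager.startDeviceMotionUpdates(to: .main) { [weak self] motion, _ in
            guard let q = motion?.attitude.quaternion else { return }
            // Map the device attitude onto the camera so that turning the
            // phone turns the virtual camera. Depending on the interface
            // orientation you may need to remap axes.
            self?.cameraNode.orientation = SCNQuaternion(x: Float(q.x), y: Float(q.y),
                                                         z: Float(q.z), w: Float(q.w))
        }
    }
}
```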

In order to make it Google Cardboard compatible you can start with my SceneKit VR Toolkit project. Here is what it looks like on an iPhone:

SceneKit VR Toolkit.

You’ll need to reconfigure it to show not cubes but a single flat surface, acting as a virtual screen, onto which we will project a video or image.
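Roughly, the virtual screen is just one textured SCNPlane placed in front of the camera; the helper name and the 16:9 size below are mine, for illustration:

```swift
import SceneKit
import UIKit

// Sketch of a "virtual screen": one flat, textured plane in front of
// the camera. `makeScreenNode` and the plane size are illustrative.
func makeScreenNode(showing image: UIImage) -> SCNNode {
    let plane = SCNPlane(width: 1.6, height: 0.9) // scene units, 16:9
    plane.firstMaterial?.diffuse.contents = image
    plane.firstMaterial?.isDoubleSided = true

    let node = SCNNode(geometry: plane)
    // The default camera looks down the negative Z axis, so push the
    // screen a couple of units in front of it.
    node.position = SCNVector3(0, 0, -2)
    return node
}
```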

Apple’s SceneKit documentation is very intuitive, and you should be up and running in no time.

To use a video as a SceneKit texture you can refer to my 360 video player project (it also does 360 stereoscopic 3D video).
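One common way to do this (roughly what such players did at the time) is to wrap the video in a SpriteKit scene and use that scene as the plane’s texture. A sketch, with a hypothetical makeVideoScreenNode helper:

```swift
import SceneKit
import SpriteKit
import AVFoundation

// Sketch: play a video on the virtual screen by routing an AVPlayer
// through an SKVideoNode, then using the SKScene as the plane's texture.
func makeVideoScreenNode(url: URL) -> SCNNode {
    let player = AVPlayer(url: url)
    let videoNode = SKVideoNode(avPlayer: player)

    // A SpriteKit scene sized like the video texture.
    let skScene = SKScene(size: CGSize(width: 1920, height: 1080))
    videoNode.position = CGPoint(x: skScene.size.width / 2,
                                 y: skScene.size.height / 2)
    videoNode.size = skScene.size
    videoNode.yScale = -1 // SpriteKit textures often render flipped in SceneKit
    skScene.addChild(videoNode)

    let plane = SCNPlane(width: 1.6, height: 0.9)
    plane.firstMaterial?.diffuse.contents = skScene

    let node = SCNNode(geometry: plane)
    node.position = SCNVector3(0, 0, -2)
    player.play()
    return node
}
```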

Here is a schema of what is happening in SceneKit and on your iPhone when you run the VR Toolkit:

The two slightly separated virtual cameras give the impression of depth while looking at the 3D Scene.
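In code, the stereo rig boils down to two camera nodes offset along the X axis; the helper name and the 64 mm eye separation below are illustrative, not values the toolkit mandates:

```swift
import SceneKit

// Sketch of the stereo rig: two virtual cameras, one per eye,
// separated horizontally like human eyes. Each one then serves as
// the pointOfView of its own side-by-side SCNView.
func makeStereoCameras(in scene: SCNScene) -> (left: SCNNode, right: SCNNode) {
    let separation: Float = 0.064 // ~64 mm, a typical interpupillary distance

    let leftEye = SCNNode()
    leftEye.camera = SCNCamera()
    leftEye.position = SCNVector3(-separation / 2, 0, 0)

    let rightEye = SCNNode()
    rightEye.camera = SCNCamera()
    rightEye.position = SCNVector3(separation / 2, 0, 0)

    scene.rootNode.addChildNode(leftEye)
    scene.rootNode.addChildNode(rightEye)
    return (left: leftEye, right: rightEye)
}
```

Each of the two side-by-side SCNViews then takes the matching node as its pointOfView.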

But we don’t want to perceive only the scene’s depth. We want 3D video or photos. We want both eyes to see something different while looking at the same screen.

We need a trick.

Easy enough: we will build two identical scenes, one per eye, and project the right eye image on the right virtual screen and the left eye image on the left virtual screen.

Schema to the rescue:

Each scene is identical, which tricks the brain into thinking there is only one. But both screens are independent and allow for 3D content viewing by showing a different image to each eye.
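A sketch of the trick itself, with hypothetical helper names (makeEyeScene is mine): build the same scene twice and texture each scene’s screen with the image meant for that eye:

```swift
import SceneKit
import UIKit

// Sketch of the two-scenes trick: identical scenes, but each virtual
// screen shows the image meant for one eye. `makeEyeScene` is a
// hypothetical helper, not part of the VR Toolkit.
func makeEyeScene(showing image: UIImage) -> SCNScene {
    let scene = SCNScene()

    let cameraNode = SCNNode()
    cameraNode.camera = SCNCamera()
    scene.rootNode.addChildNode(cameraNode)

    let plane = SCNPlane(width: 1.6, height: 0.9)
    plane.firstMaterial?.diffuse.contents = image
    let screen = SCNNode(geometry: plane)
    screen.position = SCNVector3(0, 0, -2)
    scene.rootNode.addChildNode(screen)

    return scene
}

// One SCNView per eye, placed side by side on screen, each rendering
// its own copy of the scene.
func configure(leftView: SCNView, rightView: SCNView,
               leftImage: UIImage, rightImage: UIImage) {
    leftView.scene = makeEyeScene(showing: leftImage)
    rightView.scene = makeEyeScene(showing: rightImage)
}
```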

Wrapping up

Now you need a PC with a discrete GPU in order to install and run the CNN.

Process some pictures or videos taken with the iPhone on it, then display the resulting files in your viewer app! You can also very easily create an Apple TV app to display them; 3D-compatible TVs automatically detect 3D content.

Here is what it looks like in the app with a friend’s photo processed:

Left (input) — Right (output)
Animated GIF!
Sample app running on an iPhone.

Use a Cardboard and enjoy some 3D awesomeness!

Maybe I’ll post the sample viewer app on GitHub if someone needs it! :)
