Omar Barakat
Jan 11, 2018 · 5 min read

Input :

A 2-D RGB-image.

Output :

A depth map for this image. Which is simply a 2-D matrix with the same size as the input image , that contains information relating to the distance of the surfaces of scene objects from the camera viewpoint. ie : the larger the pixel value means the farther the object is.

Loss function :

The following cost function formula is used in this paper : “Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture by David Eigen and Rob Fergus” and many other related papers.

D : predicted , D* : ground truth , d = D— D*

Here, ∇xdi and ∇ydi are the horizontal and vertical image gradients of the difference between ith pixel predicted and ground truth depth di

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade