Computational Photography: from the Amateur’s Lens

Yish Lim
5 min read · Dec 10, 2018

I’m a huge fan of Google’s hardware. My previous and current phones are the Pixel 2 XL and the Pixel 3 XL. Prior to that, I was a diehard iPhone user, so what got me to make the switch?

I take a lot of pictures. Before this year, whenever I travelled I’d bring a DSLR, a GoPro with a gimbal and sometimes a drone. I always had an additional bag to carry all my cameras, extra batteries and all the chargers. When the second-generation Google Pixels were announced and started topping all the lists for having ‘the best smartphone camera’, I was sold. Sure, I still like to use all my fancy cameras, but when I’m doing something where I don’t want to carry all that stuff, I’m more than happy with my phone’s camera.

Google’s buzzword when talking about the Pixel’s camera is computational photography. Take a look at this example from Julian Chokkattu’s article on Digital Trends:

The Pixel is “excellent”.

Where the iPhones use two camera lenses to judge distances (just like how we have two eyes), the Pixel uses just one camera lens and does the background blurring effect using pure software. I find that so incredible.

What is Computational Photography?

Wikipedia (yes good source) says, “Computational photography refers to digital image capture and processing techniques that use digital computation instead of optical processes.” Makes sense. But this still sounds like something only some big-shot computer science whizz (who apparently works at Google) can do.

I then found out that there are several undergraduate courses on computational photography! One amazing resource I found while doing my research is the course website for Brown University’s undergraduate Computer Science class: Computational Photography and Image Manipulation.

When you take a picture, a lot of information is stored. Professional photographers often shoot in RAW formats, which store the largely unprocessed data from the camera’s sensor (exposure, intensity, colors) along with metadata such as the camera settings used, the location, and more. JPEGs are much more straightforward. There is an entire process of converting and compressing RAW photos into JPEGs, but we won’t go into that.

Once a JPEG is decoded, each pixel holds the levels for RGB. RGB tells us how much red, green and blue light to mix in that particular pixel for us to perceive that particular color. Every color your screen can display can be represented by an RGB code, which consists of three numbers, each ranging from 0 to 255. Hex codes are just the same three numbers written in base 16, two digits each, so (255, 128, 0) becomes #FF8000. Here are some colors:

Screenshot from https://www.rapidtables.com/web/color/RGB_Color.html
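The RGB-to-hex conversion really is just writing each of the three numbers in base 16. Here is a minimal Python sketch of it; the function names rgb_to_hex and hex_to_rgb are my own, picked for illustration.

```python
def rgb_to_hex(r, g, b):
    """Convert three 0-255 channel values to a hex color code like '#FF8000'."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    # Each channel becomes exactly two hexadecimal digits.
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


def hex_to_rgb(code):
    """Convert a hex color code like '#FF8000' back to an (r, g, b) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


print(rgb_to_hex(255, 128, 0))   # #FF8000
print(hex_to_rgb("#FF8000"))     # (255, 128, 0)
```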

Anyways, because the information in the pixels of a decoded JPEG is just these numbers, we can access it as an array! Functions written to analyze and manipulate this data are what allow us to edit pictures and do computational photography things.
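To make that concrete, here is a sketch in plain Python that treats a tiny 2x2 "image" as a nested list of (R, G, B) tuples. The pixel values are made up, and a real JPEG would first have to be decoded by an image library, which I’m skipping here. The function names are my own, and the grayscale weights are the commonly used Rec. 601 luma weights, which is only one of several ways to do it.

```python
# A made-up 2x2 image: a list of rows, each row a list of (R, G, B) pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def to_grayscale(img):
    """Replace each pixel with one brightness value (Rec. 601 luma weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in img
    ]


def brighten(img, amount):
    """Add `amount` to every channel, clamping results to the 0-255 range."""
    return [
        [tuple(min(255, max(0, c + amount)) for c in pixel) for pixel in row]
        for row in img
    ]


print(image[0][1])                 # the pixel in row 0, column 1
print(to_grayscale(image))         # one number per pixel
print(brighten(image, 50)[0][0])   # the red pixel, made brighter
```

Every "edit" here is just a function that walks over the array and returns new numbers, which is the whole idea.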

Here are some things I attempted to understand:

High Dynamic Range (HDR)

You know how when you take a picture with your phone and it’s on HDR and it takes forever?

It takes that long because the camera is actually capturing the image several times, each at a different exposure level. So now you have a series of images of the same scene. Let’s say that we have five images. Pixel X in each of these five images represents the same spot in reality, just recorded brighter or darker. Without going into equations, by taking the RGB values of several (even just 3–5) sets of such five pixels, we can estimate what is called a response function, which describes how the camera turns the light hitting the sensor into pixel values. With the response function in hand, the exposures can be merged across the entire image to produce the HDR effect!
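Here is a rough sketch of the merging step for a single pixel. To keep it short I’m cheating: instead of estimating a response function, I assume the sensor is perfectly linear (pixel value is proportional to brightness times exposure time). The exposure times, pixel values and function names are all made up for illustration, and a real pipeline would still need to squeeze the merged result back into the 0–255 range afterwards.

```python
def weight(value):
    """Trust mid-range values most; values near 0 or 255 are likely clipped."""
    return min(value, 255 - value)


def merge_pixel(values, exposure_times):
    """Estimate the scene brightness at one pixel from several exposures.

    Assumes a linear sensor: pixel value is proportional to brightness * time.
    """
    total, total_weight = 0.0, 0.0
    for value, seconds in zip(values, exposure_times):
        w = weight(value)
        total += w * (value / seconds)
        total_weight += w
    if total_weight == 0:
        # Every exposure was clipped; fall back to a plain average.
        return sum(v / t for v, t in zip(values, exposure_times)) / len(values)
    return total / total_weight


# The same pixel X in five shots, from the shortest to the longest exposure.
exposure_times = [1 / 400, 1 / 200, 1 / 100, 1 / 50, 1 / 25]
pixel_x = [10, 20, 40, 80, 160]

print(merge_pixel(pixel_x, exposure_times))
```

In this made-up example every shot agrees on the same brightness once you divide by its exposure time, so the merged value is simply that shared estimate. With real photos the shots disagree a little, and the weights decide which ones to believe.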

This deck goes into far more detail (and has some really cool examples of how smartphones do HDR!).

Frequency Domains

There’s a result in mathematics that says a huge range of functions can be represented as some combination of sine and cosine waves. The process of expressing a function this way is the Fourier Transform. Let’s assume we already know how to do this, and say that the grid of pixel values in an image can be expressed as a sum of such waves.

Quick lesson on trigonometry:

  • Sine and cosine functions essentially have two components: period and amplitude
  • 3cos2x has an amplitude of 3 and a period of π
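To check those two numbers, here is a small sketch that samples 3cos2x and runs a naive discrete Fourier transform over it using only Python’s standard library. The sample count N and the variable names are my own choices, and a real program would use a fast Fourier transform library instead of this slow loop.

```python
import cmath
import math

N = 64  # number of samples across one stretch of length 2*pi

# Sample f(x) = 3cos(2x) at N evenly spaced points in [0, 2*pi).
samples = [3 * math.cos(2 * (2 * math.pi * n / N)) for n in range(N)]


def dft(signal):
    """Naive discrete Fourier transform: one complex number per frequency."""
    size = len(signal)
    return [
        sum(signal[n] * cmath.exp(-2j * math.pi * k * n / size)
            for n in range(size))
        for k in range(size)
    ]


spectrum = dft(samples)

# Amplitude of the wave that completes k cycles over the sampled stretch.
amplitudes = [2 * abs(spectrum[k]) / N for k in range(1, N // 2)]
strongest = max(range(1, N // 2), key=lambda k: abs(spectrum[k]))

print(strongest)                            # cycles per 2*pi
print(round(amplitudes[strongest - 1], 6))  # amplitude of that wave
print(2 * math.pi / strongest)              # its period
```

The transform finds a single wave with 2 cycles per 2π, which means a period of π, and an amplitude of 3, matching the bullet above.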

Here I have a screenshot of a slide from the Frequency Domain class of the course:

Credit: http://cs.brown.edu/courses/csci1290/lectures/12_FrequencyDomain.pdf

What’s really awesome is that the left and right pictures both represent the same information. In grossly simplified terms, and as I understand it, the picture on the right shows how strongly each wave (each frequency) is present in the image.

With the image in the form of trigonometric waves, some edits become a lot easier. For example, throwing away the high-frequency waves removes fine detail, which blurs the image.
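As a sketch of what filtering in the frequency domain can look like, the snippet below takes one made-up "row of pixels" (a slow wave plus a fast wiggle), transforms it, zeroes out the fast waves, and transforms it back. The names dft and low_pass and the cutoff value are my own, and I’m using a slow hand-written transform purely to keep the example self-contained.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive (inverse) discrete Fourier transform of a list of numbers."""
    size = len(signal)
    sign = 1 if inverse else -1
    out = [
        sum(signal[n] * cmath.exp(sign * 2j * math.pi * k * n / size)
            for n in range(size))
        for k in range(size)
    ]
    return [value / size for value in out] if inverse else out


def low_pass(signal, cutoff):
    """Drop every wave with more than `cutoff` cycles, then rebuild the signal."""
    size = len(signal)
    spectrum = dft(signal)
    for k in range(size):
        cycles = min(k, size - k)  # bins above size/2 mirror the low ones
        if cycles > cutoff:
            spectrum[k] = 0
    return [value.real for value in dft(spectrum, inverse=True)]


N = 32
slow = [math.cos(2 * math.pi * n / N) for n in range(N)]                # 1 cycle
wiggle = [0.3 * math.cos(2 * math.pi * 10 * n / N) for n in range(N)]   # 10 cycles
row = [s + w for s, w in zip(slow, wiggle)]  # slow wave plus fine detail

smoothed = low_pass(row, cutoff=3)
print(max(abs(a - b) for a, b in zip(smoothed, slow)))  # close to 0
```

After the filter, only the slow wave is left: the fine detail is gone, which is exactly what a blur does.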

Blurring

The term kernel is used for a small grid of numbers that gets applied to pixels. It’s a filter. Kernels take the form of small matrices and are applied by centering the matrix on every pixel, multiplying each covered pixel by the matching kernel entry, and summing up the results to get a new pixel value. Wikipedia has a pretty clear article on kernels and some of their functions.
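Here is a minimal sketch of applying a kernel to a grayscale image stored as a list of rows. The function name apply_kernel, the sample image and the choice to copy the nearest edge pixel at the borders are all my own assumptions. (Strictly speaking, a true convolution also flips the kernel first; for the symmetric kernels used here it makes no difference.)

```python
def apply_kernel(image, kernel):
    """Apply a square, odd-sized kernel to a grayscale image (list of rows).

    Pixels outside the image are treated as copies of the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    result = []
    for y in range(height):
        new_row = []
        for x in range(width):
            total = 0.0
            for ky in range(-radius, radius + 1):
                for kx in range(-radius, radius + 1):
                    # Clamp coordinates so the kernel never reads out of bounds.
                    sy = min(max(y + ky, 0), height - 1)
                    sx = min(max(x + kx, 0), width - 1)
                    total += image[sy][sx] * kernel[ky + radius][kx + radius]
            new_row.append(total)
        result.append(new_row)
    return result


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # leaves the image unchanged
box_blur = [[1 / 9] * 3 for _ in range(3)]     # plain average of 9 pixels

image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 90, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(apply_kernel(image, identity)[2][2])   # the bright pixel is untouched
print(apply_kernel(image, box_blur)[2][2])   # its brightness is spread out
```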

One of the most basic filters is a Gaussian filter, used for blurring. A Gaussian matrix is applied to the pixels and produces a blurred effect. We’re essentially taking a weighted average of neighboring pixels, and the amount of blur depends on the kernel size (the size of the matrix) and on how widely the Gaussian spreads its weights.

thanks, Wikipedia
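A Gaussian matrix can be built straight from the bell-curve formula. In this sketch the function name gaussian_kernel and the parameter sigma (the standard deviation, which controls how widely the weights spread) are my own naming.

```python
import math


def gaussian_kernel(size, sigma):
    """Build a size x size Gaussian kernel whose weights sum to 1."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center")
    radius = size // 2
    raw = [
        [math.exp(-(x * x + y * y) / (2 * sigma * sigma))
         for x in range(-radius, radius + 1)]
        for y in range(-radius, radius + 1)
    ]
    total = sum(sum(row) for row in raw)
    # Normalizing keeps the overall brightness of the image unchanged.
    return [[value / total for value in row] for row in raw]


for row in gaussian_kernel(3, sigma=1.0):
    print([round(value, 4) for value in row])
```

The center weight is the largest and the corners are the smallest, so nearby pixels count for more in the average than distant ones.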

Different matrices perform different functions. If you scroll down the Wikipedia link, you can see how applying a different kernel can achieve edge detection.
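For a taste of edge detection, here is a sketch using one of the edge-detection kernels from that kind of list: 8 in the center and -1 everywhere else. The weights add up to zero, so flat areas come out as 0 and only places where brightness changes light up. The function name, the sample image and the border handling are my own assumptions.

```python
EDGE_KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def detect_edges(image):
    """Apply the 3x3 edge kernel to a grayscale image, clamping at the borders."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for ky in (-1, 0, 1):
                for kx in (-1, 0, 1):
                    sy = min(max(y + ky, 0), height - 1)
                    sx = min(max(x + kx, 0), width - 1)
                    total += image[sy][sx] * EDGE_KERNEL[ky + 1][kx + 1]
            # Keep only the strength of the change, not its direction.
            row.append(abs(total))
        result.append(row)
    return result


# Dark on the left, bright on the right: one vertical edge down the middle.
image = [[0, 0, 0, 100, 100, 100] for _ in range(4)]

for row in detect_edges(image):
    print(row)   # each row: [0, 0, 300, 300, 0, 0]
```

Only the two columns on either side of the dark-to-bright boundary are non-zero, which is the edge.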

Going back to the Pixel portrait photos, I’m going to guess that they combine blurring and edge-detection techniques with machine learning to produce the best portrait-mode pictures with pure software.

Moving Forward…

Unfortunately, none of what I went through involves machine learning, so I didn’t figure out what these Google engineers are doing to make these awesome Pixel cameras. But now that we know how image data can be easily accessed and manipulated, we have an idea of how these computational photography processes begin.

If you find any errors in my amateur explanation, please shoot me a message to let me know!
