Why Machine Learning Is So Important For The Future Of Photography

This is an attempt to clear the air, reduce the hype, and bring some basic understanding to the subject of why machine learning matters so much to the future of photography.

I don’t doubt for a moment that the hype around machine learning is rooted in some amazing accomplishments, many of which I use almost daily, but when it comes to how the technology affects the world of photography, we need to dig deeper. The story starts with recent innovations in machine learning, more specifically the machine learning algorithms that support computer vision. Yes, you could go back further and delve into some of the work done during the last Machine Learning / Artificial Intelligence “Winter”, but that’s deeper than we need to go to illustrate why the future of photography depends on these technologies. This CNET post comes close to getting the point across about the future impact of this technology but lacks real-world examples. What it does get right is that the recent combination of machine learning and the latest image processing techniques within smartphones is what gives us this glimpse into the future of photography.

“Machine Learning” gives “computers the ability to learn without being explicitly programmed.” — Arthur Samuel

Machine learning, in its most basic form, is the ability of a computer, or more specifically an algorithm, to learn from the data it is given. Think of it this way: you can teach a child what something looks like by showing them a photo, and if you show them more photos of similar objects, they can learn to identify an entire class of objects. Machine learning algorithms work essentially the same way (with some important limitations we will discuss shortly): you show the algorithm large amounts of example data and it can then “learn” to classify that data. If you do it right, the trained algorithm may even be able to classify data it has never seen before, which is commonly referred to as “generalization”.

So that’s the super-simple, non-math explanation of how you might build what the machine learning world calls a “classifier”, but understand that this only scratches the surface of how the technology functions. To better understand the details, and for all the matrix multiplication and derivatives you can handle, check out Andrew Ng’s very popular series of courses on Coursera. I’ve taken several of them, and they are worth your time if you really want to understand what is going on under the machine learning hood. As you might expect, machine learning does have its limits. Specifically, there is the obvious issue of needing enough data to train the algorithm, and the less obvious issue of making sure the results you get are consistent, especially when applied to photography.
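To make the classifier idea concrete, here is a minimal sketch in Python using scikit-learn. The digits dataset, the model choice, and the train/test split are stand-ins for illustration, nothing more:

```python
# A minimal "classifier" sketch: train on labeled example images,
# then measure accuracy on images the model has never seen.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()  # 8x8 grayscale images, labeled 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=2000)  # our "classifier"
clf.fit(X_train, y_train)                # "learning" from example data

# Accuracy on the held-out images is the "generalization"
# described above: performance on data never used in training.
print("held-out accuracy:", clf.score(X_test, y_test))
```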

Enter Computational Photography

The term “Computational Photography” is getting lots of press lately as Apple and Google roll out their latest devices, but it is worth noting that computational photography is by no means exclusive to smartphones, nor was it invented recently. Basic computational photography started years ago with the very first digital cameras that had built-in image enhancements. Almost every digital camera sold today employs some basic form of “computational” image processing, in that it has to take the data coming from the image sensor and render it into a usable format (such as JPEG, or even some RAW formats). Some cameras do very little processing of image data, while others, such as Fujifilm and Sony mirrorless cameras, perform impressive enhancements that mimic old film types and other striking effects, all directly in-camera.

So why all the hype? If all digital cameras have some basic computational photography built in, why are we hearing so much about it now? The truth mostly comes down to the combination of machine learning and classic image processing. Specifically, the latest generation of smartphones combines machine learning directly with traditional image processing, and that has everyone super-excited, and for good reason.
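As a toy illustration of that baseline “computational” step, here is roughly what turning linear sensor values into a viewable 8-bit image might look like. The white balance gains and gamma value below are made-up numbers for the sketch, not any camera’s real pipeline:

```python
# A toy render step: linear sensor data -> viewable 8-bit image.
import numpy as np

def render_raw(raw, wb_gains=(2.0, 1.0, 1.5), gamma=2.2):
    """raw: float array (H, W, 3) of linear sensor values in [0, 1]."""
    balanced = np.clip(raw * np.array(wb_gains), 0.0, 1.0)  # white balance
    encoded = balanced ** (1.0 / gamma)                     # gamma encoding
    return (encoded * 255).astype(np.uint8)                 # 8-bit output

raw = np.random.rand(4, 6, 3)   # stand-in for demosaiced sensor data
print(render_raw(raw).shape, render_raw(raw).dtype)
```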

Machine Learning + Image Processing =?

So when machine learning and in-camera image processing hook up, what kind of result do we get? Let’s just say that’s where the magic happens. Why? Jump back to the classifier discussion from earlier, where the machine learning algorithm was able to sort photos into categories. Imagine we now use that same approach to classify photos into scene types: landscape, portrait, night, daytime, and so on. If the algorithm performs this classification accurately, it gives the camera exactly what it needs to set the color information, white balance, exposure, and sharpening automatically when you take the photo. But why stop at identifying the scene? Consider this possibility: what if we trained that same kind of algorithm on exactly what various scenes should look like, right down to the pixel level? A rough sketch of the scene-selection idea follows; after that, consider a typical nighttime scene with varied lighting, such as the one pictured below.
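Here is a minimal, hypothetical sketch of a scene label driving capture settings. The classifier stub, the settings table, and the brightness threshold are all illustrative inventions, not any phone’s actual pipeline:

```python
import numpy as np

# Hypothetical lookup from scene label to tuned capture parameters.
SCENE_SETTINGS = {
    "night":     {"white_balance": "tungsten", "frames_to_capture": 9},
    "landscape": {"white_balance": "daylight", "frames_to_capture": 3},
}

def classify_scene(frame):
    # Stand-in for a trained model; real code would run inference here.
    return "night" if frame.mean() < 0.25 else "landscape"

def configure_camera(frame):
    label = classify_scene(frame)        # what kind of scene is this?
    return label, SCENE_SETTINGS[label]  # pick parameters tuned for it

preview = np.random.rand(4, 4) * 0.2     # a dim stand-in preview frame
print(configure_camera(preview))         # -> ('night', {...})
```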

Slightly cropped, but otherwise unedited image captured with an iPhone XS Max.

You have some lights that are very bright (the neon signs) and some elements of the scene that are very dimly lit, such as the lights just under the bar. This is a classic problem for most cameras due to the limits of their sensors’ dynamic range (the ability to capture very bright and very dark areas in a single photo). However, the example photo above exhibits none of those common problems: all of the lighting has been captured much as it looked to the human eye. Magic? Nope, just the combination of that per-pixel machine learning algorithm and multiple image captures performed directly in the camera’s hardware. The machine learning model, trained to know what the right luminance value should be for each pixel in the image, chooses the closest match from the set of captured frames, which are then combined to form the final image. No magic needed, but the results sure look magical (IMHO).
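The per-pixel selection idea can be sketched in a few lines of numpy. To be clear, the well-exposedness heuristic below is a stand-in of my own; the point above is that the actual pipeline uses a learned model to make this per-pixel choice:

```python
# Per-pixel frame selection across an exposure bracket: for each pixel,
# take the value from whichever frame is best exposed (closest to mid-gray).
import numpy as np

def fuse_bracket(frames):
    """frames: list of float arrays (H, W), same scene at different exposures."""
    stack = np.stack(frames)             # (N, H, W)
    error = np.abs(stack - 0.5)          # distance from "well exposed"
    best = np.argmin(error, axis=0)      # per-pixel winning frame index
    return np.take_along_axis(stack, best[None], axis=0)[0]

# Stand-in data: three "exposures" of a scene at different brightness.
dark, mid, bright = (np.random.rand(4, 4) * s for s in (0.3, 0.6, 1.0))
print(fuse_bracket([dark, mid, bright]).shape)
```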

Machine Learning + Image Processing = Increased Capabilities!

So we have established that through the use of machine learning and advanced image processing we can capture the lighting of a scene in a way that more closely preserves how it appeared when the photo was taken. That’s fantastic, but it is not the whole story. The photo above hides another secret, one that is not obvious but almost as magical as getting the lighting right. It was taken on an iPhone XS Max, which has a tiny sensor, and yet it contains hardly any noise, despite showing clear signs of motion (the people). How is that possible? Poor lighting, motion, and yet we don’t see the typical grainy noise so common in smartphone photos? As you might already suspect, the answer is again a combination of machine learning and the image capture process. By combining several frames, the image processing system can remove significant amounts of noise from the areas of the photo that are not moving. For the areas where motion was detected (by comparing pixel data between the captured frames), a machine learning algorithm was used to remove the noise.
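Here is a simplified sketch of that static/moving split, assuming the frames are already aligned. The motion threshold is an arbitrary illustrative number, and where the real pipeline would hand moving pixels to a learned denoiser, this sketch simply keeps the reference frame:

```python
# Multi-frame noise reduction: average the pixels that stayed still,
# flag the pixels that moved between frames for separate handling.
import numpy as np

def denoise_burst(frames, motion_threshold=0.1):
    """frames: list of float arrays (H, W) aligned to the first frame."""
    stack = np.stack(frames)            # (N, H, W)
    reference = stack[0]
    # Motion mask: any frame that disagrees strongly with the reference.
    motion = np.abs(stack - reference).max(axis=0) > motion_threshold
    averaged = stack.mean(axis=0)       # noise falls roughly as 1/sqrt(N)
    # Static pixels get the clean temporal average; moving pixels keep
    # the reference (a learned denoiser would clean these in practice).
    return np.where(motion, reference, averaged)

burst = [np.random.rand(4, 4) * 0.05 + 0.5 for _ in range(8)]
print(denoise_burst(burst).mean())
```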

Think about how this technology might perform when used with a much larger image sensor!

So let’s recap. We saw how machine learning combined with multi-frame image capture allowed a smartphone with a 1/2.5" image sensor to photograph a poorly lit scene with high dynamic range and motion, and still produce nearly perfect lighting, virtually no noise, and a high level of retained detail. Take a brief moment to let that sink in. Then think about how this technology might perform with a much larger image sensor, presumably in a dedicated camera such as those mentioned earlier. It may take a few more years for the major camera companies to implement this type of technology, since supporting today’s machine learning algorithms means beefing up the computational capacity of their cameras, but just imagine the possibilities.

Another example iPhone XS Max photo showing how much dynamic range has been captured in the shadows. Edited in Snapseed to make it easier to see the shadow detail.