PAPERS 101 - How An AI Learned To See In The Dark?

Nishank Sharma · Published in the ML blog · Jun 1, 2018 · 6 min read

Hello, I am Nishank and welcome to PAPERS 101, a series where we will discuss new and exciting research going on in the field of Machine Learning and Artificial Intelligence!

I would like to thank Two Minute Papers and Károly Zsolnai-Fehér for introducing me to this paper via their YouTube channel.

Before we start, don’t forget to subscribe to our newsletter to never miss a story from clickbait!

What the heck?

In the age of smartphone cameras, low-light photography is a must-have. All flagship phones support it, but as you may have noticed, the results are often not that good.

This is because they mostly rely on traditional denoising and deblurring techniques, which can remove some noise but fail miserably in extreme conditions like near-total darkness.

This paper is a solution to that challenge.

Computer Vision is a field of Artificial Intelligence that revolves around taking visual input and either making sense of it or manipulating it in some way to produce a desired output. The paper we are concerned with here works on the second use case.

In their paper Learning to See in the Dark, researchers Chen Chen (UIUC), Qifeng Chen (Intel Labs), Jia Xu (Intel Labs) and Vladlen Koltun (Intel Labs) propose a model that can see in extremely low lighting conditions, almost comparable to darkness, using a new image-processing pipeline built around a Convolutional Neural Network.

The results are astounding!

If you take a picture using a camera with no low-light photography support (low ISO), it will look something like this:

If you click the same picture with a low-light camera like the ones in flagship smartphones, the result is something like what is shown below. Notice how grainy the picture is because of the brightness scaling and BM3D denoising applied to the raw data.

Scaling + BM3D Denoising

Now, the fully-convolutional network takes the first image and processes it to obtain the image below (yes, I am not kidding).

Image After Processing Through CNN

Wait, What!

The model here is an end-to-end trained fully-convolutional network, which makes use of a dataset of raw short-exposure night-time images with corresponding long-exposure reference images. This makes handling extreme scenarios like night photography far easier and more effective than traditional denoising and deblurring techniques.
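To make the "image in, image out" idea concrete, here is a minimal sketch of a fully-convolutional network in PyTorch. This is not the paper's architecture (the authors' default is a U-Net operating on packed raw sensor data); the TinyFCN name, the layer sizes and the 4-channel input below are placeholder assumptions purely for illustration.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """A toy stand-in for the paper's fully-convolutional network.

    The real model is a U-Net mapping packed raw sensor data to an RGB
    image; this only sketches the "dark image in, bright image out" idea.
    """
    def __init__(self, in_channels=4, out_channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Being fully convolutional, the network accepts any spatial size.
        return self.body(x)

model = TinyFCN()
dark_raw = torch.randn(1, 4, 256, 256)   # stand-in for packed raw sensor data
bright_rgb = model(dark_raw)             # shape: (1, 3, 256, 256)
```

Because there are no fully-connected layers, the same weights can be applied to images of any resolution, which is what makes this kind of model a natural fit for processing full raw sensor frames.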

How is the CNN trained?

The CNN is trained on two sets of images.

  1. A dimly lit (almost dark) scene, i.e. a short-exposure picture, as the input.
  2. A corresponding normally lit, long-exposure picture of the same scene as the target.

The neural net is trained on a dataset containing 5094 raw short-exposure images and their corresponding long-exposure images.

So if you want to train the network, you first have to click a photograph under normal lighting (long exposure), which is used as the target against which the network's output is compared to compute the error.

Next, you click a short-exposure photograph of the same scene so that it looks dark. This is given as the input to the network during training.

Together, these two photographs form an (input, target) pair, and the network is trained on many such pairs before being used on low-light test images.

FCN Pipeline
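If you wanted to organize such training data yourself, a paired dataset could look roughly like the sketch below. The LowLightPairs class and the idea of pre-loaded, index-aligned tensors are illustrative assumptions of mine, not the paper's actual data loader.

```python
from torch.utils.data import Dataset

class LowLightPairs(Dataset):
    """Pairs a short-exposure (dark) image with its long-exposure reference.

    `short_images` and `long_images` are assumed to be pre-loaded tensors,
    aligned so that index i in both lists shows the same scene.
    """
    def __init__(self, short_images, long_images):
        assert len(short_images) == len(long_images)
        self.short_images = short_images
        self.long_images = long_images

    def __len__(self):
        return len(self.short_images)

    def __getitem__(self, idx):
        # Input: the dark, short-exposure shot. Target: the bright reference.
        return self.short_images[idx], self.long_images[idx]
```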

The training was carried out using an L1 loss and the Adam optimizer, which gave results of exceptional quality and makes this one of the most effective low-light pipelines to date!
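A bare-bones version of such a training loop might look like the sketch below, reusing the TinyFCN and LowLightPairs toys from above. The learning rate, batch size and random tensors are illustrative assumptions; the actual paper trains for far longer on full-resolution raw images.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Reusing the toy TinyFCN and LowLightPairs sketches from above.
model = TinyFCN()
criterion = nn.L1Loss()                                    # L1 loss, as in the paper
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam optimizer

short_shots = [torch.randn(4, 256, 256) for _ in range(8)]  # fake dark inputs
long_shots = [torch.randn(3, 256, 256) for _ in range(8)]   # fake bright targets
loader = DataLoader(LowLightPairs(short_shots, long_shots), batch_size=2)

for epoch in range(2):
    for dark, bright in loader:
        optimizer.zero_grad()
        output = model(dark)                 # predicted bright image
        loss = criterion(output, bright)     # L1 distance to the reference
        loss.backward()
        optimizer.step()
```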

How good is it?

This model puts traditional deblurring and denoising methods to shame. Here is a comparison with traditional BM3D denoising -

Image Using Traditional Scaling
Image Using Scaling + BM3D Denoising
Image After Processing Through CNN

You can immediately notice the difference.

If we compare the performance of the CNN with flagship mobile cameras using different parameters like exposure and lighting, the results may surprise you!

Let’s consider a scenario where 8 candles are lit in a dark room, and we observe how a photograph of a mannequin changes across different cameras as the number of candles is halved each time.

8 Candles
4 Candles

Notice how the quality of the photo degrades on the iPhone X and Google Pixel 2 compared to the Sony a7S. This is because, on one hand, the Sony camera has a much better sensor and ISO range than both phones, and on the other hand, the phone cameras rely on traditional deblurring and denoising techniques to obtain a low-light photo from the raw data.

2 Candles
1 Candle

Now the photo is almost completely dark on both smartphone cameras, and it is clear that they have failed in extreme conditions like darkness.

However, the photograph from the Sony camera is still clear because, as you may have noticed, it has cleverly changed its exposure time from 0.8 seconds to 1.6 seconds, allowing more light to come in and hence giving a better photograph.

This is impractical for smartphone cameras, as a longer exposure would give a blurry image; it is really only workable on expensive, high-end cameras that have better lenses and far more efficient sensors.

But let’s see what happens if we decrease the exposure time to 1/30 of a second, i.e., how well the cameras perform with both extremely low light and a short exposure.

1 Candle - Low Exposure

As you can see, at this stage all the cameras have failed and we observe near-total darkness, for two different reasons:

  1. The mobile cameras fail because they use traditional deblurring and denoising techniques like BM3D, which break down when there is so little light that they have almost nothing to work with.
  2. The Sony camera fails because of the short exposure time: less light was allowed into the sensor, hence a dark image.

But hold on. Can we do something to get a better image with 1/30 of a second of exposure time in extremely low light (<0.1 lux)?

Be ready to get amazed as this paper has done exactly what we want!

If the raw sensor data from the scene above (the one with the darkest photos, 1 candle and low exposure) is fed into the fully-convolutional network, we get an output that looks like this!

What! Are You Kidding Me!

Surprised? I was too!

I am hoping this technology will be implemented in smartphone cameras really soon and you will start enjoying the extremely low light photography that machine learning has to offer!

And that’s the power of Machine Learning and Neural Networks.

It’s applications like these which motivate more and more people to study Machine Learning and Neural Networks. This is the precise reason I started clickbait and the reason PAPERS 101 came into existence!

Feel free to post in the comments what you think about this paper.

Suggestions and feedback about PAPERS 101, and clickbait in general, are also welcome in the comments.

Well, that’s it for this week and until next time!

Adios!
