Glasses removal with deep learning

giangpham
Published in SK Geek
Nov 5, 2019

Introduction

Glasson is my company’s first product: a virtual try-on solution for eyewear.
I’m one of the key members of the team, responsible for the try-on feature. The results are good: you can see how real the glasses look on your face. (Try it at https://glasson.sk-global.biz/en/index.html)

Problem

Some time after the product was released, many users started reporting the same issue.
They cannot see well when they take off their own glasses, and if they keep them on, they see two pairs of glasses on their face, which looks bad.
This means you have to take off your glasses to use the virtual mirror.

Approach

To find a solution, I started researching the topic of glasses removal.
Luckily, this problem isn’t new. Unfortunately, the existing results are not good, and there is no library available that I could integrate into the current system.

So my only option was to read research papers, understand them, and build my own solution. I found two ways to approach the problem:
- The first is to use classical image-processing algorithms to remove the glasses.
- The second is to use AI to remove the glasses.

The first approach uses a lot of math to detect and subtract the image pixels containing the glasses, and then synthesizes the occluded facial region through smoothing or inference. Even with ingenious algorithms, this approach still fails in many cases, such as different skin tones, shadows, magnification, and glare caused by the frames and lenses.
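
To make the "subtract, then smooth" idea concrete, here is a minimal NumPy sketch of the second half of that pipeline. It assumes the glasses mask is already known (detecting it is the hard part and is not shown), works on a grayscale image with values in [0, 1], and fills the masked pixels by repeatedly averaging their neighbours. The function name `inpaint_by_smoothing` and the toy image are illustrative only and are not taken from any of the papers.

```python
import numpy as np

def inpaint_by_smoothing(gray, glasses_mask, iterations=200):
    """Fill masked pixels by repeatedly averaging their four neighbours."""
    result = gray.astype(np.float64).copy()
    # Start the masked pixels from the mean of the visible pixels.
    result[glasses_mask] = result[~glasses_mask].mean()
    for _ in range(iterations):
        padded = np.pad(result, 1, mode="edge")
        neighbour_mean = (
            padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:]
        ) / 4.0
        # Only masked pixels change; visible skin is left untouched.
        result[glasses_mask] = neighbour_mean[glasses_mask]
    return result

# Toy example: a flat "face" with a dark bar standing in for a frame.
face = np.full((8, 8), 0.5)
mask = np.zeros((8, 8), dtype=bool)
mask[3:5, 1:7] = True
face[mask] = 0.0
restored = inpaint_by_smoothing(face, mask)
```

On a flat toy image this works perfectly, but on a real face the smoothing blurs away eyes, shadows, and skin texture, which is one reason this kind of approach breaks down in the cases listed above.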

I don’t think I have the ability to improve those algorithms or create new ones, so I chose the second option.

By creating and training a neural network, I let the AI find the algorithm for me. This way, I can avoid the math, algorithms, and image processing that I’m not good at. Also, a trained model can generalize across a much broader range of inputs.

Working with AI

To be honest, I’m not a professional in the AI field; I just have a little knowledge of AI and TensorFlow.

Data preparation
The first thing to do is to collect a lot of training data.
Luckily, I already have training data on hand because of Glasson.
I can create a lot of training pairs of the same face with and without glasses.
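
The post does not describe how the pairs are built, so the following is only a sketch of one plausible way: overlaying a rendered glasses layer on a bare-face photo, the way a virtual try-on tool would. It assumes float images in [0, 1], a face of shape (H, W, 3), and a glasses layer of shape (H, W, 4) with an alpha channel. The name `make_training_pair` is hypothetical.

```python
import numpy as np

def make_training_pair(bare_face, glasses_rgba):
    """Return (model input with glasses, training target without glasses)."""
    rgb = glasses_rgba[..., :3]
    alpha = glasses_rgba[..., 3:4]  # 1.0 where the frame covers the face
    with_glasses = alpha * rgb + (1.0 - alpha) * bare_face
    return with_glasses, bare_face

# Toy example: a uniform "face" and a thin black bar as the "frame".
bare_face = np.full((4, 6, 3), 0.8)
glasses = np.zeros((4, 6, 4))
glasses[1, 1:5, 3] = 1.0  # opaque black pixels on one row
model_input, target = make_training_pair(bare_face, glasses)
```

The appeal of synthetic pairs like this is that the target is pixel-aligned with the input by construction, which is hard to get from real photos of the same person with and without glasses.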

Building the neural network (model)
As far as I know, there are several kinds of models for image generation. The main ones are:
1. Encoder/decoder networks
2. Generative adversarial networks (GANs)

I really didn’t know which one to use.
So I picked both to try, and then selected the better one.
For the encoder/decoder, I tried the U-Net model.
For the GAN, I tried the Pix2Pix model.

After training both models on the same data (thanks to Google Colab, which saved me a lot of time), I saw that U-Net produced more promising results than Pix2Pix.
I chose to continue with U-Net.

The U-Net results were quite good, but the network also did a lot of damage to the rest of the original image.

Improvement

There are some points that can be improved:

Training data
To force the model to learn better, I added a lot of noise to the data, such as changes in lighting, color, and contrast, distortion, and Gaussian noise.
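
As a rough illustration of this kind of augmentation, the NumPy sketch below applies random brightness, contrast, and color changes plus Gaussian noise to a training pair (geometric distortion is left out for brevity). The ranges are made-up values, not the ones used for this project. Applying the same lighting change to both images and adding noise to the input only is a design assumption of this sketch, made so that the pair stays consistent.

```python
import numpy as np

def augment_pair(with_glasses, without_glasses, rng):
    """Apply one random photometric change to both images, plus input noise."""
    brightness = rng.uniform(-0.2, 0.2)
    contrast = rng.uniform(0.8, 1.2)
    color_gain = rng.uniform(0.9, 1.1, size=3)  # one gain per RGB channel

    def photometric(image):
        # Scale contrast around mid-gray, shift brightness, tint the colors.
        out = (image - 0.5) * contrast + 0.5 + brightness
        return out * color_gain

    noise = rng.normal(0.0, 0.02, size=with_glasses.shape)
    noisy_input = np.clip(photometric(with_glasses) + noise, 0.0, 1.0)
    target = np.clip(photometric(without_glasses), 0.0, 1.0)
    return noisy_input, target

rng = np.random.default_rng(0)
x = np.full((4, 4, 3), 0.5)  # stand-in for a face with glasses
y = np.full((4, 4, 3), 0.5)  # stand-in for the same face without glasses
aug_x, aug_y = augment_pair(x, y, rng)
```

Calling `augment_pair` with fresh random values on every training step means the model rarely sees exactly the same image twice.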

Neural network
I tried to make it deeper by adding more convolution layers, but the results were unstable. After some research, I learned that models built by simply stacking layers are hard to train. The ResNet structure is one way to avoid that, so I chose to apply it.
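
The core idea of a ResNet block is small enough to sketch in NumPy: the block outputs its input plus a learned correction, so a block whose weights are near zero simply passes the signal through. To keep the sketch short it uses per-pixel linear maps (equivalent to 1x1 convolutions) instead of the 3x3 convolutions a real block would use; `residual_block`, `w1`, and `w2` are illustrative names, not code from this project.

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = x + F(x), where F is two per-pixel linear layers with a ReLU."""
    hidden = np.maximum(x @ w1, 0.0)  # first layer followed by ReLU
    return x + hidden @ w2            # skip connection adds the input back

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 16))  # (height, width, channels)
zero = np.zeros((16, 16))

# With zero weights the block is exactly the identity mapping.
passthrough = residual_block(features, zero, zero)

# With small random weights it only nudges the input.
w1 = rng.normal(scale=0.01, size=(16, 16))
w2 = rng.normal(scale=0.01, size=(16, 16))
nudged = residual_block(features, w1, w2)
```

Because each block starts close to the identity, stacking many of them does not scramble the signal the way a plain stack of layers can, which is why deeper residual networks tend to train more stably.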

I retrained the network, and the results improved.
But some parts of the generated image were still distorted.

Looking around the internet, I found that attention is the current trend in image generation. So I read through the paper to understand how it works, and then decided to apply it. After applying attention, the results improved a lot: the glasses almost disappear from the face, and distortion is reduced and rarely seen.
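
The post does not say which attention variant was used, so the sketch below is an assumption: plain scaled dot-product self-attention over the spatial positions of a feature map, written in NumPy with random matrices standing in for trained weights. Every pixel position becomes a token that can pull in information from every other position, for example skin from the cheek when filling in the area under a frame. The names `spatial_self_attention`, `wq`, `wk`, and `wv` are illustrative.

```python
import numpy as np

def spatial_self_attention(features, wq, wk, wv):
    """Self-attention across all pixel positions of an (H, W, C) feature map."""
    height, width, channels = features.shape
    tokens = features.reshape(height * width, channels)
    queries, keys, values = tokens @ wq, tokens @ wk, tokens @ wv
    # Similarity between every pair of positions, scaled by the key size.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax per row
    attended = (weights @ values).reshape(height, width, channels)
    # Residual connection: attention refines the features, it does not replace them.
    return features + attended, weights

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 8))
wq = rng.normal(size=(8, 4))
wk = rng.normal(size=(8, 4))
wv = rng.normal(size=(8, 8))
out, weights = spatial_self_attention(features, wq, wk, wv)
```

Each row of `weights` sums to 1 and says how much one position draws from every other position. The attention matrix grows with the square of the number of positions, so in practice it is usually applied to low-resolution feature maps rather than to the full image.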

Result

You can try it at the demo link: https://ai.sk-global.biz/api/glasses_removal

Result of glasses removal using AI (this is my boss, not me :D)

The results are still not very good, and there are quite a few failure cases, so the model needs more improvement.

As a next step, I will try to modify the model and apply new methods if I find any. Maybe I will try again with a GAN.
