Computer Vision Application

Li Yin
Li’s Computer Vision Blogs
6 min read · Mar 4, 2018


Nearly every category of computer vision is benefiting from deep learning, and deep learning continues to create applications beyond imagination.

1 Traditional Computer Vision Tasks

The techniques used include the fully convolutional network (FCN, an image-to-image, end-to-end structure), dilated convolutional neural networks, residual neural networks, and generative adversarial networks (GANs).

1.1 Image Recognition

Image classification, object detection, semantic segmentation, and instance segmentation. These categories mostly use non-GAN networks.
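As a minimal, hedged sketch of the classification side, the snippet below runs a pretrained ResNet-50 from torchvision on a single image; the file name dog.jpg is a hypothetical placeholder, and any ImageNet-trained CNN would serve the same role.

```python
# Minimal image-classification sketch using a pretrained ResNet from torchvision.
# Assumes torch/torchvision are installed; "dog.jpg" is a hypothetical local image.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(pretrained=True)   # ImageNet weights
model.eval()

img = Image.open("dog.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)       # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)                  # shape: (1, 1000)
    pred = logits.argmax(dim=1).item()     # index of the predicted ImageNet class

print("Predicted ImageNet class index:", pred)
```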

1.2 Image Super-resolution

GANs are the main technique used for this task.

From Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
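Below is a rough sketch of an SRGAN-style generator, the network that upscales a low-resolution image. It is a simplified skeleton (residual blocks plus PixelShuffle upsampling) under my own assumptions, not the exact architecture from the paper, and the discriminator and training loop are omitted.

```python
# Simplified SRGAN-style generator: residual blocks followed by 4x PixelShuffle upsampling.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)            # skip connection, as in residual networks

class Generator(nn.Module):
    def __init__(self, num_blocks=8, channels=64):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, channels, 9, padding=4), nn.PReLU())
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
        # Each PixelShuffle stage doubles the spatial resolution; two stages give 4x upscaling.
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * 4, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(channels, channels * 4, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
        )
        self.tail = nn.Conv2d(channels, 3, 9, padding=4)

    def forward(self, lr):
        x = self.head(lr)
        x = self.blocks(x)
        return self.tail(self.upsample(x))

lr = torch.randn(1, 3, 24, 24)             # a fake low-resolution input
sr = Generator()(lr)                       # super-resolved output
print(sr.shape)                            # torch.Size([1, 3, 96, 96])
```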

1.3 Image Denoising

Personally, I think image denoising can benefit from super-resolution.

2 New Applications

Some of the following content comes from https://machinelearningmastery.com/inspirational-applications-deep-learning/

2.1 Automatic Colorization of Black and White Images

Image colorization is the problem of adding color to black and white photographs.

Traditionally this was done by hand with human effort because it is such a difficult task.

Deep learning can use the objects and their context within the photograph to color the image, much like a human operator might approach the problem.

A visual and highly impressive feat.

This capability leverages the high-quality, very large convolutional neural networks trained for ImageNet and co-opted for the problem of image colorization.

Generally the approach involves the use of very large convolutional neural networks and supervised layers that recreate the image with the addition of color.

Colorization of Black and White Photographs
Image taken from Richard Zhang, Phillip Isola and Alexei A. Efros.

Impressively, the same approach can be used to colorize still frames of black and white movies.
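As a hedged illustration of that idea, the sketch below shows a tiny CNN that takes the lightness (L) channel of a Lab image and regresses the two color (ab) channels. Real systems such as Zhang et al.'s are far deeper and predict a distribution over quantized colors; the network sizes here are my own placeholder choices.

```python
# Toy colorization sketch: predict the ab color channels from the grayscale L channel.
import torch
import torch.nn as nn

class ColorizationNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1),   # predict the a and b color channels
            nn.Tanh(),                        # ab values are bounded, so squash the output
        )

    def forward(self, L):
        return self.net(L)

L = torch.randn(1, 1, 224, 224)            # grayscale input (lightness channel)
ab = ColorizationNet()(L)                  # predicted color channels, shape (1, 2, 224, 224)
# The colorized result is the original L channel concatenated with the predicted ab channels.
```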


2.2 Automatic Handwriting Generation

This is a task where, given a corpus of handwriting examples, new handwriting is generated for a given word or phrase.

The handwriting is provided as a sequence of coordinates used by a pen when the handwriting samples were created. From this corpus the relationship between the pen movement and the letters is learned and new examples can be generated ad hoc.

What is fascinating is that different styles can be learned and then mimicked. I would love to see this work combined with some forensic handwriting analysis expertise.

Sample of Automatic Handwriting Generation
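A hedged sketch of the sequence model behind this: an LSTM reads a sequence of pen offsets (dx, dy, pen-lifted) and predicts the next one. The real model (Graves, 2013) uses a mixture-density output layer; a plain regression head stands in here, and the layer sizes are assumptions.

```python
# Toy handwriting-sequence model: predict the next pen movement from the previous ones.
import torch
import torch.nn as nn

class HandwritingRNN(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 3)   # predicts the next (dx, dy, pen_lifted) triple

    def forward(self, strokes, state=None):
        out, state = self.lstm(strokes, state)
        return self.head(out), state

model = HandwritingRNN()
strokes = torch.randn(1, 100, 3)           # one handwriting sample: 100 pen movements
pred, _ = model(strokes)                   # prediction of the next movement at every step
print(pred.shape)                          # torch.Size([1, 100, 3])
```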


2.3 Automatic Text Generation

This is an interesting task, where a corpus of text is learned and from this model new text is generated, word-by-word or character-by-character.

The model is capable of learning how to spell, punctuate, form sentences, and even capture the style of the text in the corpus.

Large recurrent neural networks are used to learn the relationship between items in the sequences of input strings and then generate text. More recently, LSTM recurrent neural networks have demonstrated great success on this problem using a character-based model, generating one character at a time.
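A minimal, hedged sketch of such a character-level LSTM, in the spirit of char-rnn, is shown below. The model is randomly initialized and untrained, so the sampled text is noise; the vocabulary and layer sizes are placeholder assumptions.

```python
# Character-level LSTM sketch: read characters, predict the next one, sample a sequence.
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, chars, state=None):
        x = self.embed(chars)
        out, state = self.lstm(x, state)
        return self.head(out), state       # logits over the next character at every position

vocab = list("abcdefghijklmnopqrstuvwxyz ")
model = CharRNN(len(vocab))
idx = torch.tensor([[0]])                  # start from the character 'a'
state = None
generated = []
for _ in range(20):                        # sample 20 characters, one at a time
    logits, state = model(idx, state)
    probs = torch.softmax(logits[:, -1], dim=-1)
    idx = torch.multinomial(probs, 1)      # sample the next character index
    generated.append(vocab[idx.item()])
print("".join(generated))
```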

Andrej Karpathy provides many examples in his popular blog post on the topic including:

  • Paul Graham essays
  • Shakespeare
  • Wikipedia articles (including the markup)
  • Algebraic Geometry (with LaTeX markup)
  • Linux Source Code
  • Baby Names

Automatic Text Generation Example of Shakespeare
Example taken from Andrej Karpathy blog post


2.4 Automatic Image Caption Generation

Automatic image captioning is the task where given an image the system must generate a caption that describes the contents of the image.

In 2014, there was an explosion of deep learning algorithms achieving very impressive results on this problem, leveraging the work from top models for object classification and object detection in photographs.

Once you can detect objects in photographs and generate labels for those objects, you can see that the next step is to turn those labels into a coherent sentence description.

This is one of those results that knocked my socks off and still does. Very impressive indeed.

Generally, the systems involve the use of very large convolutional neural networks for the object detection in the photographs and then a recurrent neural network like an LSTM to turn the labels into a coherent sentence.

Automatic Image Caption Generation
Sample taken from Andrej Karpathy, Li Fei-Fei

These techniques have also been expanded to automatically caption video.
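A hedged sketch of that encoder-decoder idea follows, assuming a torchvision ResNet-50 as the image encoder and a toy vocabulary of 1,000 word ids; real captioning systems add attention, beam search, and careful training that are omitted here.

```python
# Encoder-decoder captioning sketch: CNN image features condition an LSTM language model.
import torch
import torch.nn as nn
from torchvision import models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, hidden=512):
        super().__init__()
        cnn = models.resnet50(pretrained=True)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop the classifier layer
        self.project = nn.Linear(2048, hidden)       # map image features to the LSTM size
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, image, caption_tokens):
        feats = self.encoder(image).flatten(1)       # (batch, 2048)
        img_vec = self.project(feats).unsqueeze(1)   # (batch, 1, hidden)
        words = self.embed(caption_tokens)           # (batch, seq, hidden)
        # Feed the image vector as the first "word", then the caption so far.
        out, _ = self.lstm(torch.cat([img_vec, words], dim=1))
        return self.head(out)                        # logits over the next word at each step

image = torch.randn(1, 3, 224, 224)
tokens = torch.randint(0, 1000, (1, 8))              # a fake partial caption of 8 word ids
logits = CaptionModel(vocab_size=1000)(image, tokens)
print(logits.shape)                                  # torch.Size([1, 9, 1000])
```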


2.5 Automatic Game Playing

This is a task where a model learns how to play a computer game based only on the pixels on the screen.

This very difficult task is the domain of deep reinforcement learning models and is the breakthrough that DeepMind (now part of Google) is renowned for achieving.

This work was expanded and culminated in Google DeepMind’s AlphaGo, which beat a world champion at the game of Go.
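A hedged sketch of the DQN network used for Atari game playing: a convolutional network maps a stack of recent screen frames to one estimated value (Q) per joystick action. The layer sizes follow the DeepMind paper, but the replay buffer, training loop, and emulator are omitted.

```python
# DQN network sketch: screen pixels in, one Q-value per action out.
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, num_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),             # one Q-value per action
        )

    def forward(self, frames):
        return self.head(self.conv(frames))

frames = torch.randn(1, 4, 84, 84)     # four stacked 84x84 grayscale screen frames
q_values = DQN(num_actions=6)(frames)  # the agent picks the action with the highest Q-value
print(q_values.argmax(dim=1))
```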


2.6 Predict Pulse Rate from Videos

But some algorithms surpass humans. For example, could you tell a person’s pulse rate just by looking at a video of their face?

Probably not.

But magnifying subtle changes in skin color [5] reveals the blood flow:

That is impressive to me! http://people.csail.mit.edu/mrub...
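The underlying technique here is Eulerian video magnification rather than deep learning. Below is a simplified sketch under my own assumptions: it temporally band-pass filters each pixel around typical heart-rate frequencies and amplifies the result, while omitting the spatial pyramid decomposition used in the real method.

```python
# Simplified color-magnification sketch: amplify slow color oscillations caused by blood flow.
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_color(frames, fps, low_hz=0.8, high_hz=3.0, amplification=50.0):
    """frames: float array of shape (time, height, width, channels) with values in [0, 1]."""
    b, a = butter(2, [low_hz / (fps / 2), high_hz / (fps / 2)], btype="band")
    # Filter along the time axis: keep only oscillations in the heart-rate frequency band.
    pulse = filtfilt(b, a, frames, axis=0)
    return np.clip(frames + amplification * pulse, 0.0, 1.0)

fps = 30
fake_video = np.random.rand(300, 64, 64, 3)   # 10 seconds of fake 64x64 video
magnified = magnify_color(fake_video, fps)
print(magnified.shape)
```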

2.7 Image-to-Image Translation

Unsupervised Image-to-Image Translation Networks

2.8 High-Resolution Image Synthesis

From semantic segmentation to photo-realistic images

2.9 Face Aging with GANs

G. Antipov, M. Baccouche, and J.-L. Dugelay. Face Aging with Conditional Generative Adversarial Networks. arXiv preprint arXiv:1702.01983, 2017.

2.10 More Applications with GANs

Check out this site.
