Computer Vision Application
Nearly every category of computer vision is benefiting from deep learning, and deep learning keeps producing applications that were previously beyond imagination.
1 Traditional Computer Vision Task
The techniques used include the fully convolutional network (FCN) (an image-to-image, end-to-end structure), dilated convolutional neural networks, residual neural networks, and generative adversarial networks (GANs).
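To make "dilated convolution" a little more concrete, here is a minimal sketch in plain Python. It is a one-dimensional toy rather than any particular network layer, and the function name dilated_conv1d and the sample values are made up for illustration. The only idea it shows is that a dilation factor spaces out the kernel taps, so the same three weights cover a wider stretch of the input.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with `dilation - 1` skipped samples between taps."""
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        total = 0.0
        for k, weight in enumerate(kernel):
            total += weight * signal[start + k * dilation]
        out.append(total)
    return out


signal = [1, 2, 3, 4, 5, 6, 7]   # hypothetical input
kernel = [1, 0, -1]              # hypothetical 3-tap kernel

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output sees 3 samples
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output sees 5 samples
print(dense)
print(dilated)
```

With dilation=2 the kernel still has three weights, but each output looks at a window of five input samples, which is why dilation is a cheap way to grow the receptive field.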
1.1 Image recognition
Image classification, object detection, semantic segmentation, and instance segmentation. These categories mostly use non-GAN networks.
1.2 Image Super-resolution
Mostly approached with GANs.
1.3 Image Denoising
Personally, I think image denoising can benefit from super-resolution.
2 New Application
Some of the following content comes from https://machinelearningmastery.com/inspirational-applications-deep-learning/
2.1 Automatic Colorization of Black and White Images
Image colorization is the problem of adding color to black and white photographs.
Traditionally this was done by hand with human effort because it is such a difficult task.
Deep learning can use the objects and their context within the photograph to color the image, much like a human operator might approach the problem.
A visual and highly impressive feat.
This capability leverages the high-quality, very large convolutional neural networks trained for ImageNet and co-opted for the problem of image colorization.
Generally the approach involves the use of very large convolutional neural networks and supervised layers that recreate the image with the addition of color.
Colorization of Black and White Photographs
Image taken from Richard Zhang, Phillip Isola and Alexei A. Efros.
Impressively, the same approach can be used to colorize still frames of black and white movies.
Further Reading
Papers
- Deep Colorization [pdf], 2015
- Colorful Image Colorization [pdf] (website), 2016
- Learning Representations for Automatic Colorization [pdf] (website), 2016
- Image Colorization with Deep Convolutional Neural Networks [pdf], 2016
2.2 Automatic Handwriting Generation
This is a task where, given a corpus of handwriting examples, the model generates new handwriting for a given word or phrase.
The handwriting is provided as a sequence of coordinates used by a pen when the handwriting samples were created. From this corpus the relationship between the pen movement and the letters is learned and new examples can be generated ad hoc.
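As a rough illustration of what "a sequence of coordinates used by a pen" can look like as model input, the sketch below turns pen strokes into per-step offsets plus an end-of-stroke flag, and back again. The offset encoding, the helper names to_offsets and to_strokes, and the sample points are all assumptions made for this example; the text above only says that coordinates are recorded.

```python
def to_offsets(strokes):
    """Flatten strokes into (dx, dy, end_of_stroke) steps, starting from (0, 0)."""
    steps = []
    prev_x, prev_y = 0, 0
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            end = 1 if i == len(stroke) - 1 else 0  # pen lifts after this point
            steps.append((x - prev_x, y - prev_y, end))
            prev_x, prev_y = x, y
    return steps


def to_strokes(steps):
    """Inverse of to_offsets: rebuild absolute pen positions stroke by stroke."""
    strokes, current = [], []
    x, y = 0, 0
    for dx, dy, end in steps:
        x, y = x + dx, y + dy
        current.append((x, y))
        if end:
            strokes.append(current)
            current = []
    if current:  # trailing points without an end-of-stroke flag
        strokes.append(current)
    return strokes


# Two hypothetical strokes, each a list of (x, y) pen positions.
sample = [[(0, 0), (1, 2), (3, 3)], [(5, 0), (5, 4)]]
steps = to_offsets(sample)
print(steps)
```

A sequence model would then be trained to predict the next step from the previous ones, and generated steps can be turned back into drawable strokes with to_strokes.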
What is fascinating is that different styles can be learned and then mimicked. I would love to see this work combined with some forensic handwriting analysis expertise.
Sample of Automatic Handwriting Generation
Further Reading
Papers
2.3 Automatic Text Generation
This is an interesting task, where a corpus of text is learned and from this model new text is generated, word-by-word or character-by-character.
The model is capable of learning how to spell, punctuate, form sentences and even capture the style of the text in the corpus.
Large recurrent neural networks are used to learn the relationship between items in the sequences of input strings and then generate text. More recently, LSTM recurrent neural networks have demonstrated great success on this problem using a character-based model, generating one character at a time.
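The "one character at a time" loop can be sketched without any neural network at all. The example below swaps the LSTM for a much simpler stand-in, a fixed-order character Markov model, purely to show the generate-append-repeat structure; it is not how the models in the papers work internally. The function names, the tiny corpus, and the seed text are all made up for illustration.

```python
import random
from collections import defaultdict


def train_char_model(text, order=3):
    """Map each `order`-character context to the characters that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate(model, seed, length, rng):
    """Extend `seed` one character at a time; len(seed) must equal the model order."""
    order = len(seed)
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:  # context never seen in training: stop early
            break
        out += rng.choice(choices)
    return out


corpus = "to be or not to be that is the question "  # hypothetical toy corpus
training_text = corpus * 3
model = train_char_model(training_text, order=3)
sample = generate(model, "to ", 40, random.Random(0))
print(sample)
```

An LSTM replaces the lookup table with a learned hidden state, which lets it use much longer context than three characters, but the sampling loop around it has the same shape.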
Andrej Karpathy provides many examples in his popular blog post on the topic including:
- Paul Graham essays
- Shakespeare
- Wikipedia articles (including the markup)
- Algebraic Geometry (with LaTeX markup)
- Linux Source Code
- Baby Names
Automatic Text Generation Example of Shakespeare
Example taken from Andrej Karpathy blog post
Further Reading
- The Unreasonable Effectiveness of Recurrent Neural Networks
- Auto-Generating Clickbait With Recurrent Neural Networks
Papers
- Generating Text with Recurrent Neural Networks [pdf], 2011
- Generating Sequences With Recurrent Neural Networks [pdf], 2013
2.4 Automatic Image Caption Generation
Automatic image captioning is the task where, given an image, the system must generate a caption that describes the contents of the image.
In 2014, there was an explosion of deep learning algorithms achieving very impressive results on this problem, leveraging the work from top models for object classification and object detection in photographs.
Once you can detect objects in photographs and generate labels for those objects, you can see that the next step is to turn those labels into a coherent sentence description.
This is one of those results that knocked my socks off and still does. Very impressive indeed.
Generally, the systems involve the use of very large convolutional neural networks for the object detection in the photographs and then a recurrent neural network like an LSTM to turn the labels into a coherent sentence.
Automatic Image Caption Generation
Sample taken from Andrej Karpathy, Li Fei-Fei
These techniques have also been expanded to automatically caption video.
Further Reading
- A picture is worth a thousand (coherent) words: building a natural description of images
- Rapid Progress in Automatic Image Captioning
Papers
- Deep Visual-Semantic Alignments for Generating Image Descriptions [pdf] (and website), 2015
- Explain Images with Multimodal Recurrent Neural Networks [pdf], 2014
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description [pdf], 2014
- Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models [pdf], 2014
- Sequence to Sequence — Video to Text [pdf], 2015
2.5 Automatic Game Playing
This is a task where a model learns how to play a computer game based only on the pixels on the screen.
This very difficult task is the domain of deep reinforcement learning models and is the breakthrough that DeepMind (now part of Google) is renowned for achieving.
This work was expanded and culminated in Google DeepMind’s AlphaGo, which beat the world master at the game of Go.
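The learning rule underneath deep Q-learning is easier to see in its tabular form. The sketch below is a deliberately tiny stand-in: a made-up five-cell corridor instead of Atari pixels, and a lookup table instead of a deep network. The environment, constants, and function names are all assumptions for illustration; only the Q-learning update itself is the standard one.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Toy environment: reward 1 only when the rightmost cell is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes, rng):
    """Tabular Q-learning with a uniformly random behavior policy (off-policy)."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(len(ACTIONS))
            nxt, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q


q_table = train(500, random.Random(0))
# Greedy action index per non-terminal cell, read off the learned table.
greedy = [max(range(len(ACTIONS)), key=lambda a: q_table[s][a])
          for s in range(N_STATES - 1)]
print(greedy)
```

In the pixel-based setting the table q is replaced by a convolutional network that maps a screen image to one value per action, since there are far too many distinct screens to enumerate.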
Further Reading
- Deep Reinforcement Learning
- DeepMind YouTube Channel
- Deep Q Learning Demo
- DeepMind’s AI is an Atari gaming pro now
Papers
- Playing Atari with Deep Reinforcement Learning [pdf], 2013
- Human-level control through deep reinforcement learning, 2015
- Mastering the game of Go with deep neural networks and tree search, 2016
2.6 Predict Pulse Rate from Videos
Some algorithms even outperform humans. For example, could you tell his pulse rate by looking only at these images?
Probably not.
But magnifying subtle changes in color [5] reveals his blood flow:
That is impressive to me! http://people.csail.mit.edu/mrub...
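To give a feel for how a pulse could be read out of color changes, here is a heavily simplified sketch. It is not the method from the linked work: it assumes someone has already reduced each video frame to a single average green value over a patch of skin, and it uses a synthetic signal in place of real footage. The function name dominant_bpm, the frequency band, and every number below are assumptions made for illustration.

```python
import cmath
import math


def dominant_bpm(samples, fps, lo_hz=0.7, hi_hz=4.0):
    """Return the strongest frequency within [lo_hz, hi_hz], in beats per minute."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant brightness level
    best_hz, best_mag = 0.0, -1.0
    for k in range(1, n // 2):
        hz = k * fps / n
        if not lo_hz <= hz <= hi_hz:
            continue
        # Naive discrete Fourier transform coefficient for frequency bin k.
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_hz, best_mag = hz, abs(coeff)
    return best_hz * 60.0


# Synthetic stand-in for per-frame mean green intensity: a faint 1.2 Hz ripple
# (72 beats per minute) on top of a constant brightness of 120.
fps, seconds, true_hz = 30, 10, 1.2
green = [120.0 + 0.5 * math.sin(2 * math.pi * true_hz * t / fps)
         for t in range(fps * seconds)]
estimate = dominant_bpm(green, fps)
print(round(estimate, 1))
```

Real footage is far noisier than this, with motion, lighting changes, and compression artifacts, so the ripple has to be isolated much more carefully before any frequency estimate is trustworthy.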
2.7 Image-to-Image Translation
Unsupervised Image-to-Image Translation Networks
2.8 High-Resolution Image Synthesis
From semantic segmentation to photo-realistic images
2.9 Face Aging with GANs
G. Antipov, M. Baccouche, and J.-L. Dugelay. Face aging with conditional generative adversarial networks. arXiv preprint arXiv:1702.01983, 2017.
2.10 More applications with GANs
Check out this site.