I recently gave a talk for the Swift User Group at Lyft, covering some of the challenges that come up when trying to perform image recognition on a mobile device. The talk was intended as an update to a talk I did last year: convolutional neural networks, swift and iOS 11, so if you have an extra half hour you should watch that as well.
Below, I will loosely discuss the four main points.
Four Important Talking Points of Neural Networks
1. What to Study
2. Convolutional Neural Networks
3. Neural Network Architecture
4. Mobile Specific Architectures & Next Steps
1. What to Study (1:40)
This part covers some background on what you should study if you are interested in getting up to speed on neural networks in general. Specifically, I would suggest you do some math review, complete the deeplearning.ai course and then tackle the fast.ai course from this year. After that you will be up to speed on pytorch (my current favorite deep learning framework) and ready to tackle real-world problems. Kaggle is a great place to start!
2. Convolutional Neural Networks (7:20)
At the beginning, we did an overview of how convolutional neural networks operate by looking at how perceptrons (single node neural networks) are built out of a classic machine learning algorithm, linear regression. From there, we looked at some different neural network architectures to try and explain the theory behind feed forward neural networks, the most common type in image recognition.
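To make the step from linear regression to a perceptron concrete, here is a minimal sketch in plain Python. It is not code from the talk: the function names, the step activation and the hand-picked weights are all assumptions chosen for illustration. A perceptron is the linear regression formula (weighted sum plus bias) passed through an activation, and a feed forward network is just layers of these nodes evaluated in order.

```python
def linear_regression(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias: the classic linear model."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def step(z):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def perceptron(weights, bias, inputs, activation=step):
    """A single-node network: linear regression followed by an activation."""
    return activation(linear_regression(weights, bias, inputs))


def feed_forward(layers, inputs, activation=step):
    """Run inputs through layers in order; each layer is a list of (weights, bias) nodes."""
    values = inputs
    for layer in layers:
        values = [perceptron(w, b, values, activation) for w, b in layer]
    return values


pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hand-picked (assumed) weights that make one perceptron behave like AND.
and_outputs = [perceptron([1.0, 1.0], -1.5, [a, b]) for a, b in pairs]

# XOR needs two layers: a hidden layer (OR node, NAND node) feeding an AND node.
xor_layers = [
    [([1.0, 1.0], -0.5), ([-1.0, -1.0], 1.5)],  # hidden layer: OR, NAND
    [([1.0, 1.0], -1.5)],                       # output layer: AND
]
xor_outputs = [feed_forward(xor_layers, [a, b])[0] for a, b in pairs]

print(and_outputs)  # [0, 0, 0, 1]
print(xor_outputs)  # [0, 1, 1, 0]
```

In a real network the weights are learned rather than picked by hand, and the step function is replaced by a differentiable activation so that training can adjust them.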
After that we did a quick overview of how convolutions work and looked at how to combine 3x3 convolutions and 2x2 maxpool layers with a feed forward network to produce a network called vggnet, which produced state of the art results in 2014. Finally, we looked at modifying our vgg network slightly to produce Resnets, a solid state of the art architecture that forms the building block of a number of modern techniques.
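The building blocks described above can be sketched in a few lines of plain Python. This is an illustrative toy, not the talk's code and not how a framework implements these layers: it handles a single channel, uses nested lists instead of tensors, and every function name here is an assumption. It shows a 3x3 convolution, a 2x2 maxpool, and the skip connection that turns a vgg-style layer into a resnet-style one.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Clamp negative activations to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def maxpool2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


def zero_pad(image, pad=1):
    """Surround the image with zeros so a 3x3 convolution keeps its size."""
    width = len(image[0]) + 2 * pad
    top = [[0] * width for _ in range(pad)]
    bottom = [[0] * width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    return top + middle + bottom


def residual_block(image, kernel):
    """Resnet-style layer: add the input back onto the layer's output."""
    out = relu(conv2d(zero_pad(image), kernel))
    return [[x + y for x, y in zip(in_row, out_row)]
            for in_row, out_row in zip(image, out)]


# A 6x6 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = relu(conv2d(image, vertical_edge))  # 4x4, rows of [0, 3, 3, 0]
pooled = maxpool2x2(features)                  # 2x2
print(pooled)  # [[3, 3], [3, 3]]

# With an all-zero kernel the residual block passes its input through unchanged.
zero_kernel = [[0] * 3 for _ in range(3)]
print(residual_block(image, zero_kernel) == image)  # True
```

The last line hints at why the skip connection helps: a residual layer that has learned nothing still behaves like the identity, so stacking more of them does not have to make the network worse.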
If you’re interested in different applications of Resnets, see my presentation solving go for a discussion of how the AlphaGo family of engines works.
3. Mobile Image Recognition (19:30)
We skipped through the neural network architectures I demoed in last year’s presentation (feed-forward neural network, vggnet, inception, resnet, mobilenets, yolo) and then looked at three new general image recognition architectures that I think you should add to your toolbox: densenet, darknet and unet. Densenet is a network conceptually similar to resnet that achieves lower error rates, but at the cost of a more expensive multiply operator. Darknet is a tiny network, which I think is valuable conceptually. U-net is more oriented towards image segmentation, but as GPUs get more and more RAM in the next few years I feel architectures like U-net, which can easily utilize extra variables, are going to become more common.
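As a rough illustration of how densenet relates to resnet, the sketch below contrasts the two ways of combining a layer's output with its input: resnet adds them, while densenet concatenates them along the channel axis. This is a simplification under stated assumptions, not the talk's code: channels are single numbers rather than feature maps, and toy_layer is a hypothetical stand-in for a real convolutional layer.

```python
def toy_layer(channels, out_channels):
    """Hypothetical stand-in for a conv layer: every output channel is the input mean."""
    mean = sum(channels) / len(channels)
    return [mean] * out_channels


def resnet_step(channels):
    """Resnet: add the layer output to the input, so the channel count stays fixed."""
    out = toy_layer(channels, len(channels))
    return [x + y for x, y in zip(channels, out)]


def densenet_step(channels, growth_rate):
    """Densenet: concatenate the layer output onto the input, so channels accumulate."""
    return channels + toy_layer(channels, growth_rate)


x = [1.0, 2.0, 3.0, 6.0]

res = x
dense = x
for _ in range(3):
    res = resnet_step(res)
    dense = densenet_step(dense, growth_rate=2)

print(len(res))    # 4
print(len(dense))  # 10
```

Because every densenet layer sees all earlier outputs, the input to each later layer keeps widening, which is one way to read the extra cost mentioned above.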
4. Mobile Specific Architectures & Next Steps (22:15)
We examined some mobile specific architectures, specifically squeezenet, mobilenet, shufflenet, senet, mobilenet v2 and finally nasnet mobile. Squeezenet is a network architecture from 2016 that achieved impressive results in only 5MB of RAM. Shufflenet and Senet are interesting new ideas in the field. Mobilenet v2 is a solid update to the v1 architecture. Nasnet and other computer-generated architectures (such as Amoebanet) are only going to become more prevalent.
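To give a feel for why these networks fit on a phone, here is a small back-of-the-envelope sketch of the depthwise separable convolution that mobilenet is built on. It only counts weights (biases are ignored), and the layer size used in the example is an arbitrary assumption rather than a figure from the talk.

```python
def standard_conv_weights(kernel_size, in_channels, out_channels):
    """A standard convolution learns one k x k filter per (input, output) channel pair."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_weights(kernel_size, in_channels, out_channels):
    """Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1x1 convolution that mixes channels."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_weights(3, 128, 256)
separable = depthwise_separable_weights(3, 128, 256)

print(standard)   # 294912
print(separable)  # 33920
print(round(standard / separable, 1))  # 8.7
```

For this layer the separable version needs roughly an eighth of the weights, which translates into a smaller model file and fewer multiplications per image.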
At the end (30:30), we discussed the upcoming pytorch conference and 1.0 release. Then we took a look at tensorflow on swift, which I am going to do a talk about next month! [UPDATE: read/watch here: tensorflow and swift]