Deep Learning and Its Importance in Computer Vision

Deep learning is a modern extension of the classical neural network technique. One can view deep learning as a neural network with many layers. Rapid advances in modern computing make it feasible to build neural networks with a large number of layers, which was infeasible for classical neural networks. As a result, deep learning can capture more complex non-linear patterns in the data. Another reason for the recent popularity of deep learning is the growth in the volume and complexity of data. In addition, much of deep learning is applied to image analysis, which makes sense given that images are naturally complex and high in volume.

Unlike the classical neural network, deep learning uses more hidden layers so that the algorithms can handle complex data with various structures. In medical applications, the commonly used deep learning algorithms include the convolutional neural network (CNN), recurrent neural network, self-organizing maps, Boltzmann machines, and autoencoders.

The CNN was developed in response to the poor performance of classical ML algorithms on high-dimensional data, that is, data with a large number of traits. Traditionally, ML algorithms are designed to analyse data in which the number of traits is small. Image data, however, are naturally high-dimensional because each image normally contains thousands of pixels as traits. One solution is to perform dimension reduction: first pre-select a subset of pixels as features, and then run the ML algorithms on the resulting lower-dimensional features. However, heuristic feature selection procedures may lose information in the images. Unsupervised learning techniques such as PCA or clustering can be used instead for data-driven dimension reduction.
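To make the scale of the problem concrete, here is a minimal sketch of how quickly pixel counts grow and what a heuristic pre-selection step might look like. The 64x64 image is synthetic, and the "keep every 16th pixel" rule and the function names are assumptions chosen for illustration, not a recommended procedure.

```python
def flatten(image):
    """Turn a 2-D grid of pixel values into one flat feature vector."""
    return [pixel for row in image for pixel in row]


def select_every_kth(features, k):
    """Heuristic feature selection (hypothetical rule): keep every k-th pixel."""
    return features[::k]


# A toy 64x64 grayscale "image" with synthetic pixel values in [0, 1].
height, width = 64, 64
image = [
    [((r * width + c) % 256) / 255.0 for c in range(width)]
    for r in range(height)
]

features = flatten(image)
reduced = select_every_kth(features, k=16)

print(len(features))  # 4096 traits for even a small 64x64 image
print(len(reduced))   # 256 traits after keeping every 16th pixel
```

The rule discards 15 of every 16 pixels without looking at their content, which is exactly the kind of information loss that motivates data-driven alternatives.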

The CNN was first proposed and advocated for high-dimensional image analysis by Yann LeCun, Yoshua Bengio et al. The inputs to a CNN are the properly normalised pixel values of the images. The CNN then transforms these pixel values by alternating between weighting in the convolution layers and sampling in the subsampling layers. The final output is a recursive function of the weighted input values. The weights are trained to minimise the average error between the outcomes and the predictions. Implementations of the CNN are included in popular software packages such as Caffe from Berkeley AI Research, CNTK from Microsoft and TensorFlow from Google.
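The alternation of weighting and sampling can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the kernel weights are fixed by hand rather than trained, max pooling stands in for the subsampling step (one common choice among several), and the function names are made up for this example.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and take
    the weighted sum of the pixels under it at each position.
    As in most deep learning libraries, the kernel is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Subsample by keeping the largest value in each non-overlapping
    size x size block (leftover rows and columns are dropped)."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 6x6 normalised image: dark left half (0.0), bright right half (1.0).
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]

# Hand-picked weights that respond to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)  # 5x5 map, non-zero only at the edge
pooled = max_pool(feature_map, size=2)   # 2x2 summary of the feature map

print(feature_map[0])  # [0.0, 0.0, 2.0, 0.0, 0.0]
print(pooled)          # [[0.0, 2.0], [0.0, 2.0]]
```

In a real CNN the kernel weights would be learned from data, a non-linearity would usually follow each convolution, and several such convolution and subsampling stages would be stacked before the final prediction.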

Recently, I used a CNN to classify spatial imagery from the National Center for Disease Control (NCDC) as either inhabited or abandoned, with the aim of making their disease monitoring and control work more effective. I initially achieved an accuracy of 69%, and after several rounds of tuning I raised it to 75%. That is not excellent, but with more training data available I intend to push it as high as 92%.

The CNN has also been applied in medicine to assist disease diagnosis. It has been used to diagnose congenital cataract disease by learning from ocular images, yielding over 90% accuracy on diagnosis and treatment suggestion. The CNN has also been used to identify skin cancer from clinical images. The proportions of correctly predicted malignant lesions (i.e., sensitivity) and benign lesions (i.e., specificity) are both over 90%, which indicates the strong performance of the CNN. Gulshan et al. also applied the CNN to detect referable diabetic retinopathy from retinal fundus photographs. The sensitivity and specificity of the algorithm are both over 90%, which demonstrates the effectiveness of the technique in diagnosing diabetic retinopathy. It is worth mentioning that in all these applications, the performance of the CNN is competitive with that of experienced physicians in classifying both normal and disease cases.
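For readers less familiar with the two metrics, the sketch below shows how sensitivity and specificity are computed from binary labels. The labels are invented purely for illustration and do not come from any of the studies mentioned. The function name is likewise an assumption.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels,
    where 1 means disease (positive) and 0 means normal (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    sensitivity = tp / (tp + fn)  # share of true positives that were caught
    specificity = tn / (tn + fp)  # share of true negatives that were cleared
    return sensitivity, specificity


# Made-up example: 4 diseased cases followed by 6 normal cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Reporting both numbers matters because a classifier can score high on overall accuracy while still missing many disease cases when normal cases dominate the data.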