Published in MistyWest

Machine Learning I — Breaking down handwriting with ML

by Justin Girard

Classification of Handwritten Numbers

When trying to get a grasp on how machine learning (ML) works, it is often useful to experiment with existing ML problems. One popular starting point is the MNIST database, a collection of image scans of handwritten digits written by American Census Bureau employees and high school students. The MNIST TensorFlow demo shows an introductory approach to detecting handwritten digits with 91% accuracy. The general problem is this: given an image of a digit, we would like to predict which number from 0–9 it represents.

Image source: http://www.kdnuggets.com/

A simple sequence like the one above may be interpreted as “5041” by a well-trained classifier. In the TensorFlow example, however, the data is labelled: each training digit has a corresponding “value” of 0–9 assigned, so the classifier, in this case a softmax regressor, is told what output is expected. This is called supervised learning.
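To make the supervised setup concrete, here is a minimal sketch of such a softmax regressor using TensorFlow’s Keras API. This is not the demo’s exact code; the optimizer and epoch count are our own illustrative choices.

```python
import tensorflow as tf

# Load the labelled MNIST data: 28x28 grayscale digit images with labels 0-9.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

# A softmax regressor: a single dense layer mapping 784 pixels to 10 classes.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)  # lands in the same ~91% range quoted above
```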

Principal Component Analysis of the MNIST data set

The contrasting domain is unsupervised learning. This problem is much more difficult, and in the MNIST space it can be described as detecting how many kinds of symbols there are. In a human sense, this would be akin to guessing all the kinds of symbols in an alien language. Thus, our unsupervised goal is to have our system discover the different classes of digits (0–9) by itself!

We decided that a two-step approach, dimensionality reduction followed by clustering, would be appropriate to accomplish this. Discussed here is the first step: running the MNIST data through a PCA algorithm.

PCA Introduction

PCA (Principal Component Analysis) is a process of reducing the dimensions of a set of data while trying to retain the statistically significant information in that set. For example, if we had a 2-dimensional set of data (x, y), we would try to find some new axis onto which the data can be projected while preserving most of its original structure. Broadly, this is known as dimensionality reduction, but in another sense projecting data onto a new axis can also lead to a more separable set of data. (We will ignore the extremely useful kernel trick here.)

http://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues/140579#140579
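As a toy sketch of that 2-dimensional case (our own example, assuming scikit-learn is available), PCA recovers the single axis along which a correlated cloud of points varies most:

```python
import numpy as np
from sklearn.decomposition import PCA

# A correlated 2-D cloud: y is roughly half of x, plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=500)])

# Project onto the one new axis that preserves the most variance.
pca = PCA(n_components=1)
projected = pca.fit_transform(data)

print(pca.components_)                # direction of the new axis
print(pca.explained_variance_ratio_)  # close to 1: one axis keeps the signal
```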

PCA Implementation

There are many tutorials on how to apply PCA to MNIST, but it is worth discussing what exactly the discovered eigenvectors represent. In essence, the eigenvectors are ordered from left to right by how much of the dataset’s variance they capture. The first is a shape that looks something like a 0 and distinguishes many of the characters (0–9); the second appears to model only the top and bottom of a seven or a two. In a real sense, a linear combination of the first four eigenvectors can reconstruct a recognizable version of any digit. The last selling point of PCA is that the later vectors, with smaller eigenvalues, may model only noise or random perturbations.

http://www.cs.ox.ac.uk/people/varun.kanade/teaching/ML-HT2016/practicals/practical6.pdf
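A minimal sketch of this step, assuming scikit-learn for the PCA itself (keeping 50 components is an arbitrary illustrative choice):

```python
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

# Flatten each 28x28 MNIST image into a 784-dimensional vector.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x = x_train.reshape(len(x_train), -1).astype(np.float32) / 255.0

# Components come back ordered by how much variance they explain.
pca = PCA(n_components=50)
reduced = pca.fit_transform(x)

# Each row of components_ is an eigenvector; reshaped to 28x28 it is one of
# the "eigen-digit" images discussed above.
eigen_digits = pca.components_.reshape(-1, 28, 28)
print(pca.explained_variance_ratio_[:4])  # the first few components dominate
```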

Clustering Implementation

After dimensionality reduction, we may theorize that the data, at least along the first principal components, is separable. In the visualization below, each color is a single digit, and we can see that in this projection the number one (yellow) is likely to fall in a different cluster than the zero (red).

Image source: http://colah.github.io/posts/2014-10-Visualizing-MNIST/
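A minimal sketch of this clustering step, again assuming scikit-learn. We use k-means with k=10 because we suspect ten digit classes; in a strictly unsupervised setting, even that count would have to be discovered (for example by comparing cluster quality across different k):

```python
import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x = x_train.reshape(len(x_train), -1).astype(np.float32) / 255.0

# Step 1: reduce to the leading principal components.
reduced = PCA(n_components=50).fit_transform(x)

# Step 2: cluster in the reduced space. The labels y_train play no part in
# training; we use them below only to inspect what each cluster captured.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(reduced)

# Count which true digits landed in cluster 0.
print(np.bincount(y_train[kmeans.labels_ == 0], minlength=10))
```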

Overall, there are many approaches to applying unsupervised and supervised learning to the MNIST dataset, and in this way it is possible to discover new and interesting features in existing data. We recommend jumping into some of the introductory blogs linked below.

[0] TensorFlow MNIST
[1] Kernel Trick
[2] Visualizing MNIST
[3] PCA Tutorial
[4] PCA in Tensorflow
