Machine Learning for dummies 101
Can you explain to me how a neural network works?
Sanity check — Do you understand all the words in this image? Can you add a few more words to this soup?
- If YES, then you won’t need to read this post.
- If NO, please continue.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
In the movie The Imitation Game, Alan Turing realizes that the only way he can break a cipher generated by a machine is with another machine.
And some scientists realized that the only way we humans can make a computer do certain tasks is by imitating nature, in this case imitating biological neurons.
What can we do with ML?
- Your bank app or ATM can read the amounts on your checks.
- This code can detect lanes on the street.
- This code can transfer artistic styles to photos (“Neural Style”).
- This code can search for a face in a photo.
All of these tasks and more are possible thanks to machine learning/neural networks... which, in my opinion, is a new programming paradigm.
How does a neural network learn?
A calculus trick in which you don’t have to search through all the possible numbers to find the correct solution; you take a “shortcut” to the solution by doing what is called back propagation.
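A minimal sketch of that “shortcut” idea: instead of trying every possible weight, gradient descent follows the slope of the error downhill. The target value, learning rate, and error function here are all made up for illustration.

```python
def error(w):
    # Pretend the "correct" weight is 3.0; error is the squared distance.
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the error with respect to w (basic calculus).
    return 2 * (w - 3.0)

w = 0.0              # start with a guess
learning_rate = 0.1
for step in range(100):
    w = w - learning_rate * gradient(w)  # take a small step downhill

print(round(w, 3))   # ends up very close to 3.0
```

Back propagation is this same idea applied layer by layer through a network, using the chain rule to get the gradient for every weight.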
Why do I see different terminology in different guides?
I’m not sure, but machine learning is a science that joins multiple disciplines. So I guess that’s the real reason behind it; the wording will drive you a little bit crazy!
- Transfer function = Activation Function, Squash Function
- Learning Rate = ETA, alpha, epsilon
- Neurons = Nodes
- Summation Junction = Net Input, neuron activation, pre-activation, dot product, inner product, X = W * I
- Observation = Record, Sample, Instances
- Features = attributes, measurements, dimensions
- Loss function = cost function, objective function, error function
- Class labels = targets
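The “Summation Junction = dot product, X = W * I” row above fits in a couple of NumPy lines. The numbers are arbitrary; only the operation matters.

```python
import numpy as np

# The "net input" / "summation junction" is just a dot product:
# each input is multiplied by its weight and the products are summed.
I = np.array([0.5, 0.3, 0.2])   # inputs (one observation, 3 features)
W = np.array([0.9, 0.1, 0.4])   # one neuron's weights
X = W @ I                       # X = W * I, the pre-activation
print(round(float(X), 2))       # 0.9*0.5 + 0.1*0.3 + 0.4*0.2 = 0.56
```

Whatever name a guide uses (net input, pre-activation, inner product), it is this one number per neuron.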
Be careful, it’s easy to get lost when reading different guides from different authors.
You say that ML is a mixture of multiple disciplines. Which ones?
Statistics, multivariable calculus, linear algebra, programming, and data science.
Why MNIST? (Modified NIST data-set)
MNIST is like the “Hello World!” of programming: a well-understood data-set that you can always use to run experiments and play with.
Call yourself a mechanic, and you’ll be expected to know how to change a car battery. Call yourself a data scientist, and you’ll be expected to know how to work with the MNIST data-set.
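To start playing right away, scikit-learn bundles a small 8x8 cousin of MNIST that needs no download; the full 70,000-image MNIST can be fetched with `fetch_openml("mnist_784")`. The network size and iteration count below are arbitrary choices for a quick experiment.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Load the bundled 8x8 digits and scale pixel values into [0, 1].
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data / 16.0, digits.target, random_state=0)

# A small neural network: one hidden layer of 50 neurons.
clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # typically well above 0.9
```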
What’s all the hype with the GPU?
Big ANNs take time to train, and the GPU cuts that time. You don’t want to wait 7 days to get a result, right? You want it now! So Google developed their own hardware (the Tensor Processing Unit) for this purpose. The metric they use is processing power per watt. Right now they won’t let you train your model on this hardware in the cloud, but I suspect that will come soon.
- scikit-learn won’t give you support for GPUs.
- TensorFlow will give you support for multiple GPUs, not just one.
Image representation of a neural network?
Each link has a “synaptic weight” that carries a number, usually from 0.01 to 0.99, and the goal is to search for the numbers that will give you the correct answer at the output layer with respect to the input layer.
- The first layer doesn’t carry an activation function, but the hidden and final layers will have one.
- The output layer carries a loss function.
- You can have multiple hidden layers.
- The bias is only for the input and hidden layers, not for the output layer.
- Some tasks don’t need a bias.
- The job of the activation function is to generate a non-linear boundary in the decision map.
- The value of the bias is a constant, while the values of the links from the bias are the ones that change with training.
- In a real scenario NOT all the links will be used by the ANN; some links will become zero, meaning there’s really no active link between one neuron and the other, even though we represent the link in the graphic above.
- In the case of the sigmoid, the output will always be between 0 and 1.
- The rule of thumb is to always start with the ReLU activation function.
- Yes, you can mix activation functions in an ANN.
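The two activation functions named above are small formulas; here is a quick sketch of both, with sample values chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    # Squashes any number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # 0 for negative inputs, the input itself otherwise.
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # values strictly between 0 and 1
print(relu(x))      # [0. 0. 2.]
```

Both bend the straight line of the dot product into a curve, which is what lets the network draw non-linear decision boundaries.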
w_ih = weights from input to hidden.
w_ho = weights from hidden to output.
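Using that naming, a full forward pass through a tiny network is just two dot products and two activations. The layer sizes, the random weight range, and the sigmoid choice here are arbitrary, picked only to match the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 3 inputs, 4 hidden neurons, 2 outputs.
w_ih = rng.uniform(0.01, 0.99, size=(4, 3))  # weights from input to hidden
w_ho = rng.uniform(0.01, 0.99, size=(2, 4))  # weights from hidden to output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([0.9, 0.1, 0.8])
hidden = sigmoid(w_ih @ inputs)    # hidden layer activations
outputs = sigmoid(w_ho @ hidden)   # final outputs, each between 0 and 1
print(outputs.shape)               # (2,)
```

Training is then the search for the values inside `w_ih` and `w_ho` that make `outputs` match the correct answers.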
So get ready: download the Anaconda environment for your system, install scikit-learn, and get a little bit familiar with Jupyter. In the next post we will do some programming with real numbers.