An Absolute Guide to Take Off in Machine Learning


Let’s begin with a scenic memory, where everything feels so beautiful.

Shameless plug: We are a machine learning data annotation platform that makes it super easy for you to build ML datasets. Just upload data, invite your team, and build datasets super quickly.

The buzzwords today are Machine Learning and Artificial Intelligence. Alright. Agreed! But why do we all chase these fields even though most of us have limited knowledge of them, and simply try to use pre-existing models to thrive in such an environment?

Let’s break this pattern and enter the pure world of Machine Learning, where you can actually think through and perceive algorithms, and understand the elegance behind the subject. In this blog, I sketch out a path to learning Machine Learning in a holistic and efficient way.

Let’s take off!

Let’s get straight some of the terminology we encounter in the ocean of Machine Learning.

  1. Cost function: The cost function, also known as the error function or objective function, is, intuitively put, the function that measures the deviation between your ML model and the model of reality, that is, the model implied by the original data. (See the sketch just after this list for a concrete example.)
  2. Classification: Classification is concerned with building models that separate data into distinct classes. These models are built by feeding in training data whose classes are pre-labeled (supervised learning) so that the algorithm has something to learn from. Caution: labeling datasets consumes huge amounts of time. That’s where we come in to help you out! We provide solutions to do data annotation super easily. Check out: dataturks.com
  3. Regression: Regression is closely related to classification, except that the output is no longer a set of distinct classes but a continuous value. Thus, it is used to predict continuous data, for example predicting the height of future generations given the heights of past ones. (Check the history of regression: you shall understand the relevance of this example.)
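
Cost functions are easiest to see in code. Here is a minimal sketch, assuming a squared-error cost and made-up predictions, using plain numpy:

```python
import numpy as np

def mse_cost(y_pred, y_true):
    """Mean squared error: the average squared deviation between
    the model's predictions and the original data."""
    return np.mean((y_pred - y_true) ** 2)

# hypothetical predictions vs. ground truth
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.3])
print(mse_cost(y_pred, y_true))  # ~0.037
```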

Beyond Regression

Whenever we look at any online course, it takes off with linear regression, a concept most of us already know: first the equation of a line, and then the gradual fitting of the best-fit line. In machine learning, this algorithm is used to predict future results given the feature vectors, x.
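
As a rough illustration (with synthetic data, not from the post), fitting a best-fit line and then predicting from a new feature value can look like this in numpy:

```python
import numpy as np

# synthetic data: y is roughly 2x + 1 plus some noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.shape)

# fit a degree-1 polynomial, i.e. the best-fit line
slope, intercept = np.polyfit(x, y, deg=1)

# use the fitted line to predict for a new feature value x
x_new = 12.0
print(slope * x_new + intercept)  # close to 2*12 + 1 = 25
```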

Visualizing the squared cost function.

So, why is the cost function a squared cost function? Why not an absolute cost function? There are plenty of reasons, but when we derive it mathematically we come across the concept of exponential families under generalized linear models, which generalize the notion of loss functions for any given model; the squared loss is exactly what falls out when the noise in the data is assumed to be Gaussian, and the Gaussian is a member of the exponential family.
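
To get a feel for the difference, here is a toy comparison of the two losses on the same (made-up) residuals; the squared cost punishes large deviations far more heavily:

```python
import numpy as np

residuals = np.array([0.1, 0.5, 3.0])   # hypothetical prediction errors

squared_cost = np.mean(residuals ** 2)      # the 3.0 error contributes 9.0
absolute_cost = np.mean(np.abs(residuals))  # the 3.0 error contributes only 3.0

print(squared_cost, absolute_cost)  # ~3.09 vs 1.2
```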

Non-Linearity

Then we head towards logistic regression, a classification algorithm. In logistic regression there are two main concepts of importance: the loss function and the softmax function (the multi-class generalization of the sigmoid). What is the significance of the loss function, and why do we need softmax? What are its applications, and what are the nuances of converting scores into probabilities? I will let the reader venture into these areas, as they are simple and interesting.
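
Since the post raises the question of converting scores into probabilities, here is a minimal, numerically stable softmax sketch in plain numpy (the example scores are arbitrary):

```python
import numpy as np

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # shift by the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.66, 0.24, 0.10]
```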

Predictive vs. Generative

Now the route leads towards generalized linear models and generative models. There are two kinds of models: predictive (more commonly called discriminative) and generative. In discriminative models, during training we directly tune the mapping from inputs to outputs, for example regression. In generative models, on the other hand, we model the probability of the feature vectors occurring given their output class. I have kept these descriptions general, as this is introductory material.
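
As a sketch only, one way to see the split in code is to contrast scikit-learn's LogisticRegression (discriminative: it models p(y | x) directly) with GaussianNB (generative: it models p(x | y) and p(y)); both models and the bundled iris dataset are just stand-ins chosen for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression  # discriminative: models p(y | x)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB            # generative: models p(x | y) and p(y)

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LogisticRegression(max_iter=1000), GaussianNB()):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```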

Support Vector Machines

Finally, we enter the realm of SVMs, which are intuitively simple, yet the math behind them can give even the best of the best a challenging time. After all, this is an algorithm that, once upon a time during the rise of neural networks, competed head-to-head with them. Yes, it is that good.

SVMs are machine learning algorithms that try to fit decision boundaries in such a way that the degree of confidence is as high as possible. By degree of confidence I mean the certainty with which a data point being classified belongs to its class; equivalently, the distance of the data points from the decision boundary. This distance is termed the margin.

We have been implementing linear classifiers. So what do we do when our data is not linearly separable in its current dimension? (If you stopped and thought about dimensions, then you are right.) Because the data is not linearly separable in its current dimension, we use kernels, computationally efficient “dimension uplifters” that implicitly project the data into a higher-dimensional space (without ever computing those coordinates explicitly), where it can become linearly separable.
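
As a small sketch of the kernel idea with scikit-learn, using the make_circles toy dataset, which is not linearly separable in two dimensions:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# two concentric circles: no straight line separates them in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)   # the RBF kernel implicitly lifts the data

print("linear kernel accuracy:", linear_svm.score(X, y))  # roughly chance level
print("rbf kernel accuracy:", rbf_svm.score(X, y))        # near perfect
```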

Linear classification in a higher dimension.

These concepts are the absolute basics of Machine Learning. The deeper you go, the more you realize how much there is still to learn.

Neural Networks

The path from here leads towards artificial neural networks, where we learn about single-layer feed-forward networks, multi-layer feed-forward networks (the multi-layer perceptron), back-propagation, and the loss function. And finally, the realm of DEEP LEARNING!
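
To preview what a feed-forward pass looks like, here is a toy two-layer network in plain numpy with random, untrained weights (back-propagation would be the step that adjusts these weights against the loss); the layer sizes and input are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

# a toy feed-forward network: 3 inputs -> 4 hidden units -> 2 outputs
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

x = np.array([0.5, -1.0, 2.0])   # a single input feature vector
hidden = relu(x @ W1 + b1)       # first layer: affine transform + non-linearity
output = hidden @ W2 + b2        # output scores, fed into a loss (and softmax)
print(output)
```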

A typical neural network

Let’s come to the coding part, where the best holistic approach is to code all the algorithms from scratch, implementing them with numpy and TensorFlow. The code is available online, but we gain absolute clarity only when we write these algorithms on our own. This doesn’t mean we can’t start learning the machine learning libraries… They are equally important and will be used frequently in every project. Let me list the important state-of-the-art libraries used in the various domains of Machine Learning:

  1. Machine Learning: scikit-learn, matplotlib (to view results)
  2. Computer Vision: numpy, Keras, PyTorch, TensorFlow
  3. Natural Language Processing: NLTK, OpenNLP, Keras, TensorFlow, re (for data preprocessing)

The most fundamental libraries are NumPy and TensorFlow, and the most fundamental theoretical prerequisites are linear algebra and probability. Linear algebra covers the operations that take place on matrices, and probability formalizes the large uncertainty in data. The data we deal with usually comes as arrays of matrices called tensors, which we implement in Python using NumPy and TensorFlow as our basic libraries and, if needed, others too.
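
As a small sketch of the “array of matrices” idea, a rank-3 tensor in numpy is just a stack of matrices, and linear-algebra operations broadcast across the stack:

```python
import numpy as np

# a rank-3 tensor: a batch (array) of three 2x2 matrices
tensor = np.arange(12, dtype=float).reshape(3, 2, 2)
print(tensor.shape)     # (3, 2, 2)

# matrix-vector products applied across the whole batch at once
vector = np.array([1.0, 2.0])
print(tensor @ vector)  # multiplies each 2x2 matrix by the vector
```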

Wishing you the best hours of coding and learning! Happy learning! :)

Resources:

  1. Machine Learning: Stanford CS229 Lectures, HackerEarth ML Tutorials, https://www.datacamp.com/community/tutorials, official library documentation
  2. Computer Vision: CS231n Lectures & Assignments, Adrian Rosebrock’s blogs
  3. Natural Language Processing: CS224n Lectures
  4. Deep Learning: deeplearningbook.org, Oxford Lectures on Deep Learning

The art of thriving in an ever-growing community is to keep reading research papers. We can cover that in another blog, I guess! For now, that’s all, folks. This is the first blog of a five-part tutorial series; we shall cover most of the topics mentioned here in the upcoming ones.

Let me know if you have any feedback or doubts regarding the blog. You can reach me at lalith@dataturks.com.
