Distance metrics like Euclidean, Manhattan, and Minkowski suffer a lot when we increase the number of dimensions (features) of our data. It's as if they dilute and lose their meaning: they become unreliable.
This is not very intuitive, and it conflicts with the common belief that more features equal better Machine Learning models, which is far from true.
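To see this dilution in action, here is a minimal sketch (using NumPy on made-up uniform random data, an illustrative assumption on my part) that measures how the gap between the nearest and farthest point from a query shrinks as we add dimensions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Compare how distances "concentrate" as the number of dimensions grows.
for n_dims in (2, 10, 100, 1000):
    points = rng.random((1000, n_dims))             # 1000 random points
    query = rng.random(n_dims)                      # one query point
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    # Relative contrast: how much farther the farthest point is vs the nearest.
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"{n_dims:>4} dims -> relative contrast: {contrast:.3f}")
```

As the dimensionality grows, the contrast collapses towards zero: every point ends up roughly equally far from the query, so "nearest neighbour" stops meaning much.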
Reducing the dimensions of our data is something we should always try to do in order to remove redundant or noisy features. You can learn all about feature selection in this article.
Also, dimensionality reduction techniques like PCA or Kernel PCA are a very clever preprocessing step to apply to our data when we are going to use an algorithm that computes distance metrics. …
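As an illustration, here is a hedged sketch (using scikit-learn and a synthetic data set, both assumptions on my part) of running PCA in front of a distance-based model such as k-Nearest Neighbours:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic data with many noisy or redundant dimensions.
X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=0)

# Project onto the top principal components before computing any distances.
model = make_pipeline(PCA(n_components=10), KNeighborsClassifier())
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean accuracy with PCA + KNN: {scores.mean():.3f}")
```

Wrapping both steps in a pipeline ensures PCA is fitted only on each training fold, so the distance computations in KNN always happen in the reduced space.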
‘Everything should be made as simple as possible, but not simpler.’ (Albert Einstein)
Deep learning is one of the most promising and powerful branches of Machine Learning; however, many people struggle to grasp what it is about, to understand it at its core, and, lastly, to take the theory to practice efficiently and develop applications that make use of it.
It is so attractive that most people dive into it with very little previous knowledge of Machine Learning, programming, or math, and end up being overwhelmed by the derivatives in Gradient Descent, the chain rule of Back-Propagation, and the vast number of Deep Learning programming frameworks out there like PyTorch, Keras, or TensorFlow. …
Hello dear reader! I hope you are doing super great. Today we are going to learn how Random Forest algorithms calculate the importance of the features in our data set, when we should do this, and why we should consider using some kind of feature selection mechanism, with a couple of examples and some code. Let's get to it!
Feature selection is a very important step of any Machine Learning project. More features mean more complex models that take longer to train, are harder to interpret, and can introduce noise. …
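As a preview of the kind of code we will see, here is a minimal sketch (using scikit-learn's bundled iris data set, an illustrative choice) of how to read the impurity-based feature importances a trained Random Forest exposes:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a Random Forest and inspect its impurity-based feature importances.
data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# feature_importances_ sums to 1; a higher value means the feature reduced
# more impurity across the splits of the forest's trees.
for name, importance in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

Low-importance features are natural candidates to drop first when simplifying a model.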
Random Forests are an awesome kind of Machine Learning model. They solve many of the problems of individual Decision Trees, and are always a candidate to be the most accurate of the models tried when building a certain application.
If you don’t know what Decision Trees or Random Forests are, do not have an ounce of worry; I’ve got you covered with the following articles. Take a quick look and come back here.
In this quick article, we will explore some of the nitty-gritty optimisations of Random Forests, along with what each hyper-parameter is, and which ones are worth optimising. …
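To give a flavour of what that optimisation looks like in practice, here is a hedged sketch (the hyper-parameter ranges below are illustrative, not a recommendation) using scikit-learn's RandomizedSearchCV:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Illustrative search space over some of the most impactful hyper-parameters.
param_distributions = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 5, 10, 20],
    "max_features": ["sqrt", "log2", None],
    "min_samples_leaf": [1, 2, 5],
}

search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_distributions, n_iter=10, cv=3,
                            random_state=0)
search.fit(X, y)
print("Best parameters found:", search.best_params_)
```

Random search samples only a handful of combinations, which is usually a better use of compute than exhaustively grid-searching every hyper-parameter.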
In this post, we will explain what a Random Forest model is, see its strengths, how it is built, and what it can be used for.
We will go through the theory and intuition of Random Forest, seeing the minimum amount of maths necessary to understand how everything works, without diving into the most complex details.
Lastly, before we start, here are some additional resources to skyrocket your Machine Learning career:
Awesome Machine Learning Resources:
- For learning resources, go to How to Learn Machine Learning!
- For professional resources (jobs, events, skill tests), go to AIgents.co …
Hello dear reader! In the post you’re about to read, I will cover in a very simple manner what the three main types of learning in Machine Learning are: Supervised, Unsupervised, and Reinforcement Learning.
As there are millions of posts out there on the differences between these three, where they can be used, and all the typical topics, I will try to go a fair bit further: explore them in a novel manner, give my opinions from an industry and commercial perspective, and throw in a little humour, while also neatly explaining what each of them is about.
Let's go! …
Transfer learning is a widely used technique in the Machine Learning world, mostly in Computer Vision and Natural Language Processing.
In this post, we will explain in detail what it is, when it should be used, and why it is relevant, and we will show how you can use it in your own projects.
Once you’re ready, lie back, relax, and let's get to it!
We are going to see various definitions of the technique, to clarify what it is from different angles and make it easier to understand. …
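As a taste of what is coming, here is a minimal sketch (assuming TensorFlow/Keras and an image task with 10 classes, both assumptions on my part) of the classic transfer-learning recipe: take a pretrained backbone, freeze it, and train a new head on top:

```python
import tensorflow as tf

# Load a backbone pretrained on ImageNet, without its classification head.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,
                                         weights="imagenet")
base.trainable = False  # freeze the pretrained weights

# Attach a small new head for our own task (10 classes is an assumption).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # with your own data
```

Because only the new head is trainable, we can get good results with far less data and compute than training the whole network from scratch.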
Yesterday (28th of August 2020), Elon Musk gave a highly anticipated update on the state of the technology of his brain-implant company: Neuralink.
The goal of the company, for those who don’t know, is to develop neural interfaces (connections between electronic devices and your brain) that can monitor your brain activity (in the demo we could see how it precisely tracked the joint movements of pigs) and eventually also cure certain diseases by not only monitoring, but also acting.
Many people think they know what a probability is, and also think they have a perfect understanding of what odds are: after all, odds are used in betting all the time.
But do they really? It is very easy to confuse these two terms.
In this post, we will quickly clarify what each of them is, their differences, and their relationship.
Let's go!
We use probability in our everyday language all the time when we say things like ‘I don’t think so’ or ‘That is very unlikely’. …
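To pin the two terms down with numbers before we start, here is a tiny sketch of the relationship between them (the two functions are illustrative helpers, not from any library):

```python
def probability_to_odds(p: float) -> float:
    """Odds in favour: probability of the event versus probability against it."""
    return p / (1 - p)

def odds_to_probability(odds: float) -> float:
    return odds / (1 + odds)

# A probability of 0.75 corresponds to odds of 3 (often written 3:1).
print(probability_to_odds(0.75))  # 3.0
print(odds_to_probability(3.0))   # 0.75
```

A 50% probability gives odds of exactly 1 (even odds), while probabilities above 50% give odds greater than 1.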
You’ve divided your data into training, development, and test sets, with the correct percentage of samples in each block, and you’ve also made sure that all of these blocks (especially the development and test sets) come from the same distribution.
You’ve done some exploratory data analysis, gathered insights from this data, and chosen the best features for the task at hand. You’ve also chosen an evaluation metric that is well suited for your problem. …
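For reference, one common way to produce such a split (the 70/15/15 percentages here are purely illustrative) is two consecutive calls to scikit-learn's train_test_split:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First carve out the training set, then split the remainder into dev and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.3, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)  # 70/15/15 overall

print(len(X_train), len(X_dev), len(X_test))  # 700 150 150
```

Shuffling with a fixed random_state before splitting helps keep the development and test sets drawn from the same distribution.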