“If the world is already talking about it, then it’s already obsolete.” - Donald Knuth
Let’s start with the most offensive ones:
** Rolls Up Sleeves **
- Artificial Intelligence and Machine Learning are NOT BUZZWORDS.
- Linear Classifiers are NOT, I repeat NOT, substitutes for Deep Learners.
- If you use something that was outdated a few years ago and are shamelessly satisfied with it, then you are NOT a Data Scientist or Data Science Researcher.
- Doing a Tutorial does NOT mean you have mastered or even understood all the underlying concepts.
History of AI in a Nutshell
** Classic Generation Gap **
- Wave 1 : AI is a vague concept, it’ll never work. So AI was hard-coded.
- Wave 2 : AI is possible but it won’t be able to handle complex tasks for ages. So Stats and Probability formed the basis of our AI.
- Wave 3 : AI is the future and the future is here. Continuous, Deep, and Generalised Learning is capable of handling specific tasks as well as humans.
Most Startups are still stuck on Wave 1 and Wave 2 and propagate the idea that these are good enough to deliver all the Intelligence.
I am sorry, but those learners aren’t even intelligent in the true sense.
Why does this problem exist?
When it comes to machine learning, we usually define our capabilities over a task that was built for learners a decade ago. We don’t even bother to come up with Something New, Something Original. We usually make small optimisations to predefined methods.
Where do we go wrong?
Until we are able to achieve accuracy better than an average human, the problem cannot be called trivial; only the solutions are trivial. It is only after we outperform a human in terms of accuracy that we can call a solution accurate enough. Gain through optimisations is small, but gain through leaps (a whole new concept) can be (and usually is!) huge.
The most cliché problem in NLP is Sentiment Analysis of Tweets.
Ironically, we are nowhere near human accuracy!
Most of us still use algorithms like Naive Bayes, and some of us get high just by using word vectors.
This is actually a small part of a much bigger problem: information retrieval from short text.
A good solution is not so straightforward, as we need to extend the information! We do that by creating not only word vectors but also additional character vectors and paraphrase vectors.
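As a toy illustration of what a character-level vector might add on top of word vectors, here is a sketch using the hashing trick over character trigrams. The dimensionality, n-gram size, and hashing scheme are arbitrary choices for illustration, not a reference implementation:

```python
import numpy as np

def char_ngram_vector(text, n=3, dim=64):
    """Hash character n-grams into a fixed-size vector (hashing trick).

    A toy sketch of one way to carry character-level information;
    `n` and `dim` are arbitrary choices here.
    """
    vec = np.zeros(dim)
    padded = f"<{text}>"  # mark word boundaries
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        vec[hash(gram) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Misspellings share most of their trigrams, so their vectors stay
# close even though a word-level vocabulary would treat them as
# unrelated tokens:
a = char_ngram_vector("independence")
b = char_ngram_vector("independance")
print(float(a @ b))  # high cosine similarity
```

This is exactly the kind of signal a pure word-vector lookup throws away: an out-of-vocabulary misspelling gets a useful representation for free.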
What does Information Extension look like?
A simple sentence like “The Year was 1947 !!” holds a lot of information, but only if we extend the information about each word and the source.
- Here the writer is the source. The writer is an Indian and India got independence in 1947. So as an Indian this year holds positive sentiment for me.
- The two exclamation marks show that I am overwhelmed with sentiment when saying that sentence.
- The word ‘Year’ is written in title format (only the first character is upper case), which is contrary to its placement within the sentence (only first words or proper nouns are in title format). This means the emphasis is on ‘year’.
Beautiful! Isn’t it? :’)
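The observations above can be sketched as a toy feature extractor. All the feature names and rules here are hypothetical simplifications; a real system would also pull in knowledge about the source (the writer) from outside the sentence:

```python
import re

def extend_information(sentence):
    """Toy sketch of the 'information extension' idea.

    The features are made up for illustration only.
    """
    tokens = sentence.split()
    return {
        # punctuation as an intensity signal
        "exclamations": sentence.count("!"),
        # title-cased words that are NOT sentence-initial carry emphasis
        "emphasised_words": [t for i, t in enumerate(tokens)
                             if i > 0 and t[0].isupper() and t[1:].islower()],
        # years the source may have an attachment to (e.g. 1947 for India)
        "years": re.findall(r"\b1[0-9]{3}\b|\b20[0-9]{2}\b", sentence),
    }

features = extend_information("The Year was 1947 !!")
print(features)
# → {'exclamations': 2, 'emphasised_words': ['Year'], 'years': ['1947']}
```

Even these three crude features already separate “The Year was 1947 !!” from a flat statement like “the year was 1947”, which is the whole point of extending the information.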
The Real Beauty of Machine Learning is the ability to solve much much more complex tasks that have not even been defined yet (general intelligence).
Word2Vec was revolutionary and RNN was Revolutionary. Notice I used past tense? Because they are already mainstream. Yet most of us are going to spend years simply using them, barely understanding why they work.
Without understanding the real reason for an algorithm’s effectiveness, we cannot come up with a much better alternative or optimisation.
Also! Some people are ruining the third wave for others.
Solving tutorials does not mean you understand why or how the underlying concepts work.
Most people are hence not even close when it comes to solving non-trivial problems. Using ‘black boxes’ like TensorFlow or Keras will hardly ensure you know even 10% of what you are building (unless you go through their source code)!
TensorFlow and Keras provide complex optimisations with such ease that one barely reads their introduction and assumes they know what is going on.
Putting different Lego pieces together is easy; the real magic lies in how you create the Lego pieces. With a few types of Lego pieces there is only so much you can do, whereas if you are able to make your own Lego blocks, the scope is endless!!
I have seen people using CNNs for sequential data. Not that the idea is taboo, but what is taboo is that they never even considered an RNN for the task, even when the problem was a textbook case for RNNs.
Soon enough the market will be flooded with “Data Scientists” who are not equipped with the knowledge to tackle actual real-world problems!
Andrej Karpathy, a well-known name in Machine Learning, was once asked about the libraries he uses for writing ML code, and he replied:
“Python and Numpy”
That is Insanely Bad-ass, believe me!
When you code most (if not all) of the concepts from scratch, you know how something works or why it doesn’t. The Black Box becomes a White Box, and now you are equipped with enough knowledge to mould this White Box according to your needs.
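To make the “Python and Numpy” point concrete, here is a sketch of a logistic-regression learner written from scratch in NumPy, with the cross-entropy gradient derived by hand instead of hidden behind a framework. The toy data and hyperparameters are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated 2-D Gaussian blobs
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = np.zeros(2), 0.0
lr = 0.1  # learning rate

for _ in range(500):
    z = X @ w + b
    p = 1 / (1 + np.exp(-z))           # sigmoid
    grad_w = X.T @ (p - y) / len(y)    # gradient of mean cross-entropy
    grad_b = np.mean(p - y)
    w -= lr * grad_w                   # plain gradient descent
    b -= lr * grad_b

acc = np.mean((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

Every line here is inspectable: if the model misbehaves, you know exactly which term of the gradient to stare at, which is the whole White Box argument.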
The measure of accuracy for all problems is NOT the F-score; the right measure depends upon what you are aiming for. The widespread belief that there exists a single comparison metric for very different problems is, quite frankly, embarrassing.
Take NER (Named Entity Recognition): the F-score is a really bad metric to work with in this case. Details in this link.
Instead, different combinations of concepts like:
- Negative Predictive Value
- False Discovery Rate
- Miss Rate
can be used to measure the actual performance of a learner.
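All three of the measures above fall out of the same four confusion-matrix counts. A minimal sketch, where the NER-style counts in the example are made up purely for illustration:

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Metrics beyond the F-score, from raw confusion-matrix counts.

    Definitions are the standard ones; which of these matters
    depends entirely on what the task is aiming for.
    """
    return {
        "precision":                 tp / (tp + fp),
        "recall":                    tp / (tp + fn),
        "negative_predictive_value": tn / (tn + fn),
        "false_discovery_rate":      fp / (fp + tp),   # 1 - precision
        "miss_rate":                 fn / (fn + tp),   # 1 - recall
    }

# e.g. a tagger evaluated over 1000 tokens (hypothetical counts):
m = retrieval_metrics(tp=80, fp=20, fn=40, tn=860)
print(m)
```

Note how a tagger can look fine on one of these numbers and terrible on another; that asymmetry is exactly why a single F-score hides what you actually care about.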
Understand the concepts. Play around with them. Be stupid. Be random. Explore. Just don’t be boring.
Remember, the Human brain is a great Motivation for Machine Learning concepts but NOT the Goal.
On the brighter side I am really Happy, Excited, Enthusiastic and Optimistic about people entering the field of AI and ML :D
Yes ….. I actually am Happy :p