AI Idiots In NLP

Pratik Bhavsar | @nlpguy_
Published in Modern NLP · 4 min read · Feb 6, 2020

Have you met them?

Photo by REVOLT on Unsplash

NLP idiots are people who burn time and money and have fun at the expense of the organisation. I have been lucky enough to encounter many such AI idiots in my interactions with colleagues, companies and interviewees. To be frank, I have also made my fair share of these mistakes 😅😬😞

I will throw light on 7 different types of NLP idiots you will meet when working on NLP problems. These idiots are categorised by their level of idiocy: the higher the level, the more damage they can do.

In ML, you can actually get away with a complex, impractical solution far more easily than you can with a simple one.

I delivered a talk on the same topic, which you can watch here!

Get all of my blogs, organised by topic, here!

Level 1 Idiot — The regex idiot 🐶

I met this guy when we were interviewing for a Python developer position. When we were discussing projects, he mentioned one where he used regex to fetch info from logs. On probing further, I learnt that the logs came in a fixed format, so a simple split() and indexing into the resulting list would have worked. There was no pattern here for regex to find.

PROBLEM: a developer converted a simple extraction problem into an unnecessarily slow and complex regex pattern problem.
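To make it concrete, here is a minimal sketch of the simpler approach, assuming a hypothetical fixed-format log line (the field layout here is made up for illustration):

```python
# Hypothetical fixed-format log line — every log follows the same layout
log = "2020-02-06 12:01:33 INFO auth-service user=alice action=login"

# No regex needed: the format is fixed, so split() + indexing does the job
parts = log.split()
timestamp = f"{parts[0]} {parts[1]}"   # '2020-02-06 12:01:33'
level = parts[2]                       # 'INFO'
service = parts[3]                     # 'auth-service'
user = parts[4].split("=")[1]          # 'alice'
print(timestamp, level, service, user)
```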

Level 2 Idiot — The ML idiot 🐶🐶

A company had to find the tables associated with a given table. On the surface, the problem looked complex enough to be framed as an NLP classification problem. It later turned out to be solvable with two token-matching rules at an accuracy of 90%.

PROBLEM: a developer converted a simple rule-based problem into an ML problem.
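As a rough illustration of what such rules might look like (the table names and the exact rules here are my own assumptions, not the company's actual logic):

```python
# Hypothetical table names
tables = ["orders", "orders_archive", "order_items", "customers", "customer_address"]

def tokens(name):
    # Rule 2: crude singular/plural normalisation ('orders' -> 'order')
    return {t.rstrip("s") for t in name.lower().split("_")}

def related(query, candidates):
    # Rule 1: two tables are related if their names share a token
    q = tokens(query)
    return [c for c in candidates if c != query and q & tokens(c)]

print(related("orders", tables))  # ['orders_archive', 'order_items']
```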

Level 3 Idiot — The Model idiot 🐶🐶🐶

A company had to make an FAQ chatbot which could understand a question and reply with one of 90 pre-defined answers. The person converted this into a classification problem and asked for training data to be created. The model had an accuracy of 80%. Every time new question-answer pairs are added, it requires retraining, hyperparameter tuning and redeployment.

PROBLEM: a data scientist converted a simple unsupervised similarity problem into a fancy supervised problem. This adds the overhead of creating data, retraining models and redeploying them.
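A minimal sketch of the unsupervised alternative, using TF-IDF cosine similarity (the FAQ pairs below are made up; a stronger sentence encoder could be slotted in the same way):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical FAQ: each question maps to one pre-defined answer
faq_questions = [
    "How do I reset my password?",
    "What payment methods do you accept?",
    "How can I cancel my subscription?",
]
faq_answers = [
    "Click 'Forgot password' on the login page.",
    "We accept credit cards and PayPal.",
    "Go to Settings > Billing > Cancel.",
]

vectorizer = TfidfVectorizer().fit(faq_questions)
faq_vectors = vectorizer.transform(faq_questions)

def answer(user_question):
    # Embed the incoming question and return the answer of the nearest FAQ
    vec = vectorizer.transform([user_question])
    best = cosine_similarity(vec, faq_vectors).argmax()
    return faq_answers[best]

print(answer("I forgot my password, how do I reset it?"))
```

Adding a new Q&A pair just means appending to the lists and refitting the vectorizer — no labelled training data, no retraining pipeline.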

Level 4 Idiot — The AutoML idiot 🐶🐶🐶🐶

A company was dealing with a text classification problem. The guy tried out AutoML and found an SVM to work best. He finalised the solution without looking further and never tried more complex approaches like a neural network.

PROBLEM: a data scientist found something with AutoML (TPOT/H2O) and didn't consider improving it or weighing its pros and cons. He was not skilled enough to try DL approaches and was happy with the results.
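To be clear, an AutoML run like this is a fine baseline; the idiocy is stopping there. A minimal TPOT sketch, assuming TPOT is installed and using a stand-in public dataset:

```python
from sklearn.datasets import fetch_20newsgroups   # stand-in text dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])
X = TfidfVectorizer(max_features=500).fit_transform(data.data).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, data.target, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=20, random_state=42, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
# Treat whatever pipeline TPOT finds as a baseline — still benchmark it
# against a simple neural model and weigh the trade-offs before finalising
```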

Level 5 Idiot — The DL idiot 🐶🐶🐶🐶🐶

A company was dealing with a text classification problem, but it was really a next-reply problem where the next reply depended on the past few replies. The guy tried to convert this into a seq2seq problem. If you have dealt with seq2seq models, you know they are more complex in every way than other NN approaches. Anyway, they tried out many approaches and found that a CNN over a window of the last 2 replies worked best.

PROBLEM: a data scientist jumped to a more complex deep learning architecture without considering simpler solutions. There is plenty of evidence that CNNs perform great when a token's dependencies span a window of no more than 2. They are fast to train compared to seq2seq models and lighter in code.
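A minimal Keras sketch of the winning setup — a small CNN over the concatenation of the last two replies (vocabulary size, sequence length and class count are all assumptions):

```python
import tensorflow as tf

vocab_size, max_len, n_classes = 10_000, 100, 5   # all hypothetical sizes

model = tf.keras.Sequential([
    # Input: the last two replies concatenated into one padded token sequence
    tf.keras.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, 128),
    tf.keras.layers.Conv1D(128, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# Far cheaper to train and serve than a seq2seq model on the same task
```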

Level 6 Idiot — The Ignorant Idiot 🐶🐶🐶🐶🐶🐶

A company wanted to make an NER model. The data scientist decided to create training data: they got some data, tagged it and trained a model from scratch. They got poor performance, and now they didn't know what to do because they had exhausted the project timeline.

PROBLEM: a data scientist jumped to making a model from scratch instead of considering transfer learning.

In such cases, first try an available model. If the performance is not enough, take a pretrained NER model and fine-tune it on your tagged data. Try different pretrained models, as results can be empirical: Flair might be a good default, but BioBERT/SciSpacy might be what you need for medical data. Do not try to reinvent the wheel.
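For example, spaCy's off-the-shelf pipeline gives you working NER in a few lines (this assumes en_core_web_sm has been downloaded with `python -m spacy download en_core_web_sm`):

```python
import spacy

# Off-the-shelf pretrained pipeline — try this before training anything
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Berlin in March.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Apple ORG, Berlin GPE, March DATE
# Only if this falls short, fine-tune a pretrained model on your tagged data
```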

Level 7 Idiot — The Transformer Idiot 🐶🐶🐶🐶🐶🐶🐶

This is the latest breed of idiot. A company wanted to build a sentiment analysis model. The data scientist came in with solid modelling experience from Kaggle and instantly said, let's train a transformer on it because it yields the best performance. He trained BERT-large with language-model tuning and task fine-tuning, got an accuracy of 97% and proudly showed it to the manager. The manager was elated and proud of the team.

When the deployment work started, the load test showed that the server cost would be impractical: the model had to serve streaming tweet data and needed to run on a GPU. Since it was a startup with a low budget, no funding and no customers, they couldn't afford it.

PROBLEM: a data scientist jumped to a more compute-heavy model without understanding the production requirements and cost implications.

Ideally, the data scientist should have asked for the budget and metric requirements before jumping to a transformer model.
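Even a back-of-the-envelope check would have caught this. A sketch where every number is a made-up assumption (load, throughput and GPU price):

```python
# Every number below is a made-up assumption for illustration
tweets_per_second = 200        # assumed streaming load
qps_per_gpu = 40               # assumed BERT-large throughput on one GPU
gpu_cost_per_hour = 1.50       # assumed cloud GPU price in USD

gpus_needed = -(-tweets_per_second // qps_per_gpu)    # ceiling division -> 5
monthly_cost = gpus_needed * gpu_cost_per_hour * 24 * 30
print(f"{gpus_needed} GPUs ≈ ${monthly_cost:,.0f}/month")   # 5 GPUs ≈ $5,400/month
```

If that number doesn't fit the budget, a distilled or classical model is the honest choice.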

Conclusion

Hope you had a good time reading this. I personally try to solve problems in the order below, although the choice also depends on past experience, budget, latency and development time.

Rules > Regex > Unsupervised > ML > DL > ML with heavy feature engineering

Want more?

Subscribe to Modern NLP for the latest tricks in NLP! 😃
