NLP LANDSCAPE FROM 1960s TO 2020s
“Unleash the power of words: Welcome to the realm of Natural Language Processing, where machines learn to speak the language of humanity.”
Have you known about NLP for a long time, or did you only hear about it when Siri, Alexa, and Google Assistant came into the picture?
NLP (Natural Language Processing) traces its origins to the 1950s and 1960s, when researchers recognized the importance of translating from one language to another and hoped to create a machine that could perform such translation automatically. Natural Language Processing is a field concerned with the interaction between human language and computers. It empowers machines to understand, interpret, and generate human language, bridging the gap between humans and technology. This ability has had an enormous impact: thanks to NLP we can process vast amounts of textual data, and it has transformed industries ranging from customer service and entertainment to healthcare. NLP sits at the intersection of three fields: computer science, artificial intelligence, and linguistics (the study of human language).
Let’s take a brief look at the history of NLP.
- 1950s — 1960s: Early beginnings with the development of “Logic Theorist” and exploration of machine translation.
- 1960s — 1990s: Focus on rule-based NLP systems for language understanding and information retrieval.
- 1990s — 2000s: Transition to statistical NLP using probabilistic models and machine learning techniques.
- 2000s — Advancements in machine learning and neural networks led to a renaissance in NLP.
- 2010s — Pre-trained language models (the forerunners of systems like GPT-3) began revolutionizing language understanding and generation tasks, and voice assistants like Siri came into the picture.
- 2020s — Chatbots came to the forefront and took hold in major fields. Conversational AI and chatbots became increasingly sophisticated and prevalent across industries, including customer support, virtual assistants, and e-commerce.
Why is there so much demand for NLP?
- Handling large volumes of text data: Nowadays, online services generate a huge amount of data every second. Social media, e-commerce, healthcare: almost every sector produces massive amounts of text, and NLP is very good at handling it. This is one of the main reasons demand for NLP has grown.
- Structuring highly unstructured data sources: Let’s understand this with an example. Today’s younger generation uses a lot of shorthand and slang, like “how r u?”, “btw”, and “lol”, and machines need to understand and process this language to respond the way a human would. Vast amounts of unstructured data are generated every minute, so NLP is widely used to convert unstructured data into structured data, whether it comes from social media posts, e-commerce product comments, or anything else (a tiny normalization sketch follows this list).
- Language Translation: NLP is fundamental to language translation systems, facilitating communication between speakers of different languages and fostering global communication and collaboration.
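To make the idea of structuring messy text concrete, here is a tiny normalization sketch in Python; the slang dictionary below is a small hand-made example for illustration, not a standard resource.

```python
# Toy text normalization: expand common chat abbreviations before further processing.
# The SLANG mapping is a small hand-made example, not an exhaustive slang dictionary.
SLANG = {
    "r": "are",
    "u": "you",
    "btw": "by the way",
    "lol": "laughing out loud",
}

def normalize(text: str) -> str:
    tokens = text.lower().replace("?", " ?").split()
    return " ".join(SLANG.get(tok, tok) for tok in tokens)

print(normalize("how r u ? btw lol"))
# -> "how are you ? by the way laughing out loud"
```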
Applications of NLP-
- Chatbots
- Email Clients- Spam filtering, smart reply
- Social Media- removing adult content, opinion mining.
- Search engines
- Contextual personal advertisement
- Google Assistant, Siri, Alexa, ChatGPT
Google Search Engine Advancement
Common NLP tasks
- Text/Document Classification: Text/document classification, also known as text categorization, involves assigning a predefined category or label to a piece of text or a document based on its content. It is a supervised learning approach: the model is trained on labeled data with known categories to learn patterns and associations between text features and the corresponding labels.
- Sentiment Analysis: Sentiment analysis, also known as opinion mining, involves determining the sentiment or emotion expressed in a piece of text. The goal is to automatically identify whether the sentiment is positive, negative, or neutral, or sometimes a more specific emotion such as joy, sadness, or anger.
- Information Retrieval: Information retrieval is the science of retrieving relevant information from large collections of unstructured data, typically text documents. The goal is to provide users with the most relevant and useful information in response to their queries.
- Parts of Speech Tagging: Also known as POS tagging or grammatical tagging, this task involves assigning a grammatical category (part of speech) to each word in a sentence. The goal is to identify the syntactic role and function of each word in the context of the sentence (a short tagging sketch follows this list).
- Language Detection and Machine Translation: These are two different processes. Language detection, also called language identification, is the task of identifying the language of a given text; the machine must analyze and process the text, and the task appears in content tagging, text analytics, and multilingual search engines. Machine translation is the process of converting text or documents from one language to another.
- Knowledge Graph and QA System: A knowledge graph is a structured, organized representation of information, data, or facts about a particular domain. Its main task is to capture the key content of text or documents, and it can be used in decision making, problem solving, and knowledge sharing. Question answering (QA) is an application of knowledge systems and natural language processing that automatically answers questions posed by users in natural language. QA systems use the knowledge stored in their knowledge base to process user queries, find relevant information, and generate appropriate answers.
- Text Summarization: Text summarization is the process of condensing a long text or document into a concise and coherent shorter version. The goal is to capture the main ideas and key information from the source text while preserving its meaning and context.
- Topic Modelling: Topic modeling is a technique used to identify hidden topics in a collection of text documents. It is an unsupervised learning approach that lets us discover the underlying themes or subjects in the text data without any predefined labels, and it also helps reveal patterns in the data.
- Text Generation: Text generation is the process by which a machine automatically produces text in human language in a short span of time. It is widely used to present information in a meaningful, concise, and coherent format.
- Spelling Checking and Grammar Correction: This is the process of improving the accuracy and readability of written text by automatically identifying and correcting spelling errors and grammatical mistakes in a given piece of text.
- Text Parsing: Text parsing is the process of analyzing and extracting relevant information from a given text or document. It is a fundamental task for understanding the structure, meaning, and patterns within the text.
- Speech to Text: This is the process by which a machine converts human speech into text. It is a crucial component of natural language processing (NLP) systems, enabling computers to understand and process spoken language.
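As a quick illustration of one of these tasks, here is a minimal POS-tagging sketch using spaCy. It assumes spaCy and its small English model (en_core_web_sm) are installed; the exact tags you see depend on the model version.

```python
# Minimal part-of-speech tagging sketch with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("NLP empowers machines to understand human language.")

for token in doc:
    print(token.text, token.pos_)  # e.g. "NLP PROPN", "empowers VERB", ...
```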
Approaches to NLP:
- Heuristic methods
- Machine Learning based methods
- Deep Learning based methods
- Heuristic Methods: Heuristic methods, often referred to as heuristics, are problem-solving strategies or techniques that use practical approaches to find approximate solutions when an exact solution is either too time-consuming or not feasible. The term “heuristic” comes from the Greek word “heuriskein,” which means “to find” or “to discover.” Heuristics are commonly employed in various fields, including computer science, artificial intelligence, optimization, decision-making, and problem-solving.
Example: collecting lists of positive and negative words and counting how often they appear in a product review to decide whether the review is positive or negative (a small sketch follows below).
Advantages: quick and simple, and often accurate enough for narrow, well-understood problems.
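Here is a minimal sketch of the word-counting heuristic described above; the positive and negative word lists are tiny illustrative examples, and a real system would use a much larger lexicon.

```python
# Heuristic (lexicon-based) sentiment: count positive vs. negative words.
# The word sets below are tiny illustrative examples, not a real sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "love", "amazing"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "awful"}

def heuristic_sentiment(review: str) -> str:
    words = [w.strip(".,!?") for w in review.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(heuristic_sentiment("The battery life is great and the camera is amazing!"))  # -> positive
```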
- Machine Learning based methods: Machine learning-based methods in Natural Language Processing (NLP) involve algorithms that learn patterns and relationships directly from data. Instead of relying solely on handcrafted rules or heuristics, machine learning models are trained on large datasets to automatically discover patterns in language and make predictions or generate text. Because the algorithm learns these patterns itself, far less manual rule-writing is needed.
Typical ML workflow: convert the text into numbers (features), then apply an algorithm such as Naive Bayes, Logistic Regression, SVM, LDA (topic modeling), or Hidden Markov Models.
Example: Text Classification using Support Vector Machines (SVM).
Text classification is a common NLP task where the goal is to assign a label or category to a given piece of text, for example classifying emails as spam or not spam, sentiment analysis (positive, negative, neutral), or topic classification (a short scikit-learn sketch follows below).
Advantages: automated feature learning, flexibility, scalability, etc.
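Below is a minimal scikit-learn sketch of the SVM text-classification example; it assumes scikit-learn is installed, and the tiny spam/not-spam dataset is purely illustrative.

```python
# Text classification with TF-IDF features and a linear SVM (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative dataset: 1 = spam, 0 = not spam.
texts = [
    "Win a free prize now, click here",
    "Limited offer, claim your reward today",
    "Meeting rescheduled to Monday at 10am",
    "Please review the attached project report",
]
labels = [1, 1, 0, 0]

# Convert text to numbers (TF-IDF) and train the SVM in one pipeline.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["Claim your free reward now"]))       # likely [1] (spam)
print(model.predict(["Can we move the meeting to 3pm?"]))  # likely [0] (not spam)
```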
- Deep learning methods-
Deep learning methods are a subset of machine learning that use artificial neural networks with multiple layers to learn and represent complex patterns and relationships in data. These methods have revolutionized various fields, including Natural Language Processing (NLP), and have significantly improved the performance of language-related tasks. Deep learning models can automatically extract hierarchical features from raw text data, enabling them to understand and generate human-like language.
In machine learning approaches we convert text to numbers, and the order of the words (the sequence) is sometimes lost in the process; in deep learning the sequence is preserved. Features are also generated automatically in deep learning, whereas in classical ML they usually have to be engineered by hand.
Architectures used: RNN, LSTM, GRU, CNN, Transformers, Autoencoders.
Example: Sentiment Analysis using a Bidirectional LSTM.
Sentiment analysis is the task of determining the sentiment or emotion expressed in a piece of text, such as positive, negative, or neutral. Deep learning models, specifically Bidirectional Long Short-Term Memory (LSTM) networks, have shown remarkable performance on sentiment analysis tasks (a Keras sketch follows below).
Advantages: handling large-scale data, performance improvements, representation learning.
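Here is a minimal Keras sketch of the bidirectional-LSTM sentiment model mentioned above; it assumes TensorFlow is installed, and the vocabulary size, sequence length, and layer sizes are illustrative choices rather than tuned values.

```python
# Bidirectional LSTM for binary sentiment classification (TensorFlow / Keras).
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000   # illustrative: size of the tokenizer vocabulary
MAX_LEN = 100         # illustrative: length of the padded input sequences

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),          # padded sequences of word indices
    layers.Embedding(VOCAB_SIZE, 64),        # word indices -> dense vectors
    layers.Bidirectional(layers.LSTM(64)),   # reads the sequence in both directions
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability that the sentiment is positive
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

# Training would look like (with tokenized, padded data and 0/1 labels):
# model.fit(padded_train_sequences, train_labels, validation_split=0.2, epochs=5)
```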
Challenges in NLP-
- Ambiguity
- Contextual Words
- Synonyms
- Irony, sarcasm, and tonal differences
- Spelling errors
- Creative language, such as poetry
- Diversity
Conclusion- Natural Language Processing (NLP) is a rapidly evolving field that focuses on enabling computers to understand, interpret, and generate human language. NLP has made tremendous strides over the years, thanks to advances in machine learning and deep learning techniques, as well as the availability of vast amounts of text data.
About me (Krupa Dharamshi)
Hello there! I’m Krupa, a tech enthusiast with an insatiable curiosity for all things tech. Join me on this thrilling journey as we explore the latest innovations, embrace the digital age, and dive into the world of artificial intelligence and cutting-edge gadgets. This is my first blog and I hope you all like it. See you all in the next blog!!! Let’s unravel the marvels of technology together!