How does a machine process and understand human language: Part 1

Can a machine do it? If so, how does a machine do it?

Mohammad Nuruzzaman
Analytics Vidhya
3 min read · Mar 18, 2021

Poster presented at ACM Turing Conference [Ref. 1], May 2019, China

Human language is highly ambiguous … It is also ever-changing and evolving. People are great at producing and understanding language, and are capable of expressing, perceiving, and interpreting very elaborate meanings. In theory, we can understand and even predict human behaviour using that information. Can a machine do it? If so, how does a machine do it?

Yes, a machine can do it.

Great progress in AI has made this easier for data scientists. Understanding human, or natural, language is the part of computational linguistics known as natural language processing (NLP). NLP covers not only understanding language but also language translation, question answering, text summarization, spam detection, sentiment analysis, human-like conversation and much more. One example is the AI chatbot, or conversational system. Conversational systems are becoming one of the top strategic communication technologies. Many researchers predicted that by 2020 the average person would talk more with bots than with their family members [Ref. 2]. Despite this optimism, some commentators have noticed that chatbots are not fulfilling their initial promise of replacing apps and websites [Ref. 3]. Among platform vendors, Facebook is said to be hosting 100,000 bots, and Pandorabots more than 285,000 chatbots. What this generation expects is a chatbot that converses like a human and offers automated language translation and grammar correction. Unfortunately, many chatbot models still sometimes produce nonsensical output.

Understanding human language, or unstructured data in general, is one of the most complex tasks for a machine. With current NLP techniques, however, it is becoming easier day by day, and a machine can now perform tasks such as relationship extraction, abbreviation determination, sentiment analysis, named entity recognition, and speech recognition. Correcting noisy, ungrammatical text nevertheless remains a challenging task in NLP. Ideally, given some piece of writing, an error correction system would be able to fix minor typographical errors as well as grammatical errors that involve longer dependencies, such as non-idiomatic phrasing or errors in subject-verb agreement. Existing methods, however, are often only able to correct highly local errors, such as spelling errors or errors involving articles or prepositions. Classifier-based approaches to error correction are limited in their ability to capture a broad range of error types [Ref. 4].
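To make the idea of a "highly local" correction concrete, here is a minimal sketch that fixes each word on its own by picking the closest entry in a vocabulary. It is only an illustration, not the method of any system cited here: the vocabulary, the function name correct_locally and the 0.8 similarity cutoff are all assumptions, and the matching uses Python's standard difflib module.

```python
import difflib

# Hypothetical toy vocabulary (an assumption for this sketch);
# a real system would rely on a large dictionary or a language model.
VOCABULARY = ["he", "she", "go", "goes", "to", "school", "the"]


def correct_locally(sentence):
    """Correct each word independently, without looking at its neighbours."""
    corrected = []
    for word in sentence.lower().split():
        if word in VOCABULARY:
            # Known words are kept as they are, even if the grammar is wrong.
            corrected.append(word)
            continue
        # Pick the most similar vocabulary word, if one is similar enough.
        matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(correct_locally("he go to schol"))
```

The typo "schol" is repaired to "school", but "he go" is left untouched, because every word is judged in isolation. Fixing subject-verb agreement needs the longer-range context that this kind of local approach does not see.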

In the next few parts of this series, I am going to show you hands-on examples of how this can be achieved. Before diving in, it is important to know which processes help the machine make sense of what it is ingesting.

Natural Language Processing (NLP) Tasks

As shown in Fig. 2, NLP processes include lowercase conversion, tokenization, abbreviation determination, POS tagging, grammar checking, stop-word removal, lemmatization, entity extraction and punctuation removal. You can use a tool such as the Natural Language Toolkit (NLTK) or StanfordNLP to pre-process input and obtain more accurate information.

Fig. 2 — List of Activities in Natural Language Processing (NLP)
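As a small taste of what these steps look like in practice, the sketch below chains four of them (lowercase conversion, tokenization, punctuation removal and stop-word removal) using only the Python standard library. Treat it as a rough illustration under stated assumptions: the function name preprocess and the tiny stop-word list are made up for this example, and a toolkit such as NLTK provides far more complete tokenizers and stop-word lists.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch);
# toolkits such as NLTK ship much fuller lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "it", "of", "to", "and", "in"}


def preprocess(text):
    """Lowercase, tokenize, drop punctuation and remove stop words."""
    # Step 1: lowercase conversion.
    lowered = text.lower()
    # Steps 2 and 3: keep only runs of letters and digits, which splits
    # the text into tokens and discards punctuation at the same time.
    tokens = re.findall(r"[a-z0-9]+", lowered)
    # Step 4: stop-word removal.
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("Human language is highly ambiguous, and it is ever-changing!"))
```

Note that this simple tokenizer splits "ever-changing" into two tokens. Whether that is desirable depends on the task, which is one reason dedicated tokenizers exist.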

In the following article (Part 2) I will discuss each of the NLP tasks with a hands-on example. Please share and stay tuned. Thank you for reading.

References:

[1] M. Nuruzzaman and O. K. Hussain, “Identifying facts for chatbot’s question answering via sequence labelling using recurrent neural networks” presented at the Proceedings of the ACM Turing Celebration Conference — China, Chengdu, China, 2019.

[2] Gartner. (2016) Top Strategic Predictions for 2017 and Beyond: Surviving the Storm Winds of Digital Disruption. Gartner.

[3] J. Guynn, “Facebook Messenger takes another swipe at bots” in USA TODAY, ed: Gannett Satellite Information Network, 2017.

[4] H. T. Ng, S. M. Wu, T. Briscoe, C. Hadiwinoto, R. H. Susanto, and C. Bryant, “The CoNLL-2014 Shared Task on Grammatical Error Correction” in Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, Baltimore, Maryland, 2014, pp. 1–14.

Mohammad Nuruzzaman
Analytics Vidhya

Data Scientist at Ausloans Finance Group … Deliver High-impact AI solutions through MLOps & Predictive Analytics.