Ethics in Natural Language Processing
Ethics forms a fundamental basis of our everyday functioning, so we need to identify where and how it applies. The growing popularity of natural language processing applications brings challenges and risks during implementation, and with them a need to revisit our ethical considerations.
What is Natural Language Processing?
Natural language processing (NLP), a field of artificial intelligence (AI) that handles the processing and analysis of large volumes of unstructured data, is a real game-changer.
As AI technologies such as machine learning (ML) and deep learning mature, NLP applications are evolving rapidly.
NLP is a branch formed at the intersection of artificial intelligence, computational linguistics, and computer science. It aims to give computers the ability to understand text and spoken words in much the way human beings do, leading to systems that process natural language text and make communication with humans easier. NLP started as a branch of artificial intelligence and borrows from various fields such as statistics, linguistics, cognitive science, and psycholinguistics. The technologies used by NLP include machine learning, deep learning, and statistical models, which enable computers to comprehend the full meaning of the text and voice data provided to them. Common NLP methods include Markov models, neural networks, naïve Bayes classifiers, etc.
History of Natural Language Processing
The history of NLP goes back to the 17th century, when thinkers like Descartes and Leibniz talked about forming codes to relate words between languages. The proposals forwarded by these thinkers were based on theoretical models and had very little to do with machine learning. However, these ideas paved the way forward for NLP. The first work on a translating machine was proposed by George Artsrouni in the 1930s. The decades that followed saw revolutionary improvements in NLP capabilities and in our understanding of its applications.
In 1950, Alan Turing, in his article ‘Computing Machinery and Intelligence’, proposed the ‘Turing test’, which focused on the ability of machines to exhibit intelligent behavior like that of humans. Later on, contributions such as Noam Chomsky’s ‘Syntactic Structures’ and William Woods’ ‘Augmented Transition Network’ helped make NLP the new buzzword of the century. The development of NLP is often divided into two phases: the first from the 1950s to the 1990s, and the second from 2003 to 2018.
Natural Language Processing Tasks
NLP models have made tasks of everyday life easier, faster, and less burdensome. The following NLP tasks help computers process data in the form of human text and voice:
Natural Language Generation
In layman’s terms, it is the task of translating structured information into human language. It is sometimes described as the opposite of speech-to-text.
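As a quick illustration, here is a minimal template-based sketch of natural language generation; the weather record and its field names are hypothetical.

```python
# Minimal template-based NLG: structured data in, a readable sentence out.

def generate_weather_report(record: dict) -> str:
    """Render a structured weather record as a natural language sentence."""
    template = ("On {date}, expect {condition} with a high of "
                "{high}°C and a low of {low}°C.")
    return template.format(**record)

record = {"date": "Monday", "condition": "light rain", "high": 18, "low": 11}
print(generate_weather_report(record))
# -> On Monday, expect light rain with a high of 18°C and a low of 11°C.
```

Real NLG systems replace the fixed template with learned language models, but the input-to-output direction is the same.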
Speech Recognition
This task converts voice data into text. It is essential for applications that follow voice commands or answer spoken questions. It encounters challenges in the form of different accents, rapid delivery of words, incorrect grammar, etc.
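A minimal speech-to-text sketch, assuming the open-source SpeechRecognition Python library (pip install SpeechRecognition) and a hypothetical audio file named command.wav; the recognize_google call sends the audio to Google's free web API and needs an internet connection.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("command.wav") as source:  # hypothetical WAV file
    audio = recognizer.record(source)        # read the whole file

try:
    # recognize_google sends the audio to Google's free web API.
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    # Accents, stutters, or noise can make the audio unintelligible.
    print("Speech was unintelligible")
```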
Sentiment Analysis
This task of NLP aims to extract the subjective qualities of the data, such as emotions, suspicion, attitude, and confusion.
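One common way to try this is NLTK's VADER analyzer, sketched below; exact scores will vary slightly by NLTK version.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

sia = SentimentIntensityAnalyzer()
text = "The support team was slow, but the product itself is fantastic!"
print(sia.polarity_scores(text))
# e.g. {'neg': 0.13, 'neu': 0.57, 'pos': 0.30, 'compound': 0.67}
```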
Use Cases of Natural Language Processing
Having taken center stage in managing large-scale data, NLP has vast uses in real-world applications. Below are some of its use cases:
Machine Translation
Life has been made easier since the arrival of Google Translate, an example of a widely used NLP tool. Machine translation aims to capture the tone and meaning of the input language and translate it into text in the desired output language. It can also involve translating from one language into many.
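A minimal machine translation sketch using the Hugging Face transformers pipeline (pip install transformers); the first run downloads a pretrained English-to-German model, and the default model choice is the library's, not a universal standard.

```python
from transformers import pipeline

# Default English-to-German model; downloaded on first run.
translator = pipeline("translation_en_to_de")
result = translator("Machine translation aims to preserve tone and meaning.")
print(result[0]["translation_text"])
```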
Chatbots and Virtual Agents
Chatbots learn to recognize contextual cues in human requests and use them to deliver better responses and choices over time. Virtual agents such as Alexa and Siri respond to voice commands with the appropriate action. These applications are being continuously improved to generate responses that sound more natural.
Detection of Spam
Most of the time, spam detection is associated with machine learning in general, but the best spam detection techniques use NLP’s text classification to scan emails for language that indicates scams or phishing. Experts in the field consider spam detection largely solved, thanks to years of increasingly accurate detection.
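A toy sketch of this kind of text classification, using scikit-learn's TF-IDF features and a naïve Bayes classifier; the four training emails are made up for illustration, and a real filter would train on a far larger labeled corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; real filters learn from millions of emails.
emails = [
    "Congratulations, you won a free prize! Click here to claim now",
    "Urgent: verify your account password immediately",
    "Lunch tomorrow at noon?",
    "Here are the meeting notes from Tuesday",
]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(emails, labels)
print(model.predict(["Claim your free prize account now"]))  # -> ['spam']
```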
Social Media Sentiment Analysis
NLP has become a major tool for surfacing hidden insights from the language used in social media posts. It tracks how advertisements, promotions, and opinions are shaped through the use of specific language.
Steps used in Natural Language Processing
There are five major steps in the functioning of NLP. They are as follows:
Morphological and Lexical Analysis
Lexicon refers to the vocabulary of a language, its words and expressions, while morphology is the analysis of how words are formed and structured. In NLP, this analysis breaks the text down into words, sentences, and paragraphs. This marks the first step of the pipeline.
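A minimal sketch of this step with NLTK (pip install nltk): sentence and word tokenization for the lexical part, plus lemmatization for the morphological part. The download calls fetch tokenizer and lemmatizer data on first run, and exact resource names can differ across NLTK versions.

```python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("punkt", quiet=True)    # sentence/word tokenizer models
nltk.download("wordnet", quiet=True)  # lexical database for lemmatization

text = "NLP breaks raw text apart. The sentences become lists of tokens."
sentences = nltk.sent_tokenize(text)                 # text -> sentences
tokens = [nltk.word_tokenize(s) for s in sentences]  # sentences -> words
print(tokens)

# Morphological analysis: reduce an inflected form to its base form.
print(WordNetLemmatizer().lemmatize("breaks", pos="v"))  # -> 'break'
```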
Syntactic Analysis
This step analyzes the words in a sentence to reveal their grammatical structure. Using NLP, the words are arranged in a structure that shows how they relate to each other.
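A small syntactic analysis sketch: part-of-speech tagging with NLTK labels each word with its grammatical role (determiner, noun, verb, and so on). The tagger resource name used here is the classic one and may differ in newer NLTK releases.

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger model

tokens = nltk.word_tokenize("The cat sat on the mat")
print(nltk.pos_tag(tokens))
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'),
#  ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
```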
Semantic Analysis
Semantics is concerned with the literal meaning of words and phrases. In this step, the structure obtained in the previous step is assigned meaning, and ambiguity is minimized.
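One concrete form of semantic analysis is word sense disambiguation. Below is a sketch using NLTK's implementation of the Lesk algorithm, a simple overlap heuristic that can pick an unexpected sense; it is illustrative, not state of the art.

```python
import nltk
from nltk.wsd import lesk

nltk.download("punkt", quiet=True)
nltk.download("wordnet", quiet=True)

# Pick the WordNet sense of "bank" that best overlaps the context words.
sentence = nltk.word_tokenize("I deposited cash at the bank")
sense = lesk(sentence, "bank")
if sense is not None:
    print(sense.name(), "->", sense.definition())
```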
Discourse Integration
At this stage, NLP focuses on the sense of the content as a whole. It highlights how sentences connect and how earlier sentences influence the meaning of later ones.
Pragmatic Analysis
This is the last step in the working of NLP. It is concerned with overall communication and world knowledge, ensuring that what is said matches what is actually meant.
Locating Ethical Issues in Natural Language Processing
Our everyday lives are governed by various moral values, teachings, and limits on what is right and what is wrong. In similar terms, when we make NLP an integral part of life, ethical issues pop up that need to be looked into. Despite being a powerful tool for businesses and individuals, NLP encounters shortcomings and has its own set of limitations to deal with.
Locating bias in Natural Language Processing during Data Generation
The world of artificial intelligence claims to be neutral and objective, but studies have shown that human subjectivity, assumptions, and stereotypes find their way into NLP models as well. Three major biases are encountered at the data generation stage:
Historical Bias
This describes how everyday generalizations and stereotypes creep into how the machine interprets data. For instance, a word like ‘nurse’ is strongly associated with women, reflecting a discriminatory attitude toward a particular gender.
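A hedged probe of this effect in pretrained word embeddings, using gensim's downloader API (pip install gensim); the GloVe vectors are fetched on first use, and the exact similarity numbers depend on which embedding you load.

```python
import gensim.downloader as api

# Load small pretrained GloVe vectors (downloaded on first use).
vectors = api.load("glove-wiki-gigaword-50")

# Historical bias shows up as asymmetric similarity scores.
print(vectors.similarity("nurse", "woman"))  # typically noticeably higher...
print(vectors.similarity("nurse", "man"))    # ...than this
```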
Representation Bias
It occurs when some part of the population is either over-represented or largely neglected in the data. This leads to false generalizations and weak insights from the models.
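A minimal representation check, assuming hypothetical group labels attached to training examples: counting how often each group appears flags over- or under-representation before any model is trained.

```python
from collections import Counter

# Hypothetical demographic tags attached to training examples.
groups = ["group_a"] * 900 + ["group_b"] * 80 + ["group_c"] * 20

counts = Counter(groups)
total = sum(counts.values())
for group, n in counts.most_common():
    print(f"{group}: {n} examples ({n / total:.1%})")
# A heavily skewed distribution is a representation-bias warning sign.
```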
Measurement Bias
This bias arises when a proxy value is measured in place of the value of real interest in the process of choosing, collecting, and computing features. It leads to oversimplification and wrong conclusions.
Identification of Bias after Data Generation
The ethical issues associated with NLP do not subside once data generation is complete but recur at every stage. Below are some concerns encountered at later stages:
Learning Bias
This bias affects under-represented and marginalized subgroups of the population whose information is not accurately captured during model training, and it reduces the reliability of the model. For instance, DP-SGD (differentially private stochastic gradient descent) has been shown to degrade accuracy more for darker-skinned faces than for lighter skin tones.
Evaluation Bias
It arises when the benchmark data used for a specific task fails to represent the entire population. Reports describe how facial analysis algorithms from IBM and Microsoft perform better on white male faces.
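One standard response is disaggregated evaluation: computing accuracy per subgroup instead of a single overall number. The sketch below uses hypothetical predictions, labels, and subgroup tags.

```python
from collections import defaultdict

# Hypothetical (subgroup, true_label, predicted_label) records.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0),
]

hits = defaultdict(int)
totals = defaultdict(int)
for group, truth, pred in records:
    totals[group] += 1
    hits[group] += int(truth == pred)

for group in totals:
    print(f"{group}: accuracy {hits[group] / totals[group]:.0%}")
# A large gap between subgroups signals evaluation (or learning) bias.
```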
Aggregation Bias
This bias arises when a one-size-fits-all approach is employed, which can fail to take into consideration the differences between the various groups in the population.
Deployment Bias
This occurs when there is a mismatch between the problem a model is intended to solve and the way it is actually used and interpreted after deployment.
Other Varied issues in Natural Language Processing
Apart from the biases and ethical issues mentioned above, other limitations are of concern when we talk about NLP.
Ambiguity
It refers to cases where the sentences or phrases used have more than one meaning. Ambiguity is of three types: lexical, semantic, and syntactic.
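Lexical ambiguity is easy to see concretely: a lexical database such as WordNet lists many distinct senses for a single word form. A minimal sketch with NLTK:

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

# One surface form, many senses: the heart of lexical ambiguity.
for sense in wn.synsets("bank")[:3]:
    print(sense.name(), "->", sense.definition())
```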
Errors in text and speech
Commonly used applications and assistants lose efficiency when exposed to misspelled words, different accents, stutters, etc. The lack of linguistic resources and tools for many languages remains a persistent ethical issue in NLP.
Usage of Slang and Colloquial words
New slang is coined on a regular basis, and it is hard to keep up with every phrase that becomes popular for a short period. Similarly, colloquial words have no definite dictionary meaning and present a high chance of problems for NLP.
Conclusion
The responsibility for understanding ethics has to be shared by everyone. It is not just the developers and data scientists who propose the models and programs; as individuals and as customers, everyone has to contribute from their end. The limitations and shortcomings can be addressed by continuously updating the models, knowledge sets, and programming techniques. NLP shares the vision of an impactful future. Today, NLP is a significant AI-driven application that enterprises can harness to create business value. To start defining a strategy and implementation roadmap for your NLP solutions, please connect with us to select the best-suited AI-enabled platform.
We hope that this article was helpful to you and provided valuable insights. Thank you for showing interest in our blog, and if you have any queries, suggestions, feedback, or comments, you can write to us at info@futureanalytica.com.