Text Mining and Analytics using Natural Language Processing.

Isha Gupta
Aug 10, 2020

NMIMS’s Mukesh Patel School of Technology Management and Engineering.

Introduction

Our leap into the online world has changed how we learn about people: the growing volume of text data they produce can, through text mining and analytics, yield knowledge for industries ranging from manufacturing to marketing to life sciences.

Organizations can mine text data to identify categories of users, the intent behind a text, and categories of user reviews and feedback. This can save operational costs, assist with forecasting, and uncover insights that were previously unavailable.

But have you ever wondered how much of this data is generated in a day?

image source: in.pcmag.com

About 90% of all the data in the world was created in the last 24 months, at an average of 2.5 quintillion bytes per day, and about 90% of that is unstructured data such as texts, Tweets, pictures, and videos (Griffith, 2018). The goal of discovering meaning and purpose in this electronic torrent has created the industry of text mining and analytics.

What is Text Mining?

Text mining is the process of examining large collections of documents to discover new information or help answer specific research questions.

It identifies facts, relationships, and assertions. Once extracted, this information is converted into a structured form that can be analyzed further or presented directly using clustered HTML tables, mind maps, charts, etc.
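As a rough illustration of converting extracted information into a structured form, the sketch below pulls percentages and years out of free text and stores them as one record per document. The regular expressions, the `extract_facts` helper, and the sample documents are assumptions made for this example; production systems typically rely on trained extractors rather than hand-written patterns.

```python
import re

# Hypothetical patterns for illustration; real extractors are far richer.
PERCENT = re.compile(r"\b\d+(?:\.\d+)?%")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def extract_facts(doc_id, text):
    """Turn one free-text document into a structured record (a dict)."""
    return {
        "doc_id": doc_id,
        "percentages": PERCENT.findall(text),
        "years": YEAR.findall(text),
    }

# Made-up sample documents.
documents = {
    "d1": "Sales grew 12.5% in 2019 and 8% in 2020.",
    "d2": "No figures were reported.",
}

# One structured record per document, ready for tabulation or charting.
records = [extract_facts(doc_id, text) for doc_id, text in documents.items()]
for record in records:
    print(record)
```

Each record has the same fields, so the output could be loaded into a table or chart without further parsing.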

The seven fundamental techniques in Text Mining:

The seven practice areas of text mining and their six related fields. image source: textanalyticsworld.com
  1. Information Extraction:
    This is the process of extracting meaningful information from vast amounts of unstructured data. It focuses on identifying entities, their attributes, and the relationships between them in semi-structured or unstructured texts.
  2. Information Retrieval:
    Extracting relevant and associated patterns based on a specific set of words or phrases. It covers indexing, searching, and retrieving documents from large text databases with keyword queries.
  3. Natural Language Processing:
    Natural Language Processing (NLP) covers low-level language processing and understanding tasks, such as tokenization and part-of-speech tagging. It supplies the linguistic features that the other techniques rely on when processing and analyzing text documents.
  4. Clustering:
    Clustering is one of the most crucial text mining techniques. It seeks to identify intrinsic structures in textual information and to organize documents into relevant subgroups or ‘clusters’ for further analysis.
  5. Classification:
    Text classification is a form of “supervised” learning in which documents are assigned to a predefined set of categories or topics depending on their content.
    It draws on a variety of categorization methods, such as decision trees, neural networks, regression models, and swarm intelligence.
  6. Concept extraction:
    Grouping of words and phrases into semantically similar groups.
  7. Web mining:
    This applies mining techniques to find information patterns in web data.
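To make the information retrieval technique from the list above more concrete, here is a minimal sketch of indexing and keyword search using an inverted index. The helper names (`tokenize`, `build_index`, `search`) and the three sample documents are assumptions for illustration; real search engines add ranking, stemming, and much more.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Intersect the posting sets; an unknown token yields an empty set.
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)

# Made-up sample collection.
docs = {
    1: "Text mining discovers patterns in documents.",
    2: "Web mining finds patterns in web data.",
    3: "Clustering groups similar documents.",
}
index = build_index(docs)
print(search(index, "mining patterns"))
print(search(index, "documents"))
```

The index is built once and each query only touches the posting sets of its own keywords, which is why this structure suits large text databases.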

What is Text Analytics?

The two terms, text mining and text analytics, mean roughly the same thing.
Mining places more emphasis on the process, so it gives us an algorithmic view of the problem. Analytics, on the other hand, places more emphasis on the result, applying advanced machine learning algorithms and natural language processing (NLP) to the mined text data.

image source: textanalyticsworld.com

A list of algorithms traditionally used for text analytics can be seen in the following table.

image source: textanalyticsworld.com

Applications of Text mining and Analytics:

  1. Identify and categorize important concepts:
    Classify a broad range of entities in text, such as people, places, organizations, dates and times, and percentages, using named entity recognition. Detect and extract identifiable information, including protected health information (PHI), in documents.
  2. Better understand customer perception:
    Detect positive and negative sentiment in social media, customer reviews and other sources to get a pulse on your brand. Use opinion mining to explore customers’ perception of aspects, such as specific attributes of products or services, in text.
  3. Detect the language of your text:
    Evaluate text input in a wide range of languages, variants and dialects with advanced language detection.
  4. Process unstructured medical data:
    Extract insights from unstructured clinical documents such as doctors’ notes, electronic health records and patient intake forms using the health feature of Text Analytics. Recognize, classify and determine relationships between medical concepts such as diagnosis, symptoms and dosage and frequency of medication.
  5. Cybercrime prevention:
    The anonymous nature of the internet and the many communication features it supports contribute to an increased risk of internet-based crime. Today, text mining intelligence and anti-crime applications are making internet crime prevention easier for enterprises and for law enforcement and intelligence agencies.
  6. Social media monitoring:
    Social media is a goldmine of consumer stories and opinion data. Data analysts upload, process and analyze mountains of social text data to understand the conversations surrounding products, brands, people and services.
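As one illustration of the sentiment detection mentioned in the applications above, the sketch below labels reviews by counting words from two small word lists. The `POSITIVE` and `NEGATIVE` sets, the `sentiment` function, and the sample reviews are hypothetical and chosen only for this example; practical systems usually use trained models or much larger lexicons and handle negation and context.

```python
import re

# Tiny hypothetical word lists for illustration; not a real sentiment lexicon.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "hate", "slow", "broken"}

def sentiment(text):
    """Label text by counting positive words minus negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up customer reviews.
reviews = [
    "Great phone, I love the camera",
    "Delivery was slow and the box arrived broken",
    "It arrived on Tuesday",
]
labels = [sentiment(review) for review in reviews]
for review, label in zip(reviews, labels):
    print(f"{label:8} | {review}")
```

A word-count approach like this is easy to inspect but misses phrases such as "not good", which is one reason trained classifiers are generally preferred.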

Postscript

Text analytics offers extraordinary insight into public sentiment on economics, finance, e-commerce, and social and geopolitical issues. This is because human behavior on social media and the digitization of communications are creating a massive corpus of textual data to mine.

References

An Updated Text Analytics Primer: Key Factors in a Text Analytics Strategy
https://towardsdatascience.com/a-text-analytics-primer-key-factors-in-a-text-analytics-strategy-d24dc84a5576

Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications
G. Miner, D. Delen, J. Elder, A. Fast, T. Hill, and R. Nisbet, Elsevier, January 2012

Text Analytics vs Text mining
https://www.educba.com/text-mining-vs-text-analytics/

Text Analytics: 5 Examples To Open Your Eyes To Your Own Opportunities
https://www.zencos.com/blog/text-mining-examples-advanced-analytics/

90 Percent of the Big Data We Generate Is an Unstructured Mess
https://in.pcmag.com/news/126973/90-percent-of-the-big-data-we-generate-is-an-unstructured-mess
