Bringing Light to Dark Data

Steven Astorino
Inside Machine learning
5 min read · Apr 16, 2018

“Dark data” is data acquired through various computer operations that is generally excluded from decision making or analysis because it can be difficult for computers to categorize or understand. Much of this data is unstructured, and some of it is retained only for legal purposes. This IBM Big Data Analytics Hub article estimates that only 20% of data is visible to computer systems, leaving 80% categorized as dark. Rather than leaving that potential untapped, organizations that unlock this data and allow AI systems to learn from it could reveal new insights and knowledge that might yield greater competitive advantage.

IBM Watson Explorer Deep Analytics Edition

In an information-driven world, all the employees within an organization, from the front lines to the executive suite, share a common need: the right information delivered at the right time, in context. The very act of looking for information is time-consuming and imposes a cognitive burden on knowledge workers, which reduces their effectiveness and capacity.

IBM Watson Explorer Deep Analytics Edition (WEXDAE) helps find information scattered across your enterprise by leveraging machine learning (ML) models to identify relevant and related data. With a redesigned Content Miner, organizations can discover trends and anomalies hidden in unstructured data, often in just minutes.

WEXDAE introduces the oneWEX platform, a containerized, next-generation cognitive exploration and content analytics solution. It’s built from the ground up to include native support for cognitive capabilities on private cloud, and it leverages the latest foundational and analytical components.

It has a highly scalable architecture that is easy to deploy and configure, providing improved “embed-ability” with modularized services for all key functionalities including text analytics — with plans to support other private cloud deployments.

Three verbs — Explore, Analyze, Advise — put WEXDAE into context:

1. Explore: Revealing insights from data across the enterprise

WEXDAE combines cognitive capabilities with a hybrid cloud architecture, allowing customers to keep their most sensitive data on-premises while unlocking new insights in the cloud from enterprise and third-party content enriched with Natural Language Processing (NLP). Customers can take advantage of the scalability and flexibility of cloud-based technology, which goes beyond returning documents to extracting answers from within those documents. In short, it can help organizations:

  • surface information from both internal and external data sources — news, financial data, social media, other general web data
  • find information from across your enterprise
  • build a custom dashboard to dynamically deliver relevant information

2. Analyze: Understanding the “Why” behind the “What”

WEXDAE identifies phrases, sentiment and connections in data. It visually guides users through data using an interactive dashboard to find anomalies.

The Content Miner capability provides an interactive interface to help business professionals mine large amounts of text for new business insights. Textual information is analyzed and visualized using a series of views that can show trends, patterns, and anomalies in information. For example, a user might see a rising number of references to a specific component or product in call center logs over time, indicating the need to investigate whether the increased references indicate a problem or new interest in a product capability. Out of the box, there are ten analytic frameworks for detecting patterns in unstructured content. It works hand in hand with the Watson Explorer Content Analytics Studio, enabling your organization to create and use content analytics projects without requiring extensive programming or coding.
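To make the call center example concrete, here is a minimal sketch of the underlying idea: count how often a phrase appears in log entries per month and check whether the count keeps rising. This is not how Content Miner is implemented; the log entries, the phrase, and the function names `mentions_per_month` and `is_rising` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical call center log entries: (month, free-text note).
LOGS = [
    ("2018-01", "Customer asked about delivery times."),
    ("2018-01", "Freezer door seal is loose."),
    ("2018-02", "Door seal came off after a week."),
    ("2018-02", "Second door seal replacement requested."),
    ("2018-03", "Door seal leaking cold air."),
    ("2018-03", "Replaced door seal twice already."),
    ("2018-03", "Door seal missing on arrival."),
]


def mentions_per_month(logs, phrase):
    """Count log entries per month whose text contains the phrase (case-insensitive)."""
    counts = Counter()
    needle = phrase.lower()
    for month, text in logs:
        counts[month] += int(needle in text.lower())
    # Sort by month so the counts read in chronological order.
    return dict(sorted(counts.items()))


def is_rising(counts):
    """Return True when the counts strictly increase from each month to the next."""
    values = list(counts.values())
    return len(values) > 1 and all(a < b for a, b in zip(values, values[1:]))


trend = mentions_per_month(LOGS, "door seal")
print(trend, "rising:", is_rising(trend))
```

A steadily rising count like this one only signals that something deserves a closer look. Deciding whether it reflects a defect or new customer interest still requires reading the underlying entries.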

3. Advise: Guiding users to effective decisions

Too often, valuable data and insights are obscured or difficult to access. If organizations can unlock these insights, they gain the opportunity to leverage data into measurable business benefits. The new cognitive assistant shown in figure #1 helps provide users with the specific data they’re looking for by helping them filter out extraneous information. Combined with WEXDAE’s machine learning capabilities, user queries can be more relevant and effective than ever before.

The primary business benefit of WEXDAE across all types of applications, whether delivering general information access, analytics, or advanced cognitive capabilities, is to scale expertise by providing users with the best information possible. The cognitive assistant helps reduce the cognitive burden, allowing users to focus on leveraging data insights instead of wasting time trying to find information.

How WEXDAE works: Leveraging machine learning for cognitive advice

WEXDAE leverages an end-to-end, ML-based classifier to recommend an action for a given document.

The classifier uses NLP to extract words and phrases, along with structured metadata, as inputs for cognitive decision-making.

Finally, WEXDAE uses the training data to build a machine learning model for classification.
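The general pattern described here (extract words from text, train on labeled examples, then recommend an action for a new document) can be sketched in a few lines of plain Python. The example below uses a small multinomial Naive Bayes classifier as a stand-in technique. It is an assumption for illustration, not the model WEXDAE actually uses, and the training documents, action labels, and class name are hypothetical. Structured metadata is left out to keep the sketch short.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple stand-in for NLP extraction)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, documents, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training documents, each labeled with a recommended action.
TRAIN_DOCS = [
    "refund requested for damaged item",
    "item arrived damaged please refund",
    "how do I reset my password",
    "cannot log in password reset fails",
]
TRAIN_ACTIONS = ["escalate", "escalate", "self_service", "self_service"]

model = NaiveBayesTextClassifier().fit(TRAIN_DOCS, TRAIN_ACTIONS)
print(model.predict("my order arrived damaged"))
print(model.predict("password reset link not working"))
```

With only four training documents the model is a toy. The point is the shape of the workflow: the quality of the recommended action depends on how representative the training data is.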

Customer Churn Scenario

Dan works as a Data Scientist at a food retail company. Based on a customer’s recent complaint, he wants to predict whether that customer will stop shopping (churn). Dan’s historical data includes “Product Info”, “Customer Info”, “Complaint (text data)” and “Indicator of churn”.

WEXDAE provides an API wrapper (in Python), and Dan uses the oneWEX API within his Data Science Experience (DSX) notebook.

WEX Document Classification can convert “Text Data” to “Nominal Data” so that text data can be an input to create a prediction model using other ML libraries.
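The sketch below illustrates what converting text to nominal data means in Dan's scenario. It does not call the oneWEX API, whose function names are not shown in this post. Instead, `classify_complaint` is a hypothetical keyword-based stand-in for the document classifier, and the customer fields (`visits_per_month`, `avg_basket`) and complaint categories are assumed for illustration. The resulting category is one-hot encoded so it can sit alongside numeric columns as input to any other ML library.

```python
# Hypothetical complaint categories and trigger words; a real classifier would learn these.
COMPLAINT_KEYWORDS = {
    "freshness": ("stale", "expired", "spoiled"),
    "price": ("expensive", "overpriced", "price"),
    "service": ("rude", "slow", "queue"),
}
CATEGORIES = list(COMPLAINT_KEYWORDS) + ["other"]


def classify_complaint(text):
    """Stand-in for document classification: map free text to one nominal category."""
    lowered = text.lower()
    for category, keywords in COMPLAINT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def one_hot(category):
    """Encode a nominal category as a 0/1 vector in CATEGORIES order."""
    return [int(category == c) for c in CATEGORIES]


def build_feature_row(customer):
    """Combine numeric customer fields with the encoded complaint category."""
    category = classify_complaint(customer["complaint"])
    return [customer["visits_per_month"], customer["avg_basket"]] + one_hot(category)


# Hypothetical customer records.
customers = [
    {"visits_per_month": 4, "avg_basket": 32.5, "complaint": "The milk was expired again."},
    {"visits_per_month": 1, "avg_basket": 12.0, "complaint": "Staff were rude at checkout."},
]

for customer in customers:
    print(classify_complaint(customer["complaint"]), build_feature_row(customer))
```

Each row is now purely numeric, so it can be paired with the “Indicator of churn” column and passed to a prediction model.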

Summary and Next Steps

This blog post only scratches the surface of the capabilities found in WEXDAE. Its cognitive assistant helps guide users to better insights without the need for advanced data science skills. Users can build Watson-infused enterprise solutions using the embeddable engine and augment an organization’s knowledge workers with advice from AI models.

This video provides further insight along with this white paper. As always, my advice is to try the technology, which you can do by signing up for a free trial of the Watson Explorer Community Edition.

Steven Astorino
Inside Machine learning

Vice President of Development, Data and AI. Tweets and opinions are my own https://stevenastorino.com