Helping Provide Insight in an Unstructured World

Steven Astorino
Inside Machine learning
6 min read · Sep 18, 2018

In my previous blog I talked about the coming together of SPSS with the IBM Data Science Experience, a valuable add-on to the Data Science Experience platform (since renamed Watson Studio). In this blog I look at another addition to the platform: the pairing of Watson Explorer with Watson Studio.

Data Scientists — Traditionalists and Mavericks

If we strip away everything but the analysis portion, I believe data scientists generally fall into two main camps: the traditionalists and the mavericks.

  • Traditionalists are used to and prefer packaged solutions. Analytics are a means to an end, and there are typically multiple people working together on various analytics projects either within a department or across a given company.
  • Mavericks are coders by nature and experimenters at heart. They don’t look for packages to help with analysis but instead look for specific techniques or a way to play with an unusual data type (such as streaming or social data). Mavericks may start with an application they want to create in mind and then come up with takes on the available data that can be woven together. The maverick tends to work on the data and the analysis, then passes the result on to someone else to interpret and operationalize.

Both of these profiles need to be taken into account by an analytic platform.

IBM Watson Explorer (WEX) can help organizations understand the “why” and “how” by recognizing patterns in unstructured data, and can help deliver actionable insights that support decisions based on that data. Let’s look at three key areas of WEX:

  • Explore: Find information scattered across your enterprise fast, by leveraging ML models for relevancy
  • Analyze: Discover trends and anomalies hidden in unstructured data rapidly — using the cognitive content miner
  • Advise: Introducing Cognitive Advice — suggest the next potential action with confidence using machine learning models for text analytics and classification

WEX can help extract information and context from unstructured text data so that it can be used like structured data. Key information within a document is extracted and describes the document as metadata in the form of keywords or facets. Statistical scores such as frequency and correlation are computed to further describe the extracted information. The facets are captured in a text index and used to enhance findability, enable the discovery of unknown patterns using the content miner, and provide the basis for the delivery of cognitive advice.
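To make the idea of frequency and correlation scores concrete, here is a minimal sketch in plain Python. It is not the WEX implementation or its API: the sample documents, the function names, and the choice of a simple observed-versus-expected co-occurrence ratio as the "correlation" score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, each already reduced to a set of extracted keywords.
docs = [
    {"refund", "long wait", "rude staff"},
    {"long wait", "rude staff"},
    {"refund", "damaged item"},
    {"long wait", "rude staff", "parking"},
]


def facet_frequency(docs):
    """Count how many documents mention each keyword."""
    return Counter(kw for doc in docs for kw in doc)


def cooccurrence_ratio(docs):
    """Ratio of observed to expected co-occurrence for each keyword pair.

    A value above 1 means the pair appears together more often than it
    would if the two keywords were independent (an illustrative score only).
    """
    n = len(docs)
    freq = facet_frequency(docs)
    pair_counts = Counter(
        pair for doc in docs for pair in combinations(sorted(doc), 2)
    )
    return {
        pair: (count * n) / (freq[pair[0]] * freq[pair[1]])
        for pair, count in pair_counts.items()
    }


freq = facet_frequency(docs)
ratio = cooccurrence_ratio(docs)
```

In this toy data, "long wait" and "rude staff" always appear together, so their ratio is above 1, which is the kind of signal a content mining view is meant to surface.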

WEX for Watson Studio is a new offering, positioned as shown in Figure #1, that provides the text analytics and content mining capabilities of Watson Explorer as an integrated add-on to IBM Watson Studio. With this new offering, both novice and expert data scientists can more easily:

  • Extend an analysis to unstructured data using visual capabilities
  • Directly access and analyze the native data sources of Watson Studio
  • Manage Watson Explorer collections as a set of Watson Studio assets
  • Use the Watson Explorer Content Miner to see unknown patterns in text sources and refine information extraction
Figure #1: Watson Explorer for Watson Studio portfolio positioning

WEX for Watson Studio Integration Scenarios — A closer look

This first scenario introduces the use of Watson Explorer in tandem with SPSS Modeler for Watson Studio.

The new WEX for Watson Studio add-on introduces an SPSS Watson Explorer node to Watson Studio. The node extracts keywords or facets from text documents and creates structured output that feeds an SPSS prediction model.

The second scenario introduces Python library access to Watson Explorer APIs. API access gives a Watson Studio Notebook access to all the information in a Watson Explorer document collection, including statistical scores. This can assist data scientists in creating documents that combine code, equations, visualizations and narratives with keywords, facets and statistical scores from Watson Explorer.

A subset of Watson Explorer capabilities is integrated into this new add-on for Watson Studio, including seamless integration of the Cognitive Content Miner and Administrator user interface.

Integration Scenario #1

WEX for Watson Studio provides a set of standard annotators for extracting basic and advanced information from unstructured data. Annotators represent a model of patterns for understanding the text. An administrator or data scientist defines the sources of information to be accessed, which can include the available Watson Studio platform data sources.

When developing a text analytics model, a data scientist can select and enable out-of-the-box annotators. A typical configuration for information extraction would include the parts of speech annotator, which provides basic linguistic analysis including phrase constituents. Sentiment analysis and named entity recognition can also be enabled.

Users can use the dictionary annotator along with a new feature called the “domain curator” to create dictionaries from keywords, taxonomies and ontologies.
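As a rough illustration of what a dictionary-based annotator does, the sketch below matches dictionary terms in a piece of text and tags each match with a facet. This is a simplified stand-in written in plain Python, not the WEX dictionary annotator or the domain curator; the dictionary contents, the facet labels, and the `annotate` function are hypothetical.

```python
import re

# Hypothetical dictionary: surface form -> facet label.
DICTIONARY = {
    "refund": "billing",
    "overcharged": "billing",
    "long wait": "service",
    "rude": "service",
}


def annotate(text, dictionary=DICTIONARY):
    """Return (start, end, matched text, facet) for each dictionary match.

    Matching is case-insensitive and restricted to whole words or phrases.
    Results are sorted by their position in the text.
    """
    annotations = []
    for term, facet in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((m.start(), m.end(), m.group(0), facet))
    return sorted(annotations)


result = annotate(
    "Long wait at checkout, and the cashier was rude about my refund."
)
```

Each tuple in `result` pairs a span of text with a facet, which is the kind of structured annotation that can later be counted, indexed, or passed to a model.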

The output of this activity is a Watson Explorer node which represents the text analytics model.

The next step is to create an SPSS Modeler flow that uses a Watson Explorer node, shown in Figure #2.

Figure #2: SPSS Watson Explorer natural language processing in a model flow

The Watson Explorer node, representing the Watson Explorer NLP model created in the first step, is inserted into the SPSS Modeler flow.

The user then refines a prediction model to incorporate keywords — for example those extracted from customer complaints about a retailer’s location. The input data source contains a text body that might describe a recent event in a retailer’s store. Text information provides context as to why a customer was dissatisfied with an experience in a particular store location.

The Watson Explorer node produces a table of the keywords extracted from the body text, which is then used to refine and improve a customer churn prediction model.
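The step of turning extracted keywords into model inputs can be sketched as follows. This is only an illustration in plain Python, not SPSS Modeler or the Watson Explorer node: the customer records, the keyword table layout, and the `add_keyword_features` helper are assumptions. The idea is to convert each keyword into a 0/1 column that sits alongside the existing structured fields.

```python
# Hypothetical structured customer data keyed by customer id.
customers = {
    101: {"tenure_months": 24, "monthly_spend": 80.0},
    102: {"tenure_months": 3, "monthly_spend": 35.0},
}

# Hypothetical keyword table as (customer_id, keyword) rows, standing in
# for the output of a text extraction step.
keyword_rows = [
    (101, "long wait"),
    (102, "rude staff"),
    (102, "long wait"),
    (102, "refund"),
]


def add_keyword_features(customers, keyword_rows):
    """Add one 0/1 indicator column per keyword to each customer record."""
    vocabulary = sorted({kw for _, kw in keyword_rows})
    mentioned = {}
    for cid, kw in keyword_rows:
        mentioned.setdefault(cid, set()).add(kw)

    features = {}
    for cid, row in customers.items():
        seen = mentioned.get(cid, set())
        flags = {
            "kw_" + kw.replace(" ", "_"): int(kw in seen) for kw in vocabulary
        }
        features[cid] = {**row, **flags}
    return features


features = add_keyword_features(customers, keyword_rows)
```

A churn model trained on `features` can then weigh signals such as `kw_rude_staff` together with tenure and spend, which is the sense in which text context refines the prediction.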

After creating the Watson Explorer model and customizing the SPSS Modeler flow, the SPSS prediction model, which includes the WEX NLP runtime extraction model, can be run in the flow as an SPSS worker job on Watson Studio Deployment Manager.

Integration Scenario #2

The second use case scenario features the integration of Watson Explorer APIs into a Watson Studio Notebook. Information in a WEX collection, from document annotations (extracted keywords, entities and sentiment) to analytics such as frequency and correlation, is accessible in a Notebook for analysis and collaboration. The add-on integrates a subset of WEX into Watson Studio, helping organizations extend their collaborative data analytics environment to leverage important data trapped in disparate text-based data sources.

This Notebook integration makes information in a WEX collection accessible through the WEX API. The resulting Notebooks, which include WEX text analytics models, can be deployed to Watson Studio Model Management & Deployment.

The SPSS Modeler for Watson Studio integration uses the new WEX node to augment flows with structured data created from unstructured sources.

Organizations can use WEX with Watson Studio in a range of development and deployment scenarios. They can start by using the Notebook integration available in WEX Deep Analytics Edition and use the WEX for Watson Studio add-on when their needs progress to using the WEX node with SPSS Modeler for Watson Studio.

Summary

I’m excited to see this new addition to IBM Watson Studio. WEX can help offer value with its combination of open-standards-based Apache UIMA text analytics, natural language processing and content mining capabilities, helping to deliver an environment for detecting patterns in unstructured data. Let me close with the news that IBM was named a leader in the report The Forrester Wave™: AI-Based Text Analytics Platforms.

For more information on WEX for Watson Studio click here.

Steven Astorino

Vice President of Development, Data and AI. Tweets and opinions are my own https://stevenastorino.com