How to Build Entity Recognition Models in a Jiffy using Watson Discovery-III

Jade Zhou
Inside Machine learning
4 min read · Oct 16, 2019

Written by Kunal Sawarkar & Jade Zhou

A step by step guide to understand how data scientists can upend their game with sophisticated but efficient handling of NLP issues — while exploiting the integration of Watson Discovery with Jupyter Notebooks for business applications.

This is the final chapter in a three-part series on this topic. The previous stories cover how Watson Discovery can help business analysts jump-start entity recognition model building with zero coding, and how Watson Knowledge Studio empowers subject matter experts with self-service, domain-aware entity extraction capabilities.

Although Watson Discovery has a good graphical interface that lets people unfamiliar with coding build their own entity recognition models, we also provide APIs in different languages so developers can integrate the Watson Discovery service into their applications. Continuing with our entity problem: even after company entities have been identified in emails, that is by no means the end of the story. These entities are often used for other purposes, such as email routing.

Let’s assume you have multiple account managers who handle trading requests. “Peter” is one of these managers and is responsible for a bank beginning with “A”. “Mary” is another manager, responsible for a different bank beginning with “B”. To avoid Peter and Mary having to read all emails, we need an engine that can analyze the various trading actions across the different banks referred to in numerous emails, then allocate each email to the corresponding account manager. This is where the Watson Discovery APIs can be leveraged. Now let’s build that engine in Python by consuming results from the Watson Discovery API.

First, we need to connect to the API using the correct credentials. Username and password refer to your account information on Cloud Pak for Data. The url can be found in the Watson Discovery instance page below.

from ibm_watson import DiscoveryV1

# Replace each *** with your own values
discovery = DiscoveryV1(
    version='2019-08-21',
    username='***',
    password='***',
    icp4d_url='***',
    authentication_type='icp4d',
    url='***'
)
# Turn off SSL certificate verification for this client
discovery.disable_SSL_verification()
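
Hard-coded credentials are easy to leak when a notebook is shared. Below is a minimal sketch of one alternative, assuming the values are exported as environment variables beforehand. The variable names (`CP4D_USERNAME` and so on) and the helper `discovery_kwargs` are hypothetical and not part of the Watson SDK; the helper only gathers the same keyword arguments used in the connection snippet above.

```python
import os

# Hypothetical environment variable names; choose whatever fits your deployment.
ENV_TO_ARG = {
    "CP4D_USERNAME": "username",
    "CP4D_PASSWORD": "password",
    "CP4D_URL": "icp4d_url",
    "DISCOVERY_URL": "url",
}


def discovery_kwargs(env=os.environ):
    """Collect the keyword arguments for the DiscoveryV1 constructor."""
    missing = [name for name in ENV_TO_ARG if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    kwargs = {arg: env[name] for name, arg in ENV_TO_ARG.items()}
    kwargs["version"] = "2019-08-21"
    kwargs["authentication_type"] = "icp4d"
    return kwargs


# Demo with a plain dict standing in for os.environ (placeholder values only).
example_env = {
    "CP4D_USERNAME": "analyst",
    "CP4D_PASSWORD": "not-a-real-password",
    "CP4D_URL": "https://cpd.example.com",
    "DISCOVERY_URL": "https://cpd.example.com/discovery",
}
print(sorted(discovery_kwargs(example_env)))
```

The result could then be passed along as `DiscoveryV1(**discovery_kwargs())`, so no secret has to appear in the notebook itself.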

Next, let’s send an empty query to our collection to take a look at what the response looks like.

import json

env_id = 'default'
collections = discovery.list_collections(env_id).get_result()
# Index 4 picks one collection from the list; adjust it to match your own instance
query_results = discovery.query(
    env_id,
    collections['collections'][4]['collection_id'],
    return_fields='enriched_text',
    count=93
).get_result()

Fig 1. Query Results From Watson Discovery Python API

Basically, the response contains each document's document_id and its enrichments. The enrichments include the extracted company entities, which we will use to build the engine. If you add other enrichments, such as sentiment analysis, they will show up here as well.
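
To make that structure concrete, here is a small sketch that lists the entity texts found in each document. The sample response is made up for illustration; its shape (a `results` list whose items carry `document_id` and an `enriched_text` list with `entities`) is inferred from the routing code in this article, not taken from the API reference, and the helper name `entities_by_document` is hypothetical.

```python
# Hypothetical response, shaped like the one the routing code in this article expects.
sample_results = {
    "results": [
        {
            "document_id": "doc-1",
            "enriched_text": [{"entities": [
                {"text": "boa", "type": "Company"},
                {"text": "\ufeffbarclay", "type": "Company"},
            ]}],
        },
        {"document_id": "doc-2", "enriched_text": [{"entities": []}]},
        {"document_id": "doc-3"},  # a document with no enrichment at all
    ]
}


def entities_by_document(response):
    """Map each document_id to the list of entity texts found in it."""
    found = {}
    for doc in response.get("results", []):
        texts = []
        for block in doc.get("enriched_text", []):
            for entity in block.get("entities", []):
                # Drop any byte-order mark that came along with the email text.
                texts.append(entity["text"].replace("\ufeff", ""))
        found[doc["document_id"]] = texts
    return found


for doc_id, texts in entities_by_document(sample_results).items():
    print(doc_id, texts)
```

Running the same helper on the real `query_results` is a quick way to see which spellings of a company name actually occur before writing the routing lists.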

Finally, we can get the list of corresponding emails for different managers based on what entities we found in the email.

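def email_allocate(company_lst):
    """Return the document_ids of emails that mention any company in company_lst."""
    eml_lst = []
    for doc in query_results['results']:
        try:
            enrichment = doc['enriched_text']
            for entity in enrichment[0]['entities']:
                # Strip any byte-order mark before comparing the entity text
                if entity['text'].replace('\ufeff', '') in company_lst:
                    eml_lst.append(doc['document_id'])
                    break  # one match is enough to route this email
        except (KeyError, IndexError, TypeError):
            # Skip documents that have no usable enrichment
            continue
    return eml_lst


peter_company_lst = ['bank of america', 'boa', 'bofa', 'bac']
mary_company_lst = ['barclay', 'BAR']
peter_email_list = email_allocate(peter_company_lst)
mary_email_list = email_allocate(mary_company_lst)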

Now, it’s time to embed this function into your enterprise application to finish the email routing task.

Fig 2. Routed Email List

Peter and Mary should now receive only the emails relevant to their responsibilities. Routing the emails to them automatically significantly improves process efficiency. This can be integrated into any front-end application for action and visualization. Note that the extracted entities include all variations, abbreviations and formats of a given company name, so Peter will receive emails that have actionable items for Bank of A, BOA, bofa, BAC, etc.
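
With more than two managers, calling the allocation function once per manager gets repetitive. The sketch below is one possible variation, not the authors' implementation: it routes every manager in a single pass and compares names case-insensitively, which is a deliberate change from the exact-match comparison used earlier. The helper names and the sample response are hypothetical.

```python
def normalize(name):
    """Lower-case a name and strip whitespace and byte-order marks."""
    return name.replace("\ufeff", "").strip().lower()


def route_emails(response, manager_companies):
    """Return {manager: [document_id, ...]} for all managers in one pass."""
    # Invert the table: normalized company alias -> manager name.
    alias_to_manager = {
        normalize(alias): manager
        for manager, aliases in manager_companies.items()
        for alias in aliases
    }
    routed = {manager: [] for manager in manager_companies}
    for doc in response.get("results", []):
        for block in doc.get("enriched_text", []):
            for entity in block.get("entities", []):
                manager = alias_to_manager.get(normalize(entity["text"]))
                # Route each email to a given manager at most once.
                if manager and doc["document_id"] not in routed[manager]:
                    routed[manager].append(doc["document_id"])
    return routed


# Made-up data for illustration only.
manager_companies = {
    "Peter": ["bank of america", "boa", "bofa", "bac"],
    "Mary": ["barclay", "BAR"],
}
sample_response = {
    "results": [
        {"document_id": "doc-1", "enriched_text": [{"entities": [{"text": "BOA"}]}]},
        {"document_id": "doc-2", "enriched_text": [{"entities": [{"text": "Barclay"}]}]},
        {"document_id": "doc-3", "enriched_text": [{"entities": [{"text": "bofa"}, {"text": "bar"}]}]},
        {"document_id": "doc-4", "enriched_text": [{"entities": [{"text": "citi"}]}]},
    ]
}
print(route_emails(sample_response, manager_companies))
```

An email that mentions both banks is routed to both managers, and an email that mentions neither is left unassigned, so a fallback owner may be worth adding in a real application.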

Lastly, data scientists can combine results from Watson Discovery with the output of other NLP tools (such as NLTK or LDA topic models) using the same methods. This extends the above model and integrates it into a broader NLP pipeline.
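
One simple way to do this is to join the two outputs on `document_id`. The sketch below assumes some other NLP step has already produced one label per document; the `topic_by_doc` dictionary, its labels, and the helper name are invented for illustration and do not come from NLTK or any LDA library.

```python
def combine_by_document(routed_ids, labels_by_doc, default="unknown"):
    """Pair each routed document_id with the label from another NLP step."""
    return [(doc_id, labels_by_doc.get(doc_id, default)) for doc_id in routed_ids]


peter_ids = ["doc-1", "doc-3"]  # stand-in for a routed email list
topic_by_doc = {"doc-1": "settlement", "doc-2": "fx"}  # placeholder topic labels

print(combine_by_document(peter_ids, topic_by_doc))
```

Documents that the other step did not label fall back to the `default` value instead of raising an error.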

Just to recap, Watson Discovery on Cloud Pak for Data includes the items shown in the image below.

Fig 3. Overview of Watson Discovery on Cloud Pak For Data
  • Smart Document Understanding - Visually annotate and enrich documents, including image formats using OCR
  • Integration with Watson Knowledge Studio for domain specific problems
  • Identify entities in customer conversations using Watson Assistant

Github Link:

https://github.com/greenorange1994/EmailRoutingByWatsonDiscovery
