Developing with IBM Watson Retrieve and Rank: Part 3 Custom Features

IBM Watson Natural Language Classifier

Machine Learning Features

R&R has a set of native feature scorers that score the lexical overlap between a given query/document pair. Those scores are generated through a custom Solr plugin in Retrieve. The scores are then sent to the ranker, which outputs a ranked list of documents for the query based on its training. Depending on the Solr configuration, each feature scores various fields or combinations of fields within each document. Figure 1 shows an example of a call center solution where each document represents an incident and contains text fields for Short Description, Long Description and Tech Notes. Each numeric score in the table represents the output of a given feature scorer whose inputs are the query “What does the X221 error message mean?” and a document field.

Fig. 1 Retrieve and Rank example flow
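To make the feature-scoring step concrete, here is an illustrative sketch, not the actual Solr plugin: a toy overlap metric applied to each field of a hypothetical incident document. The field names and the scoring function are simplified assumptions.

```python
def overlap_score(query, field_text):
    """Toy lexical feature: fraction of query terms that appear in the field."""
    q_terms = set(query.lower().split())
    f_terms = set(field_text.lower().split())
    return len(q_terms & f_terms) / len(q_terms) if q_terms else 0.0

query = "What does the X221 error message mean?"

# Hypothetical incident document from the call center example
doc = {
    "short_description": "X221 error on startup",
    "long_description": "Users report the X221 error message when ...",
    "tech_notes": "Firmware patch resolves X221",
}

# One score per field -> the feature vector that Retrieve sends to the ranker
features = {field: overlap_score(query, text) for field, text in doc.items()}
```

The real plugin computes more sophisticated lexical features, but the shape of the output is the same: a vector of per-field scores for each query/document pair.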

Custom Features

For many R&R implementations, these native lexical features are sufficient to meet the success criteria of the application, e.g. migrate a search implementation to the cloud, enable natural language search, or drive x% higher relevance for some subset of query types. Once the R&R cluster is in production, we can imagine improving relevance over time as we collect and refine additional training data. That improvement would look something like Figure 2, where the x axis represents time and the y axis represents some average relevance metric.

Fig. 2 Relevance Improvement over Time

Lexical Answer Type

In the Jeopardy! system, the first stage of the question answering pipeline was called question analysis. One of the key tasks in question analysis was to identify the lexical answer type (LAT), the term in the question that indicates what entity type should be returned as the answer. For example, the LAT in “What is the capital of California?” is capital, and the system should further generalize that to capital city so that it knows to return a city as its answer.

Natural Language Classifier

From WDC Documentation:

The Natural Language Classifier (NLC) service can help your application understand the language of short texts and make predictions about how to handle them. The service supports English, Arabic, French, Italian, Japanese, Portuguese, and Spanish. A classifier learns from your example data and then can return information for texts that it is not trained on. The service employs a new set of technologies known as “deep learning” to get the best performance possible. Deep learning is a relatively recent set of approaches with similarities to the way the human brain works. Deep learning algorithms offer state-of-the-art approaches in image and speech recognition, and the Natural Language Classifier now applies the technologies to text classification.

For our NIDDK question-answering application, we train a classifier on the following question types:
  • Cause — What causes Appendicitis?
  • Symptom — What are the symptoms of Appendicitis?
  • Complications — What are complications associated with Appendicitis?
  • Diagnosis — How is Appendicitis diagnosed?
  • Treatment — How is Appendicitis treated?
  • Prevention — How do I prevent Appendicitis?
A sample of the training data (nlc_train.csv), where * stands in for the condition name:
How does * develop?,condition_cause
What causes *?,condition_cause
what is the cause of *?,condition_cause
How does * develop,condition_cause
What causes *,condition_cause
How to prevent *?,condition_cause
what is the cause of *,condition_cause
What is *?,condition_definition
explain *,condition_definition
more detail on *,condition_definition
whats *?,condition_definition
what are *,condition_definition
what is *?,condition_definition
what are the symptoms of *?,condition_symptom
What are the symptoms of a *,condition_symptom
what are the signs of *?,condition_symptom
signs that I have *?,condition_symptom
How do I know if I have *?,condition_symptom
symptoms for *,condition_symptom
are * a sign that I have *?,condition_symptom
what are the complications of *?,condition_complications
complications of *,condition_complications
what complications arise from *?,condition_complications
* complications,condition_complications
what issues come from *?,condition_complications
How is * diagnosed?,condition_diagnosis
* diagnosis,condition_diagnosis
How would I know if I have *?,condition_diagnosis
Tests for *?,condition_diagnosis
Diagnose *,condition_diagnosis
What are treatment options for *?,condition_treatment
* treatment,condition_treatment
is there a cure for *?,condition_treatment
cure *,condition_treatment
how to treat *,condition_treatment
treatment for *,condition_treatment
effective ways to heal *,condition_treatment
How can I avoid *?,condition_prevention
How to prevent *,condition_prevention
What are ways to prevent *?,condition_prevention
* prevention,condition_prevention
what are ways not to get *?,condition_prevention
Train the classifier, check its training status, and classify a test question:
curl -u "username":"password" -F training_data=@nlc_train.csv -F training_metadata="{\"language\":\"en\",\"name\":\"niddk_classifier\"}" ""
curl -u "username":"passw" ""
curl -G -u "username":"password" " is Proctitis"

LAT Custom Feature Logic

In our query-document scorer, we need to compare the LAT identified by NLC to the document returned by Solr so that we can return a definition for a definition question, symptoms for a symptom question, etc. We could use NLC to classify sections of documents as well, but in our case we can use the structure of the documents to simplify this process. Each document in our NIDDK dataset has a section title such as “What is *”, “What causes *”, “What are the symptoms of *” and so on. Recall from Part 1 that while segmenting and formatting our documents for Solr, we added basic logic that tagged each section based on these titles. As a result, we have a field in Solr called doc_type that maps to each of our LAT types. Our feature scorer logic will look like the following:

Question = What are the symptoms of Appendicitis?

curl -G -u "1a288545-7f80-4502-99ac-ef420c91f17a":"oVeEI6ncmaTF" ""

{
  "classifier_id" : "9a8879x44-nlc-969",
  "url" : "",
  "text" : "What are the symptoms of Appendicitis",
  "top_class" : "condition_symptom",
  "classes" : [
    { "class_name" : "condition_symptom", "confidence" : 0.9526760239113281 },
    { "class_name" : "condition_complications", "confidence" : 0.01749617382185962 },
    { "class_name" : "condition_cause", "confidence" : 0.011224721995999037 },
    { "class_name" : "condition_diagnosis", "confidence" : 0.006870593080181918 },
    { "class_name" : "condition_prevention", "confidence" : 0.004564829920859066 },
    { "class_name" : "condition_definition", "confidence" : 0.0036075458415429787 },
    { "class_name" : "condition_treatment", "confidence" : 0.0035601114282294262 }
  ]
}

Doc 1:
{
  "id": "a9e69b96-099e-4a02-b1ae-96a0956c484b",
  "source": "Appendicitis",
  "doc_type": "symptom",
  "topic": "What are the symptoms of appendicitis?",
  "text_description": "The symptoms of appendicitis are typically easy for a health care provider to diagnose. The most common symptom of appendicitis is abdominal pain. Abdominal pain ..."
}

Doc 2:
{
  "id": "b5eae497-17df-4510-8c07-12d8e18bd6bc",
  "source": "Appendicitis",
  "doc_type": "definition",
  "topic": "What is appendicitis?",
  "text_description": "Appendicitis is inflammation of the appendix. Appendicitis is the leading cause of emergency abdominal operations. Spirt MJ. Complicated intra-abdominal infections: a focus on appendicitis and diverticulitis. Postgraduate Medicine. 2010;122(1):39–51."
}

Doc 1 LAT Feature Score = 0.9526760239113281
Doc 2 LAT Feature Score = 0.0036075458415429787
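The scoring logic above can be sketched in a few lines of Python. The doc_type-to-class mapping and function name here are illustrative, not the actual rr_scorers code: the feature score for a document is simply the NLC confidence for the class that matches the document's doc_type.

```python
# Map Solr doc_type values to the NLC class names they correspond to
DOC_TYPE_TO_CLASS = {
    "definition": "condition_definition",
    "cause": "condition_cause",
    "symptom": "condition_symptom",
    "complications": "condition_complications",
    "diagnosis": "condition_diagnosis",
    "treatment": "condition_treatment",
    "prevention": "condition_prevention",
}

def lat_feature_score(nlc_classes, doc_type):
    """Return the NLC confidence for the class matching the doc's doc_type.

    nlc_classes is the "classes" array from the NLC response:
    [{"class_name": ..., "confidence": ...}, ...]
    """
    target = DOC_TYPE_TO_CLASS.get(doc_type)
    for c in nlc_classes:
        if c["class_name"] == target:
            return c["confidence"]
    return 0.0

# Abridged NLC response for "What are the symptoms of Appendicitis"
nlc_classes = [
    {"class_name": "condition_symptom", "confidence": 0.9526760239113281},
    {"class_name": "condition_definition", "confidence": 0.0036075458415429787},
]

score_doc1 = lat_feature_score(nlc_classes, "symptom")
score_doc2 = lat_feature_score(nlc_classes, "definition")
```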

Custom Feature Proxy

To use custom features with R&R, we need to effectively split the Retrieve and Rank services. We will use a proxy to handle an incoming query, collect the documents along with their Retrieve scores, collect the NLC confidence scores, combine scores and send them to the ranker. Figure 3 shows how this works conceptually.

Fig. 3 R&R Custom Feature Proxy Architecture
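Conceptually, the proxy's request handling looks like the sketch below. The function names and stub services are hypothetical, not the actual proxy code; the real project wires the same flow into its HTTP endpoint.

```python
def handle_query(query, retrieve, nlc_classify, rank):
    """Hypothetical proxy flow: collect Retrieve scores, add the
    NLC-based LAT score, then send the widened vectors to the ranker."""
    docs = retrieve(query)            # candidate docs + native feature scores
    nlc = nlc_classify(query)         # classify the query once, reuse per doc
    conf = {c["class_name"]: c["confidence"] for c in nlc["classes"]}
    for doc in docs:
        lat = conf.get("condition_" + doc["doc_type"], 0.0)
        doc["features"].append(lat)   # append the custom LAT feature
    return rank(docs)

# Stub services so the sketch runs end to end
def fake_retrieve(query):
    return [
        {"id": "doc1", "doc_type": "symptom", "features": [0.4, 0.2]},
        {"id": "doc2", "doc_type": "definition", "features": [0.5, 0.1]},
    ]

def fake_nlc(query):
    return {"classes": [{"class_name": "condition_symptom", "confidence": 0.95}]}

def fake_rank(docs):
    # Stand-in ranker: order by the sum of feature scores, best first
    return sorted(docs, key=lambda d: sum(d["features"]), reverse=True)

ranked = handle_query("What are the symptoms of Appendicitis?",
                      fake_retrieve, fake_nlc, fake_rank)
```

Classifying the query once and reusing the confidences for every candidate document keeps the per-query cost at a single NLC call regardless of how many documents Retrieve returns.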

Build Custom Feature

  1. Clone the custom scorers project
  2. Navigate to /rr_scorers/query_document
pip wheel .

Configure R&R Proxy

  1. Clone the custom scorer proxy project
  2. Create logs and answers directories (this will be updated in the project)
mkdir answers
mkdir logs
"description": "Score based on Document/Query alignment",
"service_url": "",
"service_username": "username",
"service_password": "password",
./ ./config/service.cfg ./config/sample_features.json

Train a ranker through the proxy

Training the ranker follows the same basic process as Part 2, except we point the train script at the proxy instead of directly at R&R. There is a file on GitHub that has this set up. Note that the field names for the fl parameter are hardcoded on lines 83 and 95, so if you are using this project you'll need to change those. The feature depends on the doc_type field, so we need to ensure it is returned by Retrieve.

python -u username:password -i gt_train.csv -c sc3689b816_2b07_4548_96a9_a9e52a063bf1 -x niddk_collection -r 30 -n lat_ranker -d -v

Call the Proxy server at runtime

Your application will call the proxy directly and the proxy will handle the call to R&R. That can be done at the following endpoint.
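For illustration, a runtime call might be built as follows. The host, port, endpoint path, and parameter names here are assumptions standing in for the elided endpoint above; adjust them to match your proxy deployment.

```python
import urllib.parse

# Placeholder for the proxy's query endpoint; substitute your deployment's URL
PROXY = "http://localhost:9080/fcselect"

def proxy_query_url(query, ranker_id, rows=10):
    """Assemble a query URL against the proxy (parameter names assumed)."""
    params = urllib.parse.urlencode({
        "q": query,              # the user's natural-language question
        "ranker_id": ranker_id,  # the ranker trained through the proxy
        "rows": rows,
        "fl": "id,topic,doc_type,text_description",
    })
    return "{}?{}".format(PROXY, params)

url = proxy_query_url("What are the symptoms of Appendicitis?", "lat_ranker_id")
```

The proxy then performs the Retrieve call, scores the custom feature, and forwards the combined feature vectors to the ranker exactly as it did at training time.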

Fig. 4 Relevance@N


Custom features are a powerful way to provide additional relevance signals to the ranker. A set of open source tools enables the creation of these features, and the process will be simplified in future versions of R&R.

Machine Learning with IBM Watson

Chris Ackerson is a solutions architect with IBM Watson. In this publication, I’ll highlight how developers can leverage Watson to add Machine Learning, Natural Language Processing and Computer Vision capabilities to their applications.
