Embed NLQ and Augmented search into your applications, with Db2 Augmented Data Explorer

Shruthi Machimada
IBM Data Science in Practice
6 min readOct 31, 2019

IBM Db2 Augmented Data Explorer (ADE) is a search and discovery tool that understands your Natural language search terms and returns data from your database and suggestions for subsequent searches.
Db2 ADE understands your metadata, infers relationships between your tables and uses statistical detectors and NLG to present the results of the analyses in natural language and in the form of visualizations, telling you a story with your data.

With Db2 ADE V0.7 we bring REST APIs that allow you to embed NLQ and Augmented search in your applications.

Before you begin:

If you do not have Db2 Augmented Data Explorer, download the beta version here, and use the resources to help you get started.

After you install ADE on a server, you can access it in two ways:

  • By typing in https://hostname:5000 on a browser, to perform all these operations through the UI
  • Using ADE’s REST APIs

Here, we will use REST APIs in a Jupyter notebook to connect to a Db2 database and perform a natural language search to get insights about the ‘Average money spent’ by customers. We will then use the data that was retrieved to plot a graph and also read the insights that ADE generates for us.

You can access the original notebook that has more detailed analysis here.

The API Workflow:

1. Authenticate the user with the POST /api/v1/sessions endpoint

2. Connect to a datasource with the POST /api/v1/connections endpoint

3. Crawl and index the data in your connected data sources with the PUT /api/v1/status endpoint

4. Send a natural language search text to get recommendations using the POST /api/v1/suggestions endpoint

5. Retrieve the data and natural language insights using the POST /api/v1/results endpoint

Setting up

We are going to import the required libraries and create a variable that stores the base URL for the REST API. Update this variable to the host name of the server that has Augmented Data Explorer.

  1. Getting authenticated

Use the POST /api/v1/sessions endpoint to authenticate a user and create a session and returns an auth token, which will be used by the subsequent API calls.

You must be authorized as a user to access the endpoints. The level of access the user has depends on the level of permission the user was assigned, while the user was added by the Admin user. Here we are authenticating with a user, who has the privileges to create connections and conduct searches.

Three permission/roles are available to assign to users:

  • Search, which is read-only access to search and exploration
  • Setup/edit data, which allows for modification of connections and crawls
  • System administration, which allows the creation and deletion of users.

To make it easier to issue subsequent calls with the proper authorization, we will create a header variable with the token.

2. Setting up connections

The POST /api/v1/connections endpoint creates a new connection to a Db2 data source, using the hostname, username and password for the database. You can mention the schema and tables that you want to make available. All the columns in the selected tables will be available for crawling.

If the connection was successful, you will see a ‘Successfully created connection.’ message in the response.

3. Crawling the connections
Using the PUT /api/v1/status endpoint.

Before you can search or get results from Db2 Augmented Data Explorer, you need to crawl the connection that you set up previously. The crawling process connects to the database and stores information about the data in an elasticsearch index.

You can use the GET /api/v1/status endpoint to check the progress of the crawling process. When crawling is False, crawling is complete and we can start searching.

4. Searching

Using the POST /api/v1/suggestions endpoint.
Searching in Db2 Augmented Data Explorer involves sending search text and getting back query suggestions for this search text. These suggestions are valid query objects, which we can use to get data.

Objective of our search:

Get insights for `Average Money Spent` by our customers, from the Customers table in the ADEDEMO schema. The schema has a “Customers” table and a “Transactions” table.

Tables in the ADEDEMO schema

We want to find the relationships between ‘Average Money Spent’ with Gender, Education, Age or Marital Status

We first want to get some suggestions from Augmented Data Explorer, about the analyses we can do regarding the Avg Money Spent.

We get all the search responses as suggestions from ADE in JSON format. Suggestions are composed of of queries. Each query object has details of the column names, aggregate functions or groupings it did, to retrieve the result. We then use the query object to get Data and Natural language insights using the /results endpoint.

5. Getting data and natural language insights

The POST /api/v1/results endpoint uses the query object that is generated by the suggestions endpoint, to retrieve the data, a recommended visualization type that can be used to represent the data and natural language insights.

We use the query object that had Avg_Money_Spent and Region :

Displaying the data fetched from Db2

Reading the insights generated

The insights from Augmented Data Explorer tells there is a weak relationship with Gender.

Visualization

ADE recommends a chart type that would best suit the data fetched. To get the recommended visualization, the user types…

Plot a graph with the data retrieved:

We use the recommended chart type to plot a chart using matplotlib

We followed the same steps with other query objects and using statistical detectors and NLG we quickly know that Average Money Spent has:

  • Weak relationship with Gender and Region
  • Strong relationship with Customer Value
  • No meaningful relationship with Month
  • Moderate relationship with Education

Conclusion

We have conducted a bivariate analysis of Avg_Money_Spent, without writing any complex queries or conducting correlation and exploratory analysis.

With ADE, you do not have to run SQL queries to retrieve data from your database through your application, and you do not need to know the table names, their structure or the exact column names.
ADE provides suggestions for searches that can be performed, based on the search text and will summarize the results of the statistical analyses it performs, in natural language.

--

--