Talk to your database

Alejandro Martín
5 min read · Nov 11, 2019

DISCLAIMER: this post is based on personal experience, and the situations explained here may not apply in other contexts.

Figures displayed on the examples are just samples for demo purposes, not actual data.

In every company in the world, employees need access to information. Most companies purchase and install expensive software solutions, or even spend years developing complex reporting systems in-house.

However, these tools often fall short of satisfying user needs. Either they are too complex, and non-technical people can't figure out how to use them; or they are so user-friendly that they lack the flexibility these users need.

So, how about removing these applications and letting people just ask for what they need in plain English?

In this post I’ll explain a Proof of Concept (PoC) of this approach for a clothing retail store.

1. The data

First thing you need is a place where you can have all your business data together.

In the best scenario, you have a corporate data warehouse where all the information is available under strict data quality controls. An alternative is an Operational Data Layer that provides seamless access to different data sources.

With this in place, you can run queries against a single data source or data service.

2. The interface

OK, so how can non-technical people use all this data without a complex tool? Easy: just let them ask for what they need in their own words.

After studying the domain and all its details, you can design a language model that represents your business reality. We have:

Intents

They represent the user’s intention: what they want to get. For example, the intent for “How much did we sell yesterday?” might be Sales.getTotalSales.

As you can see, in this PoC we split intents by domain: we have Sales.* and Stock.* groups. Each intent then refers to a use case, the same way you write methods in a class.
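The Domain.useCase naming makes it natural to dispatch each detected intent to its own handler. Here is a minimal, hypothetical sketch of that routing; the handler functions and the SQL they return are invented for illustration:

```python
# Hypothetical sketch: route a detected intent (Domain.useCase) to a
# query handler. Handlers and SQL are placeholders, not the PoC's code.

def get_total_sales(entities):
    return "SELECT SUM(amount) FROM sales"

def get_stock(entities):
    return "SELECT SUM(units) FROM stock"

INTENT_HANDLERS = {
    "Sales.getTotalSales": get_total_sales,
    "Stock.get": get_stock,
}

def handle(intent, entities):
    # Look up the handler by the intent name LUIS returns.
    return INTENT_HANDLERS[intent](entities)
```

Keeping intents grouped by domain means adding a new use case is just adding one entry to the dispatch table.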

Entities

Along with intents, we need to detect the specific entities the user may be referring to. For example, consider this query:

Total units sold yesterday in the UK, by product type and section.

Our user wants to get the units sold yesterday in the UK, grouped by product type (Clothing or Shoes) and section (Woman, Man or Kids).

So, how do we get from this to a SQL query? Let’s start by splitting entities into the following groups:

  • Attributes: GROUP BY fields, which are also projected.
  • Indicators: figures or metrics, usually computed with an aggregate function.
  • Filters: conditions used in the WHERE clause.

Once these entities are detected and classified, building the actual SQL query is quite easy.
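To make that concrete, here is a minimal sketch of assembling SQL from the three entity groups. The table and column names are invented for the demo, and a real implementation would need parameterized queries to avoid SQL injection:

```python
# Minimal sketch: build a SQL query from classified entities.
# Attributes are projected and grouped; indicators are aggregated;
# filters go into the WHERE clause. Names are demo-only.

def build_query(attributes, indicators, filters, table="sales"):
    select = attributes + [f"SUM({i}) AS {i}" for i in indicators]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    if attributes:
        sql += " GROUP BY " + ", ".join(attributes)
    return sql

# "Total units sold yesterday in the UK, by product type and section"
print(build_query(
    attributes=["product_type", "section"],
    indicators=["units"],
    filters=["sale_date = CURRENT_DATE - 1", "country = 'UK'"],
))
```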

3. LUIS: Language Understanding Intelligent service

For this PoC, we’ll use LUIS (https://luis.ai/) to perform the language analysis. It’s a simple-to-use service by Microsoft that fits our approach well.

As discussed earlier, we defined the intents in the language model:

  • Sales.get: when the user wants sales-related data.
  • Stock.get: the same for stock.

For now, the modelling is really simple, as this is a PoC. To make the system work, we just have to provide a few example utterances for each intent, then train and publish the model, all without leaving the dashboard.

We also defined the entities, which you should already be familiar with: Attributes, Filters and Indicators.

Once the model is defined and trained, we can do some testing right in the web portal.

The “hack”

It would be perfect for us if the user just wrote “section=1” in their phrase; however, that isn’t exactly natural language, right?

So we figured out a little workaround in LUIS: declare the entities as these cryptic expressions, and then give each expression plain-English synonyms.
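The idea resembles a closed-list entity with synonyms: each canonical value the SQL layer understands is declared alongside the natural words users actually type. A hypothetical sketch, with invented section codes:

```python
# Hypothetical sketch of the synonym trick. Each cryptic canonical
# value (what the SQL layer understands) lists its plain-English
# synonyms, similar to a closed-list entity. Codes are invented.

SYNONYMS = {
    "section=1": ["woman", "women", "ladies"],
    "section=2": ["man", "men"],
    "section=3": ["kids", "children"],
}

# Invert the mapping for lookup at query time.
LOOKUP = {syn: canon for canon, syns in SYNONYMS.items() for syn in syns}

def normalize(word):
    # Returns the canonical expression, or None if the word is unknown.
    return LOOKUP.get(word.lower())

print(normalize("Women"))  # section=1
```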

Using it as a service

Another cool LUIS feature is that it exposes the model as a service, allowing us to query it through an HTTP endpoint.
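A hedged sketch of what that call looks like: the URL shape below follows the LUIS v2 prediction endpoint of that era, and the region, app ID and subscription key are placeholders you would get from the LUIS portal. The response parsing assumes the documented `topScoringIntent`/`entities` JSON fields:

```python
# Sketch of querying a published LUIS model over HTTP.
# Region, app ID and key are placeholders, not real credentials.
import json
from urllib.parse import quote
from urllib.request import urlopen

def luis_url(region, app_id, key, utterance):
    return (f"https://{region}.api.cognitive.microsoft.com"
            f"/luis/v2.0/apps/{app_id}"
            f"?subscription-key={key}&q={quote(utterance)}")

def call_luis(url):
    # Issues the actual request; needs network access and a real key.
    with urlopen(url) as resp:
        return resp.read()

def parse_response(body):
    """Extract the top intent and the entities from a LUIS JSON reply."""
    data = json.loads(body)
    intent = data["topScoringIntent"]["intent"]
    entities = [(e["type"], e["entity"]) for e in data["entities"]]
    return intent, entities
```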

4. Putting it all together

To test the entire flow, we’ll need a simple web app with a search box where business users can write queries in their own words.

Submitting a query triggers the text analysis request, which provides the application with the language analysis and the resulting SQL query.

The app then runs this query on the database and sends the result set back to the frontend.
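That backend step can be sketched in a few lines. Here `sqlite3` with an in-memory database stands in for the real warehouse, and the table and figures are demo-only samples:

```python
# Sketch of the backend step: run the generated SQL and return rows.
# sqlite3 stands in for the real warehouse; data is demo-only.
import sqlite3

def run_query(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product_type TEXT, units INT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("Clothing", 10), ("Shoes", 5), ("Clothing", 7)])
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows  # the frontend would receive this, e.g. as JSON

print(run_query(
    "SELECT product_type, SUM(units) FROM sales "
    "GROUP BY product_type ORDER BY product_type"))
```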

We could just render a table with the result set, but since we have the full language analysis, we can do better :)

With a couple of rules, like rendering a line chart when a date attribute is present, we can suggest different visualizations to the user depending on the query’s attributes.
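Such rules can be as small as a lookup on the query's attributes. This sketch invents a tiny rule set for illustration; a real app would likely have more cases:

```python
# Illustrative visualization rules keyed on the query attributes.
# The rule set here is invented, not the PoC's actual logic.

def suggest_chart(attributes):
    if "date" in attributes:
        return "line"   # time series read best as line charts
    if len(attributes) == 1:
        return "bar"    # one grouping dimension: a bar chart
    return "table"      # otherwise, fall back to a plain table

print(suggest_chart(["date", "country"]))  # line
```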

Thanks for reading!
