MODERN Business Intelligence: NLP: It’s not a Feature, it’s the Future!

John Thuma
Oct 15, 2018 · 5 min read

Natural Language Processing, or NLP, is a field of computer science that lets people interact with computing services through human language, across a variety of interfaces. You have been using it for decades! Most of us have used Google, Bing, Yahoo, or DuckDuckGo to interact with the World Wide Web. You probably didn’t need to take a training class to use any of them. Google is so popular it is now a verb! Don’t believe me? Go ahead and check the Merriam-Webster Dictionary online:

Image for post

There is no doubt that making computing seamless in our daily experiences is essential to adoption. Business Intelligence is finally starting to take advantage of this capability, and it is a very exciting place to be. For decades, users of BI and reporting have been waiting for easier ways to interact with their data. Excel was one of the first tools and it was wildly popular. However, it was still somewhat complex. Then drag-and-drop BI tools came around and made it much easier to deal with larger and more complicated datasets. These tools still required complex IT skills and did not allow people to use the English language to analyze data. The BI NLP Search capability is not a feature, it is the FUTURE! It is the ultimate in SELF SERVICE BI! The rest of this document details the three things you need to enable NLP in your BI tool:

  1. Easy setup and works across the data lake.
  2. Ad-Hoc and Dashboard development.
  3. Zero data movement and Zero ETL.

Easy setup and works across the data lake:

If you have a data lake on Apache Hadoop or use a cloud-based service like S3 from AWS or ADLS from Azure, it should be very easy to configure and leverage this data. It should be easy to set up an NLP semantic model that allows you to search and perform analytics using the English language. You should be able to easily set up your business vernacular and just start using it. As new data comes into your sources, that data should be instantly available too! There are a few solutions that do a pretty good job at NLP, for instance ThoughtSpot and AnswerRocket. However, these solutions require heavy technical work to become usable. They also require data movement from one platform to the next and have a proprietary data model you must conform to! This is not easy and it also creates latency. With Arcadia Data you just set up a Dataset, make it available to the Search feature, and then provide some synonyms and hints. The setup is easy. Your searches also work against any Dataset you have turned on for NLP. As new data arrives it is ready for Search.
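To make the idea of synonyms a little more concrete, here is a minimal Python sketch of how business vernacular might be mapped onto Dataset columns. This is purely illustrative and is not how Arcadia Data implements Search: the `SYNONYMS` table, the column names, and the `resolve_terms` function are all hypothetical names I made up for this example.

```python
# Hypothetical sketch: none of these names come from Arcadia Data's product.
# A synonym table maps everyday business words to (assumed) dataset columns.
SYNONYMS = {
    "revenue": "sales_amount",
    "sales": "sales_amount",
    "territory": "region",
    "region": "region",
}


def resolve_terms(question: str, synonyms: dict[str, str]) -> list[str]:
    """Return the columns a question mentions, in order, without duplicates."""
    columns = []
    for word in question.lower().split():
        # Drop trailing punctuation so "territory?" still matches "territory".
        column = synonyms.get(word.strip(".,?!"))
        if column and column not in columns:
            columns.append(column)
    return columns


print(resolve_terms("Show revenue by territory?", SYNONYMS))
```

The point of the sketch is only that a user can say "revenue" or "sales" and land on the same column. A real semantic model would also have to handle phrases, aggregations, and filters.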

Ad-Hoc and Dashboard development:

I want to have a conversation with my data! I want it to be easy. I want a browser-like interface that allows me to search all my Datasets just like I would if I were using a Web Browser. I want to be able to save and bookmark my searches. I also want to be able to look at my history. Sound familiar? It sounds just like Google Chrome, or Microsoft Edge, or Firefox. These Web Browsers are used to search the web based on pre-indexed web servers and their content. If I make a mistake, I just start over! That is exactly the way Arcadia Enterprise NLP search works, but it works against your data in your data lake. It also ranks the results of your search and puts them in the order that the AI engine believes to be the most accurate. Arcadia Enterprise also selects the best visual for you but allows you to change visuals if you wish! See below as we select the bubble visual type!

Image for post

I can also use this same NLP feature to build dashboards that can be used by others on my team. This means that I can share my work with others. This is the ultimate feature for business self service and dashboard development!

Zero data movement and Zero ETL:

ThoughtSpot and AnswerRocket force you to move data into their proprietary systems and data structures. That means ETL. ETL is an acronym for Extract, Transform, and Load. ETL are the most expensive letters in BI! :) I joke around about this but everyone knows it is true. I also have to set up the ETL to get new data into the system as it arrives. That means latency! I also have another copy of my data that I have to concern myself with! Yes, another version of the truth! The other huge aspect of this is that as my data changes, which it will, I will have to make changes to my ETL as well. Arcadia Data requires no ETL and zero data movement, and it makes change simple. There is also zero latency, as late-arriving data is automatically a member of the search party!

Check out this video and see for yourself how easy it is to use Arcadia Data for NLP and Search Based BI.


In my 30 years of technical experience I have always tried to limit data movement as much as possible. It is my belief that having many copies of data increases my security threat surface area! It also means that I have to do a lot more testing to ensure things add up across those different data copies. As things change in my data, which they always do, having ETL and copies of data means that those changes are expensive and time-consuming. I also have latency concerns. NLP should be as easy as turning it on, setting it up, and providing hints to my users. Arcadia Data and its NLP Search engine is not a feature, it is the FUTURE! It is the ultimate in SELF SERVICE BI!
