Writing an Elasticsearch Chatbot with Spring Boot & OpenNLP for Teams

Ferdinand Eiba
Published in NEW IT Engineering
8 min read · Jul 23, 2020

The logbook of a journey through the jungle of buzzword technologies.

Photo by Andrew Neel on Unsplash

Short disclaimer: Obviously my story does not claim to be complete or even to be correct, and therefore the way is clearly the goal on this journey. But for all those who don’t care and just want to check out the project, here is the link to the repository.

Exploring the map

In order not to develop any operational blindness and to get a technical change of scenery, I thought a little excursion into the surroundings of today’s technologies would be nice. But before I could start with anything, I had to come up with an idea and a rough roadmap. For me that’s always the hardest part, but I wanted to give it a try…

And since the topic was going around at work anyway, I thought it would be really cool to code a chatbot. And as I had worked with natural language processing (NLP) a long time ago, I wanted to put that topic back on the agenda too. So far so good, but the bot should actually do something meaningful, and so I thought that the famous ELK Stack could deal with one more interface besides all the good ones it already has. I mean, as a newbie I found Kibana quite cumbersome to use, and it would be nice if queries could be done in a slightly more “natural” way.

Okay, so here’s my idea. How about just asking a bot if any of my services in production is currently throwing any exceptions? Or maybe get a report spit out for any environment without much effort? Interesting I thought, simply because it would be pretty cool if you didn’t have to keep going through all the logs and so on by yourself and instead use the time for other reasonable purposes. And yes, I know that you can build funky dashboards in Kibana, but you can’t really compare that with the coolness of a chatbot 😎.

Anyway, the plan was ready now and so I packed the following tools into my backpack for this journey.

  • Spring Boot as basic framework
  • OpenNLP for message categorizing
  • MS Teams (or rather the Azure Cloud) as communication channel
  • ELK Stack (Elasticsearch, Logstash, Kibana) for log data pooling etc.

And since this is not my first trip, I was lucky to be able to rely on the server I already have in the cloud to ship everything. And since I couldn’t see any obvious showstoppers, I could actually start. Or so I assumed…

Expedition Preparation

To get started, I first had to take care of the necessary infrastructure. Mainly the ELK Stack had to be installed on my server, but surely a local instance (e.g. in a Docker container) would have done it too.

Thankfully, I only had to follow the documentation they provide. Actually, I only had to set up Elasticsearch and Logstash for this, but for cross-checking I installed Kibana too. Later this turned out to be very useful, because it allowed me to inspect the requests Kibana itself sends for the composition I needed, which made it easier to model my own queries against Elasticsearch 😁. Bad side effect: I had to double my server memory… but hey, since my server’s in the cloud, 5 min downtime for upscaling…

And last but not least I needed an application whose logs I could feed into the system. Luckily I already have some web services running on my server which I could use for that. So I “quickly” configured and started Logstash with the following command (on my Debian server) and was finally ready for my coding journey.

# sudo /usr/share/logstash/bin/logstash -f logstash.conf

By the way, the configuration I used is here, and if you also want to set up the port forwarding already, use this command on your local machine. It makes the remote Elasticsearch instance (port 9200) available locally and, the other way around, tunnels requests for the forthcoming application (port 8070) from the server back to your machine.

# ssh -R 127.0.0.1:8070:127.0.0.1:8070 -L 127.0.0.1:9200:127.0.0.1:9200 john@my-super-duper.tld

Hint: Now would be a good moment to configure the proxy in the Apache2 webserver (or whatever you prefer) to forward connections from e.g. https://www.my-super-duper.tld/teams/api/ to 127.0.0.1:8070 😉.
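For Apache2 that boils down to a small proxy rule. A sketch, assuming mod_proxy and mod_proxy_http are enabled and the application serves its API under the same path (adjust the target path if your app uses a different context root):

```apache
# inside the VirtualHost for www.my-super-duper.tld
# prerequisite: sudo a2enmod proxy proxy_http
ProxyPass        /teams/api/ http://127.0.0.1:8070/teams/api/
ProxyPassReverse /teams/api/ http://127.0.0.1:8070/teams/api/
```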

Setting up the base camp

Like every lazy developer, I started with the Spring Initializr to click my Spring Boot App together and to make the first steps even easier I added Lombok, but apart from that, it’s ready to go. Downloaded, unzipped and installed the Gradle Wrapper… So far so simple.

Building the command center

The Alpha and Omega of every good software is smooth communication, so the first thing I wanted to do was to enable the connection to Microsoft Teams. Therefore it was initially necessary to set up a simple endpoint which receives activities (I call them Actions) and can respond to them independently. This could then be stored in the Azure Portal as messaging endpoint of my newly created “Bot Channels Registration” resource. By the way, a good example how the basics work can be found here.

very basic REST endpoint in Spring Boot
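The embedded code is not preserved in this export, but such an endpoint looks roughly like this. Class names, the route, and the field selection are illustrative guesses, not necessarily the original ones:

```java
// Minimal sketch of the messaging endpoint. "Action" is the article's name
// for an incoming activity; route and field names are assumptions.
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/teams/api")
public class BotController {

    // Teams POSTs every activity (message, conversation update, ...) to the
    // messaging endpoint registered in the Azure Portal.
    @PostMapping("/messages")
    public ResponseEntity<Void> receive(@RequestBody Action action) {
        // hand the activity over to asynchronous processing and acknowledge
        // immediately, so the answer can be computed outside this request
        return ResponseEntity.ok().build();
    }
}

// Heavily simplified activity payload; the real Bot Framework activity
// carries many more fields.
class Action {
    public String type;        // e.g. "message"
    public String text;        // the user's message
    public String serviceUrl;  // base URL to use when replying
}
```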

Of course I could now reply to messages directly in the endpoint and pack the answer into the response, but who knows how long it will take to calculate everything and collect all the data!? So obviously it would be awesome if the whole process would also work outside the original request. In order to be able to reply to conversations independently, I needed a token, which I got as follows.

fetching a bearer token from Microsoft
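That gist is also missing from this export. The token call is a standard OAuth2 client-credentials request against the Bot Framework tenant; a sketch with plain java.net.http, with error handling and JSON parsing left out:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class TokenClient {

    // Builds the x-www-form-urlencoded body for the client-credentials flow.
    // App ID and password come from the Azure bot registration.
    static String buildTokenForm(String appId, String appPassword) {
        return "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(appId, StandardCharsets.UTF_8)
                + "&client_secret=" + URLEncoder.encode(appPassword, StandardCharsets.UTF_8)
                + "&scope=" + URLEncoder.encode("https://api.botframework.com/.default", StandardCharsets.UTF_8);
    }

    // Fetches the token; the JSON response contains "access_token" and
    // "expires_in", to be parsed with your JSON library of choice.
    static String fetchTokenResponse(String appId, String appPassword) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://login.microsoftonline.com/botframework.com/oauth2/v2.0/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(buildTokenForm(appId, appPassword)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```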

Very good. So now, if I want to reply to a conversation, I can simply do so with this bearer token and the REST endpoint /v3/conversations/{conversationId}/activities/{replyToId} from Microsoft (btw you can find the base URL which you have to use in the original activity under the attribute serviceUrl).
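A sketch of such a reply call, again with java.net.http. The URL layout follows the endpoint just mentioned; the naive JSON string assumes the reply text needs no escaping:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConversationClient {

    // Builds the Bot Framework reply endpoint from the serviceUrl found
    // in the incoming activity.
    static String replyUrl(String serviceUrl, String conversationId, String replyToId) {
        String base = serviceUrl.endsWith("/")
                ? serviceUrl.substring(0, serviceUrl.length() - 1)
                : serviceUrl;
        return base + "/v3/conversations/" + conversationId + "/activities/" + replyToId;
    }

    // Posts a plain-text reply as a minimal "message" activity.
    static int reply(String bearerToken, String serviceUrl, String conversationId,
                     String replyToId, String text) throws Exception {
        String body = "{\"type\":\"message\",\"text\":\"" + text + "\"}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(replyUrl(serviceUrl, conversationId, replyToId)))
                .header("Authorization", "Bearer " + bearerToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .statusCode();
    }
}
```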

Communication has been established. Let’s move on to the interpretation of the text which the bot will receive.

Decrypting the radio messages

Now it’s getting exciting 😬. I wanted to train a small model using OpenNLP to categorize the received text messages into one of the following:

  • conversation_[greeting|continue|complete]
  • log_[request|environment]

and then give a meaningful answer. If the message falls into any category of type log_*, the bot has to search for environment names in the text using a simple regex like “.*\sPROD\s.*”. Obviously this can be done better, but that’s enough for now.
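As a small illustration, here is what such an environment lookup could look like. I use word boundaries instead of the \s-based pattern above, so it also matches at the start and end of the text; the environment names are examples:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnvironmentDetector {

    // Crude environment detection as described above; adjust the list of
    // names to whatever your index suffixes are called.
    private static final Pattern ENV_PATTERN =
            Pattern.compile("\\b(PROD|DEV|TEST)\\b", Pattern.CASE_INSENSITIVE);

    // Returns the first environment name found in the text, if any.
    static Optional<String> detect(String text) {
        Matcher matcher = ENV_PATTERN.matcher(text);
        return matcher.find()
                ? Optional.of(matcher.group(1).toUpperCase())
                : Optional.empty();
    }
}
```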

In the next step, in case we have all the information we need to respond with a proper answer, we mark the message as complete. Otherwise it’s indeed an incomplete message… So finally the bot should be able to parse a text like e.g. “Can you check the PROD logs for me?” to (log_request, PROD, complete) or “Hi!” to (conversation_greeting, null, complete). And by the way, if e.g. the information about the environment is missing and the request was of type log_*, I ask for it again. That’s what I do until I have everything and can finally mark the message as complete.

So to cast that into code, I first run a small training for a categorizer which consumes a couple of simple text examples from a file, as shown below.

train a simple categorizer with OpenNLP
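The embedded training snippet is missing here, but with OpenNLP 1.9.x it boils down to roughly the following, assuming a training file where each line starts with the category followed by a sample sentence (e.g. “conversation_greeting Hi there”):

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

import opennlp.tools.doccat.DoccatFactory;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.DocumentSampleStream;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class CategorizerTraining {

    // Trains a document categorizer from a line-based training file.
    static DoccatModel train(File trainingFile) throws Exception {
        InputStreamFactory inputStream = new MarkableFileInputStreamFactory(trainingFile);
        ObjectStream<String> lines = new PlainTextByLineStream(inputStream, StandardCharsets.UTF_8);
        ObjectStream<DocumentSample> samples = new DocumentSampleStream(lines);

        TrainingParameters params = new TrainingParameters();
        params.put(TrainingParameters.ITERATIONS_PARAM, "200"); // tiny data set, more passes
        params.put(TrainingParameters.CUTOFF_PARAM, "0");       // keep even rare features

        return DocumentCategorizerME.train("en", samples, params, new DoccatFactory());
    }
}
```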

And finally we are using this model to parse each Action to an internal Message which contains the information we need.

calculate the most likely category of an Action
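Again as a rough sketch of the missing snippet: tokenize the text, let the categorizer score it against all trained categories, and take the best one:

```java
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class MessageParser {

    private final DocumentCategorizerME categorizer;

    MessageParser(DoccatModel model) {
        this.categorizer = new DocumentCategorizerME(model);
    }

    // Returns the most likely of the trained categories for a raw text,
    // e.g. "conversation_greeting" for "Hi!".
    String categorize(String text) {
        String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(text);
        double[] probabilities = categorizer.categorize(tokens);
        return categorizer.getBestCategory(probabilities);
    }
}
```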

Responses to the outside world

So what remains to be done is to reply to such a message. For that I simply created a list of possible answers for each category, which I then choose from randomly. Possible answers for conversation_greeting would be e.g. “Hi”, “Hey”, “Hello” or “Cheers”. And of course the same logic applies to all other categories, although an additional parameter (such as the environment) can be inserted from time to time. So this should be enough to work with for now. I mean, it’s not like I have to over-optimize the whole thing right away 😉.
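A minimal sketch of that answer lookup; the pools shown are just examples:

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

public class AnswerPool {

    // Example answer pools per category; extend as needed.
    private static final Map<String, List<String>> ANSWERS = Map.of(
            "conversation_greeting", List.of("Hi", "Hey", "Hello", "Cheers"),
            "conversation_complete", List.of("Done!", "There you go."));

    private static final Random RANDOM = new Random();

    // Picks a random answer for the category, with a fallback for
    // anything the bot does not know.
    static String answerFor(String category) {
        List<String> pool = ANSWERS.getOrDefault(category,
                List.of("Sorry, I did not get that."));
        return pool.get(RANDOM.nextInt(pool.size()));
    }
}
```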

Exploring the jungle

Phew, ok, but there’s still the fetching of the log data from Elasticsearch… Luckily there is already a great client which I could use directly. Basically I only have to query two things:

  1. all available indices
  2. the log entries for a specific index

The former can easily be achieved by using the GetIndexRequest with an index pattern like e.g. my-app-index-* or however you configured Logstash previously. The result should be a String array containing all known indices in Elasticsearch matching the pattern. And since I had previously agreed with myself that each index on my instance has a specific environment suffix like *-prod or *-dev, selecting the right index became quite easy 😄.
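With the high-level REST client this could look roughly as follows, using the index pattern and suffix convention described above (method names are mine):

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Optional;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.GetIndexRequest;

public class IndexLookup {

    // Lists all indices matching the configured pattern.
    static String[] findIndices(RestHighLevelClient client, String pattern) throws IOException {
        GetIndexRequest request = new GetIndexRequest(pattern); // e.g. "my-app-index-*"
        return client.indices().get(request, RequestOptions.DEFAULT).getIndices();
    }

    // Picks the index for an environment via the suffix convention (*-prod, *-dev).
    static Optional<String> forEnvironment(String[] indices, String environment) {
        String suffix = "-" + environment.toLowerCase();
        return Arrays.stream(indices)
                .filter(index -> index.endsWith(suffix))
                .findFirst();
    }
}
```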

Last but not least I wanted to search for log messages by using a simple wildcard and filter by a specific time range. So in my case I always ask for any "*Exception*". And to achieve this, I just combined those two filters and fired the whole thing against my Elasticsearch instance.

query log entries matching a wildcard and filtered by a time range
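The embedded query snippet is missing from this export; with the high-level REST client, the combination described above looks roughly like this. The field names message and @timestamp are the Logstash defaults, and the one-hour window and result size are arbitrary choices of mine:

```java
import java.io.IOException;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class LogSearch {

    // Fetches up to 100 entries from the given index whose message matches
    // the wildcard and whose timestamp lies within the last hour.
    static SearchResponse searchExceptions(RestHighLevelClient client, String index) throws IOException {
        SearchSourceBuilder source = new SearchSourceBuilder()
                .query(QueryBuilders.boolQuery()
                        .must(QueryBuilders.wildcardQuery("message", "*Exception*"))
                        .filter(QueryBuilders.rangeQuery("@timestamp").gte("now-1h").lte("now")))
                .size(100);
        return client.search(new SearchRequest(index).source(source), RequestOptions.DEFAULT);
    }
}
```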

Tape everything together with a few more classes, e.g. for thread management & message queuing, and that’s it… but wait, is it really working? Let me find out.

Evaluation of the travel report

Conveniently, there is already an option to do this right in the Azure Portal. I just have to navigate to my previously created resource and select Test in Web Chat in the bot management menu.

menu of my bot channel resource in Azure

Or, as I wanted to test it in my Teams desktop application too, I selected the Channels menu and added Microsoft Teams as a featured channel. If you then click on Get bot embed codes, a popup provides a link which can be used to integrate the bot into your Teams chat window as a contact, simply by opening it in the browser… So here we go with an example.

sample conversation in the Web Chat

Well, it looks pretty good for the first shot, I think. Could have been more complicated, right!? 😅

And last but not least for all those who thought this is about a real-world travel report, here is a picture from my last vacation in order not to disappoint you completely…

Happy coding and enjoy your journey.

Ferdinand Eiba (NEW IT Engineering), some kind of Full-Stack-Developer and independently thinking Code Monkey