Serverless architecture for banking chatbot

Qanta provides mortgage lenders with an AI-driven assistant that automates large parts of the lending process, focusing on first engagements. We provide tools that let lenders help their clients make informed choices about the biggest financial decision of their lifetime, engage them better, and ultimately reduce the cost of the process for everyone involved.

Qanta’s product is based on a hybrid chatbot and webview solution, delivered as a customized plugin for a bank’s Web or Facebook presence. This creates brand new ways for banks to engage with their clients.

During our journey, we were faced with interesting technical challenges and we would love to share our solutions with the community.

Key Technological Challenges

Radically fluctuating throughput, in terms of “bot messages per second”.

Usually chatbot developers can predict how many users their systems will need to handle by analyzing past behavior or similar use cases.
However, we provide a white-label solution for multiple clients. Each of them has a different use case and operates without coordinating with us, which typically results in peaks we have no control over. For example, a client running a campaign to a large mailing list sometimes increased our traffic by two or three orders of magnitude within seconds, with no warning.

Typical fluctuations can be handled with traditional load balancing: a load balancer that adds servers and redirects traffic as demand changes. Spikes of this size and speed required a whole different approach to scalability.

Banking Requirements.

As a vendor to financial institutions, we faced infrastructure requirements that were challenging to meet. Typically we were asked to deliver the same configuration in three environments: a trusted external cloud for a POC, a hybrid or private cloud for pilots, and an on-premise installation for full production. We needed an agile, open source infrastructure that could be deployed easily in each of these settings without rewriting the code base.

The Architecture

We decided to try out OpenWhisk’s serverless infrastructure on IBM Bluemix, since its specifications looked promising for our use case. Our core Node.js code had relied on a constantly running HTTP listener to invoke our core AI function, so we rewrote that invocation path. We isolated the AI capability so that it runs independently, and added a new routing server that handles multiple front-end interfaces through dedicated webhooks.
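As a rough illustration of the routing server's side of this, the sketch below builds and sends a blocking action invocation through OpenWhisk's REST API. The host, namespace, credentials and payload fields are placeholders chosen for the example, and the helper names `buildInvokeRequest` and `invokeAction` are hypothetical; this is not Qanta's actual code.

```javascript
const https = require('https');

// Sketch only: apiHost, namespace and auth are placeholders.
// The path follows OpenWhisk's REST API for invoking an action.
function buildInvokeRequest({ apiHost, namespace, action, auth, params }) {
  const body = JSON.stringify(params);
  return {
    method: 'POST',
    hostname: apiHost,
    // blocking=true waits for the activation to finish;
    // result=true returns only the action's result object.
    path: `/api/v1/namespaces/${namespace}/actions/${action}?blocking=true&result=true`,
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body),
      // OpenWhisk API keys have the form "user:password" and are sent as Basic auth.
      Authorization: 'Basic ' + Buffer.from(auth).toString('base64'),
    },
    body,
  };
}

// Sends the request and resolves with the parsed JSON response.
function invokeAction(config, params) {
  const { body, ...options } = buildInvokeRequest({ ...config, params });
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let data = '';
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => {
        try { resolve(JSON.parse(data)); } catch (err) { reject(err); }
      });
    });
    req.on('error', reject);
    req.end(body);
  });
}
```

A webhook handler could then call something like `invokeAction(config, { userId, bankId, message })` once it has authenticated the incoming request.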

The serverless architecture also enabled us to remove effects of “stacking” and delays in sending messages back to the users. With a “serverful” architecture, one server was in charge of aggregating messages to every user. Here each instance can send messages independently, so after synchronization we get the same performance with any number of users, instead of delays proportional to the number of active users.

The flow of a “chat” call is as follows:

  1. A call comes from Facebook or a web interface through a secure webhook.
  2. The communication server authenticates the call and invokes the “conversation” action in OpenWhisk, with the information about the user, the bank and the incoming message.
  3. The user's info and conversation state are pulled from Redis using the user id.
  4. The conversation flow is pulled from Redis using the bank id.
  5. The message is sent to the NLP processing engine, which returns intent and entities.
  6. Based on the results of steps 3–5, our AI engine chooses the next required step and returns the answer message to the user.
  7. Depending on the conversation, reports, notifications or API calls may be invoked. For example, a request to generate an interactive graph that explains the lifetime of a specific mortgage; the graph is then embedded into the chat interface.
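Steps 3 to 6 can be sketched as a single OpenWhisk Node.js action. This is a minimal illustration, not our production code: the two stores and the `analyze` function are in-memory stand-ins so the example runs on its own, and every name, flow and reply in it is hypothetical. In a real deployment the stores would be Redis lookups and `analyze` would call the NLP engine.

```javascript
// Stand-ins for the Redis data: user state keyed by user id,
// conversation flow keyed by bank id. Contents are made up.
const userStore = new Map([['u1', { step: 'greeting' }]]);
const flowStore = new Map([['bank1', {
  greeting: { ask_rates: { reply: 'Here are our current rates.', next: 'rates' } },
}]]);

// Stand-in for the NLP engine: returns an intent and entities.
async function analyze(message) {
  const intent = /rate/i.test(message) ? 'ask_rates' : 'unknown';
  return { intent, entities: {} };
}

// An OpenWhisk Node.js action is a `main` function that takes a params
// object and returns a result object or a Promise of one.
async function main({ userId, bankId, message }) {
  const user = userStore.get(userId) || { step: 'greeting' };  // step 3
  const flow = flowStore.get(bankId);                           // step 4
  if (!flow) return { error: `unknown bank: ${bankId}` };
  const { intent, entities } = await analyze(message);          // step 5
  const transition = (flow[user.step] || {})[intent];           // step 6
  if (!transition) {
    return { reply: "Sorry, I didn't catch that.", step: user.step };
  }
  // Persist the new state so the next invocation continues from here.
  userStore.set(userId, { ...user, step: transition.next });
  return { reply: transition.reply, step: transition.next, entities };
}
```

Note that an in-memory `Map` would not survive between invocations on a serverless platform, which is exactly why the state lives in an external store in the flow above.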

An additional advantage of the serverless architecture is that new functionality can be implemented as a separate OpenWhisk action, which is much easier than setting up a dedicated server. For example, analytics operations can run without a dedicated system that would sit idle 99% of the time.

We added integration with Redis by Compose, IBM's solution for a scalable, fast data cache. Retrieving small data items from Redis from the IBM Bluemix servers was dramatically faster than our previous solution, MongoDB on a Heroku instance. When deployed locally (debug mode) we achieved a 3x throughput improvement, and in production (deployed on the Bluemix cloud) a 10x improvement (<50 ms per call).

A main challenge in using OpenWhisk was debugging. We solved it by writing a wrapper function that calls the local version of our OpenWhisk action code. We built a dedicated debugging webhook on our communication server and connected it to a local server, exposed through ngrok, that runs the wrapper function. The wrapper calls the action code that corresponds to the API call. This “hack” lets us use a step-by-step debugger on our code.
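A debugging wrapper of this kind might look roughly like the sketch below. It assumes each action is available locally as a function with the usual OpenWhisk shape (one params object in, one result out); the `localActions` registry, the `echo` action, the URL scheme and port 3000 are all hypothetical choices for the example.

```javascript
const http = require('http');

// Hypothetical registry mapping action names to their local functions.
// In a real project these would be require()'d from the action source files.
const localActions = {
  echo: async (params) => ({ received: params }),
};

// Calls the local copy of an action the way the platform would:
// one params object in, one result object (or Promise) out.
async function invokeLocal(actions, name, params) {
  const action = actions[name];
  if (!action) throw new Error(`no local action named "${name}"`);
  return action(params);
}

// Minimal debug endpoint: POST /<actionName> with a JSON body.
function createDebugServer(actions) {
  return http.createServer((req, res) => {
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', async () => {
      try {
        const name = req.url.replace(/^\//, '');
        const result = await invokeLocal(actions, name, JSON.parse(body || '{}'));
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(result));
      } catch (err) {
        res.writeHead(500, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ error: err.message }));
      }
    });
  });
}

// To try it: createDebugServer(localActions).listen(3000);
// then expose the port with `ngrok http 3000` and point the debug webhook at it.
```

Because the action code runs in an ordinary local Node.js process, breakpoints set inside it are hit when the debug webhook forwards a request.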

The Bottom Line

Switching to Bluemix and using OpenWhisk and Redis by Compose achieved impressive results. Qanta achieved full scalability with little effort, without having to worry about load balancing, throughput or memory issues for multiple simultaneous users. Before the change, even with as few as 4 simultaneous users, each of them was slowed down by the need to handle the others; now we can support any number of users with the same efficiency. This new infrastructure will allow us to focus our future development efforts on the business logic, to bring more value to our users.