Journal of a bot wrangler

Tom Swales
8 min read · Mar 9, 2017


Those of you who enjoy reading The Economist may be familiar with the annual “World in…” publication, released just before the new year. It provides a wealth of predictions and analysis for the coming year, perfect for browsing during the holiday period.

The World in 2017

One of the career-related predictions for 2017 concerned a hot new profession: the ‘bot-wrangler’:

“An emerging trend is the practice of dealing with companies and online services through conversational interfaces — speech or text messages. Apple’s Siri, Amazon’s Alexa and Microsoft’s Cortana are the most prominent examples, but many companies are creating corporate “chatbots” that can respond to customer-service queries or dispense information. Rather than visiting an organisation’s website, you may end up talking to its bot instead. Just as websites need designers and programmers, bots will need specialists to devise their business rules, write their dialogue and keep them up to date: a job category that might collectively be termed bot-wranglers.”

Somewhat intrigued, I decided to spend a couple of weekends exploring what this job (or collection of jobs) might involve, by prototyping a simple chatbot to solve a very specific problem: helping business users and software developers identify cognitive APIs (public, cloud-based AI and machine-learning services) that can help them solve a business problem. This article describes the process and offers my initial conclusions on what the role of ‘bot-wrangler’ involves, and whether it is something to aspire to.

The goal

I wanted to create a simple bot that would allow a user to describe their business problem in their own words. The bot should then translate this into a functional requirement and instantly return a list of cognitive APIs meeting that requirement, with a link to each API’s homepage. For example, a user might want a service that could help them “search for all documents containing similar content”. The chatbot should interpret this specific sentence as a more general functional requirement for a ‘find related content’ service, and recommend any APIs in the app database tagged with that feature. The high-level process would therefore be:

Capture user text input -> recognise intent -> apply filters -> present results

The process

1. Carry out market research.

As the purpose of the chatbot is to recommend cognitive APIs to busy users, I first needed an understanding of the main categories of API on the market. This involved looking at the major providers, reading the description of each service, understanding which features it offers, and creating a data entry for each one, which I did as basic JSON objects:

Example provider:

{created: new Date(), name: "Microsoft", hq: "United States", status: "Public", icon_url: "logos/microsoft.png", homepage: "https://www.microsoft.com"}

Example cognitive API:

{created: new Date(), providerId: one, name: "Computer Vision API", category: "Visual_Recognition", features: ["identify_image_objects"], industries: ["all"], providerName: "Microsoft", availability: "Public", icon_url: company_logo_one, api_homepage: "https://www.microsoft.com/cognitive-services/en-us/computer-vision-api"}
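
For reference, here is a minimal sketch of how such entries could be loaded into Meteor’s Mongo collections; the collection names are my own invention, but it clarifies that ‘one’ and ‘company_logo_one’ above are variables holding the provider’s database _id and logo path:

import { Mongo } from 'meteor/mongo';

const Providers = new Mongo.Collection('providers');
const Apis = new Mongo.Collection('apis');

// Insert the provider first, then reference its _id from each of its APIs
const one = Providers.insert({created: new Date(), name: "Microsoft", hq: "United States", status: "Public", icon_url: "logos/microsoft.png", homepage: "https://www.microsoft.com"});
const company_logo_one = "logos/microsoft.png";

Apis.insert({created: new Date(), providerId: one, name: "Computer Vision API", category: "Visual_Recognition", features: ["identify_image_objects"], industries: ["all"], providerName: "Microsoft", availability: "Public", icon_url: company_logo_one, api_homepage: "https://www.microsoft.com/cognitive-services/en-us/computer-vision-api"});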

I followed this up by googling the categories (e.g. “natural language processing API”) to identify some more niche providers in the same categories as the big players. I certainly do not claim to have identified all of them; only enough to demonstrate some entries for each category.

2. Build a metadata schema.

Next, I needed to construct metadata that would be shared between the user-facing application and the back-end chatbot. This involved creating a unique string for each cognitive API feature and tagging each data entry with it in a ‘features’ array:

features: ["identify_image_objects", "text_recognition", "recognise_face"]
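
To illustrate how these tags drive the app (a sketch; the function name is a placeholder), matching a recognised intent to APIs is then a simple array lookup, since intent names and feature tags share the same strings:

// Select every API entry whose features array contains the identified tag
function apisForIntent(apis, intentName) {
  return apis.filter(api => api.features.includes(intentName));
}

// e.g. apisForIntent(allApis, "identify_image_objects")
// -> [{name: "Computer Vision API", ...}, ...]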

3. Set up a back-end chatbot service.

As this was a weekend project, I really didn’t want to spend months developing a natural language processing system of my own, so I decided to use the IBM Watson Conversation service to create the conversation part of my application. It provides a free tier, available through an IBM Bluemix account, which allows 1000 API calls per month — enough to get the app up and running.

To get this going, I first created a list of ‘intents’ corresponding to the feature tags defined earlier. The goal here is to train the Watson service to differentiate between the different requirements that a user might have based on their input. Since there is a limit of 25 intents in the free tier, I included around 20 API feature tags, plus some conversational intents, such as ‘greeting’, ‘goodbye’ and ‘not_sure’, to facilitate a more natural interaction.

#get_entities_from_text
#navigation
#not_sure
#linguistic_analysis
#machine_learning
#recognise_face
#sentiment_analysis
#provide_recommendations
#text_recognition
#convert_file
#goodbye
#translate_between_languages
#classify_documents
#analyse_graph
#see_all
#generate_natural_language
#identify_video_objects
#identify_image_objects
#greeting
#speaker_recognition
#natural_conversation
#speech_to_text
#text_to_speech
#find_relevant_documents
#identify_intent

You can also add ‘entities’ for Watson to recognise. These can be specific things that the intent relates to, e.g. for a ‘Cook’ intent, ‘Cook’ + ‘Pizza’, ‘Cook’ + ‘Chicken’, etc. I played around with this, but didn’t end up using it for the purposes of my prototype.

The final part of setting up the conversation is to create a dialogue: a structured way of determining which responses or further steps Watson should take based on the identified intents. My basic structure is shown here. Because my intended follow-up question (“What industry are you interested in?”) was independent of the first question, I found it preferable to aggregate all of the meaningful intents into a single node, triggered by the condition intents[0]; this keeps any subsequent child nodes in line with the ‘don’t repeat yourself’ principle. However, if your follow-up question depends on the first (for example, “What kind of documents do you want to extract entities from?”), you will want a separate ‘get_entities_from_text’ node with its own child nodes.

Dialog structure
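
For orientation, here is a rough sketch of what the aggregate node might look like in the workspace JSON; the node name and response text are illustrative, and the exact export format may differ by service version:

{
  "dialog_node": "recommend_api",
  "conditions": "intents[0]",
  "output": {
    "text": {
      "values": ["Here are some services that might help. What industry are you interested in?"]
    }
  }
}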

4. Train the chatbot service.

Watson Conversation must be trained to recognise intents by providing examples of user input corresponding to each intent. I thought of the various ways that a user might ask for each feature, and created some initial ‘seed’ sentences. I also copied and pasted some of the marketing selling points (phrased in terms of user benefit) from websites in each category of API. Once real conversations are taking place, the model can be trained further, so I didn’t worry too much about accuracy at this stage.

Some training examples for an intent
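
For illustration, seed sentences for the #find_relevant_documents intent might look something like this (the first is the sample query from the goal above; the rest are hypothetical):

// Seed utterances used to train the #find_relevant_documents intent
const seedExamples = [
  "search for all documents containing similar content",
  "find articles related to this one",
  "show me the documents most relevant to a query",
  "retrieve files about a particular topic"
];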

5. Build a user-facing application.

I put together a Meteor JS application using React and MobX on the front end (a previous article describes this architecture). The app was intended to be simple, modern-looking, and mobile-responsive. I also loaded my API data objects into MongoDB collections, fetched in full with a single method call on the initial client request. The reason for using Meteor rather than a static front-end app was to facilitate future extension of the app and/or more granular data fetching. The source code (minus Meteor.settings and NPM packages) is available on GitHub.

6. Connect the application to the chatbot service.

Because the Watson service requires secret API keys and other credentials, all Conversation API calls must, for security reasons, be made server-side, with the keys, Watson URLs, etc. fed in as settings. Meteor provides a way to do this using Meteor.settings (see the link for how to set this up).
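
As a minimal sketch (the key names here are my own, not prescribed by Meteor or Watson), the settings file might look like this, supplied at startup with meteor --settings settings.json and read server-side via Meteor.settings:

{
  "watson": {
    "username": "<conversation-service-username>",
    "password": "<conversation-service-password>",
    "url": "https://gateway.watsonplatform.net/conversation/api",
    "workspaceId": "<workspace-id>"
  }
}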

When the user enters a message in the chat window, the client makes a method call to the server, which in turn makes an asynchronous API call to Watson Conversation, wrapped in a Future (see the NPM package used).
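
A minimal sketch of such a server method, assuming the watson-developer-cloud Node SDK of the era and the settings structure above (the method name and error handling are illustrative):

import { Meteor } from 'meteor/meteor';
import Future from 'fibers/future';
import ConversationV1 from 'watson-developer-cloud/conversation/v1';

const conversation = new ConversationV1({
  username: Meteor.settings.watson.username,
  password: Meteor.settings.watson.password,
  version_date: '2017-02-03'
});

Meteor.methods({
  // Forward the user's message (plus any saved context) to Watson Conversation
  'watson.message'(text, context) {
    const future = new Future();
    conversation.message({
      workspace_id: Meteor.settings.watson.workspaceId,
      input: { text },
      context // resupplied on every call, as the service is stateless
    }, (err, response) => {
      if (err) future.throw(new Meteor.Error('watson-error', err.message));
      else future.return(response);
    });
    return future.wait(); // block this fiber until the async call completes
  }
});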

When the API response is received, the result is returned to the client, where it is processed by the MobX state store to pass new data to the user interface, including deciding whether to show the results and which filters to apply automatically. The filters use the ‘features’ array described earlier to select relevant results.

The important thing to note here is that the Watson Conversation service is stateless, so if you want to pick up the state of your previous conversation, you need to store and then resupply the context object on every new call.
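
On the client, that means keeping the last context object returned by Watson and passing it back with each new message. A minimal sketch, assuming a MobX store (the store shape and names are my own):

import { Meteor } from 'meteor/meteor';
import { observable, runInAction } from 'mobx';

const chatStore = observable({
  messages: [],
  watsonContext: null // the context object from the previous response
});

// Send the user's message via the server method, resupplying the saved
// context so the conversation picks up where it left off
function sendMessage(text) {
  runInAction(() => chatStore.messages.push({ from: 'user', text }));
  Meteor.call('watson.message', text, chatStore.watsonContext, (err, response) => {
    if (err) return;
    runInAction(() => {
      chatStore.watsonContext = response.context; // store for the next call
      chatStore.messages.push({ from: 'bot', text: response.output.text.join(' ') });
      // response.intents[0], if present, determines which feature filters to apply
    });
  });
}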

7. Deploy the application

I used Meteor ‘build’ to generate a plain Node.js app and then deployed this on the IBM Bluemix platform (see this article for a good step-by-step guide).

The result

The result of this work can be seen in the live prototype here.

To try it out, think of a business problem for which you think an AI service might be of help, and describe the problem in your own words. Then see whether the chatbot helps you get to some options quickly. Also, try putting in different queries once you can see the results — the UI should update in a satisfying way according to your new input!

I will be reviewing all inputs and training the chatbot on a regular basis, so it should hopefully become more accurate over time. Be aware that it is limited to only 20 categories of function, so it may not cover your use case at present. Please feel free to suggest improvements below!

Learning points

  • Finding, extracting, categorising and ingesting data about the API services (i.e. the market research and data modelling) took even longer than the app coding and chatbot setup (or at least felt longer!)
  • If using the Watson Conversation user interface, get comfortable with the JSON view of each node (via the {} button), and in particular with adding and manipulating the context object, as this is critical for maintaining conversation state between your app and the API service

Conclusion

What became clear from this project is the potential for more sophisticated processing pipelines to complete more advanced tasks. For example, if I wanted to create an Alexa-style voice-driven service for recommending cognitive APIs, I could add two steps to the pipeline:

Record voice input -> convert voice to text -> recognise intent -> apply filters -> present results

Each of these steps could be completed with APIs or libraries from different vendors (or open source projects), and then orchestrated in such a way as to deliver a useful result.

The job of the chatbot wrangler, then, might be to understand the goal of the bot; analyse which processing tasks are needed to accomplish that goal; identify the combination of services that can deliver this with maximum speed, accuracy and cost-efficiency; collect and structure the data to be filtered or processed; and integrate these into customer-facing applications and data models in a way that delivers a good overall user experience.

This job therefore requires elements of market research, data modelling, user experience design, process analysis, software engineering, and programming. Combining all of these in a single job would be both very challenging and very interesting, so I agree that the ‘bot-wrangler’ may be a hot job of 2017 and beyond!
