At Nivi, we’re hard at work building a great chatbot for health. We started our journey with a simple interactive voice response service in Kenya that helped users identify and obtain methods of family planning that fit their goals. It worked great, but it was a pretty structured process: we asked you some questions, you pressed a key to respond to these automated audio prompts, and we texted you a referral. Perfect if that referral is all you need, but follow-up conversations with users suggested that we were missing a bigger opportunity to provide value.
We thought it would be interesting to open up communication and invite users to send free-form text messages. We would then try to determine what they wanted and attempt to meet their needs. And just like that, askNivi was born. Users send SMS messages to our toll-free shortcode, and we pipe these texts to an internal app that lets our customer success team read and respond. It was a success! Messages started pouring in. This presented a new challenge: responding to all of these messages.
What Are You Trying to Say?
To keep pace with the demand and offer users something truly special, we set out to turn askNivi into a platform for automated conversations about health. The first challenge for any bot maker is to figure out what the heck a user wants to do or know. We turned to the Wit.ai natural language engine for help classifying incoming requests.
To get started, we simply fed a few hundred messages to Wit’s web app and manually tagged each message with an intent label.
In the next iteration, we integrated Wit’s API into our chat app workflow. We started passing incoming requests to Wit, and Wit classified each message with one of our intent labels (or not, in the case of recall errors). Back in our chat app, a member of our customer success team would validate the suggestion and continue the conversation.
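Here is a minimal sketch of that hand-off in R. It assumes the httr package and the response shape of recent versions of Wit’s /message endpoint, where the parsed JSON carries a list of intents, each with a name and a confidence (older API versions nested intents under entities, so check the docs for the version date you pin). The helper names, the default version date, and the 0.5 confidence cutoff are hypothetical, not our production code.

```r
# Hypothetical helpers sketching the Wit.ai hand-off (not Nivi's production code).

# Send one message to Wit's /message endpoint and return the parsed JSON.
# Needs the httr package, a server access token, and network access.
# The default version date is an assumption; pin whichever one you test against.
query_wit <- function(text, token, version = "20200513") {
  resp <- httr::GET(
    "https://api.wit.ai/message",
    query = list(v = version, q = text),
    httr::add_headers(Authorization = paste("Bearer", token))
  )
  httr::stop_for_status(resp)
  httr::content(resp, as = "parsed")
}

# Return the highest-confidence intent name, or NA when Wit suggests no
# intent or is less confident than `threshold` (a hypothetical cutoff).
# NA messages go to the unclassified backlog for human review.
pick_intent <- function(parsed, threshold = 0.5) {
  intents <- parsed$intents
  if (length(intents) == 0) return(NA_character_)
  confs <- vapply(intents, function(i) {
    if (is.null(i$confidence)) 0 else as.numeric(i$confidence)
  }, numeric(1))
  best <- which.max(confs)
  if (confs[best] < threshold) return(NA_character_)
  as.character(intents[[best]]$name)
}
```

Calling query_wit() needs a real token and network access, but pick_intent() works on any parsed response, so it is easy to test offline. Every NA it returns is a message waiting for a human.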
Managing Human Intelligence Tasks
This workflow has served us well, but we quickly developed a backlog of messages that did not fall into one of our existing intents and therefore went unclassified. It was easy enough to identify new intent labels to close the gap, but we then had a “human intelligence task” on our hands, to use the language of Amazon’s Mechanical Turk platform: we needed people to read several thousand bite-sized messages in our unclassified pile and match them to the newly updated list of intents.
We considered a range of options, starting with MTurk. At first it seemed like an ideal solution: (Step 1) Create a project with discrete HITs; (Step 2) Amazon serves each message to multiple raters (“Workers”) who use a simple web interface to classify the message; (Step 3) Analyze the data for agreement between raters. Technically speaking, the MTurk platform would get the job done, but with a corpus of messages written in English, Swahili, and a mashup of the two, we came to the conclusion that it would be difficult (impossible?) to filter the pool of MTurk workers to find the right folks for the task.
At the other end of the difficulty spectrum, we also thought about just creating a spreadsheet with messages and a dropdown menu of intents. But what would be the fun in that? Besides, we had a better idea that would not be very hard to implement.
Our Desired Specifications
- Create a web app that would serve raters with one message at a time and ask them to classify the intent of the message.
- Stop serving a message once two raters agree on its intent.
- Don’t serve repeat messages to raters.
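The serving rules above boil down to two checks. Here is a minimal base-R sketch, assuming a hypothetical long-format table of ratings (one row per submission with id, rater, and intent columns) rather than the exact layout of the master.csv and raters.csv files in the example repo. Messages are served in table order; random order would work just as well.

```r
# Hypothetical long-format layout (not the example repo's CSV schema):
#   messages: one row per message, column `id`
#   ratings:  one row per submission, columns `id`, `rater`, `intent`

# TRUE for each message id where at least `k` raters chose the same intent.
has_agreement <- function(ratings, ids, k = 2) {
  vapply(ids, function(m) {
    labels <- ratings$intent[ratings$id == m]
    length(labels) > 0 && max(table(labels)) >= k
  }, logical(1))
}

# Pick the next message for `rater`: skip messages that already reached
# agreement and messages this rater has already labeled; NA when none left.
next_message <- function(messages, ratings, rater, k = 2) {
  seen <- ratings$id[ratings$rater == rater]
  open <- messages$id[!has_agreement(ratings, messages$id, k)]
  candidates <- setdiff(open, seen)
  if (length(candidates) == 0) return(NA)
  candidates[1]
}
```

For example, if two raters have both labeled message 1 with the same intent, next_message() never serves message 1 again, and a rater who has already labeled message 2 moves on to message 3.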
Getting Started with Shiny+Flexdashboard
I’m going to show you how to use R Markdown and the flexdashboard package to create a simple Shiny app to run an MTurk-style process.
I’m a diehard R user, lover of R Markdown, tidyverse convert, and general fan of RStudio. I had been looking for an excuse to develop my Shiny skills for creating interactive documents, and this seemed like a perfect opportunity.
If you’re not familiar with Shiny, its developers describe it as an R package that makes it easy to build interactive web apps straight from R.
They sure did get the marketing right for me: a data analyst who wants to build a web app, but can’t because he googles how to specify font colors each and every time he is forced to use HTML and CSS!
Step 1: Get a free account at shinyapps.io
Unlike static R Markdown outputs that can be served by any web server, interactive documents with Shiny apps require a Shiny server. Right now there are three options: (a) host your own Shiny server; (b) host your own instance of RStudio Connect, a broader platform for sharing your data science products; or (c) put your app on shinyapps.io, a hosted service from RStudio. I’ll show you how to do Option C.
Step 2: Set up your local machine
You’ll also need to install a few packages:
install.packages(c("flexdashboard", "shiny", "rdrop2", "tidyverse", "shinyWidgets", "DT"))
My preferred workflow is to create a git repo and create a new project in RStudio that maps to this directory on my local machine. Projects let you forget about working directories and have some other nice features.
Here’s a link to the GitHub repo that I created for this example.
Step 3: Set up persistent remote storage (required for Option C)
If you are not using a self-hosted option, you need to set up some form of remote data storage because shinyapps.io currently does not persist data from one application instance to another. Dean Attali has a great guide for solving this problem.
I decided to use the rdrop2 package to connect my Shiny app to my Dropbox account. To authenticate with Dropbox, run the following commands once in the R console:
library(rdrop2)
token <- drop_auth()
saveRDS(token, file = "droptoken.rds")
This generates a token that you will upload to shinyapps.io with your app. Remember, there are other approaches. See Dean’s summary. (Shoutout to Clayton Yochum for helping me get things running.)
On Dropbox, create a folder to store the results and copy over the master.csv and raters.csv files from my repo to this folder (leave copies in the local repo). Do this before going to the next step.
To run my example without any changes, create this folder in your top-level Dropbox directory and name it dash.
[Fair warning: Using Dropbox as the remote storage solution does create a race condition. It’s technically possible for one rater to overwrite another rater’s submission if both raters submit at exactly the same moment. In our use case, however, this is not a big threat.]
Step 4: Design your app and dashboard in a .Rmd file
I’ve done this step for you in the classify-example.Rmd file in my example repo. If you installed all of the required packages in Step 2, you should be able to open the document in RStudio, hit “Run Document”, and get the plain vanilla app to open.
The default setup in line 27 (remote <- 0) is to use local storage so you can get the basic app running, but as explained previously, this will not work on shinyapps.io. You need to switch to remote storage. If you’ve completed Step 3, change line 27 to remote <- 1, which tells R to interact with Dropbox at the several points in the file that check the value of remote.
(If you tested locally with remote <- 0 before copying master.csv to Dropbox/dash/master.csv, grab a fresh copy from my repo and save it to Dropbox so the remote example will work properly.)
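To make the switch concrete, here is a hypothetical pair of helpers in the spirit of that remote check (they are not copied from classify-example.Rmd). With remote set to 0 they read and write local CSVs; with remote set to 1 they load the token saved in Step 3 and call the rdrop2 functions drop_read_csv() and drop_upload(), assuming the argument names of rdrop2 0.8.x and the dash folder from Step 3.

```r
# Hypothetical storage helpers mirroring the `remote` switch.
# `file` is a bare file name such as "master.csv"; "dash" is the
# Dropbox folder from Step 3.
remote <- 0

load_csv <- function(file, folder = "dash") {
  if (remote == 1) {
    # Needs rdrop2 and the token saved with saveRDS() in Step 3
    token <- readRDS("droptoken.rds")
    rdrop2::drop_read_csv(file.path(folder, file), dtoken = token,
                          stringsAsFactors = FALSE)
  } else {
    read.csv(file, stringsAsFactors = FALSE)
  }
}

save_csv <- function(df, file, folder = "dash") {
  if (remote == 1) {
    token <- readRDS("droptoken.rds")
    # Write a temporary local copy, then upload it over the Dropbox file
    local_copy <- file.path(tempdir(), file)
    write.csv(df, local_copy, row.names = FALSE)
    rdrop2::drop_upload(local_copy, path = folder, mode = "overwrite",
                        dtoken = token)
  } else {
    write.csv(df, file, row.names = FALSE)
  }
}
```

Keeping every read and write behind two functions like these means the rest of the app never needs to know where the data lives.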
Step 5: Customize your content
With your app up and running locally, it’s time to replace the content in the master.csv and raters.csv files on Dropbox. In master.csv, just paste or pipe in your messages, assign unique message IDs, and reset all other columns to match the starting values in the example file from my repo. Editing raters.csv is straightforward.
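If your messages live in a character vector, a sketch like the following can assign IDs before you paste them in. The function name and the id/text column names are hypothetical, so match them to the header of the example master.csv; dropping blank texts and exact duplicates is an optional choice, not something the example app requires.

```r
# Hypothetical: turn raw SMS texts into a table with unique message IDs.
# Rename the columns to match the example master.csv before uploading.
make_master <- function(texts) {
  texts <- unique(trimws(texts))   # optional: drop exact duplicates
  texts <- texts[nzchar(texts)]    # drop empty messages
  data.frame(id = seq_along(texts), text = texts, stringsAsFactors = FALSE)
}
```

You could then write the result with write.csv(make_master(raw_texts), "master.csv", row.names = FALSE), where raw_texts is your own vector of messages.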
You should also replace some content in the .Rmd file. Search for the vector of intents for inputId = "classification" and replace it with your own intents (the vector appears twice in the file). If you remove or edit the final two elements of the vector, "Another option not in this list" or "Cannot make sense of text", search for that text in the file so you can update the conditional logic that controls when input selection boxes are shown or hidden.
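For a feel of how that conditional logic works, here is a minimal standalone Shiny sketch (plain fluidPage rather than the flexdashboard layout, and not the code in classify-example.Rmd). It uses inputId = "classification" for the intent selector and a conditionalPanel() that reveals a free-text box only when the rater picks "Another option not in this list". The first two intents and the other_intent and submit IDs are placeholders.

```r
library(shiny)

# Placeholder intents; the last two mirror the catch-all options in the .Rmd.
intents <- c("Find a clinic", "Ask about side effects",
             "Another option not in this list", "Cannot make sense of text")

ui <- fluidPage(
  selectInput(inputId = "classification", label = "Intent",
              choices = intents),
  # The condition is JavaScript; it must match the option text exactly
  conditionalPanel(
    condition = "input.classification == 'Another option not in this list'",
    textInput("other_intent", "Describe the intent")
  ),
  actionButton("submit", "Submit"),
  verbatimTextOutput("last")
)

server <- function(input, output, session) {
  # Echo the latest submission; a real app would save it instead
  submitted <- eventReactive(input$submit, {
    if (input$classification == "Another option not in this list") {
      input$other_intent
    } else {
      input$classification
    }
  })
  output$last <- renderText(submitted())
}

app <- shinyApp(ui, server)
```

Run it with runApp(app). Renaming the catch-all option means updating both the intents vector and the condition string, which is the same bookkeeping the .Rmd needs.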
Step 6: Publish your app
The last step is to publish to your Shiny server. If you’re using shinyapps.io, follow the shinyapps.io instructions to install the rsconnect package (install.packages('rsconnect')), obtain your token on the website, and configure RStudio to publish directly to your shinyapps.io account (or self-hosted server).
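Because the deployed app needs the Dropbox token from Step 3, it can help to list the bundle explicitly. This sketch checks that the files exist and then calls rsconnect::deployApp(); the file names follow this post, and you would add any other files your .Rmd reads. It assumes you have already run rsconnect::setAccountInfo() with the name, token, and secret shown in your shinyapps.io dashboard.

```r
# Files the deployed app needs; names follow this post (adjust as needed).
bundle <- c("classify-example.Rmd", "droptoken.rds")

# Return the bundle files that do not exist under `dir`.
missing_files <- function(files, dir = ".") {
  files[!file.exists(file.path(dir, files))]
}

# Deploy only when every bundle file is present. Assumes
# rsconnect::setAccountInfo(name, token, secret) has already been run once.
publish_app <- function(files = bundle, dir = ".") {
  absent <- missing_files(files, dir)
  if (length(absent) > 0) {
    stop("Missing files: ", paste(absent, collapse = ", "))
  }
  rsconnect::deployApp(appDir = dir, appFiles = files,
                       appPrimaryDoc = files[1])
}
```

Failing early on a missing droptoken.rds is cheaper than discovering on shinyapps.io that the app cannot reach Dropbox.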
Step 7: Make it better
Here are a few ideas for iterating on this example:
- Integrate rater confidence. In this example, raters indicate their degree of confidence when labeling each message, but this information is not used.
- Use a better remote storage option that avoids a race condition.
- Create a process to evaluate more than just exact agreement between raters. Maybe you want to push raters to use very nuanced intents that are similar yet distinct. Requiring an exact match between raters could be limiting.
- Replace dropdown user selection with authentication.
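For the race condition, one common pattern is to never rewrite a shared file: each submission goes into its own uniquely named file, and the app combines them when it needs the current state. Here is a local-folder sketch of that idea with hypothetical function and column names. On shinyapps.io each file would instead be uploaded to Dropbox (for example with rdrop2’s drop_upload()), and agreement would be recomputed from the combined table.

```r
# Save one rating to its own file so concurrent submissions never
# overwrite each other. Names and columns are hypothetical.
save_submission <- function(id, rater, intent, dir) {
  row <- data.frame(id = id, rater = rater, intent = intent,
                    submitted = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                    stringsAsFactors = FALSE)
  # Millisecond timestamp plus a random suffix keeps file names unique
  suffix <- paste(sample(c(letters, 0:9), 8, replace = TRUE), collapse = "")
  name <- sprintf("%s_%s.csv", format(Sys.time(), "%Y%m%d-%H%M%OS3"), suffix)
  write.csv(row, file.path(dir, name), row.names = FALSE)
  invisible(name)
}

# Combine every submission file into one data frame (zero rows if none).
load_submissions <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    return(data.frame(id = integer(0), rater = character(0),
                      intent = character(0), submitted = character(0),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
}
```

The trade-off is more files and a read step that grows with the number of submissions, which is usually fine for a labeling project of a few thousand messages.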
Learning More About R Markdown, Shiny, and Flexdashboard
This is a fairly simple example that should let you play around and explore which parameters control how Shiny apps look and function. To learn more, check out the awesome resources from RStudio and get help over at RStudio Community.