Zero State: Meaningful Chatbot Conversations without Storing State
I spent this summer at Microsoft doing my second internship with the company, and I had an amazing time! One of the more interesting problems I faced originated from my project for the summer: improving an existing chatbot, Who bot, written with v3 of Microsoft’s Bot Framework. This was a fairly large and complex bot, and since it was already being used within the company daily, there were a few critical restrictions on any new features. The most interesting of these was that use of external storage was discouraged, as the current architecture did not need any and standing up a new store would add overhead. This severely limited the types of features I could add to the bot until I got a bit more creative.
State in Chatbots
Most production-ready bots written with the bot framework leverage some kind of data store to keep track of user information and conversation state. User state keeps track of information about a user (e.g., their name, favorite pizza toppings, or messaging preferences) that the bot can later reference. Conversation state is required if anything the bot says or does depends on information relevant only to the current conversation, so that when the user sends a message in response to a prompt, the bot framework knows exactly which dialog in the stack the user was responding to. (From a more technical point of view, user state is keyed to a user id, while conversation state is keyed to a conversation id.)
With both conversation state and user state being stored, the bot can handle multiple inputs from multiple different users and respond to each query with the correct context from the previous inputs and replies (and can scale by using multiple instances of the bot, sharing a data store).
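To make that concrete, here is a minimal illustration (not the Bot Framework’s actual storage API, just the shape of the idea) of the two stores a conventional bot keeps; in production these would live in an external database shared by every instance of the bot.

```csharp
using System.Collections.Generic;

// Illustration of conventional bot state: two lookups, one keyed by user id
// and one keyed by conversation id. This is the external store Zero State avoids.
public class ConventionalStateStore
{
    // User state: survives across conversations (name, preferences, scores...).
    private readonly Dictionary<string, Dictionary<string, object>> _userState =
        new Dictionary<string, Dictionary<string, object>>();

    // Conversation state: scoped to a single conversation (active dialog, last prompt...).
    private readonly Dictionary<string, Dictionary<string, object>> _conversationState =
        new Dictionary<string, Dictionary<string, object>>();

    public void SetUserValue(string userId, string key, object value) =>
        GetBag(_userState, userId)[key] = value;

    public void SetConversationValue(string conversationId, string key, object value) =>
        GetBag(_conversationState, conversationId)[key] = value;

    private static Dictionary<string, object> GetBag(
        Dictionary<string, Dictionary<string, object>> store, string id)
    {
        if (!store.TryGetValue(id, out var bag))
        {
            bag = new Dictionary<string, object>();
            store[id] = bag;
        }
        return bag;
    }
}
```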
So this leads to the core challenge with my bot enhancement:
Can I have meaningful and complex interactions with a bot, without creating a place to store user or conversation state?
And the answer, perhaps spoiled by the title of this article, is yes!
The Interaction
The meaning of “meaningful” interactions varies wildly, but for my purposes the focus was on giving the user a quiz and grading their responses. While this might not sound “groundbreaking” at first, it involves sending a question to the user and later processing their response to it, which traditionally requires the bot to have some understanding of what was just asked in order to grade the answer.
Adaptive Cards
Adaptive cards were a huge help for the solution since they provide a much cleaner UI than more traditional card types. While you could do something similar with a simpler card type and buttons, adaptive cards are more customizable and let you easily change the data your card sends based on user input.

All messages sent to the bot framework are in the form of a JSON object. Cards in the bot framework allow you to attach values to your JSON payload to be sent when a button is pressed. Adaptive cards are awesome because they can dynamically change which values are sent based on which selection the user has made (similar to the way an HTML form works).
Stateless
“Stateless” can sometimes refer to separating state from core application logic rather than truly having no state. That separation is typically implemented with a database, which is exactly what we are trying to avoid. “Stateless” can also mean having no notion of state whatsoever; however, our application must have some way of understanding user and conversation context. The key insight for this solution is not removing state entirely but changing how and where state is stored. So I decided to call this paradigm shift Zero State.
Encoding State
When a quiz is generated, the bot has four answer choices and knows which one is the correct answer to the prompt. The adaptive choice element in adaptive cards lets us set the text displayed to the user as well as the value sent in the JSON response if the user selects that choice. In this case, we set each choice’s value to a boolean denoting whether it is the correct answer to the prompt. Thus, we remove the need for the bot to remember the context around the content it’s sending: the adaptive card simply sends back a value telling the bot whether the user guessed the previous question correctly, as illustrated in the snippet below.
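A rough sketch of that card, assuming Bot Framework v3 in C# and building the Adaptive Card JSON by hand (the AdaptiveCards SDK’s DataJson property produces the same data object), looks something like this; the question and choices are just placeholders:

```csharp
using System.Collections.Generic;
using Microsoft.Bot.Connector;
using Newtonsoft.Json.Linq;

public static class QuizCards
{
    // Sketch: a quiz card whose choice values encode correctness ("true"/"false"),
    // so no stored context is needed to grade the answer when it comes back.
    public static Activity BuildQuizCard(Activity incoming)
    {
        var cardJson = @"{
          ""type"": ""AdaptiveCard"",
          ""version"": ""1.0"",
          ""body"": [
            { ""type"": ""TextBlock"", ""text"": ""Which planet is closest to the sun?"" },
            { ""type"": ""Input.ChoiceSet"", ""id"": ""answer"", ""style"": ""expanded"",
              ""choices"": [
                { ""title"": ""Venus"",   ""value"": ""false"" },
                { ""title"": ""Mercury"", ""value"": ""true""  },
                { ""title"": ""Mars"",    ""value"": ""false"" },
                { ""title"": ""Earth"",   ""value"": ""false"" }
              ]
            }
          ],
          ""actions"": [
            { ""type"": ""Action.Submit"", ""title"": ""Submit"",
              ""data"": { ""type"": ""QuizResponse"" } }
          ]
        }";

        // The card rides on the reply as an attachment; when the user submits it,
        // the bot receives { "type": "QuizResponse", "answer": "true" or "false" }.
        var reply = incoming.CreateReply();
        reply.Attachments = new List<Attachment>
        {
            new Attachment
            {
                ContentType = "application/vnd.microsoft.card.adaptive",
                Content = JObject.Parse(cardJson)
            }
        };
        return reply;
    }
}
```

Because the “answer key” travels with the card itself, any instance of the bot can grade the response without looking anything up.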
While encoding the “correctness” of each choice in the adaptive card is a great way to eliminate the need for some context, it doesn’t solve all of our conversation state problems. Let’s say our bot initially prompts the user with an adaptive card asking what category they would like a quiz on. How do we process a response to that card when there is no “correct” answer we can encode beforehand?
Zero State
This is where being able to add our own JSON data to the card comes in handy. As seen in the code above, the submit action on an adaptive card can also carry additional JSON data (via the DataJson property in the AdaptiveCards SDK, or a data object in the raw card JSON). We use this to set the value of the key type, in this case setting it to QuizResponse. This additional data on the card effectively keeps track of conversation state for us, so we could easily have multiple different values for type: one for quiz category responses, one for geography quiz responses, and one for math quiz responses.
In our quiz category example, we would set the value of each adaptive choice element to the category name as a string, and the type value on the card to QuizCategoryResponse. Then, when the bot receives a response, we switch on the value of type in the response; if the type is QuizCategoryResponse, we can go ahead and process the category name.
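Processing those responses can then look something like the following sketch (again Bot Framework v3 in C#; the property names answer and category are the hypothetical input ids from the cards above, and the incoming activity’s Value property carries whatever the card submitted):

```csharp
using System;
using Microsoft.Bot.Connector;
using Newtonsoft.Json.Linq;

public static class CardResponseRouter
{
    // Sketch: route an incoming activity based on the "type" the card attached,
    // instead of consulting stored conversation state.
    public static void HandleCardResponse(Activity activity)
    {
        // Adaptive card submissions arrive with the submitted data in Value.
        var value = activity.Value as JObject;
        if (value == null)
        {
            return; // Plain text message: no card context to recover.
        }

        switch ((string)value["type"])
        {
            case "QuizCategoryResponse":
                // "category" would be the id of the Input.ChoiceSet on the category card.
                var category = (string)value["category"];
                // ...build and send a quiz card for that category...
                break;

            case "QuizResponse":
                // The choice value already encodes correctness, so grading is trivial.
                var correct = string.Equals((string)value["answer"], "true",
                                            StringComparison.OrdinalIgnoreCase);
                // ...tell the user how they did, record the result...
                break;

            default:
                // Unknown or missing type: fall through to normal message handling.
                break;
        }
    }
}
```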
The code above shows how we can process different values for type: if each kind of prompt to the user always sets a certain type value, we know where responses of that type came from, effectively giving us conversation context. This design could be taken much further, using elaborate and nested switch statements to know exactly what path a response has taken when we process it. To me, this starts to sound similar to Redux’s reducers, but instead of parsing out relevant state, this technique invokes the relevant code.
Design Shift
A Zero State bot’s biggest limitation is that all responses that require context must flow through a card, as the card is what is being used to store conversation context. However, complex queries can still be handled with this method, albeit somewhat awkwardly.
We can start off a flow by asking the bot to book a trip. In traditional bot design, this would kick off a sub-dialog asking the user which airline they prefer. Any response to that query would be understood in the context of the airline sub-dialog and sent to LUIS or some other natural-language processor to cross-check it against known airline entities.
A Zero State bot would instead respond to the initial request with an adaptive card containing a text field. The type on this card could be something like airline selection, allowing the bot to process the response accordingly and send the value of the input field to a LUIS model.

Then based on this response, different actions could be taken by the bot. Again, the key is that the card itself is storing some conversation context and sending that along with the next message. Using this method, the bot could easily differentiate between Delta the airline and Delta the sink brand, as long as the context of the question is attached to the card.
In fact, it could be possible to keep track of conversation context not only from one query to the next but throughout an entire dialog. You could use values such as travel.airline.departing to understand that the user is not only responding to a departing-time prompt, but that they first responded to a travel prompt, then an airline prompt, and now a departing prompt. You could also keep a record of the responses to each of those earlier prompts in the JSON on the card, giving your bot complete context.
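A sketch of that idea, under the same v3/C# assumptions and with made-up type names: after the user answers the airline prompt, the next card’s submit data carries both the dialog path and the answer already gathered.

```csharp
using System.Collections.Generic;
using Microsoft.Bot.Connector;
using Newtonsoft.Json.Linq;

public static class TravelCards
{
    // Sketch: the departure-time prompt carries the flow position
    // ("travel.airline.departing") plus the airline already chosen, so the bot
    // needs no stored dialog stack to make sense of the next response.
    public static Activity BuildDepartureCard(Activity incoming, string airline)
    {
        var card = new JObject
        {
            ["type"] = "AdaptiveCard",
            ["version"] = "1.0",
            ["body"] = new JArray
            {
                new JObject { ["type"] = "TextBlock", ["text"] = "When would you like to depart?" },
                new JObject { ["type"] = "Input.Text", ["id"] = "departing",
                              ["placeholder"] = "e.g. Friday morning" }
            },
            ["actions"] = new JArray
            {
                new JObject
                {
                    ["type"] = "Action.Submit",
                    ["title"] = "Next",
                    // Conversation context rides on the card itself.
                    ["data"] = new JObject
                    {
                        ["type"] = "travel.airline.departing",
                        ["airline"] = airline
                    }
                }
            }
        };

        var reply = incoming.CreateReply();
        reply.Attachments = new List<Attachment>
        {
            new Attachment
            {
                ContentType = "application/vnd.microsoft.card.adaptive",
                Content = card
            }
        };
        return reply;
    }
}
```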
User State
The above method works great for storing conversation state, but it cannot be used to store user state. So how can we store a user’s preferred name or, going back to our quiz scenario, the number of quizzes they have answered correctly versus incorrectly?
Open Extensions
Open Extensions are a way to add JSON data to a resource in Microsoft Graph. Since this chatbot was being used within an O365 tenant, it made sense to leverage an open extension on the user to store user state. Technically speaking, this solution is not database-less, as open extensions are backed by a Cosmos DB instance. The great part, however, is that it is all managed by Microsoft, so no overhead (or additional Azure resource) is added to your application. Open extension storage also adheres to data sovereignty laws, so you do not have that additional worry.

These extensions are well documented, and it was a breeze to set up an extension on a user containing the number of quizzes they got correct and incorrect. Quiz settings, such as what category a user would like to be quizzed on, were also stored on the extension. The bot could easily read and write these settings from the user’s extension throughout the conversation flow discussed earlier.
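For reference, here is a minimal sketch of that write against Microsoft Graph, using a plain HttpClient rather than the Graph SDK. The extension name com.example.quizStats is made up, the call assumes you already have a Graph access token for the user, and it targets /me for brevity (a bot acting on behalf of a user would more likely target /users/{id}).

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class QuizStatsExtension
{
    // Sketch: create an open extension on the signed-in user holding quiz stats.
    // Assumes accessToken is a valid Microsoft Graph token with permission to
    // write the user (e.g. User.ReadWrite).
    public static async Task SaveQuizStatsAsync(HttpClient http, string accessToken,
                                                int correct, int incorrect)
    {
        http.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", accessToken);

        var payload = $@"{{
            ""@odata.type"": ""microsoft.graph.openTypeExtension"",
            ""extensionName"": ""com.example.quizStats"",
            ""correct"": {correct},
            ""incorrect"": {incorrect}
        }}";

        // The first write creates the extension; later updates PATCH
        // /me/extensions/com.example.quizStats with just the changed properties,
        // and a GET on the same URL reads it back.
        var response = await http.PostAsync(
            "https://graph.microsoft.com/v1.0/me/extensions",
            new StringContent(payload, Encoding.UTF8, "application/json"));
        response.EnsureSuccessStatusCode();
    }
}
```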
Conclusion
Zero State bots are an interesting alternative to using a database for bot state. Although storing user state without some form of external storage is not really possible, keeping track of conversation context without a database is promising. And while a database is not terribly expensive or difficult to set up, it simply is not needed for conversation state. Even beyond projects with strict requirements like mine, I think there is potential to expand on Zero State, and perhaps even create a bot framework that stores conversation state in this manner automatically.
