Diagnosing COVID-19 with Alexa

Harry
8 min read · May 31, 2020


Artwork by Ross Milnes

Performing an at-home assessment of COVID-19 with Alexa, using a question-and-answer interview similar to the one a doctor might use as a primary method of diagnosis.

This application is not in any way meant to provide a clinical diagnosis or to replace medical advice.

Alexa giving the patient their diagnosis after numerous questions/answers

Introduction

The Alexa Skills Kit is a collection of self-service APIs, tools, documentation, and code samples that help developers create Alexa skills. This includes tutorials and code samples on GitHub for both beginners and advanced developers. I used these resources to gain a better understanding of how an Alexa skill works. I also used the Alexa Skills Kit SDK for Java to build my skill, making use of the boilerplate code to get started quickly.
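As a rough sketch of what that boilerplate looks like with the ASK SDK v2 for Java (the package and class names here are my own placeholders, and the registered handlers are the ones described later in this article):

// Entry point for the skill when hosted on AWS Lambda.
package com.example.coviddiagnosis;

import com.amazon.ask.Skill;
import com.amazon.ask.SkillStreamHandler;
import com.amazon.ask.Skills;

public class DiagnosisSkillStreamHandler extends SkillStreamHandler {

    private static Skill buildSkill() {
        return Skills.standard()
                // Each handler decides in canHandle() whether it can serve the incoming request.
                .addRequestHandlers(
                        new BeginDiagnosisIntentHandler(),
                        new YesNoIntentHandler(),
                        new MultipleChoiceIntentHandler())
                .build();
    }

    public DiagnosisSkillStreamHandler() {
        super(buildSkill());
    }
}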

An Alexa skill is composed of three parts:

  1. The Request/Response that is captured or output by the Alexa device
  2. The Skill Interaction Model that converts our speech request into a request that is understood by our backend code.
  3. And the Skill Application Logic, which is the backend of our skill.

The Interaction Model is the frontend of our Alexa skill. It takes our speech request and formulates a request made up of intents. An intent represents something that the user is trying to ask. For example, if a user says something along the lines of ‘that sounds right’, this may be converted into a YesIntent. There is more to learn about the Interaction Model, but for this article I will stop at intents.
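For illustration, each intent is declared in the skill's interaction model JSON. Built-in intents such as AMAZON.YesIntent need no sample utterances, while custom intents (such as the BeginDiagnosisIntent introduced later in this article) list the phrases that should map to them. The snippet below is illustrative rather than my skill's exact model:

"intents": [
  { "name": "AMAZON.YesIntent", "samples": [] },
  { "name": "AMAZON.NoIntent", "samples": [] },
  {
    "name": "BeginDiagnosisIntent",
    "samples": [
      "start my diagnosis",
      "I would like a diagnosis"
    ]
  }
]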

COVID-19 API

When designing my skill, I first had to think about where I was going to get the questions from, and how I would use the answers to those questions to formulate a diagnosis. I found that Infermedica’s COVID-19 API was perfect for what I wanted: it allowed me to fetch questions and post the answers to previous questions. All I had to do was produce a request like the one below:

curl "https://api.infermedica.com/covid19/diagnosis" \
-X "POST" \
-H "App-Id: XXXXXXXX" -H "App-Key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
-H "Content-Type: application/json" -d '{
"sex": "male",
"age": 30,
"evidence": []
}'

The evidence list contains the IDs of the questions that have already been answered, along with the patient’s answer to each.
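For example, after a couple of answered questions the evidence list might look something like this (the symptom IDs here are illustrative; "present" and "absent" are the choice IDs Infermedica uses for yes/no answers):

"evidence": [
  { "id": "s_21", "choice_id": "present" },
  { "id": "s_4", "choice_id": "absent" }
]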

Within this API, there are 3 types of question that we could ask the patient:

  1. single
  2. group_single
  3. group_multiple

A question of type single is a simple yes/no question. A question of type group_single is a list of questions about a group of related but mutually exclusive symptoms, of which the patient should choose only one. And a question of type group_multiple represents a list of questions about a group of related symptoms where any number of them can be selected.

I then had to think about designing my intents to match the responses a user might give when asked any one of these questions.

Application Front End

For the single question type, a user will respond with a simple yes or no, or something that has the same meaning as a simple yes or no. Hence, I have used the built-in AMAZON.YesIntent and AMAZON.NoIntent to catch these responses. Now, whenever a user replies to a question of type single, the InteractionModel will convert this into a request that includes either a YesIntent or a NoIntent.

The same kind of answer applies to each item of a group_multiple question, so I reused the same intents for both single and group_multiple questions. For a question of type group_single, however, the user should reply with only one of the related but mutually exclusive symptoms. When designing my application, I thought about the many ways I could implement this kind of dialog. Let’s look at an example question and how I can expect the patient to answer it:

"question": {
"type": "group_single",
"text": "How high is your fever?",
"items": [
{
"id": "s_3",
"name": "Between 37.5°C and 40°C (99.5°F and 104°F)",
"choices": [
{
"id": "present",
"label": "Yes"
},
{
"id": "absent",
"label": "No"
}
]
},
{
"id": "s_4",
"name": "Greater than 40°C (104°F)",
"choices": [
{
"id": "present",
"label": "Yes"
},
{
"id": "absent",
"label": "No"
}
]
},
{
"id": "s_5",
"name": "I haven’t measured",
"choices": [
{
"id": "present",
"label": "Yes"
},
{
"id": "absent",
"label": "No"
}
]
}
],
"extras": {}
}

This is an example question given in the response from the /diagnosis API. As you can imagine, in a conversation between a patient and a doctor, the patient may reply with:

‘My temperature has been below 40 degrees’

‘My temperature has sometimes been greater than 40 degrees’

‘I’m not sure’

However, it is difficult, if not impossible, to implement this sort of interaction with Alexa, because it would mean knowing the questions before asking them. Since the questions are retrieved from the /diagnosis API and are subject to change, I do not know them in advance.

I could not find a way to implement this type of response handling, and so instead I opted to give each separate option a number. Then I would expect the patient to reply with ‘one’, ‘two’, or ‘three’.

To handle this type of response, I created a custom MultipleChoiceIntent, which is triggered when a reply like those is heard.
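In the interaction model, this can be expressed as a custom intent with a numeric slot, roughly along these lines (the slot name and sample utterances are illustrative):

{
  "name": "MultipleChoiceIntent",
  "slots": [
    { "name": "choice", "type": "AMAZON.NUMBER" }
  ],
  "samples": [
    "{choice}",
    "number {choice}",
    "option {choice}"
  ]
}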

Apart from the intents needed to handle the answers to questions that have already been asked, I also needed an intent that is triggered when the user wishes to begin their diagnosis. I called this BeginDiagnosisIntent. This intent has two slots (slots can be thought of as variables), which hold the age and the gender of the patient, as I need this information to begin the diagnosis. I will talk more about how I used this later on.

Dialog Interface

A dialog manages a multi-turn conversation between the skill and the user. It can be used to ask the user for the information needed to fulfill their request without writing any code. By delegating the dialog to Alexa, I allow Alexa to ask the questions it needs to. The dialog is configured inside the Interaction Model described above.

Inside the dialog model, I need to define each intent that uses the dialog. Since I only want to use it for the BeginDiagnosisIntent, that is the only intent I have defined there. Within the intent I defined the intent slots, specifying whether confirmation is required (‘you are twenty one, is that correct?’) and whether elicitation is required (should the dialog be used to fill the slot?). I can specify a list of prompts that Alexa can use to ask for a slot value, and I can also specify validation rules to check that the patient has replied with a valid slot value (such as age isLessThan 110). If the user answers with an invalid value, Alexa will prompt them to say something more sensible.
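A trimmed illustration of what this looks like inside the interaction model JSON, showing the age slot only (the prompt IDs and wording are examples rather than my skill's exact values):

"dialog": {
  "intents": [
    {
      "name": "BeginDiagnosisIntent",
      "slots": [
        {
          "name": "age",
          "type": "AMAZON.NUMBER",
          "elicitationRequired": true,
          "confirmationRequired": true,
          "prompts": {
            "elicitation": "Elicit.Slot.age"
          },
          "validations": [
            {
              "type": "isLessThan",
              "value": "110",
              "prompt": "Slot.Validation.age"
            }
          ]
        }
      ]
    }
  ]
},
"prompts": [
  {
    "id": "Elicit.Slot.age",
    "variations": [{ "type": "PlainText", "value": "How old are you?" }]
  },
  {
    "id": "Slot.Validation.age",
    "variations": [{ "type": "PlainText", "value": "That doesn't sound like a valid age. How old are you?" }]
  }
]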

Alexa validating the user’s reply
Alexa handling the dialog to find the patient’s age and gender.

Application Back End

Now that I have designed the frontend interaction model, I can start thinking about the backend. In the backend, handlers define how each separate intent is handled. The first intent handler that I wrote was the BeginDiagnosisIntentHandler. This handles the overall conversation flow between the Alexa device and the patient. Let’s see how this works:

This class implements the IntentRequestHandler interface, and hence provides the canHandle() and handle() methods. The canHandle() method defines the intents that this handler can handle; I define this to be the BeginDiagnosisIntent.
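A minimal sketch of that contract, assuming the ASK SDK v2 for Java (the speech text here is just a placeholder):

import com.amazon.ask.dispatcher.request.handler.HandlerInput;
import com.amazon.ask.dispatcher.request.handler.impl.IntentRequestHandler;
import com.amazon.ask.model.IntentRequest;
import com.amazon.ask.model.Response;
import java.util.Optional;

public class BeginDiagnosisIntentHandler implements IntentRequestHandler {

    @Override
    public boolean canHandle(HandlerInput input, IntentRequest intentRequest) {
        // Only claim requests whose intent is the BeginDiagnosisIntent.
        return intentRequest.getIntent().getName().equals("BeginDiagnosisIntent");
    }

    @Override
    public Optional<Response> handle(HandlerInput input, IntentRequest intentRequest) {
        // The conversation flow described below lives here.
        return input.getResponseBuilder()
                .withSpeech("Let's begin your diagnosis.")
                .withShouldEndSession(false)
                .build();
    }
}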

In addition, the handle() method defines what operations will be executed when handling an intent of type BeginDiagnosisIntent. The general flow of the handle() method is listed in the following bullet points, with a rough code sketch after the list:

  • Check to see if a question has already been asked (and answered) by looking in the session storage.
  • If a question has already been asked, collect the evidence (knowledge that I have gathered from the patient).
  • If a question has not already been asked, delegate the dialog to Alexa so that I can gather the information needed to begin the diagnosis (patient’s age and gender).
  • Once I have all of the information needed to make a new request to the API, make the request.
  • If the should_stop flag in the response is true then I extract the diagnosis from the response and return it to the patient.
  • Fetch the question type and question.
  • Depending on the question type, handle the question in the correct way, asking it so that the intended handler catches the response from the patient. For the handler to do its job, I also need to store the question that was just asked in the session storage.
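Below is a rough sketch of how those bullet points might map onto the body of handle(). The helper methods and session keys here (collectEvidence, callDiagnosisApi, askQuestion, lastQuestion, and so on) are assumptions for illustration, not the skill's actual code:

// Inside handle() of BeginDiagnosisIntentHandler (sketch only).
Map<String, Object> session = input.getAttributesManager().getSessionAttributes();

if (session.containsKey("lastQuestion")) {
    // A question was already asked; its answer was stored by the answer handlers.
    collectEvidence(session);
} else if (intentRequest.getDialogState() != DialogState.COMPLETED) {
    // First turn: let Alexa's dialog collect the age and gender slots.
    return input.getResponseBuilder()
            .addDelegateDirective(intentRequest.getIntent())
            .build();
}

// Call the /diagnosis endpoint with everything gathered so far.
DiagnosisResponse apiResponse = callDiagnosisApi(session);

if (apiResponse.shouldStop()) {
    // Enough evidence has been collected: read the diagnosis back to the patient.
    return input.getResponseBuilder()
            .withSpeech(apiResponse.getDiagnosisText())
            .withShouldEndSession(true)
            .build();
}

// Otherwise store the new question and ask it in the way its type requires.
session.put("lastQuestion", apiResponse.getQuestion());
input.getAttributesManager().setSessionAttributes(session);
return askQuestion(input, apiResponse.getQuestion());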

Now I can begin to look at both the YesNoIntentHandler and the MultipleChoiceIntentHandler. The YesNoIntentHandler handles both the YesIntent and the NoIntent, and this is specified in the canHandle() method.

The YesNoIntentHandler checks to see if a question has been asked by looking in the session storage. If no question has been asked, I begin a help dialog using the HelpIntentHandler. If a question has been asked, I get the ID of the question and save it along with the patient’s answer (yes or no) in the evidence list, which is also in the session storage. I then check whether there are any more questions to ask inside the current question (a question of type group_multiple contains multiple yes/no questions). If so, I ask the next one. When all questions have been asked, I move back to the BeginDiagnosisIntentHandler to continue controlling the conversation flow.
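A condensed sketch of what YesNoIntentHandler might look like, reusing the same hypothetical helpers and session key as the sketch above:

public class YesNoIntentHandler implements IntentRequestHandler {

    @Override
    public boolean canHandle(HandlerInput input, IntentRequest intentRequest) {
        String name = intentRequest.getIntent().getName();
        return name.equals("AMAZON.YesIntent") || name.equals("AMAZON.NoIntent");
    }

    @Override
    public Optional<Response> handle(HandlerInput input, IntentRequest intentRequest) {
        Map<String, Object> session = input.getAttributesManager().getSessionAttributes();

        if (!session.containsKey("lastQuestion")) {
            // No question is pending, so fall back to the help dialog.
            return new HelpIntentHandler().handle(input, intentRequest);
        }

        // Record "present" or "absent" for the question that was just answered.
        boolean saidYes = intentRequest.getIntent().getName().equals("AMAZON.YesIntent");
        addEvidence(session, saidYes ? "present" : "absent");

        // Ask the next item of a group_multiple question, or hand control back.
        return hasMoreItems(session)
                ? askNextItem(input, session)
                : new BeginDiagnosisIntentHandler().handle(input, intentRequest);
    }
}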

The conversation flow of a group_multiple question.

The MultipleChoiceIntentHandler works in a similar way, collecting the asked question from the session storage and saving it along with the answer into the evidence list. But since a group_single question is answered with a single choice, I can immediately return to the BeginDiagnosisIntentHandler.

Along with these handlers, I also have implementations of handlers for built-in intents such as the HelpIntent and the ExitIntent.

Diagnosis

Once enough information has been gathered by Alexa, she will then give the diagnosis to the patient:

Alexa giving the patient their diagnosis after numerous questions/answers

The source code for this project can be found at my GitHub page linked below:

https://github.com/Hlev1/AlexaDiagnose

