P6: Behavioral Prototype (Wizard of Oz)



For this week’s assignment, we worked in a group to use behavioral prototyping techniques to explore a user interaction scenario. We were given three different scenarios and chose speech-to-text dictation: an application for voice recognition of spoken text, targeted at creating work documents.

Project Design Concept

We tested real-time user feedback for speech-to-text software in which the user can use voice commands to create a work document. We created a list of commands and a cheat sheet that displays all of them. The user spoke text and commands to the computer and created and edited an entire work document without having to use the keyboard.

This is a similar concept to voice-to-text software such as Siri on the iPhone. However, it would be more advanced because it can also edit the document with features such as deleting, bolding, italicizing, and formatting.

Wizardry Plan

We used Google Docs to simulate the real-time updates as the user spoke. Google Docs has a feature that hides the header so the page looks like a regular document (so the user would not know they were working in Google Docs). We also created a new Google Docs account so that when the operator typed into the document from another room, it displayed “Genus Text” as the username instead of the person’s name.

We also planned on having the user speak into a microphone connected to headphones that would be on a call with our “wizard.” To give commands that should not be typed out, the user would say “Genus” before each command.
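To make the “Genus” rule concrete, here is a minimal Python sketch of the decision our operator made by ear for each utterance: type it out, or treat it as a command. This is only an illustration of the rule, not software we built. The function name and the list of command words are assumptions for the example (the list is guessed from the tasks in this write-up, not copied from our cheat sheet).

```python
# Wake word from the write-up: spoken before every command.
WAKE_WORD = "genus"

# Hypothetical command vocabulary, assumed from the tasks in this post;
# the real cheat sheet is not reproduced here.
COMMANDS = ("delete", "undo", "select", "italicize", "bold")


def classify_utterance(utterance):
    """Decide whether an utterance is dictation or a command.

    Returns ("command", verb, target), ("unknown_command", text),
    or ("dictation", text).
    """
    text = utterance.strip()
    words = text.split()
    # An utterance is a command only if it starts with the wake word.
    if words and words[0].lower().strip(",.") == WAKE_WORD:
        rest = words[1:]
        if rest and rest[0].lower() in COMMANDS:
            return ("command", rest[0].lower(), " ".join(rest[1:]))
        # Wake word was spoken, but the next word is not a known command.
        return ("unknown_command", " ".join(rest))
    # Everything else is typed into the document as spoken.
    return ("dictation", text)
```

For example, `classify_utterance("Genus delete first sentence")` is treated as a `delete` command aimed at “first sentence,” while a sentence spoken without “Genus” is simply typed out.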

User Test

We ran the user test in study room 314 in Odegaard Library. Sharla was the moderator who spoke directly to the user. Kailey was the operator who gave real-time feedback as the user spoke: she typed out the text or responded to the commands as the user gave them. Kailey sat on the second floor of Odegaard and wore headphones so she could hear the participant. I was the scribe who documented the entire user testing process and took notes.

We used the phone stand Sharla created for an earlier project and my iPhone 6s to record the user study. We used Sharla’s phone to call Kailey, put it on speaker, and placed it face down on the table.

We gave the participant a consent form allowing us to record her and use the footage for our user testing video.

Consent Form

We also gave a typical introduction to a user study so the participant was aware of what was going on. We made sure to let the participant know that we were testing the system and not her, so any positive or negative comments would be helpful. We also reminded the participant to always think aloud so we could follow her thought process. Then we gave the participant the tasks.

Below is the list of tasks we gave the participant to accomplish:

Task 1: Please interact with Genus to type the paragraph in front of you.

In a House Besieged

“In a house besieged lived a man and a woman. From where they cowered in the kitchen the man and woman heard small explosions. “The wind,” said the woman. “Hunters,” said the man. “The rain,” said the woman. “The army,” said the man. The woman wanted to go home, but she was already home, there in the middle of the country in the house besieged.”

Task 2: This task will involve a few different steps.

  1. Please interact with Genus to delete the first sentence, and then undo the action
  2. Please interact with Genus to select the last sentence, and then italicize it
  3. Please interact with Genus to select the title, and bold it

Criteria for Success

We decided to measure success by whether the participant believed that the program worked. We also wanted to gather information on whether the participant felt comfortable using our “program” and whether it felt natural. We did this by giving the participant a post-test questionnaire.

Post Test Questionnaire

  • How usable was this product?
  • How did you feel while using this product?
  • Did you enjoy using this product?
  • What are some of the issues that came up while using this product?
  • Would you use this product again in the future?

We expected minimal errors (fewer than 10 in a test session) by the “program.” We also expected speech errors by the user as they attempted to speak the designated paragraph to the program. We knew that there would be some limitations when it came to formats, views, etc., so we kept our user tasks simple to get the best results.

Gathered Data

We recorded the user as they used our program with an iPhone 6s camera. Because we used Google Docs, we could also track the changes that were made, meaning we were able to track the number of edits that were successful on the first try. We also tracked the number of speech errors the user made, how often they needed to erase the last words, and when they needed to navigate back into their sentences.

The Room that the User Study was Conducted In
Operator on the Second Floor of Odegaard Library

Genus User Study Video Demo

User Study 2 Min Video Demo


What Worked Well

As you can see in the user study video demo above, the participant completely believed that we created Genus. She never once suspected that we were using Google Docs or that there was someone in another part of the library listening to her commands. She was very impressed and got really excited when she was able to use the command cheat sheet to accomplish the tasks without using the keyboard once. The tasks we created for the participant were easy to follow, which made testing the system a lot easier.

Sharla’s phone stand worked well for filming given the limited number of people and equipment we had for this project. Placing the iPhone that was on call with the operator face down on the table also worked really well. There were so many phones out and around that the participant did not notice at all that the phone was on call with the operator.

When the participant got really happy and said “yes!” or thought she had made a mistake and said “oh no, I’m so sorry,” the operator on the phone could not hear her because she whispered it under her breath. Therefore, the operator never typed those comments out. At first we were nervous that the participant would notice that those phrases weren’t being typed out and become suspicious, but she never suspected anything.

The operator made two typing errors, but the participant did not think it was strange because she thought Genus had misinterpreted what she was saying. She didn’t think it was the system’s fault. She also noticed that it had to catch up to her speaking speed but was not bothered by it. She would not start the next sentence until she saw that the entire previous sentence had been typed out.

Future Improvements

Filming Techniques:

Next time, it would be good to rent two or three cameras to place around the room for different angles. It would also be good to have a tripod, or to use QuickTime or Morae software for screen capture. With only three people on the team, it was hard to keep track of the video recording. At one point the iPhone slid and you could not see the participant’s face. I (the scribe) was too busy taking notes and Sharla (the moderator) was mid-instruction, so we did not pay attention to where the camera was pointing.

Video Editing:

Next time, it would be good to incorporate what the operator, scribe, and moderator were doing during the user study. We could use a split screen to show a screen capture and the participant’s actions at the same time, displaying the real-time feedback. The video should also have a more distinct ending.

Clearer Script:

The script we found online for the participant to read was a bit too complicated. The participant’s first language was not English, so she had a hard time pronouncing some of the words in the script. This made the participant focus on the words and feel embarrassed rather than focus on the functionality of Genus. It is important to take language barriers into consideration when creating speech-to-text software. For example, when Siri first came out on the iPhone, my mom, who has a strong Chinese accent, bought the phone in order to use Siri. However, at the time the feature was not developed well enough for Siri to understand my mother’s accent. Therefore, there was no reason for my mother to buy the new version of the iPhone.

Technical Complications:

At the beginning of the study, “Genus” could not hear the participant well enough to type out what she said. We then removed the headphones, used the speaker feature on the iPhone, and had the participant speak more loudly. We should invest in a microphone that connects to the phone so “Genus” (the operator) can hear the participant better without the participant having to shout at the computer.

Typing Speed:

We told the participant during our introduction to speak clearly and slowly, but the operator still could not keep up with the participant’s speaking speed. Kailey is a fast typist, but next time we will need someone who is extra fast or have the participant speak even more slowly.

Clearer Command Cheat Sheet:

The command cheat sheet made for the participant was a little confusing. The participant glanced at the cheat sheet and did not realize there was an example on it. The example said “First say ‘Genus’, then say ‘Select’ followed by the word or sentence you would like to use the following commands on, then say the command.” After the participant saw this example, for the first half of the study she would say “Genus select” and then the command she wanted to use. For example, she would say “Genus select delete first sentence” when she should have just said “Genus delete first sentence.” After a while she noticed that she did not need to say “select” each time and got embarrassed. We should have had a clearer cheat sheet so the participant would not feel like it was her fault.


Overall, I think the program Genus was very effective. The participant said she would use the software if it were to be implemented. The Wizard of Oz testing worked very well in the sense that the participant was completely “tricked” into thinking that it was fully developed software.
