Zillow’s Zola

Jennifer L
Exploring UE Prototyping Techniques
6 min read · Nov 24, 2020

Sprinkle in a little magic…

About Zola

Zola is a voice assistant for Zillow, an American online real estate database company. Zillow's voice assistant allows customers to search for new homes hands-free and eyes-free. Shipping with Zillow's latest mobile application update, Zola can be activated on all voice assistant-enabled devices.

Zola helps customers navigate an endless number of properties, whether or not they have a specific location in mind. It narrows the search results by prioritizing the customer's dream-home characteristics and excluding properties based on their constraints. Once the results are narrowed down, Zola emails them to the customer.

Purpose of Investigation

The purpose of this investigation is to measure the feasibility and desirability of creating a Zillow voice assistant that helps customers narrow down their search results. Are there benefits to using a VUI that screen-driven interactions cannot offer? For this prototype, we focused on testing the VUI's response flow and error messages; aside from the resulting email, we excluded all visual features.

As stated above, customers will be able to access Zola on all voice assistant-enabled devices with the latest Zillow application. We assumed that customers would want to look for new homes from their current living environment, so we tested our participant in their current home.

Dialog Flow Chart

Access the flow chart here.

Dialog flow chart of Zola's responses. The yellow triangle indicates a portal to the error dialog that is prompted whenever the system is confused.

Testing

Testing Zola with a participant.

Our 3-person team delegated the roles so that Jennifer would facilitate, Kevin would be the wizard, and Malik would be the scribe.

To achieve the wizardry, our test used a chatterbot called "Mumbler" that Damien had mentioned during his presentation. The Mumbler allows fine-grained customization of the dialog (e.g., pitch, rate, and volume) and offers a wide array of voices to choose from, including non-English ones. We ultimately settled on the "Joanna" voice from Amazon Polly. The Mumbler can be accessed here (Intuit account required): https://expo.futures.a.intuit.com/mumbler/.

For the test, we gave the participant the following scenario: You just got a job offer that pays $120,000 a year. The job is full-time and fully work-from-home. Now you're thinking of moving to your dream home. So, your task is to use Zillow's voice assistant to help you look for your dream home.

While the facilitator followed a script to introduce the participant to the session, obtain consent for recording, and present the prompt above, the wizard pulled up an animated image on their phone and showed it to the participant as if it were the ever-listening VUI. In addition, the wizard used a laptop-and-desktop setup to have enough room for three different windows during the test: one with the dialog flow chart, one with the Mumbler, and a third with the tabs needed (Google Doc/Images, Zillow, and MailChimp) to create the final email sent to the participant.

Wizard’s screen during the test.

As the participant continued through the dialog, the wizard would copy and paste Zola's scripted response from the dialog flow chart into the Mumbler and play it out loud. The wizard also noted down specific attributes the participant had mentioned, such as location, price, or size, to confirm in later dialogs. At the conclusion of the test sequence, Zola's last prompt asked for an email address, which was used to send a mid-fidelity email prototype to the participant.
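The wizard's workflow, noting attributes and plugging them into later scripted responses, can be sketched as a simple slot-filling routine. This is only an illustration of the fill-in-the-bracket technique; the slot names and response templates below are invented for the example and are not Zola's actual script.

```python
# Minimal sketch of the "fill-in-the-bracket" technique the wizard followed.
# Slot names and response templates are illustrative, not Zola's real script.

slots = {}  # attributes noted down as the participant speaks

TEMPLATES = {
    "location": "Got it. What price range are you considering for homes in {location}?",
    "price": "Okay, up to {price} in {location}. How many bedrooms do you need?",
    "bedrooms": "To confirm: {bedrooms} bedrooms in {location}, up to {price}. Is that right?",
}

def respond(slot_name: str, value: str) -> str:
    """Record the participant's answer, then fill the next scripted response."""
    slots[slot_name] = value
    return TEMPLATES[slot_name].format(**slots)

print(respond("location", "Austin, TX"))
print(respond("price", "$450,000"))
print(respond("bedrooms", "3"))
```

Each answer both advances the dialog and gets echoed back in a later confirmation, which is exactly what made the participant feel the questions were "nuanced."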

Results email sent to participant.

At the end of the test, we conducted a debrief session with the participant to gauge their experience with the "VUI." One aim of the debrief was to identify the desirability of finding a home using a VUI; the other was to determine whether we had successfully misled our participant into thinking Zola was a fully functional VUI.

Analysis

As revealed during the test and the post-test debrief, there were a few hiccups in the testing process. By far the most glaring problem was the delay introduced by copying and pasting script from the dialog flow chart into the Mumbler, editing the fill-in-the-bracket portion of the response, and then playing the audio. The lag between receiving the participant's input and producing the Mumbler's output created awkward silences, and because our wizardry had no visual component during the test, there were moments when the participant couldn't tell whether the VUI had received their response.

In the future, we could implement a few tactics to mitigate the delay. For one, a second wizard could begin prepping the next sequence of dialog while the first wizard works on the current one. The wizard(s) could also keep multiple Mumbler windows open with the dialogs already pasted in, reducing the time it takes to move dialog from the flow chart into the Mumbler. Lastly, we could add an audible beep, a continuous audio cue, or a visual component to show the participant that the VUI was "processing" the request. Luckily, we were able to excuse the slowness during the test by saying the VUI was running into unexpected lag.

Regarding the dialog flow for our VUI, it was a success. We did not have to deviate from the intended fill-in-the-bracket flow (aside from one minor grammatical change), and our participant was surprised at how the VUI was "asking like the exact right questions like very very nuanced questions." In fact, our dialog flow gave the participant direction and steered them away from responses the VUI could not handle. The flow relied heavily on asking questions that would produce adjectives or nouns that could be plugged into the next part of the dialog, and it worked. The most challenging part of the script was the email address collection, where it was not easy to capture the correct address on the first try. If there is no preset email address attached to the device, a more viable solution might be asking for a phone number to text the results to: phone numbers are easier to confirm verbally because they consist of individually enunciated digits, while email addresses can mix letters, numbers, and punctuation.
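The argument for phone numbers over email addresses can be made concrete with a hypothetical readback helper: every digit maps to exactly one spoken word, so the VUI can echo the number back unambiguously. The function and sample number below are invented for illustration.

```python
# Hypothetical readback helper illustrating why phone numbers are easier to
# confirm verbally than email addresses: each digit maps to one spoken word.

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def read_back_phone(number: str) -> str:
    """Spell a phone number digit by digit for the VUI to read back."""
    digits = [ch for ch in number if ch.isdigit()]
    return ", ".join(DIGIT_WORDS[d] for d in digits)

print(read_back_phone("(206) 555-0147"))
```

An email address, by contrast, would also need letters, case, and punctuation ("dot", "underscore", "at") spoken aloud, each of which is easy to mishear.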

In the end, we were still able to fool the participant. One thing we noted during the debrief was that while the wizard scrambled to copy, paste, and edit the dialog, there was a bit of click-clacking from the keyboard. The facilitator helped by muting and hiding the wizard's video while they worked to play the dialog back to the participant as quickly as possible. Our participant didn't think much of the typing noise since the wizard had been introduced as the notetaker; that part of the script probably helped a lot with maintaining the wizardry!

We learned a lot from our Wizard-of-Oz test, including the importance of piloting the study more realistically (e.g., using the Mumbler during the pilot rather than just reading the dialog flow aloud in our own voices), which would have surfaced more of the problems that needed addressing before the session with our participant. Overall, it was a fun experience, and we gathered plenty of insight for conducting a future WoZ session!
