Validate the UX of your new product with just one user research routine

Loïs Warner
Doctolib
Jul 15, 2021 · 6 min read

How our product team created a process to gather quantifiable feedback on usability (benchmark) and improve the UX from the early stages of product development.

For the past 10 months, a dedicated team has been developing Doctolib’s future Practice Management System (PMS) for the German market. In this domain, product managers, designers, UX writers, developers, product strategists and user researchers pursue 3 crucial missions: deliver comfort of use, time-efficiency and patient-centered care to German practices.

The domain is split into 6 feature teams, each led by a PM responsible for the development of their scope(s). Although feature decisions are based on user research (field studies, usability testing, surveys, etc.), how do you ensure that the user flows developed by different feature teams are consistent?

One year before product launch, we urgently needed feedback from our users on the overall existing UX to identify and address potential issues as early as possible! But it all started with many hurdles…

❌ our PMS was not ready yet for autonomous testing in a practice setting.

❌ the Covid-19 pandemic forced us to test exclusively remotely.

❌ we had only 2 available user researchers.

❌ we needed a large sample of participants — both healthcare practitioners and medical assistants (MFA) — and recruitment is always tricky.

Check out how 18 colleagues teamed up for the sake of user-centricity, carried out 36 hours of tests over 2 afternoons and built an iterative process to improve UX throughout product development!

Laying the foundation for a usability benchmark & tracking UX by using the SUS

We needed to get quantifiable feedback about the usability of 4 user flows related to fundamental practice workflows covered by the medical assistant (MFA — Medizinische Fachangestellte) and the HCP (healthcare practitioner). The test scenarios involved various test patients’ care journeys, starting from their arrival at the practice. Each test patient had varying characteristics depending on the user flow (e.g. publicly vs. privately insured, new vs. recurring patient, medical conditions).

Research goal & methodology

The behavioural component — that is, observing without interfering (cf. what people do in the article Performing user research during the COVID-19 pandemic) — was crucial for us to understand the natural touch points between HCPs and assistants along their workflows. Thus, we opted for an unmoderated usability testing format and applied the concurrent think-aloud (CTA) methodology: we would not interact with the users once they had read the instructions for their respective tasks, and the participants were encouraged to verbalize their thoughts as they performed the tasks.

This end-to-end test was also meant to be the first iteration of a quarterly routine. We aimed to define metrics to monitor (a benchmark) for the different flows in order to track our product’s usability and UX over time (see example in the image below).

We chose to measure usability metrics for each flow (success rate, time on task, complexity assessment) and administered the well-known System Usability Scale (SUS) as a post-test assessment of the usability of the 4 user flows. We aimed for a minimum sample size of 35 participants, as recommended in the scientific literature.

The quarterly tracking of the metric “success rate” for a specific task enables us to measure UX evolution and to adapt our product development accordingly (merely illustrative).

The System Usability Scale (SUS) measures the usability and the learnability of a product. Users had to rate each of the 10 statements of the SUS on a scale from 1 = “strongly disagree” to 5 = “strongly agree” (read more details in this article).
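
For readers who want to reproduce the numbers: the standard SUS scoring subtracts 1 from the rating of each odd-numbered (positively worded) item, subtracts the rating of each even-numbered (negatively worded) item from 5, and multiplies the sum of these contributions by 2.5 to obtain a 0–100 score. Below is a minimal sketch of that calculation; the example ratings are made up for illustration.

```python
from typing import List

def sus_score(responses: List[int]) -> float:
    """Compute the SUS score (0-100) from the 10 item ratings (1-5).

    Odd-numbered items are positively worded (contribution = rating - 1),
    even-numbered items are negatively worded (contribution = 5 - rating).
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects exactly 10 ratings between 1 and 5")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based: even index = odd-numbered item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Illustrative (made-up) ratings from one participant
print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0
```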

During the tests, all quantitative information was recorded thoroughly by the note-takers in a single, shared Google Sheets observation matrix.
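
To illustrate how such a matrix can be turned into benchmark metrics, here is a minimal sketch assuming a CSV export with hypothetical columns participant, flow, success (0/1) and time_on_task_s; the column names and the file name are assumptions for the example, not our actual sheet structure.

```python
import csv
from collections import defaultdict

def aggregate_metrics(path: str) -> dict:
    """Aggregate success rate and average time on task per user flow.

    Assumes a CSV export of the observation matrix with hypothetical columns:
    participant, flow, success (0/1), time_on_task_s.
    """
    per_flow = defaultdict(lambda: {"n": 0, "successes": 0, "total_time": 0.0})
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            stats = per_flow[row["flow"]]
            stats["n"] += 1
            stats["successes"] += int(row["success"])
            stats["total_time"] += float(row["time_on_task_s"])
    return {
        flow: {
            "success_rate": stats["successes"] / stats["n"],
            "avg_time_on_task_s": stats["total_time"] / stats["n"],
        }
        for flow, stats in per_flow.items()
    }

# Hypothetical usage: print(aggregate_metrics("observation_matrix_q2.csv"))
```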

Testing protocol & environment

We had 5 groups, each consisting of 3 team members. The moderators’ experience in user research varied widely, so the research protocol was crucial to guarantee an identical testing context, and the user researchers drafted explicit and detailed discussion guides. All participants received 4 simple test scenarios with corresponding mock patient health cards containing the needed data (name, birthdate, insurance type, etc.). Each participant was given exactly the same information and accessed the test software from their practice, in their usual set-up, to perform the tasks.

Recruiting participants & training team members

The recruitment effort lasted one month, with support from our User Research Ops colleague. In parallel, we counted on Doctolib pioneers, who are involved in product co-creation (see the article Performing user research during the COVID-19 pandemic). The participation of medical assistants was fundamental, as they carry out both medical and administrative tasks in the medical software, so we put in extra effort to reach our sample target.

Collectively extracting the key insights & making them actionable

The analysis and reporting were the user researchers’ responsibility. As time did not allow us to read through all the notes or watch 36 hours of interviews, we organized two workshops with specific goals.

Workshop 1: Collecting, discussing and ranking

Insights were gathered asynchronously; moderators and note-takers could indicate which ones they had observed (see example in the image below). During the session, insights were discussed, clustered and voted upon according to their UX value (1 = disturbance, but the software can still be used / 5 = blocker that is detrimental to usage), so that each insight received a score.

The insight card displays the UX value score of the insight.
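
As a purely illustrative sketch of this scoring step (the insight names, the votes and the choice of the mean as the aggregation rule are all assumptions made for the example), the calculation could look like this:

```python
from statistics import mean

# Hypothetical UX value votes (1 = minor disturbance ... 5 = blocker)
# collected for each clustered insight during the workshop.
votes_per_insight = {
    "Insight A": [4, 5, 4],
    "Insight B": [2, 3, 2],
}

# Assumption: the final UX value score of an insight is the mean of its votes.
ux_value_scores = {name: round(mean(v), 1) for name, v in votes_per_insight.items()}
print(ux_value_scores)  # -> {'Insight A': 4.3, 'Insight B': 2.3}
```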

Workshop 2: Relating UX value to required tech effort

Insights were ordered on a vertical axis based on their UX value score, and we discussed the effort required to solve them in order to place them on the horizontal axis. The criteria defining the effort were as follows: impact on already-finished parts of the product, level of collaborative work required, and person-hours of design & software development.

The mapping of the insights on the UX value / effort matrix makes them actionable.

Insights were now actionable according to their category in the mapping (a small classification sketch follows the list):

🏆 Quick win: high value / low effort

🔍 Strategic topic: high value / high effort

❓ Could have: low value / low effort

⛔️ Disregard: low value / high effort
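
As a purely illustrative sketch, this value/effort classification can be expressed as a small helper. The numeric thresholds below are assumptions for the example; in the workshop, the split was made visually on the matrix rather than with fixed numbers.

```python
def categorize(ux_value: float, effort: float,
               value_threshold: float = 3.0, effort_threshold: float = 3.0) -> str:
    """Map an insight onto the UX value / effort matrix.

    The thresholds are illustrative assumptions, not values from the workshop.
    """
    high_value = ux_value >= value_threshold
    high_effort = effort >= effort_threshold
    if high_value and not high_effort:
        return "Quick win"
    if high_value and high_effort:
        return "Strategic topic"
    if not high_value and not high_effort:
        return "Could have"
    return "Disregard"

print(categorize(ux_value=4.5, effort=1.0))  # -> Quick win
```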

Next steps & learning: improving the process for Q3

“I feel that we learnt much more than we expected!”

The value of this process was acknowledged by all team members (see the PM quote above), and we defined the following main next steps:

  • Address the quick wins in order to test the improved iteration during the end-to-end tests in Q3
  • Size the strategic insights accurately and include them in the product roadmap
  • Share knowledge about the approach and methodology with other domains throughout Doctolib

This study enabled us to lay the basis for accurate UX tracking, and we are eager to measure the impact of our recommendations on the iterated parts of the software during the Q3 tests.

For the next session, we will minimize bias by implementing the following improvements:

  • Take extra time before the tests to remove bugs
  • Add more test data to the mock testing accounts to get closer to the real software’s visuals
  • Select only HCP and MFA participants who work in the same practice
  • Exclude participants who have already seen some software parts in previous tests

Thanks again to all team members involved in this exciting project, amongst others our lead PM maud pennaneac’h and our designer David Brandau :)!

Did you like this article? Go subscribe to our tech newsletter for more!


Loïs Warner
Doctolib

User researcher, passionate about cultural diversity and how technology improves human lives.