by Nicole M. Hill
This work was a collaboration between Ling Hu, Dr. Kevin Nolan, and Dr. Nathan Carter. This piece is based on a presentation at the Strive 2019 Conference called ‘Measuring trust when conversion isn’t enough’ by Ling Hu and Nicole Hill.
We had an all-too-familiar problem at Groupon. We wanted to improve our ratings and reviews platform to help our customers make informed purchasing decisions. We had a roadmap of new features and we went about methodically testing them through qualitative research and A/B testing. The problem was that, although one of our features addressed a user need, it negatively impacted conversion during A/B testing. The team felt that this feature was the right thing to do for our customers, and we believed that focusing on short-term conversion would damage our relationships with customers in the long term. Ratings and reviews are only helpful when they are information-rich, reflect the variety of experiences and perspectives both good and bad, and set the proper expectations for prospective customers. Over time, anything short of this will cause customers to lose trust both in Groupon merchants and in Groupon itself. But if we were going to boldly move forward and do what’s right for the customer, we needed a way to measure the impact of that change on our customers, and that led us to measure consumer trust in Groupon’s ratings and reviews platform.
So how do you measure trust?
This is actually trickier than it seems at first glance. Qualitative research is a great starting point for understanding the reasons why our customers trust and distrust a platform. There are many ways to do this, from intercepting and interviewing customers in context while shopping, to usability testing and compelled shopping studies. But then what?
At Groupon, we couldn’t just ask customers how much they trust us on a Likert scale because we understood that trust is multifaceted and each customer would answer this question differently. For example, some might be thinking about 1) how much they trust that the people submitting reviews are real customers, while others might be thinking 2) how much they trust those customers to accurately review their experience, and lastly, others may be thinking 3) if they trust Groupon to present those reviews in an unbiased fashion. Each of these perspectives is different and there are even more interpretations of ‘trusting our platform.’ Fortunately, due to my background in academic psychology, I was already familiar with a field concerned with precise psychological measurement: Psychometrics.
Psychometric scales are surveys developed through a rigorous process to ensure they are statistically valid and reliable. In other words, you are measuring what you think you are measuring (in this case, trust), and you can do so consistently. It’s like creating a thermometer that can accurately measure your temperature at different times and tell you when you have a fever. Although you may not have heard of this approach, you have probably encountered it if you’ve ever taken a personality test, a skill or aptitude test, or a survey for depression or anxiety.
How do you make a psychometric scale to measure trust?
For the remainder of this article, I will demystify the psychometrics process by using a metaphor: creating a psychometric scale is like creating a tailor-made tuxedo*. In part 2, I’ll explain how psychometric scales can be applied and make the case for why companies should be measuring trust.
Psychometrics is a powerful approach that can be used to measure various attitudes and beliefs like satisfaction and delight, just like a tailor can make any number of garments such as a couture coat or a gown. This is a bespoke process that requires considerable resources and expertise, so it’s not appropriate for every project. If your project only needs a quick and dirty measure, the equivalent of making a tote bag, this is not the right approach. For projects that do warrant it, I recommend bringing in psychometric experts who understand how to create conceptual models, can help with writing scale items, and understand the statistics required to construct your scale.
A four-step process to create our trust tuxedo is outlined below. For a detailed explanation of this methodology, see Hinkin (1998).
Step 1: Conceptualization
First, you must model your psychological construct, in our case trust, which is similar to sketching the style and cut of your tuxedo. In practice, conceptualization involved creating a model of trust in Groupon’s ratings and reviews platform. Our initial model had two dimensions: trust in Groupon and trust in reviewers. Each dimension was further broken down into three sub-dimensions: competence, integrity, and benevolence. Our model was informed by academic research on general trust and trust in the e-commerce domain. We also drew on qualitative UX research to ensure that the model captured the Groupon-specific experience.
This step should not be rushed. While this approach is flexible, if you miss an important dimension of your model, you might be forced to start the process again. This is similar to forgetting to add the buckram, the stiff canvas interfacing material that allows the satin lapel to hold its shape. Without it, the lapel falls down and ripples and you have to open up the jacket and begin the process again.
Step 2: Questionnaire Development
Once you have a conceptual model you are ready to start writing survey questions. This is similar to gathering all the fabric and notions to construct your tuxedo. You have to write many survey items. Here the goal is breadth and depth. Be sure to follow the best practices of survey writing, such as using clear and concise writing and avoiding double-barreled statements. But you do want to include some redundancy in your items because it boosts the reliability of your scale. Use a thesaurus to ensure that you have variability in word choice (e.g., truthful, honest, sincere, genuine) while writing conceptually redundant or similar survey items. A good rule of thumb is to write 15–20 survey items for each sub-dimension of your model. In our case we wrote 130 items. Don’t worry, your final scale will be much briefer as the next two steps involve editing your tuxedo.
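To see why conceptually redundant items help, here is a minimal sketch using Cronbach's alpha, one commonly used reliability statistic. Treat it as an illustration only: the choice of alpha, the function name `cronbach_alpha`, and the responses are assumptions made for this example, not Groupon data or the exact statistic from our analysis.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Made-up 1-5 Likert responses: five respondents, four similar items
responses = [
    [5, 4, 5, 4],
    [4, 4, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [1, 2, 1, 2],
]

# Compare a two-item version of the scale with the four-item version
two_items = [row[:2] for row in responses]
print("alpha, 2 items:", round(cronbach_alpha(two_items), 2))
print("alpha, 4 items:", round(cronbach_alpha(responses), 2))
```

With these made-up numbers, the four-item version scores higher than the two-item version, which is the effect that redundancy is meant to produce.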
Step 3: Content Validity Assessment
So how do you know if your survey items truly capture your intended construct (i.e., trust) and are interpreted in the same fashion that they were intended? In this step, we relied on experts in psychometrics and experts in Groupon (i.e., our customers) to determine if we hit the mark. This is similar to consulting a master tailor to help you determine if you made appropriate fabric choices for your bespoke tuxedo.
We tested the survey items that the project team thought were most likely to be misinterpreted by customers. Groupon customers were recruited and asked to explain each item in their own words. Items that were misinterpreted were eliminated or revised and retested. Next, we moved on to the formal content validity phase, in which psychometric experts (i.e., graduate students trained in psychometrics) evaluated each statement for fit to our conceptual model of trust. The evaluators were first trained on our trust conceptual model, and then they had to map every item to one and only one sub-dimension. For each mapping, they indicated the extent to which they agreed (on a Likert scale) that the item was relevant to or reflected the specified sub-dimension. Items with less than 70% inter-rater agreement or a relevancy score below 4 out of 5 were cut from the scale.
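The cut rule above can be sketched in a few lines. The two cutoffs come from the text, but everything else here is an assumption for illustration: agreement is taken as the share of raters who picked the most common sub-dimension, relevancy is taken as the mean rating across raters, and the items and ratings are invented.

```python
from collections import Counter
from statistics import mean

AGREEMENT_CUTOFF = 0.70  # minimum inter-rater agreement (from the text)
RELEVANCE_CUTOFF = 4.0   # minimum relevancy score out of 5 (from the text)

def keep_item(ratings):
    """ratings: one (sub_dimension, relevance) pair per rater."""
    labels = [label for label, _ in ratings]
    # Assumption: agreement = share of raters choosing the modal sub-dimension
    top_count = Counter(labels).most_common(1)[0][1]
    agreement = top_count / len(ratings)
    # Assumption: relevancy = mean rating across all raters
    relevance = mean(score for _, score in ratings)
    return agreement >= AGREEMENT_CUTOFF and relevance >= RELEVANCE_CUTOFF

# Invented items and ratings from five hypothetical raters
items = {
    "Reviewers on Groupon are honest.": [
        ("integrity", 5), ("integrity", 4), ("integrity", 5),
        ("integrity", 4), ("benevolence", 4),
    ],
    "Groupon reviews are useful.": [
        ("competence", 4), ("benevolence", 4), ("integrity", 3),
        ("competence", 5), ("benevolence", 4),
    ],
    "Groupon knows how to run a review platform.": [
        ("competence", 4), ("competence", 3), ("competence", 4),
        ("competence", 3), ("competence", 4),
    ],
}

kept = [text for text, ratings in items.items() if keep_item(ratings)]
print(kept)
```

In this example only the first item survives: the second fails on agreement (raters split across sub-dimensions) and the third fails on relevancy (a mean of 3.6).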
Step 4: Construct Validity Assessment
The final step involves two rounds of data collection with the items that remained after the content validity assessment. The goal of this step is to use statistics to construct the final version of your scale by cutting items that don’t fit well with your model or that statistically map to more than one dimension. This is similar to when a tailor drapes the fabric to the body and begins to construct the garment.
This is a bottom-up data-driven approach in which factor analysis is used to find patterns and cluster together related survey items. Conceptually, factor analysis is similar to what we do as qualitative researchers when affinity diagramming. The researcher must interpret the meaning of these sub-scales (i.e., survey items clustered by factor analysis), just like the researcher must identify the themes represented in qualitative data.
This process happens twice. First, the survey is deployed to customers and exploratory factor analysis is used to group questions into sub-scales and cut any questions that fail to cluster. Then the revised scale is deployed a second time to another group of customers and confirmatory factor analysis is used to confirm the structure with this new sample. Once again the researcher must interpret the meaning of any sub-scales that the model identifies.
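As a rough illustration of the cutting step, the sketch below screens a table of factor loadings of the kind that exploratory factor analysis software produces. It does not run the factor analysis itself. The loadings are invented, and the two thresholds are assumed rules of thumb rather than the values used in our analysis; your psychometric experts should set the actual criteria.

```python
PRIMARY_MIN = 0.40  # assumed: an item should load at least this on one factor
CROSS_MAX = 0.30    # assumed: second-highest loading at or above this is a cross-loading

def screen_item(loadings):
    """Classify one item from its loadings on each factor."""
    ranked = sorted((abs(value) for value in loadings), reverse=True)
    if ranked[0] < PRIMARY_MIN:
        return "cut: fails to cluster"
    if len(ranked) > 1 and ranked[1] >= CROSS_MAX:
        return "cut: maps to more than one factor"
    return "keep"

# Invented loadings for four items on three factors
loadings = {
    "item_01": [0.78, 0.10, 0.05],
    "item_02": [0.22, 0.18, 0.25],
    "item_03": [0.55, 0.48, 0.02],
    "item_04": [0.04, -0.71, 0.12],
}

decisions = {item: screen_item(row) for item, row in loadings.items()}
for item, decision in decisions.items():
    print(item, "->", decision)
```

Here `item_02` is cut because it does not load strongly on any factor, and `item_03` is cut because it loads on two factors at once. The sign of a loading is ignored when screening, which is why `item_04` is kept.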
At that point, we’ve finished constructing our tuxedo! Our psychometric scale of customer trust in Groupon’s ratings and reviews had 12 questions distributed over 3 sub-scales: Trust in Groupon, Trust in Reviewers, and Trust in Content. This process also results in a mean baseline score for each survey item, which allowed us to determine which areas our ratings and reviews platform excelled in and which areas were lagging behind.
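A baseline like this is simple arithmetic once the scale is final. The sketch below uses the three sub-scale names from our scale, but the item IDs, the item-to-sub-scale mapping, and the responses are all made up for illustration and are not our results.

```python
from statistics import mean

# Hypothetical mapping of item IDs to the three sub-scales
subscales = {
    "Trust in Groupon": ["q1", "q2"],
    "Trust in Reviewers": ["q3", "q4"],
    "Trust in Content": ["q5", "q6"],
}

# Made-up 1-5 responses from three respondents
responses = [
    {"q1": 4, "q2": 5, "q3": 3, "q4": 2, "q5": 4, "q6": 4},
    {"q1": 5, "q2": 4, "q3": 2, "q4": 3, "q5": 4, "q6": 5},
    {"q1": 4, "q2": 4, "q3": 3, "q4": 3, "q5": 3, "q6": 4},
]

# Mean baseline score per item, then per sub-scale
item_means = {q: mean(r[q] for r in responses) for q in responses[0]}
subscale_means = {
    name: mean(item_means[q] for q in questions)
    for name, questions in subscales.items()
}

# The lowest-scoring sub-scale is the first place to look for improvements
lagging = min(subscale_means, key=subscale_means.get)
for name, score in subscale_means.items():
    print(f"{name}: {score:.2f}")
print("Lagging area:", lagging)
```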
In part 2, I’ll make the case for measuring trust and you will see how we are applying this scale.
I encourage researchers to reflect on what other attitudes or beliefs would be valuable to measure. Have you created a psychometric scale for your organization? If so, what kind and how are you using it?
Please leave your ideas and questions in the comments section.
*I would like to acknowledge Dr. Nolan for creating the initial tuxedo metaphor, which we have expanded on for this article and during our conference presentation.
Hinkin, T. R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organizational Research Methods, 1(1), 104–121.