Continuously Improving Top Tasks
(Chapter 15 from Transform: A Rebel’s Guide for Digital Transformation)
Task Performance Indicator
The Task Performance Indicator gives you a management metric that measures how easy and quick it is for your customers to perform their top tasks. It involves live remote observation of customers as they seek to complete their top tasks. It will give you defensible, trackable data, but often, the most important thing it will give you is video evidence of real customers trying to carry out real tasks. You need to do this because:
a) You get to “see” your customers and thus you have the most valuable resource of all. The videos of your customers as they try, and often fail, to complete their tasks are the raw material of empathy. In a digital world, there is no scarcer and more valuable resource. Remember, there is less and less physical contact between employees and customers, so the normal sources of empathy and understanding are greatly reduced. It’s as if, with digital, your organization is lacking in “iron.” You must take extra “iron” empathy supplements. You must bring the customers’ experience into the working week, into the daily conversations, into the thinking, into the culture, and one of the best possible ways to do that is with videos of customers trying to complete their top tasks, because data on its own simply does not encourage empathy.
b) You will see patterns of customer behavior as they seek to complete these tasks. These patterns are often the keys to unlock all the other data you have on customers. They help you make sense of all this data, giving you lightbulb moments: “Ah, this is what’s actually happening. This makes sense now.”
When we carry out a Task Performance Indicator, we spend most of our effort in observing the participants and seeking out key patterns of behavior. There are always patterns — typical ways that humans behave in a particular situation. One of your most important skills will be in identifying, communicating and acting on these patterns.
When we identify a major pattern of behavior that is causing task failure, we compile a video containing 3–6 customers who were affected. For each participant we typically select less than a minute’s worth of video that illustrates the pattern. We then edit these 3–6 snippets into a combined video, which we try to keep under three minutes. Then we get as many stakeholders as possible to watch it. I have found that this is the single most powerful way to help the transformation from organization-centric to customer-centric. It has a particular potential to reach and influence senior management. You should seek to distribute these videos as widely and as often as possible. In an age of “digital touchpoints” that actually distance the organization from its customers, these videos may be one of the rare times when a wide variety of employees will actually “see” the customer.
This is a management model
As we’ll see in the next chapter on how Cisco uses the Task Performance Indicator, this is a customer experience model of management that focuses on customer outcomes. It gives you reliable and defensible data. You will be able to say things like: “This task has a 60% failure rate.” If nothing is done to address the issues and you measure again in 6 months’ time, it will still be a 60% failure rate. You are measuring customer outcomes as they seek to do what is most important to them, and these metrics are repeatable. This might sound simple and basic to you, but most organizations don’t have reliable metrics to truly measure the customer experience.
The following chart shows why we can do this.
In traditional usability testing, it has long been accepted that if you test with between 3 and 8 people, you will find out if there are significant problems. This is true. But what you will not find out is what the precise success rate and time-on-task are. As a result of thousands of tests over many years, we have discovered that if you test with between 13 and 18 people, you will get reliable and stable patterns. (The makeup of the test participants should reflect the type of customers that are most important to your organization.)
Why is that so important? Because now you have a customer experience management metric. You can stand in front of management and say: “The Task Performance Indicator (TPI) score is 45. We have an average of a 40% failure rate for all tasks, and for those who are succeeding, it is taking them an average of 4 times longer than our target time.” If nothing is done and you test again in 6 months, you will have pretty much the same figures to communicate. However, if you do make real improvements, you will be able to say something like: “The TPI is now 54. As a result of our efforts, the failure rate has dropped to 30%, and the time-on-task has been halved.” Being able to say something like that is genuinely transformative. So few organizations have reliable management metrics that measure and track customer experience. Believe me, this can be a career-changing moment when management sees you as someone they must listen to because you’ve got the numbers.
How the Task Performance Indicator score is calculated
The Task Performance Indicator (TPI) is a single score that reflects the overall customer experience. The following chart shows the average TPI scores for a particular website. The TPI is 61.
The TPI score is calculated based on the following elements:
1. If every task is successfully completed within the agreed target times, then the TPI will be 100.
2. Task failures reduce the TPI:
a. A time out is where someone takes longer than the maximum time allocated. Most of the tasks we test have a target time of one minute or less. We set a maximum time limit of 5 minutes. If someone goes over that time, we mark the task as a Time Out.
b. If someone says that they want to give up, then that is marked as a Give Up.
c. If someone gives the wrong answer, but the answer is close to the right answer and they express low confidence in it, then we mark it as a Wrong answer.
d. If someone gives the wrong answer and acting on the information in that answer could have serious implications, and they express high confidence in their answer, we mark this as a Disaster. A Disaster will reduce the TPI significantly more than a Wrong answer will.
e. When someone has low confidence even though they have got the correct answer, that has a slight negative impact on the TPI score.
f. We set a target time for each task. The more above the target time someone is, the more it impacts the TPI score.
The following chart shows how the TPI was calculated for a particular organization. At 40, the TPI was not very good, and the chart shows what factors had the most impact on reducing the TPI.
What pulled down the TPI most in the preceding example was people giving up, timing out, and getting the answer wrong. A Success Non-Confidence penalty occurs when people get the right answer but state that they are not confident in it. A Disaster Confidence is where people have got a seriously wrong answer and they are completely confident about it. This has a significant impact on the TPI when it happens. A Success Time Penalty is where they get the right answer but take significantly longer than the target time.
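The outcome categories described above can be sketched as a small classification routine. This is only an illustration of the rules as stated in the text, not the actual TPI tooling: the names `Attempt`, `classify` and `failure_rate` are hypothetical, the order in which the checks are applied is an assumption, and so is the decision to treat every wrong answer that is not a Disaster as a Wrong answer. No scoring weights are included because the text does not give them.

```python
from dataclasses import dataclass
from enum import Enum

MAX_TIME_SECONDS = 5 * 60  # maximum time limit stated in the text


class Outcome(Enum):
    SUCCESS = "Success"
    TIME_OUT = "Time Out"
    GIVE_UP = "Give Up"
    WRONG = "Wrong answer"
    DISASTER = "Disaster"


@dataclass
class Attempt:
    """One participant's attempt at one task (hypothetical record layout)."""
    seconds: float                  # time spent on the task
    gave_up: bool                   # participant said they wanted to give up
    correct: bool                   # answer matched the one unique answer
    confident: bool                 # participant expressed high confidence
    serious_if_wrong: bool = False  # acting on a wrong answer would be serious


def classify(attempt: Attempt) -> Outcome:
    # Assumption: a stated give-up is checked before the time limit.
    if attempt.gave_up:
        return Outcome.GIVE_UP
    if attempt.seconds > MAX_TIME_SECONDS:
        return Outcome.TIME_OUT
    if attempt.correct:
        return Outcome.SUCCESS
    # Wrong, serious and confidently held: the worst kind of failure.
    if attempt.serious_if_wrong and attempt.confident:
        return Outcome.DISASTER
    # Assumption: every other wrong answer is recorded as a Wrong answer.
    return Outcome.WRONG


def failure_rate(attempts: list) -> float:
    """Share of attempts that did not end in Success."""
    failures = sum(1 for a in attempts if classify(a) is not Outcome.SUCCESS)
    return failures / len(attempts)
```

With a routine like this, a statement such as “this task has a 60% failure rate” is simply `failure_rate` applied to the 13–18 attempts recorded for that task.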
We have developed a rating scale to understand the relative importance of a TPI score. Anything less than 35 is considered critical. In other words, you are giving the customer a horrible customer experience. A TPI of 80 or more shows that you are delivering an excellent customer experience.
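The two ends of that scale can be written down as a simple lookup. This is a minimal sketch: the function name `rate_tpi` is hypothetical, and because the text only names the bands below 35 and at 80 or above, the label returned for scores in between is a placeholder rather than part of the actual rating scale.

```python
def rate_tpi(tpi: float) -> str:
    """Map a TPI score (0-100) to the bands named in the text."""
    if not 0 <= tpi <= 100:
        raise ValueError("TPI must be between 0 and 100")
    if tpi < 35:
        return "critical"
    if tpi >= 80:
        return "excellent"
    # Assumption: the text does not name the bands between 35 and 79.
    return "between critical and excellent"
```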
In measuring time we need to start off by establishing what we call the “Target Time.” We must estimate how long it should take to complete a particular task, because, otherwise, the actual time on task is somewhat meaningless as we have nothing to compare it with. The following should be considered when establishing a target time:
1. Establish an ideal navigation path for the task, then measure how long it takes to go down this path.
2. Establish the best practice time for this sort of task. For example, on good websites, how long does it take to change your password?
3. Wait until the testing is finished and then analyze the times, with a particular focus on what the fastest times are.
Setting a target time will require discussion and debate. You don’t have to be absolutely precise. We tend to work in blocks of 5 seconds (40, 45, 50 seconds), because it’s only when the participant is taking twice the target time or longer that the TPI score begins to get seriously affected. This brings us to the question of how time should affect the TPI. The following chart shows how we deal with it.
As you can see from the line running across and curving down the chart, the TPI doesn’t reduce very much if the actual time is just twice the target time. An actual time that is 6 times higher than the target time reduces the TPI by 40 points. So, if everyone completed a particular task but they took 6 times the target time to complete it, then the TPI would be 60.
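To make the shape of that curve concrete, here is a rough sketch in Python. Only three facts come from the text: there is no penalty at or below the target time, the penalty is small at twice the target time, and it is 40 points at 6 times the target time. The quadratic curve joining those points, the cap at 40 points beyond 6 times the target, and the function names are all assumptions made for illustration. The actual curve used in the TPI is not given here.

```python
MAX_TIME_PENALTY = 40.0      # points lost at 6x the target time (from the text)
RATIO_AT_MAX_PENALTY = 6.0   # actual time / target time at that point


def time_penalty(actual_seconds: float, target_seconds: float) -> float:
    """Points deducted for a successful attempt that ran over target."""
    if target_seconds <= 0:
        raise ValueError("target time must be positive")
    ratio = actual_seconds / target_seconds
    if ratio <= 1.0:
        return 0.0  # within the target time: no penalty
    # Assumption: quadratic growth, capped once the ratio reaches 6x.
    scaled = min((ratio - 1.0) / (RATIO_AT_MAX_PENALTY - 1.0), 1.0)
    return MAX_TIME_PENALTY * scaled ** 2


def tpi_if_all_succeed(times: list, target_seconds: float) -> float:
    """TPI for one task when every participant gets the right answer."""
    penalties = [time_penalty(t, target_seconds) for t in times]
    return 100.0 - sum(penalties) / len(penalties)
```

Under these assumptions, a task with a 45-second target loses about 1.6 points when someone takes 90 seconds, and `tpi_if_all_succeed([270, 270, 270], 45)` gives 60, matching the example in the text.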
Benefits of remote testing
We have found that remote testing, where you observe and record the person’s screen and listen to them using a screen sharing tool, is faster, cheaper and better than traditional lab-based measurement. The key point here is that you observe them in a live setting. In unmoderated remote testing, it is very difficult to ascertain whether someone has successfully completed the task or not, and that is a major disadvantage from a design and continuous improvement point of view. In a 2015 study, Measuring Usability found that while 93% of participants said they had completed a set of tasks successfully, only 33% of these tasks were verified as being actual successes. “The gulf between actual and reported behavior is the topic of many studies in the behavioral sciences, user research and that’s also the case here. It’s no wonder it’s a cliché to ‘watch what users do and not what they say.’”
Moderated, remote testing has the following advantages over lab-based testing:
- Faster: You can set up tests much more quickly and more often. It’s a lot easier for someone to give you 1 hour of their time online than for them to spend a morning visiting your lab. Because of this, remote testing can become part of the work-week, not something that is done occasionally. That’s a transformative advantage because it allows you to make customer measurement an inherent part of the work week.
- Cheaper: The cost of setting up a remote test is much lower than setting up a lab-based test. For remote, you don’t have all the costs of a lab, for starters. For a lab-based test, a participant needs to travel. A one-hour test can take up their morning. This makes it significantly harder to get participants, and you will need to pay them more.
- Better: “Now, please sit down at this computer that’s not yours. And don’t mind me and all my facial expressions as I sit beside you with my notepad. And forget about that camera that is broadcasting your every move to the web team in the next room. And ignore these strong lights and unfamiliar surroundings. Now, what I’d like you to do is imagine you’re at home using your own computer and that I’m not here.” You get better, more real and more natural behavior if the person is actually at home or in their own office, using their own computer with nobody sitting beside them with a notepad. As a result, you will get better, more reliable and accurate task metrics and this is absolutely vital in building your new customer experience metrics.
Remote testing removes a lot of the “noise” that is likely present during lab-based testing. However, the one piece of “noise” that most affects the likelihood of the participant behaving in a normal, natural way is the task question itself. In choosing a task question, keep in mind the following:
- Based on customer top tasks: You must choose task questions that are examples of top tasks. If you measure and then seek to improve the performance of tiny tasks, you may actually be contributing to a decline in the overall customer experience.
- Repeatable: Create task questions that have a longevity to them and that you can measure roughly every six months. This means you have a management model for continuously improving the customer experience, not just a once-off series of tests.
- Representative & typical — fix it, fix many: Don’t make the task questions particularly difficult. For example, choose questions for one of your most popular products or services, not something unusual that has special exceptions. Then, when you identify and fix the problems brought up by the testing, you’ll be fixing the problems for many other similar products.
- Universal — everyone can do it: Every one of your test participants must be able to do each task, so don’t choose a task question that only a sales person can do if you’re going to be testing a mixture of technical, marketing and sales people.
- One task, one unique answer: Each task question must only have one actual thing you want them to do and one unique answer. Remember, the more specific the task is, the better.
- Does not contain clues: The task question is noise. The participant will examine it like Sherlock Holmes would a clue. So, try to make sure that it doesn’t contain any obvious keywords that, when searched for, would lead directly to the answer.
- Emotionally neutral, not confidential: Create boring task questions; nothing funny, witty or emotional about them. Avoid asking for any confidential information from the participant, as this may make them uneasy and thus less likely to perform in a natural manner.
- Independent from other tasks: Don’t have a sequence in the task questions. You shouldn’t have to complete task 4 in order to be able to do task 5. This is because we are likely to change the order in which we ask the questions. If you keep asking the questions in the same order, there is a chance that the later questions will perform better because the person has got used to finding their way around the site.
- Clearly different from other tasks: Don’t ask the same type of task question twice. Otherwise, the person will learn something from the first task attempt and be likely to artificially do better at the second one.
- Immediately doable: Task questions should be immediately doable on the website or app. You shouldn’t have to sign up, for example, and wait for an email confirmation.
- Short — 30 words or less: Remember, the participant is seeing each task question for the first time, so aim for a question that is ideally less than 20 words, and definitely less than 30 words. Otherwise, they’re liable to forget what they have been asked halfway through the task.
- Doable within 2 minutes maximum: Remote testing is not really suitable for long, complex tasks. So, we aim for tasks that have a target time of no more than one minute, and definitely no more than 2 minutes.
- No change within testing period: Choose questions where the content or app is not likely to change significantly during the testing period. Otherwise, you’re not going to be testing the same environment.
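Most of the criteria above need human judgment, but the two numeric ones (question length and target time) are easy to check automatically. The sketch below is one possible way to do that. The function name `check_task_question` and the message wording are hypothetical, and counting words by splitting on whitespace is an assumption. The thresholds come from the list: ideally fewer than 20 words and no more than 30, and a target time of ideally no more than 60 seconds and never more than 120.

```python
from typing import List


def check_task_question(question: str, target_seconds: int) -> List[str]:
    """Return a list of issues; an empty list means both checks passed."""
    issues = []

    # Assumption: a "word" is any whitespace-separated token.
    words = len(question.split())
    if words > 30:
        issues.append(f"error: {words} words, limit is 30")
    elif words >= 20:
        issues.append(f"warning: {words} words, aim for fewer than 20")

    if target_seconds > 120:
        issues.append("error: target time is over 2 minutes")
    elif target_seconds > 60:
        issues.append("warning: target time is over 1 minute")

    return issues
```

A short question with a one-minute target time returns an empty list, while a 31-word question returns a single error entry.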
Top Tasks for OECD customers included the following:
1. Country surveys / reviews / reports
2. Compare country statistical data
3. Statistics on one particular topic
4. Browse a publication online for free
5. Working papers
Here’s a sample of task questions that were developed based on these top tasks:
1. What are the OECD’s latest recommendations regarding Japan’s health-care system?
2. In 2008, was Vietnam on the list of countries that received official development assistance?
3. Did more males per capita die of heart attacks in Canada than in France in 2004?
4. What is the latest average starting salary, in US Dollars, of a primary school teacher across OECD countries?
5. What is the title of Box 1.2 on page 73 of OECD Employment Outlook 2009?
6. Find the title of the latest working paper about improvements to New Zealand’s tax system.
The Task Performance Indicator is a formal management model that gives you reliable metrics on customer task performance. It should be carried out on 10–12 of your customers’ top tasks on a 6 or 12 monthly basis in order to get an accurate measure of the state of the customer experience.
In your daily work, you should be using as wide a range of metrics as possible, from usage statistics to simplified usability testing with 3–5 people, to check that you are going in the right direction. When you get the data back from the TPI, take the big problems that are hurting the TPI score and focus on them. Just test around a specific page, link, form or piece of content, and make improvements there. Immerse yourself in the behavior every day if possible because that is the foundation stone of the new model: never-ending, continuous improvement.