Imagine that you have a porcelain store. One day a piercing scream rings through the shop, followed closely by that feared sound of things breaking on the floor. When you get to the place, you find an elderly sad-faced lady in the midst of a sea of broken pieces that once were your most exclusive tea set. You have somebody immediately remove the pile of broken fragments and start looking for the cause of this unfortunate event. You notice that the carpet has come loose in the immediate vicinity of the accident, creating a small wrinkle. You’ve never noticed this wrinkle before, and at first glance it actually appears to be rather inconspicuous. But apparently it was enough to make an older lady stumble. What can you do now?
Possibility No. 1: You have the carpet repaired and wait to see if after the repair another customer ends up in the same place in a sea of broken pieces.
Possibility No. 2: You wait to see if at least 36 other customers have the same accident, so that you have observed a representative number of cases and can say with statistical significance and absolute certainty that there is indeed a problem.
The purpose of user testing is to observe the behavior of people who use your product, identify usability problems, fix them and thereby improve your product. As mentioned, we are not concerned with the opinions of individual persons, and therefore we don’t need a statistically relevant number of test participants. In user testing (as in focus groups or interviews) we’re dealing with a qualitative method (observing a behavior) and not with a quantitative method (collecting opinions). During user tests, we simply observe our test participants while they use our product and try to identify the cause of the problems our product may have. And that is usually quite easy once we become aware of the problem.
Qualitative Data Is the Key to Improving Your Product
Product teams often shy away from qualitative methods because they don’t seem to be “sufficiently scientific”. After all, a bar graph with percentages in a PowerPoint presentation looks a lot “truer” than referring to a single test participant who had problems in the most recent user test. But how much does quantitative data actually help you if your goal is to improve your product? If you know that 85% of your test participants complete the checkout process in under five minutes – is that a good or bad thing? For this quantitative data to be of value, you would need to know comparative values and, for example, test other web pages or different variants of your product. And this effort is usually not effective:
If your goal is to understand human behavior to make design decisions, qualitative methods are much more effective in providing you with the information you need.
– Kim Goodwin in “Designing for the Digital Age”
For this reason, we also recommend not losing time measuring usability. Our goal is to observe human behavior when interacting with our product, thereby identifying errors in our solution and understanding the cause of usability problems. Sometimes you need two, three or even more testers to actually identify the problem. But if you can say for sure what the cause of a problem is (the carpet wrinkle) after watching just one test participant, then you can fix this problem immediately and check with another round of testing whether it has actually been solved. And for such a procedure, a handful of test participants per round suffices.
Okay, but How Many Test Participants Are Actually Enough?
We were afraid that this answer wouldn’t be enough for you. So let’s delve a little deeper, because the perfect number of test participants is a popular subject of discussion among usability experts. Steve Krug, the author of the usability bestseller Don’t Make Me Think!, recommends testing with three test participants per round.
His reasons include:
- Finding three test participants is less work than searching for more;
- Doing more than three tests on a single day means getting snacks for the people who moderate the tests and also for those who watch the tests;
- If you test with three test participants, you can do your tests and present the results on the same day.
Following Steve Krug’s recommendation, many people plunge into a user test with three testers. If Steve Krug says it, what can go wrong? Honestly? Not much. Nevertheless, let’s clarify a few things that are often overlooked when following his suggestion:
Steve Krug recommends three tests per round. Per round means that we test more than just once. His recommendation also refers to moderated, onsite user tests, so many of his arguments are based on the extra effort that further tests of this kind would cause. After all, in these moderated tests, somebody has to find the right test participants, schedule the test dates with them, and then conduct and moderate every single test. While we ourselves strongly recommend doing such moderated user tests, every single test participant causes a lot of extra effort with this setup. Since this book describes how user testing really works in practice and suggests unmoderated remote user tests, this argument doesn’t apply in our case: the search for new test participants is typically taken over by the user testing platform, and the test participants moderate themselves. With our approach, finding further test participants doesn’t necessarily mean much more effort.
In addition, Steve Krug started doing user tests more than 20 years ago. His book, which is the source of the three-tester proposal we mention, is already more than ten years old and was published before the iPad was launched. There has been a tremendous number of technological innovations in recent years, and we now have to make sure that our digital products can be used on desktops as well as on smartphones, tablets, smartwatches and other devices. Thus, we have to deal with a lot of additional influences that affect the behavior of our users. After all, you use an online store differently on your smartphone (short sessions, often interrupted, browsing rather than buying) than on your desktop computer (longer uninterrupted sessions, often with the goal of completing the purchase). Thus, just three test participants are not enough for our requirements.
The Law of Diminishing Marginal Returns
If we continue to search for the perfect number of test participants for user testing, we’ll soon stumble upon a curve by Jakob Nielsen, another well-known name in the context of usability. In 2000, Jakob Nielsen published a statistic that has since been circulating in specialist circles:
On the X-axis of this graph we see the number of test participants, while the share of usability problems found is displayed as a percentage on the Y-axis. What’s immediately striking is that the number of detected usability problems rises sharply over the first three test participants. Up to the third test participant, each test reveals many new usability problems. The number of newly discovered problems decreases sharply after the fifth test of a round, and after the twelfth test at the latest you’ll discover virtually no new usability problems. What we have here is the law of diminishing marginal returns. At some point, new input (new test participants) provides only very little additional output (new usability problems). The best cost/benefit ratio in user tests is thus somewhere between the third and sixth participant in a testing round. What Jakob Nielsen is showing us with this graph is that, from an economic point of view, it makes more sense to test with a smaller number of participants (about three to six per round), fix the discovered problems immediately and then conduct further testing, rather than conducting individual test runs with a larger number of test participants.
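To get a feel for the shape of this curve, it can help to recompute it. The sketch below assumes the simple model usually associated with Nielsen’s graph, in which each test participant reveals any given usability problem with the same probability, so that the expected share of problems found after n tests is 1 - (1 - L)^n. The value L = 0.31 is the average Nielsen reported for the projects he studied; neither the formula nor that value is taken from the text above, and the function names are purely illustrative. Your own product may well have a different detection rate.

```python
# Sketch of the diminishing-returns curve, under the assumption that every
# participant uncovers any given problem with the same probability.
# DETECT_RATE = 0.31 is the average reported by Nielsen, not a universal constant.
DETECT_RATE = 0.31


def share_found(n_participants: int, detect_rate: float = DETECT_RATE) -> float:
    """Expected share of all usability problems seen at least once after n tests."""
    return 1.0 - (1.0 - detect_rate) ** n_participants


def marginal_gain(n_participants: int, detect_rate: float = DETECT_RATE) -> float:
    """Additional share of problems contributed by the n-th participant alone."""
    return share_found(n_participants, detect_rate) - share_found(
        n_participants - 1, detect_rate
    )


# Print the cumulative share and the gain from each extra participant.
for n in (1, 3, 5, 6, 12):
    print(
        f"{n:>2} participants: {share_found(n):6.1%} found, "
        f"last participant added {marginal_gain(n):5.1%}"
    )
```

Under these assumptions, three participants uncover roughly two thirds of the problems and five uncover about 84%, while the twelfth participant adds well under one percentage point. This is the same pattern the graph shows.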
While Krug and Nielsen suggest different numbers of test participants, they agree on one thing: it’s less about how many people in total you should test with and more about making user testing an ongoing activity. Conducting your testing regularly is more important than finding the exact number of perfectly matched test participants. And in order to achieve this, you need as large a pool of available test participants as possible.