The usability data that we collect from remote testing comes in two forms: quantitative and qualitative. We can express quantitative data with numbers, such as how long a user stayed on a page, how many visits a page gets, or how users rate an aspect of a site on a numerical scale in a survey. On the other hand, qualitative data is generally expressed in words, such as answers to open-ended survey questions. Tests can also collect visual qualitative data such as users’ facial expressions, but even then, those are translated into nominal data like “looked frustrated” for analysis and communication purposes.
Most usability professionals are familiar with the risks of overstating the reliability of findings based on quantitative data that comes from small samples. But a big mistake I see usability practitioners make is to understate the reliability of findings that are based on qualitative data, not realizing that small sample size does not diminish the believability of this kind of data.
One reason for this lack of confidence is that many testers are grounded in the Scientific Method with its reliance on hypothesis testing. In this traditional kind of research the purpose is to prove or disprove a specific hypothesis. This approach has validity in usability testing, especially in A/B testing such as “Which design installs more quickly?” In a word, we could say this type of usability testing is probative, that is, it’s trying to “test, try, or prove” something.
But a lot of qualitative research, and usability testing in particular, is illuminative, that is, it’s trying to gain insight into how the user experiences something. In usability testing, the user serves as a lens that lets the developers see the product from a perspective they can’t otherwise take, namely that of someone who has no prior knowledge of the product and is trying to do something with it within their own context.
Louise Kidder, an authority on qualitative research, describes a phenomenon she calls the click of recognition. In the context of usability testing, we can recognize this as hearing a user say something or seeing a user do something that causes a light to go on in our heads. We have a clarifying moment or epiphany when we slap ourselves on the forehead and say, “Of course!” These Aha! moments occur because, in usability testing, participants let us see an application through fresh eyes. A widget or message that seemed crystal clear to us suddenly becomes vague or ambiguous when we see it from someone else’s frame of reference.
Let’s say I’ve written something and give it to my wife to look at. She finds a typo and points it out. Do I say, “Thanks, but let me have twelve other editors look at it, too”? No. It’s clearly wrong, and I can see it’s wrong. I was just too close to it when I wrote it and didn’t catch it. Her fresh eyes did, and I make the change with confidence based on an n of 1. In usability, the equivalent is a user interface bug or a place where I failed to apply a known and widely accepted best practice. It takes just one user’s stumbling on it to trigger a click of recognition.
Now, my wife keeps reading and comes across this sentence: “Tom told Dick to fire Harry, and it made him mad.” She says, “I’m a bit confused about which of these characters you mean by ‘him.’ Was Tom mad because he had to tell Dick how to do his supervisor’s job, was Dick mad because Tom was making him do the dirty work, or was Harry mad because he was getting fired?” What I was referring to was obvious to me when I wrote it, because I knew the details. But now that I have my wife as a lens, I can see how ambiguous the referent is. Do I need to get another opinion? No, now that I can feel someone else’s reasonable confusion, I have a click of recognition. Again, I make the change with confidence based on an n of 1.
But then she goes on to say, “By the way, I hate this font, you should use a different one.” I thank her and make a mental note to get some other opinions about that before changing it. No click, no confidence.
A hierarchy of clicks
The following is a hierarchy of clicks, starting at the most concrete and dependable level and moving to the more abstract:
- You knew better: It was just a mistake you didn’t catch. Some examples include links that don’t go where they are supposed to go and misspelled words. There’s no need to contemplate whether to make the change—you just do it.
- You were seeing the application through your deeper understanding of its structure: Common instances at this level include elements of a user interface that initially seemed obvious to you, but which suddenly look vague or non-intuitive once you see a user stumble. For example, you might show someone’s name on a Web page as a hyperlink, so a user can send an email message to that person by just clicking the name. But during a usability test, a participant clicks the link, voicing her expectation that she’ll navigate to a bio about the person. When her email client opens, it surprises her. Based on that one observation, you decide to add an email icon to clarify the link.
- You were seeing the application through your own world view: Seeing a context from the different perspective of even a few users can lead to a click of recognition. For example, I worked with a client who was testing the usability of real estate property management software. It had a feature that let users generate their own custom reports. This was during a time when gerunds were popular with menu designers, and the selection to get to that particular feature was labeled “Building Reports.” We brought property managers in and asked them to create a report to show the rental revenue for a particular apartment complex for the last three months. No one would click on Building Reports. It turned out that property managers thought of “building” to mean a structure, as in “What’s on top of that building over there?” They didn’t want a report about a building, they wanted a report about rent. Click of recognition: Don’t say “building” to property managers unless you mean the brick-and-mortar kind.
A word of caution
I saw a phenomenon similar to the click of recognition when I was doing my doctoral research on cross-functional teams conducting usability tests. In that case, the team would watch a user struggle with a feature, then someone would say something like “I had that same problem.” Others would readily admit to having their own struggles with the same feature, but no one had ever brought it up before. The problem was that the members of the team had individually discounted themselves as dumb users when they had made that mistake. Seeing a user make the same mistake made it okay to admit to making it and talk about it.
There is an important difference between this phenomenon—which I call empathetic validation—and the click of recognition. With a click of recognition, your world view suddenly gets shifted. On the other hand, an empathetic validation reinforces a belief you already held. This doesn’t mean it lacks dependability, but it is your filter picking up on something the user says or does that aligns with your current world view.
In the case of the subjects in my doctoral study, one of the things that added to the dependability of the data was the concurrence of multiple members of the team who had encountered the same issue. I would advise a bit of caution, however, if you are observing someone alone and experience an empathetic validation. Challenge it a bit, and look for other corroborating evidence before you depend too much on it.
Quit being so timid about the believability of your qualitative findings because your sample size is small and the data is subjective. As I hope my examples have shown, sample size is not a factor in determining the dependability of data that is meant to illuminate. As for its being subjective, that is the strength of qualitative data: it is rich in the user’s context, which can be considerably different from the developers’ perspective.