Every three months, I conduct benchmark usability testing. I’m calling these tests ‘benchmark testing’ because the aim of these sessions is to measure our progress towards achieving a great user experience with Ubuntu. The last round of testing took place in October 2011. I am now preparing to test 12.04 a couple of weeks from now.
When I publish the results of usability testing, I get many questions about my process. So I thought the best way to explain how I approach usability would be to take you along through the preparation and execution of my benchmark testing. Over the next month, I will take you step by step through my process: from recruiting participants, to writing a test protocol, to conducting and analysing usability sessions and writing up the results. This will give you the chance to ‘accompany me’, so to speak, and to conduct usability testing in parallel, if you are so inclined.
For this post, I walk through the first stage of any testing: recruiting participants.
Recruiting
This is a crucial part of any successful and meaningful testing. Some argue that just anyone you can get hold of will do. This attitude, in my view, puts the software before the people who will use it, and carries the implicit assumption that software, by its very nature, is usable. But the simple fact, which we actually all realise, is that it isn’t. Take music players, for instance. The challenge for this type of software is to fit into the lives of people who want to listen to music. It doesn’t have to work well for those who don’t listen to music but who are, for instance, heavily into photo editing. In a word, testing your software with your grandmother or your partner might not provide all the feedback you need to create a user-friendly product if they are not engaged in the activities your software is meant to facilitate.
So, the basic idea is: in preparing the testing, recruit the right people. The type of participants you work with will determine the quality and reliability of the results you get.
There are some basic rules for writing a screener questionnaire.
Rule 1: Recruit according to your testing goals
Is your goal to test, for instance, adoption: that is, are you going to assess how new users respond to your software the first time they encounter it and how delighted they are by it? Alternatively, is your goal to test learning: do you want to assess how easily a novice can figure out how to use your software and how they progress over time? Or are you really interested in expert usage: do you want to assess how well your software performs in a specific context of use involving expert tasks? There are, of course, other scenarios as well. The point here is that you need to be clear about your goal before you begin.
With Unity, we have 2 basic goals: 1) adoption: we want to know how easy to use and attractive Unity is to someone who has not encountered it before; and 2) expert usage: we want to know how well Unity performs for highly competent users who are fairly familiar with it.
Given these very different goals, I will need to conduct 2 different user testing sessions with different recruiting screeners or questionnaires, and different protocols.
In this post, I concentrate on my first project: testing for adoption.
Rule 2: Know your software
You need to review your software carefully: you need to (1) identify the main purpose of the software and the activities or tasks that it is meant to facilitate; and (2) identify where you think potential usability weaknesses are.
When I prepare a usability test, and before I even think about recruiting participants, I spend a significant amount of time trying out the software, and even more time discussing with the designers and developers their own concerns. From this evaluation of the usefulness and usability of the software, I’m able to sketch a profile of participants. Bear in mind that, given my goals as set out above, the participants will need to be able to use the software right away even if they’ve never used Ubuntu, since I am not testing for learning.
Given what Unity aims to allow users to do, we need to confirm (or not) in the testing that Unity users can easily get set up for and can conduct at least the following activities:
- writing, saving, printing documents
- finding, opening applications
- listening to music
- watching a movie
- managing and editing photos
- customising their computer: organising icons and short-cuts and changing settings
- browsing the internet
- communicating
Additionally, the OS should make it easy for users to:
- multi task
- navigate and use special features like alt-tab
- be aware of what’s going on with their computer
- create short-cuts
- understand icons, notifications and generally the visual language
In this instance, I also want to test the new features we have designed since 11.10.
Given my goals, my recruitment screener should be written in a way that will provide me with participants who engage in these activities on a regular basis.
Rule 3: Make sure you have an appropriate number of participants, with an appropriate range of expertise, with appropriately different experiences
I’ve often heard it said that all you need is a handful of participants – for example, 5 will do. While this may be true for very specific testing, when your participants come from a homogeneous group (for example, cardiologists, for testing a piece of cardiology software), it is not true generally. Much more often, software is meant to be used by a variety of people who have differing goals, and differing relevant experience and contexts of use.
You need to take these into account for 2 purposes: 1) to be able to test the usefulness and appropriateness of the software for different users; and 2) to be able to assess the reasons and origins of any usability problem that you find – these can be explained by comparing differences between users. A usability problem will have a different design solution if it is created by a user’s lack of expertise than if it is created by a shortcoming of the software that stumped all user groups. It will also help rate the severity of the discovered problems.
Some of the factors that competent recruiting will take into account are:
Different levels of expertise: for example, in the case of software for photo-editing, you probably need to assess the ease of use for people who have been editing their photos for more than 5 years, and for those who have been editing for less than 1 year. Expertise can be reflected in the length of time they have been engaged in the activity and also in the complexity of their activities. You may want to recruit people who do basic editing, like eliminating red-eye; and then, to compare their use of your software to the use by people who do special effects, montages, presentations and the like. This way, you get feedback on a wide range of the software’s features and functionalities.
Different kinds of uses: potential users will have different needs and different potential uses for the software. For example, if the software is healthcare related, it may well be used by doctors, nurses, radiologists – and sometimes even patients. It is useful, when considering recruiting, to include participants from these various professions and other walks of life, so that you will be able to determine how well your software serves the range of needs, processes and work conditions represented by the likely (range of) users.
Different operating systems: you may want to select participants who use, at least, Windows, Mac and Ubuntu. Users who are new to Ubuntu have acquired habits and expectations from using another OS. Over time, and simply because they are familiar, these habits and expectations come to stand for ease of use in these users’ minds. Recruiting participants with different habits and expectations will help you to understand the impact of these expectations as well as receptivity to innovation.
Recruiting your participants with precision will allow you to understand the usability of your software in a complex and holistic way and will dictate more innovative and effective design solutions.
Keep in mind, however, that the more diverse the people you envisage as primary users of the software, the larger the number of participants you will need. You should recruit at the very least 5 similar participants per group – for instance, in the healthcare example, at least 5 doctors, 5 nurses, and 5 patients.
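To make that arithmetic concrete, here is a minimal Python sketch. The function name `minimum_participants` and the group labels are hypothetical, used only for illustration; the one figure taken from the text is the floor of 5 similar participants per group.

```python
MIN_PER_GROUP = 5  # floor from the text: at least 5 similar participants per group


def minimum_participants(groups, per_group=MIN_PER_GROUP):
    """Return the smallest total to recruit for the given user groups.

    `groups` is a list of group labels (hypothetical names); duplicates
    are counted once.
    """
    if per_group < MIN_PER_GROUP:
        raise ValueError("recruit at least 5 similar participants per group")
    return len(set(groups)) * per_group


# The healthcare example: doctors, nurses and patients.
print(minimum_participants(["doctors", "nurses", "patients"]))  # 15
```

Each extra kind of primary user adds at least another 5 people to the recruitment, which is why it pays to be deliberate about which groups you include.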
A few more things to consider explicitly putting into your questionnaire/screener, particularly if you are writing it for a recruiting firm:
It is advisable to have a mix of male and female participants;
Participants from different age groups often have different experiences with technologies, and so you should include a good mix of ages;
The perceived level of comfort with a computer can also help the moderator understand the participant’s context of use. A question about how participants assess themselves as computer users can very often be helpful;
You should always add a general open question to your screener to judge the degree of facility with which the potential participant expresses ideas and points of view. The moderator is dependent on the participant to express, in a quite short amount of time, the immediate experience of using the software. Consequently, being able to understand the participant quickly and precisely is vital to obtaining rich and reliable data. The individual who makes the recruitment needs to be able to evaluate the communication proficiency of the potential participant.
Rule 4: Observe the basics of writing the recruitment screener
The most reliable way to obtain the desired participants is to get them to describe their behaviours rather than relying on their judgment when they respond to the screening questionnaire. For example, if you want a participant who has good experience in photography, instead of formulating your question as:
Question: Do you have extensive experience in photography?
Choice of answers:
Yes
No
You should formulate your question in a way to make sure the person has some level of familiarity with photography:
Question: During the last 6 months I have taken:
Choice of answers:
between 20 and 50 photos a month [Recruit]
Less than 20 photos a month [Reject]
By matching potential participants to actual behaviours, you can make a reasonable guess, for example, here, that someone who has been taking 50 photos every month for the last 6 months is indeed competent in photography, whereas when you rely on the person’s own assessment that they have extensive experience, you can’t know for sure that they are using the same criteria as you do to evaluate themselves.
Your screener should be created from a succession of questions representing a reasonable measure of familiarity and competence with the tasks you will test in your software.
That said, your screener should not be too long, as the recruitment agency personnel will probably spend no more than 10 minutes to qualify candidates they are speaking with on the phone. At the same time though, you need to ensure that you cover questions about all the key tasks that you will ask participants to perform during the test.
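For readers who like to see the logic spelled out, the behaviour-based photography question above can be sketched in Python as follows. This is only an illustration under my own assumptions: the `Question` class and the `qualifies` function are hypothetical names, not part of any tool a recruiting firm uses. Answers that the question does not list (for instance, more than 50 photos a month) are treated as not qualifying purely to keep the sketch simple.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    # Maps each answer choice to True ([Recruit]) or False ([Reject]).
    choices: dict


photo_question = Question(
    text="During the last 6 months I have taken:",
    choices={
        "between 20 and 50 photos a month": True,   # [Recruit]
        "Less than 20 photos a month": False,       # [Reject]
    },
)


def qualifies(screener, answers):
    """Return True only if every question received a [Recruit] answer.

    `screener` is a list of Question objects; `answers` maps question
    text to the choice the candidate gave. Missing or unlisted answers
    count as a reject (a simplifying assumption).
    """
    for question in screener:
        answer = answers.get(question.text)
        if not question.choices.get(answer, False):
            return False
    return True


screener = [photo_question]
candidate = {photo_question.text: "between 20 and 50 photos a month"}
print(qualifies(screener, candidate))  # True
```

A real screener would simply be a longer list of such questions, one for each key task in the test, kept short enough to get through in a brief phone call.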
Summing up
Let me sum up the basics I’ve just covered by showing you the requirements I have in my screener for testing the ease of use of Unity with members of the general public who are not necessarily familiar with Ubuntu. They include that:
- there should be a mix of males and females;
- there should be a variety of ages;
- participants should not have participated in more than 5 market research efforts (because people who regularly participate in market research might not be as candid as others would be);
- there should be a mix of Windows, Mac and Ubuntu users;
- participants should:
  - have broadband at home (being an indicator of interest in and use of a computer during personal time);
  - spend 10 hours or more per week on a computer for personal reasons (which shows engagement with activities on the computer);
  - be comfortable with the computer, or be a techy user;
  - use 2 monitors on a daily basis (I want to test our new multi-monitor design) to carry out a variety of activities online (part of the designs I want to test relate to managing documents, photos, music, and so forth and I want my participants to be familiar with these activities already);
  - use alt-tab to navigate between applications and documents (another feature I intend to test for usability);
  - have a general interest in technologies (I want to make sure that their attitude towards new technologies is positive, so they are naturally open to our design);
  - express ideas and thoughts clearly.
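As a rough illustration, the requirements above that reduce to a yes/no or a number can be written as a small Python check. Everything here is a sketch under assumptions: the `Candidate` fields and the function names are hypothetical, and the requirements that call for human judgment or concern the overall mix (gender, age, comfort with the computer, interest in technologies) are deliberately left out, apart from a simple tally of operating systems.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    operating_system: str            # e.g. "Windows", "Mac" or "Ubuntu"
    market_research_sessions: int    # past market research efforts
    has_broadband: bool
    personal_hours_per_week: float   # personal computer use
    uses_two_monitors_daily: bool
    uses_alt_tab: bool               # at least some of the time
    expresses_ideas_clearly: bool    # judged by the recruiter


def meets_requirements(c):
    """Check the individual, yes/no requirements from the list above."""
    return (
        c.market_research_sessions <= 5      # no more than 5 efforts
        and c.has_broadband
        and c.personal_hours_per_week >= 10  # 10 hours or more per week
        and c.uses_two_monitors_daily
        and c.uses_alt_tab
        and c.expresses_ideas_clearly
    )


def os_mix(candidates):
    """Tally qualifying candidates per operating system."""
    return Counter(c.operating_system for c in candidates if meets_requirements(c))


pool = [
    Candidate("Ubuntu", 2, True, 12, True, True, True),
    Candidate("Windows", 6, True, 20, True, True, True),  # too much market research
    Candidate("Mac", 0, True, 15, True, True, True),
]
print(os_mix(pool))
```

The tally makes it easy to see at a glance whether the qualifying pool still contains the mix of Windows, Mac and Ubuntu users the screener asks for.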
In closing let me add that testing with friends and relatives is very difficult at many levels. First, you can’t ask all the questions you need to: there are many ‘common understandings’ that prevent the moderator from asking ‘basic/evident/challenging’ questions that might need to be asked to participants. Second, participants might not be sincere or candid about their experience: someone who knows you and understands your commitment to the software might not express what they think, and they may not identify problems they are experiencing and thus, they might minimise the impact of a usability issue or even take the blame for it. Third, of course, they might not fit as precisely as they should the recruitment screener.
Feel free to use this screener to recruit participants if you would like to conduct testing sessions along with the ones I will be doing at Canonical.
In a couple of days, I will write a blog post about writing the protocol for this round of testing – which is the next step you’ll need to take while you’re waiting for participants to be recruited.
Easy photo editing is a problem under Linux.
Red-eye correction, cropping, resizing, rotating picture, add effects, format conversion… In which program, if I am a beginner?
Thank you for offering such extensive insight.
What’s the reasoning to exclude the currently unemployed?
Q10, “How do you go back and forth between applications or documents when you have many opened?”: excluding everyone who doesn’t answer alt-tab seems troubling to me. Alt-tab is cool for switching between 2 or maybe 3 applications, when there is a 1:1 between windows and applications. If you have to deal with more than that, then old-school window-buttons on a panel can easily be more obvious/predictable/direct and there’s also Exposé and its offspring. Do you want to test alt-tab only with users who are “blind” to other options?
Thanks for your post. One question about the call: How do you TERMINATE it? Do you tell the person he/she is not qualified for the research and why?
It’s always nice and interesting to read the design blog posts :)
I would be interested in a UX test of Samba. Sharing data in a local network is something that should be doable by everybody, but it has been problematic in Linux for as long as I can remember. Samba is just too complicated and therefore not really friendly to the average user.
Hi Charline,
Although this might not relate to what you’re saying in this post, since it’s about designing the Unity UI in general, may I add one humble suggestion?
This concerns Unity’s application launcher. It’s a pretty decent tool; however, I’m not that fond of the window switching implementation in general, as it feels a bit unintuitive.
In simple terms, I just don’t like the hassle we (I) have to go through whenever trying to locate a specific window of the same application (say that I’m trying to open the “Downloads” window of Firefox while running Firefox).
The current implementation might look good on a touch-screen based device (as it shows all the windows of the current app in full screen), but I’d prefer something that’s a bit, well, something like what we get with MS Windows Aero for instance (parley! ;-)).
A few months ago I created a lame mockup :D and I don’t know what others might think, but could you have a look at it and say what you think?
Here’s the link (it’s in the middle of the article, so please scroll down a bit).
http://www.hecticgeek.com/2011/12/ubuntu-unity-desktop-customizer/
Here’s a quick link of the image.
http://www.hecticgeek.com/wp-content/uploads/2011/12/unity-aero-window-preview-mock-up.png
I’m pretty sure there are better ways of doing this and the suggestion is in fact a bit lame, so my humble request is that, could you at least try to come up with a better way of switching between windows than what we have currently?
Has the design team done a user survey to know whether most people like the current implementation?
Thank you.
PS: Excellent post btw :).
Canonical’s user testing is always interesting. Thanks for the preview for the next round.
This particular round of testing looks to target a more computer-proficient crowd. For instance, my experience is that people using multiple monitors are pretty rare. And I thought previous testing excluded those already using Ubuntu.
Unemployed: I exclude them because I want participants who participate in the research to have a structured context in which they use their computer – which is, commonly work.
Participants – alt+tab: Not necessarily. The participants I will get are likely to use a variety of options to navigate between documents/applications according to what they are doing. For me, in the context of this study, I need people who are familiar with alt+tab because I want to gather feedback on its design. If participants have not used this feature before, they won’t be able to give feedback that I can rely on, because they have not experienced its usefulness and don’t have expectations.
By the way, the logic of the screener is not to eliminate people who use alt+tab and who also close windows or have another way to navigate. The aim is to get people who do use alt+tab at least some of the time. When recruiters ask this question, they encourage potential participants to list all the ways they have to accomplish something. What they want to find out is if they use alt+tab at all.
I generally use a recruiting firm for this. I think the standard interaction is to say that unfortunately they don’t qualify for the research at this time because we are looking for people who do x and they thank them very much. They will be considered for other research.
Jeremy,
I have not excluded people ‘using Ubuntu’ in the past – professional recruiting firms are challenged to find them. I’ve always been very pleased when I got any. As a general rule, I like testing with a mix of participants, some who are familiar with Ubuntu and some who encounter it for the first time who are Mac or Windows users.
Our participants, though, have often been proficient in something, in photo editing if we were focusing on it, etc. This time, we are focusing on multi-monitors, so, yes, we might have generally a more computer savvy crowd.
Remember that this is ‘benchmark testing’ – that means we have usability results from previous similar test protocols with users with different competencies. I will be comparing my results with the previous ones to assess the validity of the findings and their significance for Unity in general.
I would say it’s very problematic in general. I can’t tell you how many times I have had WinXP-7 machines on the same network workgroup that can’t share data. It is not unique to Linux.
It’s very interesting to see the methodology used in usability testing, thank you for sharing that important info! I’m looking forward to the next posts!
I have one question regarding sample size. You mentioned it’s recommended that we have at least 5 similar participants per group, but how do you determine whether a number of participants is large enough to be representative of the target user population, allowing for an accurate generalization? Is there a way to verify if it is necessary to test more users before reaching conclusions?
I think excluding people who are unemployed is not fair. People still use computers even if they are unemployed, and you are also missing the fact that they were once employed and may have been using their computers for work as well.
Count me in!! Proposals like this are very interesting, since they can uncover usability difficulties. I’ll do my bit! :)
Thank you for sharing the methodology you are using for the benchmark. I also suggest that you implement a new benchmark with less tech savvy users as well.
There is a quite interesting video on OMG Ubuntu, about an average user (not a tech savvy one) trying to use and understand Ubuntu Desktop experience: http://www.omgubuntu.co.uk/2012/03/video-a-users-first-time-with-ubuntu-11-10/
The main thing that can be seen there is that, even allowing for the fact that it was running Unity 2D and was therefore lacking some usability hints and a few features, the system doesn’t “teach” itself.
Also, hiding Global Menus by default, as recently discussed, will make this experience worse.
Hidden things shouldn’t be the default, they should be optional. You can combine them all in one “give me a cleaner desktop” option if you like, but DO NOT hide things by default.
I have an idea :)
How do I send you the pictures?
Please post more articles in the series promised at the top :)