In many ways the most creative, challenging, and
under-appreciated aspect of interaction design is evaluating designs with people.
The insights that you'll get from testing designs with people
can help you get new ideas, make changes, decide wisely, and fix bugs.
One reason I think design is such an interesting field is its relationship to
truth and objectivity.
I find design so incredibly fascinating because, in response to a question like "how can we measure success?", we can say more than "it's just personal preference" or "whatever feels right."
At the same time, the answers are more complex, and more open-ended, and
more subjective, and require more wisdom than just a number like seven or three.
One of the things that we're gonna learn in this class is the different
kinds of knowledge that you can get out of different kinds of methods.
Why evaluate designs with people?
Why learn about how people use interactive systems?
I think one major reason for this is that it can be difficult to tell how good
a user interface is until you've tried it out with actual users.
And that's because clients, designers, and developers may know too much about the domain and the user interface,
or may have acquired blinders through designing and building the user interface.
At the same time, they may not know enough about the users' actual tasks.
And while experience and theory can help, it can still be hard to predict what
users will actually do.
You might wanna know: can people figure out how to use it?
Do they swear or giggle when using this interface?
How does this design compare to that design?
And if we change the user interface, how does that change people's behavior?
What new practices might emerge?
How do things change over time?
These are all great questions to ask about an interface, and
each will come from different methods.
A broad toolbox of different methods
can be especially valuable in emerging areas like mobile and social software,
where people's use practices can be particularly context dependent, and
can also evolve significantly over time in response to how other people
use the software, through network effects and things like that.
To give you a flavor of this,
I'd like to quickly run through some common types of empirical research in HCI.
The examples I'll show are mostly published work of one sort or
another because that's the easiest stuff to share.
If you have good examples from current systems out in the world,
post them to the forum.
I keep an archive of user interface examples, and
the other students and I would love to see what you come up with.
One way to learn about the user experience of a design is to bring
people into your lab or office and have them try it out.
We often call these usability studies.
This "watch someone use my interface" approach is a common one in HCI.
The basic strategy of traditional user-centered design is to iteratively
bring people into your lab or office until you run out of time, and then release.
If you have deep pockets, these rooms have a one-way mirror with
the development team on the other side.
In a leaner environment this may be just bringing people into your dorm
room or office.
You'll learn a huge amount by doing this.
Every single time that I, or a student, friend, or colleague,
has watched somebody use a new interactive system, we've learned something.
As designers we get blinders to systems' quirks, bugs, and false assumptions.
At the same time, this approach has some limitations.
First, in the real world, people may have different tasks, goals, motivations,
and physical settings than in your office or lab.
This can be especially true for interfaces that you think people might use on the go,
like at a bus stop or while waiting in line.
Second, there can be a please-the-experimenter bias:
when you bring somebody in to try out your interface,
they know that they're trying out the technology that you developed.
And so they may work harder or
be nicer than they would without the constraints of
a lab setup and the person who developed it watching right over them.
Third, in its most basic form, where you're trying out just one user interface,
there's no comparison point.
So while you can track when people laugh,
or swear, or smile with joy, you won't know whether they would have laughed more,
or sworn less, or smiled more if you'd had a different user interface.
And finally it requires bringing people to your physical location.
This is often a whole lot easier than people think, but
it can be a burden, even if only a psychological one.
A very different way of getting feedback from people is to use a survey.
Here's an example of a survey I got recently from San Francisco
asking about different street light designs.
Surveys are great because you can quickly get feedback from a large number
of respondents, and it's relatively easy to compare alternatives.
You can also automatically tally the results.
You don't even need to build anything; you can just show screenshots or mockups.
One of the things that I've learned the hard way, though, is the difference
between what people say they're gonna do, and what they actually do.
Ask people how often they exercise and you'll probably get a much more optimistic
answer than how often they really do exercise.
Trying to imagine what a number of different street light designs might be like
is really different from actually seeing them on the street and
having them become part of normal, everyday life.
Still, it can be valuable to get feedback.
Another type of respondent strategy is focus groups.
In a focus group,
you gather together a small group of people to discuss a design or idea,
so you can hear several perspectives and reactions at once.
On the other hand, for a variety of psychological reasons,
people may be inclined to say polite things, or to generate answers completely on
the spot that are totally uncorrelated with what they actually believe or
what they would actually do.
Focus groups can be a particularly problematic method when you're looking at
trying to gather data about taboo topics or about cultural biases.
With those caveats noted (right now, we're just making a laundry list),
I think that focus groups,
like almost any other method, can play an important role in your tool belt.
Our third category of techniques is to get feedback from experts.
For example, in this class we're going to do a bunch of peer critique for
your weekly project assignments.
In addition to having users try your interface, it can be important to eat your
own dog food and use the tools that you build yourself.
When you're getting feedback from experts, it can often be helpful to
have some kind of structured format,
much like the rubrics that you'll see in your project assignments. For getting
feedback on user interfaces, one common approach to this structured feedback is
called heuristic evaluation, pioneered by Jakob Nielsen,
and you'll learn how to do that in this class.
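To make "structured format" a bit more concrete, here's one minimal way you might record heuristic evaluation findings. The 0-to-4 severity scale follows Nielsen's convention; the Finding class, its fields, and the example issues are just my own illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass

# Severity ratings follow Nielsen's 0-4 scale:
# 0 = not a problem, 1 = cosmetic, 2 = minor, 3 = major, 4 = catastrophe.

@dataclass
class Finding:
    heuristic: str    # which heuristic the issue violates
    severity: int     # 0-4, Nielsen's severity scale
    description: str  # where the issue occurs and why it's a problem

# Hypothetical findings from one evaluator reviewing a checkout flow.
findings = [
    Finding("Visibility of system status", 3,
            "No feedback after tapping 'Pay'; users tap repeatedly."),
    Finding("Error prevention", 2,
            "Expiration date accepts past dates; error only shown at submit."),
]

# Sort by severity so the team can prioritize the worst problems first.
for f in sorted(findings, key=lambda f: f.severity, reverse=True):
    print(f"[{f.severity}] {f.heuristic}: {f.description}")
```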
Our next genre is Comparative Experiments, taking two or more distinct options and
comparing their performance to each other.
These comparisons can take place in lots of different ways.
They can be in the lab, they can be in the field, they can be online.
These experiments can be more or less controlled, and
they can take place over shorter or longer durations.
What you're trying to learn here is which option is more effective.
And more often, what are the active ingredients: what are the variables that
matter in creating the user experience that you seek?
My former PhD student Joel Brandt and his colleagues at Adobe
ran a number of studies comparing help interfaces for programmers.
In particular they compared a more traditional search style user interface
for finding programming help with a search interface that integrated programming help
directly into your environment.
By running these comparisons, they were able to see how programmers' behavior
differed based on the changing help user interface.
Comparative experiments have an advantage over surveys
in that you get to see the actual behavior as opposed to self report.
And they can be better than usability studies because you're
comparing multiple alternatives.
This enables you to see what works better or worse, or
at least what works different.
I find that comparative feedback is also often much more actionable.
However, if you're running controlled experiments online,
you don't get to see much about the person that's on the other side of the screen.
And if you're inviting people into your office or
lab, the behavior you're measuring might not be very realistic.
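As a rough sketch of what an online comparison can look like, suppose you've logged, for each participant, which interface variant they saw and whether they completed the task. A two-proportion z-test is one standard way to compare the completion rates; the counts below are made up purely for illustration.

```python
import math

# Hypothetical logs: completions and participants for each interface variant.
completions_a, n_a = 130, 200   # variant A: 65% task completion
completions_b, n_b = 155, 200   # variant B: 77.5% task completion

p_a = completions_a / n_a
p_b = completions_b / n_b

# Two-proportion z-test on completion rates, using a pooled standard error.
p_pool = (completions_a + completions_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

# Two-sided p-value from the normal approximation.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"Variant A: {p_a:.1%}  Variant B: {p_b:.1%}  z = {z:.2f}  p = {p_value:.3f}")
```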
If realistic, longitudinal behavior is what you're after,
participant observation may be the approach for you.
This kind of longer-term evaluation can be important for
uncovering things that you might not see in shorter-term, more controlled scenarios.
For example, my colleagues Bob Sutton and Andrew Hargadon studied brainstorming.
The prior literature on brainstorming had focused mostly on questions like,
do people come up with more ideas?
What Bob and Andrew realized by going into the field was that
brainstorming served a number of other functions also.
Like for example, brainstorming provides a way for
members of a design team to demonstrate their creativity to their peers.
It allows them to pass along knowledge that can then be used in other projects.
And it creates a fun, exciting environment that people like to work in and
that clients like to participate in.
In a real ecosystem, all of these things are important,
in addition to just how many ideas people come up with.
Nearly all experiments seek to build a theory on some level.
I don't mean anything fancy by this, just that we take some things to be
more relevant and other things to be less relevant.
We might, for example, assume that the ordering of search results
plays an important role in what people click on, but that the batting average of
the Detroit Tigers doesn't, unless of course somebody's searching for baseball.
If you have a theory that's sufficiently formal mathematically that you can make
predictions, then you can compare alternative interfaces using that model
without having to bring people in.
And we'll go over that in this class a little bit with respect to input models.
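To give a small taste of what an input model buys you, here's a sketch that uses Fitts' law, a classic model of pointing time, to compare two hypothetical button placements without bringing anyone in. The a and b constants are device- and user-dependent and would normally be fit from observed data; the values here are placeholders, and the layouts are made up for illustration.

```python
import math

def fitts_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time (seconds) via Fitts' law, Shannon formulation.

    a and b are constants you would normally fit from data for a particular
    device and population; these defaults are placeholders.
    """
    return a + b * math.log2(distance / width + 1)

# Two hypothetical placements for the same 'Save' button:
# layout 1 is a small button far from where the cursor usually sits,
# layout 2 is a larger button placed closer.
t1 = fitts_time(distance=600, width=20)   # small, far target
t2 = fitts_time(distance=200, width=60)   # larger, closer target

print(f"Layout 1: {t1:.2f}s   Layout 2: {t2:.2f}s")
```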
One example of this can be found in the ShapeWriter system.
Shumin Zhai and colleagues figured out how to build a keyboard
where people can enter an entire word in a single stroke.
They were able to do this with the benefit of formal models and
optimization-based approaches.
And while we won't get to it much in this intro course,
simulation can also be used for higher level cognitive tasks.
For example, Pete Pirolli and
colleagues at PARC have built impressive models of people's web surfing behavior.
These models enabled them to estimate, for example, which link somebody is most likely
to click on by looking at the relevance of the link text.
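Pirolli's models are far richer than this, but as a toy illustration of the underlying idea, that links whose text matches the user's goal carry more "scent", here's a sketch that ranks the links on a page by simple word overlap with the goal. The goal, the link texts, and the scoring function are all made-up assumptions for illustration, not Pirolli's actual model.

```python
def scent(goal, link_text):
    """Toy 'information scent' score: fraction of goal words found in the link text.

    Real models use much richer measures of semantic relatedness; this simple
    overlap is only meant to illustrate the idea.
    """
    goal_words = set(goal.lower().split())
    link_words = set(link_text.lower().split())
    return len(goal_words & link_words) / len(goal_words)

goal = "refund a duplicate charge"
links = ["Contact support", "Billing and refunds", "Duplicate charge FAQ", "Careers"]

# Rank links by predicted scent; the top link is the model's guess for the next click.
for text in sorted(links, key=lambda t: scent(goal, t), reverse=True):
    print(f"{scent(goal, text):.2f}  {text}")
```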
So that's a whirlwind tour of a number of different empirical methods that
this class will introduce.
You'll wanna pick the right method for the right task, and
here are some issues to consider.
If you did it again, would you get the same thing?
Does this hold for people other than 18-year-old upper middle class students
who are doing this for course credit or a gift certificate?
Is this behavior also what you'd see in the real world, or
only in a more stilted lab environment?
My experience as a designer, researcher, teacher, consultant, advisor,
and mentor has taught me that evaluating designs with people is both easier and
more valuable than many people expect.
And there's an incredible light bulb moment that happens when you actually get
designs in front of people and see how they use them.