Chapter 8: Evaluation in HCI

🎯 Learning Objectives

By the end of this study session, you’ll be able to:

Explain why evaluation is essential for creating usable systems that people actually want to use
Distinguish between formative evaluation (during design) and summative evaluation (with finished products)
Choose between expert analysis and user participation methods based on your project stage and budget
Design and conduct effective laboratory studies with proper protocols and guidelines
Plan field studies that capture natural user behavior while managing real-world challenges
Use observation and think-aloud techniques to gather meaningful user insights
Create effective questionnaires using appropriate question formats and design principles
Understand when to apply experimental techniques like A/B testing for design decisions
Analyze both quantitative and qualitative evaluation data to improve your designs

🌟 The Big Picture

Evaluation in HCI is your quality assurance system for making sure people can actually use what you build—and that they’ll like using it. Instead of guessing whether your design works, evaluation gives you concrete evidence about usability problems before you ship, helping you focus on real user needs rather than imaginary ones.

📚 Core Concepts

What Is HCI Evaluation?

Evaluation means systematically testing how usable your system is for real people. You’re not just checking if it works—you’re checking if it works well for humans.

The Four Main Goals:

Assess system functionality: Does it match what users actually need?
Measure interface effects: How easy is it to learn? How satisfied are users?
Identify specific problems: What causes confusion or unexpected results?
Validate user requirements: Can people use it? Will they like it?

Why Evaluation Matters (Your Safety Net)

Ensures usability: People can actually use your system and will want to
Focuses your efforts: Work on real problems, not imaginary ones
Provides improvement roadmap: Clear suggestions for making it better
Prevents shipping disasters: Fix problems before users encounter them

📚 Evaluation Timing and Types

Formative vs. Summative Evaluation

Formative Evaluation (During Design):

Happens while you’re still designing
Can be done by your team or with real users
Purpose: Catch problems early when they’re cheap to fix
Think of it like: Spell-check while writing, not after printing

Summative Evaluation (Finished Product):

Happens with your completed system
Almost always involves real users
Purpose: Verify that your final product actually works for people
Think of it like: Final exam to see if you achieved your goals

📚 Evaluation Methods: Expert vs. User-Centered

Expert Analysis Methods

When to use: Early stages, tight budgets, quick feedback needed

Advantages:

Relatively inexpensive (no user recruitment needed)
Can be used at any development stage
Quick turnaround time
Good for catching obvious usability violations

Key limitation: Doesn’t test actual system use—only whether your design follows established usability principles

User Participation Methods

When to use: Later development stages with working prototypes

The trade-off: More expensive and time-consuming, but gives you real user behavior data that expert analysis can’t provide

📚 Laboratory Studies: Controlled User Testing

When Laboratory Studies Work Best

Ideal situations:

Testing in the system’s real location would be dangerous
Single-user tasks that don’t require natural context
Need to compare alternative designs under controlled conditions
Want to deliberately manipulate context to uncover specific problems

Laboratory Study Setup Requirements

Essential equipment and personnel:

Well-equipped usability lab with video monitoring
Multiple cameras (one for interface interactions, one for user expressions)
Voluntary participants willing to complete task lists
Observer team behind one-way mirror
Reality check: This is expensive, time-consuming, and complex

Laboratory Study Protocol: The Complete Guide

Before Testing Begins

Preparation essentials:

Establish clear objectives and information requirements
Pre-test everything before actual participants arrive
Create comfortable environment that reduces participant anxiety
Critical mindset shift: Emphasize you’re evaluating the system, not the user
Acknowledge that software likely has usability problems
Inform users they can stop anytime without penalty
Explain all monitoring equipment and its purpose
Obtain explicit consent for observation and recording
Guarantee individual results remain completely confidential

During Testing Sessions

Maintain optimal testing conditions:

Never assign unnecessary tasks—respect participant time
Keep atmosphere relaxed and supportive
Stay unobtrusive—don’t let yourself or equipment interfere
Hand out tasks one at a time to avoid overwhelming participants
Never show displeasure with user performance
Critical decision point: Stop immediately if session becomes too unpleasant for participant

After Testing Sessions

Proper closure and analysis:

Answer any participant questions
Analyze, summarize, and report findings relative to original objectives
Remember: Your data is only as good as your initial objectives

Laboratory Studies: Advantages and Disadvantages

Advantages:

Specialist equipment available for detailed measurements
Uninterrupted environment eliminates external distractions
Controlled conditions allow precise comparisons

Disadvantages:

Lacks natural context where system will actually be used
Difficult to observe multi-user cooperation and collaboration
Artificial environment may change user behavior

📚 Field Studies: Natural Environment Testing

When Field Studies Are Essential

Field studies work best when natural context is crucial to understanding how your system really performs.

Field Studies Advantages:

Natural environment preserves authentic user behavior
Context retained—you see real-world constraints and opportunities
Users interact with your system the way they naturally would

Field Studies Disadvantages:

Distractions can interfere with data collection
Environmental noise may affect measurements
Less control over variables that might influence results

📚 Data Collection Techniques

Observation + Think Aloud Protocol

How it works:

Observation: Watch users performing tasks with your system
Think Aloud: Ask users to describe what they’re doing, why, and what they think is happening

Think Aloud Advantages:

Simple technique requiring little specialized expertise
Provides valuable insight into user mental models
Shows how system is actually used (vs. how you think it’s used)

Think Aloud Limitations:

Subjective and selective—users may filter their thoughts
Act of describing can alter task performance
Users may think before speaking, changing natural behavior

What Observation Can Tell You

Valuable data you can collect:

Which features get used frequently
Time needed to complete specific tasks
Which interfaces cause confusion or frustration
How users react to error messages and system feedback
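
If your observation notes are captured as a timestamped log, the first items above (feature frequency, task time, error reactions) can be tallied with a few lines of Python. This is a minimal sketch: the log layout, the event names (`task_start`, `feature_used`, `error_shown`, `task_end`), and the sample data are all assumptions made for illustration, not part of any standard logging format.

```python
from collections import Counter

# Hypothetical observation log (assumed layout):
# (seconds since session start, participant, event, detail)
LOG = [
    (0.0, "P1", "task_start", "checkout"),
    (4.2, "P1", "feature_used", "search"),
    (9.8, "P1", "error_shown", "invalid postcode"),
    (15.1, "P1", "feature_used", "search"),
    (31.5, "P1", "task_end", "checkout"),
    (0.0, "P2", "task_start", "checkout"),
    (6.0, "P2", "feature_used", "filter"),
    (22.4, "P2", "task_end", "checkout"),
]


def feature_counts(log):
    """Count how often each feature was used across all participants."""
    return Counter(detail for _, _, event, detail in log if event == "feature_used")


def error_counts(log):
    """Count how many error messages each participant saw."""
    return Counter(p for _, p, event, _ in log if event == "error_shown")


def task_times(log):
    """Return {(participant, task): seconds between task_start and task_end}."""
    starts, times = {}, {}
    for t, participant, event, detail in log:
        key = (participant, detail)
        if event == "task_start":
            starts[key] = t
        elif event == "task_end" and key in starts:
            times[key] = t - starts[key]
    return times


if __name__ == "__main__":
    print(feature_counts(LOG))
    print(error_counts(LOG))
    print(task_times(LOG))
```

Counts like these only tell you what happened. Pair them with think-aloud comments to understand why.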

Managing the Observer Effect (Hawthorne Effect)

When people know they’re being watched, they often change their behavior. Minimize this by:

Reducing environmental distractions
Clearly stating you’re evaluating the interface, not their performance
Making observation as natural and non-intrusive as possible

Recording Your Observations

Choose your recording method strategically:

Hand-written notes (quick but limited)
Audio recording (captures verbal feedback)
Video recording (most complete but longest to analyze)
Rule of thumb: More complete records take longer to analyze—choose based on your time and analysis needs

📚 Query Techniques: Getting User Feedback

Interviews: One-on-One User Conversations

Interview Advantages:

Flexible format—adapt questions based on user responses
Deep exploration of issues as they arise
Elicit unexpected user perspectives and identify unanticipated problems
Personal connection can lead to more honest feedback

Interview Disadvantages:

Participants can be subjective and reluctant to criticize openly
Very time-consuming for both data collection and analysis
Interviewer skill significantly affects data quality

Questionnaires: Systematic User Feedback

Questionnaire Advantages:

Quick data collection from large user groups
Systematic analysis of responses across many users
Standardized questions ensure consistent data
Cost-effective for gathering broad user opinions

Questionnaire Disadvantages:

Less flexible than interviews—can’t adapt to unexpected responses
Limited ability to probe deeper into interesting issues
Require significant skill to design effective questions
Often suffer from poor response rates

Established HCI Questionnaires

Instead of creating your own from scratch, consider these proven instruments:

Historical questionnaires you should know:

QUIS (Maryland, 1988): Questionnaire for UI Satisfaction
PUEU (IBM, 1989): Perceived Usefulness & Ease of Use
PUTQ (Purdue, 1997): Purdue Usability Testing Questionnaire
CSUQ (IBM, 1995): Computer System Usability Questionnaire
SUS (Digital Equipment Corporation, 1986): System Usability Scale
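
As an example of how a standardized instrument turns answers into a number, here is a sketch of the usual SUS scoring rule: ten items answered on a 1 to 5 scale, odd-numbered items scored as (answer − 1), even-numbered items as (5 − answer), and the sum multiplied by 2.5 to give a score from 0 to 100. The function name `sus_score` is made up for this example.

```python
def sus_score(responses):
    """Score one completed SUS form.

    responses: ten integers from 1 (strongly disagree) to 5 (strongly agree),
    in questionnaire order.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses, each from 1 to 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered items negatively,
        # so the even ones are reversed before summing.
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


if __name__ == "__main__":
    print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 1]))
```

The result is a score on a 0 to 100 scale, not a percentage, so compare it against other SUS scores rather than reading it as "percent usable".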

Questionnaire Design: Question Formats Deep Dive

Closed Questions

Example: “Which aspects of the interface did you like most?”
- A. The color
- B. The sound
- C. The ease of navigation

Closed Question Advantages:

Easy for respondents to answer quickly
Responses are straightforward to analyze and compare
Clear data categorization

Closed Question Disadvantages:

May suggest ideas to respondents that wouldn’t naturally occur to them
Could distort findings by limiting response options
Miss unexpected insights outside your predetermined categories

Open Questions

Example: “What aspects of the interface did you like most?”

Open Question Advantages:

Respondents can answer however feels natural to them
Rich source of unexpected insights and detailed feedback
Captures user perspectives you might not have considered

Open Question Disadvantages:

Responses are difficult and time-consuming to analyze
Hard to compare across different users systematically
May intimidate some respondents who prefer guided options

Scalar Questions: Measuring User Attitudes

Likert Scales:

Range of values measuring agreement/disagreement with statements
Example: “The interface was easy to use”
- Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree
Limitation: Only measures agreement—hard to quantify other dimensions
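
To analyze Likert answers you first map each label to a number, then summarize. The sketch below assumes the five labels shown above and reports the median and the share of respondents who agree. It treats the data as ordinal, which is one common choice among several. The helper name `summarize_likert` and the sample answers are hypothetical.

```python
from collections import Counter
from statistics import median

LIKERT = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
CODES = {label: code for code, label in enumerate(LIKERT, start=1)}  # 1..5


def summarize_likert(answers):
    """Summarize the answers to one Likert statement."""
    codes = [CODES[a] for a in answers]  # raises KeyError on an unknown label
    counts = Counter(answers)
    return {
        "n": len(codes),
        "median": median(codes),
        "distribution": {label: counts.get(label, 0) for label in LIKERT},
        "percent_agree": 100 * sum(c >= 4 for c in codes) / len(codes),
    }


if __name__ == "__main__":
    sample = ["Agree", "Agree", "Neutral", "Strongly Agree", "Disagree"]
    print(summarize_likert(sample))
```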

Semantic Differential Scales:

Measure meaning and connotations of interface concepts
Examples of opposing pairs:
- Good ↔ Bad
- Quick ↔ Slow
- Important ↔ Unimportant
- Expensive ↔ Inexpensive
Purpose: Captures emotional and subjective reactions to your design

Questionnaire Design Excellence: Critical Guidelines

Essential Design Questions

Before writing each question, ask yourself:

Clarity: Can this question be easily understood by all respondents?
Precision: Is this question too vague or unnecessarily precise?
Bias: Does this question lead respondents toward a particular answer?
- Bad example: “You support Arsenal, don’t you?”
Necessity: Is this question essential to your evaluation objectives?
Willingness: Will respondents be willing to provide this information?
- Bad example: “Are you a criminal?”
Applicability: Does this question apply to all your respondents?

Questionnaire Design Best Practices

Question Type Strategy:

Prefer closed questions whenever possible—offer range of potential answers
Use open questions only when you genuinely need unpredictable insights

Question Structure and Flow:

Start with general, easy questions
Progress to detailed, difficult questions
End with demographic information
This flow keeps respondents engaged and builds their confidence

Length Management:

Maximum recommended length: Two sides of A4 paper
Principle: Long questionnaires deter participation and reduce response quality

Survey Implementation: Step-by-Step Process

Complete survey workflow:

Define target population and sample size: Who do you want to ask, and how many?
Develop your questions: Use design principles covered above
Pre-test the questionnaire: Always test with small group before full deployment
Conduct the survey: Deploy using chosen method (online, paper, etc.)
Collect and analyze data: Plan analysis methods before data collection begins

📚 Experimental Techniques: Controlled Comparisons

Basic Experimental Design

Purpose: Test specific hypotheses about your design by controlling variables

The experimental process:

Choose specific hypothesis to test
Create experimental conditions that differ only in one controlled variable
Measure behavioral changes and attribute them to your variable manipulation
Key principle: Isolate the factor you’re testing from other influences

Experimental Design Approaches

Within-Groups Design

Structure: Each participant experiences all experimental conditions

Within-Groups Advantages:

Less costly—fewer participants needed
Less likely to suffer from individual user variation
Each participant serves as their own control

Within-Groups Disadvantage:

Transfer of learning problem: Experience with first condition affects performance on subsequent conditions
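
A common way to soften the transfer-of-learning problem is counterbalancing: vary the order of conditions across participants so that learning effects are spread evenly over the conditions. This technique is not covered in the notes above, so treat the sketch below as an illustrative assumption. The function name and participant labels are made up.

```python
from itertools import permutations


def counterbalanced_orders(participants, conditions):
    """Cycle through every ordering of the conditions.

    Each ordering is used equally often when the number of participants is a
    multiple of the number of possible orderings.
    """
    orders = list(permutations(conditions))
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}


if __name__ == "__main__":
    plan = counterbalanced_orders(["P1", "P2", "P3", "P4"], ["A", "B"])
    for participant, order in plan.items():
        print(participant, "->", " then ".join(order))
```

With two conditions, half the participants see A first and half see B first. With many conditions the number of orderings grows quickly, so full counterbalancing may need more participants than you have.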

Between-Groups Design

Structure: Each participant experiences only one experimental condition

Between-Groups Advantages:

No transfer of learning between conditions
Clean comparison between different approaches

Between-Groups Disadvantages:

Requires more participants (more expensive)
Individual variation between groups can bias results
Need careful participant matching across groups
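
When careful matching is not practical, a simpler alternative is random assignment, which spreads individual differences across groups by chance. The sketch below shuffles participants and deals them out in turn so group sizes stay balanced. The function name and the `seed` parameter (used only to make an assignment reproducible) are assumptions for this example.

```python
import random


def assign_between_groups(participants, conditions, seed=None):
    """Randomly assign each participant to exactly one condition."""
    rng = random.Random(seed)      # local generator, leaves global state alone
    shuffled = list(participants)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, participant in enumerate(shuffled):
        # Deal participants out like cards to keep group sizes within one of each other.
        groups[conditions[i % len(conditions)]].append(participant)
    return groups


if __name__ == "__main__":
    people = [f"P{n}" for n in range(1, 9)]
    print(assign_between_groups(people, ["A", "B"], seed=42))
```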

A/B Testing: Real-World Experimental Design

A/B Testing Process:

Create two minor variants of your design (A and B)
Show design A to even-numbered website visitors
Show design B to odd-numbered visitors
Monitor performance metrics (dwell time, click-through rates, conversion)
Choose better-performing design
Continuous improvement: Repeat process with new variations

Why A/B testing works: Real users, real context, real behaviors—but with controlled comparison
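
The process above can be sketched in Python. `variant_for` follows the even/odd rule from the steps. The significance check is an addition that the notes do not prescribe: a two-proportion z-test, one common way to judge whether a difference in conversion rate is larger than chance would explain. All function names and the visitor counts are hypothetical.

```python
from math import erf, sqrt


def variant_for(visitor_number):
    """Even-numbered visitors see design A, odd-numbered visitors see design B."""
    return "A" if visitor_number % 2 == 0 else "B"


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Assumes samples large enough for the normal
    approximation and at least one conversion and one non-conversion overall.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF written with the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: 1000 visitors per design, 100 vs. 150 conversions.
    z, p = two_proportion_z(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Decide your metric and sample size before the test starts. Checking results repeatedly and stopping at the first "significant" difference inflates the chance of a false win.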

📚 Advanced Measurement Techniques

Eye Tracking Technology

Eye tracking reveals where users focus their attention, providing insights into visual design effectiveness and information hierarchy.

Physiological Measurement Methods

The connection: Emotional responses create measurable physical changes that can indicate user reactions to your interface

Measurement types:

Heart activity: Blood pressure, blood volume, and pulse rate
Galvanic Skin Response (GSR): Sweat gland activity indicating stress or excitement
Electromyogram (EMG): Electrical activity in muscles showing tension
Electroencephalogram (EEG): Brain electrical activity patterns

Current limitation: Interpreting physiological responses requires more research—we know they correlate with user experience but aren’t always sure exactly how

📚 Data Analysis Approaches

Quantitative Data Analysis

Statistical methods for numerical data:

Counting and frequency analysis
T-tests for comparing the means of two groups or conditions
Analysis of variance (ANOVA) for comparing three or more groups or conditions
Time-series analysis for behavior over time
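
To make the t-test concrete, here is a sketch that compares task completion times from two independent groups using Welch's t statistic, a variant that does not assume equal variances. It uses only the standard library and returns the statistic and approximate degrees of freedom. Turning these into a p-value needs the t distribution, which a statistics package such as SciPy provides. The sample times are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test.

    Each sample needs at least two values, and at least one sample must have
    non-zero variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances
    se_sq = var_a / n_a + var_b / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


if __name__ == "__main__":
    design_a = [31.5, 28.0, 35.2, 30.1, 33.4]  # seconds per task, made up
    design_b = [22.4, 25.9, 24.3, 27.0, 23.8]
    t, df = welch_t(design_a, design_b)
    print(f"t = {t:.2f}, df = {df:.1f}")
```

This version fits a between-groups design. For a within-groups design, where the same people tried both conditions, a paired test is the appropriate choice.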

Qualitative Data Analysis

Interpretive methods for rich, descriptive data:

Thematic analysis of interview transcripts
Comparative analysis across different user groups
Pattern identification in observational notes
Content analysis of open-ended questionnaire responses
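
Once you have read your transcripts and assigned a theme to each relevant excerpt (the interpretive part, which stays manual), simple tallies help you see patterns and compare user groups. The sketch below assumes excerpts already coded as `(group, theme)` pairs. The group names and themes are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical coded excerpts: (participant group, theme assigned by the analyst).
CODED = [
    ("novice", "navigation confusion"),
    ("novice", "unclear error message"),
    ("novice", "navigation confusion"),
    ("expert", "missing keyboard shortcuts"),
    ("expert", "navigation confusion"),
]


def theme_counts(coded):
    """Count how often each theme appears across all excerpts."""
    return Counter(theme for _, theme in coded)


def themes_by_group(coded):
    """Count themes separately for each participant group."""
    by_group = defaultdict(Counter)
    for group, theme in coded:
        by_group[group][theme] += 1
    return by_group


if __name__ == "__main__":
    print(theme_counts(CODED).most_common())
    for group, counts in themes_by_group(CODED).items():
        print(group, dict(counts))
```

Frequency is a starting point, not a verdict: a theme mentioned once can still point to a serious problem.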

📚 Evaluation Method Decision Framework

Method Selection Overview

Expert Analysis → Use when:

Early in development process
Budget constraints limit user testing
Need quick feedback on usability principles
Have usability expertise available

Laboratory Studies → Use when:

Need controlled environment for precise measurements
Comparing specific design alternatives
System context would be dangerous for field testing
Have budget for specialized equipment and facilities

Field Studies → Use when:

Natural context is crucial for realistic user behavior
System involves collaboration or complex workflows
Want to understand real-world usage patterns
Can manage environmental challenges

The key insight: Match your evaluation method to your specific research questions, development stage, and available resources. Each method reveals different aspects of user experience—often you’ll need multiple approaches for comprehensive understanding.

🔄 Connections and Review

HCI evaluation is fundamentally about reducing uncertainty in your design decisions through systematic evidence gathering. Whether you choose formative evaluation during design or summative evaluation afterward, expert analysis or user participation, laboratory control or field study authenticity, the goal remains constant: create systems people can use effectively and want to use repeatedly. Your choice of evaluation methods should align with your development stage, available resources, and the specific questions you need answered to improve your design.
