Chapter 8: Evaluation in HCI

🎯 Learning Objectives

By the end of this study session, you’ll be able to:

  • Explain why evaluation is essential for creating usable systems that people actually want to use
  • Distinguish between formative evaluation (during design) and summative evaluation (with finished products)
  • Choose between expert analysis and user participation methods based on your project stage and budget
  • Design and conduct effective laboratory studies with proper protocols and guidelines
  • Plan field studies that capture natural user behavior while managing real-world challenges
  • Use observation and think-aloud techniques to gather meaningful user insights
  • Create effective questionnaires using appropriate question formats and design principles
  • Understand when to apply experimental techniques like A/B testing for design decisions
  • Analyze both quantitative and qualitative evaluation data to improve your designs

🌟 The Big Picture

Evaluation in HCI is your quality assurance system for making sure people can actually use what you build—and that they’ll like using it. Instead of guessing whether your design works, evaluation gives you concrete evidence about usability problems before you ship, helping you focus on real user needs rather than imaginary ones.

📚 Core Concepts

What Is HCI Evaluation?

Evaluation means systematically testing how usable your system is for real people. You’re not just checking if it works—you’re checking if it works well for humans.

The Four Main Goals:

  • Assess system functionality: Does it match what users actually need?
  • Measure interface effects: How easy is it to learn? How satisfied are users?
  • Identify specific problems: What causes confusion or unexpected results?
  • Validate user requirements: Can people use it? Will they like it?

Why Evaluation Matters (Your Safety Net)

  • Ensures usability: People can actually use your system and will want to
  • Focuses your efforts: Work on real problems, not imaginary ones
  • Provides improvement roadmap: Clear suggestions for making it better
  • Prevents shipping disasters: Fix problems before users encounter them

📚 Evaluation Timing and Types

Formative vs. Summative Evaluation

Formative Evaluation (During Design):

  • Happens while you’re still designing
  • Can be done by your team or with real users
  • Purpose: Catch problems early when they’re cheap to fix
  • Think of it like: Spell-check while writing, not after printing

Summative Evaluation (Finished Product):

  • Happens with your completed system
  • Almost always involves real users
  • Purpose: Verify that your final product actually works for people
  • Think of it like: Final exam to see if you achieved your goals

📚 Evaluation Methods: Expert vs. User-Centered

Expert Analysis Methods

When to use: Early stages, tight budgets, quick feedback needed

Advantages:

  • Relatively inexpensive (no user recruitment needed)
  • Can be used at any development stage
  • Quick turnaround time
  • Good for catching obvious usability violations

Key limitation: Doesn’t test actual system use—only whether your design follows established usability principles

User Participation Methods

When to use: Later development stages with working prototypes

The trade-off: More expensive and time-consuming, but gives you real user behavior data that expert analysis can’t provide

📚 Laboratory Studies: Controlled User Testing

When Laboratory Studies Work Best

Ideal situations:

  • The system’s real operating location would be dangerous for testing
  • Single-user tasks that don’t require natural context
  • Need to compare alternative designs under controlled conditions
  • Want to deliberately manipulate context to uncover specific problems

Laboratory Study Setup Requirements

Essential equipment and personnel:

  • Well-equipped usability lab with video monitoring
  • Multiple cameras (one for interface interactions, one for user expressions)
  • Voluntary participants willing to complete task lists
  • Observer team behind one-way mirror
  • Reality check: This is expensive, time-consuming, and complex

Laboratory Study Protocol: The Complete Guide

Before Testing Begins

Preparation essentials:

  • Establish clear objectives and information requirements
  • Pre-test everything before actual participants arrive
  • Create comfortable environment that reduces participant anxiety
  • Critical mindset shift: Emphasize you’re evaluating the system, not the user
  • Acknowledge that software likely has usability problems
  • Inform users they can stop anytime without penalty
  • Explain all monitoring equipment and its purpose
  • Obtain explicit consent for observation and recording
  • Guarantee individual results remain completely confidential

During Testing Sessions

Maintain optimal testing conditions:

  • Never assign unnecessary tasks—respect participant time
  • Keep atmosphere relaxed and supportive
  • Stay unobtrusive—don’t let yourself or equipment interfere
  • Hand out tasks one at a time to avoid overwhelming participants
  • Never show displeasure with user performance
  • Critical decision point: Stop immediately if session becomes too unpleasant for participant

After Testing Sessions

Proper closure and analysis:

  • Answer any participant questions
  • Analyze, summarize, and report findings relative to original objectives
  • Remember: Your data is only as good as your initial objectives

Laboratory Studies: Advantages and Disadvantages

Advantages:

  • Specialist equipment available for detailed measurements
  • Uninterrupted environment eliminates external distractions
  • Controlled conditions allow precise comparisons

Disadvantages:

  • Lacks natural context where system will actually be used
  • Difficult to observe multi-user cooperation and collaboration
  • Artificial environment may change user behavior

📚 Field Studies: Natural Environment Testing

When Field Studies Are Essential

Field studies work best when natural context is crucial to understanding how your system really performs.

Field Studies Advantages:

  • Natural environment preserves authentic user behavior
  • Context retained—you see real-world constraints and opportunities
  • Users interact with your system the way they naturally would

Field Studies Disadvantages:

  • Distractions can interfere with data collection
  • Environmental noise may affect measurements
  • Less control over variables that might influence results

📚 Data Collection Techniques

Observation + Think Aloud Protocol

How it works:

  • Observation: Watch users performing tasks with your system
  • Think Aloud: Ask users to describe what they’re doing, why, and what they think is happening

Think Aloud Advantages:

  • Simple technique requiring little specialized expertise
  • Provides valuable insight into user mental models
  • Shows how system is actually used (vs. how you think it’s used)

Think Aloud Limitations:

  • Subjective and selective—users may filter their thoughts
  • Act of describing can alter task performance
  • Users may think before speaking, changing natural behavior

What Observation Can Tell You

Valuable data you can collect:

  • Which features get used frequently
  • Time needed to complete specific tasks
  • Which interfaces cause confusion or frustration
  • How users react to error messages and system feedback
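
As a rough illustration of how the first two kinds of data might be tallied after a session, here is a minimal sketch. It assumes a hypothetical observation log of (participant, task, feature, seconds) entries; the field layout and every value are invented for the example.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical observation log: (participant, task, feature_used, seconds_on_task).
# All values are made up for illustration.
observations = [
    ("P1", "search", "search_box", 42.0),
    ("P1", "checkout", "cart_button", 95.5),
    ("P2", "search", "filter_menu", 61.0),
    ("P2", "checkout", "cart_button", 120.0),
    ("P3", "search", "search_box", 38.5),
]

def feature_frequency(log):
    """Count how often each feature appears in the log."""
    return Counter(feature for _, _, feature, _ in log)

def mean_task_time(log):
    """Average time on task, in seconds, for each task."""
    times = defaultdict(list)
    for _, task, _, seconds in log:
        times[task].append(seconds)
    return {task: mean(values) for task, values in times.items()}

freq = feature_frequency(observations)
avg = mean_task_time(observations)
print(freq.most_common())
print(avg)
```

Confusion and reactions to error messages are harder to reduce to numbers; those usually stay in your notes or recordings.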

Managing the Observer Effect (Hawthorne Effect)

When people know they’re being watched, they often change their behavior. Minimize this by:

  • Reducing environmental distractions
  • Clearly stating you’re evaluating the interface, not their performance
  • Making observation as natural and non-intrusive as possible

Recording Your Observations

Choose your recording method strategically:

  • Hand-written notes (quick but limited)
  • Audio recording (captures verbal feedback)
  • Video recording (most complete but longest to analyze)
  • Rule of thumb: More complete records take longer to analyze—choose based on your time and analysis needs

📚 Query Techniques: Getting User Feedback

Interviews: One-on-One User Conversations

Interview Advantages:

  • Flexible format—adapt questions based on user responses
  • Deep exploration of issues as they arise
  • Elicit unexpected user perspectives and identify unanticipated problems
  • Personal connection can lead to more honest feedback

Interview Disadvantages:

  • Participants can be subjective and reluctant to criticize openly
  • Very time-consuming for both data collection and analysis
  • Interviewer skill significantly affects data quality

Questionnaires: Systematic User Feedback

Questionnaire Advantages:

  • Quick data collection from large user groups
  • Systematic analysis of responses across many users
  • Standardized questions ensure consistent data
  • Cost-effective for gathering broad user opinions

Questionnaire Disadvantages:

  • Less flexible than interviews—can’t adapt to unexpected responses
  • Limited ability to probe deeper into interesting issues
  • Require significant skill to design effective questions
  • Often suffer from poor response rates

Established HCI Questionnaires

Instead of creating your own from scratch, consider these proven instruments:

Historical questionnaires you should know:

  • QUIS (Maryland, 1988): Questionnaire for UI Satisfaction
  • PUEU (IBM, 1989): Perceived Usefulness & Ease of Use
  • PUTQ (Purdue, 1997): Purdue Usability Testing Questionnaire
  • CSUQ (IBM, 1995): Computer System Usability Questionnaire
  • SUS (Digital Equipment Corporation, 1986): System Usability Scale, created by John Brooke
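
SUS is a handy one to practice scoring. The standard rule: ten items rated 1 to 5, odd-numbered items contribute (rating minus 1), even-numbered items contribute (5 minus rating), and the sum is multiplied by 2.5 to give a score from 0 to 100. A minimal sketch of that rule follows; the function name is just a label chosen for this example.

```python
def sus_score(responses):
    """Return the SUS score (0 to 100) for one respondent.

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 to 5.
    """
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten ratings between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:   # odd items are positively worded
            total += rating - 1
        else:                      # even items are negatively worded
            total += 5 - rating
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # 85.0
```

The alternating rule exists because SUS mixes positively and negatively worded statements, so a high rating is not always a good sign.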

Questionnaire Design: Question Formats Deep Dive

Closed Questions

Example: “Which aspects of the interface did you like most?”

  • A. the color
  • B. the sound
  • C. the ease of navigation

Closed Question Advantages:

  • Easy for respondents to answer quickly
  • Responses are straightforward to analyze and compare
  • Clear data categorization

Closed Question Disadvantages:

  • May suggest ideas to respondents that wouldn’t naturally occur to them
  • Could distort findings by limiting response options
  • Miss unexpected insights outside your predetermined categories

Open Questions

Example: “What aspects of the interface did you like most?”

Open Question Advantages:

  • Respondents can answer however feels natural to them
  • Rich source of unexpected insights and detailed feedback
  • Captures user perspectives you might not have considered

Open Question Disadvantages:

  • Responses are difficult and time-consuming to analyze
  • Hard to compare across different users systematically
  • May intimidate some respondents who prefer guided options

Scalar Questions: Measuring User Attitudes

Likert Scales:

  • Range of values measuring agreement/disagreement with statements
  • Example: “The interface was easy to use”
    • Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree
  • Limitation: Only measures agreement—hard to quantify other dimensions
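
To see how Likert answers can be summarized, here is a small sketch. It assumes the five labels above are coded 1 to 5 and uses invented responses. It reports the median and the share of respondents who agree, since Likert data is ordinal; treat that choice of summary as one reasonable option, not the only one.

```python
from collections import Counter
from statistics import median

# Assumed coding of the five response labels (1 = most negative).
LIKERT = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

# Invented answers to "The interface was easy to use".
responses = ["Agree", "Strongly Agree", "Neutral", "Agree", "Disagree", "Agree"]

def summarize_likert(answers):
    """Return the median code, per-label counts, and share who agree."""
    codes = [LIKERT[answer] for answer in answers]
    return {
        "median": median(codes),
        "counts": Counter(answers),
        "agree_share": sum(1 for code in codes if code >= 4) / len(codes),
    }

summary = summarize_likert(responses)
print(summary)
```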

Semantic Differential Scales:

  • Measure meaning and connotations of interface concepts
  • Examples of opposing pairs:
    • Good ↔ Bad
    • Quick ↔ Slow
    • Important ↔ Unimportant
    • Expensive ↔ Inexpensive
  • Purpose: Captures emotional and subjective reactions to your design
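
A semantic differential is usually scored by where ratings fall between the two words. The sketch below assumes a 7-point scale (1 = left-hand word, 7 = right-hand word) and invented ratings; the scale length and the helper names are assumptions made for the example.

```python
from statistics import mean

# Each rating is a position from 1 (left-hand word) to 7 (right-hand word).
# Ratings are invented for illustration.
ratings = {
    ("Good", "Bad"): [2, 1, 3, 2],
    ("Quick", "Slow"): [5, 6, 4, 5],
}

def profile(data):
    """Mean position on the scale for each word pair."""
    return {pair: mean(values) for pair, values in data.items()}

def lean(pair, score, midpoint=4):
    """Name the word the mean score leans toward, or 'neutral'."""
    left, right = pair
    if score < midpoint:
        return left
    if score > midpoint:
        return right
    return "neutral"

scores = profile(ratings)
for pair, score in scores.items():
    print(pair, score, lean(pair, score))
```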

Questionnaire Design Excellence: Critical Guidelines

Essential Design Questions

Before writing each question, ask yourself:

  • Clarity: Can this question be easily understood by all respondents?
  • Precision: Is this question too vague or unnecessarily precise?
  • Bias: Does this question lead respondents toward a particular answer?
    • Bad examples: “You support Arsenal, don’t you?” or “Are you a criminal?”
  • Necessity: Is this question essential to your evaluation objectives?
  • Willingness: Will respondents be willing to provide this information?
  • Applicability: Does this question apply to all your respondents?

Questionnaire Design Best Practices

Question Type Strategy:

  • Prefer closed questions whenever possible—offer range of potential answers
  • Use open questions only when you genuinely need unpredictable insights

Question Structure and Flow:

  • Start with general, easy questions
  • Progress to detailed, difficult questions
  • End with demographic information
  • This flow keeps respondents engaged and builds their confidence

Length Management:

  • Maximum recommended length: Two sides of A4 paper
  • Principle: Long questionnaires deter participation and reduce response quality

Survey Implementation: Step-by-Step Process

Complete survey workflow:

  1. Define target population and sample size: Who do you want to ask, and how many?
  2. Develop your questions: Use design principles covered above
  3. Pre-test the questionnaire: Always test with small group before full deployment
  4. Conduct the survey: Deploy using chosen method (online, paper, etc.)
  5. Collect and analyze data: Plan analysis methods before data collection begins
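
For the “how many?” part of step 1, one common textbook starting point is the sample-size formula for estimating a proportion, n = z² · p(1 − p) / e², optionally corrected for a small population. This formula is not part of the workflow above; it is an assumption brought in to make the step concrete. The defaults (95% confidence, 5% margin of error, p = 0.5) are conventional choices you should adjust.

```python
import math

def sample_size(margin=0.05, z=1.96, proportion=0.5, population=None):
    """Estimate how many completed responses a survey needs.

    margin:     acceptable margin of error (0.05 means plus or minus 5 points)
    z:          z-score for the confidence level (1.96 is roughly 95%)
    proportion: expected share giving a particular answer (0.5 is the most cautious)
    population: total population size, if it is small enough to matter
    """
    n = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Finite population correction
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

print(sample_size())                  # 385
print(sample_size(population=1000))   # 278
```

Remember that questionnaires often suffer from poor response rates, so you will typically need to invite more people than this number.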

📚 Experimental Techniques: Controlled Comparisons

Basic Experimental Design

Purpose: Test specific hypotheses about your design by controlling variables

The experimental process:

  • Choose specific hypothesis to test
  • Create experimental conditions that differ only in one controlled variable
  • Measure behavioral changes and attribute them to your variable manipulation
  • Key principle: Isolate the factor you’re testing from other influences

Experimental Design Approaches

Within-Groups Design

Structure: Each participant experiences all experimental conditions

Within-Groups Advantages:

  • Less costly—fewer participants needed
  • Less likely to suffer from individual user variation
  • Each person serves as their own control group

Within-Groups Disadvantage:

  • Transfer of learning problem: Experience with first condition affects performance on subsequent conditions
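
A common way to reduce the transfer of learning problem is counterbalancing: varying the order of conditions across participants so that order effects even out. The section above does not name this technique, so treat the sketch below as one possible mitigation; the participant IDs and condition names are hypothetical.

```python
from itertools import cycle, permutations

def counterbalanced_orders(participants, conditions):
    """Give each participant the next ordering of conditions in rotation.

    With two conditions, participants alternate between A-then-B and
    B-then-A. With more conditions, all orderings are used in turn.
    """
    orders = cycle(permutations(conditions))
    return {participant: next(orders) for participant in participants}

plan = counterbalanced_orders(["P1", "P2", "P3", "P4"], ["A", "B"])
for participant, order in plan.items():
    print(participant, order)
```

Full counterbalancing grows quickly (three conditions already give six orderings), so pick a participant count that is a multiple of the number of orderings when you can.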

Between-Groups Design

Structure: Each participant experiences only one experimental condition

Between-Groups Advantages:

  • No transfer of learning between conditions
  • Clean comparison between different approaches

Between-Groups Disadvantages:

  • Requires more participants (more expensive)
  • Individual variation between groups can bias results
  • Need careful participant matching across groups
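
When careful matching is not feasible, random assignment is a widely used alternative for spreading individual differences across groups. It is a different technique from matching, shown here only as a sketch; the function name and participant IDs are hypothetical.

```python
import random

def assign_groups(participants, conditions, seed=None):
    """Shuffle participants, then deal them into equal-sized groups."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups

people = [f"P{i}" for i in range(1, 9)]
groups = assign_groups(people, ["Design A", "Design B"], seed=7)
print(groups)
```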

A/B Testing: Real-World Experimental Design

A/B Testing Process:

  • Create two minor variants of your design (A and B)
  • Show design A to even-numbered website visitors
  • Show design B to odd-numbered visitors
  • Monitor performance metrics (dwell time, click-through rates, conversion)
  • Choose better-performing design
  • Continuous improvement: Repeat process with new variations

Why A/B testing works: Real users, real context, real behaviors—but with controlled comparison
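
Here is a minimal sketch of the process above. The first function follows the even/odd rule as described. The second compares conversion rates with a two-proportion z-test, which is an assumption added here (the steps above only say to choose the better-performing design), so that a small difference is not mistaken for a real one. The visitor and conversion counts are invented.

```python
import math

def variant_for(visitor_number):
    """Even-numbered visitors see design A, odd-numbered visitors see B."""
    return "A" if visitor_number % 2 == 0 else "B"

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a, conv_b: number of conversions for each design
    n_a, n_b:       number of visitors shown each design
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented results: 100 of 1000 visitors converted on A, 130 of 1000 on B.
z, p = two_proportion_z(100, 1000, 130, 1000)
print(variant_for(42), variant_for(43))
print(round(z, 2), round(p, 3))
```

A positive z means B converted better than A; a small p-value suggests the gap is unlikely to be chance alone. The sketch does not guard against zero conversions in both groups, which would make the standard error zero.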

📚 Advanced Measurement Techniques

Eye Tracking Technology

Eye tracking reveals where users focus their attention, providing insights into visual design effectiveness and information hierarchy.

Physiological Measurement Methods

The connection: Emotional responses create measurable physical changes that can indicate user reactions to your interface

Measurement types:

  • Heart activity: Changes in blood pressure, blood volume, and pulse
  • Galvanic Skin Response (GSR): Sweat gland activity indicating stress or excitement
  • Electromyogram (EMG): Electrical activity in muscles showing tension
  • Electroencephalogram (EEG): Brain electrical activity patterns

Current limitation: Interpreting physiological responses requires more research—we know they correlate with user experience but aren’t always sure exactly how

📚 Data Analysis Approaches

Quantitative Data Analysis

Statistical methods for numerical data:

  • Counting and frequency analysis
  • T-tests for comparing groups
  • Analysis of variance (ANOVA) for comparing three or more groups
  • Time-series analysis for behavior over time
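
To make the t-test item concrete, the sketch below computes Welch’s t statistic and its approximate degrees of freedom for two independent groups, using only the standard library. Welch’s version (which does not assume equal variances) is a choice made for this example, and the task times are invented. Turning t into a p-value needs a t-distribution table or a statistics package, which is left out here.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Invented task completion times in seconds for two designs.
design_a = [30, 32, 34, 36, 38]
design_b = [40, 42, 44, 46, 48]

t, df = welch_t(design_a, design_b)
print(t, df)
```

A negative t here means design A’s times were lower (faster) on average than design B’s.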

Qualitative Data Analysis

Interpretive methods for rich, descriptive data:

  • Thematic analysis of interview transcripts
  • Comparative analysis across different user groups
  • Pattern identification in observational notes
  • Content analysis of open-ended questionnaire responses
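
Qualitative analysis is interpretive work, but once you have coded your transcripts or notes by hand, a simple tally can show how widespread each theme is. The sketch below assumes a hypothetical list of (participant, theme) codes; the theme labels are invented.

```python
from collections import Counter

# Hypothetical hand-coded segments: (participant, theme assigned by the analyst).
coded_segments = [
    ("P1", "navigation confusion"),
    ("P1", "likes visual design"),
    ("P2", "navigation confusion"),
    ("P3", "navigation confusion"),
    ("P3", "slow feedback"),
    ("P3", "navigation confusion"),
]

def participants_per_theme(segments):
    """Count how many distinct participants raised each theme."""
    unique_pairs = set(segments)   # one entry per participant and theme
    return Counter(theme for _, theme in unique_pairs)

theme_counts = participants_per_theme(coded_segments)
print(theme_counts.most_common())
```

Counting participants instead of mentions keeps one talkative person from dominating the picture.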

📚 Evaluation Method Decision Framework

Method Selection Overview

Expert Analysis → Use when:

  • Early in development process
  • Budget constraints limit user testing
  • Need quick feedback on usability principles
  • Have usability expertise available

Laboratory Studies → Use when:

  • Need controlled environment for precise measurements
  • Comparing specific design alternatives
  • System context would be dangerous for field testing
  • Have budget for specialized equipment and facilities

Field Studies → Use when:

  • Natural context is crucial for realistic user behavior
  • System involves collaboration or complex workflows
  • Want to understand real-world usage patterns
  • Can manage environmental challenges

The key insight: Match your evaluation method to your specific research questions, development stage, and available resources. Each method reveals different aspects of user experience—often you’ll need multiple approaches for comprehensive understanding.
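
If it helps to see the framework as explicit rules, here is a deliberately simplified sketch. The yes/no flags and the function name are hypothetical, and real projects involve judgment that a few booleans cannot capture.

```python
def suggest_methods(early_stage=False, tight_budget=False,
                    needs_control=False, hazardous_site=False,
                    context_critical=False):
    """Return the evaluation methods the framework above points toward."""
    suggestions = []
    if early_stage or tight_budget:
        suggestions.append("expert analysis")
    if needs_control or hazardous_site:
        suggestions.append("laboratory study")
    if context_critical and not hazardous_site:
        suggestions.append("field study")
    return suggestions

print(suggest_methods(early_stage=True))
print(suggest_methods(needs_control=True, context_critical=True))
```

Getting more than one method back is expected: it mirrors the point that you will often need multiple approaches.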

🔄 Connections and Review

HCI evaluation is fundamentally about reducing uncertainty in your design decisions through systematic evidence gathering. Whether you choose formative evaluation during design or summative evaluation afterward, expert analysis or user participation, laboratory control or field study authenticity, the goal remains constant: create systems people can use effectively and want to use repeatedly. Your choice of evaluation methods should align with your development stage, available resources, and the specific questions you need answered to improve your design.