Chapter 8: Evaluation in HCI
🎯 Learning Objectives
By the end of this study session, you’ll be able to:
- Explain why evaluation is essential for creating usable systems that people actually want to use
- Distinguish between formative evaluation (during design) and summative evaluation (with finished products)
- Choose between expert analysis and user participation methods based on your project stage and budget
- Design and conduct effective laboratory studies with proper protocols and guidelines
- Plan field studies that capture natural user behavior while managing real-world challenges
- Use observation and think-aloud techniques to gather meaningful user insights
- Create effective questionnaires using appropriate question formats and design principles
- Understand when to apply experimental techniques like A/B testing for design decisions
- Analyze both quantitative and qualitative evaluation data to improve your designs
🌟 The Big Picture
Evaluation in HCI is your quality assurance system for making sure people can actually use what you build—and that they’ll like using it. Instead of guessing whether your design works, evaluation gives you concrete evidence about usability problems before you ship, helping you focus on real user needs rather than imaginary ones.
📚 Core Concepts
What Is HCI Evaluation?
Evaluation means systematically testing how usable your system is for real people. You’re not just checking if it works—you’re checking if it works well for humans.
The Four Main Goals:
- Assess system functionality: Does it match what users actually need?
- Measure interface effects: How easy is it to learn? How satisfied are users?
- Identify specific problems: What causes confusion or unexpected results?
- Validate user requirements: Can people use it? Will they like it?
Why Evaluation Matters (Your Safety Net)
- Ensures usability: People can actually use your system and will want to
- Focuses your efforts: Work on real problems, not imaginary ones
- Provides improvement roadmap: Clear suggestions for making it better
- Prevents shipping disasters: Fix problems before users encounter them
📚 Evaluation Timing and Types
Formative vs. Summative Evaluation
Formative Evaluation (During Design):
- Happens while you’re still designing
- Can be done by your team or with real users
- Purpose: Catch problems early when they’re cheap to fix
- Think of it like: Spell-check while writing, not after printing
Summative Evaluation (Finished Product):
- Happens with your completed system
- Almost always involves real users
- Purpose: Verify that your final product actually works for people
- Think of it like: Final exam to see if you achieved your goals
📚 Evaluation Methods: Expert vs. User-Centered
Expert Analysis Methods
When to use: Early stages, tight budgets, quick feedback needed
Advantages:
- Relatively inexpensive (no user recruitment needed)
- Can be used at any development stage
- Quick turnaround time
- Good for catching obvious usability violations
Key limitation: Doesn’t test actual system use—only whether your design follows established usability principles
User Participation Methods
When to use: Later development stages with working prototypes
The trade-off: More expensive and time-consuming, but gives you real user behavior data that expert analysis can’t provide
📚 Laboratory Studies: Controlled User Testing
When Laboratory Studies Work Best
Ideal situations:
- System location would be dangerous for testing
- Single-user tasks that don’t require natural context
- Need to compare alternative designs under controlled conditions
- Want to deliberately manipulate context to uncover specific problems
Laboratory Study Setup Requirements
Essential equipment and personnel:
- Well-equipped usability lab with video monitoring
- Multiple cameras (one for interface interactions, one for user expressions)
- Voluntary participants willing to complete task lists
- Observer team behind one-way mirror
- Reality check: This is expensive, time-consuming, and complex
Laboratory Study Protocol: The Complete Guide
Before Testing Begins
Preparation essentials:
- Establish clear objectives and information requirements
- Pre-test everything before actual participants arrive
- Create comfortable environment that reduces participant anxiety
- Critical mindset shift: Emphasize you’re evaluating the system, not the user
- Acknowledge that software likely has usability problems
- Inform users they can stop anytime without penalty
- Explain all monitoring equipment and its purpose
- Obtain explicit consent for observation and recording
- Guarantee individual results remain completely confidential
During Testing Sessions
Maintain optimal testing conditions:
- Never assign unnecessary tasks—respect participant time
- Keep atmosphere relaxed and supportive
- Stay unobtrusive—don’t let yourself or equipment interfere
- Hand out tasks one at a time to avoid overwhelming participants
- Never show displeasure with user performance
- Critical decision point: Stop immediately if session becomes too unpleasant for participant
After Testing Sessions
Proper closure and analysis:
- Answer any participant questions
- Analyze, summarize, and report findings relative to original objectives
- Remember: Your data is only as good as your initial objectives
Laboratory Studies: Advantages and Disadvantages
Advantages:
- Specialist equipment available for detailed measurements
- Uninterrupted environment eliminates external distractions
- Controlled conditions allow precise comparisons
Disadvantages:
- Lacks natural context where system will actually be used
- Difficult to observe multi-user cooperation and collaboration
- Artificial environment may change user behavior
📚 Field Studies: Natural Environment Testing
When Field Studies Are Essential
Field studies work best when natural context is crucial to understanding how your system really performs.
Field Studies Advantages:
- Natural environment preserves authentic user behavior
- Context retained—you see real-world constraints and opportunities
- Users interact with your system the way they naturally would
Field Studies Disadvantages:
- Distractions can interfere with data collection
- Environmental noise may affect measurements
- Less control over variables that might influence results
📚 Data Collection Techniques
Observation + Think Aloud Protocol
How it works:
- Observation: Watch users performing tasks with your system
- Think Aloud: Ask users to describe what they’re doing, why, and what they think is happening
Think Aloud Advantages:
- Simple technique requiring little specialized expertise
- Provides valuable insight into user mental models
- Shows how system is actually used (vs. how you think it’s used)
Think Aloud Limitations:
- Subjective and selective—users may filter their thoughts
- Act of describing can alter task performance
- Users may think before speaking, changing natural behavior
What Observation Can Tell You
Valuable data you can collect:
- Which features get used frequently
- Time needed to complete specific tasks
- Which interfaces cause confusion or frustration
- How users react to error messages and system feedback
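As a rough illustration of how these observations become data, the sketch below summarizes a hand-coded session log. The log format, the event names (task_start, task_end, feature, error), and the sample entries are all hypothetical assumptions, not a standard logging scheme.

```python
from collections import Counter

# Hypothetical log: (seconds since session start, event kind, detail)
session_log = [
    (0.0, "task_start", "find product"),
    (4.2, "feature", "search box"),
    (9.8, "feature", "filter menu"),
    (15.5, "error", "no results message"),
    (21.0, "feature", "search box"),
    (30.5, "task_end", "find product"),
]

def summarize_session(log):
    """Return (task durations, feature usage counts, number of errors seen)."""
    feature_counts = Counter()
    error_count = 0
    starts = {}
    task_times = {}
    for timestamp, kind, detail in log:
        if kind == "task_start":
            starts[detail] = timestamp
        elif kind == "task_end":
            # Duration is end time minus the matching start time
            task_times[detail] = timestamp - starts[detail]
        elif kind == "feature":
            feature_counts[detail] += 1
        elif kind == "error":
            error_count += 1
    return task_times, feature_counts, error_count

task_times, feature_counts, error_count = summarize_session(session_log)
print(task_times)
print(feature_counts.most_common())
print(error_count)
```

Confusion and frustration still need a human observer to code them; a log like this only captures what was used, for how long, and how often errors appeared.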
Managing the Observer Effect (Hawthorne Effect)
When people know they’re being watched, they often change their behavior. Minimize this by:
- Reducing environmental distractions
- Clearly stating you’re evaluating the interface, not their performance
- Making observation as natural and non-intrusive as possible
Recording Your Observations
Choose your recording method strategically:
- Hand-written notes (quick but limited)
- Audio recording (captures verbal feedback)
- Video recording (most complete but longest to analyze)
- Rule of thumb: More complete records take longer to analyze—choose based on your time and analysis needs
📚 Query Techniques: Getting User Feedback
Interviews: One-on-One User Conversations
Interview Advantages:
- Flexible format—adapt questions based on user responses
- Deep exploration of issues as they arise
- Elicit unexpected user perspectives and identify unanticipated problems
- Personal connection can lead to more honest feedback
Interview Disadvantages:
- Participants can be subjective and reluctant to criticize openly
- Very time-consuming for both data collection and analysis
- Interviewer skill significantly affects data quality
Questionnaires: Systematic User Feedback
Questionnaire Advantages:
- Quick data collection from large user groups
- Systematic analysis of responses across many users
- Standardized questions ensure consistent data
- Cost-effective for gathering broad user opinions
Questionnaire Disadvantages:
- Less flexible than interviews—can’t adapt to unexpected responses
- Limited ability to probe deeper into interesting issues
- Require significant skill to design effective questions
- Often suffer from poor response rates
Established HCI Questionnaires
Instead of creating your own from scratch, consider these proven instruments:
Historical questionnaires you should know:
- QUIS (Maryland, 1988): Questionnaire for UI Satisfaction
- PUEU (IBM, 1989): Perceived Usefulness & Ease of Use
- PUTQ (Purdue, 1997): Purdue Usability Testing Questionnaire
- CSUQ (IBM, 1995): Computer System Usability Questionnaire
- SUS (Digital Equipment Corporation, 1986): System Usability Scale, developed by John Brooke
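To show how a standardized instrument turns answers into a single number, here is a minimal sketch of the standard SUS scoring rule: ten items rated 1 to 5, where odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name and the sample ratings are illustrative assumptions.

```python
def sus_score(ratings):
    """Return the SUS score (0-100) for one respondent's ten ratings (each 1-5)."""
    if len(ratings) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("each rating must be an integer from 1 to 5")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5

# Hypothetical answers from one respondent, items 1 to 10
example = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(example))  # 85.0
```

The other instruments in the list have their own scoring rules, so this function applies to SUS only.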
Questionnaire Design: Question Formats Deep Dive
Closed Questions
Example: “Which aspects of the interface did you like most?”
- A. the color
- B. the sound
- C. the ease of navigation
Closed Question Advantages:
- Easy for respondents to answer quickly
- Responses are straightforward to analyze and compare
- Clear data categorization
Closed Question Disadvantages:
- May suggest ideas to respondents that wouldn’t naturally occur to them
- Could distort findings by limiting response options
- Miss unexpected insights outside your predetermined categories
Open Questions
Example: “What aspects of the interface did you like most?”
Open Question Advantages:
- Respondents can answer however feels natural to them
- Rich source of unexpected insights and detailed feedback
- Captures user perspectives you might not have considered
Open Question Disadvantages:
- Responses are difficult and time-consuming to analyze
- Hard to compare across different users systematically
- May intimidate some respondents who prefer guided options
Scalar Questions: Measuring User Attitudes
Likert Scales:
- Range of values measuring agreement/disagreement with statements
- Example: “The interface was easy to use”
- Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree
- Limitation: Only measures agreement—hard to quantify other dimensions
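A small sketch of how responses to one Likert statement might be summarized: count each label and report the middle response. Because the labels are ordered categories rather than true numbers, the sketch reports a median (via statistics.median_low) instead of a mean. That is a common convention, assumed here rather than prescribed by this chapter. The sample responses are hypothetical.

```python
from statistics import median_low

LIKERT_LABELS = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

def summarize_likert(responses):
    """Return (count per label, median label) for one Likert statement."""
    # Map each label to its position on the scale; unknown labels raise ValueError
    codes = [LIKERT_LABELS.index(r) for r in responses]
    counts = {label: 0 for label in LIKERT_LABELS}
    for r in responses:
        counts[r] += 1
    # median_low always returns a value that actually occurs in the data
    middle = LIKERT_LABELS[median_low(codes)]
    return counts, middle

# Hypothetical answers to "The interface was easy to use"
answers = ["Agree", "Strongly Agree", "Neutral", "Agree", "Disagree"]
counts, middle = summarize_likert(answers)
print(counts)
print(middle)
```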
Semantic Differential Scales:
- Measure meaning and connotations of interface concepts
- Examples of opposing pairs:
- Good ↔ Bad
- Quick ↔ Slow
- Important ↔ Unimportant
- Expensive ↔ Inexpensive
- Purpose: Captures emotional and subjective reactions to your design
Questionnaire Design Excellence: Critical Guidelines
Essential Design Questions
Before writing each question, ask yourself:
- Clarity: Can this question be easily understood by all respondents?
- Precision: Is this question too vague or unnecessarily precise?
- Bias: Does this question lead respondents toward a particular answer?
- Bad examples: “You support Arsenal, don’t you?” (pushes the respondent toward “yes”) or “Are you a criminal?” (loaded, and unlikely to get an honest answer)
- Necessity: Is this question essential to your evaluation objectives?
- Willingness: Will respondents be willing to provide this information?
- Applicability: Does this question apply to all your respondents?
Questionnaire Design Best Practices
Question Type Strategy:
- Prefer closed questions whenever possible—offer range of potential answers
- Use open questions only when you genuinely need unpredictable insights
Question Structure and Flow:
- Start with general, easy questions
- Progress to detailed, difficult questions
- End with demographic information
- This flow keeps respondents engaged and builds their confidence
Length Management:
- Maximum recommended length: Two sides of A4 paper
- Principle: Long questionnaires deter participation and reduce response quality
Survey Implementation: Step-by-Step Process
Complete survey workflow:
- Define target population and sample size: Who do you want to ask, and how many?
- Develop your questions: Use design principles covered above
- Pre-test the questionnaire: Always test with small group before full deployment
- Conduct the survey: Deploy using chosen method (online, paper, etc.)
- Collect and analyze data: Plan analysis methods before data collection begins
📚 Experimental Techniques: Controlled Comparisons
Basic Experimental Design
Purpose: Test specific hypotheses about your design by controlling variables
The experimental process:
- Choose specific hypothesis to test
- Create experimental conditions that differ only in one controlled variable
- Measure behavioral changes and attribute them to your variable manipulation
- Key principle: Isolate the factor you’re testing from other influences
Experimental Design Approaches
Within-Groups Design
Structure: Each participant experiences all experimental conditions
Within-Groups Advantages:
- Less costly—fewer participants needed
- Less likely to suffer from individual user variation
- Each person serves as their own control group
Within-Groups Disadvantage:
- Transfer of learning problem: Experience with first condition affects performance on subsequent conditions
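One widely used way to reduce the transfer of learning problem is counterbalancing: vary the order of conditions across participants so that no condition always comes first. This chapter does not prescribe it, so treat the sketch below as an assumption about how you might run the study. The function name and participant IDs are hypothetical.

```python
def counterbalanced_orders(conditions, participants):
    """Give each participant a rotated ordering of the conditions."""
    n = len(conditions)
    orders = {}
    for index, participant in enumerate(participants):
        shift = index % n
        # Rotate the list so each condition takes a turn at being first
        orders[participant] = conditions[shift:] + conditions[:shift]
    return orders

orders = counterbalanced_orders(["Design A", "Design B"], ["P1", "P2", "P3", "P4"])
for participant, order in orders.items():
    print(participant, order)
```

With two conditions, half the participants see Design A first and half see Design B first, so any learning carried over from the first condition affects both designs equally on average.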
Between-Groups Design
Structure: Each participant experiences only one experimental condition
Between-Groups Advantages:
- No transfer of learning between conditions
- Clean comparison between different approaches
Between-Groups Disadvantages:
- Requires more participants (more expensive)
- Individual variation between groups can bias results
- Need careful participant matching across groups
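A minimal sketch of random assignment, which is a simpler alternative to explicit participant matching: shuffling participants before dealing them into groups spreads individual variation across conditions on average, though it does not guarantee matched groups. The function name, participant IDs, and the seed parameter are illustrative assumptions.

```python
import random

def assign_groups(participants, conditions, seed=None):
    """Randomly split participants into equally sized groups, one per condition."""
    rng = random.Random(seed)   # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        # Deal shuffled participants round-robin into the conditions
        groups[conditions[index % len(conditions)]].append(participant)
    return groups

participants = [f"P{i}" for i in range(1, 9)]
groups = assign_groups(participants, ["Design A", "Design B"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```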
A/B Testing: Real-World Experimental Design
A/B Testing Process:
- Create two minor variants of your design (A and B)
- Show design A to even-numbered website visitors
- Show design B to odd-numbered visitors
- Monitor performance metrics (dwell time, click-through rates, conversion)
- Choose better-performing design
- Continuous improvement: Repeat process with new variations
Why A/B testing works: Real users, real context, real behaviors—but with controlled comparison
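The process above can be sketched in a few lines. The assignment function follows the even/odd split described here; many production setups randomize assignment instead, which is worth considering if visitor numbering could correlate with behavior. The two-proportion z-test used to compare conversion rates is one common choice, assumed here rather than taken from this chapter, and the visitor counts are hypothetical.

```python
import math

def assign_variant(visitor_number):
    """Even-numbered visitors see design A, odd-numbered visitors see design B."""
    return "A" if visitor_number % 2 == 0 else "B"

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z statistic, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both designs convert equally well
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: conversions and visitors per design
z, p_value = two_proportion_z(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between A and B is unlikely to be chance alone; with few visitors or very rare conversions, this normal approximation becomes unreliable.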
📚 Advanced Measurement Techniques
Eye Tracking Technology
Eye tracking reveals where users focus their attention, providing insights into visual design effectiveness and information hierarchy.
Physiological Measurement Methods
The connection: Emotional responses create measurable physical changes that can indicate user reactions to your interface
Measurement types:
- Heart activity: Changes in blood pressure, blood volume, and pulse
- Galvanic Skin Response (GSR): Sweat gland activity indicating stress or excitement
- Electromyogram (EMG): Electrical activity in muscles showing tension
- Electroencephalogram (EEG): Brain electrical activity patterns
Current limitation: Interpreting physiological responses requires more research—we know they correlate with user experience but aren’t always sure exactly how
📚 Data Analysis Approaches
Quantitative Data Analysis
Statistical methods for numerical data:
- Counting and frequency analysis
- t-tests for comparing the means of two groups
- Analysis of variance (ANOVA) for comparing the means of more than two groups
- Time-series analysis for behavior over time
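As a worked illustration of comparing two groups, the sketch below computes Welch's t statistic (a t-test variant that does not assume equal variances) and its approximate degrees of freedom, using only the standard library. The task-completion times are hypothetical, and choosing Welch's variant is an assumption rather than something this chapter specifies.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances
    se_squared = var_a / n_a + var_b / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se_squared ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical task times in seconds for two designs
design_a_times = [30, 32, 28, 35, 31]
design_b_times = [25, 27, 24, 29, 26]
t, df = welch_t(design_a_times, design_b_times)
print(f"t = {t:.2f}, df = {df:.1f}")
```

To turn t and df into a p-value, compare against a t-distribution table or use a statistics package.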
Qualitative Data Analysis
Interpretive methods for rich, descriptive data:
- Thematic analysis of interview transcripts
- Comparative analysis across different user groups
- Pattern identification in observational notes
- Content analysis of open-ended questionnaire responses
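Once a human analyst has read the responses and tagged each with themes, a simple frequency count shows which themes come up most. The sketch below assumes that coding step has already happened; the respondent IDs and theme labels are hypothetical.

```python
from collections import Counter

# Hypothetical analyst-assigned themes for each open-ended response
coded_responses = [
    ("R1", ["navigation", "speed"]),
    ("R2", ["navigation"]),
    ("R3", ["error messages", "navigation"]),
    ("R4", ["speed"]),
]

def theme_frequencies(coded):
    """Count how many respondents mentioned each theme, most frequent first."""
    counts = Counter()
    for _respondent, themes in coded:
        # set() ensures a theme counts once per respondent, even if tagged twice
        counts.update(set(themes))
    return counts.most_common()

for theme, count in theme_frequencies(coded_responses):
    print(theme, count)
```

The counting is the easy part. The interpretive work of deciding what the themes are and which responses belong to them stays with the analyst.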
📚 Evaluation Method Decision Framework
Method Selection Overview
Expert Analysis → Use when:
- Early in development process
- Budget constraints limit user testing
- Need quick feedback on usability principles
- Have usability expertise available
Laboratory Studies → Use when:
- Need controlled environment for precise measurements
- Comparing specific design alternatives
- System context would be dangerous for field testing
- Have budget for specialized equipment and facilities
Field Studies → Use when:
- Natural context is crucial for realistic user behavior
- System involves collaboration or complex workflows
- Want to understand real-world usage patterns
- Can manage environmental challenges
The key insight: Match your evaluation method to your specific research questions, development stage, and available resources. Each method reveals different aspects of user experience—often you’ll need multiple approaches for comprehensive understanding.
🔄 Connections and Review
HCI evaluation is fundamentally about reducing uncertainty in your design decisions through systematic evidence gathering. Whether you choose formative evaluation during design or summative evaluation afterward, expert analysis or user participation, laboratory control or field study authenticity, the goal remains constant: create systems people can use effectively and want to use repeatedly. Your choice of evaluation methods should align with your development stage, available resources, and the specific questions you need answered to improve your design.