Chapter 8: Evaluation

Purpose and Importance of Evaluation

Definition

  • Evaluation is the process of assessing the usability of a system or interface to determine how well it supports users in achieving their goals.

Causes

  • Need to validate design decisions and ensure systems meet real user needs rather than imagined ones.

Goals / Objectives

  • Determine how usable a system is for different user groups.
  • Identify good and bad design features to inform future iterations.
  • Compare design alternatives to support decision-making.
  • Observe the effects of specific interfaces on users.
  • Validate that users can use the system and will like it.

Importance

  • Ensures people can actually use the product effectively and enjoyably.
  • Helps designers focus on real problems instead of assumptions.
  • Provides a basis for suggesting concrete improvements.
  • Prevents shipping products with unresolved usability issues.

Benefits

  • Early identification of usability flaws.
  • Data-driven design refinement.
  • Improved user satisfaction and adoption.

Procedures

  • Conducted throughout the HCI design lifecycle.
  • Can involve expert analysis, user testing, observation, or questionnaires.
  • Applied to both early (low-fidelity) and final (high-fidelity) designs.

Advantages & Disadvantages

  • Not specified in notes as a direct comparison for evaluation in general.

Impact / Effect

  • Leads to more user-centered, effective, and satisfying systems.
  • Reduces post-release support costs and user frustration.

Examples

  • Testing a prototype with users to see if they can complete a registration task.
  • Comparing two menu layouts to see which is faster to use.
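
Both examples above come down to simple measures: a completion rate for the task and a mean completion time per design. A minimal sketch in Python, assuming made-up session records (the layouts, field order, and numbers are illustrative, not data from any study):

```python
from statistics import mean

# Hypothetical session records: (layout, task completed?, time in seconds).
sessions = [
    ("A", True, 41.0),
    ("A", True, 38.5),
    ("A", False, 90.0),
    ("B", True, 30.0),
    ("B", True, 33.5),
    ("B", True, 29.0),
]

def summarise(records, layout):
    """Return (completion rate, mean time of successful attempts) for one layout."""
    rows = [r for r in records if r[0] == layout]
    successes = [r for r in rows if r[1]]
    rate = len(successes) / len(rows)
    avg_time = mean(r[2] for r in successes) if successes else None
    return rate, avg_time

for layout in ("A", "B"):
    rate, avg_time = summarise(sessions, layout)
    time_text = f"{avg_time:.1f} s" if avg_time is not None else "n/a"
    print(f"Layout {layout}: {rate:.0%} completed, mean time {time_text}")
```

With only a handful of sessions, a difference like this is suggestive rather than conclusive.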

Types of Evaluation: Formative vs. Summative

Definition

  • Formative evaluation: Conducted during the design and development phase to improve the product iteratively.
  • Summative evaluation: Conducted on a finished product to assess its overall usability and performance.

Causes

  • Different stages of the design lifecycle require different evaluation goals—improvement vs. validation.

Goals / Objectives

  • Formative: Inform and guide ongoing design decisions.
  • Summative: Measure final usability against benchmarks or requirements.

Importance

  • Formative evaluation prevents costly late-stage redesigns.
  • Summative evaluation provides evidence of product readiness or success.

Benefits

  • Formative: Enables early problem detection and user involvement.
  • Summative: Supports go/no-go decisions and competitive benchmarking.

Procedures

  • Formative: Involves designers and/or users during prototyping; uses methods like think-aloud, expert reviews, or quick usability tests.
  • Summative: Typically involves real users testing a complete system under controlled or real-world conditions.

Advantages & Disadvantages

  • Not explicitly contrasted in notes, but implied by context.

Impact / Effect

  • Formative evaluation shapes better designs; summative evaluation validates them.

Examples

  • Formative: Testing a paper prototype with 5 users to refine navigation.
  • Summative: Running a final usability test with 20 users before product launch.

Evaluation Methods: Expert Analysis vs. User Participation

Definition

  • Expert analysis: Usability review performed by HCI experts without involving real users.
  • User participation: Evaluation that involves real users interacting with the system.

Causes

  • Resource constraints, project stage, or evaluation goals determine method choice.

Goals / Objectives

  • Expert analysis: Identify likely usability problems quickly and cheaply.
  • User participation: Observe actual user behavior and gather authentic feedback.

Importance

  • Expert analysis is efficient for early-stage screening.
  • User participation reveals real-world usage patterns and emotional responses.

Benefits

  • Expert analysis: Fast, inexpensive, no lab or users needed.
  • User participation: Provides empirical, behavioral, and attitudinal data.

Procedures

  • Expert analysis: Experts inspect designs using methods like heuristic evaluation or cognitive walkthroughs.
  • User participation: Users perform tasks while being observed, interviewed, or surveyed.

Advantages & Disadvantages

  • Expert analysis:
    • Advantages: Low cost, applicable early, no user recruitment needed.
    • Disadvantages: Does not assess actual use; limited to expert judgment.
  • User participation:
    • Advantages: Captures real behavior and subjective experience.
    • Disadvantages: More time-consuming, expensive, and logistically complex.

Impact / Effect

  • Expert analysis catches common flaws early; user testing uncovers unexpected issues.

Examples

  • Expert analysis: A UX designer reviewing a wireframe against Nielsen’s heuristics.
  • User participation: Observing users struggle to find the “checkout” button on an e-commerce site.
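
One way to record the output of an expert review is as a list of findings, each tied to the heuristic it violates and given a severity rating. The sketch below assumes a 0 to 4 severity scale, as is commonly used with Nielsen's heuristics; the evaluators and findings are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical findings: (evaluator, heuristic, problem, severity 0-4).
findings = [
    ("E1", "Visibility of system status", "No progress indicator on upload", 3),
    ("E2", "Visibility of system status", "No progress indicator on upload", 2),
    ("E3", "Error prevention", "Delete has no confirmation", 4),
    ("E1", "Error prevention", "Delete has no confirmation", 4),
    ("E2", "Consistency and standards", "Two different icons for save", 1),
]

def merge_findings(rows):
    """Group duplicate problems and average their severity across evaluators."""
    grouped = defaultdict(list)
    for _evaluator, heuristic, problem, severity in rows:
        grouped[(heuristic, problem)].append(severity)
    merged = [
        (heuristic, problem, mean(scores), len(scores))
        for (heuristic, problem), scores in grouped.items()
    ]
    # Most severe problems first.
    return sorted(merged, key=lambda item: item[2], reverse=True)

for heuristic, problem, severity, count in merge_findings(findings):
    print(f"{severity:.1f}  {problem} [{heuristic}] (reported by {count})")
```

Averaging severities is only one possible way to combine expert judgments; a team might instead discuss each problem and agree a rating.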

Laboratory Studies

Definition

  • Controlled usability tests conducted in a specialized lab environment, separate from the user’s normal context.

Causes

  • Need for precise control over variables, use of specialized equipment, or isolation from distractions.

Goals / Objectives

  • Observe user interaction under consistent, repeatable conditions.
  • Compare alternative designs objectively.

Importance

  • Enables detailed data collection (e.g., multi-camera video, eye tracking) that is hard to arrange in the field.

Benefits

  • Access to specialist equipment (e.g., cameras, eye trackers).
  • Uninterrupted observation and recording.
  • Controlled comparison of design variants.

Procedures

  • Recruit voluntary participants.
  • Set up lab with video monitoring (e.g., one camera on screen, one on user).
  • Observers watch from behind a one-way mirror.
  • Follow structured guidelines before, during, and after testing (see “Guidelines” subtopic).

Advantages & Disadvantages

  • Advantages:
    • Specialist equipment available
    • Uninterrupted environment
  • Disadvantages:
    • Lacks real-world context
    • Difficult to observe group or collaborative tasks

Impact / Effect

  • May produce artificial behavior because the setting is unnatural and participants know they are being observed (Hawthorne Effect).
  • Best for single-user, constrained tasks.

Examples

  • Testing a medical device interface in a simulated clinical lab.
  • Comparing two login flows using eye-tracking in a usability lab.

Field Studies

Definition

  • Evaluation conducted in the user’s natural environment (e.g., office, home, public space).

Causes

  • When context of use is critical to understanding behavior (e.g., noise, interruptions, tools).

Goals / Objectives

  • Observe how users interact with a system in real-life conditions.
  • Capture authentic workflows and environmental influences.

Importance

  • Reveals how context shapes use—something labs cannot replicate.

Benefits

  • Retains natural work context.
  • Uncovers real-world challenges (e.g., distractions, multitasking).

Procedures

  • Observe users performing tasks in their usual setting.
  • Use techniques like observation, think-aloud, or interviews.
  • Record data via notes, audio, or video (with consent).

Advantages & Disadvantages

  • Advantages:
    • Natural environment
    • Context retained
  • Disadvantages:
    • Distractions and noise
    • Harder to control variables

Impact / Effect

  • Provides ecologically valid insights but may be harder to analyze due to variability.

Examples

  • Watching nurses use a tablet during patient rounds in a hospital.
  • Observing retail staff using a point-of-sale system during a busy shift.

Think-Aloud and Observation

Definition

  • Observation: Watching users perform tasks without interference.
  • Think-aloud: Asking users to verbalize their thoughts, actions, and reasoning while performing tasks.

Causes

  • Need to understand user cognition and decision-making during interaction.

Goals / Objectives

  • Reveal how users interpret interface elements.
  • Identify confusion, frustration, or workarounds in real time.

Importance

  • Uncovers the “why” behind user behavior, not just the “what.”

Benefits

  • Simple to implement.
  • Provides rich qualitative insights.
  • Shows actual system use patterns.

Procedures

  • Ask users to speak continuously while using the system (“Say what you’re doing, thinking, and feeling”).
  • Avoid leading questions or interruptions.
  • Record sessions for later analysis.

Advantages & Disadvantages

  • Advantages:
    • Simple and low-tech
    • Reveals real usage and thought processes
    • Can show which features are used/forgotten
  • Disadvantages:
    • Subjective and selective (users may filter thoughts)
    • Describing actions aloud may alter natural performance (a reactivity effect, similar to the Hawthorne Effect of being observed)

Impact / Effect

  • Helps identify confusing interfaces, error reactions, and task bottlenecks.

Examples

  • User says, “I don’t know what this icon means—maybe it’s settings?” while hovering over a gear icon.
  • User takes 2 minutes to find the “save” button, muttering, “It should be at the top…”

Query Techniques: Interviews and Questionnaires

Definition

  • Interviews: One-on-one conversations to explore user experiences in depth.
  • Questionnaires: Structured sets of written questions given to users to collect standardized feedback.

Causes

  • Need to gather subjective opinions, attitudes, or retrospective feedback.

Goals / Objectives

  • Elicit user views, preferences, and unanticipated problems.
  • Collect scalable, quantifiable satisfaction data.

Importance

  • Complements behavioral data with user perceptions and emotions.

Benefits

  • Interviews: Flexible, exploratory, rich in detail.
  • Questionnaires: Efficient, systematic, suitable for large samples.

Procedures

  • Interviews: Conduct face-to-face or by phone; adapt questions based on responses.
  • Questionnaires: Design questions carefully, pre-test, distribute, and analyze responses.

Advantages & Disadvantages

  • Interviews:
    • Advantages: Adaptable, in-depth, reveals hidden issues
    • Disadvantages: Time-consuming, subjective, users may withhold criticism
  • Questionnaires:
    • Advantages: Quick, reaches many users, systematic analysis
    • Disadvantages: Less flexible, hard to probe, risk of low response rates

Impact / Effect

  • Interviews uncover the “why” behind behaviors; questionnaires quantify “how many” users feel a certain way.

Examples

  • Interview: Asking a teacher, “What frustrated you most about the grading interface?”
  • Questionnaire: Using the System Usability Scale (SUS) to score perceived usability.
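
The SUS has ten items answered on a 1 to 5 scale, and its standard scoring rule is short enough to show in full. In the sketch below, odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name and the sample answers are illustrative assumptions.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses, each from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5

# Hypothetical answers from one participant (items 1 to 10).
print(sus_score([4, 2, 5, 1, 4, 2, 4, 1, 5, 2]))  # prints 85.0
```

The result is a score on a 0 to 100 scale, not a percentage; scores from several participants are usually averaged.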

Questionnaire Design and Formats

Definition

  • The process of creating structured survey instruments to measure user attitudes, perceptions, or experiences.

Causes

  • Need for standardized, scalable user feedback across many participants.

Goals / Objectives

  • Gather reliable, comparable data on usability, satisfaction, or preferences.

Importance

  • Poorly designed questionnaires yield misleading or unusable data.

Benefits

  • Enables statistical analysis and benchmarking.
  • Supports summative evaluation and product comparisons.

Procedures

  • Choose question format (closed, open, Likert scale, semantic differential).
  • Follow design principles: clarity, neutrality, relevance, brevity.
  • Pre-test the questionnaire.
  • Limit length (e.g., max 2 pages of A4).

Advantages & Disadvantages

  • Closed questions:
    • Advantages: Easy to answer and analyze
    • Disadvantages: May suggest ideas users wouldn’t have considered
  • Open questions:
    • Advantages: Rich, unanticipated insights
    • Disadvantages: Hard to analyze systematically

Impact / Effect

  • Well-designed questionnaires provide actionable, quantifiable insights; poor ones waste time and mislead.

Examples

  • Likert scale: “I found the system easy to use.” (Strongly Disagree → Strongly Agree)
  • Semantic differential: Rate the system as “Slow ___ ___ ___ Fast”
  • Notable questionnaires: SUS (1986), QUIS (1988), CSUQ (1995), PUTQ (1997)
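
Responses to a Likert item such as the one above can be summarised with a frequency distribution and a median. This is a minimal sketch assuming a 5-point scale coded 1 to 5; the labels and responses are hypothetical.

```python
from collections import Counter
from statistics import median

LABELS = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
}

# Hypothetical responses to "I found the system easy to use."
responses = [4, 5, 3, 4, 2, 4, 5, 4, 3, 4]

def summarise_likert(values):
    """Return counts per scale point, the median, and the share who agree (4 or 5)."""
    counts = Counter(values)
    distribution = {point: counts.get(point, 0) for point in LABELS}
    agree_share = sum(1 for v in values if v >= 4) / len(values)
    return distribution, median(values), agree_share

distribution, mid_point, agree_share = summarise_likert(responses)
for point, label in LABELS.items():
    print(f"{label:<18} {distribution[point]}")
print(f"Median: {mid_point}, agreeing: {agree_share:.0%}")
```

Because Likert responses are ordinal, the distribution and median are often reported alongside, or instead of, the mean.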

Experimental Techniques

Definition

  • Controlled studies that test hypotheses by manipulating one or more interface variables and measuring user responses.

Causes

  • Need to establish causal relationships between design choices and user performance.

Goals / Objectives

  • Test specific hypotheses (e.g., “Larger buttons reduce error rates”).
  • Isolate the effect of a single design variable.

Importance

  • Provides scientific evidence for design decisions.

Benefits

  • Objective, replicable results.
  • Enables A/B testing and optimization.

Procedures

  • Define hypothesis and variables.
  • Choose experimental design:
    • Within-groups: Each participant uses all conditions (e.g., Design A and B).
    • Between-groups: Each participant uses only one condition.
  • Measure behavioral outcomes (e.g., task time, errors).
  • Use statistical analysis (e.g., t-tests) to interpret results.
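
The choice of experimental design determines which t-test applies: a paired test when the same participants use every condition, an independent-samples test when each participant uses only one. The sketch below computes both t statistics with the standard library only. The task times are invented, and the independent-samples version uses Welch's form, which is an assumption here rather than something the notes prescribe.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Within-groups: t statistic and degrees of freedom for paired measurements."""
    if len(a) != len(b):
        raise ValueError("paired data must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

def welch_t(a, b):
    """Between-groups: Welch's t statistic for two independent samples."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / sqrt(var_a + var_b)

# Hypothetical task times in seconds for Design A and Design B.
times_a = [12.0, 15.0, 11.0, 14.0, 13.0]
times_b = [10.0, 12.0, 10.0, 11.0, 12.0]

t_paired, df = paired_t(times_a, times_b)
print(f"paired t = {t_paired:.3f} (df = {df})")
print(f"Welch t  = {welch_t(times_a, times_b):.3f}")
```

Turning a t statistic into a p-value needs the t distribution; if SciPy is available, `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` do both steps.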

Advantages & Disadvantages

  • Within-groups:
    • Advantages: Fewer participants, controls for user variation
    • Disadvantages: Risk of learning/transfer effects
  • Between-groups:
    • Advantages: No learning bias
    • Disadvantages: Needs more users; user differences may skew results
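
A common way to limit learning effects in a within-groups design is counterbalancing: half the participants start with one condition and half with the other. The notes do not name this technique, so the sketch below is an illustrative assumption, with hypothetical participant IDs.

```python
def assign_orders(participants, conditions=("A", "B")):
    """Alternate the order of two conditions across participants."""
    first, second = conditions
    orders = {}
    for index, person in enumerate(participants):
        # Even positions get A then B; odd positions get B then A.
        orders[person] = (first, second) if index % 2 == 0 else (second, first)
    return orders

for person, order in assign_orders(["P1", "P2", "P3", "P4"]).items():
    print(person, "->", " then ".join(order))
```

Counterbalancing spreads any learning effect evenly across conditions; it does not remove the effect.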

Impact / Effect

  • Supports data-driven design by showing, with statistical backing, whether a design change actually caused the observed difference in performance.

Examples

  • A/B testing: Show Design A to even-numbered website visitors, Design B to odd-numbered; compare click-through rates.
  • Testing two font sizes to see which lets users read faster without losing comprehension.
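
The A/B example can be sketched as an assignment rule plus a comparison of click-through rates. The code below assumes visitors carry a sequential number and uses a two-proportion z-test, which is one reasonable choice rather than the only one. All counts are hypothetical.

```python
from math import erf, sqrt

def assign_design(visitor_number):
    """Even-numbered visitors see Design A, odd-numbered visitors see Design B."""
    return "A" if visitor_number % 2 == 0 else "B"

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z statistic and two-sided p-value for a difference in click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 1000 visitors per design.
z, p = two_proportion_z(clicks_a=120, n_a=1000, clicks_b=150, n_b=1000)
print(f"CTR A = {120 / 1000:.1%}, CTR B = {150 / 1000:.1%}")
print(f"z = {z:.2f}, p = {p:.3f}")
```

Alternating by visitor number is simple but not random; randomised assignment is generally safer if arrival order could correlate with behavior.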

Physiological Methods

Definition

  • Evaluation techniques that measure users’ physical responses (e.g., eye movements, heart rate) to infer cognitive or emotional states.

Causes

  • Need to capture implicit or unconscious user reactions not expressed verbally.

Goals / Objectives

  • Detect emotional arousal, cognitive load, or attention patterns during interaction.

Importance

  • Reveals responses that users may not be aware of or able to articulate.

Benefits

  • Objective, real-time data on user engagement and stress.
  • Complements self-reported data.

Procedures

  • Use specialized equipment to record:
    • Eye tracking: Where users look and for how long.
    • Galvanic Skin Response (GSR): Sweat gland activity indicating arousal.
    • Electromyogram (EMG): Muscle activity (e.g., facial expressions).
    • Electroencephalogram (EEG): Brain activity.
    • Heart rate / blood pressure: Stress or engagement levels.

Advantages & Disadvantages

  • Advantages:
    • Captures unconscious responses
    • Continuous, objective data stream
  • Disadvantages:
    • Equipment is expensive and complex
    • Difficult to interpret physiological signals accurately
    • More research is needed before these signals can be applied reliably in HCI

Impact / Effect

  • Emerging area with potential for deeper UX insights, but not yet mainstream in standard usability practice.

Examples

  • Using eye tracking to see if users notice a “Subscribe” button.
  • Measuring GSR spikes when users encounter an error message.
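
Both examples can be approximated with simple calculations once the raw samples are available. The sketch below assumes gaze data arrives as (x, y) screen coordinates at a fixed sampling interval and GSR as a list of readings with a quiet baseline at the start. The coordinates, readings, and the 3-standard-deviation threshold are all illustrative assumptions; real recordings usually need filtering first.

```python
from statistics import mean, stdev

def dwell_time_in_area(gaze_samples, area, sample_interval_ms):
    """Total time (ms) that gaze samples fall inside a rectangular area of interest."""
    left, top, right, bottom = area
    hits = sum(
        1 for x, y in gaze_samples
        if left <= x <= right and top <= y <= bottom
    )
    return hits * sample_interval_ms

def find_spikes(signal, baseline_length, threshold_sd=3.0):
    """Indices after the baseline where the signal exceeds mean + threshold_sd * SD."""
    baseline = signal[:baseline_length]
    limit = mean(baseline) + threshold_sd * stdev(baseline)
    return [
        i for i, value in enumerate(signal)
        if i >= baseline_length and value > limit
    ]

# Hypothetical "Subscribe" button bounds (left, top, right, bottom) in pixels.
subscribe_button = (400, 300, 500, 340)
gaze = [(100, 100), (410, 310), (420, 305), (800, 600)]
print(dwell_time_in_area(gaze, subscribe_button, sample_interval_ms=20), "ms on button")

# Hypothetical GSR readings; the first six samples form the baseline.
gsr = [2.0, 2.1, 1.9, 2.0, 2.1, 1.9, 2.0, 3.5, 3.8, 2.1]
print("Spike at samples:", find_spikes(gsr, baseline_length=6))
```

A dwell time of zero suggests the button was not looked at. A spike shows only that arousal rose; it does not show why.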

Guidelines for Conducting Laboratory Usability Testing

Definition

  • Best practices for planning, executing, and concluding controlled usability tests in a lab setting.

Causes

  • Need to ensure ethical, reliable, and valid testing sessions.

Goals / Objectives

  • Create a comfortable, unbiased environment that yields honest user behavior.
  • Protect participant rights and data confidentiality.

Importance

  • Poorly run tests produce unreliable data and harm user trust.

Benefits

  • Increases data quality and participant cooperation.
  • Reduces observer bias and the Hawthorne Effect.

Procedures

  • Before:
    • Define objectives and tasks.
    • Pre-test all materials.
    • Explain the purpose: evaluating the system, not the user.
    • Obtain informed consent and assure confidentiality.
    • Explain monitoring equipment.
  • During:
    • Keep atmosphere relaxed.
    • Be unobtrusive.
    • Give tasks one at a time.
    • Never show displeasure or guide users.
    • Allow users to stop anytime.
  • After:
    • Answer user questions.
    • Analyze and report findings in relation to original objectives.

Advantages & Disadvantages

  • Not explicitly contrasted in notes.

Impact / Effect

  • Ethical, well-run sessions yield honest, useful data and positive participant experiences.

Examples

  • Telling a user: “We’re testing the software, not you—any problems are the system’s fault.”
  • Stopping a session when a user becomes visibly frustrated.