Chapter 8: Evaluation
Purpose and Importance of Evaluation
Definition
- Evaluation is the process of assessing the usability of a system or interface to determine how well it supports users in achieving their goals.
Causes
- Need to validate design decisions and ensure systems meet real user needs rather than imagined ones.
Goals / Objectives
- Determine how usable a system is for different user groups.
- Identify good and bad design features to inform future iterations.
- Compare design alternatives to support decision-making.
- Observe the effects of specific interfaces on users.
- Validate that users can use the system and will like it.
Importance
- Ensures people can actually use the product effectively and enjoyably.
- Helps designers focus on real problems instead of assumptions.
- Provides a basis for suggesting concrete improvements.
- Prevents shipping products with unresolved usability issues.
Benefits
- Early identification of usability flaws.
- Data-driven design refinement.
- Improved user satisfaction and adoption.
Procedures
- Conducted throughout the HCI design lifecycle.
- Can involve expert analysis, user testing, observation, or questionnaires.
- Applied to both early (low-fidelity) and final (high-fidelity) designs.
Advantages & Disadvantages
- Not specified in notes as a direct comparison for evaluation in general.
Impact / Effect
- Leads to more user-centered, effective, and satisfying systems.
- Reduces post-release support costs and user frustration.
Examples
- Testing a prototype with users to see if they can complete a registration task.
- Comparing two menu layouts to see which is faster to use.
Formative vs. Summative Evaluation
Definition
- Formative evaluation: Conducted during the design and development phase to improve the product iteratively.
- Summative evaluation: Conducted on a finished product to assess its overall usability and performance.
Causes
- Different stages of the design lifecycle call for different evaluation goals: improvement during development, validation once the product is finished.
Goals / Objectives
- Formative: Inform and guide ongoing design decisions.
- Summative: Measure final usability against benchmarks or requirements.
Importance
- Formative evaluation prevents costly late-stage redesigns.
- Summative evaluation provides evidence of product readiness or success.
Benefits
- Formative: Enables early problem detection and user involvement.
- Summative: Supports go/no-go decisions and competitive benchmarking.
Procedures
- Formative: Involves designers and/or users during prototyping; uses methods like think-aloud, expert reviews, or quick usability tests.
- Summative: Typically involves real users testing a complete system under controlled or real-world conditions.
Advantages & Disadvantages
- Not explicitly contrasted in notes, but implied by context.
Impact / Effect
- Formative evaluation shapes better designs; summative evaluation validates them.
Examples
- Formative: Testing a paper prototype with 5 users to refine navigation.
- Summative: Running a final usability test with 20 users before product launch.
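As a rough illustration of measuring final usability against benchmarks, the sketch below summarizes a summative test and checks it against two targets. The session data, the threshold values, and the names `summarize` and `meets_benchmarks` are all hypothetical; real targets would come from the project's own usability requirements.

```python
from statistics import median

# Hypothetical summative results: (completed_task, seconds_taken) per participant.
sessions = [
    (True, 48.0), (True, 55.5), (False, 120.0), (True, 61.0), (True, 39.5),
    (True, 72.0), (True, 44.0), (False, 120.0), (True, 58.0), (True, 50.0),
]

# Hypothetical requirements, chosen only for this example.
MIN_COMPLETION_RATE = 0.75
MAX_MEDIAN_SECONDS = 60.0


def summarize(sessions):
    """Return (completion rate, median time of successful attempts)."""
    successes = [seconds for done, seconds in sessions if done]
    completion_rate = len(successes) / len(sessions)
    median_seconds = median(successes) if successes else float("inf")
    return completion_rate, median_seconds


def meets_benchmarks(sessions):
    """True when both hypothetical targets are satisfied."""
    rate, med = summarize(sessions)
    return rate >= MIN_COMPLETION_RATE and med <= MAX_MEDIAN_SECONDS


rate, med = summarize(sessions)
print(f"completion rate: {rate:.0%}, median time: {med:.2f}s")
print("meets benchmarks:", meets_benchmarks(sessions))
```

A check like this only supports a go/no-go decision; it does not explain why a target was missed, which is where formative methods remain useful.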
Evaluation Methods: Expert Analysis vs. User Participation
Definition
- Expert analysis: Usability review performed by HCI experts without involving real users.
- User participation: Evaluation that involves real users interacting with the system.
Causes
- Resource constraints, project stage, or evaluation goals determine method choice.
Goals / Objectives
- Expert analysis: Identify likely usability problems quickly and cheaply.
- User participation: Observe actual user behavior and gather authentic feedback.
Importance
- Expert analysis is efficient for early-stage screening.
- User participation reveals real-world usage patterns and emotional responses.
Benefits
- Expert analysis: Fast, inexpensive, no lab or users needed.
- User participation: Provides empirical, behavioral, and attitudinal data.
Procedures
- Expert analysis: Experts inspect designs using methods like heuristic evaluation or cognitive walkthroughs.
- User participation: Users perform tasks while being observed, interviewed, or surveyed.
Advantages & Disadvantages
- Expert analysis:
  - Advantages: Low cost, applicable early, no user recruitment needed.
  - Disadvantages: Does not assess actual use; limited to expert judgment.
- User participation:
  - Advantages: Captures real behavior and subjective experience.
  - Disadvantages: More time-consuming, expensive, and logistically complex.
Impact / Effect
- Expert analysis catches common flaws early; user testing uncovers unexpected issues.
Examples
- Expert analysis: A UX designer reviewing a wireframe against Nielsen’s heuristics.
- User participation: Observing users struggle to find the “checkout” button on an e-commerce site.
Laboratory Studies
Definition
- Controlled usability tests conducted in a specialized lab environment, separate from the user’s normal context.
Causes
- Need for precise control over variables, use of specialized equipment, or isolation from distractions.
Goals / Objectives
- Observe user interaction under consistent, repeatable conditions.
- Compare alternative designs objectively.
Importance
- Enables detailed data collection (e.g., video, eye tracking) not feasible in the field.
Benefits
- Access to specialist equipment (e.g., cameras, eye trackers).
- Uninterrupted observation and recording.
- Controlled comparison of design variants.
Procedures
- Recruit volunteer participants.
- Set up the lab with video monitoring (e.g., one camera on the screen, one on the user).
- Observers watch from behind a one-way mirror.
- Follow structured guidelines before, during, and after testing (see “Guidelines” subtopic).
Advantages & Disadvantages
- Advantages:
  - Specialist equipment available
  - Uninterrupted environment
- Disadvantages:
  - Lacks real-world context
  - Difficult to observe group or collaborative tasks
Impact / Effect
- May produce artificial behavior due to unnatural setting (Hawthorne Effect).
- Best for single-user, constrained tasks.
Examples
- Testing a medical device interface in a simulated clinical lab.
- Comparing two login flows using eye-tracking in a usability lab.
Field Studies
Definition
- Evaluation conducted in the user’s natural environment (e.g., office, home, public space).
Causes
- When context of use is critical to understanding behavior (e.g., noise, interruptions, tools).
Goals / Objectives
- Observe how users interact with a system in real-life conditions.
- Capture authentic workflows and environmental influences.
Importance
- Reveals how context shapes use, something a lab cannot replicate.
Benefits
- Retains natural work context.
- Uncovers real-world challenges (e.g., distractions, multitasking).
Procedures
- Observe users performing tasks in their usual setting.
- Use techniques like observation, think-aloud, or interviews.
- Record data via notes, audio, or video (with consent).
Advantages & Disadvantages
- Advantages:
  - Natural environment
  - Context retained
- Disadvantages:
  - Distractions and noise
  - Harder to control variables
Impact / Effect
- Provides ecologically valid insights but may be harder to analyze due to variability.
Examples
- Watching nurses use a tablet during patient rounds in a hospital.
- Observing retail staff using a point-of-sale system during a busy shift.
Think-Aloud and Observation
Definition
- Observation: Watching users perform tasks without interference.
- Think-aloud: Asking users to verbalize their thoughts, actions, and reasoning while performing tasks.
Causes
- Need to understand user cognition and decision-making during interaction.
Goals / Objectives
- Reveal how users interpret interface elements.
- Identify confusion, frustration, or workarounds in real time.
Importance
- Uncovers the “why” behind user behavior, not just the “what.”
Benefits
- Simple to implement.
- Provides rich qualitative insights.
- Shows actual system use patterns.
Procedures
- Ask users to speak continuously while using the system (“Say what you’re doing, thinking, and feeling”).
- Avoid leading questions or interruptions.
- Record sessions for later analysis.
Advantages & Disadvantages
- Advantages:
  - Simple and low-tech
  - Reveals real usage and thought processes
  - Can show which features are used or forgotten
- Disadvantages:
  - Subjective and selective (users may filter their thoughts)
  - Describing actions aloud may alter natural performance (similar to the Hawthorne Effect)
Impact / Effect
- Helps identify confusing interfaces, error reactions, and task bottlenecks.
Examples
- User says, “I don’t know what this icon means—maybe it’s settings?” while hovering over a gear icon.
- User takes 2 minutes to find the “save” button, muttering, “It should be at the top…”
Query Techniques: Interviews and Questionnaires
Definition
- Interviews: One-on-one conversations to explore user experiences in depth.
- Questionnaires: Structured sets of written questions given to users to collect standardized feedback.
Causes
- Need to gather subjective opinions, attitudes, or retrospective feedback.
Goals / Objectives
- Elicit user views, preferences, and unanticipated problems.
- Collect scalable, quantifiable satisfaction data.
Importance
- Complements behavioral data with user perceptions and emotions.
Benefits
- Interviews: Flexible, exploratory, rich in detail.
- Questionnaires: Efficient, systematic, suitable for large samples.
Procedures
- Interviews: Conduct face-to-face or by phone; adapt questions based on responses.
- Questionnaires: Design questions carefully, pre-test, distribute, and analyze responses.
Advantages & Disadvantages
- Interviews:
  - Advantages: Adaptable, in-depth, reveals hidden issues
  - Disadvantages: Time-consuming, subjective, users may withhold criticism
- Questionnaires:
  - Advantages: Quick, reaches many users, systematic analysis
  - Disadvantages: Less flexible, hard to probe, risk of low response rates
Impact / Effect
- Interviews uncover “why” behind behaviors; questionnaires quantify “how many” feel a certain way.
Examples
- Interview: Asking a teacher, “What frustrated you most about the grading interface?”
- Questionnaire: Using the System Usability Scale (SUS) to score overall satisfaction.
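To make the SUS example concrete, here is a minimal sketch of the scoring procedure as it is commonly described: ten items answered on a 1 to 5 scale, odd-numbered items contributing (response - 1), even-numbered items contributing (5 - response), and the sum multiplied by 2.5 to give a 0 to 100 score. The function name `sus_score` and the sample responses are illustrative, not taken from the notes.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten 1-5 responses."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5


# Hypothetical answers from one participant, items 1 through 10.
participant = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(participant))
```

Scores from several participants are usually averaged; a single participant's score says little on its own.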
Questionnaire Design
Definition
- The process of creating structured survey instruments to measure user attitudes, perceptions, or experiences.
Causes
- Need for standardized, scalable user feedback across many participants.
Goals / Objectives
- Gather reliable, comparable data on usability, satisfaction, or preferences.
Importance
- Poorly designed questionnaires yield misleading or unusable data.
Benefits
- Enables statistical analysis and benchmarking.
- Supports summative evaluation and product comparisons.
Procedures
- Choose question format (closed, open, Likert scale, semantic differential).
- Follow design principles: clarity, neutrality, relevance, brevity.
- Pre-test the questionnaire.
- Limit length (e.g., max 2 pages of A4).
Advantages & Disadvantages
- Closed questions:
  - Advantages: Easy to answer and analyze
  - Disadvantages: May suggest ideas users wouldn’t have considered
- Open questions:
  - Advantages: Rich, unanticipated insights
  - Disadvantages: Hard to analyze systematically
Impact / Effect
- Well-designed questionnaires provide actionable, quantifiable insights; poor ones waste time and mislead.
Examples
- Likert scale: “I found the system easy to use.” (Strongly Disagree → Strongly Agree)
- Semantic differential: Rate the system as “Slow ___ ___ ___ Fast”
- Notable questionnaires: SUS (1986), QUIS (1988), CSUQ (1995), PUTQ (1997)
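Closed Likert items are easy to analyze because the answers reduce to counts. The sketch below tallies responses to a single item such as "I found the system easy to use." It assumes a five-point scale; the three middle labels, the sample responses, and the name `summarize_likert` are assumptions made for illustration.

```python
from collections import Counter
from statistics import median

# Assumed five-point labels; the notes only name the two endpoints.
LABELS = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly Agree",
}

# Hypothetical responses from ten participants to one item.
responses = [4, 5, 3, 4, 2, 4, 5, 4, 3, 4]


def summarize_likert(responses):
    """Return (label -> count, median response, share answering 4 or 5)."""
    counts = Counter(responses)
    distribution = {LABELS[value]: counts.get(value, 0) for value in LABELS}
    agree_share = (counts[4] + counts[5]) / len(responses)
    return distribution, median(responses), agree_share


distribution, med, agree_share = summarize_likert(responses)
for label, count in distribution.items():
    print(f"{label:<18} {count}")
print("median:", med, "| agree or strongly agree:", f"{agree_share:.0%}")
```

The median and the response distribution are reported here rather than a mean, since Likert responses are ordinal; whether a mean is acceptable is a judgment call for the analyst.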
Experimental Techniques
Definition
- Controlled studies that test hypotheses by manipulating one or more interface variables and measuring user responses.
Causes
- Need to establish causal relationships between design choices and user performance.
Goals / Objectives
- Test specific hypotheses (e.g., “Larger buttons reduce error rates”).
- Isolate the effect of a single design variable.
Importance
- Provides scientific evidence for design decisions.
Benefits
- Objective, replicable results.
- Enables A/B testing and optimization.
Procedures
- Define hypothesis and variables.
- Choose experimental design:
  - Within-groups: Each participant uses all conditions (e.g., Design A and B).
  - Between-groups: Each participant uses only one condition.
- Measure behavioral outcomes (e.g., task time, errors).
- Use statistical analysis (e.g., t-tests) to interpret results.
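The choice of experimental design also determines which t-test applies. As a sketch under stated assumptions, the code below computes the t statistic two ways using only the standard library: an independent-samples form (Welch's variant, chosen here because it does not assume equal variances) for between-groups data, and a paired form for within-groups data. The task times and function names are made up, and converting t into a p-value would additionally require the t distribution, for example from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev, variance


def welch_t(a, b):
    """Between-groups: t statistic for two independent samples (Welch's form)."""
    se_squared = variance(a) / len(a) + variance(b) / len(b)
    return (mean(a) - mean(b)) / sqrt(se_squared)


def paired_t(a, b):
    """Within-groups: t statistic on per-participant differences."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical task times in seconds for Design A and Design B.
times_a = [30, 32, 28, 35, 31]
times_b = [27, 30, 27, 31, 28]

# If the rows came from different people, treat the samples as independent.
print("between-groups t:", round(welch_t(times_a, times_b), 3))
# If row i is the same participant under both designs, use the paired form.
print("within-groups t: ", round(paired_t(times_a, times_b), 3))
```

With these invented numbers the paired statistic comes out larger than the independent one, which illustrates how a within-groups design removes variation between users from the comparison.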
Advantages & Disadvantages
- Within-groups:
  - Advantages: Fewer participants, controls for user variation
  - Disadvantages: Risk of learning/transfer effects
- Between-groups:
  - Advantages: No learning bias
  - Disadvantages: Needs more users; user differences may skew results
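One common way to reduce the learning/transfer effects of a within-groups design is counterbalancing: varying the order in which participants meet the conditions so that no condition always comes first. The notes do not prescribe this technique; the sketch below is one simple way to do it, with hypothetical participant IDs and a hypothetical function name.

```python
from itertools import permutations


def counterbalanced_orders(conditions, participants):
    """Cycle through every ordering of the conditions across participants."""
    orders = list(permutations(conditions))
    return {
        participant: orders[index % len(orders)]
        for index, participant in enumerate(participants)
    }


assignment = counterbalanced_orders(["A", "B"], ["P1", "P2", "P3", "P4"])
for participant, order in assignment.items():
    print(participant, "->", " then ".join(order))
```

Full permutation grows quickly with the number of conditions, so studies with many conditions typically use a reduced scheme instead.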
Impact / Effect
- Supports data-driven design decisions, with statistical analysis indicating how much confidence the results deserve.
Examples
- A/B testing: Show Design A to even-numbered website visitors, Design B to odd-numbered; compare click-through rates.
- Testing two font sizes to see which leads to faster reading comprehension.
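The A/B example can be sketched in two parts: assigning a variant by visitor number, then comparing click-through rates. The comparison below uses a two-proportion z-test with a normal approximation, which is one reasonable choice for large samples rather than the only one. The click counts and function names are invented for illustration.

```python
from math import erf, sqrt


def assign_variant(visitor_number):
    """Even-numbered visitors see Design A, odd-numbered visitors see Design B."""
    return "A" if visitor_number % 2 == 0 else "B"


def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Return (z, two-sided p-value) for the difference in click-through rates."""
    rate_a, rate_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / standard_error
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical results: 120 of 1000 visitors clicked in A, 90 of 1000 in B.
z, p_value = two_proportion_z(120, 1000, 90, 1000)
print(assign_variant(2), assign_variant(7))
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Splitting by visitor number is simple, but random assignment is generally safer, because visitor numbering could correlate with something that also affects clicking.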
Physiological Methods
Definition
- Evaluation techniques that measure users’ physical responses (e.g., eye movements, heart rate) to infer cognitive or emotional states.
Causes
- Need to capture implicit or unconscious user reactions not expressed verbally.
Goals / Objectives
- Detect emotional arousal, cognitive load, or attention patterns during interaction.
Importance
- Reveals user responses that users may not be aware of or able to articulate.
Benefits
- Objective, real-time data on user engagement and stress.
- Complements self-reported data.
Procedures
- Use specialized equipment to record:
  - Eye tracking: Where users look and for how long.
  - Galvanic Skin Response (GSR): Sweat gland activity indicating arousal.
  - Electromyogram (EMG): Muscle activity (e.g., facial expressions).
  - Electroencephalogram (EEG): Brain activity.
  - Heart rate / blood pressure: Stress or engagement levels.
Advantages & Disadvantages
- Advantages:
  - Captures unconscious responses
  - Continuous, objective data stream
- Disadvantages:
  - Equipment is expensive and complex
  - Difficult to interpret physiological signals accurately
  - Requires more research for reliable HCI applications
Impact / Effect
- Emerging area with potential for deeper UX insights, but not yet mainstream in standard usability practice.
Examples
- Using eye tracking to see if users notice a “Subscribe” button.
- Measuring GSR spikes when users encounter an error message.
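For the eye-tracking example, a typical first analysis is dwell time: the total duration of fixations that fall inside an area of interest (AOI) such as the button's bounding box. The sketch below assumes the eye tracker's software has already reduced the raw gaze data to fixations; the coordinates, the 100 ms threshold, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # screen coordinates in pixels
    y: float
    duration_ms: float


@dataclass
class AreaOfInterest:
    left: float
    top: float
    width: float
    height: float

    def contains(self, fixation: Fixation) -> bool:
        """True when the fixation point lies inside this rectangle."""
        return (self.left <= fixation.x <= self.left + self.width
                and self.top <= fixation.y <= self.top + self.height)


def dwell_time_ms(fixations, aoi):
    """Total duration of all fixations that land inside the AOI."""
    return sum(f.duration_ms for f in fixations if aoi.contains(f))


def noticed(fixations, aoi, min_dwell_ms=100.0):
    """Treat the AOI as noticed if dwell time reaches an assumed threshold."""
    return dwell_time_ms(fixations, aoi) >= min_dwell_ms


# Hypothetical "Subscribe" button region and one participant's fixations.
subscribe_button = AreaOfInterest(left=800, top=40, width=120, height=40)
fixations = [
    Fixation(100, 200, 250),
    Fixation(850, 60, 180),
    Fixation(400, 300, 300),
    Fixation(900, 70, 120),
]

print("dwell time (ms):", dwell_time_ms(fixations, subscribe_button))
print("noticed:", noticed(fixations, subscribe_button))
```

Looking at a region does not prove the user understood it, which reflects the interpretation difficulty listed among the disadvantages.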
Guidelines for Conducting Laboratory Usability Testing
Definition
- Best practices for planning, executing, and concluding controlled usability tests in a lab setting.
Causes
- Need to ensure ethical, reliable, and valid testing sessions.
Goals / Objectives
- Create a comfortable, unbiased environment that yields honest user behavior.
- Protect participant rights and data confidentiality.
Importance
- Poorly run tests produce unreliable data and harm user trust.
Benefits
- Increases data quality and participant cooperation.
- Minimizes observer bias and the Hawthorne Effect.
Procedures
- Before:
  - Define objectives and tasks.
  - Pre-test all materials.
  - Explain the purpose: evaluating the system, not the user.
  - Obtain informed consent and assure confidentiality.
  - Explain monitoring equipment.
- During:
  - Keep atmosphere relaxed.
  - Be unobtrusive.
  - Give tasks one at a time.
  - Never show displeasure or guide users.
  - Allow users to stop anytime.
- After:
  - Answer user questions.
  - Analyze and report findings in relation to original objectives.
Advantages & Disadvantages
- Not explicitly contrasted in notes.
Impact / Effect
- Ethical, well-run sessions yield honest, useful data and positive participant experiences.
Examples
- Telling a user: “We’re testing the software, not you—any problems are the system’s fault.”
- Stopping a session when a user becomes visibly frustrated.