Analyzing Hypothesis Tests in Research
Learning Objective
- Analyze a hypothesis test and make conclusions
Why This Matters
Every news headline about a study that "proves" a treatment works, every company blog claiming an A/B test "reached significance," every clinical trial summary your doctor reads -- all of them report hypothesis test conclusions, and most leave out critical context. The skill you're building here -- extracting what was actually tested from how it was reported -- is the difference between consuming research passively and evaluating it.
How to Use This Simulation
- Select a research scenario from the dropdown and read the report excerpt carefully.
- Work through the five analysis steps in order -- each step unlocks after you answer the previous one correctly.
- After completing all steps, expand View on Curve to see the hypothesis test's geometry on the normal distribution.
- Check the Explanation Panel below -- it updates at each step and connects your analysis to the concepts from prior simulations.
Research Report Excerpt
What type of population parameter is this study making a claim about?
Which pair of hypotheses matches the research question in this report?
Based on the alternative hypothesis, what type of test is this?
What significance level (α) does this study use?
Which conclusion is best supported by the evidence reported?
What's Happening
Quick Check
A market researcher collects survey data and finds that customer satisfaction scores are higher for Product A than Product B. She then runs a one-tailed test (Hₐ: μ_A > μ_B) and reports p = 0.04, α = 0.05. She concludes: "Product A produces significantly higher satisfaction." What is the primary methodological concern with this analysis?
Try This
A university dining services report states: "We tested whether the average wait time in our renovated cafeteria is less than the old average of 8 minutes. With a sample of 40 students, we found a mean wait time of 7.2 minutes (z = −2.04, p = 0.0207, α = 0.05). We concluded that the renovation significantly reduced wait times."
Use the five-step analysis framework to verify each component: identify the parameter, state H₀ and Hₐ, confirm the test type, confirm α, and evaluate whether the stated conclusion is appropriate. Do all five components align with each other?
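One way to check that the dining services report is internally consistent is to recompute the p-value from the reported z statistic. The sketch below assumes a standard normal test statistic and a left-tailed test (Hₐ: μ < 8); the helper names `normal_cdf` and `p_value` are illustrative and not part of the simulation.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def p_value(z: float, tail: str) -> float:
    """p-value of a z statistic for a 'left', 'right', or 'two' tailed test."""
    if tail == "left":
        return normal_cdf(z)
    if tail == "right":
        return normal_cdf(-z)
    if tail == "two":
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Values taken from the report excerpt.
z_reported = -2.04
alpha = 0.05

# Assumption: "less than the old average" implies a left-tailed test.
p_left = p_value(z_reported, "left")
reject_h0 = p_left < alpha

print(f"left-tailed p = {p_left:.4f}")   # 0.0207, matching the report
print(f"reject H0 at alpha = {alpha}: {reject_h0}")
```

If the recomputed p-value had matched the two-tailed value instead, that would signal a mismatch between the stated research question and the test actually run.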
A fitness app company publishes a blog post: "Our new AI coaching feature led to significantly more workouts per week among users who opted in (p < 0.05)." The post does not state H₀, Hₐ, the test type, the sample size, or the actual difference in workouts per week.
Tasks: (1) Infer H₀ from context. (2) Identify the likely test type from the directional language "led to more." (3) Identify the parameter being tested. (4) Evaluate whether the conclusion "significantly more" distinguishes between statistical and practical significance. In one sentence, name one piece of additional information that would most strengthen this analysis.
A health insurance company presents to its board: "Our employee wellness program participants had significantly lower annual healthcare costs than non-participants (z = 2.12, p = 0.034, α = 0.05, n = 15,000 per group). We recommend expanding the program company-wide." The average cost difference was $18 per employee per year, on a baseline of approximately $6,400.
Tasks: (1) Verify the formal conclusion is correct (p < α → reject H₀). (2) Calculate the cost difference as a percentage of baseline ($18 / $6,400). (3) The company would spend $200 per employee to expand the program. Write a two-sentence recommendation to the board that names the statistical conclusion AND addresses whether the $18 savings justifies the $200 investment.
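The arithmetic behind tasks (2) and (3) can be laid out as a short sketch. It uses only the figures in the prompt; the variable names are illustrative, and the comparison assumes the $200 is weighed against a single year of savings, which the prompt does not state explicitly.

```python
# Figures from the wellness program prompt.
savings_per_employee = 18.0      # average annual cost difference, in dollars
baseline_cost = 6400.0           # approximate annual cost per employee
program_cost = 200.0             # cost per employee to expand the program

# Task (2): the effect as a percentage of baseline.
savings_pct = savings_per_employee / baseline_cost * 100

# Task (3): net financial effect per employee.
# Assumption: the $200 cost is compared against one year of savings.
net_per_employee = savings_per_employee - program_cost

print(f"savings as % of baseline: {savings_pct:.2f}%")
print(f"net per employee: ${net_per_employee:.0f}")
```

A statistically significant difference of about 0.28% of baseline is the kind of result that large samples (here n = 15,000 per group) can detect even when the practical payoff is small.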
Instructor Notes
Teaching Notes
This simulation works best as a bridge between textbook hypothesis testing and real-world research reading. Students who ace computation problems often stumble when asked to extract H₀ and Hₐ from a paragraph of prose. The stepwise analysis framework gives them a repeatable protocol: parameter, hypotheses, test type, α, conclusion.
Scenario 2 (Tech A/B Test) is the most important for classroom discussion because it introduces the statistical vs practical significance distinction. Ask students: "The test says the new algorithm is significantly better. Would you spend $500,000 to implement a 0.6-minute improvement?" That question lands harder than any formal definition of effect size.
Common Student Errors
- Confusing "fail to reject H₀" with "accept H₀." Scenario 4 is designed to surface this. The researchers concluded "the regulation had no effect" -- an overstatement. Students need to hear that insufficient evidence is not the same as evidence of no effect.
- Choosing the test type based on what the data show rather than what the research question asks. The test direction comes from the research question, formulated before data collection.
- Inferring α = 0.05 by default without checking whether the report specifies a different level. Scenario 1 uses α = 0.01 explicitly.
- Treating statistical significance as a binary proof rather than evidence. "The drug works" is not what rejecting H₀ means. "There is sufficient evidence that the drug lowers blood pressure" is the correct framing.
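The wording distinctions in the errors above can be made concrete with a small sketch of the decision rule. The function name `formal_conclusion` and its message text are hypothetical, written for illustration only; the rule itself (reject H₀ when p < α, otherwise fail to reject) is the standard one.

```python
def formal_conclusion(p_value: float, alpha: float) -> str:
    """Return carefully worded conclusion text for a hypothesis test."""
    if not (0.0 <= p_value <= 1.0):
        raise ValueError("p_value must be between 0 and 1")
    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must be strictly between 0 and 1")
    if p_value < alpha:
        return "Reject H0: sufficient evidence for Ha at this significance level."
    # Never "accept H0": a non-significant result is an absence of evidence.
    return "Fail to reject H0: insufficient evidence for Ha (not evidence that H0 is true)."


# The same p-value can lead to different decisions under different alpha levels,
# which is why alpha must be read from the report rather than assumed.
print(formal_conclusion(0.0207, 0.05))
print(formal_conclusion(0.0207, 0.01))
```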
Discussion Questions
- A study with 500,000 participants finds a statistically significant effect (p = 0.001) of a vitamin supplement on lifespan, but the average increase is 0.3 days. Should the supplement be recommended? Why does sample size matter here?
- Two studies test the same drug. Study A (n = 30) finds p = 0.06 and fails to reject H₀. Study B (n = 3,000) finds p = 0.002 and rejects H₀. Did the drug "work" in Study B but "not work" in Study A? What does this tell you about the role of sample size?
- A news headline reads: "Scientists prove coffee causes cancer." What questions would you ask before believing this headline? How would you apply the five-step analysis to the underlying study?
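For the sample-size questions above, a short numerical sketch can help anchor the discussion. It assumes a one-sample, two-tailed z test and a hypothetical standardized effect of 0.1 standard deviations; these numbers are illustrative and are not taken from either study in the questions.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_for_effect(standardized_effect: float, n: int) -> float:
    """z statistic when the observed effect is held fixed and only n varies."""
    return standardized_effect * math.sqrt(n)


effect = 0.1  # hypothetical: observed difference divided by the standard deviation

p_small = two_tailed_p(z_for_effect(effect, 30))
p_large = two_tailed_p(z_for_effect(effect, 3000))

print(f"n = 30:   p = {p_small:.3f}")
print(f"n = 3000: p = {p_large:.2e}")
```

The effect is identical in both runs; only the amount of evidence changes. That is the sense in which a small study can fail to reject H₀ while a large study of the same treatment rejects it.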
Exam Connection
Typical exam questions present a hypothesis test scenario and ask students to identify the correct conclusion, explain what a p-value means in context, or determine whether a researcher's stated conclusion is valid. This simulation directly practices all three skills. The Challenge tier also previews the practical-significance reasoning that appears in more advanced coursework.