Hypothesis Test for Mean -- P-Value Approach

Learning Objectives

Why This Matters

FDA drug approvals, Netflix A/B tests, published clinical trials: in each of them, the go/no-go decision often hinges on a p-value. When a landmark replication project found that over half of "significant" psychology results could not be reproduced, much of the blame fell on researchers and readers who misunderstood what p < 0.05 actually means. Learn to read this number correctly, and you can evaluate research findings, test reports, and trial results with real comprehension instead of false confidence.

How to Use This Simulation

  1. Enter sample data (x̄, n, σ, μ₀) or choose a preset scenario, and watch the p-value region shade in on the standard normal curve in real time.
  2. Toggle the test type (left, right, two-tailed) and observe how the p-value calculation and shaded region change.
  3. Drag the α slider to change the significance level and see whether the decision flips between reject and fail to reject.
  4. Click "Show Critical Value Comparison" to verify that both decision approaches always reach the same conclusion.
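The claim in step 4 can also be checked outside the simulation. The sketch below is not the simulation's own code; it is a minimal Python illustration with hypothetical helper names (`p_value`, `reject_by_critical_value`). It sweeps a grid of z values, all three test types, and three α levels, then collects every case where "compare p to α" disagrees with "compare z to the critical value". Both rules here reject on ties (p ≤ α, or z at or beyond the cutoff), which is one common convention.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def p_value(z, tail):
    """Tail area beyond z under H0 for a left, right, or two-tailed z-test."""
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def reject_by_critical_value(z, tail, alpha):
    """Critical value approach: reject when z falls in the rejection region."""
    if tail == "left":
        return z <= STD_NORMAL.inv_cdf(alpha)
    if tail == "right":
        return z >= STD_NORMAL.inv_cdf(1 - alpha)
    return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)


# Sweep z from -4.00 to 4.00 in steps of 0.01 and record any disagreement
mismatches = [
    (k / 100, tail, alpha)
    for k in range(-400, 401)
    for tail in ("left", "right", "two")
    for alpha in (0.01, 0.05, 0.10)
    if (p_value(k / 100, tail) <= alpha)
    != reject_by_critical_value(k / 100, tail, alpha)
]
print(f"{len(mismatches)} disagreements")
```

Because the standard normal CDF is strictly increasing, "p ≤ α" and "z beyond the cutoff" describe the same event, so the list should come back empty.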


Hypothesis Test Workflow

  1. State Hypotheses
  2. Test Type & Significance
  3. Test Statistic
  4. P-Value
  5. Decision
  6. Conclusion
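The computational core of this workflow (steps 3 through 5) fits in a short function. Below is a minimal Python sketch for a one-sample z-test with known σ, using only the standard library's `statistics.NormalDist`. The names `one_sample_z_test` and `ZTestResult` are hypothetical, and the rule "reject when p ≤ α" is one common convention (some texts use p < α). Steps 1, 2, and 6 remain human judgment; the `tail` and `alpha` arguments only encode choices made in steps 1 and 2.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


@dataclass
class ZTestResult:
    z: float          # step 3: test statistic
    p_value: float    # step 4: tail area beyond z, computed assuming H0
    reject_h0: bool   # step 5: decision at the chosen alpha


def one_sample_z_test(x_bar, mu0, sigma, n, tail="two", alpha=0.05):
    """One-sample z-test for a mean with known sigma, p-value approach.

    tail: "left" (Ha: mu < mu0), "right" (Ha: mu > mu0),
          or "two" (Ha: mu != mu0).
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    if tail == "left":
        p = STD_NORMAL.cdf(z)                    # area to the left of z
    elif tail == "right":
        p = 1 - STD_NORMAL.cdf(z)                # area to the right of z
    elif tail == "two":
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))     # area in both tails
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return ZTestResult(z=z, p_value=p, reject_h0=p <= alpha)
```

For example, `one_sample_z_test(4.8, 4.0, 2.0, 40, tail="two")` uses the same inputs as the coffee-chain exercise in the Try This section. Returning a small result object rather than printing keeps the decision logic separate from the plain-English conclusion in step 6.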


What's Happening

Quick Check

A pharmaceutical company tests a new blood pressure medication on 50,000 patients. The test produces a p-value of 0.0014 and rejects H₀ at α = 0.05. The medication lowered average systolic blood pressure by 0.3 mmHg compared to the placebo. A doctor reviewing the study says: "The p-value is extremely small, so this drug clearly has a major clinical impact." What is wrong with the doctor's reasoning?

Try This

A quality control manager tests whether a coffee chain's average wait time has changed from the claimed 4.0 minutes. The test statistic is z = 2.53 (two-tailed test, α = 0.05).

Compute the p-value by hand: p = 2 × P(Z > 2.53). Using a standard normal table, P(Z > 2.53) ≈ 0.0057, so p ≈ 0.0114. Enter x̄ = 4.8, n = 40, σ = 2.0, μ₀ = 4.0 in the simulation and verify your answer. State whether you reject or fail to reject H₀ and write the conclusion in one sentence.
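To check the hand computation without a printed table, the sketch below reproduces each intermediate quantity using Python's standard library (`statistics.NormalDist` for the standard normal CDF). It is a verification aid that assumes σ = 2.0 is a known population value, as the z-test requires; it is not the simulation's internal code.

```python
from math import sqrt
from statistics import NormalDist

# Coffee-chain exercise values
x_bar, mu0, sigma, n = 4.8, 4.0, 2.0, 40

standard_error = sigma / sqrt(n)          # sigma / sqrt(n), about 0.316 minutes
z = (x_bar - mu0) / standard_error        # about 2.53
upper_tail = 1 - NormalDist().cdf(z)      # P(Z > z), about 0.0057
p_two_tailed = 2 * upper_tail             # both tails count for Ha: mu != 4.0

print(f"SE = {standard_error:.4f}, z = {z:.2f}, "
      f"P(Z > z) = {upper_tail:.4f}, p = {p_two_tailed:.4f}")
```

This should print SE = 0.3162, z = 2.53, P(Z > z) = 0.0057, and p = 0.0114, matching the table-based result.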

A ride-share company claims its average pickup time is 5.0 minutes. A consumer group surveys 36 rides and measures an average of 5.4 minutes (assume a known population standard deviation σ = 1.5 minutes). The consumer group suspects pickup times are longer than claimed.

(1) State H₀ and Hₐ. (2) Identify the test type. (3) Compute the test statistic: z = (x̄ − μ₀) / (σ/√n). (4) Compute the p-value. (5) Compare p to α = 0.05 and state the decision. (6) Write the conclusion two ways: statistical and plain English. Enter the values into the simulation to verify each step. In one sentence, explain why a two-tailed version of this same test would produce a larger p-value.
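After working the steps by hand, the sketch below can check steps 3 and 4 and illustrates the final question. It assumes σ = 1.5 is a known population value and uses Python's `statistics.NormalDist`. For the same positive z, the two-tailed p-value adds the matching area in the lower tail, so it comes out exactly double the right-tailed one.

```python
from math import sqrt
from statistics import NormalDist

# Ride-share exercise values; sigma treated as a known population value
x_bar, mu0, sigma, n = 5.4, 5.0, 1.5, 36

z = (x_bar - mu0) / (sigma / sqrt(n))        # 0.4 / 0.25 = 1.6
p_right = 1 - NormalDist().cdf(z)            # Ha: mu > mu0, upper tail only
p_two = 2 * (1 - NormalDist().cdf(abs(z)))   # Ha: mu != mu0, both tails

print(f"z = {z:.2f}")
print(f"right-tailed p = {p_right:.4f}")
print(f"two-tailed p   = {p_two:.4f}")
```

Compare each printed p-value with α = 0.05 to confirm the decision you reached in step 5.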

A food delivery app tests a new routing algorithm on 50 deliveries. The currently claimed average delivery time is 35.0 minutes. With the new algorithm, the sample average is 33.5 minutes (σ = 6.0 minutes, left-tailed test, α = 0.05).

Compute the test statistic and p-value. (1) State the formal statistical decision. (2) The delivery time improvement is 1.5 minutes. In two sentences, describe what this conclusion does and does not say about whether the company should deploy the algorithm nationwide. (3) If the company ran the same test with n = 5,000 deliveries instead of 50, predict what would happen to the p-value and explain whether a smaller p-value would change your practical recommendation.
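Part (3) can be explored numerically. The sketch below holds the 1.5-minute improvement and σ = 6.0 fixed and varies only n. The helper name `left_tail_p` is hypothetical. It computes Φ(z) as `0.5 * erfc(-z / sqrt(2))` from Python's `math` module, because the equivalent form `0.5 * (1 + erf(z / sqrt(2)))` loses all precision and returns 0 for very negative z such as the one at n = 5,000.

```python
from math import erfc, sqrt


def left_tail_p(x_bar, mu0, sigma, n):
    """Left-tailed z-test: return (z, P(Z <= z)), using erfc for tail accuracy."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return z, 0.5 * erfc(-z / sqrt(2))


# Same 1.5-minute improvement (33.5 vs 35.0) and same sigma in every row
results = {n: left_tail_p(33.5, 35.0, 6.0, n) for n in (50, 500, 5000)}
for n, (z, p) in results.items():
    print(f"n = {n:>5}: z = {z:7.2f}, p = {p:.3g}")
```

As n grows, the p-value collapses toward zero while the improvement stays at 1.5 minutes in every row. A smaller p-value signals stronger evidence that an effect exists, not that the effect is bigger.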

Instructor Notes

Teaching Notes

This simulation is most effective when students try to state what the p-value "means" before interacting with it. Write their definitions on the board, then let them drag x̄ and watch the purple-striped area grow and shrink. Many students arrive with the "probability that H₀ is true" misconception, and watching the area change while H₀ stays fixed creates immediate cognitive dissonance, which is the entry point for correction.
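To make the definition concrete, the sketch below estimates a two-tailed p-value by simulation. It draws many sample means from the sampling distribution implied by H₀ (mean μ₀, standard error σ/√n) and counts how often one lands at least as far from μ₀ as the observed x̄. Because every draw assumes H₀ is true, the resulting fraction is a probability about the data given H₀, not the probability that H₀ is true. The function name, the 200,000 repetitions, and the fixed seed are illustrative assumptions.

```python
import random
from math import sqrt


def simulated_p_value(x_bar, mu0, sigma, n, reps=200_000, seed=1):
    """Monte Carlo estimate of a two-tailed z-test p-value.

    Every simulated sample mean is drawn as if H0 were true (mean = mu0),
    from the normal sampling distribution the z-test relies on.
    """
    rng = random.Random(seed)
    se = sigma / sqrt(n)
    observed_gap = abs(x_bar - mu0)
    # Count simulated means at least as far from mu0 as the observed one
    as_extreme = sum(
        abs(rng.gauss(mu0, se) - mu0) >= observed_gap for _ in range(reps)
    )
    return as_extreme / reps


print(simulated_p_value(4.8, 4.0, 2.0, 40))
```

With the coffee-chain numbers (x̄ = 4.8, μ₀ = 4.0, σ = 2.0, n = 40), the estimate should land close to the exact two-tailed value of about 0.0114.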

The "Show Critical Value Comparison" toggle is deliberately hidden by default. Let students work through the p-value approach first, then reveal the comparison to show that both methods always agree. This sequence helps keep students from treating the p-value approach as "just another version of the critical value test" and instead presents it as a standalone framework that happens to be equivalent.

The "Streaming Engagement" preset is designed to trigger the "small p = large effect" misconception. Load it last and ask: "Should the company ship this algorithm change?" An improvement of only 18 seconds paired with a decisive rejection of H₀ sets up a class discussion about statistical versus practical significance.

Common Student Errors

Discussion Questions

Exam Connection

Typical exam questions present sample data (x̄, n, σ, μ₀) and ask students to (1) state hypotheses, (2) compute the test statistic, (3) find the p-value, (4) compare to α, and (5) state the conclusion two ways. The simulation's workflow panel mirrors this exact sequence. The Stretch challenge directly practices the full procedure. For conceptual questions about p-value interpretation, the Quick Check targets the "small p = large effect" misconception, which appears frequently in exam distractor options.