Hypothesis Test for Mean -- P-Value Approach

Learning Objectives

Determine the p-value for a hypothesis test for the mean (population standard deviation known)
Draw a conclusion and interpret the results of a one-mean hypothesis test (population standard deviation known) using the p-value approach

Why This Matters

FDA drug approvals, Netflix A/B tests, published clinical trials -- in each, the go/no-go decision often hinges on a p-value. When a landmark replication project found that over half of "significant" psychology results could not be reproduced, much of the blame fell on researchers and readers who misunderstood what p < 0.05 actually means. Learn to read this number correctly and you can evaluate research findings, test reports, and trial results with real comprehension instead of false confidence.

How to Use This Simulation

Enter sample data (x̄, n, σ, μ₀) or choose a preset scenario and watch the p-value area shade on the standard normal curve in real time.
Toggle the test type (left, right, two-tailed) and observe how the p-value calculation and shaded region change.
Drag the α slider to change the significance level and see whether the decision flips between reject and fail to reject.
Click "Show Critical Value Comparison" to verify that both decision approaches always reach the same conclusion.

Hypothesis Test Workflow

State Hypotheses

Test Type & Significance

Test Statistic

P-Value

Decision

Conclusion
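
The workflow steps above can be sketched in a few lines of Python. This is a minimal illustration, not the simulation's own code: the function name `z_test`, its parameters, and the sample numbers are assumptions chosen for the example, and the decision rule used is "reject H₀ when p ≤ α".

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, n, sigma, mu0, tail="two", alpha=0.05):
    """One-mean z-test with known sigma (hypothetical helper).

    tail is "left", "right", or "two". Returns (z, p_value, decision).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1

    # Test statistic: how many standard errors x-bar lies from mu0.
    z = (xbar - mu0) / (sigma / sqrt(n))

    # P-value: area in the tail(s) named by the alternative hypothesis.
    if tail == "left":
        p = std_normal.cdf(z)
    elif tail == "right":
        p = 1 - std_normal.cdf(z)
    elif tail == "two":
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")

    # Decision: compare the p-value to the significance level.
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    return z, p, decision


# Made-up sample values, for illustration only.
z, p, decision = z_test(xbar=52.0, n=64, sigma=8.0, mu0=50.0, tail="two")
print(f"z = {z:.2f}, p = {p:.4f}, decision: {decision}")
# z = 2.00, p = 0.0455, decision: reject H0
```

The conclusion step (restating the decision in terms of the original claim) is left to the reader, since it depends on the context of the problem.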

Quick Check

A pharmaceutical company tests a new blood pressure medication on 50,000 patients. The test produces a p-value of 0.0014 and rejects H₀ at α = 0.05. The medication lowered average systolic blood pressure by 0.3 mmHg compared to the placebo. A doctor reviewing the study says: "The p-value is extremely small, so this drug clearly has a major clinical impact." What is wrong with the doctor's reasoning?

Try This

A quality control manager tests whether a coffee chain's average wait time has changed from the claimed 4.0 minutes. The test statistic is z = 2.53 (two-tailed test, α = 0.05).

Compute the p-value by hand: p = 2 × P(Z > 2.53). Using a standard normal table, P(Z > 2.53) ≈ 0.0057, so p ≈ 0.0114. Enter x̄ = 4.8, n = 40, σ = 2.0, μ₀ = 4.0 in the simulation and verify your answer. State whether you reject or fail to reject H₀ and write the conclusion in one sentence.
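
If you want to check the table lookup without the simulation, the short sketch below reproduces the arithmetic using Python's standard library. It assumes the same inputs given above and rounds z to two decimals to mimic a printed z-table.

```python
from math import sqrt
from statistics import NormalDist

# Inputs from the exercise above.
xbar, n, sigma, mu0 = 4.8, 40, 2.0, 4.0

# Test statistic.
z = (xbar - mu0) / (sigma / sqrt(n))

# A printed table is indexed by z rounded to two decimals.
upper_tail = 1 - NormalDist().cdf(round(z, 2))

# Two-tailed test: double the single-tail area.
p_two_tailed = 2 * upper_tail

print(f"z = {z:.2f}, P(Z > z) = {upper_tail:.4f}, p = {p_two_tailed:.4f}")
# z = 2.53, P(Z > z) = 0.0057, p = 0.0114
```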

A ride-share company claims its average pickup time is 5.0 minutes. A consumer group surveys 36 rides and measures an average of 5.4 minutes (σ = 1.5 minutes). The consumer group suspects pickup times are longer than claimed.

(1) State H₀ and Hₐ. (2) Identify the test type. (3) Compute the test statistic: z = (x̄ − μ₀) / (σ/√n). (4) Compute the p-value. (5) Compare p to α = 0.05 and state the decision. (6) Write the conclusion two ways: statistical and plain English. Enter the values into the simulation to verify each step. In one sentence, explain why a two-tailed version of this same test would produce a larger p-value.

A food delivery app tests a new routing algorithm on 50 deliveries. The current claimed average delivery time is 35.0 minutes. With the new algorithm, the sample average is 33.5 minutes (σ = 6.0 minutes, left-tailed test, α = 0.05).

Compute the test statistic and p-value. (1) State the formal statistical decision. (2) The delivery time improvement is 1.5 minutes. In two sentences, describe what this conclusion does and does not say about whether the company should deploy the algorithm nationwide. (3) If the company ran the same test with n = 5,000 deliveries instead of 50, predict what would happen to the p-value and explain whether a smaller p-value would change your practical recommendation.

Instructor Notes

Teaching Notes

This simulation is most effective when students first attempt to state what the p-value "means" before interacting. Write their definitions on the board, then let them drag x̄ and watch the purple-striped area grow and shrink. Most students arrive with the "probability that H₀ is true" misconception, and watching the area change while H₀ stays fixed creates immediate cognitive dissonance -- the entry point for correction.

The "Show Critical Value Comparison" toggle is deliberately hidden by default. Let students work through the p-value approach first, then reveal the comparison to show that both methods always agree. This sequence prevents students from seeing the p-value approach as "just another version of the critical value test" and instead builds it as a standalone framework that happens to be equivalent.
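
For instructors who want to demonstrate the equivalence outside the simulation, here is a minimal sketch. The function `both_decisions` is a hypothetical helper, not part of the simulation, and it uses the rules "reject when p ≤ α" and "reject when z falls in the rejection region".

```python
from statistics import NormalDist


def both_decisions(z, alpha, tail):
    """Return (p_value_rejects, critical_value_rejects) for a z-test."""
    nd = NormalDist()
    if tail == "left":
        p = nd.cdf(z)
        critical_rejects = z <= nd.inv_cdf(alpha)
    elif tail == "right":
        p = 1 - nd.cdf(z)
        critical_rejects = z >= nd.inv_cdf(1 - alpha)
    else:  # two-tailed
        p = 2 * (1 - nd.cdf(abs(z)))
        critical_rejects = abs(z) >= nd.inv_cdf(1 - alpha / 2)
    return p <= alpha, critical_rejects


# Sweep a grid of test statistics, significance levels, and test types.
disagreements = 0
for tenths in range(-30, 31):          # z from -3.0 to 3.0 in steps of 0.1
    for alpha in (0.01, 0.05, 0.10):
        for tail in ("left", "right", "two"):
            by_p, by_critical = both_decisions(tenths / 10, alpha, tail)
            if by_p != by_critical:
                disagreements += 1

print("disagreements:", disagreements)  # disagreements: 0
```

The two rules are the same comparison made on different scales: one compares areas (p against α), the other compares positions on the z-axis (z against the critical value).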

The "Streaming Engagement" preset is designed to trigger the "small p = large effect" misconception. Load it last and ask: "Should the company ship this algorithm change?" The 18-second improvement with a decisive rejection is the perfect setup for a class discussion about statistical vs practical significance.
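
The same point can be made numerically. The sketch below holds a small effect fixed and lets only the sample size grow. All numbers are made up for illustration and are not the preset's actual values.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical right-tailed test: the observed effect never changes.
mu0, xbar, sigma = 100.0, 100.5, 10.0

p_values = {}
for n in (25, 400, 10_000):
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_values[n] = 1 - NormalDist().cdf(z)   # right-tail area
    print(f"n = {n:>6}: z = {z:.2f}, p = {p_values[n]:.2g}")
```

The difference between x̄ and μ₀ is 0.5 in every row, yet the p-value shrinks as n grows. The p-value measures how surprising the data are under H₀, not how large or important the effect is.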

Common Student Errors

Stating that "the p-value is the probability that H₀ is true." This is the most common error and the one the explanation panel targets directly. The p-value is the probability of a sample result at least as extreme as the one observed, computed assuming H₀ is true, so it cannot simultaneously be the probability that H₀ is true.
Using the absolute value of z for one-tailed tests instead of the signed value. A positive z in a left-tailed test should produce a p-value above 0.5 (approaching 1 as z grows), correctly indicating no support for the left-tailed alternative. The simulation handles this without absolute-value tricks.
Confusing "fail to reject H₀" with "accept H₀." The simulation explicitly says "fail to reject" and the explanation panel distinguishes absence of evidence from evidence of absence.
Treating α = 0.05 as a natural law rather than a convention. Drag the α slider to show that the same data can lead to different decisions at different significance levels.
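
The signed-z error in the list above is easy to demonstrate with a short sketch. The value z = 1.80 is hypothetical, and `left_tailed_p` is an illustrative helper rather than the simulation's code.

```python
from statistics import NormalDist


def left_tailed_p(z):
    """P-value for a left-tailed z-test: area to the left of the signed z."""
    return NormalDist().cdf(z)


z = 1.80  # hypothetical: the sample mean landed ABOVE mu0

correct_p = left_tailed_p(z)          # uses the signed z
mistaken_p = left_tailed_p(-abs(z))   # the absolute-value shortcut

print(f"signed z:         p = {correct_p:.4f}")   # p = 0.9641
print(f"absolute-value z: p = {mistaken_p:.4f}")  # p = 0.0359
```

With α = 0.05, the shortcut would reject H₀ in favor of a left-tailed alternative even though the data point in the opposite direction.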

Discussion Questions

If two studies test the same hypothesis and one gets p = 0.049 while the other gets p = 0.051, should we draw different conclusions? What does this tell you about binary significance thresholds?
A drug company runs 20 independent studies on an ineffective drug. How many would you expect to produce p < 0.05 by chance? What does this suggest about the replication crisis?
Load the "Streaming Engagement" preset. The p-value is small and the test rejects H₀. Would you recommend the company roll out this algorithm change? What additional information would you want before deciding?
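
For the 20-studies question, the arithmetic can be sketched as follows, assuming every study tests at α = 0.05, the studies are independent, and H₀ is true in each one (the drug is ineffective).

```python
alpha = 0.05    # significance level assumed for every study
studies = 20    # independent studies of an ineffective drug

# Each study has probability alpha of a false positive.
expected_false_positives = studies * alpha

# Probability that at least one of the 20 studies reaches p < 0.05.
p_at_least_one = 1 - (1 - alpha) ** studies

print(f"expected false positives: {expected_false_positives:.1f}")   # 1.0
print(f"P(at least one p < 0.05): {p_at_least_one:.4f}")             # 0.6415
```

Under these assumptions, one "significant" result is the expected outcome, and the chance of seeing at least one is roughly 64%.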

Exam Connection

Typical exam questions present sample data (x̄, n, σ, μ₀) and ask students to (1) state hypotheses, (2) compute the test statistic, (3) find the p-value, (4) compare to α, and (5) state the conclusion two ways. The simulation's workflow panel mirrors this exact sequence. The Stretch challenge directly practices the full procedure. For conceptual questions about p-value interpretation, the Quick Check targets the "small p = large effect" misconception, which appears frequently in exam distractor options.