Significance Levels, Critical Values, and Test Statistics

Learning Objectives

Why This Matters

Every time the FDA evaluates whether a new drug works, the decision comes down to four numbers: a significance level set before the trial begins, critical values computed from that threshold, a test statistic calculated from patient data, and a p-value that measures how surprising the results would be if the drug did nothing. The same four-number framework decides which features ship at Netflix, which manufacturing batches Boeing approves for flight, and whether a bank flags a transaction as potential fraud. This simulation shows you how those four pieces connect -- and why changing any one of them can flip the decision.

How to Use This Simulation

  1. Drag the α slider and watch the rejection region grow or shrink in real time -- the shaded area always equals α.
  2. Drag the test statistic slider and watch the p-value shading change. Cross the critical value boundary to see the decision flip.
  3. Toggle the test type to see how left-tailed, right-tailed, and two-tailed tests place the rejection region differently.
  4. Check the Explanation Panel below -- it updates as you interact and tells you why the numbers change.
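The quantities the sliders control can also be reproduced outside the simulation. The sketch below is a minimal illustration, assuming a standard normal (z) test statistic; the function names `critical_values` and `p_value` are hypothetical helpers, not part of the simulation itself. It uses only Python's standard library (`statistics.NormalDist`, Python 3.8+).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_values(alpha, tail):
    """Return the boundary (or boundaries) of the rejection region.

    The area of the rejection region always equals alpha; the test type
    only changes where that area is placed.
    """
    if tail == "left":
        return (STD_NORMAL.inv_cdf(alpha),)
    if tail == "right":
        return (STD_NORMAL.inv_cdf(1 - alpha),)
    if tail == "two":
        # Split alpha evenly between the two tails.
        z_star = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-z_star, z_star)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def p_value(z, tail):
    """Return the probability, under H0, of a result at least as extreme as z."""
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "two":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    boundaries = ", ".join(f"{c:.3f}" for c in critical_values(0.05, tail))
    print(f"{tail}-tailed, alpha = 0.05: critical value(s) {boundaries}")
```

Changing the first argument of `critical_values` mirrors dragging the α slider, and changing `z` in `p_value` mirrors dragging the test statistic slider.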

What's Happening

Quick Check

A researcher computes a p-value of 0.03 for a hypothesis test with α = 0.05. A classmate says, "That means there's only a 3% chance the null hypothesis is true." Is the classmate correct?

Try This

Load the "Drug Trial Efficacy" preset (right-tailed, α = 0.05, z = 2.10). Using the simulation's displays, answer: (1) What is the critical value? (2) Is the test statistic in the rejection region? (3) What is the p-value? (4) Compare p to α and state the decision. Verify your answers against the results panel.

Then change α to 0.01 without moving the test statistic. Does the decision change? Why?
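After working through the preset in the simulation, you can cross-check the readouts with a few lines of code. This is a sketch under the assumption that the preset uses a standard normal test statistic; the preset values (right-tailed, z = 2.10) come from the exercise above, and the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

z = 2.10                   # test statistic from the "Drug Trial Efficacy" preset
p = 1 - std_normal.cdf(z)  # right-tailed p-value: area to the right of z

decisions = {}
for alpha in (0.05, 0.01):
    # Right-tailed critical value: the point with area alpha to its right.
    z_star = std_normal.inv_cdf(1 - alpha)
    decisions[alpha] = "Reject H0" if p < alpha else "Fail to reject H0"
    print(f"alpha = {alpha}: z* = {z_star:.3f}, p = {p:.4f} -> {decisions[alpha]}")
```

The test statistic and p-value are the same in both passes of the loop. Only the threshold moves, which is why the decision can change even though the evidence has not.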

A coffee chain claims its average drive-through time is 180 seconds. You sample 36 drive-throughs and find x̄ = 188 seconds. The population standard deviation is known: σ = 24 seconds. Test whether the true mean exceeds 180 seconds at α = 0.05 (H₀: μ = 180 versus H₁: μ > 180).

(1) Compute the test statistic using z = (x̄ − μ₀) / (σ / √n). (2) Set the simulation to right-tailed and enter your test statistic. Read the critical value and p-value from the displays. (3) State the decision using both the critical value approach (compare z to z*) AND the p-value approach (compare p to α). (4) Confirm in one sentence that both approaches reach the same conclusion.
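Once you have an answer by hand, the following sketch can serve as a check. It applies the formula z = (x̄ − μ₀) / (σ / √n) to the numbers given in the problem and then runs both decision paths; the variable names are illustrative, and the normal model is assumed because σ is stated as known.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the problem statement.
x_bar = 188   # sample mean (seconds)
mu_0 = 180    # claimed mean under H0 (seconds)
sigma = 24    # known population standard deviation (seconds)
n = 36        # sample size
alpha = 0.05  # significance level

standard_error = sigma / sqrt(n)     # 24 / 6 = 4 seconds
z = (x_bar - mu_0) / standard_error  # (188 - 180) / 4 = 2.0

std_normal = NormalDist()
z_star = std_normal.inv_cdf(1 - alpha)  # right-tailed critical value
p = 1 - std_normal.cdf(z)               # right-tailed p-value

reject_by_critical_value = z > z_star
reject_by_p_value = p < alpha

print(f"z = {z:.2f}, z* = {z_star:.3f}, p = {p:.4f}")
print("Critical value approach rejects H0:", reject_by_critical_value)
print("P-value approach rejects H0:", reject_by_p_value)
```

The two Boolean results always match, because z exceeds z* exactly when the area to the right of z is smaller than α.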

Two quality engineers at a battery factory test the same production batch. Engineer A uses α = 0.10; Engineer B uses α = 0.01. Both compute the same test statistic: z = −2.15 for a left-tailed test. Enter these values into the simulation for each engineer's α.

Determine each engineer's critical value and decision. Then explain in two sentences why the same test statistic leads to different decisions, and which engineer is being more cautious about falsely flagging a good batch versus missing a bad one.
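To check the critical values you read from the simulation, a short sketch like the one below can help. It assumes a standard normal test statistic and uses the α values and z = −2.15 from the scenario; the engineer labels are only dictionary keys for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

z = -2.15  # shared test statistic (left-tailed test)

decisions = {}
for engineer, alpha in (("A", 0.10), ("B", 0.01)):
    # Left-tailed critical value: the point with area alpha to its left.
    z_star = std_normal.inv_cdf(alpha)
    # Reject H0 when z falls to the left of the critical value.
    decisions[engineer] = "Reject H0" if z < z_star else "Fail to reject H0"
    print(f"Engineer {engineer}: alpha = {alpha}, z* = {z_star:.3f} -> {decisions[engineer]}")
```

A smaller α pushes the left-tailed critical value further from zero, so the same z can land inside one engineer's rejection region and outside the other's.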

Instructor Notes

Teaching Notes

This simulation works best when students drag the α slider before you formally introduce critical values or p-values. Have them predict what will happen when α increases -- most will guess "the test becomes more accurate" rather than "the rejection region grows." That prediction error is the entry point for understanding that α is a threshold choice with tradeoffs, not a measure of test quality.

The knife-edge preset (Clinical Threshold, z = 1.96 at α = 0.05) is an excellent discussion starter. Ask students whether z = 1.96 and z = 1.95 represent meaningfully different evidence. This primes the "p < 0.05 is a convention, not a law of nature" insight that runs throughout modern statistical practice.
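For the discussion, it can help to put numbers on how close the two values are. The sketch below assumes the preset is a two-tailed z-test, which the notes above do not state explicitly; `two_tailed_p` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal test statistic (assumed model)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
p_196 = two_tailed_p(1.96)
p_195 = two_tailed_p(1.95)

for z, p in ((1.96, p_196), (1.95, p_195)):
    decision = "Reject H0" if p < alpha else "Fail to reject H0"
    print(f"z = {z}: p = {p:.4f} -> {decision}")

print(f"Difference in p-values: {p_195 - p_196:.4f}")
```

Under that assumption, the two p-values sit on opposite sides of 0.05 while differing by only about a thousandth, which gives students a concrete case for debating whether the two results represent meaningfully different evidence.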

Common Student Errors

Discussion Questions

Exam Connection

Typical exam questions give α, a test type, and either a test statistic or a p-value, then ask students to state the decision. The simulation directly practices both decision paths: comparing z to z* and comparing p to α. Emphasize that the two approaches always agree, since exam questions may require either one or both. The Stretch challenge introduces the full test statistic computation (Sim 27's territory), giving students a preview of exam-format problems.