Significance Levels, Critical Values, and Test Statistics

Learning Objectives

Understand the significance level and rejection region
Understand and interpret critical values
Understand and interpret the test statistic and p-value

Why This Matters

Every time the FDA evaluates whether a new drug works, the decision comes down to four numbers: a significance level set before the trial begins, critical values computed from that threshold, a test statistic calculated from patient data, and a p-value that measures how surprising the results would be if the drug did nothing. The same four-number framework decides which features ship at Netflix, which manufacturing batches Boeing approves for flight, and whether a bank flags a transaction as potential fraud. This simulation shows you how those four pieces connect -- and why changing any one of them can flip the decision.

How to Use This Simulation

Drag the α slider and watch the rejection region grow or shrink in real time -- the shaded area always equals α.
Drag the test statistic slider and watch the p-value shading change. Cross the critical value boundary to see the decision flip.
Toggle the test type to see how left-tailed, right-tailed, and two-tailed tests place the rejection region differently.
Check the Explanation Panel below -- it updates as you interact and tells you why the numbers change.
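
The link between α and the boundary of the rejection region can also be reproduced outside the simulation. The following is a minimal sketch, assuming the standard normal (z) model that the simulation displays; the function name `critical_values` and the tail labels 'left', 'right', and 'two' are hypothetical names chosen for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_values(alpha, tail):
    """Return the z* boundary (or boundaries) of the rejection region.

    `tail` is 'left', 'right', or 'two' (labels assumed for this sketch).
    """
    if tail == "left":
        # All of alpha sits in the lower tail.
        return (STD_NORMAL.inv_cdf(alpha),)
    if tail == "right":
        # All of alpha sits in the upper tail.
        return (STD_NORMAL.inv_cdf(1 - alpha),)
    if tail == "two":
        # alpha is split evenly between the two tails.
        upper = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError(f"unknown tail: {tail!r}")


for tail in ("left", "right", "two"):
    print(tail, [round(z, 3) for z in critical_values(0.05, tail)])
```

At α = 0.05 this gives boundaries of roughly −1.645 (left-tailed), 1.645 (right-tailed), and ±1.960 (two-tailed). Raising α moves each boundary toward zero, which is the rejection region growing.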

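[Interactive simulation. Controls: Test Type (left-tailed, Hₐ: μ < μ₀; right-tailed, Hₐ: μ > μ₀; two-tailed, Hₐ: μ ≠ μ₀), Scenario preset, α (significance level, default 0.05), and test statistic z (default 0.00). Displays: Decision (initially "Fail to Reject H₀"), Threshold, Evidence, and Conclusion panels, plus the "What's Happening" explanation panel.]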

Quick Check

A researcher computes a p-value of 0.03 for a hypothesis test with α = 0.05. A classmate says, "That means there's only a 3% chance the null hypothesis is true." Is the classmate correct?

Try This

Load the "Drug Trial Efficacy" preset (right-tailed, α = 0.05, z = 2.10). Using the simulation's displays, answer: (1) What is the critical value? (2) Is the test statistic in the rejection region? (3) What is the p-value? (4) Compare p to α and state the decision. Verify your answers against the results panel.

Then change α to 0.01 without moving the test statistic. Does the decision change? Why?
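
One way to check your answers independently of the results panel is to compute the p-value directly. This is a minimal sketch under the same standard normal assumption as the simulation; `p_value` and its tail labels are hypothetical names, not part of the simulation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_value(z, tail):
    """Probability, assuming H0 is true, of a statistic at least as extreme as z."""
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "two":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown tail: {tail!r}")


z = 2.10  # test statistic from the "Drug Trial Efficacy" preset
p = p_value(z, "right")

# The evidence (z and p) stays fixed; only the threshold changes.
for alpha in (0.05, 0.01):
    decision = "Reject H0" if p <= alpha else "Fail to reject H0"
    print(f"alpha = {alpha:.2f}   p = {p:.4f}   {decision}")
```

Only the threshold changes between the two lines of output, so any change in the decision comes from α alone.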

A coffee chain claims its average drive-through time is 180 seconds. You sample 36 drive-throughs and find x̄ = 188 seconds. The population standard deviation is known: σ = 24 seconds. Test whether the true mean exceeds 180 seconds at α = 0.05 (H₀: μ = 180, Hₐ: μ > 180).

(1) Compute the test statistic using z = (x̄ − μ₀) / (σ / √n). (2) Set the simulation to right-tailed and enter your test statistic. Read the critical value and p-value from the displays. (3) State the decision using both the critical value approach (compare z to z*) AND the p-value approach (compare p to α). (4) Confirm in one sentence that both approaches reach the same conclusion.
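
The four steps above can be sketched in a few lines once you have tried them by hand. The variable names below are illustrative, and the code assumes the known-σ z-test described in the problem.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the drive-through problem.
x_bar, mu_0, sigma, n, alpha = 188, 180, 24, 36, 0.05

# Step 1: z = (x_bar - mu_0) / (sigma / sqrt(n))
standard_error = sigma / sqrt(n)        # 24 / 6 = 4 seconds
z = (x_bar - mu_0) / standard_error     # (188 - 180) / 4 = 2.0

# Step 2: right-tailed critical value and p-value from the standard normal.
z_star = NormalDist().inv_cdf(1 - alpha)
p = 1 - NormalDist().cdf(z)

# Step 3: the two decision rules.
reject_by_critical_value = z >= z_star
reject_by_p_value = p <= alpha

print(f"z = {z:.2f}, z* = {z_star:.3f}, p = {p:.4f}")
print("critical value approach rejects H0:", reject_by_critical_value)
print("p-value approach rejects H0:", reject_by_p_value)
```

Here z = 2.0 exceeds z* ≈ 1.645 and p ≈ 0.0228 falls below α = 0.05, so both rules reject H₀.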

Two quality engineers at a battery factory test the same production batch. Engineer A uses α = 0.10; Engineer B uses α = 0.01. Both compute the same test statistic: z = −2.15 for a left-tailed test. Enter these values into the simulation for each engineer's α.

Determine each engineer's critical value and decision. Then explain in two sentences why the same test statistic leads to different decisions, and which engineer is being more cautious about falsely flagging a good batch versus missing a bad one.

Instructor Notes

Teaching Notes

This simulation works best when students drag the α slider before you introduce critical values or p-values formally. Have them predict what will happen when α increases -- most will guess "the test becomes more accurate" rather than "the rejection region grows." That prediction error is the entry point for understanding that α is a threshold choice with tradeoffs, not a measure of test quality.

The knife-edge preset (Clinical Threshold, z = 1.96 at α = 0.05) is an excellent discussion starter. Ask students whether z = 1.96 and z = 1.95 represent meaningfully different evidence. This primes the "p < 0.05 is a convention, not a law of nature" insight that runs throughout modern statistical practice.

Common Student Errors

Treating α and the p-value as interchangeable. Both are probabilities, but α is chosen before data collection and the p-value is computed from data. The comparison p vs α is the decision rule, not an equivalence.
Confusing the critical value (z*) with the test statistic (z). Both live on the same z-axis, which creates notation confusion. Emphasize: z* is the boundary (from α), z is the evidence (from data).
Believing that "fail to reject H₀" means "H₀ is true." The test only evaluates whether there is sufficient evidence against H₀, not whether H₀ is correct.
Assuming a smaller α is always better. With the sample size and true effect held fixed, a smaller α lowers the probability of a Type I error but raises the probability of a Type II error. Sim 26 formalizes this tradeoff.
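
For instructors who want a quick numerical illustration of the last error above, the sketch below computes the Type II error probability (β) for a right-tailed z-test at several α values. The size of the true effect (`TRUE_SHIFT`, two standard errors above μ₀) is an assumption made for this example, not a value taken from the simulation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
TRUE_SHIFT = 2.0  # assumed: the true mean lies 2 standard errors above mu_0


def type_ii_error(alpha, shift=TRUE_SHIFT):
    """P(fail to reject H0) in a right-tailed z-test when H0 is false by `shift`."""
    z_star = STD_NORMAL.inv_cdf(1 - alpha)
    # Under the alternative, z ~ Normal(shift, 1); H0 survives when z < z_star.
    return STD_NORMAL.cdf(z_star - shift)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}   beta = {type_ii_error(alpha):.3f}")
```

Under this assumed effect size, β rises steadily as α shrinks: the stricter threshold protects against false alarms at the cost of more missed effects.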

Discussion Questions

If you were deciding whether a new drug should go to market, would you use α = 0.10 or α = 0.01? What are you trading off with each choice?
Two researchers analyze the same data. One uses α = 0.05 and rejects H₀. The other uses α = 0.01 and fails to reject. Did they get different results, or different conclusions from the same result?
A news article reports "the result was statistically significant (p = 0.049)." Another study on the same topic reports "the result was not significant (p = 0.051)." How different is the actual evidence between these two studies?
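
To ground the last question, it can help to convert the two p-values back into test statistics. The sketch below assumes both studies used a two-tailed z-test, which the question does not state; `z_for_two_tailed_p` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_for_two_tailed_p(p):
    """Magnitude of the z statistic that yields a two-tailed p-value of p."""
    return NormalDist().inv_cdf(1 - p / 2)


z_significant = z_for_two_tailed_p(0.049)
z_not_significant = z_for_two_tailed_p(0.051)

print(f"p = 0.049  ->  |z| = {z_significant:.3f}")
print(f"p = 0.051  ->  |z| = {z_not_significant:.3f}")
print(f"difference in |z| = {z_significant - z_not_significant:.3f}")
```

The two statistics differ by less than 0.02 standard errors, yet they land on opposite sides of the α = 0.05 boundary.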

Exam Connection

Typical exam questions give α, a test type, and either a test statistic or a p-value, then ask students to state the decision. The simulation practices both decision paths: comparing z to z* and comparing p to α. Emphasize that the two approaches always agree, and that exam questions may require one or both. The Stretch challenge previews the full test statistic computation (Sim 27's territory) in the format students will see on exams.
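
The claim that the two decision paths always agree can be checked numerically. The sketch below is a spot check over a grid of z values under the standard normal model, not a proof; the function names and the grid are assumptions chosen for illustration.

```python
from statistics import NormalDist

N = NormalDist()


def reject_by_critical_value(z, alpha, tail):
    """Decision path 1: compare z to the boundary z*."""
    if tail == "left":
        return z <= N.inv_cdf(alpha)
    if tail == "right":
        return z >= N.inv_cdf(1 - alpha)
    return abs(z) >= N.inv_cdf(1 - alpha / 2)  # two-tailed


def reject_by_p_value(z, alpha, tail):
    """Decision path 2: compare the p-value to alpha."""
    if tail == "left":
        p = N.cdf(z)
    elif tail == "right":
        p = 1 - N.cdf(z)
    else:  # two-tailed
        p = 2 * (1 - N.cdf(abs(z)))
    return p <= alpha


z_grid = [i / 100 for i in range(-400, 401)]  # z from -4.00 to 4.00
disagreements = [
    (z, alpha, tail)
    for tail in ("left", "right", "two")
    for alpha in (0.10, 0.05, 0.01)
    for z in z_grid
    if reject_by_critical_value(z, alpha, tail) != reject_by_p_value(z, alpha, tail)
]
print("disagreements found:", len(disagreements))
```

The two rules agree because they are the same comparison expressed on different scales: the critical value approach compares positions on the z-axis, and the p-value approach compares the corresponding tail areas.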