Developing Hypotheses and Understanding Possible Conclusions

Learning Objectives

Why This Matters

Every cancer screening test faces the same dilemma: set the threshold too low and you'll flag healthy patients for invasive follow-ups they don't need (Type I error); set it too high and you'll miss real tumors in patients who walk away reassured (Type II error). The same tradeoff runs through spam filters, fraud detection, criminal trials, and every A/B test in tech. For a fixed sample size, no setting eliminates both errors -- reducing one means accepting more of the other, and which matters more depends entirely on what's at stake.

How to Use This Simulation

  1. Drag the α slider and watch the rejection region grow or shrink on the curve -- then look at the decision matrix to see β move in the opposite direction.
  2. Drag the μtrue slider to change how far the true mean sits from the hypothesized mean and watch how effect size changes β and power.
  3. Click any cell in the decision matrix to highlight the corresponding region on the curve and read what that outcome means in context.
  4. Switch between preset scenarios to see how different test types and real-world stakes change the tradeoff.

The true population mean -- unknown in practice, but set here to see how effect size changes β.
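
The two slider effects described above can also be reproduced numerically. The sketch below is a minimal illustration, not the simulation's actual code: it assumes a right-tailed z-test with known σ, takes μ₀ = 5.0 and μtrue = 6.0 from the "Cancer Screening" preset, and uses hypothetical values σ = 2.0 and n = 25, which the page does not state. The function name `beta_right_tailed` is likewise made up for this example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def beta_right_tailed(alpha, mu0, mu_true, sigma, n):
    """Type II error rate for a right-tailed z-test of H0: mu = mu0."""
    se = sigma / sqrt(n)
    # Smallest sample mean that leads to rejecting H0.
    critical_mean = mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se
    # beta = P(sample mean falls below that cutoff | true mean is mu_true).
    return NormalDist(mu_true, se).cdf(critical_mean)


# mu0 follows the "Cancer Screening" preset; SIGMA and N are hypothetical.
MU0, SIGMA, N = 5.0, 2.0, 25

# Effect of the alpha slider (mu_true held at 6.0).
for alpha in (0.10, 0.05, 0.01):
    beta = beta_right_tailed(alpha, MU0, 6.0, SIGMA, N)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")

# Effect of the mu_true slider (alpha held at 0.01).
for mu_true in (5.5, 6.0, 6.5):
    beta = beta_right_tailed(0.01, MU0, mu_true, SIGMA, N)
    print(f"mu_true={mu_true:.1f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

In the first loop β should rise as α falls; in the second, β should fall as μtrue moves away from μ₀. The exact numbers depend on the assumed σ and n, so they will not necessarily match the simulation's readouts.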

Decision Matrix

Every hypothesis test produces one of these four outcomes. Click any cell to highlight the corresponding region on the curve.

Reality ↓ / Decision →   Fail to Reject H₀                            Reject H₀
H₀ is True               ✓ Correct Decision: P = 1 − α                ✗ Type I Error: P = α (false alarm)
Hₐ is True               ✗ Type II Error: P = β (missed detection)    ✓ Correct (Power): P = 1 − β
Results panel: α (Type I Rate), β (Type II Rate), Power (1 − β), Critical Value(s)
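
One way to see the four outcomes as long-run frequencies is to simulate many tests. The sketch below is an illustration under stated assumptions, not the simulation's own code: it borrows the "Drug Trial" numbers that appear in the Try This section (μ₀ = 120, σ = 15, n = 50, α = 0.05, μtrue = 123), draws each sample mean directly from its sampling distribution rather than generating 50 raw observations, and uses a hypothetical helper named `reject_rate`.

```python
import random
from math import sqrt
from statistics import NormalDist

# "Drug Trial" preset values from the Try This section.
MU0, SIGMA, N, ALPHA = 120.0, 15.0, 50, 0.05
SE = SIGMA / sqrt(N)
Z_STAR = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical z


def reject_rate(mu_true, trials=20_000, seed=0):
    """Fraction of simulated two-tailed z-tests that reject H0: mu = MU0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Shortcut: the mean of N draws from Normal(mu_true, SIGMA)
        # is itself distributed Normal(mu_true, SE).
        sample_mean = rng.gauss(mu_true, SE)
        if abs(sample_mean - MU0) / SE > Z_STAR:
            rejections += 1
    return rejections / trials


type_i_rate = reject_rate(MU0)      # H0 true: every rejection is a Type I error
power = reject_rate(123.0, seed=1)  # Ha true: every rejection is a correct detection

print(f"H0 true: fail to reject = {1 - type_i_rate:.3f}, reject (Type I) = {type_i_rate:.3f}")
print(f"Ha true: fail to reject (Type II) = {1 - power:.3f}, reject (power) = {power:.3f}")
```

Each printed row corresponds to one row of the decision matrix, so the two values in a row sum to 1. The estimates vary slightly with the seed and the number of trials.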

What's Happening

Quick Check

A researcher runs a hypothesis test with α = 0.05. A colleague tells them: "Since α = 0.05, that means there's a 95% chance you'll correctly detect a real effect." What is wrong with this statement?

Try This

Load the "Cancer Screening" preset (right-tailed, α = 0.01, μ₀ = 5.0, μtrue = 6.0). The decision matrix shows four possible outcomes. Click the cell that represents a Type I error and read what happens on the curve. Now click the cell that represents a Type II error.

In one sentence, explain why these two errors can't both be zero at the same time, using the α and β values you see in the results panel.

Using the "Drug Trial" preset: H₀: μ = 120, Hₐ: μ ≠ 120, σ = 15, n = 50, α = 0.05, μtrue = 123. Compute β by hand:

(1) Find the critical z-values: z* = ±1.9600 (for α/2 = 0.025).

(2) Calculate SE = 15/√50 = 2.1213.

(3) Find the critical sample means: μ₀ ± z* × SE = 120 ± 1.96 × 2.1213 = 115.84 and 124.16.

(4) Convert these critical sample means to z-scores under the alternative distribution (centered at 123): z = (115.84 − 123)/2.1213 = −3.37 and z = (124.16 − 123)/2.1213 = 0.55.

(5) Look up these z-scores in the standard normal table: β = P(−3.37 < z < 0.55) = 0.7088 − 0.0004 = 0.7084.

Verify your answer against the simulation.
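
If you want to check the hand calculation without a z-table, the same five steps can be sketched with Python's standard library. This is a minimal sketch assuming a two-tailed z-test with known σ; the variable names are chosen for this example only.

```python
from math import sqrt
from statistics import NormalDist

# "Drug Trial" preset values.
mu0, mu_true, sigma, n, alpha = 120.0, 123.0, 15.0, 50, 0.05

# Steps 1-2: critical z-value and standard error.
z_star = NormalDist().inv_cdf(1 - alpha / 2)
se = sigma / sqrt(n)

# Step 3: critical sample means under H0.
lower, upper = mu0 - z_star * se, mu0 + z_star * se

# Steps 4-5: probability of landing between the cutoffs when the true mean is mu_true.
alternative = NormalDist(mu_true, se)
beta = alternative.cdf(upper) - alternative.cdf(lower)

print(f"SE = {se:.4f}, critical means = {lower:.2f} and {upper:.2f}")
print(f"beta = {beta:.4f}, power = {1 - beta:.4f}")
```

Because the code keeps full precision instead of rounding the z-scores to two decimals, its β can differ from a table-based answer in the third decimal place. Note that μtrue enters only in the last step, while α is fixed in the first.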

Then explain in one sentence why β depends on μtrue but α does not.

A hospital laboratory evaluates two diagnostic protocols for detecting an early-stage condition. Protocol A: α = 0.01, β = 0.20. Protocol B: α = 0.05, β = 0.05.

(1) Which protocol has the higher false alarm rate? Which has the higher missed-detection rate?

(2) A false alarm means the patient undergoes an unnecessary biopsy (stressful, costly, but not dangerous). A missed detection means the condition progresses six months before the next screening. Describe the patient consequence of each error type for each protocol.

(3) Recommend one protocol and defend your choice in two sentences that address the cost asymmetry.
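
To put numbers behind the cost asymmetry, it can help to convert each protocol's error rates into expected patient counts. The sketch below is only an aid to discussion: it treats α as the false alarm rate among patients without the condition and β as the miss rate among patients with it, and it assumes a hypothetical prevalence of 2% in a group of 10,000 patients. Neither figure comes from the scenario, and `expected_errors` is a made-up helper name.

```python
def expected_errors(alpha, beta, prevalence, patients=10_000):
    """Expected (false alarms, missed detections) in a screened group."""
    affected = patients * prevalence
    healthy = patients - affected
    # False alarms can only occur among healthy patients,
    # missed detections only among affected patients.
    return healthy * alpha, affected * beta


PREVALENCE = 0.02  # hypothetical; not given in the scenario

# (alpha, beta) pairs from the scenario.
protocols = {"A": (0.01, 0.20), "B": (0.05, 0.05)}

for name, (alpha, beta) in protocols.items():
    false_alarms, misses = expected_errors(alpha, beta, PREVALENCE)
    print(f"Protocol {name}: about {false_alarms:.0f} unnecessary biopsies, "
          f"about {misses:.0f} missed detections per 10,000 patients")
```

Try a few different prevalence values before writing your recommendation: the counts shift, but each missed detection and each false alarm still has to be weighed by its consequence for the patient.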

Instructor Notes

Teaching Notes

This simulation works best when you let students drag the α slider before explaining the tradeoff. Most students assume that decreasing α makes the test "better" in every way. Watching β grow in the decision matrix as α shrinks produces immediate cognitive conflict -- the foundation for understanding the tradeoff. Follow that with the μtrue slider so students see that effect size independently modulates β.

The decision matrix is the conceptual anchor for the rest of the hypothesis testing arc. Students need to internalize that every test produces one of exactly four outcomes, two of which are errors that occur even when the test is conducted perfectly. Return to this matrix in Sims 27 and 28 when students compute test statistics and p-values.

Common Student Errors

Discussion Questions

Exam Connection

Typical exam questions present a scenario and ask students to (1) write H₀ and Hₐ, (2) identify the test type, (3) describe Type I and Type II errors in context, and (4) state which error is more serious. The decision matrix and tiered challenges directly prepare students for all four components.