Developing Hypotheses and Understanding Possible Conclusions

Learning Objectives

Identify the null and alternative hypotheses for an experiment with one population mean
Distinguish between one- and two-tailed hypothesis tests and understand possible conclusions
Differentiate between Type I and Type II errors when performing a hypothesis test

Why This Matters

Every cancer screening test faces the same dilemma: set the threshold too low and you'll flag healthy patients for invasive follow-ups they don't need (Type I error); set it too high and you'll miss real tumors in patients who walk away reassured (Type II error). The same tradeoff runs through spam filters, fraud detection, criminal trials, and every A/B test in tech. No threshold eliminates both errors -- with the same data, reducing one means accepting more of the other, and which matters more depends entirely on what's at stake.

How to Use This Simulation

Drag the α slider and watch the rejection region grow or shrink on the curve -- then look at the decision matrix to see β move in the opposite direction.
Drag the μ_true slider to change how far the true mean sits from the hypothesized mean and watch how effect size changes β and power.
Click any cell in the decision matrix to highlight the corresponding region on the curve and read what that outcome means in context.
Switch between preset scenarios to see how different test types and real-world stakes change the tradeoff.

Scenario:

Test Type:

α (Significance Level): 0.01

μ_true (True Mean): 6.0

The true population mean -- unknown in practice, but set here to see how effect size changes β.

μ₀:

σ:

Decision Matrix

Every hypothesis test produces one of these four outcomes. Click any cell to highlight the corresponding region on the curve.

Reality ↓ Decision →  Fail to Reject H₀                          Reject H₀
H₀ is True            ✓ Correct Decision: P = 1 − α              ✗ Type I Error: P = α (false alarm)
Hₐ is True            ✗ Type II Error: P = β (missed detection)  ✓ Correct (Power): P = 1 − β
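The four cells can also be checked by brute force. The sketch below is a minimal illustration, not the simulation's own code: it runs many right-tailed z-tests with the Cancer Screening values (μ₀ = 5.0, μ_true = 6.0, α = 0.01) and counts how often H₀ is rejected. The values of `SIGMA` and `N` are hypothetical, since the preset's σ and n are not listed here.

```python
import random
from statistics import NormalDist


def simulate_rejection_rate(mu_actual, mu0, sigma, n, alpha, trials=20000, seed=0):
    """Fraction of simulated right-tailed z-tests that reject H0: mu = mu0."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5
    # Reject H0 when the sample mean lands above this cutoff.
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    rejections = 0
    for _ in range(trials):
        # Draw one sample mean from its sampling distribution.
        sample_mean = rng.gauss(mu_actual, se)
        if sample_mean > critical_mean:
            rejections += 1
    return rejections / trials


MU0, MU_TRUE, ALPHA = 5.0, 6.0, 0.01  # Cancer Screening preset values
SIGMA, N = 2.0, 16                    # hypothetical: not given in the text

# Top row of the matrix: H0 is true, so every rejection is a Type I error.
type1_rate = simulate_rejection_rate(MU0, MU0, SIGMA, N, ALPHA)
# Bottom row: Ha is true, so every rejection is a correct detection (power).
power = simulate_rejection_rate(MU_TRUE, MU0, SIGMA, N, ALPHA)

print(f"H0 true: reject {type1_rate:.3f}, fail to reject {1 - type1_rate:.3f}")
print(f"Ha true: reject {power:.3f}, fail to reject {1 - power:.3f}")
```

Under these assumed values the first row should land close to α and 1 − α, while the second row (power and β) is set by how far μ_true sits from μ₀, not by α alone.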

α (Type I Rate)

β (Type II Rate)

Power (1 − β)

Critical Value(s)

What's Happening

Quick Check

A researcher runs a hypothesis test with α = 0.05. A colleague tells them: "Since α = 0.05, that means there's a 95% chance you'll correctly detect a real effect." What is wrong with this statement?

Try This

Load the "Cancer Screening" preset (right-tailed, α = 0.01, μ₀ = 5.0, μ_true = 6.0). The decision matrix shows four possible outcomes. Click the cell that represents a Type I error and read what happens on the curve. Now click the cell that represents a Type II error.

In one sentence, explain why these two errors can't both be zero at the same time, using the α and β values you see in the results panel.

Using the "Drug Trial" preset: H₀: μ = 120, Hₐ: μ ≠ 120, σ = 15, n = 50, α = 0.05, μ_true = 123. Compute β by hand:

(1) Find the critical z-values: z* = ±1.9600 (the values that leave α/2 = 0.025 in each tail).

(2) Calculate the standard error: SE = σ/√n = 15/√50 = 2.1213.

(3) Find the critical sample means: μ₀ ± z* × SE = 120 ± 1.96 × 2.1213 = 115.84 and 124.16.

(4) Convert these critical sample means to z-scores under the alternative distribution (centered at μ_true = 123): z = (115.84 − 123)/2.1213 = −3.37 and z = (124.16 − 123)/2.1213 = 0.55.

(5) Look up these z-scores in the standard normal table: β = P(−3.37 < Z < 0.55) = 0.7088 − 0.0004 = 0.7084. Verify your answer against the simulation; small differences come from rounding the z-scores to two decimals.

Then explain in one sentence why β depends on μ_true but α does not.
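The hand calculation above can be cross-checked with a few lines of Python. This is a sketch using only the standard library's `NormalDist`; the function name `two_tailed_beta` is an illustrative choice, not part of the simulation. Because it does not round the z-scores, its result should differ slightly from the table-based value.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_beta(mu0, mu_true, sigma, n, alpha):
    """Type II error rate of a two-tailed z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)                        # step 2: standard error
    z_star = std_normal.inv_cdf(1 - alpha / 2)  # step 1: critical z-value
    lower = mu0 - z_star * se                   # step 3: critical sample means
    upper = mu0 + z_star * se
    z_lower = (lower - mu_true) / se            # step 4: re-center on mu_true
    z_upper = (upper - mu_true) / se
    # Step 5: probability of landing between the critical means (fail to reject).
    return std_normal.cdf(z_upper) - std_normal.cdf(z_lower)


# Drug Trial preset values from the exercise.
beta = two_tailed_beta(mu0=120, mu_true=123, sigma=15, n=50, alpha=0.05)
print(f"beta = {beta:.4f}, power = {1 - beta:.4f}")
```

Note that `mu_true` enters only in steps 4 and 5; the critical values in steps 1 to 3 are fixed by α, μ₀, σ, and n.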

A hospital laboratory evaluates two diagnostic protocols for detecting an early-stage condition. Protocol A: α = 0.01, β = 0.20. Protocol B: α = 0.05, β = 0.05.

(1) Which protocol has the higher false alarm rate? Which has the higher missed-detection rate?

(2) A false alarm means the patient undergoes an unnecessary biopsy (stressful, costly, but not dangerous). A missed detection means the condition progresses six months before the next screening. Describe the patient consequence of each error type for each protocol.

(3) Recommend one protocol and defend your choice in two sentences that address the cost asymmetry.
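One way to make the cost asymmetry concrete is to convert each protocol's rates into expected counts. The sketch below assumes a hypothetical prevalence of 2% and a hypothetical group of 1,000 screened patients; neither number comes from the exercise, and changing them changes the counts. It treats α as the false alarm rate among unaffected patients and β as the miss rate among affected patients.

```python
def expected_errors(alpha, beta, prevalence, patients):
    """Expected (false alarms, missed detections) among screened patients."""
    unaffected = patients * (1 - prevalence)
    affected = patients * prevalence
    # False alarms can only happen to unaffected patients, misses only to affected ones.
    return unaffected * alpha, affected * beta


PREVALENCE = 0.02  # hypothetical: not given in the exercise
PATIENTS = 1000    # hypothetical screening group size

# (alpha, beta) for each protocol, taken from the exercise.
protocols = {"A": (0.01, 0.20), "B": (0.05, 0.05)}

results = {
    name: expected_errors(alpha, beta, PREVALENCE, PATIENTS)
    for name, (alpha, beta) in protocols.items()
}

for name, (false_alarms, misses) in results.items():
    print(f"Protocol {name}: {false_alarms:.1f} unnecessary biopsies, "
          f"{misses:.1f} missed cases")
```

The counts alone do not settle the recommendation; that still depends on how heavily a missed case is weighed against an unnecessary biopsy.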

Instructor Notes

Teaching Notes

This simulation works best when you let students drag the α slider before explaining the tradeoff. Most students assume that decreasing α makes the test "better" in every way. Watching β grow in the decision matrix as α shrinks produces immediate cognitive conflict -- the foundation for understanding the tradeoff. Follow that with the μ_true slider so students see that effect size independently modulates β.
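For instructors who want numbers to accompany the slider demonstration, the sketch below sweeps α and then μ_true for a right-tailed z-test with μ₀ = 5.0. The standard error `SE = 0.5` is a hypothetical value chosen for illustration, so the printed β values will not necessarily match the simulation's display.

```python
from statistics import NormalDist


def right_tailed_beta(mu0, mu_true, se, alpha):
    """P(fail to reject H0) for a right-tailed z-test when the true mean is mu_true."""
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta is the area of the true sampling distribution below the cutoff.
    return NormalDist(mu_true, se).cdf(critical_mean)


MU0 = 5.0
SE = 0.5  # hypothetical standard error (sigma / sqrt(n))

# Sweep 1: shrink alpha with the true mean held at 6.0.
alphas = [0.10, 0.05, 0.01, 0.001]
beta_by_alpha = [right_tailed_beta(MU0, 6.0, SE, a) for a in alphas]
for a, b in zip(alphas, beta_by_alpha):
    print(f"alpha = {a:<5}  beta = {b:.3f}")

# Sweep 2: move the true mean away from mu0 with alpha held at 0.01.
true_means = [5.5, 6.0, 6.5, 7.0]
beta_by_mean = [right_tailed_beta(MU0, m, SE, 0.01) for m in true_means]
for m, b in zip(true_means, beta_by_mean):
    print(f"mu_true = {m}  beta = {b:.3f}")
```

The first sweep shows β rising as α shrinks; the second shows β falling as the effect size grows with α unchanged.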

The decision matrix is the conceptual anchor for the rest of the hypothesis testing arc. Students need to internalize that every test produces one of exactly four outcomes, two of which are errors that occur even when the test is conducted perfectly. Return to this matrix in Sims 27 and 28 when students compute test statistics and p-values.

Common Student Errors

Believing β = 1 − α. The most common error. Have students set α = 0.05 and read β from the display -- it's almost never 0.95.
Treating "fail to reject" as "accept H₀." When Hₐ is true and we fail to reject, we've made a Type II error, not confirmed H₀.
Assuming a smaller α is always better. The Cancer Screening preset with α = 0.01 shows high β for small effects.
Confusing β (Type II error rate) with the regression coefficient β. Context should make the usage clear.

Discussion Questions

In a criminal trial, the standard is "beyond a reasonable doubt." Does this correspond to a large or small α? What does that imply about β?
A pharmaceutical company can choose α = 0.01 or α = 0.05 for a drug trial. Who benefits from each choice?
If you could increase the sample size without limit, what would happen to α and to β?
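To support the sample-size question, the following sketch recomputes β for the Drug Trial values (μ₀ = 120, μ_true = 123, σ = 15, α = 0.05) at several sample sizes. The list of sample sizes is an arbitrary illustrative choice, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_beta(mu0, mu_true, sigma, n, alpha):
    """Type II error rate of a two-tailed z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z_star = std_normal.inv_cdf(1 - alpha / 2)
    # Distance between the true and hypothesized means, in standard errors.
    shift = (mu_true - mu0) / se
    return std_normal.cdf(z_star - shift) - std_normal.cdf(-z_star - shift)


ALPHA = 0.05  # chosen by the researcher, so it does not change with n
sample_sizes = [50, 100, 200, 400, 800]  # illustrative choices
betas = [two_tailed_beta(120, 123, 15, n, ALPHA) for n in sample_sizes]

for n, b in zip(sample_sizes, betas):
    print(f"n = {n:>3}  alpha = {ALPHA}  beta = {b:.4f}")
```

In this sketch α stays wherever it is set, while β shrinks toward zero as n grows.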

Exam Connection

Typical exam questions present a scenario and ask students to (1) write H₀ and Hₐ, (2) identify the test type, (3) describe Type I and Type II errors in context, and (4) state which error is more serious. The decision matrix and tiered challenges directly prepare students for all four components.