Basics of Hypothesis Testing

Learning Objectives

Identify the null and alternative hypotheses
Identify and explain differences between one- and two-tailed hypothesis tests

Why This Matters

Every time a tech company A/B tests a new feature -- a redesigned checkout button, a tweaked recommendation algorithm, a different notification schedule -- the test type determines what counts as evidence. A one-tailed test asks "did engagement improve?" while a two-tailed test asks "did engagement change at all?" Pick wrong and you either miss a harmful regression that costs users or waste statistical power testing a direction nobody cares about.

How to Use This Simulation

Select a preset scenario and read the research claim, null hypothesis, and alternative hypothesis.
Toggle between Left-tailed, Right-tailed, and Two-tailed tests and watch the rejection region jump to match the alternative hypothesis.
Drag the α slider and observe how the rejection region grows or shrinks and the critical value moves.
Check the Explanation Panel below the chart -- it updates as you interact and tells you why the region changed.
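
The critical values shown on the chart can be reproduced in a few lines of Python. This is a minimal sketch that assumes the test statistic follows a standard normal (z) distribution, which this page does not state; the function name critical_values and the tail labels "left", "right", and "two" are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Return the z critical value(s) for a test at significance level alpha.

    Assumes a standard normal test statistic (an assumption made for this
    sketch). `tail` is "left", "right", or "two".
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        return (z.inv_cdf(alpha),)        # all of alpha sits in the lower tail
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)    # all of alpha sits in the upper tail
    if tail == "two":
        upper = z.inv_cdf(1 - alpha / 2)  # alpha is split evenly between tails
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("left", "right", "two"):
    rounded = [round(v, 3) for v in critical_values(0.05, tail)]
    print(tail, rounded)
```

Under that assumption, changing the tail argument plays the role of the test-type toggle, and changing alpha plays the role of the α slider.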

The significance level α sets the total area of the rejection region.

Quick Check

A pharmaceutical company runs a clinical trial to test whether a new drug lowers blood pressure more than the current standard. They use a one-tailed test at α = 0.05 and reject H₀. A reviewer asks: "Would you still reject H₀ if you had used a two-tailed test at the same α?" Which answer is correct?

Try This

A hospital administrator claims the average emergency room wait time is more than 45 minutes. Before touching the simulation, write down: is this a left-tailed, right-tailed, or two-tailed test? Then load a preset with the same test type and verify the rejection region is where you predicted.

A food delivery app claims their average delivery time is 30 minutes. A consumer group suspects it's actually longer. (1) Write H₀ and Hₐ in correct notation. (2) Identify the test type. (3) Using the simulation, find the critical value at α = 0.05. (4) Now switch to a two-tailed test at α = 0.05 and note the new critical value. In one sentence, explain why the two-tailed critical value is farther from zero.

A UX team redesigns a checkout flow for an e-commerce site. The VP of Product wants to run a one-tailed test ("did conversion rate increase?") because the team invested six months and they only care about improvement. The data scientist argues for a two-tailed test because the redesign could also decrease conversions, and shipping a harmful change is worse than missing an improvement. Set up both tests in the simulation at α = 0.05. In two sentences, defend which test type you would recommend and explain what the other test type would miss.

Instructor Notes

Teaching Notes

This simulation works best when you start with a single preset and have students toggle between all three test types before explaining the notation. The visual jump of the rejection region from one tail to the other to both tails creates an immediate anchor for "the alternative hypothesis determines where you look for evidence." Once they see it, the notation becomes a label for something they already understand.

The α slider is the second interaction to introduce. Students are often surprised that a larger α makes it easier to reject H₀. Ask them: "If you make the rejection region bigger, does that mean you need more evidence or less evidence to reject?" The visual makes the answer obvious in a way that the formula alone doesn't.
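
A quick numeric check can back up the visual. The sketch below assumes a z test statistic (an assumption for illustration, not something the simulation documents) and lists the right-tailed cutoff for three common α values: as α grows, the cutoff moves toward zero, so a smaller test statistic is enough to reject H₀.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, assumed for this sketch
alphas = [0.01, 0.05, 0.10]

# Right-tailed critical value: the point with area alpha to its right.
cutoffs = {a: z.inv_cdf(1 - a) for a in alphas}

for a, c in cutoffs.items():
    print(f"alpha = {a:.2f} -> reject H0 when z > {c:.3f}")
```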

Common Student Errors

Writing the equality in Hₐ instead of H₀. The null hypothesis always contains the equality (≥, ≤, or =). The alternative is always a strict inequality (<, >, or ≠).
Choosing the test type based on personal preference rather than the research question. The direction of the claim determines the test type before any data is collected.
Assuming one-tailed and two-tailed tests are interchangeable. At the same α, they have different critical values and can produce different conclusions for identical data.
Switching from a two-tailed to a one-tailed test after seeing the data because "the effect was in one direction." This is a form of p-hacking. The test type must be fixed before any data is collected.
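
The claim that one- and two-tailed tests can disagree on identical data can be shown with one number. In the sketch below, the observed statistic z = 1.80 is a made-up value chosen for illustration, and the use of a z test is an assumption.

```python
from statistics import NormalDist

ALPHA = 0.05
z_obs = 1.80  # hypothetical observed z statistic, not from any real study
std_normal = NormalDist()

# Right-tailed p-value: area above the observed statistic.
p_right = 1 - std_normal.cdf(z_obs)
# Two-tailed p-value: area beyond |z_obs| in both tails.
p_two = 2 * (1 - std_normal.cdf(abs(z_obs)))

reject_right = p_right < ALPHA
reject_two = p_two < ALPHA
print(f"right-tailed: p = {p_right:.4f}, reject H0: {reject_right}")
print(f"two-tailed:   p = {p_two:.4f}, reject H0: {reject_two}")
```

With this value the right-tailed test rejects H₀ and the two-tailed test does not, because the two-tailed p-value is twice the one-tailed p-value.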

Discussion Questions

Why would a company choose a two-tailed test when they clearly hope their new product performs better? What risk does a one-tailed test create?
If a researcher changes from a two-tailed to a one-tailed test after collecting data, how does this affect the critical value? Why is this considered problematic?
A drug company tests whether a new medication is more effective than the standard. A consumer advocacy group tests whether the medication's effectiveness differs from the standard. They use the same data. Can they reach different conclusions? Why?
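
For the second question, the cost of switching after seeing the data can be computed directly. The sketch below assumes a z statistic and a true H₀. If the analyst always picks the tail where the effect happened to land and then applies the one-tailed cutoff, H₀ is rejected whenever |z| exceeds that cutoff, so both tails contribute to the false-positive (Type I error) rate.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()

one_tailed_cutoff = std_normal.inv_cdf(1 - ALPHA)      # about 1.645
two_tailed_cutoff = std_normal.inv_cdf(1 - ALPHA / 2)  # about 1.960

# Choosing the direction after the fact means rejecting whenever
# |z| > one_tailed_cutoff, so the area in both tails counts.
actual_type_i_rate = 2 * (1 - std_normal.cdf(one_tailed_cutoff))

print(f"nominal alpha: {ALPHA:.2f}")
print(f"actual Type I error rate: {actual_type_i_rate:.2f}")
```

The cutoff drops from about 1.960 to about 1.645, and the real false-positive rate doubles from the stated 0.05 to 0.10.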

Exam Connection

Typical exam questions give a research claim in plain English and ask students to write H₀ and Hₐ in correct notation, then identify the test type. Common variants: given H₀ and Hₐ, identify the test type and sketch the rejection region; given α and the test type, find the critical value(s). The simulation gives direct practice with all three formats. The "Battery Life" preset is particularly useful for exam prep because it demonstrates that the same claim can be tested as one-tailed or two-tailed depending on the research question.
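
The first exam format follows a mechanical rule that can be sketched in code. The function hypothesis_pair below is hypothetical, written only for illustration. Given the comparison that appears in the alternative hypothesis and the hypothesized mean, it returns H₀, Hₐ, and the test type, following the convention from Common Student Errors that H₀ carries the equality.

```python
def hypothesis_pair(alt_sign, mu0):
    """Build H0, Ha, and the test type from the direction of the alternative.

    alt_sign is the comparison in Ha: "<", ">", or "!=".
    mu0 is the hypothesized value of the population mean.
    """
    table = {
        "<":  ("≥", "<", "left-tailed"),
        ">":  ("≤", ">", "right-tailed"),
        "!=": ("=", "≠", "two-tailed"),
    }
    if alt_sign not in table:
        raise ValueError("alt_sign must be '<', '>', or '!='")
    null_sign, alt_symbol, test_type = table[alt_sign]
    return (f"H0: μ {null_sign} {mu0}", f"Ha: μ {alt_symbol} {mu0}", test_type)

# Example with a made-up claim: "the mean is greater than 100".
print(hypothesis_pair(">", 100))
```

The hard part for students is the step before this one, translating the plain-English claim into the comparison for Hₐ; the sketch only covers the notation once that direction is known.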