Calculate the Test Statistic -- One Mean Hypothesis Test
Learning Objective
- Compute the value of the test statistic (z-value) for a hypothesis test for one population mean with a known standard deviation
Why This Matters
Every time a factory inspector pulls 36 bolts off the assembly line and measures their diameter, one number decides whether the machine has drifted out of spec or whether the batch just happened to run slightly wide. That number is the test statistic. The same calculation tells a pharmaceutical regulator whether a generic drug matches the brand-name dosage and tells a city engineer whether the water treatment plant meets federal safety thresholds. Get it right and you catch real problems; get it wrong and you either miss a defective process or shut down a perfectly good one.
How to Use This Simulation
- Drag the x̄ slider and watch the red test statistic line move along the z-axis in real time as the formula recalculates.
- Change n, σ, or μ₀ in the input fields and observe how each affects the standard error and the test statistic.
- Toggle the test type (left, right, two-tailed) and watch the rejection region shift while the test statistic stays fixed.
- Check the formula display below the curve -- it shows every substitution step so you can follow the arithmetic.
What's Happening
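To check the arithmetic outside the simulation, here is a minimal Python sketch of the substitution steps the formula display walks through: compute the standard error first, then the numerator, then divide. The function name z_test_statistic is our own and is not part of the simulation. Like this lesson, it assumes the population standard deviation σ is known.

```python
import math


def z_test_statistic(x_bar: float, mu_0: float, sigma: float, n: int) -> tuple[float, float]:
    """Return (standard_error, z) for a one-mean z-test with known sigma."""
    if sigma <= 0 or n < 1:
        raise ValueError("sigma must be positive and n must be at least 1")
    standard_error = sigma / math.sqrt(n)  # denominator is sigma / sqrt(n), not sigma
    numerator = x_bar - mu_0               # distance of the sample mean from the claimed mean
    z = numerator / standard_error         # test statistic
    return standard_error, z


# Quick Check inspector: H0: mu = 50 mm, sigma = 1.2 mm, n = 36, x-bar = 50.4 mm
x_bar, mu_0, sigma, n = 50.4, 50.0, 1.2, 36
se, z = z_test_statistic(x_bar, mu_0, sigma, n)
print(f"SE = {sigma} / sqrt({n}) = {se:.4f}")
print(f"z = ({x_bar} - {mu_0}) / {se:.4f} = {z:.4f}")
```

With the inspector's values, the standard error is 0.2000 and z = 2.0000, which matches the value stated in the Quick Check.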
Quick Check
An inspector tests H₀: μ = 50 mm using σ = 1.2 mm. With a sample of n = 36, she measures x̄ = 50.4 mm and calculates z = 2.0000. If she doubles her sample size to n = 72 while x̄ and σ stay the same, what happens to the test statistic?
Try This
A coffee chain claims their large cup contains μ = 16 oz. You sample n = 25 cups and find x̄ = 15.6 oz. The population standard deviation is σ = 1.0 oz. Calculate the test statistic by hand using z = (x̄ − μ₀) / (σ/√n), then enter the values into the simulation and verify that your calculated z matches the displayed test statistic. What is the standard error, and what does it represent?
A fitness tracker company claims their device counts an average of μ = 10,000 steps per day with no systematic error. A consumer testing lab samples n = 64 devices over a standardized walking course and finds x̄ = 10,180 steps with σ = 800 steps. (1) State H₀ and H₁ in correct notation, given that the lab suspects the device overcounts. (2) What test type does this H₁ require? (3) Compute the test statistic by hand. (4) At α = 0.05, find the critical value using the simulation. (5) Does the test statistic fall inside or outside the rejection region? What would this suggest about the company's claim?
A pharmaceutical company claims their generic aspirin tablet contains μ = 500 mg of active ingredient. Two inspectors each test a batch. Inspector A samples n = 25 tablets and finds x̄ = 502 mg (σ = 10 mg). Inspector B samples n = 2,500 tablets and finds x̄ = 502 mg (same σ = 10 mg). Both have the same effect size: the sample mean is 2 mg above the claim. Compute the test statistic for each inspector. Then explain in two sentences why Inspector B's test statistic is much larger than Inspector A's, even though both measured the same 2 mg difference. What does this tell you about the relationship between sample size, statistical significance, and whether a 2 mg difference actually matters for patients?
Instructor Notes
Teaching Notes
This simulation works best when you let students drag the x̄ slider before explaining the formula. The visual of the red test statistic line moving through the rejection region while the dark critical value lines hold still creates an immediate, physical distinction between data-driven and threshold-driven quantities. Once students see that the two lines respond to different inputs, the conceptual separation sticks.
The step-by-step formula display updates in real time. Use it to show students that the denominator is σ/√n (the standard error), not σ. Before students change n, ask them to predict what will happen as it increases: the standard error shrinks, the magnitude of the test statistic grows, and the same x̄ − μ₀ difference becomes "more detectable." This is where sampling-distribution theory pays off: because the standard error is σ/√n, larger samples produce tighter sampling distributions of x̄.
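To make the prediction concrete, this Python sketch holds the Quick Check inspector's x̄ = 50.4 mm, μ₀ = 50 mm, and σ = 1.2 mm fixed and varies only n. The sample sizes 9, 72, and 144 are illustrative choices and are not simulation presets.

```python
import math

x_bar, mu_0, sigma = 50.4, 50.0, 1.2  # inspector example from the Quick Check

results = {}
for n in (9, 36, 72, 144):
    se = sigma / math.sqrt(n)   # standard error shrinks as n grows
    z = (x_bar - mu_0) / se     # same numerator, smaller denominator
    results[n] = (se, z)
    print(f"n = {n:>3}: SE = {se:.4f}, z = {z:.4f}")
```

Because z is proportional to √n when x̄, μ₀, and σ are held fixed, quadrupling n from 36 to 144 doubles z from 2.0 to 4.0. Doubling n to 72 multiplies z by √2, giving about 2.83.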
Common Student Errors
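- Using σ instead of σ/√n in the denominator. For any n > 1, this makes the denominator too large and shrinks the test statistic by a factor of √n, so students fail to reject H₀ even when the data provide strong evidence against it. The simulation labels the denominator "Standard Error" every time to reinforce the correct formula.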
- Confusing the test statistic with the critical value. Both are z-values, both appear on the same axis, and students sometimes use the same variable name for both. The color and label distinction (red "Test Statistic z" vs dark slate "Critical Value z*") addresses this.
- Ignoring the sign of the test statistic. For left-tailed tests, only negative z values can fall in the rejection region, and for right-tailed tests, only positive ones can. A positive test statistic in a left-tailed test means the sample mean landed on the opposite side of μ₀ from what H₁ claims, so H₀ cannot be rejected.
- Believing that a larger test statistic always means a larger real-world effect. The Challenge tier directly addresses this by showing that n = 2,500 produces z = 10 from a 2 mg difference.
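Both the sign issue and the distinction between the test statistic and the critical value can be expressed as a small decision rule. This is a Python sketch, not the simulation's own code. It uses the standard library's statistics.NormalDist to find z*, and the tail labels "left", "right", and "two" are our own naming. It also counts a z that falls exactly on the boundary as inside the rejection region, which is a common convention.

```python
from statistics import NormalDist


def critical_value(alpha: float, tail: str) -> float:
    """Return the positive critical value z* for tail 'left', 'right', or 'two'."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
    if tail in ("left", "right"):
        return std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    raise ValueError("tail must be 'left', 'right', or 'two'")


def in_rejection_region(z: float, alpha: float, tail: str) -> bool:
    """Compare the data-driven z with the threshold z* for the chosen test type."""
    z_star = critical_value(alpha, tail)
    if tail == "left":
        return z <= -z_star       # only negative z can land here
    if tail == "right":
        return z >= z_star        # only positive z can land here
    return abs(z) >= z_star       # two-tailed: either side counts


for tail in ("left", "right", "two"):
    print(tail, round(critical_value(0.05, tail), 3), in_rejection_region(2.0, 0.05, tail))
```

For example, at α = 0.05 a test statistic of z = 2.0 lands in the rejection region for a right-tailed test (z* ≈ 1.645) or a two-tailed test (z* ≈ 1.960). It never does so for a left-tailed test, where only z ≤ −1.645 leads to rejection.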
Discussion Questions
- If two researchers study the same claim with the same σ and μ₀ but different sample sizes, can they get different test statistics? Can they reach different conclusions? What does this tell you about the role of sample size in hypothesis testing?
- A test statistic of z = 0 means the sample mean equals the claimed population mean exactly. Does that prove the null hypothesis is true? Why or why not?
- In the "Vitamin Dosage" preset, the test statistic is about 2.12 from a difference of only 0.3 mg. Would you call this a practically meaningful difference? What additional information would you need to decide?
Exam Connection
Typical exam questions give x̄, μ₀, σ, and n, and ask students to compute the test statistic. The most common error is using σ in the denominator instead of σ/√n. The formula display in this simulation mirrors the step-by-step work students must show on exams: compute the standard error first, then the numerator, then divide. Some exam items also ask students to compare two test statistics with different sample sizes, which is the Challenge tier exercise.
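Here is the same exam workflow (standard error, numerator, divide) applied to the two Challenge tier inspectors, with the common σ-only mistake computed alongside for contrast. This is a minimal Python sketch, and the variable names are our own.

```python
import math

# Challenge tier values (mg): claimed mean, known sigma, and the shared sample mean
mu_0, sigma, x_bar = 500.0, 10.0, 502.0

results = {}
for label, n in (("Inspector A", 25), ("Inspector B", 2500)):
    standard_error = sigma / math.sqrt(n)  # step 1: standard error sigma / sqrt(n)
    numerator = x_bar - mu_0               # step 2: x-bar minus mu_0
    z = numerator / standard_error         # step 3: divide
    z_wrong = numerator / sigma            # common error: sigma alone in the denominator
    results[label] = (standard_error, z, z_wrong)
    print(f"{label}: n = {n}, SE = {standard_error:.2f}, z = {z:.2f} (sigma-only: {z_wrong:.2f})")
```

The correct calculation gives z = 1.00 for Inspector A and z = 10.00 for Inspector B. The σ-only version gives 0.20 for both, so dropping √n both shrinks the test statistic and hides the effect of sample size entirely.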