Understanding Confidence Intervals

Learning Objective

Understand and compare confidence intervals in context

Why This Matters

When a poll reports "52% support, margin of error ±3%, 95% confidence," most news coverage translates this as "there's a 95% chance true support is between 49% and 55%." That translation is among the most common statistical misquotes in public discourse -- the 95% describes how often the polling methodology captures the true value across many polls, not the probability that this one poll got it right. Every confidence interval you read in a research paper, drug trial, or product spec uses this same logic, and misreading it leads to overconfidence in single results.

How to Use This Simulation

Click "Draw New Samples" and watch 100 confidence intervals appear, each built from a fresh random sample of the same population.
Count the green intervals (captured μ) and red intervals (missed μ). Compare the capture count to the confidence level.
Click any interval to inspect its bounds, sample mean, and capture status in the results panel below.
Change the confidence level or sample size, then draw again. Watch how the capture rate and interval width respond.
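
For readers who want to reproduce the steps above outside the browser, here is a minimal Python sketch. It is not the simulation's own code: the population (normal, mean 100, standard deviation 15), the sample size of 30, and the known-sigma z-interval are all assumptions chosen for illustration, and the function name simulate_intervals is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def simulate_intervals(mu=100.0, sigma=15.0, n=30, level=0.95,
                       n_intervals=100, seed=None):
    """Build n_intervals z-intervals; return (capture count, interval width)."""
    rng = random.Random(seed)
    # Critical value z* for a two-sided interval at the requested level.
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    # Assumption: sigma is known, so every interval has the same half-width.
    half_width = z_star * sigma / n ** 0.5
    captures = 0
    for _ in range(n_intervals):
        # Fresh random sample from the same population each time.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # "Green" interval: the true mean lies inside [xbar - hw, xbar + hw].
        if xbar - half_width <= mu <= xbar + half_width:
            captures += 1
    return captures, 2 * half_width


# Three independent runs, like clicking "Draw New Samples" three times.
for run in range(3):
    captures, width = simulate_intervals(seed=run)
    print(f"run {run}: {captures}/100 captured, width {width:.2f}")
```

Changing level or n in the call mirrors the confidence-level and sample-size controls: the capture count should hover near the chosen level, while the width responds to both settings.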

Quick Check

A researcher reports a 95% confidence interval of [120, 140] for the mean systolic blood pressure of adults in a city. A health department official says, "This means 95% of adults in this city have blood pressure between 120 and 140." Which response is correct?

Try This

Run the simulation with the default 95% confidence level and all 100 intervals drawn. Count the green intervals. Is the count exactly 95? Run it two more times and record each capture count. In one sentence, explain why the count changes from run to run but stays near 95.

Run the simulation at 90%, then 95%, then 99% confidence. For each level, record the capture count and the average interval width shown in the results panel. Explain in one sentence the tradeoff between confidence level and interval width. Then run the 95% level three more times and record the three capture counts. Why doesn't a count of 92 in one run mean the procedure is broken?

A clinical researcher reports: "We constructed a 95% confidence interval of [2.1, 5.8] mg/dL for the mean cholesterol reduction from our new drug. There is a 95% probability that the true reduction is in this range." (1) Identify the interpretation error in the researcher's statement. (2) Write a corrected one-sentence interpretation. (3) If a second team repeated the study with a new sample of the same size, would they get the same interval? Explain why that matters for how you read the first team's result.

Instructor Notes

Teaching Notes

This simulation works best when you let students run it once at 95% confidence and count the captures themselves before explaining the interpretation. The visceral experience of watching ~5 intervals miss is what makes the "procedure, not interval" distinction stick. Have students predict the count before drawing, then compare prediction to result.

The re-run button is pedagogically essential. Students need to see the count change -- 93, then 96, then 95, then 91 -- to understand that "95%" is a long-run average, not an exact guarantee. If a student sees 100/100 on their first run (rare but possible), have them run again immediately.
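
To put rough numbers on "rare but possible," the sketch below treats the capture count as a binomial random variable. That is an assumption: it holds if each of the 100 intervals captures μ independently with probability 0.95. The helper name binom_pmf is hypothetical.

```python
from math import comb, sqrt


def binom_pmf(k, n=100, p=0.95):
    """Probability of exactly k captures out of n independent intervals."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


n, p = 100, 0.95
expected = n * p                      # long-run average capture count
sd = sqrt(n * p * (1 - p))            # typical run-to-run spread
p_all = binom_pmf(100)                # chance of a perfect 100/100 run
p_92_or_fewer = sum(binom_pmf(k) for k in range(0, 93))

print(f"expected captures: {expected:.0f}, standard deviation: {sd:.2f}")
print(f"P(100/100)     = {p_all:.4f}")
print(f"P(92 or fewer) = {p_92_or_fewer:.3f}")
```

Under these assumptions a perfect run has probability of about 0.6%, and a count of 92 or lower turns up in roughly one run in eight, so a sequence like 93, 96, 95, 91 is unremarkable.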

Common Student Errors

Saying "there's a 95% probability that μ is in this interval" -- the canonical misconception. The simulation directly confronts this by showing that once an interval is built, it either captures μ (green) or it doesn't (red). In the frequentist framework μ is a fixed value, so there's no probability left to assign after the fact.
Confusing the confidence interval with a prediction interval. Students think the CI tells them where 95% of individual values fall. The CI targets the population mean, not individual data points.
Believing that a higher confidence level always produces a "better" interval. It produces a wider interval that captures more often, but the precision tradeoff means you know less about where μ actually sits.
Thinking that if one run produces 92/100 captures, the procedure is broken. The 95% is a long-run rate; individual runs exhibit natural binomial variation.
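
The width tradeoff in the list above can be made concrete with a short calculation. This is a sketch under stated assumptions: a z-interval with known population standard deviation, where sigma = 15 and n = 30 are illustrative values and z_interval_width is a hypothetical helper, not part of the simulation.

```python
from statistics import NormalDist


def z_interval_width(level, sigma=15.0, n=30):
    """Full width of a known-sigma z-interval for the mean."""
    # Two-sided critical value: larger for higher confidence levels.
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z_star * sigma / n ** 0.5


# Higher confidence widens the interval at a fixed sample size.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: width {z_interval_width(level):.2f}")

# Larger samples narrow the interval at a fixed confidence level.
for n in (30, 120):
    print(f"n = {n}: width {z_interval_width(0.95, n=n):.2f}")
```

Moving from 95% to 99% widens the interval by about 31% here, while quadrupling the sample size halves it. That contrast is one way to show students that extra confidence is paid for in precision unless more data are collected.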

Discussion Questions

If you could only run one study and build one confidence interval, why does it matter that the procedure works 95% of the time across many hypothetical studies? How does that help you trust your single result?
A pharmaceutical company reports a 95% CI of [-0.3, 4.7] for a drug's effect. The interval includes 0 (no effect). A 99% CI from the same data would be even wider and would also include 0. Does increasing the confidence level help or hurt this company's argument? Why?
Why do political polls report margins of error but not the number of intervals that would miss across repeated polls? What would change if news reports included that framing?

Exam Connection

The most common exam question on this topic presents a specific confidence interval and four interpretation statements, asking students to identify the correct one. The Quick Check mirrors the typical exam structure. Emphasize to students that the correct interpretation always references the procedure ("if we repeated this process many times...") rather than assigning probability to the specific interval.