Sampling Errors, Bias & Variables
Learning Objectives
- Identify sampling errors and bias
- Identify explanatory and response variables in an experiment
- Define and distinguish between qualitative, quantitative, discrete, and continuous variables
Why This Matters
In 1936, Literary Digest magazine mailed roughly 10 million mock ballots, received about 2.4 million responses, and confidently predicted that Alf Landon would defeat Franklin Roosevelt in the presidential election. Roosevelt won 46 of 48 states. The magazine's mailing list was drawn from phone directories, car registrations, and its own subscriber list - sources that over-represented wealthier Americans during the Great Depression. The same principle holds today: every political poll, A/B test, and clinical trial lives or dies on whether its sample actually represents the population it claims to describe.
How to Use This Simulation
- In the Sampling Simulator tab, click "Draw Samples" to pull a biased and an unbiased sample from the same population. Watch how their means compare to the population mean.
- Draw multiple samples and watch the bias tracker - sampling error scatters randomly, but bias stays tilted in the same direction every time.
- Switch to the Variable Classification tab to practice identifying variable roles and data types across 10 research scenarios.
- Check the Explanation Panel below - it updates as you interact and names the distinction between sampling error and bias.
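The contrast between sampling error and bias described above can also be sketched in a few lines of Python. This is a minimal illustration, not the simulator's actual code: the population (10,000 incomes in $1,000s), the selection weights, and the function name signed_differences are all assumptions made up for the example. The biased sampler reaches higher values more often, which is one simple way to model an over-represented group.

```python
import random
import statistics

def signed_differences(population, weights, sample_size, draws, rng):
    """Return (random_diffs, biased_diffs): sample mean minus population mean, per draw."""
    pop_mean = statistics.mean(population)
    random_diffs, biased_diffs = [], []
    for _ in range(draws):
        # Simple random sample: every unit has the same chance of selection.
        srs = rng.sample(population, sample_size)
        # Biased sample: selection probability is proportional to the unit's weight
        # (drawn with replacement to keep the sketch short).
        biased = rng.choices(population, weights=weights, k=sample_size)
        random_diffs.append(statistics.mean(srs) - pop_mean)
        biased_diffs.append(statistics.mean(biased) - pop_mean)
    return random_diffs, biased_diffs

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 incomes in $1,000s.
population = [rng.gauss(50, 15) for _ in range(10_000)]
# Assumed selection mechanism: higher incomes are more likely to be reached.
weights = [max(x, 1.0) for x in population]

random_diffs, biased_diffs = signed_differences(population, weights, 100, 200, rng)

print("average signed difference, random samples:", round(statistics.mean(random_diffs), 2))
print("average signed difference, biased samples:", round(statistics.mean(biased_diffs), 2))
print("share of biased samples above the population mean:",
      sum(d > 0 for d in biased_diffs) / len(biased_diffs))
```

Under these assumptions, the random-sample differences should land on both sides of zero and roughly cancel, while the biased-sample differences should sit almost entirely on the positive side.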
Every variable in a study can be classified along multiple dimensions: its role in the study (explanatory or response) and its data type (qualitative or quantitative; if quantitative, discrete or continuous). These classifications determine which statistical methods apply. Classify both variables in each scenario below.
Note: "Explanatory" and "response" are sometimes called "independent" and "dependent" in other textbooks.
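One way to keep the two dimensions separate is to record them as independent fields. The sketch below is only an illustration: the class names (Role, DataType, Scale, Variable) and the fertilizer scenario are hypothetical and are not taken from the simulator's 10 scenarios. The validation step encodes the rule that discrete/continuous applies only to quantitative variables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Role(Enum):
    EXPLANATORY = "explanatory"
    RESPONSE = "response"

class DataType(Enum):
    QUALITATIVE = "qualitative"
    QUANTITATIVE = "quantitative"

class Scale(Enum):
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"

@dataclass(frozen=True)
class Variable:
    name: str
    role: Role                      # what the variable does in the study
    data_type: DataType             # what kind of values it takes
    scale: Optional[Scale] = None   # only meaningful for quantitative variables

    def __post_init__(self):
        # Role and data type are independent; scale depends on data type.
        if self.data_type is DataType.QUALITATIVE and self.scale is not None:
            raise ValueError("discrete/continuous applies only to quantitative variables")
        if self.data_type is DataType.QUANTITATIVE and self.scale is None:
            raise ValueError("a quantitative variable must be discrete or continuous")

    def describe(self) -> str:
        parts = [self.role.value, self.data_type.value]
        if self.scale is not None:
            parts.append(self.scale.value)
        return f"{self.name}: " + ", ".join(parts)

# Hypothetical scenario: does fertilizer type affect plant height?
fertilizer = Variable("fertilizer type", Role.EXPLANATORY, DataType.QUALITATIVE)
height = Variable("plant height (cm)", Role.RESPONSE, DataType.QUANTITATIVE, Scale.CONTINUOUS)

print(fertilizer.describe())
print(height.describe())
```

Because role and data type are separate fields, a variable such as fertilizer type can be qualitative and explanatory at the same time without any conflict.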
What's Happening
Quick Check
A survey asks college students to rate their campus dining hall on a scale of 1 to 5 stars. A student calculating the results reports the mean rating as 3.2 stars. Their statistics professor says the mean may not be the most appropriate measure for this variable. Why?
Try This
A phone survey calls 1,000 randomly selected households between 10 AM and 2 PM on weekdays and asks about employment status and daily screen time. (1) Identify whether sampling bias is present and what kind. (2) If researchers want to study whether employment status predicts screen time, identify the explanatory and response variables. (3) Classify each variable as qualitative or quantitative, and if quantitative, discrete or continuous. Verify your classifications by comparing to a similar preset in the simulator.
Two studies investigate how much sleep college students get per night. Study A hands out surveys to students leaving a campus gym at 6 AM. Study B emails a survey link to 500 randomly selected student email addresses; 200 respond. (1) Name the sampling method in each study. (2) Describe the likely direction of bias in each - would each method overestimate or underestimate average sleep? (3) Classify the variables in each study. (4) Explain in one sentence why the same question ("how much do students sleep?") produces different answers depending on the sampling method.
A news headline reads: "Fitness app users walk 40% more than the average American, study finds." The study analyzed step-count data from 50,000 users of a popular fitness tracking app. (1) Classify the study design (observational or experiment). (2) Identify the sampling method and describe at least two bias risks. (3) Identify the variables and classify each by role and data type. (4) Write one sentence stating what the study's design and sample actually support, and one sentence stating what the headline implies but the data cannot confirm. (5) Propose one specific change to the study that would make the headline's claim stronger.
Instructor Notes
Teaching Notes
This simulation is most effective when you let students draw 5-10 samples before explaining the distinction between sampling error and sampling bias. The bias tracker dot chart makes the distinction visible: signed differences from random samples scatter in both directions and average toward zero, while signed differences from biased samples cluster on one side. Let students describe the pattern they see before naming it.
The variable classification tab addresses a separate but related objective. Students who can identify "qualitative" and "quantitative" in isolated examples often struggle when a variable uses numbers but represents categories (star ratings, zip codes, jersey numbers). Scenario 8 (driver rating, 1-5 stars) is designed to surface this confusion.
Common Student Errors
- Confusing "sampling error" (natural random variability) with "an error in sampling" (a mistake). The term sounds like something went wrong, but sampling error is expected and unavoidable.
- Believing a large sample automatically fixes bias. The Literary Digest tallied 2.4 million responses and still got the 1936 election wrong - a larger sample drawn with a biased method reduces random scatter but leaves the bias in place.
- Treating "qualitative" and "explanatory" as mutually exclusive categories. A variable's role (explanatory/response) is independent of its data type (qualitative/quantitative). Treatment group is both qualitative AND explanatory.
- Classifying any numbered variable as quantitative. Star ratings, rankings, and category codes use numbers but represent categories, not quantities.
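For students who hold the second error above (that a large sample fixes bias), a short simulation can make the point concrete. The sketch below is a classroom illustration under stated assumptions, not a reconstruction of the 1936 poll: the electorate of 100,000 voters, the 40% true support, and the threefold over-selection of one candidate's supporters are all hypothetical numbers chosen for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical electorate of 100,000 voters: 1 = supports candidate A, 0 = candidate B.
population = [1] * 40_000 + [0] * 60_000
true_share = sum(population) / len(population)  # 40% support A

# Assumed bias: A's supporters are three times as likely to end up in the sample.
reach_weights = [3 if vote == 1 else 1 for vote in population]

def random_estimate(n):
    """Estimated share for A from a simple random sample of size n."""
    return sum(rng.sample(population, n)) / n

def biased_estimate(n):
    """Estimated share for A when selection favors A's supporters (with replacement)."""
    return sum(rng.choices(population, weights=reach_weights, k=n)) / n

results = {n: (random_estimate(n), biased_estimate(n))
           for n in (100, 1_000, 10_000, 50_000)}

for n, (fair, skewed) in results.items():
    print(f"n={n:>6}: random={fair:.3f}  biased={skewed:.3f}  truth={true_share:.3f}")
```

With these weights, the biased estimate should settle near 3(0.4) / (3(0.4) + 0.6), about 0.667, as the sample grows. The larger sample makes the estimate more stable, but it stabilizes around the wrong value and predicts the wrong winner.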
Discussion Questions
- If you surveyed your campus by standing outside the library, which student populations would be over-represented? Under-represented? How would this affect conclusions about study habits?
- A fitness app company reports that their users average 8,500 steps per day. Why might this number not represent the average American's activity level? What kind of bias is at work?
- Can a study have both selection bias AND response bias at the same time? Give an example.
Exam Connection
Typical exam questions present a research scenario and ask students to (1) identify the type of bias present, (2) classify variables by role and data type, and (3) explain whether the study's conclusions are justified. The Starter challenge directly practices this format. The Challenge tier extends to evaluating headlines against study designs, which appears in more advanced exam questions.