Skewness and Standard Deviation

Learning Objectives

Determine if a data set is skewed
Compute variance and standard deviation
Interpret the standard deviation of a set of data
Compute z-scores and use them to compare values from different data sets

Why This Matters

College admissions officers regularly compare a 1280 SAT against a 28 ACT and have to decide which student performed better against their own peer group. The two tests have different scales, different means, and different standard deviations, so the raw numbers cannot be compared directly, but their z-scores can. The same standardization shows which salary is higher relative to its industry, which video game high score really stands out for its difficulty mode, and which 401(k) contribution is more disciplined for its age bracket.

How to Use This Simulation

Use the Distribution Explorer tab to switch between preset shapes -- watch the mean, median, and standard deviation respond differently to skew.
Open the Z-Score Comparison tab to standardize values from different distributions and see which one is actually more impressive.
Edit data points to test how a single extreme value inflates the SD while barely moving the median.
Read the Explanation Panel below the workspace -- it updates as you interact and connects each preset to the misconception it surfaces.
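
The extreme-value experiment above can also be tried outside the simulation. The following is a minimal sketch using Python's standard library; the wait times are hypothetical and are not one of the simulation's presets.

```python
from statistics import mean, median, stdev

# Hypothetical wait times in minutes (not one of the simulation's presets).
baseline = [4, 5, 5, 6, 6, 7, 7, 8, 9]
# Same data with the largest value replaced by one extreme observation.
with_outlier = baseline[:-1] + [40]

for label, data in (("baseline", baseline), ("with outlier", with_outlier)):
    print(f"{label:>12}: mean={mean(data):.2f}  median={median(data)}  sd={stdev(data):.2f}")
```

In this example the median is 6 in both versions, while the sample SD grows several times over and the mean is pulled toward the extreme value.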

[Interactive simulation. Distribution Explorer: Preset Dataset selector (default: Symmetric). Z-Score Comparison: Scenario selector with two panels, Maya (SAT) and Jordan (ACT), each taking a population mean μ, a population SD σ, and a value x, and reporting z.]

Caveat: Z-scores compare position in standard deviations from the mean. They translate to a percentile rank (rarity) only when the underlying distribution is approximately normal. In a heavily skewed or bimodal distribution, the same z-score can correspond to a different percentile.
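
To see the caveat in numbers, the sketch below compares the percentile a normal model assigns to a z-score with the share of a skewed sample that actually falls at or below that point. The exam scores and the helper name `share_at_or_below` are made up for illustration; only `NormalDist`, `mean`, and `stdev` come from Python's standard library.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical left-skewed exam scores: most students did well, a few struggled.
scores = [95, 94, 93, 92, 92, 91, 90, 90, 89, 88, 87, 85, 60, 45, 30]

def share_at_or_below(data, z):
    """Fraction of the data at or below the point z sample SDs from the mean."""
    cutoff = mean(data) + z * stdev(data)
    return sum(x <= cutoff for x in data) / len(data)

for z in (-1.0, 0.5, 1.0):
    normal_share = NormalDist().cdf(z)           # percentile under a normal model
    skewed_share = share_at_or_below(scores, z)  # percentile in the skewed sample
    print(f"z={z:+.1f}  normal={normal_share:.3f}  skewed sample={skewed_share:.3f}")
```

In this made-up sample the point one SD above the mean lies beyond the highest score, so z = 1.0 sits at the 100th percentile rather than the roughly 84th percentile a normal model gives.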

[Statistics cards: Mean, Sample SD, Sample Variance, Skewness. Explanation Panel: What's Happening.]

Quick Check

Maya scored z = 1.5 on the SAT, where scores are approximately normally distributed. Jordan scored z = 1.5 on a final exam in a class where most students did very well and a small handful struggled, producing a heavily left-skewed distribution. Are Maya and Jordan equally rare relative to their respective populations?

Try This

Load the Coffee Wait Times preset. Calculate the sample standard deviation by hand using the formula:

SD = √[ Σ(x − x̄)² / (n − 1) ]

Show your squared deviations for the first three data points. Then verify your final SD matches the simulation's Sample SD card to four decimal places.
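
If you want to check your hand calculation step by step, a sketch like the following mirrors the formula above. The wait times here are hypothetical placeholders, since the preset's values live in the simulation; substitute the actual Coffee Wait Times data before comparing against the Sample SD card.

```python
from math import sqrt
from statistics import stdev

# Hypothetical wait times in minutes; replace with the preset's actual values.
wait_times = [2.0, 3.5, 4.0, 4.5, 6.0, 10.0]

n = len(wait_times)
x_bar = sum(wait_times) / n                          # sample mean
squared_devs = [(x - x_bar) ** 2 for x in wait_times]
sample_variance = sum(squared_devs) / (n - 1)        # divide by n - 1, not n
sample_sd = sqrt(sample_variance)

# Show the squared deviations for the first three data points.
for x, d2 in list(zip(wait_times, squared_devs))[:3]:
    print(f"x = {x}  (x - x_bar)^2 = {d2:.4f}")
print(f"sample SD = {sample_sd:.4f}")

# Cross-check against the standard library's sample SD.
print(f"statistics.stdev = {stdev(wait_times):.4f}")
```

The final two printed values should agree; if your hand result differs, the most common cause is dividing by n instead of n − 1.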

A student scored 88 on a biology midterm where the class mean was 75 with SD = 10. The same student scored 92 on a chemistry midterm where the class mean was 85 with SD = 6.

Calculate both z-scores by hand using z = (x − μ) / σ. In which class did the student perform more strongly relative to peers? Verify both z-scores using the Z-Score Comparison tab (set the Scenario to "Custom" and enter the values).

You're a college admissions reader. Maya took the SAT and scored 1280 (national mean 1050, SD 200). Jordan took the ACT and scored 28 (national mean 21, SD 5). Both submitted identical GPAs and similar essays.

Standardize both scores using the simulation's Z-Score Comparison tab. Write a one-paragraph recommendation that:

Names the two z-scores you calculated
Identifies which applicant performed more strongly against their own test population
Notes one limitation of using z-scores alone when the underlying distributions might not be perfectly normal

Instructor Notes

Teaching Notes

The aha moment lives in the Z-Score Comparison tab. Have students predict which is more impressive -- a 1280 SAT or a 28 ACT -- before standardizing. Most will say SAT because the number is bigger. The 28 ACT wins on z-score (1.40 versus 1.15 with the means and SDs given in the admissions exercise), and the moment of standardization makes the abstract formula concrete. Consider asking students to predict, calculate, then explain the reversal in their own words.
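
For instructors who want to confirm the reversal before class, here is a short sketch of the calculation using the means and SDs stated in the admissions exercise. The function name `z_score` is an illustrative choice, not part of the simulation.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

maya_z = z_score(1280, mu=1050, sigma=200)   # SAT: national mean 1050, SD 200
jordan_z = z_score(28, mu=21, sigma=5)       # ACT: national mean 21, SD 5

print(f"Maya (SAT):   z = {maya_z:.2f}")
print(f"Jordan (ACT): z = {jordan_z:.2f}")
```

Jordan's score sits further above its own test's mean in SD units, even though 1280 is the larger raw number.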

The bimodal preset is the cleanest way to surface the misconception that "SD describes spread, full stop." Students see a large SD and a near-zero skew, but the histogram clearly shows two clusters -- neither the mean nor the SD captures what's actually going on. Use this to introduce the limits of single-number summaries.
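
The same point can be reproduced with a tiny made-up dataset. The sketch below assumes a moment-based skewness, m3 / m2^1.5; the simulation's Skewness card may use a different (for example, bias-adjusted) formula, so treat the exact value as illustrative.

```python
from statistics import mean, median, stdev

# Hypothetical bimodal data: two tight clusters around 10 and 30.
data = [9, 10, 10, 11, 29, 30, 30, 31]

def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (an assumed formula, see lead-in)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

print(f"mean = {mean(data)}, median = {median(data)}")
print(f"sample SD = {stdev(data):.2f}, skewness = {skewness(data):.2f}")

# Distance from the mean to the nearest actual observation.
print(f"closest value to the mean is {min(abs(x - mean(data)) for x in data)} away")
```

Here the mean and median coincide and the skewness is zero, yet no observation lies anywhere near the mean: the summary numbers describe a center where no data exists.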

Common Student Errors

Confusing variance with standard deviation. Variance is in squared units (minutes², dollars²); SD is in the original units. Students often try to interpret variance directly, but squared units have no intuitive reading ("the variance is 25 squared minutes" tells a reader little until the square root gives an SD of 5 minutes).
Naming skewness after the peak instead of the tail. Right-skew = tail extends right, peak on the left. The mean gets pulled toward the tail.
Treating z-scores as a universal rarity measure. A z-score of 2 corresponds to roughly the 97.7th percentile only when the distribution is normal. In skewed or bimodal data, the percentile differs.
Forgetting Bessel's correction (n − 1) when computing sample SD. This sim defaults to sample SD throughout.
Reading a near-zero mean-median gap on bimodal data as proof that "the data is symmetric and bell-shaped." The numerical descriptors miss the shape.
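
Two of the errors above, mixing up variance with SD and forgetting the n − 1 divisor, are easy to demonstrate side by side. This is a minimal sketch with hypothetical commute times, using the sample and population functions from Python's `statistics` module.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical commute times in minutes.
minutes = [10, 15, 20, 25, 30]

sample_var = variance(minutes)       # divides by n - 1; units are minutes squared
sample_sd = stdev(minutes)           # square root of sample_var; units are minutes
population_var = pvariance(minutes)  # divides by n
population_sd = pstdev(minutes)      # smaller than sample_sd for the same data

print(f"sample variance     = {sample_var} min^2")
print(f"sample SD           = {sample_sd:.3f} min")
print(f"population variance = {population_var} min^2")
print(f"population SD       = {population_sd:.3f} min")
```

A student who divides by n will report the population figures and come out slightly low compared with a tool that defaults to sample SD.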

Discussion Questions

Why do most published reports on income use median rather than mean, but most reports on test scores use mean? What does this say about the assumed distribution shape in each context?
If a value has z = 0, what does that tell you about the value? What does it tell you about the value's percentile rank, and what extra information would you need to know the percentile?
Can a dataset have a small SD but be spread across a wide range of values? (Hint: compare a large dataset that is tightly packed except for one or two extreme values with bimodal data whose two tight clusters sit far apart.)

Exam Connection

Typical exam questions present two values from different distributions and ask which is more impressive. Students must standardize each and compare z-scores. Other formats give a dataset and ask students to identify skew direction, compute SD by hand using the formula, and interpret the SD as "typical distance from the mean." Some questions deliberately use bimodal data to test whether students recognize when SD is misleading.