# Visually assessing normality

Purpose: To determine whether sample data plausibly come from a Normal distribution or, equivalently, whether a random variable plausibly has a Normal distribution. (Loosely speaking, this is often described as deciding whether a population is Normal.)

Procedure: Use descriptive statistics and graphs to determine whether your sample has the characteristics of Normally distributed data. Here are a couple of checklists:

Characteristics of data from Normal populations:

1. One hump (check the histogram or, better yet, the histogram with a Normal curve overlaid)
2. Tops of the bars of the histogram follow the Normal curve pretty well (the tops can sometimes be above the Normal curve and sometimes below it, but in a mostly random way).
3. Approximately symmetric (histogram, box-and-whiskers plot, Q-Q plot, lengths of tails)
4. Outliers are under control:
    1. No more than about 5% of the data are outliers (SPSS box-and-whiskers plot or "Tukey's hinges," that is, the 1.5 × IQR rule)
    2. No more than about 1% of the data are extreme outliers
    3. Outliers are balanced (more or less symmetric)
5. The Q-Q plot shows a little random wriggle about the line, but no real curvature. (Points at either end can be a little farther away from the main linear pattern than points near the middle.)
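
The outlier checks in item 4 can be roughed out in code. The following is a minimal sketch, not a replacement for the SPSS output: the function name `outlier_summary` is hypothetical, the quartiles come from Python's `statistics.quantiles` (which can differ slightly from Tukey's hinges), and the 3 × IQR cutoff for "extreme" outliers is an assumed convention, since the checklist does not define it.

```python
import statistics

def outlier_summary(data):
    """Count outliers (1.5 x IQR rule) and extreme outliers (assumed 3 x IQR rule)."""
    # Quartiles by the "inclusive" method; an assumption, close to but not
    # always identical to Tukey's hinges.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # assumed "extreme" fences

    n = len(data)
    low = sum(x < inner_lo for x in data)    # outliers below the lower fence
    high = sum(x > inner_hi for x in data)   # outliers above the upper fence
    extreme = sum(x < outer_lo or x > outer_hi for x in data)

    return {
        "outlier_fraction": (low + high) / n,   # checklist: about 5% or less
        "extreme_fraction": extreme / n,        # checklist: about 1% or less
        "low_outliers": low,                    # compare low vs. high for balance
        "high_outliers": high,
    }

# Hypothetical data: twenty evenly spaced values plus one very large value.
sample = list(range(1, 21)) + [100]
summary = outlier_summary(sample)
print(summary)
```

For this made-up sample, the single large value lands beyond both fences on the high side with nothing on the low side, which is the kind of unbalanced pattern the checklist warns about.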

Indications of non-Normality (in order of seriousness):

1. More than one hump
2. Too many extreme outliers
3. Too many outliers
4. Outliers are unbalanced
5. The data are "strongly" skewed
6. The Q-Q plot shows definite curvature, whether upward, downward, or one followed by the other.
7. Bad hump (the middle of the histogram has several bars in a row that are all too tall or all too short).
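
To see what a Q-Q plot is actually comparing, here is a minimal sketch that computes its points by hand. The function name `qq_points` is hypothetical, and the plotting position `(i - 0.5) / n` is only one common convention; statistical packages (SPSS included) may use a slightly different formula, so the points will not match their output exactly.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (theoretical z, observed z) pairs for a Normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    m, s = mean(xs), stdev(xs)   # sample mean and sample standard deviation
    std_normal = NormalDist()    # standard Normal: mean 0, sd 1

    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n                     # assumed plotting position
        theoretical = std_normal.inv_cdf(p)   # z-score a Normal sample "should" have
        observed = (x - m) / s                # z-score the datum actually has
        points.append((theoretical, observed))
    return points

# Hypothetical small sample.
pts = qq_points([2, 4, 4, 5, 7, 9, 11])
for theoretical, observed in pts:
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

Because both coordinates are z-scores, roughly Normal data should fall near the line y = x with only random wriggle; a steady bend away from that line is the curvature described in item 6.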

Conclusion: When deciding whether data come from a Normal population, base your decision on the totality of the evidence; don't look at just one or two things. If you find no outliers or skewness worth worrying about, and the plots look okay to you, then say that your data are approximately Normal. At that point, my students have my permission to use z-scores for calculations involving probabilities. Otherwise, my students do not have permission to use z-scores for anything other than telling how far from center a given datum is, and they must not rely exclusively on the mean and standard deviation for descriptive purposes.
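
As an illustration of the kind of z-score probability calculation this permission covers, here is a minimal sketch. The function name `prob_below` and the sample values are hypothetical, and the result is only meaningful if the data have already passed the checks above.

```python
from statistics import NormalDist, mean, stdev

def prob_below(data, x):
    """Return (z, P(X < x)), assuming the data are approximately Normal."""
    m = mean(data)
    s = stdev(data)        # sample standard deviation
    z = (x - m) / s        # how many standard deviations x lies from center
    return z, NormalDist().cdf(z)   # area under the standard Normal curve left of z

# Hypothetical sample, assumed to have already been judged approximately Normal.
sample = [61, 64, 66, 67, 68, 69, 70, 71, 72, 74, 77]
z, p = prob_below(sample, 69)
print(f"z = {z:.2f}, P(X < 69) = {p:.3f}")
```

If the data are not approximately Normal, the z-score itself is still a valid measure of distance from center, but the probability returned by the Normal curve should not be trusted.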
