There is variation in measurement. Some of that variation is random. This is the entire reason for the existence of the discipline of Statistics!

# Bias in statistics

On the previous page, we found that while sample statistics vary, we can hope that the variation decreases as the sample size increases. What about bias? We want statistics that come as close as possible to the parameters they estimate. If they vary little but are wrong both individually and on average, then they are of little use to us.

#### The Good News

Many of the statistics humans use are unbiased, which essentially means that the mean of all possible values of the statistic is the true value the statistic ought to have, namely the parameter value. For example, in class, I showed you all possible samples of size 2, 3, 4, 5, and 6 from a given population of 7 measurements. I also showed you that in each case, the mean of all the sample means was 77.14286. But the population mean was also 77.14286. (These facts are also mentioned on the previous page.) So the mean of the sample means is the population mean. I like to say it this way: the mean of the means is the mean. Another way of saying the same thing is
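You can reproduce this enumeration yourself. Here is a short Python sketch; since the actual class data aren't listed here, it uses a hypothetical population of 7 measurements chosen so that its mean works out to 77.14286:

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of 7 measurements whose mean is 77.14286
# (stand-ins for the class data, which aren't given on this page).
population = [60, 65, 70, 75, 80, 90, 100]
mu = mean(population)  # 77.14286...

# For each sample size, average the means of ALL possible samples
# (sampling without replacement, order ignored).
mean_of_means = {}
for n in range(2, 7):
    sample_means = [mean(s) for s in combinations(population, n)]
    mean_of_means[n] = sum(sample_means) / len(sample_means)
    print(f"n={n}: mean of sample means = {mean_of_means[n]:.5f}, "
          f"population mean = {mu:.5f}")
```

Whatever 7 numbers you start with, the mean of the sample means comes out equal to the population mean at every sample size: that equality, not the particular values, is what unbiasedness says.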

The sample mean is an unbiased estimator of the population mean.

Basically, what this means is that even though individual sample means are “wrong” (since sample means are random, the chance that any given sample mean exactly equals the population mean is very small), sample means are “right” on the average. This is one of our excuses for using sample means to estimate population means. (The other excuse is that, if the sample size is large, sample means are typically “close” to the population mean.)
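That second excuse, “close for large samples,” can be illustrated with a quick simulation. This sketch (again using a hypothetical 7-measurement population as a stand-in) draws many samples with replacement at several sample sizes and shows that the spread of the sample means shrinks as the sample size grows:

```python
import random
from statistics import mean, stdev

random.seed(1)  # make the simulation repeatable

# Hypothetical population (stand-in for the class data)
population = [60, 65, 70, 75, 80, 90, 100]

spread = {}
for n in (5, 50, 500):
    # 2000 simulated samples of size n, drawn with replacement
    sims = [mean(random.choices(population, k=n)) for _ in range(2000)]
    spread[n] = stdev(sims)
    print(f"n={n}: average of simulated means = {mean(sims):.2f}, "
          f"spread (SD) of the means = {spread[n]:.2f}")
```

The average of the simulated means stays near the population mean at every sample size (unbiasedness), while the spread drops sharply as n grows (closeness).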

Here’s another unbiased estimator: the sample variance is an unbiased estimator of the population variance. Again, this means that, even though individual sample variances are “wrong,” sample variances are “right,” on the average.
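The same kind of enumeration checks this claim too. One caveat worth making explicit: “sample variance” here means the version with divisor n − 1 (which is what Python’s `statistics.variance` computes), and when we sample without replacement from a finite population, the population quantity it estimates without bias is the one computed with divisor N − 1. The population below is the same hypothetical stand-in as before:

```python
from itertools import combinations
from statistics import variance

# Same hypothetical 7-measurement population as before.
population = [60, 65, 70, 75, 80, 90, 100]

# statistics.variance uses divisor N-1; this is the finite-population
# quantity that the sample variance estimates without bias when
# sampling without replacement.
S2 = variance(population)

mean_of_vars = {}
for n in range(2, 7):
    sample_vars = [variance(s) for s in combinations(population, n)]
    mean_of_vars[n] = sum(sample_vars) / len(sample_vars)
    print(f"n={n}: mean of sample variances = {mean_of_vars[n]:.5f}, "
          f"S^2 = {S2:.5f}")
```

At every sample size, the mean of all possible sample variances equals S², which is exactly the “right on the average” claim. The n − 1 divisor in the sample variance is what makes this work; dividing by n instead would bias the estimator low.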