Infinite Populations

__Part 1: Variability of Sample Means in Finite versus Infinite Populations__

This part may not help you very much if you haven’t learned about this formula yet: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

But if you __have__ learned about this formula, then let me tell you that it’s false if the population is finite. It was derived under the assumption that the population is infinite. When do mere mortals ever encounter genuinely infinite populations? WHAT GOOD IS THIS FORMULA? Well, we have said elsewhere that if the population is *large enough* (compared to the sample) then it may as well be infinite. Let’s see why, with an example or two.

Suppose you have a population of 200 objects, and you’ve
sampled 25 of them. You measure something on all 25 objects and find the mean of
the 25 measurements. Let’s say the true population standard deviation is 10
(just to have a number). Question: What is the standard deviation of *sample
means*, when the sample size is 25?

The formula at the top of this page says the standard deviation of sample means when $n = 25$ is $\frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = 2$.

But this is not the right formula to use when the population is finite. Our population has only 200 objects in it, definitely *not* infinitely many!

The right formula to use turns out to be $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$, where $N = 200$, the population size.

So the *true* standard deviation of sample means (for this particular population, when $n = 25$) is $\frac{10}{\sqrt{25}} \sqrt{\frac{200 - 25}{200 - 1}} \approx 1.875$, **not** 2!
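If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the two calculations for this example. The function names `se_infinite` and `se_finite` are made up for illustration; the numbers (a standard deviation of 10, a sample of 25, a population of 200) come from the example itself.

```python
import math


def se_infinite(sigma: float, n: int) -> float:
    """Standard deviation of sample means, assuming an infinite population."""
    return sigma / math.sqrt(n)


def se_finite(sigma: float, n: int, population_size: int) -> float:
    """Standard deviation of sample means for a finite population.

    Multiplies the infinite-population value by the correction factor
    sqrt((N - n) / (N - 1)), where N is the population size.
    """
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return se_infinite(sigma, n) * correction


print(f"{se_infinite(10, 25):.4f}")      # 2.0000
print(f"{se_finite(10, 25, 200):.4f}")   # 1.8755
```

The second value is the 1.875 quoted in the text, carried to one more decimal place.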

Now, some of you will say, “Who cares? 1.875 is not very different than 2.” But in some situations (say, in chapters 6, 7, 8, etc.) the difference will be big enough to matter. Big enough to influence some decision you make. Big enough to possibly lead you to a costly mistake, such as killing someone with a drug that should not have been used, or whatever.

Let’s do another example. This time, the population size $N$ is much larger. Let’s use a true standard deviation of 10 and a sample size of 25 measurements, as we did in the previous example. The true standard deviation of sample means (for this particular population, when $n = 25$) is $\frac{10}{\sqrt{25}} \sqrt{\frac{N - 25}{N - 1}}$, close enough to 2 to be good enough. The point here is,

If the population is large enough, the variability in the statistic does not depend (in any practical way) on the population size.

In other words, if the population is large enough, it may as well be infinite, where these calculations are concerned.
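To watch the population size stop mattering, here is a small Python sketch that evaluates the finite-population formula for a few population sizes, keeping the standard deviation at 10 and the sample size at 25. The particular population sizes in the loop are arbitrary values picked for illustration, and `finite_population_se` is a hypothetical helper name.

```python
import math


def finite_population_se(sigma: float, n: int, population_size: int) -> float:
    """Standard deviation of sample means with the finite-population correction."""
    return (sigma / math.sqrt(n)) * math.sqrt(
        (population_size - n) / (population_size - 1)
    )


# Illustrative population sizes (assumed, not taken from the text).
for population_size in (200, 2_500, 25_000, 250_000):
    se = finite_population_se(10, 25, population_size)
    print(f"N = {population_size:>7,}: standard deviation of sample means = {se:.4f}")
```

The printed values climb toward 2 as the population grows, which is the sense in which a large enough population "may as well be infinite."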

The bad news is, we will continue to use the formula $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ in this class, even when it does not actually apply.

The good news is, those of you who go on to take research methods classes will use, in those classes (and in your professions!), the right formulas when your populations are finite. The better news is that most of the populations you will use will be so large they might as well be infinite, so there’s no need to worry (most of the time). And which formula is easier to use, $\frac{\sigma}{\sqrt{n}}$ or $\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$?

You see, **the arithmetic is simpler if we assume the population is infinite!** Weird, huh?
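If you don’t want to take the finite-population formula on faith, you can check it by simulation. The sketch below is only an illustration under assumptions of my own: it makes up a population of 200 measurements (drawn once from a normal distribution with mean 50 and standard deviation 10), repeatedly samples 25 of them without replacement, and compares the observed standard deviation of the sample means with both formulas. The seed, the number of trials, and the made-up population are all arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

N, n, trials = 200, 25, 20_000

# A made-up finite population of N measurements (assumption for illustration).
population = [random.gauss(50, 10) for _ in range(N)]
sigma = statistics.pstdev(population)  # true standard deviation of this population

# random.sample draws without replacement.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(trials)
]
observed = statistics.pstdev(sample_means)

infinite_formula = sigma / math.sqrt(n)
finite_formula = infinite_formula * math.sqrt((N - n) / (N - 1))

print(f"observed:         {observed:.4f}")
print(f"finite formula:   {finite_formula:.4f}")
print(f"infinite formula: {infinite_formula:.4f}")
```

The observed value should land much closer to the finite-population formula than to the infinite-population one.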

__Part 2: Population Size, Sample Size, and Random
Sampling__

Suppose your company receives a shipment of 50 widgets, of which 10 are defective. (That’s 10 / 50 = 20% defective. That’s a lot!) You select three widgets at random to test. What is the probability that all three widgets are defective?

**Small population, sample without replacement:** Well, it depends. Let’s assume you sample *without replacement*, that is, once you’ve selected a widget, you don’t put it back in the box (replace it) before selecting the next widget. We need the probability that all three widgets are defective, that is, the first one is defective **AND** the second one is defective **AND** the third one is defective. I emphasize the word “and” here, because it tells us what arithmetic to use. Probability Rule #5 says that “and” means “multiply,” when events are independent. Draws made without replacement are *not* independent, but “and” still means “multiply” as long as each probability is computed given what has already been drawn. So…

The probability that the first widget we pick is defective, is 10/50. Having selected a defective widget, there are only 9 defectives left to pick, and only 49 widgets total to pick them from, so the probability of the second widget being defective is 9/49. Having selected two defective widgets, there are 8 defectives left to pick, and 48 widgets total remain. So the probability of the third one being defective is 8/48. “And” means “multiply” so we do, and get $\frac{10}{50} \times \frac{9}{49} \times \frac{8}{48} \approx 0.00612$, give or take.
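The draw-by-draw multiplication above can be sketched in Python using exact fractions. The function name `prob_all_defective_without_replacement` is hypothetical; it just shrinks both the number of defectives and the number of widgets by one after each draw, the same way the paragraph does.

```python
from fractions import Fraction


def prob_all_defective_without_replacement(
    defective: int, total: int, draws: int
) -> Fraction:
    """Probability that every one of `draws` widgets is defective,
    when widgets are not put back in the box between draws."""
    prob = Fraction(1)
    for already_drawn in range(draws):
        # One fewer defective and one fewer widget after each draw.
        prob *= Fraction(defective - already_drawn, total - already_drawn)
    return prob


p = prob_all_defective_without_replacement(10, 50, 3)
print(p)                  # 3/490
print(f"{float(p):.5f}")  # 0.00612
```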

**Small population, sample with replacement:** Now let’s
assume that (for whatever reason) you replace each widget you test, *before
selecting the next widget*. This is called sampling *with* replacement.
Again, we need the probability that all three widgets are defective, that is,
the first one is defective **AND** the second one is defective **AND** the
third one is defective. I emphasize the word “and” here, because it tells us to
multiply (as opposed to adding or something). So…

The probability that the first widget we pick is defective,
is 10/50. Having selected and tested a defective widget, we put it back in the
box. That means there are 10 defective widgets in the box, out of a total of 50
widgets to choose from. So the probability of the second widget being defective
is also 10/50. We test the second widget, replace it, and pick a third widget.
Again, 10 of the 50 widgets in the box are defective, so the probability of the
third widget being defective is also 10/50. As before, we multiply and get $\frac{10}{50} \times \frac{10}{50} \times \frac{10}{50} = 0.008$, which is **not** 0.00612; it’s about 31% higher than 0.00612.

(You may not be impressed by this, but the effect becomes more dramatic as the number of widgets tested becomes larger. For example, if you select 10 widgets to test, and select them without replacement, the probability that all 10 are defective is $9.73 \times 10^{-11}$, but if you sample with replacement, the probability of all ten being defective is about 1000 times higher: $1.024 \times 10^{-7}$.)
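Here is a short Python sketch that puts the two sampling schemes side by side for the box of 50 widgets (10 defective), for both 3 and 10 widgets tested. The function names are hypothetical. Counting combinations with `math.comb` is just a shortcut for the without-replacement case; it gives the same answer as multiplying draw by draw.

```python
from fractions import Fraction
from math import comb


def without_replacement(defective: int, total: int, draws: int) -> Fraction:
    """All-defective probability when widgets are not put back:
    (ways to choose `draws` defectives) / (ways to choose `draws` widgets)."""
    return Fraction(comb(defective, draws), comb(total, draws))


def with_replacement(defective: int, total: int, draws: int) -> Fraction:
    """All-defective probability when each widget is put back before the next draw."""
    return Fraction(defective, total) ** draws


for draws in (3, 10):
    p_without = without_replacement(10, 50, draws)
    p_with = with_replacement(10, 50, draws)
    ratio = float(p_with / p_without)
    print(
        f"{draws:>2} widgets: without = {float(p_without):.3e}, "
        f"with = {float(p_with):.3e}, ratio = {ratio:.2f}"
    )
```

The ratio is about 1.31 for three widgets and a bit over 1000 for ten, matching the "31% higher" and "about 1000 times higher" figures in the text.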

**Large population, sample without replacement:** Now let’s say there are 5000 widgets in the shipment, and you’re going to select 3 to be tested. To make a fair comparison with the two previous examples, let’s say 20% of the widgets are defective. 20% of 5000 is 1000, so 1000 out of 5000 widgets are defective. Sample the three without replacement. The probability that all three are defective is $\frac{1000}{5000} \times \frac{999}{4999} \times \frac{998}{4998} \approx 0.00798$.

**Large population, sample with replacement:** If you sample your three widgets with replacement, when 1000 out of 5000 are defective, the probability that all three are defective is $\frac{1000}{5000} \times \frac{1000}{5000} \times \frac{1000}{5000} = 0.00800$.

Amazing! Both times we sampled with replacement, we got the
same probability! But wait! This time, the difference between sampling with
replacement and sampling without replacement is *much smaller* than before:
0.00800 is only about 0.25% higher than 0.00798.

Why is the difference so much less than it was when there
were only 50 widgets? Because a population of 5000 is large compared to a sample
of 3. (3 x 100 = 300; by our rule of thumb, a population of 300 is large
compared to a sample of size 3!) But a population of 50 is *not* large
compared to a sample of size 3.

As the size of the population gets bigger compared to the size of the sample, the difference between sampling with replacement and sampling without replacement gets smaller and smaller. So if the population is infinite and the sample is finite, the difference between sampling with replacement and sampling without replacement is approximately ZERO. In other words,

If the population is large enough, it might as well be infinite.

And you might as well compute your probabilities as though you were sampling with replacement (independently) even though you sample *without* replacement (dependently) in practice. Once again, **the arithmetic is simpler if we assume the population is infinite!**
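To see the gap between the two sampling schemes shrink as the population grows, here is a Python sketch that keeps 20% of the widgets defective and 3 widgets tested, and varies only the shipment size. The shipment sizes 500 and 50,000 are extra values added for illustration (the text only works through 50 and 5000), and `percent_higher_with_replacement` is a hypothetical helper name.

```python
from math import comb


def percent_higher_with_replacement(total: int, draws: int, defective: int) -> float:
    """How much higher (in percent) the all-defective probability is when
    sampling with replacement than when sampling without replacement."""
    p_without = comb(defective, draws) / comb(total, draws)
    p_with = (defective / total) ** draws
    return 100 * (p_with / p_without - 1)


for total in (50, 500, 5_000, 50_000):
    defective = total // 5  # 20% defective, as in the examples
    gap = percent_higher_with_replacement(total, 3, defective)
    print(f"{total:>6,} widgets: with replacement is {gap:.3f}% higher")
```

For 5000 widgets the unrounded gap comes out a little under the 0.25% you get from the rounded values 0.00800 and 0.00798, but the story is the same: each time the population grows tenfold, the gap shrinks by roughly a factor of ten.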

(Note: The difference still becomes more dramatic as the number of widgets tested becomes larger. For example, if 1000 out of 5000 widgets are defective, and you select 10 widgets to test, and select them without replacement, the probability that all 10 are defective is $9.88 \times 10^{-8}$, but if you sample with replacement, the probability of all ten being defective is still $1.024 \times 10^{-7}$, which is about 3.7% higher, as opposed to 0.25% higher.)

Are there any truly, actually infinite populations? I don’t
know. I doubt any mortal knows. But for what it’s
worth, I believe that there is a huge variety of *potentially*
infinite populations. So we might as well assume they *are* infinite. Or
something.

BYU-Idaho mailto:brownd@byui.edu

232 Ricks Building 208-496-1839 voice

Rexburg, ID 83440 208-496-2005 fax

**Please
do not call me at home.**