HOW TO TEACH RESAMPLING STATS ALONG WITH A STANDARD TEXT
Julian L. Simon and Peter Bruce
INTRODUCTION
A simple and effective way to teach the resampling
method at the introductory level is to use your usual
text and course outline, and present the resampling
method immediately following the conventional method
for all the problems that you demonstrate in class.
This tactic may be illustrated with the text
Introductory Statistics for Business and Economics
(Wiley, 1990), by Thomas H. Wonnacott and Ronald J.
Wonnacott. This text was chosen for illustration
because one of us expects to use it for a class soon,
and also is acquainted with Tom Wonnacott. It was not
chosen because it lends itself particularly well to the
resampling approach; resampling fits with other texts
just about as well.
Notes to teachers are either indented or in
brackets. Other material is intended to be read by
students.
CONFIDENCE INTERVALS
W and W begin their book, and the first chapter on
"The Nature of Statistics," with an example of the
reliability of a simple randomly-selected 1988
presidential election poll, showing 840 votes for Bush
and 660 votes for Dukakis out of 1500. W and W
estimate 95% confidence limits for the population
proportion of Bush supporters in conventional fashion.
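For reference, here is the usual normal-approximation formula for a 95% interval around a sample proportion, worked with the poll's numbers. We assume, without reproducing W and W's page, that their conventional calculation takes this standard form. The figures below are simple arithmetic on 840 out of 1500.

```latex
\begin{aligned}
\hat{p} &= \frac{840}{1500} = 0.56, \\
\mathrm{SE} &= \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.56 \times 0.44}{1500}} \approx 0.0128, \\
\hat{p} \pm 1.96\,\mathrm{SE} &= 0.56 \pm 0.025 \;\Longrightarrow\; (0.535,\ 0.585).
\end{aligned}
```

In vote counts, that is roughly 802 to 878 Bush votes out of 1500.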
After showing this, the teacher may proceed by
lecturing as follows:
One can also estimate the confidence intervals in a fashion
different from the classical approach just shown. The resampling
method works by experimentally drawing samples from a population
like the one you wish to investigate. Let's see how it is done.
We draw samples of size 1500 from a population whose
proportion we estimate using the information from the survey
results, which showed a proportion of .56 Bush supporters. (One
makes the same assumption when using the classical method.) [Note
to the teacher: Spend some more time here on the logic of this
assumption, or else postpone the discussion until later.]
Then we examine the results of those samples to see how much they
vary from one another. We can do this with an urn containing 56
red balls and 44 black balls (or 5600 red and 4400 black balls),
putting back the ball every time we draw one. [The class can
actually do this, and then go on to the computer procedure below,
after noting that the procedure by hand is perfectly
satisfactory, but gets tedious. Or the teacher can immediately
skip to the computer procedure, after just describing the urn
procedure. So we move on to:]
Let's do this with the computer program RESAMPLING STATS.
We first draw a single sample of 1500 "voters" with these
commands:1
GENERATE 1500 1,100 A
This command draws 1500 balls randomly with numbers
between 1 and 100, and puts them in a location we'll
call A. We will let 1-56 = red (Bush), 57-100 = black
(Dukakis).
COUNT A between 1 56 B
This command counts the number of red balls in the
sample of 1500, and puts the count in location B.
-----------------------------------------------------------------
1[Technical note to teacher: To conserve memory, Resampling Stats
limits vectors to 1000 elements unless you otherwise specify.
Therefore, this program needs the following command:
MAXSIZE A 1500 This increases the size allowed for vector A
to accommodate our 1500 "voters".]
----------------------------------------------------------------
Please recall our purpose, which is to find out how much
the sample results vary from one another. Therefore, to find out
the results from a good many samples, we next repeat the process
(say) 100 times, keep score of the result each time, and then end
the process when 100 trials are completed. Then after the 100
simulated samples have been drawn, we construct a histogram of
the results. We do all this by adding a few commands to the one-
sample program we wrote above, as follows:
MAXSIZE A 1500 Allow vector A to hold 1500 elements (see note 1).
REPEAT 100 Take 100 samples from our simulated population
GENERATE 1500 1,100 A Take 1500 balls, numbered randomly between 1
and 100, and put them in a location we'll call A. Let 1-56
= red (Bush), 57-100 = black (Dukakis).
COUNT A between 1 56 B Count the number of red balls and put the
count in location B.
SCORE B Z Record the result of this trial on the
"scoreboard" Z.
END End the above experiment loop, go back to the
beginning, and repeat until 100 trials have been
completed.
HISTOGRAM Z Diagram the results of the 100 trials, and show the
mean. The results may be seen in Figure W1.
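A teacher who wants to check the RESAMPLING STATS output against another tool can sketch the 100-trial program above in Python roughly as follows. This is our translation, not part of RESAMPLING STATS. The function name simulate_polls, the seed, and the ten-vote bins of the text histogram are our own choices, and the text histogram is only a crude stand-in for HISTOGRAM.

```python
import random
from collections import Counter

def simulate_polls(trials=100, n_voters=1500, bush_max=56, seed=None):
    """Return the number of Bush voters in each of `trials` simulated polls."""
    rng = random.Random(seed)
    z = []                                        # the "scoreboard" Z
    for _ in range(trials):                       # REPEAT 100
        # GENERATE 1500 1,100 A: 1500 draws numbered 1-100, with replacement
        a = [rng.randint(1, 100) for _ in range(n_voters)]
        # COUNT A between 1 56 B: numbers 1-56 stand for red (Bush) balls
        b = sum(1 for x in a if x <= bush_max)
        z.append(b)                               # SCORE B Z
    return z                                      # END

z = simulate_polls(seed=1)
print("lowest", min(z), "highest", max(z), "mean", sum(z) / len(z))

# A crude text histogram in bins of ten votes, standing in for HISTOGRAM Z
bins = Counter((count // 10) * 10 for count in z)
for low in sorted(bins):
    print(f"{low}-{low + 9}: {'*' * bins[low]}")
```

Each run with a different seed gives a somewhat different spread, which is itself the lesson of variability.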
In the histogram we see that sample results range all the
way from 786 (52.4%) favoring Bush to 888 (59.2%) favoring Bush. The
results clearly vary greatly from one trial sample to another,
teaching the crucial lesson of variability. Our first rough
estimate of the sampling "margin of error," then, is a spread of
about 7 percentage points. If we
were to do a thousand more samples, or ten thousand, however, we
would expect the range of sample results to be wider: a few "far
out" samples are more likely to be generated by chance in a
thousand than in a hundred samples. We solve this dilemma by
specifying a "confidence interval" that includes the vast
majority -- say 95% -- of our sample results. In this case, the
range 801 (53.4%) to 871 (58.1%) includes 95% of the trial
results and, therefore, is our estimate of a "95% confidence
interval". You will learn later how to get RESAMPLING STATS to
examine all your trial results and find the endpoints of this
interval for you.
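Until then, a teacher may want to see how the endpoints can be found mechanically: sort the trial results and drop the most extreme 2.5% at each end. The Python sketch below does this for 1000 simulated polls. The names poll_counts and percentile_interval are hypothetical. This simple percentile rule is one common way of choosing the endpoints, not necessarily the way RESAMPLING STATS does it. Here each simulated voter is a Bush voter with probability 0.56, which is equivalent to drawing from the 56-red, 44-black urn with replacement.

```python
import random

def poll_counts(trials, n_voters=1500, p_bush=0.56, seed=None):
    """Bush count in each simulated poll; each voter is Bush with probability p_bush."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(n_voters) if rng.random() < p_bush)
            for _ in range(trials)]

def percentile_interval(values, level=0.95):
    """Endpoints that enclose about `level` of the sorted trial results."""
    s = sorted(values)
    k = int(len(s) * (1 - level) / 2)   # results dropped from each tail
    return s[k], s[len(s) - 1 - k]

counts = poll_counts(1000, seed=2)
low, high = percentile_interval(counts)
print(f"95% interval: {low} ({low / 1500:.1%}) to {high} ({high / 1500:.1%})")
```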
It is important to note that, without any further ado,
resampling provides an intellectually complete answer
to the question that W and W raise in their very first
pages but cannot answer in a meaningful fashion. They
must throw a formula at the reader that the reader
cannot possibly understand at that point, and indeed
may never be able to fully understand, even after
waiting many chapters for the answer to be provided
with classical methods. But because W and W are so
anxious to immediately get the reader swimming in the
waters of inferential statistics, rather than
postponing that entry for several chapters, they are
forced to provide a baffling formula.
In contrast, resampling can in the very first
pages provide a procedure and an answer to the problem
at hand that students can follow and understand in its
entirety. This enables W and W to satisfy their desire
to immediately introduce inferential statistics,
without paying the price of baffling and scaring the
reader.
The instructor might try to construct a program in
BASIC to handle the resampling procedure. But it will
soon be clear even to a person adept with that language
that the program will not be simple to write. And the
program will certainly be quite obscure to students who
do not already understand BASIC, whereas the RESAMPLING
STATS program above can be understood without prior
programming experience in any language.
Showing a conventional solution with Minitab at
this point would be entirely meaningless to the
beginning student, another point in favor of resampling
and of RESAMPLING STATS.
PROBABILITY THEORY
W and W next present a lovely opportunity to show
what resampling can do in the context of probability
theory. On page 83 they show how to calculate the
probability of not getting a boy in five children,
using the multiplication rule.
The teacher can then continue and ask:
What is the probability of getting exactly four girls in
five children? The answer cannot be arrived at with a simple
rule. You could work this problem in the same manner that the
earlier problem about boy-girl-boy was worked, constructing the
entire sample space (W and W examples 3-2 to 3-4), but this
would be tedious. And if the problem were 14 girls out
of 19 children, whose sample space contains 2^19 = 524,288
possible birth sequences, it would be impossible to handle by
hand with sample-space analysis.
Another way to estimate the chances of getting four girls in
five children is by resampling (or Monte Carlo) experimentation.
You might make a first approximation that the probability of a
girl being born is the same as that of a boy. And you could
then use coins to stand for children, a head for a girl and a tail
for a boy. Continue as follows:
1. Toss a coin 5 times, letting heads = girl, tails = boy.
2. Count how often you got a head.
3. Record "yes" if 4 heads, "no" if not.
4. Repeat steps 1-3, say, 50 times.
5. Count how many of the 50 trials had a "yes".
Instead of using coins, we can do the simulation on the
computer with RESAMPLING STATS. This time we'll be more
realistic and assume that the probability of a girl is 48%, and of a
boy 52%. A program to arrive at an estimate is:
REPEAT 1000 Do the experiment 1000 times
GENERATE 5 1,100 A Generate five random numbers between 1 and
100 and put them in a location called A. Let 1-48 = girl,
49-100 = boy.
COUNT A <=48 B Count the number of girls, put the result in B
SCORE B Z Keep score of the result of each trial
END End one trial, go back and repeat until all 1000 are
complete, then proceed
HISTOGRAM Z Produce a histogram of the trial results.
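The same experiment can be sketched in Python as a cross-check. The last lines read off the proportion of trials with exactly four girls (the height of the bar at 4 in the histogram). For comparison only, they also compute the binomial formula that W and W introduce later. The function name girls_per_family and the seed are illustrative choices of ours.

```python
import random
from math import comb

def girls_per_family(trials=1000, girl_max=48, seed=None):
    """Number of girls in each simulated family of five children."""
    rng = random.Random(seed)
    z = []
    for _ in range(trials):                            # REPEAT 1000
        a = [rng.randint(1, 100) for _ in range(5)]    # GENERATE 5 1,100 A
        z.append(sum(1 for x in a if x <= girl_max))   # COUNT A <=48 B; SCORE B Z
    return z

z = girls_per_family(seed=3)
estimate = z.count(4) / len(z)
exact = comb(5, 4) * 0.48**4 * 0.52     # binomial formula, shown only as a check
print(f"simulated {estimate:.3f}, formula {exact:.3f}")
```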
BINOMIAL DISTRIBUTION
When W and W discuss the binomial distribution,
they show how to calculate the probability that, from a
population of microwave ovens that are 80% perfect, a
sample of 10 will be half perfect and half imperfect
(p. 119). After that deductive calculation, the
resampling procedure -- just like the program for four
girls out of five children just above -- may be shown.
Students may be told that they can take their choice of
which way to handle problems in real life, and on exams
-- with the binomial formula, or with the RESAMPLING
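STATS program. If correctly done, both methods will
arrive at essentially the same result; the simulation estimate
comes closer to the formula's answer as the number of trials
grows. If experience holds, most students will tend to opt for
simulation.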
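A resampling version of the microwave problem might look like the Python sketch below. It assumes, as the binomial calculation does, that each oven is independently perfect with probability 0.8. The function name prob_half_perfect, the 10,000 trials, and the seed are our choices.

```python
import random
from math import comb

def prob_half_perfect(trials=10_000, n=10, p_perfect=0.8, seed=None):
    """Estimate P(exactly half of n ovens are perfect) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        perfect = sum(1 for _ in range(n) if rng.random() < p_perfect)
        if perfect == n // 2:
            hits += 1
    return hits / trials

simulated = prob_half_perfect(seed=4)
formula = comb(10, 5) * 0.8**5 * 0.2**5   # the deductive binomial answer
print(f"simulation {simulated:.4f}, formula {formula:.4f}")
```

With 10,000 trials the two answers typically agree to within a few thousandths, and the agreement tightens as trials are added.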
Some students will feel that there is something
illegitimate about simulation, perhaps because it is
not "exact". It sometimes helps to point out to the
students that any probability formula such as the
binomial is itself only a mathematical shortcut to the
full procedure of specifying the entire sample space.
The use of the t-distribution in a two-sample problem
is an excellent example: it is a mathematically
convenient way of describing what happens in a
randomization procedure, developed in an era in which
lack of computing power kept people from carrying out
randomizations for all but the smallest data sets.
Simulation is simply another shortcut. When one sees
that both the formula method and the simulation method
are on the same footing in this respect, resampling is
more likely to seem legitimate.
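The point can be made concrete. The Python sketch below lists every boy/girl sequence in the sample space, adds up the probabilities of those with exactly k girls, and compares the total with the binomial formula. The two agree because the formula is a shortcut for exactly this enumeration. The function names are ours. A computer can enumerate even the 2^19 sequences of a 19-children problem, which no one would attempt by hand.

```python
from itertools import product
from math import comb

def prob_by_enumeration(n_children, n_girls, p_girl):
    """Sum the probabilities of all birth sequences with exactly n_girls girls."""
    total = 0.0
    for seq in product("GB", repeat=n_children):      # every point in the sample space
        if seq.count("G") == n_girls:
            # every such sequence has this same probability
            total += p_girl**n_girls * (1 - p_girl)**(n_children - n_girls)
    return total

def prob_by_formula(n, k, p):
    """The binomial formula: C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

for n, k in [(5, 4), (19, 14)]:
    print(n, k, prob_by_enumeration(n, k, 0.48), prob_by_formula(n, k, 0.48))
```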
W and W then (p. 120) tell the students that
instead of the formula, they can use a table in the
back of the book. At this point the student's
intuition is of course shut off, because the logic of a
table is impenetrable to the beginning student. Once again the
Resampling Stats procedure is shown, and the students
can see for themselves that they can completely
understand everything that is happening.
Here again one may wish to compare a BASIC program
with RESAMPLING STATS in performing the resampling
procedure. This is the program that Gnanadesikan et al.
(The Art and Technique of Simulation, Dale Seymour,
1987) use to simulate repeated coin tosses:
80 INPUT "ENTER THE NUMBER OF KEY COMPONENTS";N
100 INPUT "ENTER THE NUMBER OF TRIALS";NT
120 DIM T$(NT,N),C(2*N)
140 FOR I = 1 to NT
150 LET NH = 0
160 FOR J = 1 TO N
170 LET X = RND (1)
180 IF X < .5 THEN 220
190 T$ (I,J) = "H"
200 NH = NH + 1
210 GOTO 230
220 T$ (I,J) = "T"
230 IF J = N THEN 270
250 GOTO 270
270 NEXT J
280 C(NH + 1) = C(NH + 1) + 1
290 NEXT I
330 FOR K = 1 TO N + 1
350 NEXT K
360 END
The above BASIC program is written in general form
and does not specify a particular number of coins and
heads, as RESAMPLING STATS does. (We have simplified
the program by removing the many "print" statements.)
Note that the RESAMPLING STATS program listed above
does the same job, for a sample of 5 coins.
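For comparison in a present-day general-purpose language, here is a rough Python counterpart of the simplified BASIC program. It is our translation, not Gnanadesikan et al.'s code. Like the simplified BASIC, it prints nothing inside the loops. It also skips the T$ array of individual toss records, which the simplified program fills but never uses.

```python
import random

def tally_heads(n_coins, n_trials, seed=None):
    """c[h] = number of trials in which exactly h of n_coins came up heads."""
    rng = random.Random(seed)
    c = [0] * (n_coins + 1)      # BASIC's C(NH + 1), shifted to 0-based indexing
    for _ in range(n_trials):
        # As in the BASIC, a random X < .5 is a tail and anything else a head
        nh = sum(1 for _ in range(n_coins) if rng.random() >= 0.5)
        c[nh] += 1
    return c

for heads, freq in enumerate(tally_heads(5, 1000, seed=5)):
    print(heads, freq)
```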
INTERLUDE: THE GENERAL PROCEDURE
The procedural steps taken in solving the particular problem
above were chosen to fit the specific facts. We can also
describe the steps in a more general fashion. The generalized
procedure describes the operations we carry out when we estimate a
probability by resampling.
Step A. Construct a simulated population or "universe" of
random numbers or cards or dice or another randomizing mechanism
whose composition is similar to the universe whose behavior we
wish to describe and investigate. The term "universe" refers to
the system that is relevant for a single simple event. For
example:
A coin with two sides, or two sets of random numbers, "1-52"
and "53-100", simulates the system that produces a single male or
female birth, when we are estimating the probability of four
girls in the first five children. Notice that in this universe
the probability of a girl remains the same from trial event to
trial event -- that is, the trials are independent --
demonstrating a universe from which we sample with
replacement.
Hard thinking is required in order to determine the
appropriate "real" universe whose properties interest you.
Step(s) B. Specify the procedure that produces a pseudo-
sample which simulates the real-life sample in which we are
interested. That is, one must specify the procedural rules by
which the sample is drawn from the simulated universe. These
rules must correspond to the behavior of the real universe in
which you are interested. To put it another way, the simulation
procedure must produce simple experimental events with the same
probabilities that the simple events have in the real world. For
example:
In the case of four daughters in five children, you can
draw a card and then replace it if you are using a deck of red
and black cards. Or if you are using a random-numbers table, the
random numbers automatically simulate replacement. Just as the
chances of having a boy or a girl do not change depending on the
sex of the preceding child, so we want to ensure through
replacement that the chances do not change each time we choose
from the deck of cards.
Recording the outcome of the sampling must be indicated as
part of this step, e.g., "record 'yes' if a girl, 'no' if a
boy."
Step(s) C. If several simple events must be combined into a
composite event, and if the composite event was not described in
the procedure in step B, describe it now. For example:
For the four girls in five children, the procedure for
each simple event of a single birth was described in step B. Now
we must specify repeating the simple event five times, and
determine whether the outcome is or is not four girls.
Recording "exactly four girls" or "not exactly four girls"
is part of this step. This record indicates the results of all
the trials and is the basis for a tabulation of the final result.
Step(s) D. Calculate the probability of interest from the
tabulation of outcomes of the resampling trials. For example, the
proportion of "yes" outcomes among all trials estimates the
probability of the composite event specified in step C.
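Steps A through D can also be written down once, as a general-purpose loop into which the particulars of each problem are plugged. The Python sketch below is one way to do this. estimate_probability and birth are hypothetical names of ours, not RESAMPLING STATS commands. We apply it to four girls in five children and assume the universe of step A assigns 1-52 to a boy and 53-100 to a girl, which matches the 48% girl figure used earlier.

```python
import random

def estimate_probability(draw_simple_event, is_yes, n_events, trials=10_000, seed=None):
    """Generic resampling estimate following steps A-D (hypothetical helper).

    draw_simple_event(rng): one draw from the simulated universe (steps A and B)
    is_yes(outcomes): True if the composite event occurred (step C)
    """
    rng = random.Random(seed)
    yes = 0
    for _ in range(trials):
        outcomes = [draw_simple_event(rng) for _ in range(n_events)]  # step C
        if is_yes(outcomes):
            yes += 1                       # record "yes"
    return yes / trials                    # step D: proportion of "yes"

def birth(rng):
    """Steps A and B: numbers 1-52 = boy, 53-100 = girl, drawn with replacement."""
    return "girl" if rng.randint(1, 100) >= 53 else "boy"

p = estimate_probability(birth, lambda kids: kids.count("girl") == 4, n_events=5, seed=6)
print(f"estimated probability of exactly four girls: {p:.3f}")
```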
RANDOM SAMPLING AND THE DISTRIBUTION OF THE MEAN
W and W pose the following problem (p. 202): "A population
of men on a large midwestern campus has a mean height of mu = 69
inches, and a standard deviation sigma = 3.22 inches. If a
random sample of n = 10 men is drawn, what is the chance the
sample mean X-bar will be within 2 inches of the population mean
mu?"
The framing of this question reveals the unrealistic fashion
in which classical statistics poses most questions. The data for
the population necessarily arise as discrete observations, and the
standard deviation is a quantity computed from them; beginning
the discussion with the standard deviation given as a datum
immediately removes the problem from a realistic setting.
Luckily, W and W earlier present data on the heights of 200
men (p. 28).
We take those observations as our supposed population, that
is, as our best estimate of what the population is like. We now
draw samples of 10 from this collection. Whether we draw them
with or without replacement depends on what we are assuming the
collection to be: the entire population, or a sample from it.
If the latter, we must discuss why it is reasonable to consider
it our best estimate of the population, and then draw from it
with replacement.
[It is unfortunate for pedagogical purposes that W and W
present the data in grouped format. The student may therefore
leap to the unsound conclusion that the appropriate procedure is
to rearrange the raw data into bins to produce a frequency
histogram, and then do a bootstrap confidence interval using not
the original data we collected, but the values of the bin centers
and their frequencies. The teacher should forestall that
possibility.]
Programs for the two different situations are as follows:
SAMPLING WITHOUT REPLACEMENT:
READ file "heights" A Read the height data from an ASCII
file called "heights" located in the same directory as
RESAMPLING STATS. The heights should be listed in a
column; they will become vector A.
REPEAT 100 Repeat the following trial 100 times
SHUFFLE A A Shuffle the height vector A, keep calling
it A
TAKE A 1,10 B Take the first 10 (without replacement),
put them in B
MEAN B C Calculate their mean
SCORE C Z Keep score of the mean
END End one trial, go back and repeat until all 100 are
complete, then proceed to the next step
HISTOGRAM Z Produce a histogram of the "resample" means
SAMPLING WITH REPLACEMENT:
READ file "heights" A Read the height data from an ASCII
file called "heights" located in the same directory as
RESAMPLING STATS. The heights should be listed in a
column; they will become vector A.
REPEAT 100 Repeat the following trial 100 times
SAMPLE 10 A B Take a sample of size 10, with
replacement, put them in B
MEAN B C Calculate its mean
SCORE C Z Keep score of the mean
END End one trial, go back and repeat until all 100 are
complete, then proceed to the next step
HISTOGRAM Z Produce a histogram of the "resample" means
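The two programs differ only in how the ten heights are drawn. A Python sketch of both, written as one function with a replace flag, might look like this. read_heights and resample_means are hypothetical names. We assume a plain-text file "heights" with one number per line, as the READ command above expects. W and W's 200 heights are not reproduced here, so the example call is left as a comment.

```python
import random

def read_heights(path="heights"):
    """Read one height per line from a plain-text file (assumed format)."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

def resample_means(data, n=10, trials=100, replace=False, seed=None):
    """Means of `trials` samples of size n drawn from data."""
    rng = random.Random(seed)
    z = []
    for _ in range(trials):
        if replace:
            b = rng.choices(data, k=n)   # SAMPLE 10 A B: with replacement
        else:
            b = rng.sample(data, n)      # SHUFFLE A A; TAKE A 1,10 B: without replacement
        z.append(sum(b) / n)             # MEAN B C; SCORE C Z
    return z

# Example, assuming the file "heights" holds the 200 observations:
# heights = read_heights()
# without_repl = resample_means(heights, replace=False)
# with_repl = resample_means(heights, replace=True)
```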
W and W show a Monte Carlo simulation for their
height problem (p. 222).
The teacher may compare the clarity of the RESAMPLING STATS
bootstrap-like treatment with the treatment using the normal
distribution and the computer.
THE BOOTSTRAP
Happily, W and W provide an introduction to the
bootstrap in the context of confidence intervals. They
suggest, however, that it is for use "in situations too
complex for standard theory to handle" (p. 277). Here
the teacher may recall how a very similar technique was
used successfully right at the start of the course (see
above), and remind students how easy it is to do this
with RESAMPLING STATS. So how about doing a bootstrap
right here, using the 200 heights as a sample, not a
population? Here's the program:
BOOTSTRAP SAMPLING:
READ file "heights" A Read the height data from an ASCII
file called "heights" located in the same directory as
RESAMPLING STATS. The heights should be listed in a
column; they will become vector A.
REPEAT 100 Repeat the following trial 100 times
SAMPLE 200 A B Take a sample of size 200, selected
randomly and with replacement, from our
original sample
MEAN B C Calculate the mean of the resample
SCORE C Z Keep score of the mean
END End one trial, go back and repeat until all 100 are
complete, then proceed to the next step
HISTOGRAM Z Produce a histogram of the "resample" means
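As with the election poll, the resample means can be turned into a confidence interval by trimming 2.5% from each end. The Python sketch below does this with 1000 resamples rather than 100, to give steadier endpoints. The function names are hypothetical, and the simple percentile rule is only one of several bootstrap interval methods. Because the 200 heights are not reproduced here, the call is shown as a comment.

```python
import random

def bootstrap_means(sample, trials=1000, seed=None):
    """Mean of each resample drawn with replacement, same size as the sample."""
    rng = random.Random(seed)
    n = len(sample)                                  # 200 for the heights data
    return [sum(rng.choices(sample, k=n)) / n for _ in range(trials)]

def percentile_interval(values, level=0.95):
    """Endpoints enclosing about `level` of the sorted resample means."""
    s = sorted(values)
    k = int(len(s) * (1 - level) / 2)                # values dropped from each tail
    return s[k], s[len(s) - 1 - k]

# Example, assuming `heights` holds the 200 observations from the "heights" file:
# low, high = percentile_interval(bootstrap_means(heights))
```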
HYPOTHESIS TESTING
W and W begin their discussion of hypothesis
testing (p. 288) with samples of 10 men's salaries and
5 women's salaries, and they ask if there is a
difference between the groups. (The actual difference
is $5,000.) They deal with the problem with the t
test.
Minitab or other software may also be presented at
this point.
After completing the demonstration with the t test
(and perhaps standard software), the teacher may
proceed as follows by a modified randomization test
that samples with replacement from the pooled data.
COPY (13 11 19 15 22 20 14 17 14 15) A Copy the data for
the men's salaries
COPY (9 12 8 10 16) B Copy the data for the women's
salaries
CONCAT A B C Put all the data together in the same vector
REPEAT 100 Repeat the following procedure 100 times
SAMPLE 10 C D Select 10 salaries, at random and with
replacement (our original sample was assumed to be from
a larger population), and put them in a vector called D
MEAN D DD Calculate the mean salary in this group
SAMPLE 5 C E Select 5 salaries, at random and with
replacement, and put them in E
MEAN E EE Calculate the mean salary in this group
SUBTRACT DD EE F Find out by how much the "male" average
exceeds the "female" average
SCORE F Z Keep score of the difference
END End one trial, go back and repeat until all 100 are
complete, then proceed to the next step
HISTOGRAM Z Produce a histogram of trial differences
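The same test can be sketched in Python, using the salary figures above (in thousands of dollars). The last lines go one step beyond the RESAMPLING STATS program. They count the proportion of trials whose difference is at least as large as the observed $5,000, which is how one would read the histogram. The function name resampled_differences, the 1000 trials, and the seed are our choices.

```python
import random

men = [13, 11, 19, 15, 22, 20, 14, 17, 14, 15]      # salaries, $1,000s
women = [9, 12, 8, 10, 16]

def resampled_differences(a, b, trials=1000, seed=None):
    """Differences in means between groups drawn with replacement from the pooled data."""
    rng = random.Random(seed)
    pooled = a + b                                       # CONCAT A B C
    z = []
    for _ in range(trials):
        d = rng.choices(pooled, k=len(a))                # SAMPLE 10 C D
        e = rng.choices(pooled, k=len(b))                # SAMPLE 5 C E
        z.append(sum(d) / len(d) - sum(e) / len(e))      # SUBTRACT DD EE F; SCORE F Z
    return z

observed = sum(men) / len(men) - sum(women) / len(women)   # 16.0 - 11.0 = 5.0
z = resampled_differences(men, women, seed=7)
share = sum(1 for diff in z if diff >= observed) / len(z)
print(f"observed difference {observed}, share of trials at least as large {share:.3f}")
```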
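In the histogram we see that randomly-drawn samples produced
differences in average salary that were generally less than
$4,000; only once in the 100 trials was there a difference
greater than $5,000. A difference as large as the one actually
observed would therefore rarely arise by chance if men's and
women's salaries came from the same population.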
The class may then discuss the pros and cons of
the classical and the resampling approaches for this
problem. Again, the students may be told that they may
use either method on examinations. As long as the data
are given in their full form, the students are likely
to opt for the resampling method.
DISCUSSION
We have presented only a very few illustrative
problems. But even with this small set, the teacher
should have a good idea of the place of
resampling when taught in parallel with the classical
methods. And even this small a sample of problems is
sufficient to provide a reasonable sense of how the
general resampling method deals with the garden variety
of statistical and probabilistic problems.
A definition of resampling, a bit of its history,
and other background materials that may be used one
place or another in the course may be found in the
enclosed article from Chance.
howteach statwork disk 1-210 May 14, 1991