This Lab was modified by Patricia Humphrey and John Rafter from Spurrier, J.D. et al,
Elementary Statistics Laboratory Manual, Duxbury Press, 1995.
Real and Perceived Distances
One of the most important aspects of data analysis is the study of relationships between variables.
How does a cricket’s chirping rate change as temperature decreases? How does the yield of a
chemical reaction change when pressure is increased? This session uses graphical and descriptive
tools to help quantify relationships between variables.
Often the measurement we really wish to make on an object is difficult to make (or expensive,
toxic, or destructive). If we can find an easily measured variable that is closely linked to the
difficult one, we may be able to use the easy one in place of the difficult one. Before doing so,
we should do an experiment, called a regression experiment, that involves measuring both
variables on each of several objects and then studying the manner in which the easy
measurement tends to change with the difficult one. We may then be able to adjust the easy
measurement to better approximate the difficult one. This process is called calibration. This
session is a regression and calibration experiment to study the manner in which guessed
distances between objects (an easy measurement) denoted by X, the predictor variable, vary in
relation to true distances (a more difficult measurement) denoted by Y, the response variable.
It is well known that people tend to underestimate the size of faraway objects. Do we also tend to
underestimate the distance to faraway objects, or do we tend to overestimate these distances? Or,
do we guess right, on average?
Step 1: Data Collection
The class as a group will go to a pre-chosen spot, with handouts and pencils. The instructor will
first identify a fixed reference point, such as a lamppost or a fire hydrant. Next the instructor
will identify a landmark. You should write a brief description of this landmark in column 2 of
Table 2.1. Each student will then be asked to guess the distance between the reference point and
the landmark. Please keep your guess to yourself so as not to influence others. To simplify calculations later, guess in units of feet only. Silently record your guessed distance in column 3
of Table 2.1. Then the instructor will ask you to guess the distance between the reference point
and a second landmark, to be recorded in Table 2.1, and then another, and so on, for a total of 13
landmarks. Don’t worry that your guesses might be bad. You will calibrate them later.
The class will then be split into teams to measure the true distances to the landmarks. Each team
will have three members:
1. The Base: This person holds the tape end at the reference point and at intermediate
points along the way if the landmark is too far away to measure in one tape length. He or she
also advises the other team members if they are not walking straight toward the landmark and
keeps track of the number of full tape lengths that have been used en route to the landmark.
2. The Point: This person takes the tape roll and carefully walks straight toward the
landmark, until it is reached or the tape runs out. If the tape runs out, the Point is responsible for
keeping track of exactly where the starting point for the next tape length will be while the Base
comes forward. Also, the Point verifies the final reading made by the Eyes.
3. The Eyes: This person walks beside the Point. When the landmark is reached, the Eyes
reads the tape and (in conjunction with the Base and the Point) calculates the final measured
distance to the landmark and records it in the appropriate row of column 4 in Table 2.1. The Eyes
is also the spokesperson for the team in class discussion.
Each of the first 12 landmark distances will be independently measured by at least 3 teams.
There are two serious errors that occur with surprising frequency:
1. It is very easy to forget how many tape lengths have been used when measuring distant
landmarks. It is the Base’s responsibility to remember this, but the other team members should
2. If the end of the tape is reached, the Point should be very careful where the start of the
next tape length is marked. For example, if you are using 25-foot tape measures, the tape is
actually longer than 25 feet, but the new start point should be at the 25-foot mark, not the tape
end. The Eyes should back up the Point to help prevent this error.
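The arithmetic behind the final measured distance can be sketched in a few lines of Python. This is only an illustration: the function name `measured_distance` and the sample numbers are made up, and the 25-foot default assumes the tape measures described above.

```python
def measured_distance(full_tape_lengths, final_reading_ft, tape_length_ft=25.0):
    """Total distance = full tape lengths x nominal tape length + final reading.

    Each full length counts as the nominal length (the 25-foot mark),
    not the physical end of the tape.
    """
    if full_tape_lengths < 0:
        raise ValueError("number of full tape lengths cannot be negative")
    if not 0 <= final_reading_ft <= tape_length_ft:
        raise ValueError("final reading must lie between 0 and one tape length")
    return full_tape_lengths * tape_length_ft + final_reading_ft


# Hypothetical example: 3 full tape lengths, then a final reading of 11.5 feet.
print(measured_distance(3, 11.5))  # 86.5
```

Forgetting one tape length in this example would give 61.5 feet instead of 86.5 feet, which is why the whole team should keep count.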
Do not measure the 13th landmark distance. It will be a test case. Its true distance has been
measured in advance by your instructor, and we discuss it later.
Table 2.1: Distances Between a Fixed Point and Several Landmarks
Columns: (1) Landmark Number; (2) Landmark Description; (3) Guessed Distance (feet), X; (4) Measured Distance (feet); (5) Median Measured Distance (feet), Y (True Distance)
1 Railing on handicap ramp
2 Heat Pump by Biology door
3 Left end of the close Bike Rack
4 Post at center point of lawn
5 Near steps, bottom right corner
6 Large Magnolia on left
7 Storm drain toward bike racks
8 Left railing on main steps to MPP
9 Third tree by Biology
10 Right corner of far sidewalk
11 Far right corner of near sidewalk
12 Right side of second bike racks
13 Sprinkler control toward Pecan
Step 2: Individual Data Analysis
First, the instructor will lead the class in resolving team-to-team differences in measured
distances. The median of all the measured distances for each landmark will be used as the true
distance. Fill in column 5 of Table 2.1 with these medians as the discussion proceeds. Notice that,
by using the median of at least three measurements, if one of the teams messed up in a big way
(resulting in an outlier), its mistake will not have much effect on the final number, because the
median is resistant to extreme values.
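A small Python sketch shows why the median is preferred here. The three measurements are hypothetical; the third team is assumed to have lost count of one 25-foot tape length.

```python
from statistics import mean, median

# Hypothetical measurements (feet) of one landmark by three teams.
# The third value is an outlier: that team dropped a full tape length.
measurements = [86.5, 87.0, 62.0]

print(median(measurements))  # 86.5 (barely affected by the outlier)
print(mean(measurements))    # 78.5 (pulled well below the two good readings)
```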
We are now ready to examine the relationship between the true distances (response) and your guessed distances (predictor). The most useful graphical tool for examining the relationship
between two variables is the scatter plot. A scatter plot of true distances versus guessed distances
locates a point on the Cartesian plane for each landmark, with the point’s coordinates given by
(horizontal coordinate, vertical coordinate) = (guessed distance, true distance). Note that when
we say “true distance versus guessed distance” we mean that guessed distances are to be on the
horizontal (X) axis. Also, when we describe a scatter plot, it is “Y versus X” or “Y against X.” When we discuss regression, it is “Y on X.”
You can make a scatter plot with an added 45° line, which helps you judge whether you are an
accurate guesser, as follows:
1. Press Y= and enter X by pressing the X,T,θ,n key. (Figure 2.1) Exit from this screen by
2. Enter the true distances (Y) into L1 and your guessed distances (X) into L2. DO NOT ENTER THE GUESSED (or true) DISTANCE FOR THE 13TH POINT.
3. Press 2nd, Y=, and select a Plot.
4. Be sure the ON is highlighted. Use the down arrow key to get to Type:. Choose the scatter
plot icon (first plot type), and press ENTER. Set Xlist: to L2 and Ylist: to L1. Choose
whichever Mark: you prefer. (Figure 2.2)
5. To display the scatter plot and the line Y = X, press Zoom, 9.
Figure 2.1 Figure 2.2
After a short delay, the plot window should open and you should see your scatter plot and the
45° line. If the points on your scatter plot tend to lie above the 45° line, you tend to
underestimate the true distances to landmarks. We would then say that as a guesser you are
negatively biased. This is the case for the majority of guessers. Some guessers are fairly
accurate on the average with their guesses. That is, the points on their scatter plot tend to fall
along the 45° line. We say they are unbiased guessers. A few individuals tend to overestimate
the true distances. The points on their scatter plot tend to lie below the 45° line. We say they are
positively biased guessers. Of course, one’s ability as a guesser may vary from situation to situation.
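The same judgment can be checked numerically. The sketch below is an illustration with made-up data; the function name `describe_bias` is hypothetical. It counts how many points fall above and below the line Y = X and computes the average of (guess minus true distance), which is negative for a negatively biased guesser.

```python
def describe_bias(guessed, true):
    """Return (points above Y = X, points below Y = X, mean of guess - true)."""
    pairs = list(zip(guessed, true))
    above = sum(1 for x, y in pairs if y > x)   # true > guess: underestimate
    below = sum(1 for x, y in pairs if y < x)   # true < guess: overestimate
    mean_error = sum(x - y for x, y in pairs) / len(pairs)
    return above, below, mean_error


# Hypothetical guessed and true distances (feet) for five landmarks.
guessed = [20, 35, 50, 60, 80]
true = [24, 41, 55, 58, 95]

above, below, mean_error = describe_bias(guessed, true)
print(above, below)  # 4 1
print(mean_error)    # negative, so these guesses tend to be too short
```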
Reproduce the scatter plot on a piece of paper to turn in with this assignment. Be sure to label
the x-axis and y-axis as well as title your plot. To more easily do this, either press the TRACE
key to find the x and y values at each point or plot the values directly using Table 2.1. After you
determine the regression equation (see Step 4), plot it on your scatter plot.
Step 3: Calibration
If the points on your own scatter plot lie approximately on a straight line, then the relationship
between your guessed distances and the true distances is said to be approximately linear. If there
seems to be a U-shaped curve, the relationship is said to be convex. If the curve is an inverted
U-shape, the relationship is said to be concave. In the writing assignment at the end of this lab, you will
be asked to describe the relationship shown by the points on your plot.
If we consider you, as a distance guesser, to be a new sort of measuring instrument, we can use
the regression line to calibrate you. Calibration is an activity or operation for correcting bias in a
measuring device and is an example of one very important use for regression experiments. Here
is a graphical method for calibrating your guess for the 13th landmark.
1. Locate the point on the horizontal (X) axis of your plot that corresponds to your initial
guess for the 13th landmark.
2. With a straightedge, draw a vertical line from that point up the plot until the line
touches the sketched regression line. (See Step 4)
3. From that point, draw a horizontal line to the vertical (Y) axis.
4. The adjusted guess is the reading on the Y axis found at the end of this horizontal line.
Figure 2.3 shows the result of using the calibration to adjust an initial guess of 55 feet. The
calibration leads to an adjusted guessed distance of 64 feet in this case, an increase of 9 feet from
the initial guess. This makes a lot of sense under the observation that these guesses are
negatively biased; if the initial guess is 55 feet, it should be adjusted upward.
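[Figure 2.3: Calibration Example. Scatter plot titled "My Guesses of Distance to 12 Landmarks," with Guessed Distance (ft) on the horizontal axis and Real Distance (ft) on the vertical axis, showing the line Y = X and the sketched regression line. A guess of 55 feet calibrates to a true 64 feet.]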
Step 4: Regression Equation
The numerical equivalent to the graphical approach for adjusting your initial guess is to
substitute your guess into the regression equation. To calculate the regression equation for this
data, use your guesses as the predictor (X) variable and the true distances as the response (Y)
variable. Use the following steps:
1. Recall that the true distances are in L1 and your guessed distances are in L2.
2. Press STAT and arrow to CALC, then press 8.
3. Enter L2 as the X list and L1 as the Y list and store the resulting line in function Y1. This is
done by first pressing 2nd, 2, a comma, 2nd, 1, a comma, and Vars. Next, arrow to Y-Vars, select
Function, and press Enter. Then press Enter again to choose Y1. The screen should
look like Figure 2.4.
4. After pressing Enter (for the third time), the regression equation will appear. In this case,
the X represents your guessed distance and the Y represents the true distance. (Figure 2.5)
Figure 2.4 Figure 2.5
Record your values of a, b, r-squared, and r in the space below.
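If you would like to check the calculator's output by hand, the following Python sketch computes a least-squares line of the form Y = a + bX, matching the a and b you are asked to record, along with r and r-squared. The function name `linreg` and the four data points are hypothetical and are not part of the lab.

```python
from math import sqrt


def linreg(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r, r * r


# Hypothetical guessed (X) and true (Y) distances in feet.
guessed = [10, 20, 30, 40]
true = [13, 24, 37, 46]

a, b, r, r_squared = linreg(guessed, true)
print(round(a, 2), round(b, 2))  # 2.0 1.12
```

Here r-squared is the fraction of the variation in true distance that is accounted for by the guesses.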
Now we can once again estimate the true distance to the 13th landmark, this time by
substituting your guess for x in your regression equation. If this calibration equation is useful,
the calculated distance will be closer to the true distance than your guess was.
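As a sketch of this check, the Python below substitutes a guess into a regression equation and compares the two errors. The coefficients a = 3.5 and b = 1.1 and the true distance of 66 feet are hypothetical values, chosen only so that a guess of 55 feet calibrates to about 64 feet, as in Figure 2.3.

```python
def calibrate(guess_ft, a, b):
    """Adjusted guess from the regression equation Y = a + b*X."""
    return a + b * guess_ft


a, b = 3.5, 1.1   # hypothetical regression coefficients
guess_13 = 55.0   # initial guess for the 13th landmark (feet)
true_13 = 66.0    # hypothetical true distance (feet)

adjusted = calibrate(guess_13, a, b)
improved = abs(adjusted - true_13) < abs(guess_13 - true_13)

print(round(adjusted, 1))  # 64.0
print(improved)            # True: the calibrated value is closer to the truth
```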
Step 5: Residual Plots
Residual Plots can be of use in determining whether the model used (in this case a straight line)
was a good choice. Plotting residuals against X will indicate any curvature, outliers, and
problems with nonconstant variance.
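A residual is the true distance minus the distance predicted by the regression line. The Python sketch below illustrates the calculation with four made-up points; the values a = 2.0 and b = 1.12 are the least-squares coefficients for these hypothetical data, and the function name `residuals` is an assumption for illustration.

```python
def residuals(x, y, a, b):
    """Residual for each point: observed y minus predicted a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical guessed (X) and true (Y) distances in feet.
guessed = [10, 20, 30, 40]
true = [13, 24, 37, 46]
a, b = 2.0, 1.12  # least-squares coefficients for these made-up points

res = residuals(guessed, true, a, b)
print([round(e, 2) for e in res])  # [-0.2, -0.4, 1.4, -0.8]
```

Residuals from a least-squares line always sum to zero (up to rounding), so what matters in the plot is their pattern against X, not their average.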
To put the residuals in list L3, press 2nd, 1, -, Vars, arrow to Y-Vars, Function, Y1, (, 2nd, 2, ),
STO>, 2nd, 3, Enter. The resulting screen should look like the figure below. Then make a
scatter plot using your guessed distances (L2) as the X variable and the residuals (L3) as the Y
variable.
Reproduce the residual plot on a piece of paper to turn in with this assignment. Be sure to label
the x-axis and y-axis as well as title your plot. To more easily do this, either press the TRACE
key to find the x and y values at each point or plot the values directly using the editor to display
L2 and L3.
Parting Glances:
In this experiment, we used a regression line to calibrate a “measuring instrument”, in this case, a
human being guessing distances between objects. A more important calibration exercise was
performed to improve verification of nuclear weapons tests under the Threshold Test Ban Treaty
between the United States and Russia. After the Cold War, the two countries embarked on an
effort to make onsite yield measurements of each other’s nuclear tests, for the purpose of
calibrating a monitoring system based on seismic measurements. That is, there are two
measurements of a nuclear explosion’s force:
1. The onsite measurement
2. The seismic disturbance as measured by a seismograph halfway around the world.
Once a reliable calibration method is constructed, each country should be able to monitor the
other’s nuclear tests using at-home seismic measurements, instead of traveling overseas to make onsite measurements. Note that this difficult onsite measurement will become impossible if
relations between the United States and Russia return to Cold War levels. Also, as more is
learned about the relationships between the two measuring methods, nuclear testing in countries
other than Russia may be more reliably monitored in the United States by seismologists. (Picard,
Richard and Bryson, Maurice (1992), “Calibrated Seismic Verification of the Threshold Test
Ban Treaty,” Journal of the American Statistical Association, 87, 293-299.)
Short Answer Writing Assignment
All answers should be complete sentences.
Also turn in the table of guesses and measured distances, the scatter plot of your data including the
regression line, and the scatter plot of the residuals.
1. Name two factors (variables) that might cause your ability as a guesser to vary from situation
to situation.
2. Describe the form of your scatter plot. (i.e., Do points lie approximately on a straight line, a
convex curve, a concave curve, or in some other pattern?)
3. Do your distance guesses tend to be negatively biased, positively biased, or approximately
unbiased?
4. What was your initial guess for the mystery 13th landmark? Write your regression equation.
What was the estimated distance to the 13th landmark from the regression equation? Did the
regression lead to a more accurate guess? If not, what was unusual about your line or guess
that led to the “calibrated” guess being less accurate?
5. What percent of variation in true distance was accounted for by your guesses?
6. Give a meaningful interpretation of the slope for your regression line.
7. Describe the pattern of your residual plot. Indicate any problems with using the regression
line for prediction that are suggested by the residual plot.