# Session Two Real and Perceived Distances

By Larry Elliott, 2014-01-20 03:33

This lab was modified by Patricia Humphrey and John Rafter from Spurrier, J.D., et al., Elementary Statistics Laboratory Manual, Duxbury Press, 1995.

Introduction:

One of the most important aspects of data analysis is the study of relationships between variables. How does a cricket’s chirping rate change as temperature decreases? How does the yield of a chemical reaction change when pressure is increased? This session uses graphical and descriptive tools to help quantify relationships between variables.

The Setting:

Often the measurement we really wish to make on an object is difficult (or expensive, toxic, or destructive) to make. If we can find an easily measured variable that is closely linked to the difficult one, we may be able to use the easy one in place of the difficult one. Before doing so, we should do an experiment, called a regression experiment, that involves measuring both variables on each of several objects and then studying how the easy measurement tends to change with the difficult one. We may then be able to adjust the easy measurement to better approximate the difficult one. This process is called calibration. This session is a regression and calibration experiment to study how guessed distances between objects (an easy measurement), denoted by X, the predictor variable, vary in relation to true distances (a more difficult measurement), denoted by Y, the response variable.

Background:

It is well known that people tend to underestimate the size of faraway objects. Do we also tend to underestimate the distance to faraway objects, or do we tend to overestimate these distances? Or do we guess right, on average?

The Experiment

Step 1: Data Collection

The class as a group will go to a pre-chosen spot with handouts and pencils. The instructor will first identify a fixed reference point, such as a lamppost or a fire hydrant. Next, the instructor will identify a landmark. Write a brief description of this landmark in column 2 of Table 2.1. Each student will then be asked to guess the distance between the reference point and the landmark. Please keep your guess to yourself so as not to influence others. To simplify calculations later, guess in units of feet only. Silently record your guessed distance in column 3 of Table 2.1. The instructor will then ask you to guess the distance between the reference point and a second landmark, to be recorded in Table 2.1, then another, and so on, for a total of 13 landmarks. Don’t worry that your guesses might be bad; you will calibrate them later.

The class will then be split into teams to measure the true distances to the landmarks. Each team will have three members:

1. The Base: This person holds the tape end at the reference point, and at intermediate points along the way if the landmark is too far away to measure in one tape length. He or she also advises the other team members if they are not walking straight toward the landmark and keeps track of the number of full tape lengths that have been used en route to the landmark.

2. The Point: This person takes the tape roll and carefully walks straight toward the landmark until it is reached or the tape runs out. If the tape runs out, the Point is responsible for keeping track of exactly where the starting point for the next tape length will be while the Base comes forward. The Point also verifies the final reading made by the Eyes.

3. The Eyes: This person walks beside the Point. When the landmark is reached, the Eyes reads the tape and (in conjunction with the Base and the Point) calculates the final measured distance to the landmark and records it in the appropriate row of column 4 in Table 2.1. The Eyes is also the spokesperson for the team in class discussion.

Each of the first 12 landmark distances will be independently measured by at least 3 teams.

There are two serious errors that occur with surprising frequency:

1. It is very easy to forget how many tape lengths have been used when measuring distant landmarks. It is the Base’s responsibility to remember this, but the other team members should help, too.

2. If the end of the tape is reached, the Point should be very careful where the start of the next tape length is marked. For example, if you are using 25-foot tape measures, the tape is actually longer than 25 feet, but the new start point should be at the 25-foot mark, not the tape end. The Eyes should back up the Point to help prevent this error.
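The Eyes' final calculation is simple arithmetic. Every full tape length counts as the nominal tape length, because each restart is made at the 25-foot mark and not at the physical end of the tape. The tape reading at the landmark is then added to that total. Below is a minimal Python sketch, assuming 25-foot tapes as in the example above. The function name and the sample readings are hypothetical:

```python
def measured_distance(full_lengths: int, final_reading_ft: float,
                      tape_length_ft: float = 25.0) -> float:
    """Distance in feet from the reference point to a landmark (hypothetical helper).

    full_lengths: complete tape lengths laid down (tracked by the Base).
    final_reading_ft: tape reading at the landmark on the last, partial length.
    tape_length_ft: nominal tape length; each restart is at this mark, not the tape end.
    """
    if full_lengths < 0:
        raise ValueError("full_lengths cannot be negative")
    if not 0 <= final_reading_ft <= tape_length_ft:
        raise ValueError("final reading must lie between 0 and the tape length")
    return full_lengths * tape_length_ft + final_reading_ft


# Hypothetical example: three full 25-foot lengths plus a final reading of 12.5 feet.
print(measured_distance(3, 12.5))  # 87.5
```

Forgetting one tape length (the first error above) makes the result exactly one tape length too short. For this reason, each distance is measured by several teams.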

Do not measure the 13th landmark distance. It will be a test case. Its true distance has been

Table 2.1: Distances Between a Fixed Point and Several Landmarks

| Landmark Number | Landmark Description | Guessed Distance, X (feet) | Measured Distance (feet) | Median Measured Distance, Y (True Distance) |
|---|---|---|---|---|
| 1 | Railing on handicap ramp | | | |
| 2 | Heat Pump by Biology door | | | |
| 3 | Left end of the close Bike Rack | | | |
| 4 | Post at center point of lawn | | | |
| 5 | Near steps, bottom right corner | | | |
| 6 | Large Magnolia on left | | | |
| 7 | Storm drain toward bike racks | | | |
| 8 | Left railing on main steps to MPP | | | |
| 9 | Third tree by Biology | | | |
| 10 | Right corner of far sidewalk | | | |
| 11 | Far right corner of near sidewalk | | | |
| 12 | Right side of second bike racks | | | |
| 13 | Sprinkler control toward Pecan | | | |

Step 2: Individual Data Analysis

First, the instructor will lead the class in resolving team-to-team differences in measured distances. The median of all the measured distances for each landmark will be used as the true distance. Fill in column 5 of Table 2.1 with these medians as the discussion proceeds. Because the true distance is the median of at least three measurements, a large mistake by one team (resulting in an outlier) will not have much effect on the final number: the median is resistant to extreme values.
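The median step can be sketched with Python's standard `statistics.median`. The landmark numbers and team measurements below are made up. On landmark 1, the third team is imagined to have forgotten one 25-foot tape length (68 - 25 = 43):

```python
from statistics import mean, median


def true_distances(team_measurements: dict[int, list[float]]) -> dict[int, float]:
    """Median of the team measurements for each landmark (column 5 of Table 2.1)."""
    result = {}
    for landmark, values in team_measurements.items():
        if len(values) < 3:
            raise ValueError(f"landmark {landmark}: need at least 3 measurements")
        result[landmark] = median(values)
    return result


# Hypothetical data: one team forgot a tape length on landmark 1.
measurements = {1: [67.0, 68.0, 43.0], 2: [31.5, 32.0, 31.0, 32.5]}
print(true_distances(measurements))     # {1: 67.0, 2: 31.75}
print(round(mean(measurements[1]), 1))  # 59.3, pulled down by the outlier
```

The median (67 feet) ignores the blunder, while the mean is dragged down to about 59.3 feet.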

We are now ready to examine the relationship between the true distances (response) and your guessed distances (predictor). The most useful graphical tool for examining the relationship between two variables is the scatter plot. A scatter plot of true distances versus guessed distances locates a point on the Cartesian plane for each landmark, with the point’s coordinates given by (horizontal coordinate, vertical coordinate) = (guessed distance, true distance). Note that when we say “true distance versus guessed distance” we mean that guessed distances are to be on the horizontal (X) axis. Also, when we describe a scatter plot, it is “Y versus X” or “Y against X.” When we discuss regression, it is “Y on X.”

We can make a scatter plot with an added 45° line, to help you judge whether you are an accurate guesser, as follows:

1. Press Y= and enter X by pressing the X,T,θ,n key (Figure 2.1). Exit from this screen by pressing 2nd, MODE.

2. Enter the true distances (Y) into L1 and your guessed distances (X) into L2. DO NOT ENTER THE GUESSED (or true) DISTANCE FOR THE 13TH POINT.

3. Press 2nd, Y=, and select a Plot.

4. Be sure ON is highlighted. Use the down arrow key to get to Type:. Choose the scatter plot icon (first plot type), and press ENTER. Set Xlist: to L2 and Ylist: to L1. Choose whichever Mark: you prefer (Figure 2.2).

5. To display the scatter plot and the line Y = X, press Zoom, 9.

Figure 2.1 Figure 2.2

After a short delay, the plot window should open and you should see your scatter plot and the 45° line. If the points on your scatter plot tend to lie above the 45° line, you tend to underestimate the true distances to landmarks. We would then say that as a guesser you are negatively biased. This is the case for the majority of guessers. Some guessers are fairly accurate on average; that is, the points on their scatter plot tend to fall along the 45° line. We say they are unbiased guessers. A few individuals tend to overestimate the true distances. The points on their scatter plot tend to lie below the 45° line. We say they are positively biased guessers. Of course, one’s ability as a guesser may vary from situation to situation.
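The visual check against the 45° line comes down to counting the points that lie above it (true distance Y greater than guess X) and the points that lie below it. Below is a minimal Python sketch. The two-thirds cutoff for calling a guesser biased is an arbitrary, hypothetical choice and not a rule from this lab. The data are also made up:

```python
def classify_bias(guesses: list[float], truths: list[float], share: float = 2 / 3) -> str:
    """Compare each (guess X, true Y) point with the line Y = X.

    share is a hypothetical cutoff: if at least this fraction of the points lies
    on one side of the line, the guesser is called biased in that direction.
    Points exactly on the line count toward neither side.
    """
    if len(guesses) != len(truths) or not guesses:
        raise ValueError("need equal-length, non-empty lists")
    n = len(guesses)
    above = sum(1 for x, y in zip(guesses, truths) if y > x)  # underestimates
    below = sum(1 for x, y in zip(guesses, truths) if y < x)  # overestimates
    if above / n >= share:
        return "negatively biased"
    if below / n >= share:
        return "positively biased"
    return "approximately unbiased"


print(classify_bias([40, 55, 70, 90], [48, 61, 85, 104]))  # negatively biased
```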

Reproduce the scatter plot on a piece of paper to turn in with this assignment. Be sure to label the x-axis and y-axis as well as title your plot. To more easily do this, either press the TRACE key to find the x and y values at each point or plot the values directly using Table 2.1. After you determine the regression equation (see Step 4), plot it on your scatter plot.

Step 3: Calibration

If the points on your own scatter plot lie approximately on a straight line, then the relationship between your guessed distances and the true distances is said to be approximately linear. If the points seem to follow a U-shaped curve, the relationship is said to be convex. If the curve is an inverted U-shape, the relationship is said to be concave. In the writing assignment at the end of this session, you will be asked to describe the relationship shown by the points on your plot.
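The lab intends you to judge the shape by eye. As a rough numerical cross-check, you could fit a quadratic Y = c0 + c1 X + c2 X² by least squares. A value c2 > 0 points to a U-shaped (convex) trend, and c2 < 0 points to an inverted-U (concave) one.

The sketch below solves the 3×3 normal equations with Cramer's rule after centering X, which leaves c2 unchanged. It then reports how far the fitted curve bends away from a straight chord across your range of guesses, which is |c2|·w²/4 for a range of width w. The 5-foot threshold for calling the bend negligible is a hypothetical choice, and all function names are illustrative:

```python
def det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_coefficient(xs: list[float], ys: list[float]) -> float:
    """Least-squares c2 in y = c0 + c1*x + c2*x**2."""
    if len(set(xs)) < 3 or len(xs) != len(ys):
        raise ValueError("need at least 3 distinct guesses and one y per x")
    mx = sum(xs) / len(xs)
    u = [x - mx for x in xs]  # centering improves conditioning; c2 is unchanged
    s = [sum(v ** p for v in u) for p in range(5)]                  # sums of u**0 .. u**4
    t = [sum(v ** p * y for v, y in zip(u, ys)) for p in range(3)]  # sums of u**p * y
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    # Cramer's rule for c2: replace the third column with the right-hand side.
    with_rhs = [[row[0], row[1], rhs] for row, rhs in zip(normal, t)]
    return det3(with_rhs) / det3(normal)


def describe_shape(xs: list[float], ys: list[float], tol_ft: float = 5.0) -> str:
    """Label the trend; tol_ft is a hypothetical threshold, in feet."""
    c2 = quadratic_coefficient(xs, ys)
    width = max(xs) - min(xs)
    bend = abs(c2) * width ** 2 / 4  # largest gap between fitted curve and its chord
    if bend < tol_ft:
        return "approximately linear"
    return "convex" if c2 > 0 else "concave"


xs = [10, 20, 30, 40, 50]  # hypothetical guesses
print(describe_shape(xs, [0.02 * x * x for x in xs]))  # convex
```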

If we consider you, as a distance guesser, to be a new sort of measuring instrument, we can use the regression line to calibrate you. Calibration is an activity or operation for correcting bias in a measuring device and is an example of one very important use for regression experiments. Here is a graphical method for calibrating your guess for the 13th landmark.

1. Locate the point on the horizontal (X) axis of your plot that corresponds to your initial guess for the 13th landmark.

2. With a straightedge, draw a vertical line from that point up the plot until the line touches the sketched regression line (see Step 4).

3. From that point, draw a horizontal line to the vertical (Y) axis.

4. The adjusted guess is the reading on the Y axis found at the end of this horizontal line.

Figure 2.3 shows the result of using the calibration to adjust an initial guess of 55 feet. In this case the calibration leads to an adjusted guessed distance of 64 feet, an increase of 9 feet over the initial guess. This makes sense because these guesses are negatively biased: an initial guess of 55 feet should be adjusted upward.

Figure 2.3: Calibration Example. (Plot titled “My Guesses of Distance to 12 Landmarks”: Real Distance (ft) versus Guessed Distance (ft), both axes from 0 to 80, showing the line Y = X and a sketched regression line; a guess of 55 feet calibrates to a true 64 feet.)

Step 4: Regression Equation

The numerical equivalent to the graphical approach for adjusting your initial guess is to substitute your guess into the regression equation. To calculate the regression equation for these data, use your guesses as the predictor (X) variable and the true distances as the response (Y) variable. Use the following steps:

1. Recall that the true distances are in L1 and your guessed distances are in L2.

2. Press STAT and arrow to CALC, then press 8.

3. Enter L2 as the X list and L1 as the Y list, and store the resulting line in function Y1. This is done by first pressing 2nd, 2, a comma, 2nd, 1, a comma, and Vars. Next, arrow to Y-Vars, select Function, and press Enter. Then press Enter again to choose Y1. The screen should look like Figure 2.4.

4. After pressing Enter (for the third time), the regression equation will appear. In this case, X represents your guessed distance and Y represents the true distance (Figure 2.5).

Figure 2.4 Figure 2.5

Record your values of a, b, r-squared, and r in the space below.

Now we can again estimate the true distance for the 13th landmark, this time by substituting your guess for x in your regression equation. If this calibration equation is useful, the calculated distance will be closer to the true distance than your guess was.
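You can check your calculator's output with the standard formulas. LinReg(a+bx) reports the least-squares line y = a + bx together with r and r². The same quantities are b = Sxy/Sxx, a = ȳ - b·x̄ and r = Sxy/√(Sxx·Syy). Here Sxx, Syy and Sxy are the sums of squared and cross-multiplied deviations from the means. Below is a minimal Python sketch. The function names and the four data points are hypothetical, as is the guess of 55 feet. Replace them with the 12 (guess, true distance) pairs from your Table 2.1 and your own 13th guess:

```python
from math import sqrt


def linreg(xs: list[float], ys: list[float]) -> tuple[float, float, float, float]:
    """Least-squares line y = a + b*x; returns (a, b, r_squared, r)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / sqrt(sxx * syy)
    return a, b, r * r, r


def calibrate(guess: float, a: float, b: float) -> float:
    """Adjusted guess: substitute the raw guess for x in y = a + b*x."""
    return a + b * guess


# Hypothetical data (guesses X, true distances Y), for illustration only.
guesses = [10, 20, 30, 40]
truths = [12, 18, 33, 38]
a, b, r2, r = linreg(guesses, truths)
print(round(a, 2), round(b, 2), round(r2, 3))  # 2.0 0.93 0.959
print(round(calibrate(55, a, b), 2))           # 53.15
```

The value r² is the fraction of the variation in true distance accounted for by the straight-line fit on your guesses. Multiply it by 100 to get a percent.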

Step 5: Residual Plots

Residual plots can be useful in determining whether the model used (in this case a straight line) was a good choice. Plotting residuals against X can reveal curvature, outliers, and problems with nonconstant variance.

To put the residuals in list L3, press 2nd, 1, -, Vars, arrow to Y-Vars, Function, Y1, (, 2nd, 2, ), STO>, 2nd, 3, Enter. The resulting screen should look like the figure below. Then make a scatter plot using your guessed distances (L2) as the X variable and the residuals (L3) as the Y variable.
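The keystrokes above store L1 - Y1(L2) in L3. Each entry is a true distance minus the distance predicted by your regression line. The same residuals can be sketched in Python as below, where the function name and the data are hypothetical. For a least-squares line, the residuals always sum to zero (up to rounding), so you judge the plot by its pattern rather than by its average:

```python
def residuals(xs: list[float], ys: list[float]) -> list[float]:
    """Residuals y - (a + b*x) from the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]


guesses = [10, 20, 30, 40]  # hypothetical guessed distances (X, stored like L2)
truths = [12, 18, 33, 38]   # hypothetical true distances (Y, stored like L1)
res = residuals(guesses, truths)
for x, e in zip(guesses, res):
    # Each pair is one point of the residual plot: guess across, residual up.
    print(x, round(e, 2))
```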

Reproduce the residual plot on a piece of paper to turn in with this assignment. Be sure to label the x-axis and y-axis as well as title your plot. To more easily do this, either press the TRACE key to find the x and y values at each point or plot the values directly, using the list editor to display L2 and L3.

Parting Glances:

In this experiment, we used a regression line to calibrate a “measuring instrument,” in this case a human being guessing distances between objects. A more important calibration exercise was performed to improve verification of nuclear weapons tests under the Threshold Test Ban Treaty between the United States and Russia. After the cold war, the two countries embarked on an effort to make onsite yield measurements of each other’s nuclear tests, for the purpose of calibrating a monitoring system based on seismic measurements. That is, there are two measurements of a nuclear explosion’s force:

1. The onsite measurement

2. The seismic disturbance as measured by a seismograph halfway around the world.

Once a reliable calibration method is constructed, each country should be able to monitor the other’s nuclear tests using at-home seismic measurements instead of traveling overseas to make onsite measurements. Note that this difficult onsite measurement will become impossible if relations between the United States and Russia return to cold war levels. Also, as more is learned about the relationship between the two measuring methods, nuclear testing in countries other than Russia may be more reliably monitored by seismologists in the United States. (Picard, Richard and Bryson, Maurice (1992), “Calibrated Seismic Verification of the Threshold Test Ban Treaty,” Journal of the American Statistical Association, 87, 293-299.)

Writing Assignment:

Along with your answers to the questions below, turn in the table of guesses and measured distances, the scatter plot of your data including the regression line, and the scatter plot of the residuals.

1. Name two factors (variables) that might cause your ability as a guesser to vary from situation to situation.

2. Describe the form of your scatter plot. (That is, do the points lie approximately on a straight line, a convex curve, a concave curve, or in some other pattern?)

3. Do your distance guesses tend to be negatively biased, positively biased, or approximately unbiased?

4. What was your initial guess for the mystery 13th landmark? Write your regression equation. What was the estimated distance to the 13th landmark from the regression equation? Did the regression lead to a more accurate guess? If not, what was unusual about your line or guess that led to the “calibrated” guess being less accurate?

5. What percent of the variation in true distance was accounted for by your guesses?

6. Give a meaningful interpretation of the slope of your regression line.

7. Describe the pattern of your residual plot. Indicate any problems with using the regression line for prediction that are suggested by the residual plot.
