Advanced Probability and Statistics Module 3
Greetings Statfolk. As I read chapters 4-6, I found a couple of mistakes in the book. On page 131 the variance formula is wrong: it should have multiplication in the numerator rather than addition. Please make this change in pencil in your books. Also, in Figure 5.3 on page 151, the caption claims that the distribution does not change shape, but it obviously does. The book is correct only under a certain condition. The book does a poor job explaining this, so I’ll try to do better in one of the problems below. Finally, to clarify some notation, the book writes $\bar{x} = \frac{\sum x}{n}$. This isn’t that bad, but technically it should be written $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. Explanation: there are n pieces of data labeled $x_1$ through $x_n$; the subscripts make this clear. i is an iterator that runs from 1 to n to define clearly where the summation is to begin and end.
Incidentally, I created the above equations with Microsoft Equation Editor, which you might have to install from your disc if you want to use it (see the Help menu, “install equation editor”). However, you can get around this. In the Insert menu choose Symbol, then in the Font box choose Symbol and you’ll find the capital sigma symbol: Σ. You could then write x-bar = (1/n) Σx_i, assuming that you have already defined the use of Σ to mean the sum on i from 1 to n. To create a subscript in Word, press Control and the = sign. Then type the subscript. These keystrokes toggle the subscripting on and off. (To create a superscript, press Control-Shift-Equal.) Putting identifiers in equations in italics is a good idea.
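For anyone who finds code easier to read than sigma notation, here is a minimal Python sketch of what the iterator i is doing in the formula for x-bar. The six data values are made up purely for illustration; they do not come from the book or from this module.

```python
# Hypothetical data values x_1 ... x_n (illustrative only).
x = [4, 8, 15, 16, 23, 42]
n = len(x)

# The iterator i runs from 1 to n, exactly as written under and over the sigma.
# Python lists start at index 0, so the value x_i is stored in x[i - 1].
total = 0
for i in range(1, n + 1):
    total += x[i - 1]

# x-bar = (1/n) * (sum of x_i for i = 1..n)
x_bar = total / n
print(x_bar)
```

The loop is spelled out only to mirror the notation; in practice `sum(x) / n` gives the same result.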
You’ll need to read through chapters 4, 5, and 6. There are a lot of tables and graphs, so it won’t be too bad.
The first two columns of the spreadsheet are a repeat of the Stooge data from Module 2. As you recall, we measured the variability of the data via the interquartile range. This time we’ll do it with the standard deviation. Create a spreadsheet like the one below. You can copy and paste the first two columns, but you’ll have to figure out what formulae to put in the other cells. You will know your spreadsheet is correct when it matches up with what is shown here. You can get subscripts and superscripts in Excel by highlighting and going to Format, Cells, Font. Calculate x-bar using the Average function. The third, fourth, and fifth columns should refer back to the mean. (Make sure to use dollar signs in the formula where necessary.) Use the Sum function to get the totals at the bottom (or use the sigma button on the toolbar). Calculate s^2 by dividing one of the totals by the appropriate number. Use s^2 to compute s. Then use the built-in Var and Stdev functions in Excel to check your s^2 and s values.
To get help with any function you can use the function wizard. Click on fx in the function bar to see a list of functions. You can select Statistical as a category and then type V to jump down to functions starting with a V. If you select Var, it will guide you through using that function. You can also click on “Help With This Function” to see examples. Note: With all four of the functions we’re using you can enter just one argument (the list of numbers) rather than entering each number separately. This list can be entered by clicking on the first number, holding, and letting go on the last number. For example, to find the mean of the numbers in cells G100 through G130 you would enter =average(G100:G130). You don’t even have to type G100:G130. Instead, after typing the left parenthesis, just click on G100 and release on G130.

i       x_i   x_i - x-bar   |x_i - x-bar|   (x_i - x-bar)^2
1        34          0.33            0.33              0.11
2        13        -20.67           20.67            427.11
3        23        -10.67           10.67            113.78
4        25         -8.67            8.67             75.11
5        36          2.33            2.33              5.44
6        45         11.33           11.33            128.44
7        19        -14.67           14.67            215.11
8        34          0.33            0.33              0.11
9        25         -8.67            8.67             75.11
10       33         -0.67            0.67              0.44
11       17        -16.67           16.67            277.78
12        9        -24.67           24.67            608.44
13       27         -6.67            6.67             44.44
14       53         19.33           19.33            373.78
15       89         55.33           55.33           3061.78
16       30         -3.67            3.67             13.44
17       39          5.33            5.33             28.44
18       20        -13.67           13.67            186.78
19       49         15.33           15.33            235.11
20       35          1.33            1.33              1.78
21       54         20.33           20.33            413.44
22       40          6.33            6.33             40.11
23       28         -5.67            5.67             32.11
24       31         -2.67            2.67              7.11
Totals               0.00          275.33           6365.33

x-bar = 33.67
s^2 = 276.75
s = 16.64
Excel functions: s^2 = 276.75, s = 16.64
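If you would like a second check on your spreadsheet outside Excel, the same columns can be sketched in Python. This is only one possible way to do it, not part of the assignment. The variable names are my own, and the standard-library `statistics.variance` and `statistics.stdev` functions stand in for Excel’s Var and Stdev (all four compute the sample versions, dividing by n - 1).

```python
import math
import statistics

# Stooge data: the second column of the spreadsheet.
x = [34, 13, 23, 25, 36, 45, 19, 34, 25, 33, 17, 9,
     27, 53, 89, 30, 39, 20, 49, 35, 54, 40, 28, 31]

n = len(x)
x_bar = sum(x) / n                              # like =AVERAGE(range)

deviations = [xi - x_bar for xi in x]           # column 3
abs_deviations = [abs(d) for d in deviations]   # column 4
sq_deviations = [d ** 2 for d in deviations]    # column 5

# Totals row (like =SUM(range) under each column).
total_dev = sum(deviations)
total_abs = sum(abs_deviations)
total_sq = sum(sq_deviations)

# Sample variance and standard deviation computed from the totals.
s_squared = total_sq / (n - 1)
s = math.sqrt(s_squared)

# Built-in checks, playing the role of Excel's Var and Stdev.
check_var = statistics.variance(x)
check_sd = statistics.stdev(x)

print(f"x-bar = {x_bar:.2f}")
print(f"total |dev| = {total_abs:.2f}, total dev^2 = {total_sq:.2f}")
print(f"s^2 = {s_squared:.2f} (built-in: {check_var:.2f})")
print(f"s = {s:.2f} (built-in: {check_sd:.2f})")
```

Rounded to two decimals, the printed values should agree with the spreadsheet. The total of the third column may come out as a tiny nonzero number because of floating-point rounding, which is why it is not printed here.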
1. a. What are the numbers in the third column called? (There are two different names for them.) b. Interpret the positive or
negative signs of these numbers. c. Interpret the magnitudes of these numbers. d. Notice that they sum to zero. This is no
fluke. Give a formal mathematical proof that this is always the case for any data. It’s a very short proof, but it will help to
know the following fact about summations: the sum from 1 to n of a constant means to add up the constant n times,
which yields n times the constant.
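A numerical check is not a proof, but it can make clear what the proof in part d has to establish. The sketch below uses ten randomly generated values (hypothetical data, not from this module) to illustrate both the summation fact and the claim that the deviations sum to zero. The tolerance of 1e-9 is an arbitrary choice to allow for floating-point rounding.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical data: 10 arbitrary values (illustrative only).
data = [random.uniform(-100, 100) for _ in range(10)]
n = len(data)
x_bar = sum(data) / n

# Summation fact: adding a constant n times gives n times the constant.
sum_of_constant = sum(x_bar for _ in range(n))

# Sum of the deviations from the mean (the third-column total).
sum_of_deviations = sum(xi - x_bar for xi in data)

# Both comparisons allow for tiny floating-point rounding error.
constant_fact_holds = abs(sum_of_constant - n * x_bar) < 1e-9
deviations_sum_to_zero = abs(sum_of_deviations) < 1e-9

print(constant_fact_holds, deviations_sum_to_zero)
```

Changing the seed or the number of values should not change the outcome, which is what the formal proof explains.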
2. One reason for squaring the deviations before summing them is to make them positive. As you just proved, simply
summing the deviations always yields zero and, therefore, gives no information about how spread out the data is. By
averaging the sum of the squares of the deviations and taking the square root, we get the standard deviation. The fourth
column wasn’t necessary for computing the variance or standard deviation, but look how close the total for column 4 is to s^2. Using absolute values is a perfectly valid way of measuring variability in a data set, but it is not commonly used.
Explain why the two numbers are not exactly the same. That is, why don’t squaring and square rooting undo each other
to give the total in column 4?
3. a. What are the units for our standard deviation and variance? b. What is the interpretation of the standard deviation? c.