
W7L1RegressionandCorrelation

By Micheal Price,2014-05-17 16:14

Week 7 Lesson 1 (lesson 13) – Ch 14 pg 629

    Linear Regression and Correlation – how two variables are related, e.g. sunny days and sales volume of ice cream

    Linear equation with one variable – straight line

    $y = b_0 + b_1 x$, where $x$ is the independent variable, $b_0$ is the y-intercept (a constant) and $b_1$ is the slope (gradient)

    Slope – an increase of 1 unit in x results in a change of $b_1$ in y

    When $b_1 > 0$ the line slopes upwards; when $b_1 < 0$ it slopes downwards; when $b_1 = 0$ it is horizontal
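
A minimal sketch of the straight-line equation, to make the roles of the intercept and the slope concrete. The function name `line` and the coefficient values are hypothetical, chosen only for illustration.

```python
def line(x, b0, b1):
    """Return y on the straight line y = b0 + b1 * x."""
    return b0 + b1 * x

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 2.0, 0.5

# At x = 0 the line passes through the y-intercept b0.
intercept = line(0, b0, b1)

# Moving one unit along x changes y by exactly the slope b1.
step = line(3, b0, b1) - line(2, b0, b1)

print(intercept, step)
```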

    Scatter Plot – a graph of two variables, x and y, e.g. age and price of cars

    Errors occur when we fit a straight line to a scatter plot. Each error is measured by the vertical distance from the data point to the line.

    We can draw many lines through a scatter plot, so we use the least-squares method to choose the 'best' line: the one with the smallest sum of squared errors.

    This line is called the Regression Line and its equation the Regression Equation
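
The idea of choosing the line with the least squared error can be sketched by scoring a few candidate lines on the same data. The data values and the candidate lines below are made up for illustration; the third candidate uses the coefficients that the least-squares formulas give for this particular data set.

```python
# Made-up illustrative data, not taken from the lesson.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sum_squared_error(b0, b1):
    """Sum of squared vertical distances from each point to y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Candidate lines as (intercept, slope) pairs.
candidates = {
    "flat": (4.0, 0.0),
    "steep": (1.0, 1.0),
    "least-squares": (2.2, 0.6),
}

errors = {name: sum_squared_error(b0, b1) for name, (b0, b1) in candidates.items()}
best = min(errors, key=errors.get)

for name, err in errors.items():
    print(f"{name}: {err:.2f}")
print("smallest squared error:", best)
```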

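    $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$

    $S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - (\sum x_i)^2/n$ and $S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - (\sum y_i)^2/n$

    $b_1 = S_{xy}/S_{xx}$

    $b_0 = \bar{y} - b_1 \bar{x} = (\sum y_i - b_1 \sum x_i)/n$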
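
As a rough worked example of these formulas, the sketch below computes the sums of squares, the slope and the intercept in plain Python. The data values are made up for illustration, and the variable names are assumptions of this sketch rather than notation from the lesson.

```python
# Made-up illustrative data, not taken from the lesson.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Defining formulas: sums of products of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

# Computing (shortcut) formula for Sxy, which should agree with the above.
s_xy_short = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b1 = s_xy / s_xx            # slope
b0 = y_bar - b1 * x_bar     # intercept

print(f"Sxy={s_xy:.2f} Sxx={s_xx:.2f} b1={b1:.2f} b0={b0:.2f}")
```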

    Using the regression equation, we can predict a value of y for any given value of x.

    x is also called the predictor variable and y the response variable

    Extrapolation (predicting outside the range of the observed x values) is only reasonable within acceptable limits; beyond them the predictions become inaccurate
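
One possible way to keep extrapolation in check is to flag any prediction whose x lies outside the observed range. This is only a sketch: the coefficients, the data and the helper name `predict` are hypothetical, and treating the observed range as the acceptable limit is an assumption made here for simplicity.

```python
# Made-up observed x values and hypothetical fitted coefficients.
xs = [1, 2, 3, 4, 5]
b0, b1 = 2.2, 0.6

def predict(x):
    """Return (predicted y, whether x lies inside the observed range of x)."""
    y_hat = b0 + b1 * x
    in_range = min(xs) <= x <= max(xs)
    return y_hat, in_range

for x in (3, 10):
    y_hat, in_range = predict(x)
    note = "interpolation" if in_range else "extrapolation, treat with caution"
    print(f"x={x}: predicted y={y_hat:.2f} ({note})")
```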

    Outliers can distort the regression line. An outlier that affects the regression is called an influential observation

    An influential observation may not be an outlier though
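
To see how strongly a single point can pull the regression line, the sketch below fits the line with and without one extra observation. The data, the added point and the helper name `fit_line` are all made up for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0_without, b1_without = fit_line(xs, ys)

# Add one point far from the rest, at (10, 0), and refit.
b0_with, b1_with = fit_line(xs + [10], ys + [0])

print(f"slope without the extra point: {b1_without:.3f}")
print(f"slope with the extra point:    {b1_with:.3f}")
```

In this made-up case the single added point is enough to turn an upward-sloping line into a downward-sloping one.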

    

    Coefficient of Determination


How accurately does x predict y? Two approaches:

    Measure the total variation in the observed values of the response variable;

    

    Total Sum of Squares, $SST = \sum (y_i - \bar{y})^2$; the sample variance is SST divided by $n - 1$
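
A short sketch of the total sum of squares and its link to the sample variance, using made-up response values. The standard-library `statistics.variance` function is used only as a cross-check.

```python
import statistics

# Made-up illustrative response values.
ys = [2, 4, 5, 4, 5]
n = len(ys)
y_bar = sum(ys) / n

# Total variation of the observed responses around their mean.
sst = sum((y - y_bar) ** 2 for y in ys)

# Sample variance is SST divided by n - 1.
sample_variance = sst / (n - 1)

print(sst, sample_variance, statistics.variance(ys))
```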

    

    Amount of variation of response variable explained by the regression (distance

    between the mean and the predicted values of the response variable)

    

    Regression Sum of Squares, $SSR = \sum (\hat{y}_i - \bar{y})^2$, where $\hat{y}_i$ is the predicted value corresponding to each observed response $y_i$

    

    If we divide SSR by SST, we get an idea of the percentage of variation of the

    response variable explained by the regression

    

    Coefficient of Determination, $r^2 = SSR/SST = \sum (\hat{y}_i - \bar{y})^2 / \sum (y_i - \bar{y})^2$

    (always between 0 and 1; a value near zero means x is a poor predictor)
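
The ratio SSR/SST can be worked through on a small example. The data below are made up for illustration; the slope and intercept come from the least-squares formulas, and the variable names are assumptions of this sketch.

```python
# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Predicted response for each observed x.
y_hats = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)            # total variation
ssr = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)  # explained variation

r_squared = ssr / sst
print(f"SST={sst:.2f} SSR={ssr:.2f} r^2={r_squared:.2f}")
```

For this data the regression explains about 60% of the variation in the response.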

    Amount of variation of response variable NOT explained by the regression

    (distance between the observed and the predicted values of the response variable)

    Error Sum of Squares, $SSE = \sum (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the predicted value corresponding to each observed response $y_i$

    Therefore SST = SSR + SSE also known as the Regression Identity

    Since $r^2 = SSR/SST$, it follows that $r^2 = (SST - SSE)/SST = 1 - SSE/SST$

    $SST = S_{yy}$, $SSR = S_{xy}^2/S_{xx}$ and $SSE = S_{yy} - S_{xy}^2/S_{xx}$

    $SST = \sum y_i^2 - (\sum y_i)^2/n$ and $SSR = [\sum (x_i - \bar{x})(y_i - \bar{y})]^2 / \sum (x_i - \bar{x})^2$
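
The regression identity and the shortcut formulas can be checked numerically. The sketch below uses made-up data and computes each sum of squares twice: once from its definition and once from Sxx, Sxy and Syy.

```python
# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hats = [b0 + b1 * x for x in xs]

# Sums of squares from their definitions.
sst = sum((y - y_bar) ** 2 for y in ys)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))

# Shortcut versions in terms of Sxx, Sxy and Syy.
ssr_short = s_xy ** 2 / s_xx
sse_short = s_yy - s_xy ** 2 / s_xx

print(f"SST={sst:.2f} SSR+SSE={ssr + sse:.2f}")
print(f"r^2 = 1 - SSE/SST = {1 - sse / sst:.2f}")
```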

    

    Linear Correlation Coefficient - denoted by r

    $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$, where r always lies between -1 and 1
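
A minimal sketch of the correlation coefficient computed from Sxy, Sxx and Syy, using made-up data. It also checks that squaring r gives the same value as Sxy squared over Sxx times Syy, which is the coefficient of determination.

```python
import math

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Linear correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# SSR/SST written in terms of the same three sums.
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"r={r:.4f} r^2={r_squared:.2f}")
```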


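    r is positive when the slope is positive: the deviations $(x_i - \bar{x})$ and $(y_i - \bar{y})$ tend to have the same sign (both positive or both negative), so their products sum to a positive $S_{xy}$

    r is negative when the slope is negative: one of $(x_i - \bar{x})$ or $(y_i - \bar{y})$ tends to be negative while the other is positive, so their products sum to a negative $S_{xy}$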
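
The sign argument can be illustrated by listing the products of the deviations for an upward-trending and a downward-trending data set. Both data sets are made up, and the helper name `deviation_products` is hypothetical.

```python
def deviation_products(xs, ys):
    """Return the products (x_i - x_bar) * (y_i - y_bar) for each point."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]

# Made-up illustrative data: one upward trend, one downward trend.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 5]
ys_down = [5, 4, 5, 4, 2]

products_up = deviation_products(xs, ys_up)
products_down = deviation_products(xs, ys_down)

# The sign of the sum (Sxy) is the sign of both the slope and r.
s_xy_up = sum(products_up)
s_xy_down = sum(products_down)

print(products_up, s_xy_up)
print(products_down, s_xy_down)
```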

    r close to -1 or 1 shows a strong linear relationship, and x is then a good predictor of y

    A negative r means x and y are negatively linearly correlated; a positive r means they are positively linearly correlated

    Correlation does not mean Causation

