
Biostat 656 Lab 3

By Sara Butler, 2014-05-07 10:40

May 7, 2010

    Biostat 656: Lab 3

    Purpose: 1) Learn linear regression and logistic regression in WinBUGS and Stata 2) Compare results from WinBUGS and Stata

Linear Regression

    Often we want to estimate the relationship between a continuous outcome and several covariates. Let’s revisit the Rats data. Before we get fancy with the growth-curve modeling, let’s see what happens when we treat each observation as independent; that is, we don’t account for the fact that the same 30 rats are measured at each time. Compare this model with the one on homework 1:

model
{
   for( i in 1 : N ) {
      for( j in 1 : T ) {
         Y[i , j] ~ dnorm(mu[i , j], tau.c)
         mu[i , j] <- alpha + beta * (x[j] - xbar)
      }
   }
   tau.c ~ dgamma(0.001, 0.001)
   sigma <- 1 / sqrt(tau.c)
   alpha ~ dnorm(0.0, 1.0E-6)
   beta ~ dnorm(0.0, 1.0E-6)
   alpha0 <- alpha - xbar * beta
}

#data: same as before

    list(x = c(8.0, 15.0, 22.0, 29.0, 36.0), xbar = 22, N = 30, T = 5,

     Y = structure(

     .Data = c(151, 199, 246, 283, 320,

     145, 199, 249, 293, 354,

     147, 214, 263, 312, 328,

     155, 200, 237, 272, 297,

     135, 188, 230, 280, 323,

     159, 210, 252, 298, 331,

     141, 189, 231, 275, 305,

     159, 201, 248, 297, 338,

     177, 236, 285, 350, 376,

     134, 182, 220, 260, 296,

     160, 208, 261, 313, 352,

     143, 188, 220, 273, 314,


     154, 200, 244, 289, 325,

     171, 221, 270, 326, 358,

     163, 216, 242, 281, 312,

     160, 207, 248, 288, 324,

     142, 187, 234, 280, 316,

     156, 203, 243, 283, 317,

     157, 212, 259, 307, 336,

     152, 203, 246, 286, 321,

     154, 205, 253, 298, 334,

     139, 190, 225, 267, 302,

     146, 191, 229, 272, 302,

     157, 211, 250, 285, 323,

     132, 185, 237, 286, 331,

     160, 207, 257, 303, 345,

     169, 216, 261, 295, 333,

     157, 205, 248, 289, 316,

     137, 180, 219, 258, 291,

     153, 200, 244, 286, 324),

     .Dim = c(30,5)))

#init

    list(alpha =150, beta = 10, tau.c = 1)

    In this case, mu[i,j] plays the role of E(Y) in frequentist linear regression. Notice that each observation has the same variance, the usual assumption of linear regression. In your homework, alpha and beta were both indexed by i: each weigh-in was an observation, and each rat was a cluster, so alpha[i] and beta[i] had a cluster-specific interpretation. Here, we do not account for the clustering. Let’s check out the results:

node    mean    sd      MC error  2.5%    median  97.5%   start  sample
alpha0  106.5   3.218   0.02949   100.1   106.5   112.8   1001   10000
beta    6.186   0.1339  0.001194  5.921   6.185   6.45    1001   10000
sigma   16.21   0.9625  0.009252  14.49   16.17   18.22   1001   10000
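As a quick sanity check on the centered parameterization, alpha0 <- alpha - xbar * beta means we can recover alpha (the expected weight at the mean day, x = 22) from the posterior means in the table. A minimal sketch, using the reported values:

```python
# Recover alpha (mean weight at day xbar = 22) from the centered
# parameterization alpha0 = alpha - xbar * beta, using the
# posterior means reported above.
xbar = 22.0
alpha0 = 106.5   # posterior mean of alpha0
beta = 6.186     # posterior mean of beta

alpha = alpha0 + xbar * beta  # expected weight at the mean day
print(round(alpha, 1))        # → 242.6
```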

Let’s add a random effect: each individual has a random component, b[i], added to the intercept. We need to add a prior distribution for this random component:

    b[i] ~ Normal(0, sigma.b^2)
    tau.b = 1 / sigma.b^2 ~ Gamma(0.001, 0.001)

The code:

model
{
   for( i in 1 : N ) {
      for( j in 1 : T ) {
         Y[i , j] ~ dnorm(mu[i , j], tau.c)
         mu[i , j] <- alpha + beta * (x[j] - xbar) + b[i]
      }
      b[i] ~ dnorm(0.0, tau.b)
   }
   tau.c ~ dgamma(0.001, 0.001)
   tau.b ~ dgamma(0.001, 0.001)
   sigma <- 1 / sqrt(tau.c)
   sigma.b <- 1 / sqrt(tau.b)
   alpha ~ dnorm(0.0, 1.0E-6)
   beta ~ dnorm(0.0, 1.0E-6)
   alpha0 <- alpha - xbar * beta
}

#init
    list(alpha = 150, beta = 10, tau.c = 1,
         b = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
         tau.b = 1)

The output:

node     mean    sd       MC error  2.5%   median  97.5%  start  sample
alpha0   106.5   3.141    0.1454    100.4  106.5   112.7  1001   10000
beta     6.185   0.06818  6.496E-4  6.052  6.186   6.318  1001   10000
sigma    8.263   0.5359   0.005948  7.285  8.236   9.374  1001   10000
sigma.b  14.36   2.043    0.02602   10.96  14.17   19.0   1001   10000

This model is called the variance components model. One component (sigma.b) is the random effect, and the other (sigma) is error. We can see that a large portion of the variance is due to differences across rats (sigma.b).
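One way to quantify “a large portion” is the intraclass correlation, sigma.b^2 / (sigma.b^2 + sigma^2): the fraction of total variance attributable to the rat-level random intercepts. A minimal sketch, plugging in the posterior means from the output above:

```python
# Intraclass correlation: share of total variance due to the
# rat-level random intercept, from the posterior means above.
sigma_b = 14.36  # between-rat SD (random intercept)
sigma = 8.263    # residual (within-rat) SD

rho = sigma_b**2 / (sigma_b**2 + sigma**2)
print(round(rho, 3))  # → 0.751
```

This agrees closely with the rho that Stata reports for the same data (.7576 under fe, .7400 under mle).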

Let’s fit these models in Stata using xtreg:

Fixed effects model:

    . xtreg weight day, i(id) fe

Fixed-effects (within) regression               Number of obs      =       150
Group variable (i): id                          Number of groups   =        30

R-sq:  within  = 0.9860                         Obs per group: min =         5
       between = 0.0000                                        avg =       5.0
       overall = 0.9359                                        max =         5

                                                F(1,119)           =   8357.29
corr(u_i, Xb)  = -0.0000                        Prob > F           =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         day |   6.185714   .0676639    91.42   0.000     6.051733    6.319696
       _cons |   106.5676    1.63237    65.28   0.000     103.3354    109.7999
-------------+----------------------------------------------------------------
     sigma_u |  14.505165
     sigma_e |  8.2038114
         rho |   .7576451   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(29, 119) =    15.63             Prob > F = 0.0000
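The “within” estimator that xtreg, fe uses can be illustrated directly: demean weight and day within each rat, which sweeps out the rat-specific intercepts, then run ordinary least squares on the demeaned values. A minimal numpy sketch on simulated data (not the actual rats; the rat count, days, and true slope of 6 mirror the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 30 rats weighed at 5 time points, each with its own
# random intercept, a common slope of 6, and residual noise.
n_rats = 30
days = np.array([8.0, 15.0, 22.0, 29.0, 36.0])
intercepts = 100 + 15 * rng.standard_normal(n_rats)
weight = (intercepts[:, None] + 6.0 * days[None, :]
          + 8.0 * rng.standard_normal((n_rats, len(days))))

# Within transformation: subtract each rat's own means, which
# removes the rat-specific intercepts entirely.
w = weight - weight.mean(axis=1, keepdims=True)
d = np.tile(days - days.mean(), (n_rats, 1))

# OLS slope on the demeaned data = fixed-effects (within) estimate.
slope = (w * d).sum() / (d * d).sum()
print(round(slope, 2))  # close to the true slope of 6
```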

Random effects:

    . xtreg weight day, i(id) mle nolog

Random-effects ML regression                    Number of obs      =       150
Group variable (i): id                          Number of groups   =        30

Random effects u_i ~ Gaussian                   Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5

                                                LR chi2(1)         =    532.52
Log likelihood  = -568.75874                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         day |   6.185714   .0673802    91.80   0.000     6.053652    6.317777
       _cons |   106.5676   2.996157    35.57   0.000     100.6953      112.44
-------------+----------------------------------------------------------------
    /sigma_u |   13.78543   1.905722     7.23   0.000     10.05029    17.52058
    /sigma_e |   8.169557   .5273191    15.49   0.000     7.136031    9.203084
-------------+----------------------------------------------------------------
         rho |   .7400821   .0594296                      .6119631    .8420112
------------------------------------------------------------------------------
Likelihood ratio test of sigma_u=0: chibar2(01)= 120.40 Prob>=chibar2 = 0.000

The fixed effects are the same, but it looks like most of the variance is due to subject-specific random intercepts. Notice that the parameter estimates agree across packages, while the standard errors and degrees of freedom differ. Your homework assignment actually featured a random slope that was independent of the random intercept; we will learn about modeling random slopes in lab 4.

Logistic Regression

    Let’s revisit the seeds data from your homework. We want to account for the fact that seeds in the same plate are probably more alike than seeds from different plates; that is, a plate is a cluster. Before we do that, let’s perform some plain old logistic regression, WinBUGS and Stata style.

model
{
   for( i in 1 : N ) {
      r[i] ~ dbin(p[i], n[i])
      logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i] +
                     alpha12 * x1[i] * x2[i]
   }
   alpha0 ~ dnorm(0.0, 1.0E-6)
   alpha1 ~ dnorm(0.0, 1.0E-6)
   alpha2 ~ dnorm(0.0, 1.0E-6)
   alpha12 ~ dnorm(0.0, 1.0E-6)
}

#data
    list(r = c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3),
         n = c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7),
         x1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
         x2 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
         N = 21)

#init
    list(alpha0 = 0, alpha1 = 0, alpha2 = 0, alpha12 = 0)

The output:

node     mean     sd      MC error  2.5%     median   97.5%    start  sample
alpha0   -0.556   0.1279  0.003287  -0.8081  -0.5525  -0.3092  1001   10000
alpha1   0.1308   0.2223  0.006015  -0.2941  0.1284   0.568    1001   10000
alpha12  -0.767   0.309   0.008687  -1.379   -0.7663  -0.1705  1001   10000
alpha2   1.321    0.1798  0.004938  0.9711   1.32     1.676    1001   10000
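To interpret these coefficients, apply the inverse logit to each cell of the 2x2 design; for example, seeds with x1 = 0, x2 = 0 have fitted germination probability expit(alpha0). A minimal sketch using the posterior means above:

```python
import math

def expit(z):
    # Inverse logit: maps a log-odds value back to a probability.
    return 1.0 / (1.0 + math.exp(-z))

# Posterior means from the output above.
alpha0, alpha1, alpha2, alpha12 = -0.556, 0.1308, 1.321, -0.767

# Fitted germination probability in two cells of the 2x2 design.
p00 = expit(alpha0)                              # x1 = 0, x2 = 0
p11 = expit(alpha0 + alpha1 + alpha2 + alpha12)  # x1 = 1, x2 = 1
print(round(p00, 3), round(p11, 3))  # → 0.364 0.532
```

For comparison, the raw germination rate in the x1 = 0, x2 = 0 plates is (10+23+23+26+17)/(39+62+81+51+39) = 99/272 ≈ 0.364, matching the fitted value.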

Let’s add a random effect

model
{
   for( i in 1 : N ) {
      r[i] ~ dbin(p[i], n[i])
      b[i] ~ dnorm(0.0, tau)
      logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i] +
                     alpha12 * x1[i] * x2[i] + b[i]
   }
   alpha0 ~ dnorm(0.0, 1.0E-6)
   alpha1 ~ dnorm(0.0, 1.0E-6)
   alpha2 ~ dnorm(0.0, 1.0E-6)
   alpha12 ~ dnorm(0.0, 1.0E-6)
   tau ~ dgamma(0.001, 0.001)
   sigma <- 1 / sqrt(tau)
}

#data: same as before
    list(r = c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3),
         n = c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7),
         x1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
         x2 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
         N = 21)

#init
    list(alpha0 = 0, alpha1 = 0, alpha2 = 0, alpha12 = 0, tau = 1)

Notice that there are no initial values for the b[i]; you can generate them by clicking “gen inits” after clicking “load inits.”

    The output:

node     mean     sd      MC error  2.5%     median   97.5%    start  sample
alpha0   -0.5527  0.1942  0.007348  -0.9353  -0.5561  -0.1492  1001   10000
alpha1   0.08113  0.309   0.01086   -0.5622  0.09124  0.6504   1001   10000
alpha12  -0.8291  0.4243  0.0162    -1.669   -0.8293  0.03798  1001   10000
alpha2   1.361    0.2771  0.01064   0.7934   1.365    1.914    1001   10000
sigma    0.2871   0.1399  0.006652  0.04579  0.2779   0.5914   1001   10000

The estimated plate-level standard deviation (sigma) suggests that there is a small cluster effect.
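Because b[i] lives on the log-odds scale, sigma has a multiplicative interpretation: a plate one standard deviation above average has exp(sigma) times the germination odds of a typical plate. A minimal sketch using the posterior mean above:

```python
import math

# Posterior mean of the plate-level SD, on the log-odds scale.
sigma = 0.2871

# Odds multiplier for a plate one SD above average.
odds_ratio = math.exp(sigma)
print(round(odds_ratio, 2))  # → 1.33
```

A 33% shift in the odds between typical plates is a modest, but not negligible, cluster effect.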

Now let’s get our Stata on:

We first fit a logistic regression. (The clustering is taken into account by using a “robust” estimator of the standard error.)

. logit germinate x1 x2 x1x2, cluster(id) nolog

Logit estimates                                 Number of obs      =       831
                                                Wald chi2(3)       =     32.95
                                                Prob > chi2        =    0.0000
Log likelihood = -543.11057                     Pseudo R2          =    0.0568

                              (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
   germinate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .1459269   .2941935     0.50   0.620    -.4306817    .7225356
          x2 |   1.318182   .2479397     5.32   0.000      .832229    1.804135
        x1x2 |  -.7781037   .3829992    -2.03   0.042    -1.528768   -.0274391
       _cons |  -.5581717   .1804689    -3.09   0.002    -.9118843   -.2044592
------------------------------------------------------------------------------
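Exponentiating a logit coefficient gives an odds ratio; for example, exp of the coefficient on x2 is the odds ratio for seed type x2 = 1 versus x2 = 0 among plates with x1 = 0. A minimal sketch using the estimates above:

```python
import math

# Coefficient on x2 from the logit output above.
b_x2 = 1.318182

# Odds ratio for x2 = 1 vs x2 = 0, holding x1 = 0
# (with the interaction, the x2 effect differs when x1 = 1).
or_x2 = math.exp(b_x2)
print(round(or_x2, 2))  # → 3.74
```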

    Now let’s add the random effect:

. xtlogit germinate x1 x2 x1x2, re i(id) nolog

Random-effects logit                            Number of obs      =       831
Group variable (i): id                          Number of groups   =        21

Random effects u_i ~ Gaussian                   Obs per group: min =         4
                                                               avg =      39.6
                                                               max =        81

                                                Wald chi2(3)       =     36.79
Log likelihood  = -541.93097                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
   germinate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .0969904   .2780422     0.35   0.727    -.4479624    .6419432
          x2 |   1.337043   .2369396     5.64   0.000     .8726499    1.801436
        x1x2 |    -.81046   .3851746    -2.10   0.035    -1.565388   -.0555316
       _cons |  -.5484332   .1665951    -3.29   0.001    -.8749536   -.2219129
-------------+----------------------------------------------------------------
    /lnsig2u |  -2.885798   .9316616                     -4.711821   -1.059775
-------------+----------------------------------------------------------------
     sigma_u |   .2362419   .1100487                      .0948071    .5886711
         rho |   .0528601   .0466445                      .0089083    .2573524
------------------------------------------------------------------------------
Likelihood ratio test of rho=0: chibar2(01) = 2.36 Prob >= chibar2 = 0.062

Notice that the parameter estimates are slightly different depending on the inclusion of a random effect. Unlike normal data, the variance of binomial data depends on the mean; therefore, changing the variance structure changes the parameter estimates as well.
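The point about binomial variance can be made concrete: for a count r ~ Binomial(n, p), Var(r) = n p (1 - p), so any change to the model that shifts the fitted p also shifts the variance. A minimal sketch:

```python
# Binomial variance is tied to the mean: Var(r) = n * p * (1 - p).
def binom_var(n, p):
    return n * p * (1 - p)

# Two fitted probabilities for a plate of 50 seeds: shifting the
# mean changes the variance, unlike the normal model, where the
# mean and variance are separate parameters.
n = 50
print(binom_var(n, 0.36), binom_var(n, 0.53))
```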

Stay tuned for lab 4, when we learn about random slopes! In lab 5, we’ll learn some general syntax for these models.
