
# Biostat 656 Lab 3

By Sara Butler, 2014-05-07 10:40

May 7, 2010

Purpose:

1) Learn linear regression and logistic regression in WinBUGS and Stata
2) Compare results from WinBUGS and Stata

Linear Regression

Often we want to estimate the relationship between a continuous outcome and several covariates. Let’s revisit the Rats data. Before we get fancy with the growth-curve modeling, let’s see what happens when we treat each observation as independent. That is, we don’t account for the fact that the same 30 rats are measured at each time. Compare this model with the one from homework 1:

model
{
    for( i in 1 : N ) {
        for( j in 1 : T ) {
            Y[i , j] ~ dnorm(mu[i , j], tau.c)
            mu[i , j] <- alpha + beta * (x[j] - xbar)
        }
    }
    tau.c ~ dgamma(0.001, 0.001)
    sigma <- 1 / sqrt(tau.c)
    alpha ~ dnorm(0.0, 1.0E-6)
    beta ~ dnorm(0.0, 1.0E-6)
    alpha0 <- alpha - xbar * beta
}

# data: same as before
list(x = c(8.0, 15.0, 22.0, 29.0, 36.0), xbar = 22, N = 30, T = 5,
     Y = structure(
       .Data = c(151, 199, 246, 283, 320,
                 145, 199, 249, 293, 354,
                 147, 214, 263, 312, 328,
                 155, 200, 237, 272, 297,
                 135, 188, 230, 280, 323,
                 159, 210, 252, 298, 331,
                 141, 189, 231, 275, 305,
                 159, 201, 248, 297, 338,
                 177, 236, 285, 350, 376,
                 134, 182, 220, 260, 296,
                 160, 208, 261, 313, 352,
                 143, 188, 220, 273, 314,
                 154, 200, 244, 289, 325,
                 171, 221, 270, 326, 358,
                 163, 216, 242, 281, 312,
                 160, 207, 248, 288, 324,
                 142, 187, 234, 280, 316,
                 156, 203, 243, 283, 317,
                 157, 212, 259, 307, 336,
                 152, 203, 246, 286, 321,
                 154, 205, 253, 298, 334,
                 139, 190, 225, 267, 302,
                 146, 191, 229, 272, 302,
                 157, 211, 250, 285, 323,
                 132, 185, 237, 286, 331,
                 160, 207, 257, 303, 345,
                 169, 216, 261, 295, 333,
                 157, 205, 248, 289, 316,
                 137, 180, 219, 258, 291,
                 153, 200, 244, 286, 324),
       .Dim = c(30, 5)))

# inits
list(alpha = 150, beta = 10, tau.c = 1)
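As an aside, the centering of x in the model (subtracting xbar) only shifts the intercept, and the line alpha0 <- alpha - xbar * beta puts the intercept back on the raw day scale. A minimal pure-Python sketch of my own (not part of the lab; ols is a hypothetical helper) confirms the algebra using the first rat's five weights:

```python
# The centering in mu[i,j] <- alpha + beta * (x[j] - xbar) only shifts the
# intercept; alpha0 <- alpha - xbar * beta recovers the raw-scale intercept.

def ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

x = [8.0, 15.0, 22.0, 29.0, 36.0]        # measurement days
y = [151.0, 199.0, 246.0, 283.0, 320.0]  # rat 1's weights

xbar = 22.0
alpha, beta = ols([d - xbar for d in x], y)   # fit on the centered scale
alpha0 = alpha - xbar * beta                  # the BUGS back-transformation

raw_intercept, raw_slope = ols(x, y)          # fit on the raw scale
assert abs(alpha0 - raw_intercept) < 1e-9     # same intercept either way
assert abs(beta - raw_slope) < 1e-9           # slope unchanged by centering
```

Centering does not change the fit; it just makes alpha interpretable as the expected weight at day 22 (and tends to help MCMC mixing).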

In this case, mu[i,j] plays the role of E(Y) in frequentist linear regression. Notice that each observation has the same variance, the usual assumption of linear regression. In your homework, alpha and beta were both indexed by i: each weigh-in was an observation and each rat was a cluster, so alpha[i] and beta[i] had a cluster-specific interpretation. Here, we do not account for the clustering. Let’s check out the results:

node    mean   sd      MC error  2.5%   median  97.5%  start  sample
alpha0  106.5  3.218   0.02949   100.1  106.5   112.8  1001   10000
beta    6.186  0.1339  0.001194  5.921  6.185   6.45   1001   10000
sigma   16.21  0.9625  0.009252  14.49  16.17   18.22  1001   10000

Let’s add a random effect. Each individual i has a random component b_i added to the intercept. We need to add a prior distribution for this random component:

b_i ~ Normal(0, sigma_b^2)
tau_b = sigma_b^(-2) ~ Gamma(0.001, 0.001)

The code:

model
{
    for( i in 1 : N ) {
        for( j in 1 : T ) {
            Y[i , j] ~ dnorm(mu[i , j], tau.c)
            mu[i , j] <- alpha + beta * (x[j] - xbar) + b[i]
        }
        b[i] ~ dnorm(0.0, tau.b)
    }
    tau.c ~ dgamma(0.001, 0.001)
    tau.b ~ dgamma(0.001, 0.001)
    sigma <- 1 / sqrt(tau.c)
    sigma.b <- 1 / sqrt(tau.b)
    alpha ~ dnorm(0.0, 1.0E-6)
    beta ~ dnorm(0.0, 1.0E-6)
    alpha0 <- alpha - xbar * beta
}

# inits (data are the same as before)
list(alpha = 150, beta = 10, tau.c = 1, tau.b = 1,
     b = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

The output:

node     mean   sd       MC error  2.5%   median  97.5%  start  sample
alpha0   106.5  3.141    0.1454    100.4  106.5   112.7  1001   10000
beta     6.185  0.06818  6.496E-4  6.052  6.186   6.318  1001   10000
sigma    8.263  0.5359   0.005948  7.285  8.236   9.374  1001   10000
sigma.b  14.36  2.043    0.02602   10.96  14.17   19.0   1001   10000

This model is called the variance components model. One component (sigma.b) is the standard deviation of the random effects, and the other (sigma) is the residual error. We can see that a large portion of the variance is due to differences across rats (sigma.b).
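To put a number on "a large portion," the intraclass correlation can be computed from the posterior means above. This is my own illustration, not part of the lab:

```python
# Intraclass correlation from the variance-components fit:
# rho = sigma_b^2 / (sigma_b^2 + sigma^2), using the posterior means above.
sigma_b = 14.36  # between-rat SD (posterior mean of sigma.b)
sigma_e = 8.263  # residual SD (posterior mean of sigma)

rho = sigma_b**2 / (sigma_b**2 + sigma_e**2)
print(round(rho, 3))  # about 0.75: three quarters of the variance is between rats
```

This ratio lines up closely with the rho values Stata reports below.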

Let’s fit these models in Stata using xtreg:

Fixed effects model:

. xtreg weight day, i(id) fe

Fixed-effects (within) regression        Number of obs      =       150
Group variable (i) : id                  Number of groups   =        30

R-sq:  within  = 0.9860                  Obs per group: min =         5
       between = 0.0000                                 avg =       5.0
       overall = 0.9359                                 max =         5

                                         F(1,119)           =   8357.29
corr(u_i, Xb)  = -0.0000                 Prob > F           =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         day |   6.185714   .0676639    91.42   0.000     6.051733    6.319696
       _cons |   106.5676    1.63237    65.28   0.000     103.3354    109.7999
-------------+----------------------------------------------------------------
     sigma_u |  14.505165
     sigma_e |  8.2038114
         rho |   .7576451   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(29, 119) =    15.63             Prob > F = 0.0000
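The rho line is just the variance-component ratio computed from the sigma_u and sigma_e rows. A quick pure-Python check (my own, with the numbers copied from the output above):

```python
# rho = sigma_u^2 / (sigma_u^2 + sigma_e^2): the fraction of total variance
# due to the rat-specific intercepts u_i (values from the xtreg output).
sigma_u = 14.505165  # SD of the rat-level intercepts
sigma_e = 8.2038114  # residual (within-rat) SD

rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)
print(round(rho, 4))  # 0.7576, matching the printed rho
```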

Random effects:

. xtreg weight day, i(id) mle nolog

Random-effects ML regression             Number of obs      =       150
Group variable (i) : id                  Number of groups   =        30

Random effects u_i ~ Gaussian            Obs per group: min =         5
                                                        avg =       5.0
                                                        max =         5

                                         LR chi2(1)         =    532.52
Log likelihood  = -568.75874             Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         day |   6.185714   .0673802    91.80   0.000     6.053652    6.317777
       _cons |   106.5676   2.996157    35.57   0.000     100.6953      112.44
-------------+----------------------------------------------------------------
    /sigma_u |   13.78543   1.905722     7.23   0.000     10.05029    17.52058
    /sigma_e |   8.169557   .5273191    15.49   0.000     7.136031    9.203084
-------------+----------------------------------------------------------------
         rho |   .7400821   .0594296                      .6119631    .8420112
------------------------------------------------------------------------------
Likelihood ratio test of sigma_u=0: chibar2(01)= 120.40 Prob>=chibar2 = 0.000

The fixed effects are the same, and again it looks like most of the variance is due to subject-specific random intercepts. Notice that the parameter estimates agree across the fixed- and random-effects fits, while the standard errors and degrees of freedom differ. Your homework assignment actually featured a random slope that was independent of the random intercept; we will learn about modeling random slopes in lab 4.

Logistic Regression

Let’s revisit the seeds data from your homework. We want to account for the fact that seeds in the same plate are probably more alike than seeds from different plates; that is, a plate is a cluster. Before we do that, let’s perform some plain old logistic regression, WinBUGS and Stata style.

model
{
    for( i in 1 : N ) {
        r[i] ~ dbin(p[i], n[i])
        logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i] +
            alpha12 * x1[i] * x2[i]
    }
    alpha0 ~ dnorm(0.0, 1.0E-6)
    alpha1 ~ dnorm(0.0, 1.0E-6)
    alpha2 ~ dnorm(0.0, 1.0E-6)
    alpha12 ~ dnorm(0.0, 1.0E-6)
}

# data
list(r = c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3),
     n = c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7),
     x1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
     x2 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
     N = 21)

# inits
list(alpha0 = 0, alpha1 = 0, alpha2 = 0, alpha12 = 0)

The output:

node     mean     sd      MC error  2.5%     median   97.5%    start  sample
alpha0   -0.556   0.1279  0.003287  -0.8081  -0.5525  -0.3092  1001   10000
alpha1   0.1308   0.2223  0.006015  -0.2941  0.1284   0.568    1001   10000
alpha12  -0.767   0.309   0.008687  -1.379   -0.7663  -0.1705  1001   10000
alpha2   1.321    0.1798  0.004938  0.9711   1.32     1.676    1001   10000
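Because the model has one coefficient per cell of the 2x2 (x1, x2) design, the fitted log-odds essentially reproduce the raw germination proportions. A pure-Python sanity check of my own (cell_logodds is a hypothetical helper), using the counts from the data list:

```python
import math

# Seed data from the list() above
r  = [10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3]
n  = [39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7]
x1 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
x2 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def cell_logodds(a, b):
    """Raw log-odds of germination over the plates with x1 == a and x2 == b."""
    succ = sum(ri for ri, u, v in zip(r, x1, x2) if (u, v) == (a, b))
    tot  = sum(ni for ni, u, v in zip(n, x1, x2) if (u, v) == (a, b))
    return math.log(succ / (tot - succ))

# Saturated in (x1, x2), so the coefficients map onto cell log-odds:
# alpha0 is the (0,0) cell's logit; alpha2 is the (0,1) minus (0,0) contrast.
print(round(cell_logodds(0, 0), 3))                       # -0.558 (posterior mean -0.556)
print(round(cell_logodds(0, 1) - cell_logodds(0, 0), 3))  # 1.318 (posterior mean 1.321)
```

The small gaps between these raw values and the posterior means reflect the priors and Monte Carlo error; Stata's maximum-likelihood fit below matches the raw log-odds almost exactly.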

Let’s add a random effect:

model
{
    for( i in 1 : N ) {
        r[i] ~ dbin(p[i], n[i])
        b[i] ~ dnorm(0.0, tau)
        logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i] +
            alpha12 * x1[i] * x2[i] + b[i]
    }
    alpha0 ~ dnorm(0.0, 1.0E-6)
    alpha1 ~ dnorm(0.0, 1.0E-6)
    alpha2 ~ dnorm(0.0, 1.0E-6)
    alpha12 ~ dnorm(0.0, 1.0E-6)
    tau ~ dgamma(0.001, 0.001)
    sigma <- 1 / sqrt(tau)
}

# data
list(r = c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3),
     n = c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7),
     x1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
     x2 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
     N = 21)

# inits
list(alpha0 = 0, alpha1 = 0, alpha2 = 0, alpha12 = 0, tau = 1)

Notice that there are no initial values for the b[i]. You can generate them by clicking “gen inits” after clicking “load inits.”

The output:

node     mean     sd      MC error  2.5%     median   97.5%    start  sample
alpha0   -0.5527  0.1942  0.007348  -0.9353  -0.5561  -0.1492  1001   10000
alpha1   0.08113  0.309   0.01086   -0.5622  0.09124  0.6504   1001   10000
alpha12  -0.8291  0.4243  0.0162    -1.669   -0.8293  0.03798  1001   10000
alpha2   1.361    0.2771  0.01064   0.7934   1.365    1.914    1001   10000
sigma    0.2871   0.1399  0.006652  0.04579  0.2779   0.5914   1001   10000

The plate-level random-effect variance (sigma^2) suggests that there is a small cluster effect.

Now let’s get our Stata on:

We first fit a logistic regression. (The clustering is taken into account by using a “robust” estimator of the standard error.)

. logit germinate x1 x2 x1x2, cluster(id) nolog

Logit estimates                           Number of obs   =        831
                                          Wald chi2(3)    =      32.95
                                          Prob > chi2     =     0.0000
Log likelihood = -543.11057               Pseudo R2       =     0.0568

                           (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
   germinate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .1459269   .2941935     0.50   0.620    -.4306817    .7225356
          x2 |   1.318182   .2479397     5.32   0.000      .832229    1.804135
        x1x2 |  -.7781037   .3829992    -2.03   0.042    -1.528768   -.0274391
       _cons |  -.5581717   .1804689    -3.09   0.002    -.9118843   -.2044592
------------------------------------------------------------------------------

Now let’s add the random effect:

. xtlogit germinate x1 x2 x1x2, re i(id) nolog

Random-effects logit                      Number of obs      =       831
Group variable (i) : id                   Number of groups   =        21

Random effects u_i ~ Gaussian             Obs per group: min =         4
                                                         avg =      39.6
                                                         max =        81

                                          Wald chi2(3)       =     36.79
Log likelihood  = -541.93097              Prob > chi2        =    0.0000

------------------------------------------------------------------------------
   germinate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .0969904   .2780422     0.35   0.727    -.4479624    .6419432
          x2 |   1.337043   .2369396     5.64   0.000     .8726499    1.801436
        x1x2 |    -.81046   .3851746    -2.10   0.035    -1.565388   -.0555316
       _cons |  -.5484332   .1665951    -3.29   0.001    -.8749536   -.2219129
-------------+----------------------------------------------------------------
    /lnsig2u |  -2.885798   .9316616                     -4.711821   -1.059775
-------------+----------------------------------------------------------------
     sigma_u |   .2362419   .1100487                      .0948071    .5886711
         rho |   .0528601   .0466445                      .0089083    .2573524
------------------------------------------------------------------------------
Likelihood ratio test of rho=0: chibar2(01) = 2.36  Prob >= chibar2 = 0.062
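The bottom rows of the table are related to one another: sigma_u = exp(lnsig2u/2), and the printed rho is consistent with rho = sigma_u^2/(sigma_u^2 + 1). I am inferring that formula from the numbers themselves (it appears to be an older xtlogit parameterization; current Stata scales by the logistic residual variance pi^2/3 instead), so treat it as an assumption. A quick check:

```python
import math

lnsig2u = -2.885798  # from the xtlogit output above

sigma_u = math.exp(lnsig2u / 2)  # SD of the plate-level random intercepts
# Assumed parameterization: rho = sigma_u^2 / (sigma_u^2 + 1), which
# reproduces this output's printed rho (modern Stata uses pi^2/3 instead).
rho = sigma_u**2 / (sigma_u**2 + 1)

print(round(sigma_u, 4))  # 0.2362, matching sigma_u in the table
print(round(rho, 4))      # 0.0529, matching rho in the table
```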

Notice that the parameter estimates are slightly different depending on the inclusion of a random effect. Unlike normal data, the variance of binomial data depends on the mean; therefore, changing the variance structure changes the parameter estimates as well.
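The mean-variance link can be made concrete: for a binomial proportion r/n, Var(r/n) = p(1-p)/n is tied to the mean p, whereas in the normal model sigma is a free parameter. A tiny sketch of my own (binom_var is a hypothetical helper):

```python
# For binomial data the variance is a function of the mean, so any change
# to the mean structure (e.g. adding a random intercept on the logit scale)
# also changes the variance. For normal data, sigma is estimated separately.

def binom_var(p, n):
    """Variance of the sample proportion r/n when r ~ Binomial(n, p)."""
    return p * (1 - p) / n

n_seeds = 39  # plate size of the first entry in the seeds data
for p in (0.1, 0.3, 0.5):
    print(p, round(binom_var(p, n_seeds), 5))  # variance grows with p up to 0.5
```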

Stay tuned for lab 4, when we learn about random slopes! In lab 5, we’ll learn some general syntax for these models.
