Mixed-effects modeling with crossed random effects in R

annotated logfile of example analyses

The mixed-effects analyses shown here were done with the program R (version 2.4.1), with the function lmer in package lme4 (Bates, 2005). It is assumed that you have some basic knowledge of R (or S or S-Plus). For further assistance, you may consult my tutorial on Statistics with R and S-Plus, or other help information at the R project website www.r-project.org.

In the log below, user input commands are indicated by blue and bolder type, like this.
Program output is indicated by black monospace type, like this. 
Comments and annotations are given in italic sans-serif type, with indented left margin, like this.


Start the program R. The console window of R will show something like the following:
R version 2.4.1 (2006-12-18)
Copyright (C) 2006 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Loading required package: utils
[Previously saved workspace restored]
Online help for command or function xyz is available by typing help(xyz) at the R prompt, or by browsing the Help menu option. The help pages also contain highly informative examples that may be copy-pasted at the R prompt.
R> getwd()
This function shows the default file path. It returns a character string of that path (in fact, a list of strings containing one element). Because the return value of this function is not assigned, it is displayed. The [1] in the output tells us that the following element (string) is the first (and here, the only) element in the list of strings.
The same function, with a dir argument, can also be used to change the current working directory.
[1] "E:/Hugo/multilevel"
R> x24 <- read.table(file="x24.txt",header=F)
The function read.table reads a data sheet or matrix from the specified file. The file specification is a character string enclosed in double quotes. The file is searched in the default working directory, see above. The function returns a data object of class data.frame. Optionally, the first line of the ASCII data file specified as argument, may contain the names of the columns (variables), in which case we should add the argument header=TRUE. Since our data file doesn't have such a first line, this option is switched off by specifying header=FALSE (which is also the default value for this argument). Boolean values TRUE and FALSE may be abbreviated to their first character.
This function returns a data frame object, which is assigned here to a data frame named x24 by means of the assignment operator <-. Without such assignment, the data are read from file, the contents of the resulting data frame is displayed on the screen, and the result is then lost. Warning: Assignments are carried out without checking whether the object already exists. In other words, if some object named x24 already exists, it will be overwritten by this assignment operation. Make sure that you do not lose valuable information by inadvertedly overwriting it.
R> head(x24)
Show the first 6 elements of the specified data object.
  V1 V2 V3      V4
1  1  1  1 0.72383
2  1  2  1 0.06709
3  1  3  1 1.19462
4  1  4  1 1.72527
5  1  5  1 0.52362
6  1  6  1 1.09539
R> dimnames(x24)[[2]] <- c("subj","item","cond","resp")
The data frame x24 has two dimensions, viz. rows and columns. We'd prefer to assign helpful names to the columns, i.e. to the names of the elements in dimension number two, to replace the names V1, V2, V3, V4. This is done by constructing a list of four strings, and assigning this to the names of the second dimension of object x24.
R> # cond is coerced from numerical into factor
R> x24$cond <- as.factor(x24$cond)
The hash is the comment symbol; everything following the hash on the same line is ignored.
All variables in x24 are numeric. In regression, this means that the continuous numerical values would be used, instead of the discrete categorical values. Replace variable cond in data frame x24 by its own values but coerced (forced) into a special class named factor, for discrete factors.
R> require(lme4)
The package lme4 (Bates, 2005) is required for further analysis. If the package is not yet loaded, then do so. Any packages that are required by lme4 are also loaded.
Loading required package: lme4
Loading required package: Matrix
Loading required package: lattice
[1] TRUE

Empty model

R> # empty model (8)
R> m8 <- lmer(resp~1+(1|subj)+(1|item),data=x24,method="ML")
This is the first call of function lmer. Study the help information for this function. The first argument is the regression formula. This formula contains the random components in parentheses; the residual is not specified. The sequence (1|subj) corresponds to u0(j0); it specifies an intercept for the effect of subj that is the same for all conditions. Likewise, the sequence (1|item) corresponds to v0(0k). The two random effects, of subjects and of items, are crossed.
The data argument contains the data frame object to use, and the method argument specifies Maximum Likelihood as the estimation method (Pinheiro & Bates, 2000). The function returns an object containing the model fit and residuals; this object of class lmer is assigned to a new object named m8.
R> summary(m8)
Summarize the contents of m8.
Linear mixed-effects model fit by maximum likelihood
Formula: resp ~ 1 + (1 | subj) + (1 | item)
   Data: x24
  AIC  BIC logLik MLdeviance REMLdeviance
 2087 2101  -1040       2081         2083
Random effects:
 Groups   Name        Variance Std.Dev.
 item     (Intercept) 0.25664  0.50659			## v_0(0k)
 subj     (Intercept) 0.28828  0.53692			## u_0(j0)
 Residual             0.54027  0.73503			## e_i(jk)
number of obs: 864, groups: item, 36; subj, 24
Fixed effects:
            Estimate Std. Error t value
(Intercept)   0.0235     0.1406  0.1671			## gamma_0(00)
Note that there is no standard error (or other indication of confidence) for the estimates in the random part. These variance components are given in units of variance and in units of standard deviation.
For estimates in the fixed part, the t value is given as a measure of confidence. The probability of t is not given, because the degrees of freedom cannot be determined: these could be either 36-1 or 24-1. For conservative testing, use the lowest of these two alternatives to determine an appropriate critical value.

Fixed-only model, with homogeneity assumption

R> m8v2 <- lmer(resp~-1+cond+(1|subj)+(1|item),data=x24,method="ML")
This intermediate model, not specified in the manuscript but nevertheless reported in Table 2, has the fixed effect of cond in its fixed part. Instead of listing only the intercept, in the fixed part of the regression formula, this call suppresses the intercept (-1) and adds the fixed effect of cond. Because the condition effect is not given in the random part, this model assumes homoschedasticity and sphericity, much like a RM-ANOVA does. Hence the standard errors are the same for all three fixed estimates.
R> summary(m8v2)
Linear mixed-effects model fit by maximum likelihood
Formula: resp ~ -1 + cond + (1 | subj) + (1 | item)
   Data: x24
  AIC  BIC logLik MLdeviance REMLdeviance
 2045 2069  -1017       2035         2045
Random effects:
 Groups   Name        Variance Std.Dev.
 item     (Intercept) 0.25789  0.50783			## v_0(0k)
 subj     (Intercept) 0.28913  0.53771			## u_0(j0)
 Residual             0.51033  0.71437			## e_i(jk)
number of obs: 864, groups: item, 36; subj, 24
Fixed effects:
      Estimate Std. Error t value
cond1  0.21606    0.14485  1.4916			## gamma_H
cond2  0.04569    0.14485  0.3154			## gamma_N
cond3 -0.19127    0.14485 -1.3204			## gamma_L
R> anova(m8,m8v2)
We can evaluate whether this model is better than the predecessor one, with the same random part but without the fixed effect of cond, by comparing their likelihood ratios using function anova with the two models as two parameters. This method of evaluating fixed effects may be too liberal (Pinheiro & Bates, 2000), especially if there are few observations and many model parameters.
Data: x24
m8: resp ~ 1 + (1 | subj) + (1 | item)
m8v2: resp ~ -1 + cond + (1 | subj) + (1 | item)
     Df     AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)
m8    3  2086.7  2101.0 -1040.3
m8v2  5  2044.8  2068.6 -1017.4 45.901      2  1.079e-10 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here there are many observations, and only 5 model parameters, so there is little chance of inflated significance. The effect of treatment condition on the observed response is clearly significant.

Full model, without homogeneity assumption

R> # full model (9)
R> m9 <- lmer(resp~-1+cond+(-1+cond|subj)+(1|item),data=x24,method="ML")
This full model has the effect of cond in its fixed part, and in the random part at the participant level. Hence this model does not require homoschedasticity nor sphericity.
R> summary(m9)
Linear mixed-effects model fit by maximum likelihood
Formula: resp ~ -1 + cond + (-1 + cond | subj) + (1 | item)
   Data: x24
  AIC  BIC logLik MLdeviance REMLdeviance
 2047 2095  -1014       2027         2036
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 item     (Intercept) 0.25344  0.50342			## v_0(0k)
 subj     cond1       0.25067  0.50067			## u_H 0(j0)
          cond2       0.30652  0.55364  0.922		## u_N 0(j0)
          cond3       0.35956  0.59964  0.898 0.960	## u_L 0(j0)
 Residual             0.49429  0.70306			## e_i(jk)
number of obs: 864, groups: item, 36; subj, 24
Fixed effects:
      Estimate Std. Error t value
cond1  0.21606    0.13857  1.5593			## gamma_H
cond2  0.04569    0.14672  0.3114			## gamma_N
cond3 -0.19127    0.15407 -1.2414			## gamma_L
Between-speaker variances seem to be different in the three treatment conditions, although this cannot be tested (yet) in lmer. Speaker's deviances uH are correlated with uN, with r=.92. The different estimates of between-speaker variance in the random part lead to different standard errors for the three treatment conditions in the fixed part. In the simulated data, the residual variances at the lowest level were not homogenenous either (cf. Note 1 in the manuscript), but this heteroschedasticity at the lowest level cannot be captured here.

2007.06.08 HQ