18-10-2018

Preface: I wrote this tutorial based on my own experience and it should be considered a work in progress. I might update it in the future. Also, it requires some familiarity with basic R and lavaan. If you are reading this and are an expert in ESEM, please send me an e-mail if you discover anything that needs improvement or is utterly wrong. Of course, feel free to reach out if you have any questions.

With the huge feature set of R, I was surprised that there is no satisfactory solution for ESEM yet. This method has so much potential, but it is barely used. One reason for this might be that you can only do it in MPlus. Now MPLus is certainly a great software for statistical purposes, but it is expensive.

Luckily, I found a conference paper on how to approximate an ESEM model in R. It consists of the following steps:

  1. Estimating factor loadings based on Exploratory Factor Analysis. In the paper, they use the factanal() command from base R, which does a maximum-likelihood EFA on a correlation matrix.
  2. “Study the rotation”. In the paper, they rotate the loading matrix manually, but don’t really give any details on their approach. I decided to rely on the methods provided by the psych library in their fa() command.
  3. Specify a SEM model based on EFA results. In the original, they use the sem library but I am more used to lavaan, so I will use that.

In this tutorial I compare the results Marsh, Nagenast and Morin (2013) reported in their ESEM of a larger British Big Five dataset with my method. This way I can show that this method works and that its results approximate those of MPlus.

The dataset used is from Wave 15 of the British Household Panel Survey. I obtained it from the UK data service, cleaned it up, recoded the reverse-scored items and loaded it into the object b5.

First, we perform a CFA and look at the fit measures and loadings. All observed variables are skewed and ordinal, so we will use a robust ML estimator (estimator = MLR in lavaan).

b5.model <- "
A =~ A1+A2+A3
C =~ C1+C2+C3
E =~ E1+E2+E3
N =~ N1+N2+N3
O =~ O1+O2+O3
"

b5.cfa <- lavaan::cfa(b5.model, data = b5, estimator = "MLR")
# Marsh et al. (2013) fit measures:# CFI: .761, TLI: .687, RMSEA: .076# Our fit measures:
fitmeasures(b5.cfa, c("cfi.robust","tli.robust","rmsea.robust","srmr"))
##   cfi.robust   tli.robust rmsea.robust         srmr 
##        0.789        0.722        0.094        0.091

As we can see, the fit measures are far from being adequate. This is because traditional CFA is very restrictive: it is assumed that all items only load on one factor. Everything else is considered “bad fit”. Normally, we would now look at modification indices and loosen the cross-loading constraints for some items, greatly increasing our chance of aggravating reviewers in personality psych journals.

So let’s try ESEM! First, we conduct an exploratory FA on the dataset. We will assume five personality factors and choose an oblimin geomin rotation because that would be the default in MPlus. Very small loadings (less than 1^-7) will be reduced to zero. Then, we will extract lavaan-compatible equations from the loading matrix.

In the Marsh et al. (2013) paper, “correlated uniqueness” (CUs) parameters were added to the model. This was done to capture method factors resulting from negatively coded items. We try to mimic this approach by adding covariance formulas for those items.

b5.efa <- fa(b5[,1:15], nfact = 5, rotate = "geominQ", fm = "ml")
## Lade nötigen Namensraum: GPArotation
fa.diagram(b5.efa)

https://tkaiser.science/esemR_files/figure-html/esem-1.png

b5.loadmat <- zapsmall(matrix(round(b5.efa$loadings, 2), nrow = 15, ncol = 5))
rownames(b5.loadmat) <- colnames(b5[,1:15])

# This will turn the loading matrix into a lavaan-compatible equation set. 

terms <- vector()
for (i in 1:5) {
  terms[i] <-
    paste0("F",i,"=~ ", paste0(c(b5.loadmat[,i]), "*", names(b5.loadmat[,1]), collapse = "+"))
}

# "Correlated uniqueness"
terms[6] <- "A1 ~~ C2+E3+N3\\n C2 ~~ E3+N3\\n E3 ~~ N3"

b5.esem <- paste(terms, collapse = "\\n")
b5.esem