1. Example Data

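As in the previous session, we use the data frame mydf3 created last time as the example data. The code below, written in the previous session, reads the data, drops out-of-range and missing responses, reverse-codes items, and computes the subscale scores.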

library(readxl)
library(dplyr)

mydf = as.data.frame(read_excel(path = '/cloud/project/mydata.xlsx'))

names(mydf) = c('agree', 'age', 'sex', 'edu', 'marital',
                  paste0('bpns', 1:18),
                  paste0('ctq', 1:10)) 

mydf3 = mydf %>%
  # drop rows where bpns14 > 5 or ctq7, ctq9, ctq10 > 4 (rows with NA in these items are dropped too)
  filter(bpns14<=5 & ctq7<=4 & ctq9<=4 & ctq10<=4) %>% 
  # drop rows that contain any missing value
  filter(rowSums(is.na(.))==0) %>%
  # reverse-code items: 6 - x for BPNS items, 5 - x for CTQ items
  mutate(bpns1r = 6 - bpns1,
         bpns2r = 6 - bpns2,
         bpns3r = 6 - bpns3,
         bpns6r = 6 - bpns6,
         bpns14r = 6 - bpns14,
         ctq6r = 5 - ctq6,
         ctq7r = 5 - ctq7,
         ctq8r = 5 - ctq8,
         ctq9r = 5 - ctq9,
         ctq10r = 5 - ctq10) %>% 
  # subscale sum scores
  mutate(autonomy = rowSums(select(.,bpns1r,bpns2r,bpns3r,bpns4,bpns5,bpns6r)),
         competence = rowSums(select(.,bpns7:bpns12)),
         related = rowSums(select(.,bpns13,bpns14r,bpns15:bpns18)),
         abuse = rowSums(select(.,ctq1:ctq5)),
         neglect = rowSums(select(.,ctq6r:ctq10r)))
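To see what each step of this pipeline does, here is a minimal sketch on a small hypothetical data frame. The names toy, toy2, b1, b2, and c1 are made up for illustration: b1 and b2 stand in for 1-5 items and c1 for a 1-4 item. The sketch applies the same filter, reverse-coding, and rowSums(select(.)) steps as the code above.

```r
library(dplyr)

# Hypothetical toy data (not part of mydata.xlsx):
# b1, b2 stand in for 1-5 items, c1 for a 1-4 item
toy = data.frame(b1 = c(1, 5, 7, 2),
                 b2 = c(2, 4, 3, NA),
                 c1 = c(4, 1, 2, 3))

toy2 = toy %>%
  filter(b1 <= 5 & c1 <= 4) %>%        # drops row 3 (b1 = 7 is out of range)
  filter(rowSums(is.na(.)) == 0) %>%   # drops row 4 (b2 is NA)
  mutate(b1r = 6 - b1,                 # reverse-code a 1-5 item: 1 <-> 5, 2 <-> 4
         c1r = 5 - c1) %>%             # reverse-code a 1-4 item: 1 <-> 4, 2 <-> 3
  mutate(total = rowSums(select(., b1r, b2)))  # sum of reversed b1 and regular b2

# toy2 keeps the original rows 1 and 2; total is 7 and 5
print(toy2)
```

Because `.` appears only inside nested calls, the magrittr pipe also passes the data as the first argument of filter() and mutate(), so `.` and the piped data refer to the same data frame.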

2. SEM Analysis with the lavaan Package

Analyses based on a wide variety of structural equation models can be run with the lavaan package.
Detailed documentation on how to use lavaan is available at: https://lavaan.ugent.be/tutorial/index.html

2-1. Confirmatory Factor Analysis

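Suppose we use mydf3 to run a confirmatory factor analysis of the Basic Psychological Needs scale in which three correlated factors, Autonomy, Competence, and Relatedness, are each measured by six items.
A confirmatory factor analysis can be run with the cfa() function of the lavaan package as follows.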

install.packages('lavaan')   # only needs to be run once
library(lavaan)

model.cfa = 'Autonomy =~ bpns1r + bpns2r + bpns3r + bpns4 + bpns5 + bpns6r
             Competence =~ bpns7 + bpns8 + bpns9 + bpns10 + bpns11 + bpns12
             Relatedness =~ bpns13 + bpns14r + bpns15 + bpns16 + bpns17 + bpns18'

fit.cfa = cfa(model = model.cfa, data = mydf3)
summary(fit.cfa, fit.measures = TRUE, standardized = TRUE)

# lavaan 0.6.16 ended normally after 37 iterations
# 
# Estimator                                         ML
# Optimization method                           NLMINB
# Number of model parameters                        39
# 
# Number of observations                           307
# 
# Model Test User Model:
#   
# Test statistic                               386.839
# Degrees of freedom                               132
# P-value (Chi-square)                           0.000
# 
# Model Test Baseline Model:
#   
# Test statistic                              2005.130
# Degrees of freedom                               153
# P-value                                        0.000
# 
# User Model versus Baseline Model:
#   
# Comparative Fit Index (CFI)                    0.862
# Tucker-Lewis Index (TLI)                       0.841
# 
# Loglikelihood and Information Criteria:
#   
# Loglikelihood user model (H0)              -6166.063
# Loglikelihood unrestricted model (H1)      -5972.643
# 
# Akaike (AIC)                               12410.126
# Bayesian (BIC)                             12555.473
# Sample-size adjusted Bayesian (SABIC)      12431.782
# 
# Root Mean Square Error of Approximation:
#   
# RMSEA                                          0.079
# 90 Percent confidence interval - lower         0.070
# 90 Percent confidence interval - upper         0.089
# P-value H_0: RMSEA <= 0.050                    0.000
# P-value H_0: RMSEA >= 0.080                    0.460
# 
# Standardized Root Mean Square Residual:
#   
#   SRMR                                         0.078
# 
# Parameter Estimates:
#   
# Standard errors                             Standard
# Information                                 Expected
# Information saturated (h1) model          Structured
# 
# Latent Variables:
#                Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# Autonomy =~                                                           
# bpns1r            1.000                               0.699    0.689
# bpns2r            0.947    0.089   10.592    0.000    0.662    0.715
# bpns3r            1.032    0.096   10.805    0.000    0.722    0.733
# bpns4             0.701    0.082    8.512    0.000    0.490    0.555
# bpns5             0.718    0.080    8.963    0.000    0.502    0.588
# bpns6r            0.794    0.089    8.968    0.000    0.556    0.588
# Competence =~                                                         
# bpns7             1.000                               0.630    0.646
# bpns8             0.859    0.090    9.565    0.000    0.541    0.673
# bpns9             0.770    0.086    8.952    0.000    0.485    0.618
# bpns10            1.007    0.097   10.403    0.000    0.634    0.758
# bpns11            0.829    0.094    8.818    0.000    0.522    0.607
# bpns12            0.808    0.098    8.207    0.000    0.508    0.556
# Relatedness =~                                                        
# bpns13            1.000                               0.526    0.627
# bpns14r           0.979    0.121    8.118    0.000    0.515    0.565
# bpns15            0.839    0.091    9.195    0.000    0.441    0.665
# bpns16            0.972    0.103    9.451    0.000    0.511    0.691
# bpns17            0.916    0.101    9.083    0.000    0.481    0.654
# bpns18            0.990    0.110    8.962    0.000    0.520    0.642
# 
# Covariances:
#                  Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# Autonomy ~~                                                           
#   Competence        0.211    0.038    5.497    0.000    0.479    0.479
# Relatedness         0.197    0.034    5.817    0.000    0.536    0.536
# Competence ~~                                                         
#   Relatedness       0.187    0.032    5.870    0.000    0.565    0.565
# 
# Variances:
#                  Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# .bpns1r            0.541    0.054   10.115    0.000    0.541    0.525
# .bpns2r            0.420    0.043    9.759    0.000    0.420    0.489
# .bpns3r            0.448    0.047    9.457    0.000    0.448    0.462
# .bpns4             0.539    0.048   11.278    0.000    0.539    0.692
# .bpns5             0.477    0.043   11.070    0.000    0.477    0.654
# .bpns6r            0.583    0.053   11.068    0.000    0.583    0.654
# .bpns7             0.553    0.052   10.584    0.000    0.553    0.582
# .bpns8             0.354    0.034   10.302    0.000    0.354    0.547
# .bpns9             0.380    0.035   10.835    0.000    0.380    0.618
# .bpns10            0.297    0.033    8.969    0.000    0.297    0.425
# .bpns11            0.467    0.043   10.926    0.000    0.467    0.632
# .bpns12            0.577    0.051   11.267    0.000    0.577    0.690
# .bpns13            0.426    0.040   10.702    0.000    0.426    0.606
# .bpns14r           0.565    0.051   11.175    0.000    0.565    0.681
# .bpns15            0.246    0.024   10.328    0.000    0.246    0.558
# .bpns16            0.286    0.029   10.008    0.000    0.286    0.523
# .bpns17            0.311    0.030   10.448    0.000    0.311    0.573
# .bpns18            0.387    0.037   10.567    0.000    0.387    0.588
# Autonomy           0.489    0.077    6.334    0.000    1.000    1.000
# Competence         0.396    0.068    5.834    0.000    1.000    1.000
# Relatedness        0.276    0.049    5.603    0.000    1.000    1.000

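The arguments of cfa() are as follows.
- model: the measurement model specification. The model must be given as a character string, that is, enclosed in quotation marks ('...'). In the model syntax, the =~ operator means "is measured by": put the name of the factor (latent variable) to the left of =~ and the names of its measured variables (indicators) to the right. Indicators of the same factor are joined with +. Factor names can be chosen freely. By default, lavaan fixes the loading of each factor's first indicator to 1 to set the factor's scale, which is why bpns1r, bpns7, and bpns13 show an estimate of 1.000 with no standard error.
- data: the data frame that contains the measured variables.
Passing the object returned by cfa() to summary() displays more detailed results.
- fit.measures = TRUE prints a range of fit indices, such as CFI, TLI, AIC, BIC, SABIC, RMSEA, and SRMR.
- standardized = TRUE prints standardized estimates. In the output, Std.lv gives the estimates when only the latent variables are standardized, and Std.all gives the estimates when both the latent and the observed variables are standardized.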
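The values printed by summary() can also be extracted as R objects. Because mydf3 comes from a local file, the sketch below assumes lavaan's bundled HolzingerSwineford1939 data set as a stand-in; with the document's own model you would pass fit.cfa instead of fit. It uses the lavaan functions fitMeasures(), parameterEstimates(), and lavInspect(), and its last lines check the relationship between Std.lv and Std.all described above. The object names HS.model, fm, pe, lam, and implied.sd are illustrative.

```r
library(lavaan)

# Stand-in data: HolzingerSwineford1939 ships with lavaan (mydf3 is a local file)
HS.model = 'visual  =~ x1 + x2 + x3
            textual =~ x4 + x5 + x6
            speed   =~ x7 + x8 + x9'
fit = cfa(model = HS.model, data = HolzingerSwineford1939)

# Selected fit indices as a named numeric vector
fm = fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
print(fm)

# Parameter table including the Std.lv and Std.all columns (std.lv, std.all)
pe = parameterEstimates(fit, standardized = TRUE)
lam = pe[pe$op == "=~", c("lhs", "rhs", "est", "std.lv", "std.all")]
print(lam)

# Std.all loading = Std.lv loading / model-implied SD of the indicator
implied.sd = sqrt(diag(lavInspect(fit, "implied")$cov))
std.all.check = lam$std.lv / implied.sd[lam$rhs]
print(all.equal(unname(std.all.check), lam$std.all))
```

The same relationship can be checked by hand in the output above: for bpns1r, 0.699 / sqrt(0.699^2 + 0.541) is about 0.689, the printed Std.all value.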

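Fit indices

CFI (Comparative Fit Index): indicates how much the model's fit improves relative to the baseline model (a model in which all observed variables are uncorrelated). Its maximum is 1, and values closer to 1 indicate that the model fits the data better. Values of about .90 or higher are generally considered acceptable, and values of about .95 or higher good.
TLI (Tucker-Lewis Index): like CFI, indicates how much the fit improves relative to the baseline model, but unlike CFI it is not bounded and can exceed 1. Larger values indicate better fit, and the same rough cutoffs (about .90 acceptable, about .95 good) are generally used.
RMSEA (Root Mean Square Error of Approximation): an absolute fit index that reflects the degree of model misspecification per degree of freedom; larger values indicate worse fit. Values of .05 or less indicate close fit, values between .05 and .08 reasonable fit, values between .08 and .10 mediocre fit, and values of .10 or more poor fit.
SRMR (Standardized Root Mean Square Residual): based on the differences between the covariance matrix observed in the data and the covariance matrix implied by the model; larger values indicate worse fit. Values below .08 are interpreted as indicating good fit.
AIC, BIC, SABIC: their absolute values cannot be interpreted; they are used only to compare models fitted to the same data. Smaller values indicate relatively better fit.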
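To make these indices concrete, the sketch below recomputes CFI, TLI, RMSEA, AIC, BIC, and SABIC from the chi-square statistics, loglikelihood, sample size, and number of parameters printed in the CFA output above. The formulas are the standard textbook definitions rather than code taken from lavaan; the RMSEA line uses N in the denominator, and some programs use N - 1 instead, which does not change the third decimal here. The variable names are illustrative.

```r
# Values copied from the CFA summary() output above
chisq   = 386.839    # test statistic, user model
df      = 132        # degrees of freedom, user model
chisq.b = 2005.130   # test statistic, baseline model
df.b    = 153        # degrees of freedom, baseline model
N       = 307        # number of observations
k       = 39         # number of model parameters
logl    = -6166.063  # loglikelihood of the user model (H0)

# Chi-square test of exact fit
p = pchisq(chisq, df, lower.tail = FALSE)

# Incremental indices: user-model misfit relative to baseline-model misfit
cfi = 1 - max(chisq - df, 0) / max(chisq.b - df.b, chisq - df, 0)
tli = (chisq.b / df.b - chisq / df) / (chisq.b / df.b - 1)

# RMSEA: misfit per degree of freedom, scaled by sample size
rmsea = sqrt(max(chisq - df, 0) / (df * N))

# Information criteria: -2 logL plus a penalty for the number of parameters
aic   = -2 * logl + 2 * k
bic   = -2 * logl + log(N) * k
sabic = -2 * logl + log((N + 2) / 24) * k

round(c(CFI = cfi, TLI = tli, RMSEA = rmsea, AIC = aic, BIC = bic, SABIC = sabic), 3)
```

After rounding, these match the printed values (CFI 0.862, TLI 0.841, RMSEA 0.079, AIC 12410.126, BIC 12555.473, SABIC 12431.782). Plugging in the SEM output instead (917.894 on 343 df, baseline 4555.803 on 378 df, N = 307) gives that model's CFI of 0.862, TLI of 0.848, and RMSEA of 0.074.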

2-2. Structural Equation Modeling

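Suppose we use mydf3 to fit the following structural equation model: two childhood trauma factors, Abuse (ctq1 to ctq5) and Neglect (ctq6r to ctq10r), predict the three basic psychological need factors, Autonomy, Competence, and Relatedness, and the residuals of the three need factors are assumed to be uncorrelated.
Structural equation models are typically fitted with the sem() function of the lavaan package. The commands below fit this model.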

model.sem = ' # measurement model
                Autonomy =~ bpns1r + bpns2r + bpns3r + bpns4 + bpns5 + bpns6r
                Competence =~ bpns7 + bpns8 + bpns9 + bpns10 + bpns11 + bpns12
                Relatedness =~ bpns13 + bpns14r + bpns15 + bpns16 + bpns17 + bpns18
                Abuse =~ ctq1 + ctq2 + ctq3 + ctq4 + ctq5
                Neglect =~ ctq6r + ctq7r + ctq8r + ctq9r + ctq10r

              # structural model
                Autonomy ~ Abuse + Neglect
                Competence ~ Abuse + Neglect
                Relatedness ~ Abuse + Neglect

              # residual covariance
                Autonomy ~~ 0 * Competence
                Autonomy ~~ 0 * Relatedness
                Competence ~~ 0 * Relatedness'

fit.sem = sem(model = model.sem, data = mydf3)
summary(fit.sem, fit.measures = TRUE, standardized = TRUE)

# lavaan 0.6.16 ended normally after 53 iterations
# 
# Estimator                                         ML
# Optimization method                           NLMINB
# Number of model parameters                        63
# 
# Number of observations                           307
# 
# Model Test User Model:
#   
# Test statistic                               917.894
# Degrees of freedom                               343
# P-value (Chi-square)                           0.000
# 
# Model Test Baseline Model:
#   
# Test statistic                              4555.803
# Degrees of freedom                               378
# P-value                                        0.000
# 
# User Model versus Baseline Model:
#   
# Comparative Fit Index (CFI)                    0.862
# Tucker-Lewis Index (TLI)                       0.848
# 
# Loglikelihood and Information Criteria:
#   
# Loglikelihood user model (H0)              -9152.333
# Loglikelihood unrestricted model (H1)      -8693.386
# 
# Akaike (AIC)                               18430.665
# Bayesian (BIC)                             18665.457
# Sample-size adjusted Bayesian (SABIC)      18465.649
# 
# Root Mean Square Error of Approximation:
#   
# RMSEA                                          0.074
# 90 Percent confidence interval - lower         0.068
# 90 Percent confidence interval - upper         0.080
# P-value H_0: RMSEA <= 0.050                    0.000
# P-value H_0: RMSEA >= 0.080                    0.041
# 
# Standardized Root Mean Square Residual:
#   
# SRMR                                           0.113
# 
# Parameter Estimates:
#   
# Standard errors                             Standard
# Information                                 Expected
# Information saturated (h1) model          Structured
# 
# Latent Variables:
#                Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# Autonomy =~                                                           
# bpns1r            1.000                               0.716    0.705
# bpns2r            0.952    0.086   11.013    0.000    0.681    0.735
# bpns3r            1.036    0.092   11.209    0.000    0.741    0.753
# bpns4             0.648    0.079    8.172    0.000    0.464    0.526
# bpns5             0.645    0.077    8.385    0.000    0.461    0.540
# bpns6r            0.765    0.085    8.952    0.000    0.547    0.580
# Competence =~                                                         
# bpns7             1.000                               0.653    0.671
# bpns8             0.815    0.085    9.575    0.000    0.533    0.663
# bpns9             0.761    0.082    9.243    0.000    0.497    0.634
# bpns10            0.946    0.091   10.376    0.000    0.618    0.740
# bpns11            0.777    0.089    8.710    0.000    0.508    0.590
# bpns12            0.801    0.094    8.486    0.000    0.523    0.573
# Relatedness =~                                                        
# bpns13            1.000                               0.503    0.600
# bpns14r           1.038    0.132    7.892    0.000    0.522    0.573
# bpns15            0.856    0.099    8.618    0.000    0.430    0.648
# bpns16            1.046    0.114    9.144    0.000    0.526    0.711
# bpns17            0.988    0.112    8.847    0.000    0.497    0.674
# bpns18            1.028    0.121    8.522    0.000    0.517    0.637
# Abuse =~                                                              
# ctq1              1.000                               0.366    0.438
# ctq2              0.672    0.111    6.038    0.000    0.246    0.504
# ctq3              1.678    0.234    7.184    0.000    0.614    0.748
# ctq4              2.298    0.311    7.393    0.000    0.841    0.827
# ctq5              1.984    0.270    7.346    0.000    0.726    0.806
# Neglect =~                                                            
# ctq6r             1.000                               0.826    0.899
# ctq7r             1.102    0.045   24.713    0.000    0.910    0.903
# ctq8r             1.122    0.048   23.418    0.000    0.926    0.883
# ctq9r             1.087    0.045   24.301    0.000    0.897    0.897
# ctq10r            1.100    0.048   22.718    0.000    0.908    0.872
# 
# Regressions:
#                Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# Autonomy ~                                                            
# Abuse            -0.003    0.152   -0.020    0.984   -0.002   -0.002
# Neglect          -0.328    0.067   -4.874    0.000   -0.379   -0.379
# Competence ~                                                          
# Abuse            -0.263    0.150   -1.757    0.079   -0.147   -0.147
# Neglect          -0.091    0.061   -1.475    0.140   -0.114   -0.114
# Relatedness ~                                                         
# Abuse            -0.069    0.106   -0.656    0.512   -0.051   -0.051
# Neglect          -0.248    0.049   -5.049    0.000   -0.408   -0.408
# 
# Covariances:
#                 Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# .Autonomy ~~                                                           
# .Competence        0.000                               0.000    0.000
# .Relatedness       0.000                               0.000    0.000
# .Competence ~~                                                         
# .Relatedness       0.000                               0.000    0.000
#  Abuse ~~                                                              
#  Neglect           0.158    0.029    5.408    0.000    0.522    0.522
# 
# Variances:
#                 Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# .bpns1r            0.518    0.053    9.836    0.000    0.518    0.503
# .bpns2r            0.394    0.042    9.342    0.000    0.394    0.459
# .bpns3r            0.420    0.047    9.000    0.000    0.420    0.433
# .bpns4             0.563    0.049   11.421    0.000    0.563    0.724
# .bpns5             0.516    0.046   11.342    0.000    0.516    0.708
# .bpns6r            0.592    0.053   11.101    0.000    0.592    0.664
# .bpns7             0.522    0.052   10.117    0.000    0.522    0.550
# .bpns8             0.363    0.035   10.215    0.000    0.363    0.561
# .bpns9             0.368    0.035   10.526    0.000    0.368    0.598
# .bpns10            0.316    0.035    9.020    0.000    0.316    0.453
# .bpns11            0.482    0.044   10.912    0.000    0.482    0.651
# .bpns12            0.561    0.051   11.044    0.000    0.561    0.672
# .bpns13            0.450    0.042   10.834    0.000    0.450    0.640
# .bpns14r           0.558    0.051   11.040    0.000    0.558    0.672
# .bpns15            0.256    0.025   10.379    0.000    0.256    0.580
# .bpns16            0.271    0.028    9.543    0.000    0.271    0.495
# .bpns17            0.296    0.029   10.071    0.000    0.296    0.546
# .bpns18            0.390    0.037   10.489    0.000    0.390    0.594
# .ctq1              0.565    0.047   11.915    0.000    0.565    0.808
# .ctq2              0.178    0.015   11.707    0.000    0.178    0.746
# .ctq3              0.297    0.030    9.748    0.000    0.297    0.441
# .ctq4              0.326    0.042    7.815    0.000    0.326    0.315
# .ctq5              0.284    0.034    8.456    0.000    0.284    0.350
# .ctq6r             0.161    0.017    9.714    0.000    0.161    0.191
# .ctq7r             0.187    0.020    9.591    0.000    0.187    0.184
# .ctq8r             0.242    0.024   10.152    0.000    0.242    0.220
# .ctq9r             0.195    0.020    9.787    0.000    0.195    0.195
# .ctq10r            0.260    0.025   10.396    0.000    0.260    0.240
# .Autonomy          0.438    0.068    6.412    0.000    0.856    0.856
# .Competence        0.404    0.067    6.017    0.000    0.948    0.948
# .Relatedness       0.204    0.039    5.186    0.000    0.809    0.809
#  Abuse             0.134    0.036    3.755    0.000    1.000    1.000
#  Neglect           0.682    0.068   10.091    0.000    1.000    1.000

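The arguments of sem() are the same as those of cfa(). However, a structural equation model requires specifying both a measurement model and a structural model.
- In the model syntax, ~ means "is predicted by" (is regressed on): put the name of the dependent variable to the left of ~ and the names of the independent variables to the right. Multiple independent variables are joined with +.
- In this model, Autonomy ~~ Competence denotes the covariance between the residuals of Autonomy and Competence; because both are dependent variables, ~~ refers to their residual covariance. By default, sem() estimates residual covariances among dependent latent variables like these, but this model assumes that the residual covariances among Autonomy, Competence, and Relatedness are all 0. To fix a residual covariance at 0 instead of estimating it, write Autonomy ~~ 0 * Competence. More generally, in lavaan model syntax, a parameter is fixed at a specific value by putting that value and * in front of the variable name.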
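The effect of the 0 * prefix can be seen on a small stand-in model. The sketch below assumes lavaan's bundled HolzingerSwineford1939 data (not the document's mydf3), lets a visual factor predict two latent outcomes, and fits the model twice: once with the outcomes' residual covariance explicitly estimated and once with it fixed at 0. The names base, fit.free, fit.fixed, and pe.fixed are illustrative.

```r
library(lavaan)

# Stand-in data shipped with lavaan; visual predicts two latent outcomes
base = 'visual  =~ x1 + x2 + x3
        textual =~ x4 + x5 + x6
        speed   =~ x7 + x8 + x9
        textual ~ visual
        speed   ~ visual'

# Residual covariance of the two outcomes estimated freely ...
fit.free  = sem(model = paste(base, 'textual ~~ speed', sep = '\n'),
                data = HolzingerSwineford1939)
# ... or fixed at 0 with the 0 * prefix
fit.fixed = sem(model = paste(base, 'textual ~~ 0 * speed', sep = '\n'),
                data = HolzingerSwineford1939)

# Fixing one parameter removes one free parameter
fitMeasures(fit.free, "npar")
fitMeasures(fit.fixed, "npar")

# The fixed residual covariance is reported with est = 0
pe.fixed = parameterEstimates(fit.fixed)
print(pe.fixed[pe.fixed$op == "~~" & pe.fixed$lhs == "textual" & pe.fixed$rhs == "speed", ])

# Likelihood-ratio test: does fixing the covariance at 0 worsen fit significantly?
anova(fit.fixed, fit.free)
```

Comparing the two fits with anova() (a likelihood-ratio test for nested models) is one way to check whether the zero residual covariances assumed in model.sem are compatible with the data.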