vignettes/articles/plotting-penalty-matrix.Rmd
Aim. This vignette shows how to plot the penalty matrices from the single- and multiple-group penalized factor models estimated in vignette("automatic-tuning-selection") and “multiple-group-analysis”.
Data. For illustration purposes, we use the cross-cultural data set ccdata, which contains the standardized ratings on 12 items concerning organizational citizenship behavior. Employees from different countries were asked to rate their attitudes towards helping other employees and giving suggestions for improved work conditions. The items are thought to measure two latent factors: help, defined by the first seven items (h1 to h7), and voice, represented by the last five items (v1 to v5). See ?ccdata for details.
This data set is a standardized version of the one in the ccpsyc package, and considers only employees from Lebanon and Taiwan (i.e., "LEB", "TAIW"). This vignette is meant as a demo of the capabilities of penfa; please refer to Fischer et al. (2019) and Fischer and Karl (2019) for a description and analysis of these data.
Let us load and inspect ccdata.
library(penfa)
data(ccdata)
summary(ccdata)
## country h1 h2 h3 h4
## Length:767 Min. :-2.62004 Min. :-2.9034 Min. :-2.63082 Min. :-3.0441
## Class :character 1st Qu.:-0.69516 1st Qu.:-0.2163 1st Qu.:-0.70356 1st Qu.:-0.2720
## Mode :character Median :-0.05354 Median : 0.4554 Median :-0.06114 Median : 0.4211
## Mean : 0.00000 Mean : 0.0000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.58809 3rd Qu.: 0.4554 3rd Qu.: 0.58128 3rd Qu.: 0.4211
## Max. : 1.22971 Max. : 1.1272 Max. : 1.22370 Max. : 1.1141
## h5 h6 h7 v1 v2
## Min. :-2.9105 Min. :-2.9541 Min. :-2.8364 Min. :-2.627694 Min. :-2.674430
## 1st Qu.:-0.8662 1st Qu.:-0.9092 1st Qu.:-0.7860 1st Qu.:-0.660770 1st Qu.:-0.671219
## Median : 0.4966 Median : 0.4541 Median :-0.1025 Median :-0.005129 Median :-0.003482
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.000000 Mean : 0.000000
## 3rd Qu.: 0.4966 3rd Qu.: 0.4541 3rd Qu.: 0.5810 3rd Qu.: 0.650512 3rd Qu.: 0.664255
## Max. : 1.1781 Max. : 1.1358 Max. : 1.2645 Max. : 1.306154 Max. : 1.331992
## v3 v4 v5
## Min. :-2.65214 Min. :-2.65722 Min. :-2.51971
## 1st Qu.:-0.68800 1st Qu.:-0.68041 1st Qu.:-0.61127
## Median :-0.03329 Median :-0.02148 Median : 0.02488
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.62142 3rd Qu.: 0.63746 3rd Qu.: 0.66103
## Max. : 1.27613 Max. : 1.29639 Max. : 1.29718
Let us fit the penalized factor model with the alasso penalty and the automatic tuning procedure, as described in vignette("automatic-tuning-selection").
# Specify syntax
syntax = 'help =~ h1 + h2 + h3 + h4 + h5 + h6 + h7 + 0*v1 + v2 + v3 + v4 + v5
voice =~ 0*h1 + h2 + h3 + h4 + h5 + h6 + h7 + v1 + v2 + v3 + v4 + v5'
# Get adaptive weights
mle.fit <- penfa(
  ## factor model
  model = syntax,
  data = ccdata,
  std.lv = TRUE,
  ## (no) penalization
  pen.shrink = "none",
  eta = list(shrink = c("none" = 0), diff = c("none" = 0)),
  strategy = "fixed",
  verbose = FALSE
)
# The unpenalized estimates serve as the adaptive weights
mle.weights <- coef(mle.fit)
# Model fit
alasso.fit <- penfa(
  ## factor model
  model = syntax,
  data = ccdata,
  std.lv = TRUE,
  ## penalization
  pen.shrink = "alasso",
  eta = list(shrink = c("lambda" = 0.01), diff = c("none" = 0)),
  ## automatic procedure
  strategy = "auto",
  gamma = 4,
  ## alasso
  weights = mle.weights,
  verbose = FALSE
)
alasso.fit
## penfa 0.1.1 reached convergence
##
## Number of observations 767
##
## Estimator PMLE
## Optimization method trust-region
## Information fisher
## Strategy auto
## Number of iterations (total) 58
## Number of two-steps (automatic) 2
## Effective degrees of freedom 27.129
##
## Penalty function:
## Sparsity alasso
##
##
The penalty matrix can be extracted through the penmat function.
alasso_penmat <- penmat(alasso.fit)
Applying the plot method to the penalty matrix produces an interactive heatmap of the log of the absolute value of the estimated penalty matrix. Because of space constraints, some of the parameter labels on the axes may be hidden. In that case, users can zoom in on the area of interest and inspect the corresponding penalty values.
plot(alasso_penmat)
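As a rough illustration of the transformation described above, the following base-R sketch applies the log of the absolute value to a small toy matrix and draws a static heatmap. The matrix S and its values are made up for illustration and do not come from a penfa fit, and image() is only a static stand-in for the interactive heatmap produced by the plot method.

```r
# Toy "penalty-like" matrix whose entries span many orders of magnitude
# (hypothetical values, not taken from a penfa model)
S <- diag(c(0, 1e-3, 2.5, 4e4, 8e9))
dimnames(S) <- list(paste0("p", 1:5), paste0("p", 1:5))

# Log of the absolute value; zero entries give -Inf, so flag them as NA
log_abs <- log(abs(S))
log_abs[!is.finite(log_abs)] <- NA

# Range of the transformed non-zero entries
range(log_abs, na.rm = TRUE)

# Static heatmap with row 1 drawn at the top; NA cells are left blank
image(x = 1:5, y = 1:5, z = t(log_abs[5:1, ]),
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:5, labels = colnames(S))
axis(2, at = 1:5, labels = rev(rownames(S)), las = 1)
box()
```

On the raw scale the largest entry would dominate the color range and the smaller ones would be indistinguishable; the log scale keeps all non-zero entries visible.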
Let us fit the penalized multiple-group factor model with the alasso penalty and the automatic multiple tuning parameter procedure, as described in “multiple-group-analysis”.
# Specify syntax
syntax.mg = 'help =~ 1*h1 + h2 + h3 + h4 + h5 + h6 + h7 + 0*v1 + v2 + v3 + v4 + v5
voice =~ 0*h1 + start(0)*h2 + start(0)*h3 + h4 + h5 + h6 + h7 + 1*v1 + v2 + v3 + v4 + v5
h2 + h3 + h4 + h5 + h6 + h7 + v2 + v3 + v4 + v5 ~ 1
help ~ NA*1
voice ~ NA*1 '
# Get adaptive weights
mle.fitMG <- penfa(
  ## factor model
  model = syntax.mg,
  data = ccdata,
  group = "country",
  ## (no) penalization
  pen.shrink = "none",
  pen.diff = "none",
  eta = list(shrink = c("lambda" = 0), diff = c("none" = 0)),
  strategy = "fixed",
  verbose = FALSE
)
# The unpenalized estimates serve as the adaptive weights
mle.weightsMG <- coef(mle.fitMG)
# Model fit
alasso.fitMG <- penfa(
  ## factor model
  model = syntax.mg,
  data = ccdata,
  group = "country",
  int.lv.free = TRUE,
  ## penalization
  pen.shrink = "alasso",
  pen.diff = "alasso",
  eta = list(shrink = c("lambda" = 0.01),
             diff = c("lambda" = 0.1, "tau" = 0.01)),
  ## automatic procedure
  strategy = "auto",
  gamma = 4,
  ## alasso
  weights = mle.weightsMG
)
##
## Automatic procedure:
## Iteration 1 : 0.00215674 827.66749181 0.00300201
## Iteration 2 : 0.00196781 29643.22874071 0.00772724
##
## Largest absolute gradient value: 62.07895697
## Fisher information matrix is positive definite
## Eigenvalue range: [53.00926, 8.675704e+12]
## Trust region iterations: 21
## Factor solution: admissible
## Effective degrees of freedom: 56.75804
The complete penalty matrix is stored in alasso.fitMG@Penalize@Sh.info$S.h, but it can be easily extracted via the penmat function. This matrix is the sum of the penalty matrices for Penalty 1 (sparsity.penmat), Penalty 2 (loadinvariance.penmat), and Penalty 3 (intinvariance.penmat). Unique variances, factor (co)variances and factor means were not affected by the penalization, so their entries in the penalty matrices are equal to zero.
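The additive structure described above can be sketched on a toy example in base R. Everything here is hypothetical: the parameter labels, the numeric values, and the assumption that a pairwise-difference penalty contributes a block proportional to [1, -1; -1, 1] are for illustration only and are not read from the fitted model.

```r
# Hypothetical parameters: one loading and one intercept per group,
# plus one unique variance that is left unpenalized
pars <- c("load_g1", "load_g2", "int_g1", "int_g2", "resvar")
zero <- matrix(0, nrow = 5, ncol = 5, dimnames = list(pars, pars))

# Assumed shape of a pairwise-difference penalty block
diff_block <- matrix(c(1, -1, -1, 1), nrow = 2)

# Penalty 1: sparsity, acting on each group loading separately
S_sparsity <- zero
diag(S_sparsity)[1:2] <- c(3, 5)

# Penalty 2: cross-group differences of the loadings
S_loadinv <- zero
S_loadinv[1:2, 1:2] <- 2 * diff_block

# Penalty 3: cross-group differences of the intercepts
S_intinv <- zero
S_intinv[3:4, 3:4] <- 0.5 * diff_block

# The complete penalty matrix is the sum of the three components
S_full <- S_sparsity + S_loadinv + S_intinv
S_full

# The unpenalized parameter has an all-zero row (and column)
all(S_full["resvar", ] == 0)
```

In this toy setting the loadings receive contributions from both the sparsity and the loading-invariance components, the intercepts only from the intercept-invariance component, and the unpenalized parameter from none.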
Through the plot method, we can visualize an interactive heatmap of each estimated penalty matrix. The heatmap shows the log of the absolute value of the entries because they span a wide range. Given the large number of parameters, some parameter labels on the axes may be hidden for lack of space. In this case, users can zoom in on the area of interest and inspect the penalty values for the parameters of interest.
The above penalty matrix is the sum of the following three individual penalty matrices.
The sparsity penalty matrix shrinks the small factor loadings of each group to zero. Apart from the group loadings, all the remaining entries of this penalty matrix are equal to zero.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] penfa_0.1.1
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-152 matrixStats_0.59.0 fs_1.5.0
## [4] httr_1.4.2 rprojroot_2.0.2 evd_2.3-3
## [7] numDeriv_2016.8-1.1 tools_4.1.0 utf8_1.2.1
## [10] R6_2.5.0 lazyeval_0.2.2 rgeos_0.5-5
## [13] DBI_1.1.1 mgcv_1.8-35 colorspace_2.0-2
## [16] sp_1.4-5 tidyselect_1.1.1 mnormt_2.0.2
## [19] compiler_4.1.0 pspline_1.0-18 textshaping_0.3.5
## [22] desc_1.3.0 plotly_4.9.4.1 distrEx_2.8.0
## [25] scales_1.1.1 sfsmisc_1.1-11 mvtnorm_1.1-2
## [28] psych_2.1.6 pkgdown_1.6.1 systemfonts_1.0.2
## [31] stringr_1.4.0 digest_0.6.27 rmarkdown_2.9
## [34] trustOptim_0.8.6.2 pkgconfig_2.0.3 htmltools_0.5.1.1
## [37] VineCopula_2.4.2 fastmap_1.1.0 stabledist_0.7-1
## [40] ADGofTest_0.3 htmlwidgets_1.5.3 rlang_0.4.11
## [43] VGAM_1.1-5 cartography_3.0.0 farver_2.1.0
## [46] generics_0.1.0 jsonlite_1.7.2 crosstalk_1.1.1
## [49] dplyr_1.0.7 magrittr_2.0.1 scam_1.2-11
## [52] Matrix_1.3-3 Rcpp_1.0.7 munsell_0.5.0
## [55] fansi_0.5.0 abind_1.4-5 distr_2.8.0
## [58] lifecycle_1.0.0 stringi_1.7.3 yaml_2.2.1
## [61] MASS_7.3-54 gamlss.dist_5.3-2 matrixcalc_1.0-4
## [64] grid_4.1.0 parallel_4.1.0 crayon_1.4.1
## [67] lattice_0.20-44 splines_4.1.0 startupmsg_0.9.6
## [70] tmvnsim_1.0-2 knitr_1.33 ismev_1.42
## [73] pillar_1.6.1 stats4_4.1.0 GJRM_0.2-4
## [76] magic_1.5-9 glue_1.4.2 evaluate_0.14
## [79] trust_0.1-8 mitools_2.4 data.table_1.14.0
## [82] vctrs_0.3.8 tidyr_1.1.3 gtable_0.3.0
## [85] purrr_0.3.4 cachem_1.0.5 ggplot2_3.3.5
## [88] xfun_0.24 Rmpfr_0.8-4 survey_4.1-1
## [91] viridisLite_0.4.0 ragg_1.1.3 survival_3.2-11
## [94] pcaPP_1.9-74 gsl_2.1-6 tibble_3.1.2
## [97] copula_1.0-1 memoise_2.0.0 gmp_0.6-2
## [100] ellipsis_0.3.2
Fischer, R., Ferreira, M. C., Van Meurs, N. et al. (2019). “Does Organizational Formalization Facilitate Voice and Helping Organizational Citizenship Behaviors? It Depends on (National) Uncertainty Norms.” Journal of International Business Studies, 50(1), 125-134. https://doi.org/10.1057/s41267-017-0132-6
Fischer, R., & Karl, J. A. (2019). “A Primer to (Cross-Cultural) Multi-Group Invariance Testing Possibilities in R.” Frontiers in Psychology, 10, 1507. https://doi.org/10.3389/fpsyg.2019.01507
Geminiani, E. (2020). “A Penalized Likelihood-Based Framework for Single and Multiple-Group Factor Analysis Models.” PhD thesis, University of Bologna. http://amsdottorato.unibo.it/9355/
Geminiani, E., Marra, G., & Moustaki, I. (2021). “Single- and Multiple-Group Penalized Factor Analysis: A Trust-Region Algorithm Approach with Integrated Automatic Multiple Tuning Parameter Selection.” Psychometrika, 86(1), 65-95. https://doi.org/10.1007/s11336-021-09751-8