The function penfa
fits single- and multiple-group
PENalized Factor Analysis models via a trust-region algorithm with
integrated automatic multiple tuning parameter selection.
In a single-group analysis, penfa
can automatically shrink a subset
of the factor loadings to zero. In a multiple-group analysis, it can
encourage sparse loading matrices and invariant factor loadings and
intercepts. The currently supported penalty functions are lasso, adaptive
lasso, scad, mcp, and ridge. Except for the latter, all penalties can
achieve sparsity.
penfa(model      = NULL,
      data       = NULL,
      group      = NULL,
      pen.shrink = "alasso",
      pen.diff   = "none",
      eta        = list(shrink = c(lambda = 0.01), diff = c(none = 0)),
      strategy   = "auto",
      ...)
Argument | Description
---|---
model | A description of a user-specified model. It takes the form of a lavaan-like model syntax. See below for additional details on how to specify a model syntax.
data | A data frame containing the (continuous) observed variables used in the model. Except for the group variable, all variables are treated as continuous.
group | Character. An optional variable name in the data frame defining the groups in a multiple-group analysis.
pen.shrink | Character. The type of penalty function used for shrinking a subset of the model parameters (see the Details below).
pen.diff | Character. The type of penalty function used for shrinking certain parameter differences across groups, and thus encouraging parameter equivalence across groups (see the Details below).
eta | A named list containing the starting value(s) of the tuning parameter(s) if the automatic procedure is requested (strategy = "auto"), or the fixed value(s) of the tuning parameter(s) if strategy = "fixed".
strategy | Character. The strategy used for the selection of the tuning parameter(s). If "auto", the optimal values of the tuning parameters are determined through the automatic tuning procedure; if "fixed", the model is estimated with the tuning value(s) given in eta.
... | Additional options that can be defined using penfaOptions.
An object of class penfa, for which several methods are available. See the manual pages of summary,penfa-method, show,penfa-method, coef,penfa-method, and fitted,penfa-method for details.
The penfa function currently takes as input a data set, as opposed to sample moments (i.e., covariance matrices and mean vectors). Future implementations will allow penfa to additionally accept sample covariance matrices and sample means as input. For now, if only sample moments are available, users can generate multivariate data from those moments and apply the penfa function to the generated data, as in the sketch below.
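A minimal sketch of this workaround, assuming the MASS package is available; mu and S stand in for the available sample mean vector and covariance matrix, and the numeric values here are purely illustrative:

## Hypothetical sample moments (illustrative values)
mu <- c(x1 = 0, x2 = 0, x3 = 0)
S  <- matrix(0.3, nrow = 3, ncol = 3, dimnames = list(names(mu), names(mu)))
diag(S) <- 1

## Generate multivariate normal data reproducing these moments exactly
set.seed(1234)
gen_data <- as.data.frame(MASS::mvrnorm(n = 500, mu = mu, Sigma = S,
                                        empirical = TRUE))

## penfa can then be applied to gen_data in place of the raw data, e.g.:
## fit <- penfa(model = model_syntax, data = gen_data, ...)

With empirical = TRUE, the generated data reproduce mu and S exactly, so an analysis of gen_data mirrors an analysis of the sample moments.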
All variables
(except for the group
variable in multiple-group analyses) are treated
as continuous.
Categorical items are not currently supported.
The model syntax in the model
argument describes the factor analysis
model to be estimated, and specifies the relationships between the observed
and latent variables (i.e., the common factors). To facilitate its
formulation, the rules for the syntax specification broadly follow the ones in
the lavaan
package.
The model syntax is composed of one or multiple formula-like expressions describing specific parts of the model. The model syntax can be specified as a literal string enclosed by single quotes as in the example below.
model_syntax <- '
  # Common factors
  factor1 =~ x1 + x2 + x3 + x4 + x5 + x6
  factor2 =~ x1 + x2 + x3 + x4 + x5 + x6

  # Factor variances and covariances
  factor1 ~~ factor1
  factor1 ~~ factor2

  # Unique variances and covariances
  x1 ~~ x1
  x1 ~~ x2

  # Intercepts and factor means
  x1 ~ 1
  factor1 ~ 1
'
Blank lines and comments can be used in between formulas, and formulas can be split over multiple lines. Multiple formulas can be placed on a single line if they are separated by a semicolon (;).
The current implementation allows for the following types of formula-like expressions in the model syntax:
Common factors: The
"=~"
operator can be used to define the continuous common factors
(latent variables). The name of the factor (e.g., factor1) is on the left of
the "=~"
operator, whereas the terms on the right (e.g., x1 + x2
+ x3 + x4 + x5 + x6
), separated by "+"
operators, are the indicators
of the factor. The operator "=~"
can be read as "is measured by".
Variances and covariances: The "~~"
("double tilde") operator
specifies the (residual) variance of an observed or latent variable, or a set
of covariances between one variable and several other variables (either
observed or latent). The distinction between variances and residual variances
is made automatically. Covariances between unique factors are currently only
allowed when information = "fisher"
.
Intercepts and factor means: We can specify an intercept for an
observed variable (x1 ~ 1
) or a common factor (factor1 ~ 1
).
The variable name appears on the left of the "~"
operator. On the
right-hand side, there is the number "1", which stands for the
intercept/mean. Including an intercept/mean formula in the model
automatically implies meanstructure = TRUE
. The distinction between
observed variable intercepts and factor means is made automatically.
Usually, only a single variable name appears on the left side of an operator. However, if multiple variable names are specified, separated by the "+" operator, the formula is repeated for each element on the left side. For instance, the formula
x1 + x2 + x3 + x4 ~ 1
specifies an intercept for variables x1, x2, x3
and x4
.
On the right-hand side of these formula-like expressions, each element can be
modified (using the "*"
operator) by a numeric constant or the special
function start(). This provides the user with a mechanism to fix
parameters and provide alternative starting values, respectively. All
"*"
expressions are referred to as modifiers, and are explained in
detail in the sections below.
Each parameter in a model is automatically given a name consisting of three parts. The first part is the
name of the variable on the left-hand side of the formula where the parameter
is implied. The middle part is based on the special "operator" used in the
formula (e.g., "=~"
, "~"
or "~~"
). The third part is the
name of the variable on the right-hand side of the formula where the
parameter is implied, or "1" if it is an intercept. The three parts are
pasted together into a single string. For example, the name of the factor loading of x2 on factor1 is the string "factor1=~x2".
The name of the parameter corresponding to the factor covariance between
factor1
and factor2
is the string "factor1~~factor2"
.
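Purely as an illustration of this naming rule (the objects below are hypothetical, not part of the penfa API), the three parts can be pasted together as follows:

lhs <- "factor1"; op <- "=~"; rhs <- "x2"
paste0(lhs, op, rhs)
#> [1] "factor1=~x2"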
It is often desirable to fix a model parameter that is otherwise (by default) estimated. Any parameter in a model can be fixed by using a modifier resulting in a numerical constant. For instance:
Fixing factor loadings for scale setting or identification restrictions:
factor1 =~ 0.8*x1 + x2 + x3 + 0*x4 + x5 + x6
factor2 =~ 0*x1 + x2 + x3 + 0.8*x4 + x5 + x6
Specifying an orthogonal (zero) covariance between two factors:
factor1 ~~ 0*factor2
Note that multiplying a parameter by NA forces it to be freely estimated.
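As a brief, hedged illustration of the NA modifier, the sketch below frees the loading of x1 (which would otherwise be fixed for scale setting) and fixes the factor variance to 1 instead; the variable names are illustrative:

free_loading_syntax <- 'factor1 =~ NA*x1 + x2 + x3
                        factor1 ~~ 1*factor1'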
User-defined starting values can be provided through the special function
start(), containing a numeric constant. For instance, the formula below
assigns a starting value of 0.8 to the loading of x2
on
factor1
.
factor1 =~ x1 + start(0.8)*x2 + x3 + x4 + x5 + x6
In a multiple-group factor analysis, a modifier containing a single element can be replaced by a vector with one element per group. If a single element is provided, it is used for all groups. In the
example below with two groups, the factor loadings of x1
on
factor1
are fixed to 0.8 in both groups, whereas the factor loadings
of x4
are fixed to 0.75 and 0.85 in the first and second group,
respectively.
multigroup_syntax <- '
  factor1 =~ 0.8*x1 + x2 + x3 + x4 + x5 + x6
  factor2 =~ x1 + x2 + x3 + c(0.75, 0.85)*x4 + x5 + x6
'
Penalized factor analysis makes it possible to produce parsimonious models through a largely automated procedure. The use of sparsity-inducing penalty functions leads to optimally sparse factor structures supported by the data. The resulting models are less prone to instability in the estimation process and are easier to interpret and generalize than their unpenalized counterparts. Multiple-group penalized factor analysis can be used to automatically ascertain the differences and similarities of parameter estimates across groups.
In penfa
, estimation is achieved via a penalized likelihood-based
framework that builds upon differentiable approximations of
non-differentiable penalties, a theoretically founded definition of degrees
of freedom, and an algorithm with automatic multiple tuning parameter
selection (see section below for details).
The penfa function uses a trust-region algorithm. This strategy constructs a model function whose behavior near the current point and within a trust region (usually a ball) is similar to that of the actual objective function. The algorithm exploits second-order analytical derivative information, either in the form of the penalized Hessian matrix (if information = "hessian") or the penalized Fisher information matrix (if information = "fisher"). Models with a mean structure can only be estimated with the penalized Fisher information matrix, which performs similarly to the penalized Hessian at a reduced computational cost.
The selection of the tuning parameters is a crucial issue in penalized estimation strategies, as the tuning parameters are responsible for the optimal balance between goodness of fit and sparsity.
The penalized framework discussed above is easily integrated with automatic
multiple tuning parameter selection (if strategy = "auto"
). The tuning
parameters are chosen to minimize an approximate AIC. See below for
additional details on how to introduce more sparsity, if desired. The
automatic procedure is fast, efficient, and scales well with the number of
tuning parameters. It also eliminates the need for time-consuming and
computationally intensive grid-searches.
Note: Only lasso, adaptive lasso and ridge penalties can be used with the automatic procedure.
The automatic procedure returns the optimal value of the tuning parameter.
Note, however, that the parameter estimates from this model will differ slightly from those obtained by setting strategy = "fixed" and
eta
equal to that optimal tuning value. This is due to the different
starting values employed in the two scenarios. In the automatic procedure,
the starting values of the final model come from the ones of the previous
model in the optimization loop; in the fixed-tuning context, the starting
values come from the default ones in penfa
.
If strategy = "fixed"
, penfa
estimates a penalized factor model
with the value of the tuning parameter stored in eta
. This is useful
if users wish to make multiple calls to the penfa
function using a
range of values for the tuning parameter. Then, the optimal penalized model
can be picked on the basis of information criteria, which are easily computed
by calling the AIC
and BIC
functions. It is often convenient to use the (Generalized) Bayesian Information Criterion as a selector, given its widespread use in sparse settings. A sketch of such a grid-search follows.
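A minimal sketch of a manual grid-search, assuming syntax and ccdata as defined in the Examples below; the grid of tuning values is purely illustrative:

## Fit the model for a range of fixed tuning values
etas <- c(0.001, 0.005, 0.01, 0.05, 0.1)
fits <- lapply(etas, function(e)
  penfa(model = syntax, data = ccdata, std.lv = TRUE,
        pen.shrink = "alasso",
        eta = list(shrink = c("lambda" = e), diff = c("none" = 0)),
        strategy = "fixed"))

## Pick the fit minimizing the information criterion
bics <- sapply(fits, BIC)
best_fit <- fits[[which.min(bics)]]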
These information criteria use the theoretical definition of the effective degrees of freedom (edf) as their bias terms. This is possible because the use of differentiable penalty approximations makes the objective function twice-continuously differentiable. The total edf is computed as the sum of the effective degrees of freedom of each model parameter, which in turn range from 0 to 1 and quantify the extent to which each parameter has been penalized.
The penfa
function penalizes every element in the parameter
matrix/vector specified in the eta
argument. For instance, if
eta = list("shrink" = c("lambda" = 0.01), "diff" = c("none" = 0))
all
factor loadings are penalized through a shrinkage penalty.
It may be beneficial to try out different penalties, and see which one works best for the problem at hand. It is also useful to keep the following in mind:
Shrinkage: the lasso, alasso, scad, and mcp are able to shrink parameters to zero, unlike the ridge penalty, whose purpose is solely to regularize the estimation process.
Unbiasedness: the alasso, scad, and mcp enjoy the so-called "oracle" property. The lasso, by contrast, is biased, since it applies the same amount of penalization to all parameters.
Automatic procedure: only the lasso, alasso, and ridge are supported by the automatic procedure. These penalties are therefore a convenient choice for analyses requiring multiple penalty terms (e.g., multiple-group analyses), for which the automatic procedure is the only feasible alternative to otherwise computationally intensive multi-dimensional grid-searches.
Geminiani, Marra, and Moustaki (2021) presented numerical and empirical studies evaluating and comparing the performance of single- and multiple-group penalized factor models under different penalty functions. The
alasso penalty generally produced the best trade-off between sparsity and
goodness of fit. However, unlike other penalties, the alasso requires a set
of adaptive weights. In some situations, the weights might not be available,
or might be difficult to obtain. If this is the case, users are encouraged to
resort to simpler penalties.
The penalized model automatically tries to generate the optimal trade-off
between goodness of fit and model complexity (if strategy = "auto"
).
As a result of this delicate balance, it may not provide the sparsest factor
solution. If users desire more sparsity, they can follow the guidelines
below.
Influence factor: increase the value of the influence factor stored in
the option gamma
. As a rule of thumb, in our experience, common values
for obtaining sparse solutions usually range between 3.5 and 4.5.
Penalties: some penalties rely on a second tuning parameter. It may be helpful to try out different values for it and see which one performs best. For instance, increasing the value of the exponent of the alasso (by specifying, for instance, a.alasso = 2) leads to sparser solutions (see the sketch after these guidelines).
In case users fitted a penalized model with a fixed tuning parameter
(strategy = "fixed"
), they can manually and subjectively increase its
value in the option eta
to encourage more sparsity. When doing so, it
is helpful to first do some trials and understand a reasonable range of
values that the tuning parameter can take.
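A sketch combining the first two guidelines above, with syntax and ccdata as in the Examples below; the gamma and a.alasso values are illustrative:

sparser_fit <- penfa(model = syntax, data = ccdata, std.lv = TRUE,
                     pen.shrink = "alasso",
                     eta = list(shrink = c("lambda" = 0.01),
                                diff = c("none" = 0)),
                     strategy = "auto",
                     gamma = 4.5,   ## larger influence factor
                     a.alasso = 2)  ## larger alasso exponent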
If strategy = "fixed"
, pen.shrink = "none"
, pen.diff =
"none"
, and eta = list("shrink" = c("none" = 0), "diff" = c("none" =
0))
, no penalization is applied, and the model is estimated through ordinary
maximum likelihood, as in the sketch below.
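A minimal sketch of an ordinary maximum likelihood fit (syntax and ccdata as in the Examples below):

ml_fit <- penfa(model = syntax, data = ccdata, std.lv = TRUE,
                pen.shrink = "none", pen.diff = "none",
                eta = list(shrink = c("none" = 0), diff = c("none" = 0)),
                strategy = "fixed")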
The function penfa
internally assesses the convergence of the fitted
model, and the admissibility of the final solution.
The convergence checks assess whether the penalized gradient vector is close
to zero and the penalized Hessian/Fisher information matrix is positive
definite. In case of convergence issues, penfa
warns the user with explanatory messages.
Note: Due to the presence of possibly
multiple penalty terms, our experiments highlighted that the penalized
gradient need not be strictly close to zero to obtain meaningful results. It
is enough that its elements do not exceed a pre-specified threshold, whose value can be changed through the optim.dx.tol option (see the sketch below).
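For instance, the threshold can be loosened as sketched below, with syntax and ccdata as in the Examples below; the optim.dx.tol value is illustrative:

fit <- penfa(model = syntax, data = ccdata, std.lv = TRUE,
             pen.shrink = "alasso",
             eta = list(shrink = c("lambda" = 0.01), diff = c("none" = 0)),
             strategy = "auto", gamma = 4,
             optim.dx.tol = 0.01)  ## looser gradient check threshold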
The admissibility checks are carried out to determine whether the final
solution is admissible. Specifically, the penfa
function
sequentially checks whether:
The final model includes any negative unique variances (Heywood cases);
The final model includes any negative factor variances;
The estimated common factor covariance matrix is positive definite;
The estimated unique factor covariance matrix is positive definite;
The estimated factor loading matrix is of full column rank;
The estimated factor loading matrix does not contain any null rows.
In case of multiple-group analyses, the function checks the admissibility of
the parameter matrices of each group. If any of the above conditions are not
satisfied, the penfa
function warns the user with messages explaining why.
Occasionally the penfa
function may print out warnings or produce
errors. If the errors concern convergence issues, it may be helpful to go
through the following steps:
Identification: please make sure that at least the minimum identification restrictions are satisfied. This implies fixing the scale and the origin of every factor in each group. In addition, other constraints - which usually come in the form of zero-restricted loadings - are necessary due to rotational freedom.
Starting values: the choice of the starting values is of paramount
importance when it comes to convergence. The starting values internally used
by penfa
correspond to the ones used by the
lavaan
package for
confirmatory factor analysis
. If users have some prior knowledge or
intuition about possible values for some of the parameters, it might be
beneficial to include this information by providing the starting values for
those parameters in the syntax specification (see below for additional
details). For instance, depending on the case, specifying the starting values
of the primary loadings equal to 1 (start(1)*x1 + ...
) often results
in more stable optimization processes, especially when dealing with
complicated models that require the estimation of many parameters, as in
multiple-group penalized factor analysis.
Sample size: the penalized models fitted by penfa have a larger number of parameters than typical confirmatory factor analysis models.
This complexity should be accompanied by a reasonable sample size. If the
sample size is too small for the complexity of the model, convergence issues
will arise. In case of small sample sizes, it might in principle be more
reliable to select the tuning parameter through a grid-search with the GBIC
instead of using the automatic procedure.
Automatic procedure: if the starting values of the tuning parameters prevent the automatic procedure from finding the optimal estimates of the tuning parameters, the procedure is repeated with different starting values. If this fails, an error is printed out.
Adaptive weights: when using the alasso penalty, it is suggested to
manually provide a vector of adaptive weights, especially for complex models.
The adaptive weights often come in the form of (unpenalized) maximum
likelihood estimates. If no vector of weights is provided, the penfa
function internally estimates an unpenalized MLE model whose parameter
estimates will serve as weights. If the unpenalized model does not converge,
the penfa
function internally estimates a ridge-regularized factor
model and uses the resulting estimates as weights. If even this estimation
fails, an error is printed out.
Ultimately, if none of the above succeeds, users should consider re-specifying the model, either by simplifying the hypothesized factor structure or by considering a subset of the observed variables. Increasing the number of restrictions (for instance, by specifying some additional fixed loadings) might be advantageous. Also, as a general practice, when conducting a multiple-group analysis, make sure beforehand that the groups share similar factor structures: if the groups have different factor configurations, the final results will be distorted.
It is always important to assess whether the distributional assumptions of
the normal linear factor model hold. The penfa
function fits penalized
factor models to continuous observed variables; this excludes categorical items or items with a small number of categories, which would instead require tailored approaches that specifically take into account the qualitative nature of the data.
The standard errors are derived from the inverse of the penalized Fisher
information matrix (if information = "fisher"
) or penalized Hessian
(if information = "hessian"
), which relies on the Bayesian result for
the covariance matrix of the estimated parameters. The implemented framework provides a standard error for every model parameter. However, users
should take extra caution when using the standard errors associated with the
penalized parameters that were shrunken to zero.
To learn more about penfa
, start with the vignettes and tutorials at
browseVignettes(package = "penfa")
and
https://egeminiani.github.io/penfa/articles/.
Geminiani, E., Marra, G., & Moustaki, I. (2021). "Single- and Multiple-Group Penalized Factor Analysis: A Trust-Region Algorithm Approach with Integrated Automatic Multiple Tuning Parameter Selection." Psychometrika, 86(1), 65-95. doi: 10.1007/s11336-021-09751-8
Geminiani, E. (2020). "A penalized likelihood-based framework for single and multiple-group factor analysis models" (Doctoral dissertation, University of Bologna). Available at http://amsdottorato.unibo.it/9355/.
Elena Geminiani geminianielena@gmail.com.
data(ccdata)

### Single-group analysis (no mean-structure, unit factor variances)

syntax = 'help  =~   h1 + h2 + h3 + h4 + h5 + h6 + h7 + 0*v1 + v2 + v3 + v4 + v5
          voice =~ 0*h1 + h2 + h3 + h4 + h5 + h6 + h7 +   v1 + v2 + v3 + v4 + v5'

alasso_fit <- penfa(## factor model
                    model = syntax,
                    data = ccdata,
                    std.lv = TRUE,
                    ## penalization
                    pen.shrink = "alasso",
                    eta = list(shrink = c("lambda" = 0.01), diff = c("none" = 0)),
                    ## automatic procedure
                    strategy = "auto",
                    gamma = 4)
#> Computing weights for alasso (ML estimates)... done.
#>
#> Automatic procedure:
#> Iteration 1 : 0.00298271
#> Iteration 2 : 0.00452604
#>
#> Largest absolute gradient value: 12.76355181
#> Fisher information matrix is positive definite
#> Eigenvalue range: [180.2917, 9189645]
#> Trust region iterations: 15
#> Factor solution: admissible
#> Effective degrees of freedom: 27.12936

### Multiple-group analysis (mean structure, marker-variable approach, starting values)

syntax_mg = '
  help  =~ 1*h1 + h2 + h3 + h4 + h5 + h6 + h7 + 0*v1 + v2 + v3 + v4 + v5
  voice =~ 0*h1 + start(0)*h2 + start(0)*h3 + h4 + h5 + h6 + h7 + 1*v1 + v2 + v3 + v4 + v5
  h1 + v1 ~ 0*1
'

# Compute weights for alasso from unpenalized model
mle_fitMG <- penfa(model = syntax_mg,
                   data = ccdata,
                   group = "country",
                   int.ov.free = TRUE,
                   int.lv.free = TRUE,
                   pen.shrink = "none",
                   pen.diff = "none",
                   eta = list(shrink = c("lambda" = 0), diff = c("none" = 0)),
                   strategy = "fixed")
#>
#> Largest absolute gradient value: 0.00108292
#> Fisher information matrix is positive definite
#> Eigenvalue range: [2.904073, 76169.99]
#> Trust region iterations: 22
#> Factor solution: admissible
#> Computing VCOV ... done.
#> Effective degrees of freedom: 94

mle_weightsMG <- coef(mle_fitMG)

# Fit model
alasso_fitMG <- penfa(## factor model
                      model = syntax_mg,
                      data = ccdata,
                      group = "country",
                      int.ov.free = TRUE,
                      int.lv.free = TRUE,
                      ## penalization
                      pen.shrink = "alasso",
                      pen.diff = "alasso",
                      eta = list(shrink = c("lambda" = 0.01),
                                 diff = c("lambda" = 0.1, "tau" = 0.01)),
                      ## automatic procedure
                      strategy = "auto",
                      gamma = 4,
                      ## alasso
                      weights = mle_weightsMG)
#>
#> Automatic procedure:
#> Iteration 1 : 0.00215674 6115.68152854 0.00300201
#> Iteration 2 : 0.00196778 4011.40169767 0.00772393
#>
#> Largest absolute gradient value: 61.89552041
#> Fisher information matrix is positive definite
#> Eigenvalue range: [53.00923, 1.174025e+12]
#> Trust region iterations: 18
#> Factor solution: admissible
#> Effective degrees of freedom: 56.75744

### For additional examples, see the vignettes and tutorials at
### browseVignettes(package = "penfa") and https://egeminiani.github.io/penfa/articles/