The function penfa
fits single- and multiple-group
PENalized Factor Analysis models via a trust-region algorithm with
integrated automatic multiple tuning parameter selection.
In a single-group analysis, penfa
can automatically shrink a subset
of the factor loadings to zero. In a multiple-group analysis, it can
encourage sparse loading matrices and invariant factor loadings and
intercepts. The currently supported penalty functions are lasso, adaptive
lasso, scad, mcp, and ridge. Except for the latter, all penalties can
achieve sparsity.
penfa(model      = NULL,
      data       = NULL,
      group      = NULL,
      pen.shrink = "alasso",
      pen.diff   = "none",
      eta        = list(shrink = c(lambda = 0.01), diff = c(none = 0)),
      strategy   = "auto",
      ...)
Argument | Description
---|---
model | A description of a user-specified model. It takes the form of a lavaan-like model syntax. See below for additional details on how to specify a model syntax.
data | A data frame containing the (continuous) observed variables used in the model. Except for the group variable, all variables are treated as continuous.
group | Character. An optional variable name in the data frame defining the groups in a multiple-group analysis.
pen.shrink | Character. The type of penalty function used for shrinking a subset of the model parameters (see the Details below).
pen.diff | Character. The type of penalty function used for shrinking certain parameter differences across groups, and thus encouraging parameter equivalence across groups (see the Details below).
eta | A named list containing the starting value(s) of the tuning parameter(s) if the automatic procedure is requested (strategy = "auto"), or the fixed value(s) of the tuning parameter(s) if strategy = "fixed".
strategy | Character. The strategy used for the selection of the tuning parameter(s). If "auto", the optimal values of the tuning parameters are determined through the automatic tuning procedure; if "fixed", the model is estimated with the tuning value(s) given in eta.
... | Additional options that can be defined using penfaOptions.
An object of class penfa, for which several methods are available. See the manual pages of summary,penfa-method, show,penfa-method, coef,penfa-method, and fitted,penfa-method for details.
The penfa function currently takes as input a data set, as opposed to sample moments (i.e., covariance matrices and mean vectors). Future implementations will allow penfa to additionally accept sample covariance matrices and sample means as input. For now, if only sample moments are available, users can generate multivariate data from those moments and apply the penfa function to the generated data, as in the sketch below.
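A minimal sketch of this workaround, assuming the MASS package is available; mu and S stand in for the available sample mean vector and covariance matrix, and the numeric values here are purely illustrative:

## Hypothetical sample moments (illustrative values)
mu <- c(x1 = 0, x2 = 0, x3 = 0)
S  <- matrix(0.3, nrow = 3, ncol = 3, dimnames = list(names(mu), names(mu)))
diag(S) <- 1

## Generate multivariate normal data reproducing these moments exactly
set.seed(1234)
gen_data <- as.data.frame(MASS::mvrnorm(n = 500, mu = mu, Sigma = S,
                                        empirical = TRUE))

## penfa can then be applied to gen_data in place of the raw data, e.g.:
## fit <- penfa(model = model_syntax, data = gen_data, ...)

With empirical = TRUE, the generated data reproduce mu and S exactly, so an analysis of gen_data mirrors an analysis of the sample moments.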
All variables
(except for the group
variable in multiple-group analyses) are treated
as continuous.
Categorical items are not currently supported.
The model syntax in the model
argument describes the factor analysis
model to be estimated, and specifies the relationships between the observed
and latent variables (i.e., the common factors). To facilitate its
formulation, the rules for the syntax specification broadly follow the ones in
the lavaan
package.
The model syntax is composed of one or multiple formula-like expressions describing specific parts of the model. The model syntax can be specified as a literal string enclosed by single quotes as in the example below.
model_syntax <- '
  # Common factors
  factor1 =~ x1 + x2 + x3 + x4 + x5 + x6
  factor2 =~ x1 + x2 + x3 + x4 + x5 + x6

  # Factor variances and covariances
  factor1 ~~ factor1
  factor1 ~~ factor2

  # Unique variances and covariances
  x1 ~~ x1
  x1 ~~ x2

  # Intercepts and factor means
  x1 ~ 1
  factor1 ~ 1
'
Blank lines and comments can be used in between formulas, and formulas can be split over multiple lines. Multiple formulas can be placed on a single line if they are separated by a semicolon (;).
The current implementation allows for the following types of formula-like expressions in the model syntax:
Common factors: The
"=~"
operator can be used to define the continuous common factors
(latent variables). The name of the factor (e.g., factor1) is on the left of
the "=~"
operator, whereas the terms on the right (e.g., x1 + x2
+ x3 + x4 + x5 + x6
), separated by "+"
operators, are the indicators
of the factor. The operator "=~"
can be read as "is measured by".
Variances and covariances: The "~~"
("double tilde") operator
specifies the (residual) variance of an observed or latent variable, or a set
of covariances between one variable and several other variables (either
observed or latent). The distinction between variances and residual variances
is made automatically. Covariances between unique factors are currently only
allowed when information = "fisher"
.
Intercepts and factor means: We can specify an intercept for an
observed variable (x1 ~ 1
) or a common factor (factor1 ~ 1
).
The variable name appears on the left of the "~"
operator. On the
right-hand side, there is the number "1", which stands for the
intercept/mean. Including an intercept/mean formula in the model
automatically implies meanstructure = TRUE
. The distinction between
observed variable intercepts and factor means is made automatically.
Usually, only a single variable name appears on the left side of an operator. However, if multiple variable names are specified, separated by the "+" operator, the formula is repeated for each element on the left side. For instance, the formula
x1 + x2 + x3 + x4 ~ 1
specifies an intercept for variables x1, x2, x3
and x4
.
On the right-hand side of these formula-like expressions, each element can be
modified (using the "*"
operator) by a numeric constant or the special
function start(). This provides the user with a mechanism to fix
parameters and provide alternative starting values, respectively. All
"*"
expressions are referred to as modifiers, and are explained in
detail in the sections below.
Each parameter in a model is automatically given a name consisting of three parts. The first part is the
name of the variable on the left-hand side of the formula where the parameter
is implied. The middle part is based on the special "operator" used in the
formula (e.g., "=~"
, "~"
or "~~"
). The third part is the
name of the variable on the right-hand side of the formula where the
parameter is implied, or "1" if it is an intercept. The three parts are
pasted together into a single string. For example, the name of the factor loading of x2 on factor1 is the string "factor1=~x2".
The name of the parameter corresponding to the factor covariance between
factor1
and factor2
is the string "factor1~~factor2"
.
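Purely as an illustration of this naming rule (the objects below are hypothetical, not part of the penfa API), the three parts can be pasted together as follows:

lhs <- "factor1"; op <- "=~"; rhs <- "x2"
paste0(lhs, op, rhs)
#> [1] "factor1=~x2"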
It is often desirable to fix a model parameter that is otherwise (by default) estimated. Any parameter in a model can be fixed by using a modifier resulting in a numerical constant. For instance:
Fixing factor loadings for scale setting or identification restrictions:
factor1 =~ 0.8*x1 + x2 + x3 + 0*x4 + x5 + x6
factor2 =~ 0*x1 + x2 + x3 + 0.8*x4 + x5 + x6
Specifying an orthogonal (zero) covariance between two factors:
factor1 ~~ 0*factor2
Note that multiplying a parameter by NA forces it to be freely estimated.
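As a brief, hedged illustration of the NA modifier, the sketch below frees the loading of x1 (which would otherwise be fixed for scale setting) and fixes the factor variance to 1 instead; the variable names are illustrative:

free_loading_syntax <- 'factor1 =~ NA*x1 + x2 + x3
                        factor1 ~~ 1*factor1'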
User-defined starting values can be provided through the special function
start(), containing a numeric constant. For instance, the formula below
assigns a starting value of 0.8 to the loading of x2
on
factor1
.
factor1 =~ x1 + start(0.8)*x2 + x3 + x4 + x5 + x6
In a multiple-group factor analysis, a modifier containing a single element can be replaced by a vector with one element per group. If a single element is provided, it is used for all groups. In the
example below with two groups, the factor loadings of x1
on
factor1
are fixed to 0.8 in both groups, whereas the factor loadings
of x4
are fixed to 0.75 and 0.85 in the first and second group,
respectively.
multigroup_syntax <- '
  factor1 =~ 0.8*x1 + x2 + x3 + x4 + x5 + x6
  factor2 =~ x1 + x2 + x3 + c(0.75, 0.85)*x4 + x5 + x6
'
Penalized factor analysis makes it possible to produce parsimonious models through a largely automated procedure. The use of sparsity-inducing penalty functions leads to optimally sparse factor structures supported by the data. The resulting models are less prone to instability in the estimation process and are easier to interpret and generalize than their unpenalized counterparts. Multiple-group penalized factor analysis can be used to automatically ascertain the differences and similarities of parameter estimates across groups.
In penfa
, estimation is achieved via a penalized likelihood-based
framework that builds upon differentiable approximations of
non-differentiable penalties, a theoretically founded definition of degrees
of freedom, and an algorithm with automatic multiple tuning parameter
selection (see section below for details).
The penfa function uses a trust-region algorithm. This strategy constructs a model function whose behavior near the current point and within a trust region (usually a ball) is similar to that of the actual objective function. The algorithm exploits second-order analytical derivative information, either in the form of the penalized Hessian matrix (if information = "hessian") or the penalized Fisher information matrix (if information = "fisher"). Models with a mean structure can only be estimated with the penalized Fisher information matrix, which performs similarly to the penalized Hessian at a reduced computational cost.
The selection of the tuning parameters is a crucial issue in penalized estimation strategies, as the tuning parameters are responsible for the optimal balance between goodness of fit and sparsity.
The penalized framework discussed above is easily integrated with automatic
multiple tuning parameter selection (if strategy = "auto"
). The tuning
parameters are chosen to minimize an approximate AIC. See below for
additional details on how to introduce more sparsity, if desired. The
automatic procedure is fast, efficient, and scales well with the number of
tuning parameters. It also eliminates the need for time-consuming and
computationally intensive grid-searches.
Note: Only lasso, adaptive lasso and ridge penalties can be used with the automatic procedure.
The automatic procedure returns the optimal value of the tuning parameter.
Note, however, that the parameter estimates from this model will differ slightly from those obtained by setting strategy = "fixed" and
eta
equal to that optimal tuning value. This is due to the different
starting values employed in the two scenarios. In the automatic procedure,
the starting values of the final model come from the ones of the previous
model in the optimization loop; in the fixed-tuning context, the starting
values come from the default ones in penfa
.
If strategy = "fixed"
, penfa
estimates a penalized factor model
with the value of the tuning parameter stored in eta
. This is useful
if users wish to make multiple calls to the penfa
function using a
range of values for the tuning parameter. Then, the optimal penalized model
can be picked on the basis of information criteria, which are easily computed
by calling the AIC
and BIC
functions. It is often convenient to use the (Generalized) Bayesian Information Criterion as a selector, given its widespread use in sparse settings. A sketch of such a grid-search follows.
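A minimal sketch of a manual grid-search, assuming syntax and ccdata as defined in the Examples below; the grid of tuning values is purely illustrative:

## Fit the model for a range of fixed tuning values
etas <- c(0.001, 0.005, 0.01, 0.05, 0.1)
fits <- lapply(etas, function(e)
  penfa(model = syntax, data = ccdata, std.lv = TRUE,
        pen.shrink = "alasso",
        eta = list(shrink = c("lambda" = e), diff = c("none" = 0)),
        strategy = "fixed"))

## Pick the fit minimizing the information criterion
bics <- sapply(fits, BIC)
best_fit <- fits[[which.min(bics)]]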
These information criteria use the theoretical definition of the effective degrees of freedom (edf) as their bias terms. This is possible because the use of differentiable penalty approximations makes the objective function twice-continuously differentiable. The total edf is computed as the sum of the effective degrees of freedom of each model parameter, which in turn range from 0 to 1 and quantify the extent to which each parameter has been penalized.
The penfa
function penalizes every element in the parameter
matrix/vector specified in the eta
argument. For instance, if
eta = list("shrink" = c("lambda" = 0.01), "diff" = c("none" = 0))
all
factor loadings are penalized through a shrinkage penalty.
It may be beneficial to try out different penalties, and see which one works best for the problem at hand. It is also useful to keep the following in mind:
Shrinkage: the lasso, alasso, scad, and mcp are able to shrink parameters to zero, unlike the ridge penalty, whose purpose is solely to regularize the estimation process.
Unbiasedness: the alasso, scad, and mcp enjoy the so-called "oracle" property. The lasso, by contrast, is biased, since it applies the same amount of penalization to all parameters.
Automatic procedure: only the lasso, alasso, and ridge are supported by the automatic procedure. These penalties are therefore a convenient choice for analyses requiring multiple penalty terms (e.g., multiple-group analyses), for which the automatic procedure is the only feasible alternative to otherwise computationally intensive multi-dimensional grid-searches.
Geminiani, Marra, and Moustaki (2021) presented numerical and empirical studies evaluating and comparing the performance of single- and multiple-group penalized factor models under different penalty functions. The
alasso penalty generally produced the best trade-off between sparsity and
goodness of fit. However, unlike other penalties, the alasso requires a set
of adaptive weights. In some situations, the weights might not be available,
or might be difficult to obtain. If this is the case, users are encouraged to
resort to simpler penalties.
The penalized model automatically tries to generate the optimal trade-off
between goodness of fit and model complexity (if strategy = "auto"
).
As a result of this delicate balance, it may not provide the sparsest factor
solution. If users desire more sparsity, they can follow the guidelines
below.
Influence factor: increase the value of the influence factor stored in
the option gamma
. As a rule of thumb, in our experience, common values
for obtaining sparse solutions usually range between 3.5 and 4.5.
Penalties: some penalties rely on a second tuning parameter. It may be helpful to try out different values for it and see which one performs best. For instance, increasing the value of the exponent of the alasso (by specifying, for instance, a.alasso = 2) leads to sparser solutions (see the sketch after these guidelines).
In case users fitted a penalized model with a fixed tuning parameter
(strategy = "fixed"
), they can manually and subjectively increase its
value in the option eta
to encourage more sparsity. When doing so, it
is helpful to first do some trials and understand a reasonable range of
values that the tuning parameter can take.
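A sketch combining the first two guidelines above, with syntax and ccdata as in the Examples below; the gamma and a.alasso values are illustrative:

sparser_fit <- penfa(model = syntax, data = ccdata, std.lv = TRUE,
                     pen.shrink = "alasso",
                     eta = list(shrink = c("lambda" = 0.01),
                                diff = c("none" = 0)),
                     strategy = "auto",
                     gamma = 4.5,   ## larger influence factor
                     a.alasso = 2)  ## larger alasso exponent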
If strategy = "fixed"
, pen.shrink = "none"
, pen.diff =
"none"
, and eta = list("shrink" = c("none" = 0), "diff" = c("none" =
0))
, no penalization is applied, and the model is estimated through ordinary
maximum likelihood, as in the sketch below.
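A minimal sketch of an ordinary maximum likelihood fit (syntax and ccdata as in the Examples below):

ml_fit <- penfa(model = syntax, data = ccdata, std.lv = TRUE,
                pen.shrink = "none", pen.diff = "none",
                eta = list(shrink = c("none" = 0), diff = c("none" = 0)),
                strategy = "fixed")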
The function penfa
internally assesses the convergence of the fitted
model, and the admissibility of the final solution.
The convergence checks assess whether the penalized gradient vector is close
to zero and the penalized Hessian/Fisher information matrix is positive
definite. In case of convergence issues, penfa
warns the user with explanatory messages.
Note: Due to the presence of possibly
multiple penalty terms, our experiments highlighted that the penalized
gradient need not be strictly close to zero to obtain meaningful results. It
is enough that its elements do not exceed a pre-specified threshold, whose value can be changed through the optim.dx.tol option (see the sketch below).
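For instance, the threshold can be loosened as sketched below, with syntax and ccdata as in the Examples below; the optim.dx.tol value is illustrative:

fit <- penfa(model = syntax, data = ccdata, std.lv = TRUE,
             pen.shrink = "alasso",
             eta = list(shrink = c("lambda" = 0.01), diff = c("none" = 0)),
             strategy = "auto", gamma = 4,
             optim.dx.tol = 0.01)  ## looser gradient check threshold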
The admissibility checks are carried out to determine whether the final
solution is admissible. Specifically, the penfa
function
sequentially checks whether:
The final model includes any negative unique variances (Heywood cases);
The final model includes any negative factor variances;
The estimated common factor covariance matrix is positive definite;
The estimated unique factor covariance matrix is positive definite;
The estimated factor loading matrix is of full column rank;
The estimated factor loading matrix does not contain any null rows.
In case of multiple-group analyses, the function checks the admissibility of
the parameter matrices of each group. If any of the above conditions are not
satisfied, the penfa
function warns the user with messages explaining why.
Occasionally the penfa
function may print out warnings or produce
errors. If the errors concern convergence issues, it may be helpful to go
through the following steps:
Identification: please make sure that at least the minimum identification restrictions are satisfied. This implies fixing the scale and the origin of every factor in each group. In addition, other constraints - which usually come in the form of zero-restricted loadings - are necessary due to rotational freedom.
Starting values: the choice of the starting values is of paramount
importance when it comes to convergence. The starting values internally used
by penfa
correspond to the ones used by the
lavaan
package for
confirmatory factor analysis
. If users have some prior knowledge or
intuition about possible values for some of the parameters, it might be
beneficial to include this information by providing the starting values for
those parameters in the syntax specification (see below for additional
details). For instance, depending on the case, specifying the starting values
of the primary loadings equal to 1 (start(1)*x1 + ...
) often results
in more stable optimization processes, especially when dealing with
complicated models that require the estimation of many parameters, as in
multiple-group penalized factor analysis.
Sample size: the penalized models fitted by penfa have a larger number of parameters than typical confirmatory factor analysis models.
This complexity should be accompanied by a reasonable sample size. If the
sample size is too small for the complexity of the model, convergence issues
will arise. In case of small sample sizes, it might in principle be more
reliable to select the tuning parameter through a grid-search with the GBIC
instead of using the automatic procedure.
Automatic procedure: if the starting values of the tuning parameters prevent the automatic procedure from finding the optimal estimates of the tuning parameters, the procedure is repeated with different starting values. If this fails, an error is printed out.
Adaptive weights: when using the alasso penalty, it is suggested to
manually provide a vector of adaptive weights, especially for complex models.
The adaptive weights often come in the form of (unpenalized) maximum
likelihood estimates. If no vector of weights is provided, the penfa
function internally estimates an unpenalized MLE model whose parameter
estimates will serve as weights. If the unpenalized model does not converge,
the penfa
function internally estimates a ridge-regularized factor
model and uses the resulting estimates as weights. If even this estimation
fails, an error is printed out.
Ultimately, if none of the above succeeds, users should consider re-specifying the model, either by simplifying the hypothesized factor structure or by considering a subset of the observed variables. Increasing the number of restrictions (for instance, by specifying some additional fixed loadings) might be advantageous. Also, as a general practice, when conducting a multiple-group analysis, make sure beforehand that the groups share similar factor structures: if the groups have different factor configurations, the final results will be distorted.
It is always important to assess whether the distributional assumptions of
the normal linear factor model hold. The penfa
function fits penalized
factor models to continuous observed variables; this excludes categorical items or items with a small number of categories, which would instead require tailored approaches that specifically take into account the qualitative nature of the data.
The standard errors are derived from the inverse of the penalized Fisher
information matrix (if information = "fisher"
) or penalized Hessian
(if information = "hessian"
), which relies on the Bayesian result for
the covariance matrix of the estimated parameters. The implemented framework provides a standard error for every model parameter. However, users
should take extra caution when using the standard errors associated with the
penalized parameters that were shrunken to zero.
To learn more about penfa
, start with the vignettes and tutorials at
browseVignettes(package = "penfa")
and
https://egeminiani.github.io/penfa/articles/.
Geminiani, E., Marra, G., & Moustaki, I. (2021). "Single- and Multiple-Group Penalized Factor Analysis: A Trust-Region Algorithm Approach with Integrated Automatic Multiple Tuning Parameter Selection." Psychometrika, 86(1), 65-95. doi: 10.1007/s11336-021-09751-8
Geminiani, E. (2020). "A penalized likelihood-based framework for single and multiple-group factor analysis models" (Doctoral dissertation, University of Bologna). Available at http://amsdottorato.unibo.it/9355/.
Elena Geminiani geminianielena@gmail.com.
data(ccdata)

### Single-group analysis (no mean-structure, unit factor variances)

syntax = 'help  =~   h1 + h2 + h3 + h4 + h5 + h6 + h7 + 0*v1 + v2 + v3 + v4 + v5
          voice =~ 0*h1 + h2 + h3 + h4 + h5 + h6 + h7 +   v1 + v2 + v3 + v4 + v5'

alasso_fit <- penfa(## factor model
                    model = syntax,
                    data = ccdata,
                    std.lv = TRUE,
                    ## penalization
                    pen.shrink = "alasso",
                    eta = list(shrink = c("lambda" = 0.01), diff = c("none" = 0)),
                    ## automatic procedure
                    strategy = "auto",
                    gamma = 4)
#> Computing weights for alasso (ML estimates)... done.
#>
#> Automatic procedure:
#> Iteration 1 : 0.00298271
#> Iteration 2 : 0.00452604
#>
#> Largest absolute gradient value: 12.76355181
#> Fisher information matrix is positive definite
#> Eigenvalue range: [180.2917, 9189645]
#> Trust region iterations: 15
#> Factor solution: admissible
#> Effective degrees of freedom: 27.12936

### Multiple-group analysis (mean structure, marker-variable approach, starting values)

syntax_mg = '
  help  =~ 1*h1 + h2 + h3 + h4 + h5 + h6 + h7 + 0*v1 + v2 + v3 + v4 + v5
  voice =~ 0*h1 + start(0)*h2 + start(0)*h3 + h4 + h5 + h6 + h7 + 1*v1 + v2 + v3 + v4 + v5
  h1 + v1 ~ 0*1
'

# Compute weights for alasso from unpenalized model
mle_fitMG <- penfa(model = syntax_mg,
                   data = ccdata,
                   group = "country",
                   int.ov.free = TRUE,
                   int.lv.free = TRUE,
                   pen.shrink = "none",
                   pen.diff = "none",
                   eta = list(shrink = c("lambda" = 0), diff = c("none" = 0)),
                   strategy = "fixed")
#>
#> Largest absolute gradient value: 0.00108292
#> Fisher information matrix is positive definite
#> Eigenvalue range: [2.904073, 76169.99]
#> Trust region iterations: 22
#> Factor solution: admissible
#> Computing VCOV ... done.
#> Effective degrees of freedom: 94

mle_weightsMG <- coef(mle_fitMG)

# Fit model
alasso_fitMG <- penfa(## factor model
                      model = syntax_mg,
                      data = ccdata,
                      group = "country",
                      int.ov.free = TRUE,
                      int.lv.free = TRUE,
                      ## penalization
                      pen.shrink = "alasso",
                      pen.diff = "alasso",
                      eta = list(shrink = c("lambda" = 0.01),
                                 diff = c("lambda" = 0.1, "tau" = 0.01)),
                      ## automatic procedure
                      strategy = "auto",
                      gamma = 4,
                      ## alasso
                      weights = mle_weightsMG)
#>
#> Automatic procedure:
#> Iteration 1 : 0.00215674 6115.68152854 0.00300201
#> Iteration 2 : 0.00196778 4011.40169767 0.00772393
#>
#> Largest absolute gradient value: 61.89552041
#> Fisher information matrix is positive definite
#> Eigenvalue range: [53.00923, 1.174025e+12]
#> Trust region iterations: 18
#> Factor solution: admissible
#> Effective degrees of freedom: 56.75744

### For additional examples, see the vignettes and tutorials at
### browseVignettes(package = "penfa") and https://egeminiani.github.io/penfa/articles/