This vignette shows the basic workflow of msPCA on the
built-in mtcars dataset. We compute two sparse principal
components, inspect the solution, and compare the sparse result with
dense PCA.
Install the package directly from CRAN:
install.packages("msPCA")
You can then load the package as usual:
library(msPCA)
We work with the correlation matrix of mtcars and ask
for two 4-sparse principal components under the default orthogonality
constraint.
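For orientation, the problem can be written in the following form. Here Σ is the p × p correlation matrix (p = 11 for mtcars), r = 2 is the number of components, and k_1 = k_2 = 4 are the sparsity levels passed as `r` and `ks` in the call below. This is a common way to state sparse PCA with several components, assumed here rather than quoted from the msPCA documentation.

$$
\begin{aligned}
\max_{x_1,\dots,x_r \in \mathbb{R}^p}\quad & \sum_{t=1}^{r} x_t^\top \Sigma\, x_t \\
\text{s.t.}\quad & \|x_t\|_2 = 1,\quad \|x_t\|_0 \le k_t, \qquad t = 1,\dots,r,\\
& x_s^\top x_t = 0 \ \ (s \neq t) \quad \text{(orthogonal loadings, feasibilityConstraintType = 0)},\\
& \text{or } x_s^\top \Sigma\, x_t = 0 \ \ (s \neq t) \quad \text{(uncorrelated PCs, feasibilityConstraintType = 1)}.
\end{aligned}
$$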
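# Correlation matrix of the 11 mtcars variables
Sigma <- cor(datasets::mtcars)
set.seed(42)  # fix the random seed so the fit is reproducible
# Two PCs (r = 2) with at most 4 non-zero loadings each (ks), orthogonal loadings
res <- mspca(Sigma, r = 2, ks = c(4, 4), feasibilityConstraintType = 0, verbose = FALSE)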
print_mspca(res, Sigma)
#>
#> msPCA solution:
#> 2 sparse PCs
#> Pct. of variance explained: 32.5 28.0
#> Num. of non-zero loadings : 4 4
#> Sparse PCs
#> [,1] [,2]
#> mpg -0.499 0.000
#> cyl 0.495 0.000
#> disp 0.510 0.000
#> hp 0.000 -0.518
#> wt 0.495 0.000
#> qsec 0.000 0.506
#> vs 0.000 0.494
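#> carb 0.000 -0.482
Sparse PCA with several components needs a constraint to avoid redundancy
between the PCs: without one, every component could converge to the same
leading sparse direction. Traditionally, non-redundancy is enforced through
orthogonality of the loading vectors, which is the default in mspca().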
Another notion of non-redundancy is to enforce zero pairwise correlation
between the PCs. The package allows for both options, and the choice can
lead to different solutions when the variables are strongly correlated.
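A quick way to see why the choice matters only for sparse loadings: for eigenvectors of Σ, v_1'Σv_2 = λ_2 v_1'v_2, so dense PCA satisfies both notions at once. The base-R sketch below checks this on mtcars, then builds two hand-picked sparse loading vectors (hypothetical, not produced by mspca()) whose supports do not overlap. They are exactly orthogonal, yet the components they define are strongly correlated.

```r
# Correlation matrix of the 11 mtcars variables, as in the vignette
Sigma <- cor(datasets::mtcars)

# Helper: entries above the diagonal of a square matrix
off_diag <- function(M) M[upper.tri(M)]

# Dense PCA: the leading eigenvectors are orthonormal, and because
# Sigma %*% v_t = lambda_t * v_t they are also uncorrelated
V <- eigen(Sigma, symmetric = TRUE)$vectors[, 1:2]
dense_orth <- off_diag(crossprod(V))          # v_1' v_2
dense_corr <- off_diag(t(V) %*% Sigma %*% V)  # v_1' Sigma v_2

# Hypothetical sparse loadings with disjoint supports (chosen by hand
# for illustration; this is not an msPCA fit)
x1 <- setNames(numeric(ncol(Sigma)), colnames(Sigma))
x2 <- x1
x1[c("mpg", "cyl")] <- c(1, -1) / sqrt(2)
x2[c("disp", "wt")] <- c(1, 1) / sqrt(2)
sparse_orth <- sum(x1 * x2)                  # 0: no variable is shared
sparse_corr <- drop(t(x1) %*% Sigma %*% x2)  # far from 0: mpg/cyl and disp/wt are strongly correlated

print(round(c(dense_orth = dense_orth, dense_corr = dense_corr,
              sparse_orth = sparse_orth, sparse_corr = sparse_corr), 4))
```

Disjoint supports guarantee orthogonality for free, which is why the orthogonality constraint alone says nothing about how much two sparse components overlap in the data.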
- feasibilityConstraintType = 0 (default) enforces orthogonality of the loading vectors.
- feasibilityConstraintType = 1 instead enforces zero pairwise correlation between the resulting components.
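Both constraints can be scored from the off-diagonal entries of a Gram matrix: X'X for orthogonality and X'ΣX for zero correlation, where X is the p × r loading matrix. The sketch below defines a hypothetical helper, constraint_violation(), in base R; summing absolute off-diagonal entries is an assumption about how such a violation might be measured, not a description of msPCA internals. It is applied to the rounded loadings of res printed above.

```r
# Hypothetical helper (not part of msPCA): total absolute off-diagonal
# violation of a non-redundancy constraint for a p x r loading matrix X.
#   type = 0: orthogonal loadings      -> off-diagonal of X' X
#   type = 1: uncorrelated components  -> off-diagonal of X' Sigma X
constraint_violation <- function(Sigma, X, type = 0) {
  G <- if (type == 0) crossprod(X) else t(X) %*% Sigma %*% X
  sum(abs(G[upper.tri(G)]))
}

Sigma <- cor(datasets::mtcars)

# Loadings of `res` as printed by print_mspca() (rounded to 3 decimals)
X_res <- matrix(0, nrow = ncol(Sigma), ncol = 2,
                dimnames = list(colnames(Sigma), NULL))
X_res[c("mpg", "cyl", "disp", "wt"), 1] <- c(-0.499, 0.495, 0.510, 0.495)
X_res[c("hp", "qsec", "vs", "carb"), 2] <- c(-0.518, 0.506, 0.494, -0.482)

print(constraint_violation(Sigma, X_res, type = 0))  # 0: the supports are disjoint
print(constraint_violation(Sigma, X_res, type = 1))  # clearly non-zero
```

Because the two supports of res do not overlap, the orthogonality score is exactly 0, while the correlation score is large; with these rounded loadings it should land close to the value that feasibility_violation_off() reports for res in the diagnostics below.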
# Same problem, but require uncorrelated components instead of orthogonal loadings
res_corr <- mspca(Sigma, r = 2, ks = c(4, 4), feasibilityConstraintType = 1, verbose = FALSE)
print_mspca(res_corr, Sigma)
#>
#> msPCA solution:
#> 2 sparse PCs
#> Pct. of variance explained: 24.7 22.8
#> Num. of non-zero loadings : 4 4
#> Sparse PCs
#> [,1] [,2]
#> hp 0.312 0.000
#> drat 0.000 -0.337
#> wt 0.000 0.087
#> qsec -0.674 0.000
#> vs -0.279 0.000
#> am 0.000 -0.624
#> gear 0.000 -0.700
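#> carb 0.609 0.000
The package provides helper functions for checking feasibility and summarizing variance explained: feasibility_violation_off() measures how far a solution is from satisfying the chosen constraint, while fraction_variance_explained() and fraction_variance_explained_perPC() report the total and per-component shares of variance. Below, we report the same diagnostic checks for each fitted solution.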
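The per-PC values reported below add up to the total (for res, 0.3245835 + 0.2798031 = 0.6043866), which is consistent with defining the share of PC t as x_t'Σx_t / tr(Σ). The hypothetical helper below implements that assumed definition in base R; it is a sketch for intuition, not msPCA's code.

```r
# Hypothetical helper (assumed definition, not msPCA's implementation):
# share of total variance for each column x_t of X, x_t' Sigma x_t / trace(Sigma)
variance_share_per_pc <- function(Sigma, X) {
  diag(t(X) %*% Sigma %*% X) / sum(diag(Sigma))
}

Sigma <- cor(datasets::mtcars)

# Sanity check: with all 11 dense eigenvectors the shares sum to 1
V_all <- eigen(Sigma, symmetric = TRUE)$vectors
print(sum(variance_share_per_pc(Sigma, V_all)))

# Rounded loadings of `res` from print_mspca() above
X_res <- matrix(0, nrow = ncol(Sigma), ncol = 2,
                dimnames = list(colnames(Sigma), NULL))
X_res[c("mpg", "cyl", "disp", "wt"), 1] <- c(-0.499, 0.495, 0.510, 0.495)
X_res[c("hp", "qsec", "vs", "carb"), 2] <- c(-0.518, 0.506, 0.494, -0.482)
shares <- variance_share_per_pc(Sigma, X_res)
print(round(shares, 3))
print(round(sum(shares), 3))
```

With the rounded loadings, the two shares should come out close to the per-PC values printed for res below.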
cat("Diagnostics for res (feasibilityConstraintType = 0)\n")
#> Diagnostics for res (feasibilityConstraintType = 0)
feasibility_violation_off(Sigma, res$x_best, feasibilityConstraintType = 0)
#> [1] 0
feasibility_violation_off(Sigma, res$x_best, feasibilityConstraintType = 1)
#> [1] 2.335602
fraction_variance_explained(Sigma, res$x_best)
#> [1] 0.6043866
fraction_variance_explained_perPC(Sigma, res$x_best)
#> [1] 0.3245835 0.2798031
cat("\nDiagnostics for res_corr (feasibilityConstraintType = 1)\n")
#>
#> Diagnostics for res_corr (feasibilityConstraintType = 1)
feasibility_violation_off(Sigma, res_corr$x_best, feasibilityConstraintType = 0)
#> [1] 0
feasibility_violation_off(Sigma, res_corr$x_best, feasibilityConstraintType = 1)
#> [1] 9.62078e-05
fraction_variance_explained(Sigma, res_corr$x_best)
#> [1] 0.4753908
fraction_variance_explained_perPC(Sigma, res_corr$x_best)
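#> [1] 0.2472306 0.2281602
Both fits have orthogonal loadings (a violation of 0 under feasibilityConstraintType = 0; in both cases the two supports happen not to overlap), but only res_corr also yields components that are uncorrelated up to numerical tolerance (9.6e-05, versus 2.34 for res). Here, that extra property comes at a cost in explained variance: 47.5% for res_corr versus 60.4% for res.
For reference, the first two dense principal components explain more variance, but they are not sparse.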
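The dense reference point can be computed with base R alone. The sketch below takes the eigenvalues of the same correlation matrix; the share of PC t is λ_t / tr(Σ), which is what prcomp(mtcars, scale. = TRUE) reports as the proportion of variance.

```r
# Dense PCA on the same correlation matrix
Sigma <- cor(datasets::mtcars)
eig <- eigen(Sigma, symmetric = TRUE)  # eigenvalues in decreasing order

# Share of total variance for each of the first two dense PCs
dense_share <- eig$values[1:2] / sum(diag(Sigma))
print(round(dense_share, 3))
print(round(sum(dense_share), 3))

# Number of non-zero loadings per dense PC (compare with 4 for the sparse fits)
print(colSums(abs(eig$vectors[, 1:2]) > 1e-8))
```

Comparing the total with the 0.604 (res) and 0.475 (res_corr) reported above, and the non-zero counts with the 4 per component of the sparse fits, makes the trade-off explicit.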
Sparse PCA typically trades some explained variance for a much more
interpretable loading pattern: here, each sparse component involves only
4 of the 11 variables. For a quick summary of the fitted components,
print_mspca() is usually the most useful first diagnostic.