
User:Wanghe07/sandbox


Sparse principal component analysis (sparse PCA) is a specialised technique used in statistical analysis and, in particular, in the analysis of multivariate data sets. It extends the classic method of principal component analysis (PCA) for the reduction of dimensionality of data by adding a sparsity constraint on the input variables.

Ordinary principal component analysis (PCA) uses a vector space transform to reduce multidimensional data sets to lower dimensions. It finds linear combinations of input variables and transforms them into new variables (called "principal components") that correspond to directions of maximal variance in the data. The number of new variables created by these linear combinations is usually much smaller than the number of input variables in the original dataset, while the new variables still explain most of the variance present in the data.
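
As a minimal illustration (a sketch in NumPy, using a randomly generated matrix as a stand-in for real data), ordinary PCA can be carried out by an eigendecomposition of the empirical covariance matrix; note that the resulting loading vectors are generally dense, i.e. every input variable receives a non-zero weight:

```python
import numpy as np

# Toy data: n = 100 samples, p = 5 input variables (random stand-in for real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X -= X.mean(axis=0)                    # center each column (mean zero, as assumed below)

Sigma = X.T @ X                        # empirical covariance matrix (up to scaling), p x p

# Eigendecomposition: eigenvectors are the principal directions (loading vectors)
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]      # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = X @ eigvecs[:, :2]            # project data onto the first two principal components
print("first loading vector:", eigvecs[:, 0])   # generally dense: all variables contribute
print("fraction of variance explained by 2 PCs:", eigvals[:2].sum() / eigvals.sum())
```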

A particular disadvantage of ordinary PCA is that the principal components are usually linear combinations of all input variables. Sparse PCA overcomes this disadvantage by finding linear combinations that contain only a few input variables.

Mathematical Formulation

Consider a data matrix, X, where each of the p columns represents an input variable, and each of the n rows represents an independent sample from the data population. We assume each column of X has mean zero; otherwise we can subtract the column-wise mean from each element of X. Let Σ = XᵀX be the empirical covariance matrix of X, which has dimension p×p. Given an integer k with 1≤k≤p, the sparse PCA problem can be formulated as maximizing the variance along a direction represented by a vector v ∈ ℝᵖ while constraining its cardinality:

\[
\max_{v \in \mathbb{R}^{p}} \; v^{T}\Sigma v \quad \text{subject to} \quad \|v\|_{2} = 1, \;\; \|v\|_{0} \le k \tag{Eq. 1}
\]

The first constraint specifies that v is a unit vector. In the second constraint, ‖v‖₀ represents the L0 norm of v, which is defined as the number of its non-zero components. So the second constraint specifies that the number of non-zero components in v is less than or equal to k, which is typically an integer much smaller than the dimension p. The optimal value of Eq. 1 is known as the k-sparse largest eigenvalue.

If we take k=p, the problem reduces to ordinary PCA, and the optimal value becomes the largest eigenvalue of the covariance matrix Σ.
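
For small p, the definition in Eq. 1 can be illustrated directly by brute force: for every support set of size k, the best unit vector supported there attains the largest eigenvalue of the corresponding k×k principal submatrix of Σ. The following sketch (a toy illustration in NumPy, not a practical algorithm, since the number of supports grows combinatorially) computes the k-sparse largest eigenvalue this way:

```python
import numpy as np
from itertools import combinations

def k_sparse_largest_eigenvalue(Sigma, k):
    """Exact k-sparse largest eigenvalue (Eq. 1) by enumerating all supports.

    Only feasible for small p: for a fixed support S with |S| = k, the best
    unit vector supported on S attains the largest eigenvalue of the k x k
    principal submatrix Sigma[S, S].
    """
    p = Sigma.shape[0]
    best_val, best_v = -np.inf, None
    for S in combinations(range(p), k):
        idx = list(S)
        vals, vecs = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
        if vals[-1] > best_val:                 # eigh returns eigenvalues in ascending order
            best_val = vals[-1]
            best_v = np.zeros(p)
            best_v[idx] = vecs[:, -1]           # embed the k-dimensional eigenvector
    return best_val, best_v

# Example on a random positive semidefinite matrix
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T
print("2-sparse largest eigenvalue:", k_sparse_largest_eigenvalue(Sigma, 2)[0])
print("k = p recovers the ordinary largest eigenvalue:",
      k_sparse_largest_eigenvalue(Sigma, 5)[0])
```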

After finding the optimal solution v, we deflate Σ to obtain a new matrix

\[
\Sigma_{1} = \Sigma - (v^{T}\Sigma v)\, v v^{T},
\]

and iterate this process to obtain further principal components. However, unlike PCA, sparse PCA cannot guarantee that different principal components are orthogonal. In order to achieve orthogonality, additional constraints must be enforced.
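
A sketch of this deflation loop is given below, assuming some routine sparse_pc(Sigma, k) that returns a unit-norm, k-sparse solution of Eq. 1 for a given covariance matrix (for instance the brute-force function sketched above); the name sparse_pc is a placeholder for illustration, not a standard library function:

```python
import numpy as np

def sparse_pca_by_deflation(Sigma, k, n_components, sparse_pc):
    """Extract several sparse principal components by repeated deflation.

    sparse_pc(Sigma, k) is any routine returning a unit vector v with at most
    k non-zero entries that (approximately) solves Eq. 1 for the matrix Sigma.
    Unlike ordinary PCA, the returned components need not be orthogonal.
    """
    components = []
    for _ in range(n_components):
        v = sparse_pc(Sigma, k)
        components.append(v)
        # Deflation step: remove the variance explained along the direction v
        Sigma = Sigma - (v @ Sigma @ v) * np.outer(v, v)
    return np.column_stack(components)
```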

Because of the cardinality constraint, the maximization problem is hard to solve exactly, especially when the dimension p is high. In fact, the problem in Eq. 1 is NP-hard.[1] Several alternative approaches have been proposed, including

  • a regression framework,[2]
  • a convex relaxation/semidefinite programming framework,[3]
  • a generalized power method framework,[4]
  • an alternating maximization framework,[5]
  • forward/backward greedy search and exact methods using branch-and-bound techniques,[6]
  • a Bayesian formulation framework.[7]

Semidefinite Programming Relaxation

It has been proposed that sparse PCA can be approximated by semidefinite programming (SDP).[3] Let V = vvᵀ be a p×p symmetric matrix; then we can rewrite the sparse PCA problem as

\[
\begin{aligned}
\max_{V} \quad & \operatorname{Tr}(\Sigma V) \\
\text{subject to} \quad & \operatorname{Tr}(V) = 1 \\
& \|V\|_{0} \le k^{2} \\
& \operatorname{Rank}(V) = 1, \ V \succeq 0
\end{aligned} \tag{Eq. 2}
\]

Tr is the matrix trace, and ‖V‖₀ represents the number of non-zero elements in the matrix V. The last line specifies that V has matrix rank one and is positive semidefinite, which means that V = vvᵀ for some unit vector v, so Eq. 2 is equivalent to Eq. 1. If we drop the rank constraint and relax the cardinality constraint by a 1-norm convex constraint, we get a semidefinite programming relaxation, which can be solved efficiently in polynomial time:

\[
\begin{aligned}
\max_{V} \quad & \operatorname{Tr}(\Sigma V) \\
\text{subject to} \quad & \operatorname{Tr}(V) = 1 \\
& \mathbf{1}^{T} |V| \mathbf{1} \le k \\
& V \succeq 0
\end{aligned} \tag{Eq. 3}
\]

In the second constraint, 1 denotes the p×1 vector of ones, and |V| is the matrix whose elements are the absolute values of the elements of V.

Unfortunately, the optimal solution to the relaxed problem Eq. 3 is not guaranteed to have rank one. In that case, the solution V can be truncated to retain only its dominant eigenvector.
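
A sketch of the relaxation Eq. 3 is given below, written with the CVXPY modelling library (an assumption about tooling; any SDP solver available to CVXPY would do), followed by the truncation step described above:

```python
import numpy as np
import cvxpy as cp

def sparse_pca_sdp(Sigma, k):
    """Approximate sparse PCA via the semidefinite relaxation in Eq. 3."""
    p = Sigma.shape[0]
    V = cp.Variable((p, p), PSD=True)                 # V is symmetric positive semidefinite
    constraints = [cp.trace(V) == 1,                  # Tr(V) = 1
                   cp.sum(cp.abs(V)) <= k]            # 1^T |V| 1 <= k
    cp.Problem(cp.Maximize(cp.trace(Sigma @ V)), constraints).solve()
    # The optimal V may have rank greater than one; truncate to its dominant eigenvector
    eigvals, eigvecs = np.linalg.eigh(V.value)
    return eigvecs[:, -1]
```

Since the relaxation does not enforce exact sparsity, the returned vector is often further thresholded to keep only its k largest entries in absolute value.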

Applications

Financial Data Analysis

Suppose ordinary PCA is applied to a dataset where each input variable represents a different asset; it may generate principal components that are weighted combinations of all the assets. In contrast, sparse PCA produces principal components that are weighted combinations of only a few input assets, so their meaning is easy to interpret. Furthermore, if one uses a trading strategy based on these principal components, fewer assets imply lower transaction costs.

Biology

Consider a dataset where each input variable corresponds to a specific gene. Sparse PCA can produce a principal component that involves only a few genes, so researchers can focus on these specific genes for further analysis.

Sparse PCA in High Dimension

Contemporary datasets often have the number of input variables (p) comparable with or even much larger than the number of samples (n). It has been shown that if p/n does not converge to zero, classical PCA is not consistent. In other words, if we let k=p in Eq. 1, then the optimal value does not converge to the largest eigenvalue of the data population when the sample size n→∞, and the optimal solution does not converge to the direction of maximum variance. But sparse PCA can retain consistency even if p≫n.[8]
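
A small simulation sketch of this phenomenon (under a single-spike covariance model chosen here purely for illustration): when p grows proportionally with n, the leading eigenvector of the sample covariance remains at a noticeable angle from the true direction of maximum variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def alignment_with_spike(n, p, theta=2.0, k=10):
    """|<v_hat, v>| where v_hat is the leading sample eigenvector and the
    data follow the spiked model N(0, I_p + theta * v v^T)."""
    v = np.zeros(p)
    v[:k] = 1 / np.sqrt(k)                                  # k-sparse spike direction
    X = rng.normal(size=(n, p)) + np.sqrt(theta) * rng.normal(size=(n, 1)) * v
    Sigma_hat = X.T @ X / n                                 # sample covariance
    v_hat = np.linalg.eigh(Sigma_hat)[1][:, -1]             # leading sample eigenvector
    return abs(v_hat @ v)

print("p much smaller than n:", alignment_with_spike(n=2000, p=50))    # close to 1
print("p comparable to n:    ", alignment_with_spike(n=2000, p=2000))  # bounded away from 1
```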

An Example of Hypothesis Testing in High Dimension using Sparse PCA

More specifically, the k-sparse largest eigenvalue (the optimal value of Eq. 1) can be used to discriminate an isometric model, where every direction has the same variance, from a spiked covariance model in a high-dimensional setting.[1] Consider a hypothesis test where the null hypothesis specifies that data are generated from a multivariate normal distribution with mean 0 and covariance equal to the identity matrix, and the alternative hypothesis specifies that data are generated from a spiked model with signal strength θ:

\[
H_{0}: x \sim \mathcal{N}(0, I_{p}), \qquad H_{1}: x \sim \mathcal{N}(0, I_{p} + \theta v v^{T}),
\]

where v ∈ ℝᵖ has only k non-zero coordinates. The largest k-sparse eigenvalue can discriminate the two hypotheses if and only if θ ≥ Θ(√(k log(p)/n)). Note that the k-sparse largest eigenvalue of the population covariance equals 1 under H₀ and 1 + θ under H₁.

Since computing the k-sparse largest eigenvalue is NP-hard, one can approximate it by the optimal value of the semidefinite programming relaxation (Eq. 3). In that case, we can discriminate the two hypotheses if θ ≥ Θ(√(k² log(p)/n)). The additional √k factor cannot be improved by any other polynomial-time algorithm if a conjecture for the hidden clique problem holds.
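
As a toy illustration of this test (reusing the brute-force k-sparse largest eigenvalue as the test statistic, which is only feasible for small p; at realistic sizes the SDP value from Eq. 3 would be used instead), the statistic is visibly larger under the spiked alternative than under the isotropic null:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

def k_sparse_lambda_max(Sigma, k):
    """Brute-force k-sparse largest eigenvalue, used here as the test statistic."""
    p = Sigma.shape[0]
    return max(np.linalg.eigvalsh(Sigma[np.ix_(list(S), list(S))])[-1]
               for S in combinations(range(p), k))

n, p, k, theta = 200, 12, 3, 1.5
v = np.zeros(p)
v[:k] = 1 / np.sqrt(k)                                  # k-sparse spike direction

X_null = rng.normal(size=(n, p))                                                 # H0: N(0, I_p)
X_alt = rng.normal(size=(n, p)) + np.sqrt(theta) * rng.normal(size=(n, 1)) * v   # H1: spiked

for name, X in [("H0", X_null), ("H1", X_alt)]:
    stat = k_sparse_lambda_max(X.T @ X / n, k)
    print(name, "k-sparse largest eigenvalue of the sample covariance:", round(stat, 3))
```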

References

  1. Quentin Berthet, Philippe Rigollet (2013). "Optimal Detection of Sparse Principal Components in High Dimension". The Annals of Statistics, Vol. 41, No. 1, 1780–1815.
  2. H. Zou, T. Hastie and R. Tibshirani (2006). "Sparse Principal Component Analysis". Journal of Computational and Graphical Statistics, 15(2): 262–286.
  3. Alexandre d'Aspremont, Laurent El Ghaoui, Michael I. Jordan, Gert R. G. Lanckriet (2007). "A Direct Formulation for Sparse PCA Using Semidefinite Programming". SIAM Review, 49(3): 434–448.
  4. Michel Journee, Yurii Nesterov, Peter Richtarik, Rodolphe Sepulchre (2008). "Generalized Power Method for Sparse Principal Component Analysis". CORE Discussion Paper 2008/70; Journal of Machine Learning Research, 11 (2010): 517–553. arXiv:0811.4724.
  5. Peter Richtarik, Martin Takac and S. Damla Ahipasaoglu (2012). "Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes". arXiv:1212.4137.
  6. Baback Moghaddam, Yair Weiss, Shai Avidan (2005). "Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms". Advances in Neural Information Processing Systems (NIPS), MIT Press.
  7. Yue Guan, Jennifer Dy (2009). "Sparse Probabilistic Principal Component Analysis". Journal of Machine Learning Research Workshop and Conference Proceedings, 5 (AISTATS 2009).
  8. Iain M. Johnstone, Arthur Yu Lu (2009). "On Consistency and Sparsity for Principal Components Analysis in High Dimensions". Journal of the American Statistical Association, 104(486).