
Principal Components Analysis / 主成分分析

1. Purpose and Interpretations of Principal Components Analysis (PCA)

The goal of PCA is to construct a set of weights (called principal components) from the covariance of a set of correlated variables (here, electrodes) so that the components together explain all of the variance in the data, subject to two constraints: (1) the components are uncorrelated with each other, and (2) they are ordered so that the first principal component explains as much variance as possible, the second explains as much of the residual variance as possible while being orthogonal to the first, and so on, for as many components as there are electrodes (variables).
PCA can be used to create a set of spatial filters in which the weight for each electrode is defined by patterns of interelectrode temporal covariance. PCA therefore highlights features of the data that might be difficult to identify in the spatially unfiltered data, because each component is a weighted combination of all electrodes. PCA can also be interpreted as a data reduction technique, under the assumption that components accounting for a relatively large proportion of variance reflect true signal, whereas components accounting for relatively little variance reflect noise.
Another way to explain PCA is that it rotates the axes of an N-dimensional space (where N is the number of variables, i.e., electrodes) such that the variance along each successive dimension is maximized and the axes are orthogonal to each other. For two-dimensional (2-D) data, picture a tilted elliptical cloud of points: the first principal component points along the long axis of the ellipse and the second along the short axis, as the sketch below illustrates.
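To make the rotation picture concrete, here is a minimal Matlab sketch (not from the original text; the data and all variable names are illustrative) that simulates correlated 2-D data, computes its covariance, and plots the two principal axes:

```matlab
% Simulate correlated 2-D data and find its principal axes.
rng(1);                              % for reproducibility
n = 1000;                            % number of observations
x = randn(n,1);                      % first variable
y = 0.8*x + 0.4*randn(n,1);          % second variable, correlated with the first
X = [x y]';                          % 2 variables by n observations

Xc = bsxfun(@minus, X, mean(X,2));   % mean-center each variable
C  = (Xc*Xc') / (n-1);               % 2-by-2 covariance matrix

[V,D] = eig(C);                      % eigenvectors = principal axes
[evals, sidx] = sort(diag(D), 'descend');
V = V(:,sidx);                       % first column = first principal component

% Plot the point cloud with the two (orthogonal) principal axes overlaid
figure; plot(Xc(1,:), Xc(2,:), '.'); hold on; axis equal
quiver([0 0], [0 0], V(1,:).*sqrt(evals)', V(2,:).*sqrt(evals)', 2, 'r', 'LineWidth', 2)
```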

2. Differences between PCA and Other Methods

Difference between the Surface Laplacian and PCA

Whereas the surface Laplacian explicitly attenuates low-spatial-frequency activity and therefore highlights local features of the data, PCA highlights global spatial features of the data by identifying patterns of large-scale covariance. Local topographical features contribute relatively little to the covariance matrix and are thus unlikely to contribute much to the first few principal components, which are typically the ones analyzed.

Difference between Independent Component Analysis (ICA) and PCA

PCA is used to decorrelate and reduce the dimensionality of a multivariate signal; ICA is used to demix independent sources that are embedded in multivariate signals. Thus, PCA decorrelates and ICA demixes.
There are other important differences between PCA and ICA. PCA is computed using only second-order statistics (variances), whereas ICA also uses higher-order statistics such as skew and kurtosis. ICA uses iterative methods based on minimizing cost functions (typically via mutual information) to adjust weights, whereas PCA does not iterate to adjust weights except during some rotation methods. PCA components are constrained to be orthogonal to each other (except in some methods of rotation), whereas ICA does not assume that sources are orthogonal to each other.
ICA is a standard and widely used technique in EEG data analyses. It is used to clean data prior to analyses (as discussed in section 8.1 of Cohen 2014), and it is also used directly in analyses, by analyzing component time courses instead of electrode time courses.

3. How PCA Is Computed

The first step in computing a PCA is to construct a covariance matrix:

$$\mathbf{C} \;=\; \frac{(\mathbf{X}-\bar{\mathbf{X}})\,(\mathbf{X}-\bar{\mathbf{X}})^{\mathsf{T}}}{n-1}$$

in which $\mathbf{X}$ is an electrodes-by-time-points matrix, $\bar{\mathbf{X}}$ is the mean signal of each electrode over time, and $n$ is the number of time points. Dividing by $n$ instead of $n-1$ can introduce a bias in the covariance at small $n$ (e.g., fewer than 50 observations); for larger $n$, the bias is negligible. EEG covariance matrices are often computed from hundreds or thousands of observations, so whether $n$ or $n-1$ is used is unlikely to have a noticeable impact on the PCA results.
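As a sketch of this step in Matlab (simulated data; all names are illustrative assumptions, not code from the original text):

```matlab
% Build an electrodes-by-electrodes covariance matrix from EEG-like data.
nElectrodes = 64;
nTimepoints = 640;
data = randn(nElectrodes, nTimepoints);              % stand-in for real EEG data

dataCentered = bsxfun(@minus, data, mean(data,2));   % remove each electrode's mean over time
covmat = (dataCentered * dataCentered') / (nTimepoints-1);  % covariance matrix
```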
Once the covariance matrix is computed, the next step is to perform an eigendecomposition, for example via the Matlab function eig, which for symmetric matrices returns eigenvalues and eigenvectors in ascending order (so the last eigenvector corresponds to the component that explains the most variance, and the order is typically flipped). Thinking back to the 2-D example above, the eigenvectors are the new rotated axes, and the eigenvalues give the variance along (loosely, the lengths of) those axes. In the parlance of PCA, the eigenvectors are the principal components: each column of the eigenvector matrix is one component, and each row holds the weights for one electrode. The eigenvalues can be converted to percentage variance accounted for by dividing each eigenvalue by the sum of all eigenvalues and multiplying by 100. This conversion provides an easily interpretable metric and has the added benefit of putting all eigenvalues on the same scale, making them comparable across conditions, time windows, subjects, and so on, regardless of the scale of the original data.
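Continuing the sketch above (again, the variable names are illustrative):

```matlab
% Eigendecomposition and conversion of eigenvalues to percent variance.
covmat = cov(randn(640,64));          % stand-in covariance (cov treats columns as variables)
[pc, eigvals] = eig(covmat);          % eig returns eigenvalues in ascending order
eigvals = diag(eigvals);

pc      = pc(:, end:-1:1);            % flip so the first component explains the most variance
eigvals = eigvals(end:-1:1);

pctVar = 100 * eigvals / sum(eigvals);            % percent variance per component
figure; plot(pctVar,'-o'); xlabel('Component'); ylabel('% variance accounted for')
```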
The electrode weights for each component can be plotted as topographical maps, and the time course of each component can be obtained by multiplying the weights by the electrode time series data. Note that this means each component has one associated time course (in effect, it acts like a single virtual electrode), and that time course is a weighted sum of the activity of all electrodes.
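A sketch of computing a component time course (simulated data; illustrative names):

```matlab
% Component time course = component weights applied to the electrode data.
data   = randn(64, 640);              % electrodes by time points (stand-in)
[pc,~] = eig(cov(data'));             % eigenvectors of the covariance matrix
pc     = pc(:, end:-1:1);             % first column = strongest component

compTimeCourse = pc(:,1)' * data;     % 1-by-time weighted sum over all electrodes
```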

4. Distinguishing Significant from Nonsignificant Components

When considering a plot of the percentage variance accounted for by each component, you will notice that only the first several components account for “a lot” of variance, and most of the later components account for relatively little. Often the first several components are considered signal and the later components are considered noise. Keep in mind, however, that principal components are defined by the strength of interelectrode covariance. Weak interelectrode covariance therefore produces components with small eigenvalues even if those components contain meaningful signal; in other words, spatially local features of the data can be meaningful yet be obscured by spatially broad features. Labeling components as “signal” or “noise” based only on their eigenvalues should thus be done cautiously.
Distinguishing significant from nonsignificant components involves determining a percentage variance threshold and considering components that explain variance above that threshold to be significant.
There are several ways to determine the threshold. One way is to compute the percentage of explained variance that would be expected from each component if all electrodes were uncorrelated with each other; in this case, each component would explain 100 × 1/M percent of the variance, where M is the number of electrodes (and thus components). Another way is permutation testing, whereby the data are randomly shuffled, a PCA is computed on the shuffled data, and the amount of variance explained in the shuffled data, averaged over many repetitions, is taken as the threshold. The advantage of permutation-based thresholding over the analytic method is that the distributions and characteristics of the data are retained. Sketches of both methods follow.
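Here is a minimal Matlab sketch of both thresholds (simulated data; summarizing each permutation by its largest component is one reasonable choice, not prescribed by the original text):

```matlab
% Two thresholds for 'significant' percent variance per component.
nElectrodes = 64;  nTimepoints = 640;  nPerms = 1000;
data = randn(nElectrodes, nTimepoints);          % stand-in for real EEG

% Method 1: analytic threshold if all electrodes were uncorrelated
analyticThresh = 100 * 1/nElectrodes;            % percent variance per component

% Method 2: permutation-based threshold
permVar = zeros(nPerms,1);
for permi = 1:nPerms
    shuffled = data;
    for chani = 1:nElectrodes                    % shuffle each electrode's time series
        shuffled(chani,:) = data(chani, randperm(nTimepoints));
    end
    ev = eig(cov(shuffled'));                    % eigenvalues of the shuffled covariance
    permVar(permi) = 100 * max(ev) / sum(ev);    % largest percent variance under the null
end
permThresh = mean(permVar);                      % averaged over repetitions, per the text
```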

Other methods for determining whether a component is significant include examining changes in the slope of the eigenvalue function or visually inspecting the plot of variance accounted for by each PC.

5. Rotating PCA Solutions

Rotation involves selecting a number of significant components and applying iterative algorithms that adjust the weights according to some criterion; varimax, for example, maximizes the variance of the squared weights so that each rotated component loads strongly on relatively few electrodes. There are several methods for rotating principal components, and they make different assumptions about whether and how components can be correlated. Note that a rotated PCA solution changes if you rotate a different number of components, whereas the unrotated solution is always the same.
To rotate PCA solutions, you can try the ERP PCA Toolkit for Matlab (Dien 2010), which is designed specifically for handling EEG/ERP data.
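If you have the Statistics and Machine Learning Toolbox, its rotatefactors function provides a varimax rotation; a sketch (simulated data, illustrative names):

```matlab
% Varimax rotation of the first few principal components.
covmat = cov(randn(640,64));          % stand-in covariance matrix
[pc,~] = eig(covmat);
pc = pc(:, end:-1:1);                 % descending variance order

nKeep = 5;                            % how many components to rotate (a choice)
rotatedPC = rotatefactors(pc(:,1:nKeep), 'Method', 'varimax');
% Rotating a different number of components yields a different solution.
```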

6. Time-Resolved PCA

It may seem inappropriate to include all time points in the covariance matrix, particularly buffer zones (included to absorb edge artifacts) or intertrial intervals. Furthermore, different time periods within a trial might be associated with different neuroelectric configurations and thus different covariance patterns. For these reasons, PCA can be performed over successive time windows.

If you perform a time-resolved PCA, you will need to decide how much time to include when computing each covariance matrix. There is a trade-off between temporal specificity (shorter windows help isolate temporal dynamics, particularly brief ones) and signal-to-noise ratio (longer windows produce more stable estimates of covariance). In many cases, a window of around 400 ms is a reasonable compromise.
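A sketch of a time-resolved PCA loop (simulated data; the sampling rate and window length are illustrative assumptions):

```matlab
% PCA computed over successive time windows.
srate  = 256;                          % sampling rate in Hz
data   = randn(64, 10*srate);          % electrodes by time points (stand-in)
winLen = round(0.4 * srate);           % ~400 ms window, per the text

nWins  = floor(size(data,2) / winLen);
topVar = zeros(nWins,1);               % percent variance of the first PC per window
for wini = 1:nWins
    idx = (wini-1)*winLen + (1:winLen);
    ev  = eig(cov(data(:,idx)'));      % eigendecomposition for this window
    topVar(wini) = 100 * max(ev) / sum(ev);
end
figure; plot(topVar,'-o'); xlabel('Time window'); ylabel('% variance of first PC')
```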

7. PCA with Time-Frequency Information

PCA can be combined with temporal bandpass filtering to highlight frequency-band-specific spatial features. This can be done by first bandpass filtering the time-domain signal (either via FIR filtering or via the real part of the result of wavelet convolution) and then following the same procedures as already outlined above.
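A sketch of band-limited PCA (simulated data; requires the Signal Processing Toolbox; the 8–12 Hz band and filter order are illustrative choices):

```matlab
% Bandpass filter the data, then compute PCA on the filtered signals.
srate = 256;
data  = randn(64, 10*srate);                     % electrodes by time points (stand-in)

b = fir1(round(3*srate/8), [8 12]/(srate/2));    % FIR bandpass, ~3 cycles at 8 Hz
filtdata = filtfilt(b, 1, data')';               % zero-phase filter each electrode

covmat = cov(filtdata');                         % covariance of the band-limited data
[pc, eigvals] = eig(covmat);                     % then proceed as in the broadband case
```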

8. PCA across Conditions

PCA can be used to test condition differences, in part because it serves as a data reduction technique: rather than comparing electrode-level activity across conditions, you can compare PC-defined activity across conditions.
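One hedged way to set this up (simulated data; defining the components from the pooled covariance and then applying the same weights to each condition is one reasonable choice, not prescribed by the original text):

```matlab
% Compare the first-PC time course across two conditions.
dataA = randn(64, 640);  dataB = randn(64, 640); % two conditions (stand-ins)

covAll = cov([dataA dataB]');                    % covariance pooled over conditions
[pc,~] = eig(covAll);
pc = pc(:, end:-1:1);                            % descending variance order

tcA = pc(:,1)' * dataA;                          % first-PC time course, condition A
tcB = pc(:,1)' * dataB;                          % first-PC time course, condition B
% Condition differences can then be tested on tcA vs. tcB
% instead of on 64 separate electrode time courses.
```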

Link:
Brief Introduction to PCA in Chinese, Tengjun Liu.

References:
Cohen, Mike X. Analyzing Neural Time Series Data: Theory and Practice. MIT Press, 2014.
Dien, Joseph. "The ERP PCA Toolkit: An open source program for advanced statistical analysis of event-related potential data." Journal of Neuroscience Methods, 2010.