What are the proper steps for preprocessing my waveforms before performing an independent component analysis (ICA)? I understand the how (though further explanation of that doesn't hurt), but I'm more interested in the why.
Answer
Independent component analysis (ICA) is used to separate a linear mixture of statistically independent and, most importantly, non-Gaussian† components into its constituents. The standard model for noise-free ICA is
$$\mathbf{x}=\mathbf{As}$$
where $\mathbf{x}$ is the observation or data vector, $\mathbf{s}$ is the vector of source signals/original components (non-Gaussian) and $\mathbf{A}$ is a transformation matrix that defines the linear mixing of the constituent signals. Typically, $\mathbf{A}$ and $\mathbf{s}$ are unknown.
Pre-processing
There are two main pre-processing strategies in ICA, namely centering and whitening/sphering. The primary reasons for pre-processing are:
- Simplification of algorithms
- Reduction of the dimensionality of the problem
- Reduction of the number of parameters to be estimated
- Highlighting of features of the data set not readily explained by the mean and covariance
From the introduction of G. Li and J. Zhang, "Sphering and its properties", Sankhyā: The Indian Journal of Statistics, Vol. 60, Series A, Part I, pp. 119-133, 1998:
Outliers, clusters or other kind of groups, and concentrations near curves or non-flat surfaces are probably the important features that interest data analysts. They are, in general, not obtainable through mere knowledge of the sample mean and covariance matrix. In these circumstances, it is desirable to separate off the information contained in the mean and the covariance matrices and forces us to examine aspects of our data sets other than those well-understood natures. Centering and sphering is a simple and intuitive approach that eliminates the mean-covariance information and helps to highlight structures beyond linear correlation and elliptic shapes, and therefore is often performed before exploring displays or analyses of data sets.
1. Centering:
Centering is a very simple operation and simply refers to subtracting the mean $\mathbb{E}\{\mathbf{x}\}$. In practice, you use the sample mean and create a new vector $\mathbf{x}_c=\mathbf{x}-\overline{\mathbf{x}}$, where $\overline{\mathbf{x}}$ is the mean of the data. Geometrically, subtracting the mean is equivalent to translating the center of coordinates to the origin. The mean can always be re-added to the result at the end (this is possible because matrix multiplication is distributive).
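As a minimal sketch in Mathematica (the data here are just hypothetical Gaussian samples for illustration):
x = RandomVariate[NormalDistribution[3, 1], {1000, 2}]; (* 1000 points with nonzero mean *)
xc = x - ConstantArray[Mean[x], Length[x]];             (* subtract the column means *)
Mean[xc]                                                (* ≈ {0, 0} *)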
2. Whitening:
Whitening is a transformation that converts the data such that it has an identity covariance matrix, i.e., the whitened vector satisfies $\mathbb{E}\{\mathbf{x}_w\mathbf{x}_w^T\}=\mathbf{I}$. Normally, you work with the sample covariance matrix,
$$\widehat{\mathbf{\Sigma}}=C\,\mathbf{x}_c\mathbf{x}_c^T$$
where $C$ is just my lazy placeholder for the appropriate normalization factor (depending on the dimensions of $\mathbf{x}$). A new whitened vector is created as
$$\mathbf{x}_w=\widehat{\mathbf{\Sigma}}^{-1/2}\mathbf{x}_c$$
which will have a covariance of $\mathbf{I}$. Geometrically, whitening is a scaling transformation (a rescaling of the data along its principal axes). Here is a small example in Mathematica:
s = RandomReal[{-1, 1}, {2000, 2}]; (* 2000 samples of two independent uniform sources *)
A = {{2, 3}, {4, 2}};               (* mixing matrix *)
x = s.A;                            (* observed mixtures *)
(* the inverse Cholesky factor of the sample covariance acts as a whitening matrix *)
whiteningMatrix = Inverse@CholeskyDecomposition[Transpose[x].x/Length[x]];
y = x.whiteningMatrix;              (* whitened data *)
GraphicsRow[ListPlot[#, AspectRatio -> 1, Frame -> True] & /@ {s, x, y}]
The first plot shows the joint density of the two uniformly distributed components $\mathbf{s}$. The second shows the effect of multiplying by the transformation matrix $\mathbf{A}$: the square gets skewed and scaled into a parallelogram. Multiplying by the whitening matrix brings the joint density back to a square, slightly rotated from the original.
Because of the whitening transformation, the mixing matrix in the new system being solved, i.e., $\mathbf{x}_w=\mathbf{A}_w\mathbf{s}_w$, is an orthogonal matrix. This can be shown easily:
$$ \begin{align} \mathbb{E}\{\mathbf{x}_w\mathbf{x}_w^T\}&=\mathbb{E}\{\mathbf{A}_w\mathbf{s}_w(\mathbf{A}_w\mathbf{s}_w)^T\}\\ &=\mathbf{A}_w\mathbb{E}\{\mathbf{s}_w\mathbf{s}_w^T\}\mathbf{A}_w^T\\ &=\mathbf{A}_w\mathbf{A}_w^T=\mathbf{I} \end{align} $$
where the last step follows because the whitened sources are statistically independent with unit variance, so $\mathbb{E}\{\mathbf{s}_w\mathbf{s}_w^T\}=\mathbf{I}$. The orthogonality condition means that only about half as many parameters need to be estimated: an orthogonal $n\times n$ matrix has $n(n-1)/2$ free parameters instead of $n^2$. (Note: although $\mathbf{A}$ is square in this case and in my example, it need not be square to begin with.)
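Continuing the example above (reusing s and A), this can be checked numerically. The uniform sources on $[-1,1]$ have variance $1/3$, so they are first rescaled to unit variance; aw is just my name here for the effective mixing matrix of the whitened system:
sUnit = Sqrt[3] s;                 (* rescale: Uniform[-1, 1] has variance 1/3 *)
xUnit = sUnit.A;
wUnit = Inverse@CholeskyDecomposition[Transpose[xUnit].xUnit/Length[xUnit]];
aw = A.wUnit;                      (* effective mixing matrix in the row convention y = s.aw *)
Transpose[aw].aw                   (* ≈ IdentityMatrix[2], up to sampling error *)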
If some eigenvalues of the covariance matrix are close to zero, the corresponding dimensions carry essentially no signal; they can be safely discarded during whitening, as they are just noise and will only hamper the estimation due to "overlearning".
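Here is a sketch of eigenvalue-based whitening with this kind of dimension reduction; the data, the 2-into-3 mixing and the threshold 10^-6 are all arbitrary choices for illustration:
s3 = RandomReal[{-1, 1}, {2000, 2}];
A3 = {{2, 3, 1}, {4, 2, 6}};       (* two sources mixed into three observed channels *)
xc = s3.A3;                        (* rank-2 data in 3 dimensions: one eigenvalue will be ~0 *)
{vals, vecs} = Eigensystem[Transpose[xc].xc/Length[xc]];
keep = Flatten@Position[vals, v_ /; v > 10^-6];  (* discard near-zero eigenvalues *)
xw = xc.Transpose[vecs[[keep]]].DiagonalMatrix[1/Sqrt[vals[[keep]]]];
Dimensions[xw]                     (* {2000, 2}: the noise-only dimension has been dropped *)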
3. Other pre-processing
There might be other pre-processing steps involved in specific applications, too many to cover in a single answer. For example, I've seen a few articles that use the log of the time series and a few others that filter the time series; a couple of such steps are sketched below. While these might be suited to the particular application/conditions, the results don't carry over to all fields.
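Purely as hypothetical illustrations of such steps (whether they help depends entirely on the application):
ts = RandomReal[{1, 10}, 500];       (* some positive-valued time series *)
tsLog = Log[ts];                     (* log transform *)
tsFiltered = LowpassFilter[ts, 0.2]; (* low-pass filtering at an arbitrary cutoff *)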
†ICA can be used provided that at most one of the components is Gaussian; this is the standard identifiability condition (see, e.g., P. Comon, "Independent component analysis, a new concept?", Signal Processing, Vol. 36, No. 3, pp. 287-314, 1994).
Why is it called "sphering"?
This is probably well known, but just as a fun fact: "sphering" comes from the change in the covariance structure in the case of Gaussian components, from an $n$-dimensional hyperellipsoid to an $n$-dimensional sphere, due to whitening. Here's an example (use the same code as above, but replace {-1, 1} with NormalDistribution[]):
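For concreteness, that one change gives:
s = RandomReal[NormalDistribution[], {2000, 2}]; (* two independent standard Gaussians *)
A = {{2, 3}, {4, 2}};
x = s.A;
whiteningMatrix = Inverse@CholeskyDecomposition[Transpose[x].x/Length[x]];
y = x.whiteningMatrix;
GraphicsRow[ListPlot[#, AspectRatio -> 1, Frame -> True] & /@ {s, x, y}]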
The first plot is the joint density of two uncorrelated Gaussians, the second shows it after the mixing transformation, and the third after whitening. In practice, only the second and third are visible to you, since the sources themselves are never observed directly.