Submitted by eiliya_20 t3_103641x in MachineLearning
comradeswitch t1_j33wz4y wrote
Why is X being converted into a diagonal matrix here? Can you show some sample code? It's difficult to tell what exactly is happening here.
But I can say this: if the covariance matrix estimate is positive semidefinite (psd from here on) and its inverse exists, the inverse is psd as well. In fact, if the inverse exists, both the matrix and its inverse are positive definite (pd). But if you have a psd matrix that is not pd, you have several options.
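Since you asked for sample code, here's a quick NumPy sketch of how you might check which of those cases you're in. The helper name, the 1e-10 tolerance, and the toy data are just my choices for illustration:

```python
import numpy as np

def classify_definiteness(a, tol=1e-10):
    """Rough check of whether a symmetric matrix is pd, psd-but-singular, or neither."""
    eigvals = np.linalg.eigvalsh(a)  # eigenvalues of the symmetric matrix
    if np.all(eigvals > tol):
        return "positive definite (inverse exists and is pd too)"
    if np.all(eigvals > -tol):
        return "psd but singular (no ordinary inverse)"
    return "not psd (something is off with the estimate)"

# Rank-deficient covariance estimate: fewer samples (3) than dimensions (5)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))          # columns are samples
cov = x @ x.T / x.shape[1]           # 5x5 but rank <= 3
print(classify_definiteness(cov))    # psd but singular (no ordinary inverse)
```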
First, the Moore-Penrose pseudoinverse is a generalization of the matrix inverse that always exists, even for nonsquare matrices; it shows up in least-squares estimation of over- and under-determined linear systems. Using the pseudoinverse of the covariance matrix is equivalent to assuming that your normal distribution is restricted to the lower-dimensional subspace spanned by the observed data rather than the whole space. It can be computed in roughly the same time as an ordinary inverse, and if you have an eigendecomposition (equivalently, an SVD) of the covariance A = V S V^T, the pseudoinverse is A^+ = V S^+ V^T, where S^+ has the reciprocals of the nonzero singular values on the diagonal and leaves the zero singular values at zero. This approach has the benefit of giving an "inverse" with the same rank as the covariance matrix: if the covariance is low rank, the data (columns of X) can be transformed via Y = S^{+/2} V^T X, where S^{+/2}_{ii} = 1/sqrt(s_ii) for nonzero s_ii and zero otherwise. Then Y^T Y = X^T A^+ X, which makes the dot product (and Euclidean distance) between columns of Y equal to the Mahalanobis inner product (and distance) between the corresponding columns of X... and if the covariance matrix is low rank, this also reduces the number of dimensions in the data. Unfortunately, the pseudoinverse is not a very good estimator of the precision matrix (the inverse of the covariance) on mathematical-statistics grounds, and it is prone to numerical instability when small singular values are inverted.
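Here's a rough NumPy sketch of that pseudoinverse/whitening idea, assuming columns of X are the samples; the 1e-10 cutoff and variable names are arbitrary choices on my part:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                 # columns are samples: 5 dims, 3 samples
x = x - x.mean(axis=1, keepdims=True)       # center -> covariance has rank <= 2
cov = x @ x.T / x.shape[1]

# Pseudoinverse from the eigendecomposition cov = V S V^T
s, v = np.linalg.eigh(cov)
nonzero = s > 1e-10                          # treat anything below this cutoff as zero
s_pinv = np.zeros_like(s)
s_pinv[nonzero] = 1.0 / s[nonzero]           # reciprocals of the nonzero eigenvalues only
cov_pinv = (v * s_pinv) @ v.T                # V S^+ V^T
print(np.allclose(cov_pinv, np.linalg.pinv(cov)))   # True

# Low-rank "whitening": Y = S^{+1/2} V^T X
y = (np.sqrt(s_pinv)[:, None] * v.T) @ x
# Euclidean inner products of columns of Y == Mahalanobis inner products of columns of X
print(np.allclose(y.T @ y, x.T @ cov_pinv @ x))      # True
```

The last check is exactly the Mahalanobis-via-Euclidean equivalence described above, and Y only has as many informative rows as the rank of the covariance.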
As others have said, you can also add a small multiple of the identity to the covariance estimate before inverting, i.e. use (A + cI)^{-1}. This has more desirable estimator performance, and it corresponds to an L2 penalty on the estimate; it is also significantly more robust and better behaved numerically. Adding cI adds c to every singular value of the covariance matrix, which keeps very small, unreliable singular values from blowing up when their reciprocals are taken. Unfortunately, if the covariance matrix is low rank, this regularized estimate will be full rank, which can be problematic or unjustified, and significantly more difficult to work with in cases where the rank is much lower than the dimension.
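And a minimal sketch of that ridge-style version (c = 1e-3 is an arbitrary choice here, not a recommendation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
x = x - x.mean(axis=1, keepdims=True)
cov = x @ x.T / x.shape[1]               # singular, so np.linalg.inv(cov) is hopeless

c = 1e-3                                  # regularization strength, a tuning choice
prec_ridge = np.linalg.inv(cov + c * np.eye(cov.shape[0]))

# The shift acts on the spectrum: eigenvalue s of cov becomes 1/(s + c) in the inverse,
# so near-zero eigenvalues end up around 1/c instead of blowing up
s = np.linalg.eigvalsh(cov)
print(np.allclose(np.sort(np.linalg.eigvalsh(prec_ridge)), np.sort(1.0 / (s + c))))  # True

# Downside: the regularized precision is full rank even though cov is not
print(np.linalg.matrix_rank(cov), np.linalg.matrix_rank(prec_ridge))  # 2 5
```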
The third option is actually a hybrid of the previous two: it combines the low rank of the pseudoinverse approach with regularization. Taking A to be the covariance estimate again, compute (A^2 + cI)^{-1} A. The singular values of this matrix are s/(s^2 + c) for each singular value s of A, which is quite similar to the second approach, whose singular values are 1/(s + c) for every s... including the zeros. However, while (A^2 + cI) is full rank, the product will not be full rank if A is not: if Av = 0, then v^T (A^2 + cI)^{-1} A = (1/c) v^T A = 0. This retains the low-rank benefits while still regularizing. You can derive it by minimizing ||AX - I||^2 + c||X||^2 over X and setting the gradient to zero. Conveniently, in the limit as c approaches 0, this matrix becomes the pseudoinverse (and if A is invertible, the pseudoinverse and the inverse coincide).
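A rough NumPy sketch of that hybrid estimator, again with arbitrary toy data and an arbitrary c:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
x = x - x.mean(axis=1, keepdims=True)
a = x @ x.T / x.shape[1]                 # psd covariance estimate, rank <= 2

c = 1e-3
# Hybrid estimator (A^2 + cI)^{-1} A, the minimizer of ||AX - I||^2 + c||X||^2
hybrid = np.linalg.solve(a @ a + c * np.eye(a.shape[0]), a)

# Spectrum: each eigenvalue s of A becomes s / (s^2 + c), so zeros stay zero
s = np.linalg.eigvalsh(a)
print(np.allclose(np.sort(np.linalg.eigvalsh(hybrid)),
                  np.sort(s / (s ** 2 + c))))                              # True

# A null direction of A stays (numerically) a null direction of the hybrid,
# whereas the ridge inverse sends it to roughly v / c
v_null = np.linalg.eigh(a)[1][:, 0]      # eigenvector of the (near-)zero smallest eigenvalue
print(np.allclose(hybrid @ v_null, 0.0))                                   # True
print(np.allclose(np.linalg.inv(a + c * np.eye(a.shape[0])) @ v_null, v_null / c))  # True
# As c -> 0, hybrid approaches np.linalg.pinv(a); if A is invertible, that's just inv(A)
```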