Submitted by antodima t3_10oxy9j in MachineLearning

Hi all!

Given inputs X ∈ ℝ^(Nx×T) and targets Y ∈ ℝ^(Ny×T) (T samples as columns), and β ∈ ℝ^(+), the ridge regression solution is

W = YX^(T)(XX^(T)+βI)^(-1) (the inverse can be computed with the Moore–Penrose pseudoinverse)

where A = YX^(T) and B = XX^(T)+βI, so W = AB^(-1).
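
A minimal NumPy sketch of this closed form (the sizes and β below are just placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Ny, T, beta = 5, 2, 100, 1e-2          # placeholder dimensions
X = rng.standard_normal((Nx, T))            # inputs, one sample per column
Y = rng.standard_normal((Ny, T))            # targets

A = Y @ X.T                                 # Ny x Nx
B = X @ X.T + beta * np.eye(Nx)             # Nx x Nx, full rank thanks to beta*I
W = A @ np.linalg.inv(B)                    # ridge solution, Ny x Nx
```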

Now suppose we select an arbitrary subset of indices/units (fewer than Nx), so that we keep only the corresponding columns of A and the corresponding rows and columns ("crosses") of B; the rest of A and B are set to zero.

Does sparsifying A and B in this way break the ridge regression solution W = AB^(-1)? If yes, are there ways to avoid it?

Many thanks!


Comments


thevillagersid t1_j6kvatk wrote

Are you asking about the feasibility of ridge regression with sparse inputs, or about regularization to enforce a sparse solution?


antodima OP t1_j6m5aqd wrote

Basically it's the feasibility of ridge regression with sparse inputs, but I want to select a partial set of units of W by acting on A and B. For instance, if A is 2×5 and B is 5×5 and I choose units 2 and 4, then columns [0, 1, 3] of A are zeros and the rows and columns of B with indices [0, 1, 3] are also zero. I select units 2 and 4 with some importance mechanism. The question is: is there a way of obtaining a W* from the filtered A and B that is similar to the W computed without filtering A and B?

I ask because filtering A and B breaks the inversion, and hence the computation of W. I don't know if there is some way of decomposing B so that it can still be inverted, or something like that.
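
Here is a small sketch of what I mean, with made-up data (sizes as in the example above): zeroing the crosses of B leaves zero rows, so B can no longer be inverted:

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Ny, T, beta = 5, 2, 100, 1e-2
X = rng.standard_normal((Nx, T))
Y = rng.standard_normal((Ny, T))

A = Y @ X.T                                   # 2 x 5
B = X @ X.T + beta * np.eye(Nx)               # 5 x 5

keep = np.array([2, 4])                       # units chosen by the importance mechanism
drop = np.setdiff1d(np.arange(Nx), keep)      # [0, 1, 3]

A_f = A.copy(); A_f[:, drop] = 0.0                         # zero columns of A
B_f = B.copy(); B_f[drop, :] = 0.0; B_f[:, drop] = 0.0     # zero crosses of B

print(np.linalg.matrix_rank(B_f))             # 2, not 5: B_f is singular
# np.linalg.inv(B_f)                          # would raise LinAlgError
```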

Anyway thanks for your interest!


thevillagersid t1_j6nfjle wrote

You can still compute the estimator with sparse inputs because the regularization term ensures the denominator is full rank. If the zeros are standing in for missing values, however, your estimates will be biased.

As for your second question, W* computed from only columns 2 and 4 will only yield the same values as W in the unrestricted model if the columns of X are orthogonal. Could you work with an orthogonal transform (e.g. PCA projection) of the X matrix?
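
A rough numerical illustration of this point (made-up data; `idx`, `Z`, and the helper below are just placeholder names): the restricted solution matches the corresponding columns of the full W once the unit directions of X are orthogonal, i.e. once XX^(T) is diagonal, e.g. after an SVD/PCA-style projection:

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Ny, T, beta = 5, 2, 100, 1e-2
X = rng.standard_normal((Nx, T))               # units as rows, samples as columns
Y = rng.standard_normal((Ny, T))
idx = [2, 4]                                   # the selected units

def ridge(X, Y, beta):
    A = Y @ X.T
    B = X @ X.T + beta * np.eye(X.shape[0])
    return A, B, A @ np.linalg.inv(B)

# raw X: XX^T is not diagonal, so the restricted W* differs from the full W
A, B, W = ridge(X, Y, beta)
W_star = A[:, idx] @ np.linalg.inv(B[np.ix_(idx, idx)])
print(np.allclose(W_star, W[:, idx]))          # generally False

# project onto the left singular vectors (a PCA-style transform): ZZ^T is diagonal
U, _, _ = np.linalg.svd(X, full_matrices=False)
Z = U.T @ X
A_z, B_z, W_z = ridge(Z, Y, beta)
W_z_star = A_z[:, idx] @ np.linalg.inv(B_z[np.ix_(idx, idx)])
print(np.allclose(W_z_star, W_z[:, idx]))      # True: the crosses of B_z decouple
```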
