suflaj t1_j43eanz wrote

Well, that depends on what you mean by usefulness.

If you can establish that all of your samples come from the same distribution, then simply ranking them by gradient norm gives you a measure of how useful each one is to the model. Another approach is to look at how much a sample's contribution improves performance on the other samples, but then your dataset becomes a dependent variable.

But obviously this depends on the current weights, the loss function and various other biases. Gradient norm is proportional to the error, so the samples on which the model makes the most erroneous predictions end up being the most useful, assuming an appropriately tuned learning rate.
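Here's a minimal sketch of what ranking samples by per-sample gradient norm could look like. The model, loss, data, and the helper name `per_sample_grad_norms` are all illustrative assumptions on top of the comment, using PyTorch; it's not a prescription, just the heuristic spelled out:

```python
import torch
import torch.nn as nn

def per_sample_grad_norms(model, loss_fn, inputs, targets):
    """Gradient L2 norm of the loss w.r.t. the parameters,
    computed separately for each sample (at the current weights)."""
    norms = []
    for x, y in zip(inputs, targets):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        total = torch.sqrt(sum((p.grad ** 2).sum()
                               for p in model.parameters()
                               if p.grad is not None))
        norms.append(total.item())
    return norms

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(10, 2)          # toy classifier (assumed)
    loss_fn = nn.CrossEntropyLoss()
    X = torch.randn(32, 10)           # 32 synthetic samples
    y = torch.randint(0, 2, (32,))
    norms = per_sample_grad_norms(model, loss_fn, X, y)
    # Under this heuristic, samples with the largest gradient norm
    # are ranked as "most useful" for the current weights.
    ranking = sorted(range(len(norms)), key=lambda i: -norms[i])
    print(ranking[:5])
```

Note the ranking is a function of the current weights, so it would have to be recomputed as training progresses.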
