suflaj t1_iuya2f9 wrote

It can be seen as an approximation of the variance between the true noise and the noise the model predicts when conditioned on some data.
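To make that concrete, here is a minimal sketch that assumes the loss in question is a plain mean squared error between the sampled noise and the predicted noise. The thread does not state the exact loss, so both the function name `noise_mse` and the choice of MSE are assumptions for illustration only.

```python
def noise_mse(true_noise, predicted_noise):
    """Mean squared error between sampled noise and the model's predicted noise.

    Hypothetical helper: assumes the training loss is a plain MSE,
    which the thread itself does not confirm.
    """
    if len(true_noise) != len(predicted_noise):
        raise ValueError("both sequences must have the same length")
    if not true_noise:
        raise ValueError("sequences must not be empty")
    # Average of squared element-wise differences.
    squared_errors = [(e - p) ** 2 for e, p in zip(true_noise, predicted_noise)]
    return sum(squared_errors) / len(squared_errors)
```

A low value here only says the predicted noise is close to the sampled noise on average. It does not say anything by itself about the quality of what the model generates.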

If it's computed on the training set, it is not even usable as a metric, and if it is not directly related to task performance, it is not a good metric. You want to see how the model behaves on unseen data.

1

tivotox t1_iuyb2oh wrote

The split was done so that the train and test sets are highly different. The losses are almost equal on both datasets.

1

suflaj t1_iuybshu wrote

That seems very bad. You want your train, dev, and test sets to be different samples from the same distribution, not very different sets.

Furthermore, if you're using the test set for model validation, you will have no dataset left for the final evaluation of your model. Reconsider your process.
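A rough sketch of what I mean by a split, assuming the samples can be treated as independent and shuffled freely (that may not hold for your data, e.g. if near-duplicates need to be grouped). The name `split_dataset` and the 80/10/10 fractions are just illustrative choices, not something from your setup.

```python
import random


def split_dataset(samples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then cut into train/dev/test partitions.

    Because all three parts come from one shuffled pool, they are
    different samples of the same distribution. Hypothetical helper;
    assumes samples are independent of each other.
    """
    if dev_frac < 0 or test_frac < 0 or dev_frac + test_frac >= 1:
        raise ValueError("dev_frac and test_frac must be >= 0 and sum to < 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_frac)
    n_dev = int(len(shuffled) * dev_frac)
    test = shuffled[:n_test]                 # touched only once, at the very end
    dev = shuffled[n_test:n_test + n_dev]    # used for model selection and tuning
    train = shuffled[n_test + n_dev:]        # used for fitting
    return train, dev, test


train, dev, test = split_dataset(range(100))
```

Tune and pick models on `dev`, and keep `test` for the single final evaluation.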

Finally, again, I urge you to evaluate your model on an established evaluation metric for the task, not the loss you use to train the model. What is the exact task?

2

[deleted] OP t1_iuyf897 wrote

[deleted]

1

suflaj t1_iuyg2am wrote

Well I couldn't understand what your task was when you didn't say what it was until now.

Other than that, from skimming through the paper, it quite clearly says the following:

> Our present results do not indicate our procedure can generalize to motifs that are not present in the training set

Because what they're doing doesn't generalize, I think the starting assumption (that there will be improvements with a larger model) is wrong, and so the question is unnecessary. The issue is with the method or the data; they do not elaborate more than that.

2