Submitted by newperson77777777 t3_109hcyw in MachineLearning

I am working on a medical ML project, and my advisor would prefer not to publish our dataset. I would like to publish our results at a top-tier ML conference. Would this hurt us during the review process? If so, are there ways to mitigate this, such as also including results on separate, publicly available datasets?

Just to note, not publishing the research dataset seems much more common in medical publication venues.


Comments


[deleted] t1_j3yaz0i wrote

You may not publish your dataset, but you should:

  1. benchmark on a public dataset

  2. benchmark other approaches on your private dataset


chatterbox272 t1_j40jame wrote

Not publishing the dataset is becoming less common as we slowly inch our way toward reproducible science. Public code with public data is the simplest form of reproducible research: we can re-run your experiments with the same code and should get the same result (modulo some extremely low-level randomness or hardware differences that we may not be able to control).

That alone isn't enough to kill a paper, but it doesn't help. As another commenter said, showing your approach on public datasets and other approaches on your dataset will help, as it gives the rest of the community something that is reproducible.

It's more common in medical venues for a few reasons:

  1. Difficulties around safely releasing medical data, such as proper anonymisation and informed consent.
  2. It is more common in medical science to aim for a higher level of reproducibility, where the same or a similar study is repeated on a different population (i.e. same method, different data). This is pretty uncommon in ML; it's hard to get papers accepted in this format.

Insighteous t1_j430uuc wrote

Publishing everything is a good thing. At the moment I am trying to reproduce some results from a paper and have to work with "we created X datasets by three methods". And NOWHERE in the paper does it state what these three methods are. There's also no code.

It is so annoying. I can't put it into words.
