Submitted by DreamyPen t3_zsbivc in MachineLearning
I have collected experimental data under various conditions. To ensure repeatability, each test is replicated 5 times: the same input, but slightly different outputs due to experimental variability.
If you were to build a machine learning model, would you use all 5 data points for each test, hoping the algorithm learns to converge toward the mean response? Or is it advisable to pre-compute the means and feed only those to the model (so that each input maps to exactly one output)?
I can see pros and cons to both approaches and would welcome feedback. Thank you.
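For concreteness, here is a minimal sketch of the two dataset constructions I'm weighing; the array names, shapes, and random data are purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 test conditions, 3 input features, 5 replicates each.
n_tests, n_reps, n_features = 20, 5, 3
X_conditions = rng.normal(size=(n_tests, n_features))
y_replicates = rng.normal(size=(n_tests, n_reps))  # noisy outputs per condition

# Option A: keep all replicates -- repeat each input once per replicate,
# so the same input row appears 5 times with 5 different targets.
X_all = np.repeat(X_conditions, n_reps, axis=0)  # shape (100, 3)
y_all = y_replicates.reshape(-1)                 # shape (100,), row-major order matches the repeat

# Option B: pre-average -- one (input, mean output) pair per condition.
X_mean = X_conditions                            # shape (20, 3)
y_mean = y_replicates.mean(axis=1)               # shape (20,)
```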
dimsycamore t1_j178fqp wrote
I would recommend using all of the replicates. The model should learn the expectation, sans any mean-zero noise that varies between them. I'm basing this on a hand-wavy interpretation of some results from the original noise2noise paper and on more recent work in SSL. You could even treat each replicate as an "augmentation" of your ground-truth mean and use SSL principles to enforce consistency between the replicates.
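To make that second idea concrete, here is a rough sketch of what a consistency term could look like for regression. Since identical inputs give a deterministic network identical predictions for every replicate, the stochastic "views" in this sketch come from dropout (Pi-model style) rather than from the replicates themselves; the architecture, shapes, and loss weighting are all hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical model: dropout provides stochastic "views" of the same input.
model = nn.Sequential(
    nn.Linear(3, 32), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(32, 1)
)

def loss_fn(x, y_reps, consistency_weight=0.1):
    # x: (batch, 3) inputs; y_reps: (batch, 5) noisy replicate measurements.
    # Supervised term: fit every replicate directly. Squared error decomposes
    # around the per-condition mean, so this shares its minimizer with
    # fitting the precomputed means -- the mean-zero noise averages out.
    pred = model(x).squeeze(-1)                        # (batch,)
    supervised = ((pred.unsqueeze(1) - y_reps) ** 2).mean()

    # Consistency term: two stochastic forward passes on the same input
    # (different dropout masks) should agree.
    pred_2 = model(x).squeeze(-1)
    consistency = ((pred - pred_2) ** 2).mean()

    return supervised + consistency_weight * consistency

# Usage with random stand-in data:
x, y_reps = torch.randn(8, 3), torch.randn(8, 5)
loss_fn(x, y_reps).backward()
```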