Submitted by xylont t3_zlm587 in MachineLearning

The standard method is to normalize the entire dataset (using statistics computed on the training split) and then train the model on it. However, I've noticed that with this approach the model doesn't really work well when dealing with values outside the range it was trained on.

So how about normalizing each sample to a fixed range, say 0 to 1, and then sending them in?

Of course, the test data and the values to predict on would also be normalized in the same way.

Would it change the neural network for the better or worse?
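For concreteness, here is a minimal sketch of the per-sample scheme being proposed, scaling each row to [0, 1] with its own min and max rather than statistics fitted on the whole training set (the function name and the epsilon guard are my own choices):

```python
import numpy as np

def per_sample_minmax(x):
    # scale each row to [0, 1] using that row's own min and max
    mn = x.min(axis=1, keepdims=True)
    mx = x.max(axis=1, keepdims=True)
    return (x - mn) / (mx - mn + 1e-12)  # epsilon guards constant rows

X = np.array([[1.0, 5.0, 9.0],
              [100.0, 200.0, 300.0]])
X01 = per_sample_minmax(X)  # every row now spans roughly [0, 1]
```

Note that both rows end up spanning the same [0, 1] range regardless of their original magnitudes, which is exactly the property being debated below.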

0

Comments


robot_lives_matter t1_j062iq1 wrote

I normalise each sample almost always now. Seems to work great for me.

0

UnnAmmEdd t1_j0677tu wrote

>So how about normalizing each sample between a fixed range, say 0 to 1 and then sending them in.

How would you normalize each sample?

3

killver t1_j069atv wrote

This is already done in computer vision most of the time by just dividing the pixels by 255. You can also do actual per-sample normalization by, say, dividing by the maximum value of the sample.

But as always, there is no free lunch. Just try all the options and see what works better for your problem.
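The two variants mentioned above can be sketched like this (toy data, my own variable names); note that /255 is a fixed, dataset-independent scaling, while dividing by the sample max depends on each sample:

```python
import numpy as np

# computer-vision case: uint8 pixels scaled by the fixed constant 255
img = np.array([[0, 128, 255]], dtype=np.uint8)
img01 = img.astype(np.float32) / 255.0

# per-sample normalization: divide each sample by its own maximum
x = np.array([3.0, 6.0, 12.0])
x_scaled = x / np.abs(x).max()  # -> values in [0, 1] for this sample
```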

8

barvazduck t1_j06dikt wrote

I also like to add another value, like a scale feature that hints at the pre-normalized size. That way, in an OCR model, a handwritten period and a zero won't look the same (also relevant for other tasks).

1

ShadowPirate42 t1_j06glhn wrote

I assume you mean normalizing along axis 1. In most cases this is a bad idea. Think about a house price predictor. You have square footage and the number of bathrooms. If you normalize along axis 1, the number of bathrooms will be 0.0003 and the square footage might be 0.6. You're still dealing with different scales, and you might as well not normalize at all. You would be better off normalizing along axis 0 and then capping the upper and lower ends, e.g. converting any value above 1 to 1 and any value below 0 to 0.
Edit: alternatively, if your data has a lot of outliers, you may want to clip prior to normalization:
xtrain = xtrain.apply(lambda col: col.clip(*col.quantile([min_clip, max_clip]).values))
or just use standardization:
https://dataakkadian.medium.com/standardization-vs-normalization-da7a3a308c64#:~:text=In%20statistics%2C%20Standardization%20is%20the,range%20between%200%20and%201.
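A minimal sketch of the standardization option on the house-price example from this comment (toy numbers, column names my own): each column gets zero mean and unit variance along axis 0, so both features end up on comparable scales.

```python
import pandas as pd

# toy tabular data with wildly different feature scales
xtrain = pd.DataFrame({"sqft":  [800.0, 1200.0, 3000.0],
                       "baths": [1.0,   2.0,    4.0]})

# standardization: zero mean, unit variance, computed per column (axis 0)
xstd = (xtrain - xtrain.mean()) / xtrain.std()
```

At prediction time you would reuse the training-set mean and std, not recompute them on the new data.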

0

mr_birrd t1_j06s0mg wrote

Well, uint8 goes up to 255, so that's where that value comes from. Images often arrive in that format, but ReLUs and other activations hate it, so it's better to bring the values into a 0-1 range. Btw, min-max just subtracts the sample's min and then divides by the max (of the shifted values). I don't see the problem.

Edit: Also think about why we do BatchNormalization

−1

UnnAmmEdd t1_j06vfhf wrote

Okay, nowhere is it written that we are working with images. If we are, then of course dividing by 255 isn't wrong; it's usually done when casting uint8 to float.

But if we don't assume the input is an image (it may be a token embedding in NLP, or a row if we work with tabular data), then the input values may come from (-inf, +inf), so we need min/max to put boundaries on that interval.

2

Internal-Diet-514 t1_j07w3r6 wrote

Depends on the range of min and max values across the other samples in the dataset. For instance, if one of your samples ranges from 0-12 and most others range from 0-64, you would lose the fact that 12 was actually a pretty low value compared to other observations, since it would be mapped to 1 for that sample.
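The information loss described here is easy to demonstrate (toy numbers of my own): after per-sample min-max scaling, the two samples become indistinguishable even though their raw magnitudes differ by a factor of about five.

```python
import numpy as np

a = np.array([0.0, 6.0, 12.0])   # sample whose values top out at 12
b = np.array([0.0, 32.0, 64.0])  # typical sample topping out at 64

# per-sample min-max maps both maxima to 1.0, so downstream layers can
# no longer tell that 12 was small relative to the rest of the dataset
a01 = (a - a.min()) / (a.max() - a.min())
b01 = (b - b.min()) / (b.max() - b.min())
```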

1

SnooDogs3089 t1_j08qrh1 wrote

Batch norm is the way. Do not touch anything before feeding the network.

1

SnooDogs3089 t1_j0bn5w3 wrote

Because the NN will "undo" it, or do it "if needed", anyway, assuming a reasonably big NN. Batch norm makes a lot of sense, but inside the layers. I don't know how you're planning to use your NN, but normalizing means the deployed model needs an additional preprocessing step that in most cases isn't necessary; it only costs resources and adds a possible source of errors for end users. Moreover, you have to be very careful about how future data gets normalized. To sum up: at best it's only slightly useful, and in the worst case it's very dangerous.
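For reference, this is roughly what a BatchNorm layer computes internally at train time, sketched in plain numpy (the learnable scale/shift parameters and running statistics that a real BatchNorm layer also keeps are omitted here):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # normalize each feature (column) over the batch dimension,
    # as a BatchNorm layer does with its current mini-batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

batch = np.array([[800.0, 1.0],
                  [1200.0, 2.0],
                  [3000.0, 4.0]])  # raw, unnormalized inputs
normed = batch_norm(batch)         # each column now ~zero mean, unit var
```

Because this sits inside the network, the deployed model consumes raw inputs directly, which is the point being made above.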

1