Submitted by netw0rkf10w t3_10rtis6 in MachineLearning

For ImageNet classification, there are two common ways of normalizing the input images (both sketched in code below):

- Normalize to [-1, 1] using an affine transformation (2*(x/255) - 1).

- Normalize using ImageNet mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225).
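Concretely, both schemes are one-liners in PyTorch (a minimal sketch; the random x stands in for a raw uint8 image):

import torch

# Stand-in for a raw RGB image, shape (C, H, W), values in [0, 255].
x = torch.randint(0, 256, (3, 224, 224)).float()

# 1) Affine map to [-1, 1]:
x1 = 2 * (x / 255) - 1

# 2) Per-channel ImageNet statistics, applied after scaling to [0, 1]:
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
x2 = (x / 255 - mean) / std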

I observe that the first one is more common in TensorFlow codebases (including Jax models with TensorFlow data processing, e.g. the official Vision Transformers code), whereas the second is ubiquitous in PyTorch codebases.

I tried to find empirical comparisons of the two, but there don't seem to be any.

Which one is better in your opinion? I guess the performance shouldn't differ much, but it would still be interesting to hear about your experience.

2

Comments


melgor89 t1_j6xufba wrote

From my experience, they are equal nowadays, especially since we now use BatchNorm or LayerNorm everywhere. Those layers also normalize with a mean and std, which makes it largely irrelevant which input scheme you use. Given that, I prefer the TensorFlow approach, as it's the simpler one.
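To illustrate the point, a minimal sketch (my own, under simplifying assumptions): with a bias-free, unpadded conv followed by BatchNorm in training mode, an affine remapping of the input is normalized away almost exactly:

import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 16, kernel_size=3, bias=False)  # no bias, no padding
bn = nn.BatchNorm2d(16)  # train mode: normalizes with batch statistics

x = torch.rand(8, 3, 32, 32)       # a batch already scaled to [0, 1]
y1 = bn(conv(x))                   # the [0, 1] convention
y2 = bn(conv(2 * x - 1))           # the [-1, 1] convention

# The affine input change is absorbed by BatchNorm; any residual
# difference comes only from BatchNorm's eps.
print(torch.allclose(y1, y2, atol=1e-3))  # True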

3

puppet_pals t1_j6ygho0 wrote

ImageNet normalization is an artifact of the era of feature engineering. In the modern era you shouldn’t use it. It’s unintuitive and overfits the research dataset.

1

MadScientist-1214 t1_j6yj0v6 wrote

Some models actually just use [0, 1] normalization (divide by 255). Some normalization is necessary, but [0, 1] is enough. On real-world datasets, computing the dataset-specific mean/std never gave me better results.
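For comparison, a minimal sketch of computing such dataset-specific statistics (the random `images` tensor is a stand-in for a real dataset already scaled to [0, 1]):

import torch

images = torch.rand(1000, 3, 32, 32)  # stand-in for a dataset in [0, 1]
mean = images.mean(dim=(0, 2, 3))     # per-channel mean over N, H, W
std = images.std(dim=(0, 2, 3))       # per-channel std
normalized = (images - mean[:, None, None]) / std[:, None, None]
# For a large dataset, accumulate per-channel sums and squared sums
# batch by batch instead of loading everything into memory.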

1

netw0rkf10w OP t1_j6zb957 wrote

If I remember correctly, it was first used in AlexNet, which started the deep learning era, though. I agree that it doesn't make much sense nowadays, but it's still used everywhere :\

1

puppet_pals t1_j701uqt wrote

>I think normalization will be here to stay (maybe not the ImageNet one though), as it usually speeds up training.

The reality is that you are tied to the normalization scheme of whatever you are transfer learning from (assuming you are transfer learning). Framework authors and people publishing weights should make normalization as easy as possible, typically via a 1/255.0 rescaling operation (or x/127.5 - 1; I'm indifferent, though I opt for 1/255 personally).
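In Keras, for instance, that rescaling can be baked into the model itself so published weights ship with their preprocessing attached (a sketch; the backbone is elided):

import tensorflow as tf

# Users feed raw pixel values; the model owns its own normalization.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),  # or Rescaling(1 / 127.5, offset=-1)
    # ... backbone layers would go here ...
])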

1

CyberDainz t1_j715ayh wrote

use trainable normalization

import torch
import torch.nn as nn

# In the module's __init__: learnable per-channel shift/scale, initialized
# to the identity (shift 0, scale 1). nn.Parameter sets requires_grad=True
# by default.
self._in_beta = nn.Parameter(torch.zeros(in_ch))
self._in_gamma = nn.Parameter(torch.ones(in_ch))
...
self._out_gamma = nn.Parameter(torch.ones(out_ch))
self._out_beta = nn.Parameter(torch.zeros(out_ch))

...

# In forward: broadcast the (C,) parameters over an NCHW tensor,
# shifting/scaling on the way in and scaling/shifting on the way out.
x = x + self._in_beta[None, :, None, None]
x = x * self._in_gamma[None, :, None, None]
...
x = x * self._out_gamma[None, :, None, None]
x = x + self._out_beta[None, :, None, None]

1