Submitted by MichelMED10 t3_ysah21 in MachineLearning

Hey,

In timm's implementation of stochastic depth (https://github.com/rwightman/pytorch-image-models/blob/main/timm/models/layers/drop.py), the random tensor is divided by the probability of keeping the block. I don't understand why this is done, especially since it isn't mentioned in the paper.

Can anyone explain this to me, please?

Thanks!

The code :

    def drop_path(x, drop_prob: float = 0., training: bool = False, scale_by_keep: bool = True):
        if drop_prob == 0. or not training:
            return x  # identity at inference time or when nothing is dropped
        keep_prob = 1 - drop_prob
        # One Bernoulli draw per sample, broadcast over all remaining dims
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        random_tensor = x.new_empty(shape).bernoulli_(keep_prob)
        if keep_prob > 0.0 and scale_by_keep:
            random_tensor.div_(keep_prob)  # rescale survivors by 1/keep_prob
        return x * random_tensor
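To see what the division does, here is a minimal pure-Python sketch (not from the thread, no PyTorch needed) of the same per-sample masking applied to a scalar. It shows that scaling survivors by 1/keep_prob preserves the expected value of the output, which is the same trick as inverted dropout; without the scaling, the mean output shrinks toward keep_prob.

```python
import random

def drop_path_scalar(x, drop_prob=0.2, scale_by_keep=True):
    # Keep the value with probability keep_prob, zero it otherwise,
    # mirroring the one-mask-entry-per-sample logic in drop_path above.
    keep_prob = 1.0 - drop_prob
    mask = 1.0 if random.random() < keep_prob else 0.0
    if keep_prob > 0.0 and scale_by_keep:
        mask /= keep_prob  # survivors are scaled up by 1/keep_prob
    return x * mask

random.seed(0)
n = 100_000
mean_scaled = sum(drop_path_scalar(1.0, scale_by_keep=True) for _ in range(n)) / n
mean_unscaled = sum(drop_path_scalar(1.0, scale_by_keep=False) for _ in range(n)) / n
# mean_scaled stays near the input value 1.0 (expectation preserved),
# while mean_unscaled shrinks toward keep_prob = 0.8.
```

Because the expected output matches the input, no compensating rescale is needed at inference time when the block is always kept.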


Comments


killver t1_ivzoqe1 wrote

Why don't you ask in his repo?


Pretend-Economics758 t1_iw0n3l8 wrote

I guess it’s due to a normalisation idea similar to the one used in dropout to reduce overfitting?
