Submitted by Rishh3112 t3_120gvgw in deeplearning
humpeldumpel t1_jdhcgg6 wrote
Reply to comment by Rishh3112 in Cuda out of memory error by Rishh3112
Well, then it's a memory issue. Hard to say without seeing your code.
Rishh3112 OP t1_jdhd785 wrote
import torch
import torch.nn as nn
import torch.nn.functional as F

import config  # project-local config providing BATCH_SIZE


class CNN(nn.Module):
    def __init__(self, num_chars):
        super(CNN, self).__init__()
        # Convolution layers
        self.conv1 = nn.Conv2d(3, 128, kernel_size=(3, 6), padding=(1, 1))
        self.pool1 = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv2 = nn.Conv2d(128, 64, kernel_size=(3, 6), padding=(1, 1))
        self.pool2 = nn.MaxPool2d(kernel_size=(2, 2))
        # Dense layer: 768 = 64 channels * 12 (feature-map height after the
        # two conv/pool stages on a 50-pixel-high input)
        self.fc1 = nn.Linear(768, 64)
        self.dp1 = nn.Dropout(0.2)
        # Recurrent layer (a GRU, despite the attribute name).
        # batch_first=True so the recurrence runs over the width/time axis;
        # without it the GRU would treat the batch dimension as the sequence.
        self.lstm = nn.GRU(64, 32, bidirectional=True, batch_first=True)
        # Output layer: num_chars + 1 classes to include the CTC blank
        self.output = nn.Linear(64, num_chars + 1)

    def forward(self, images, targets=None):
        bs, _, _, _ = images.size()
        x = F.relu(self.conv1(images))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        # (bs, C, H, W) -> (bs, W, C, H): width becomes the sequence axis
        x = x.permute(0, 3, 1, 2)
        x = x.view(bs, x.size(1), -1)
        x = F.relu(self.fc1(x))
        x = self.dp1(x)
        x, _ = self.lstm(x)
        x = self.output(x)
        # (bs, T, C) -> (T, bs, C), the layout CTCLoss expects
        x = x.permute(1, 0, 2)
        if targets is not None:
            log_probs = F.log_softmax(x, 2)
            input_lengths = torch.full(
                size=(bs,), fill_value=log_probs.size(0), dtype=torch.int32
            )
            target_lengths = torch.full(
                size=(bs,), fill_value=targets.size(1), dtype=torch.int32
            )
            loss = nn.CTCLoss(blank=0)(
                log_probs, targets, input_lengths, target_lengths
            )
            return x, loss
        return x, None


if __name__ == '__main__':
    model = CNN(74)
    img = torch.rand(config.BATCH_SIZE, 3, 50, 200)
    target = torch.randint(1, 20, (config.BATCH_SIZE, 5))
    x, loss = model(img, target)
    print(loss)
trajo123 t1_jdhi7u8 wrote
The problem is likely in your training loop. Perhaps your computation graph keeps growing because you track the average loss as an autograd variable rather than as a plain number. Make sure that for any metrics/logging you use loss.item().
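For illustration, a minimal sketch of the loop being described; train_loader, device, and the Adam optimizer are assumptions, not from the thread:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CNN(74).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

running_loss = 0.0
for images, targets in train_loader:  # assumed DataLoader of (image, target) batches
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad()
    _, loss = model(images, targets)
    loss.backward()
    optimizer.step()
    # .item() extracts a plain Python float; accumulating the tensor itself
    # (running_loss += loss) would keep every batch's computation graph
    # alive and steadily grow GPU memory until it runs out
    running_loss += loss.item()
print(running_loss / len(train_loader))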
humpeldumpel t1_jdhpl0w wrote
And also make use of the model's training and evaluation modes (model.train() / model.eval()).
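A sketch of what that looks like, reusing the names from the loop above (val_loader is an assumption):

model.train()   # training mode: dropout active
# ... run the training loop shown above ...

model.eval()    # eval mode: dropout disabled
with torch.no_grad():  # skip building autograd graphs during validation, which also saves memory
    for images, targets in val_loader:
        _, val_loss = model(images.to(device), targets.to(device))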
Rishh3112 OP t1_jdhib79 wrote
Sure, I'll give it a try. Thanks a lot.
Rishh3112 OP t1_jdhiguj wrote
I just checked; in my training loop I'm using loss.item().
_vb__ t1_jdiwjqk wrote
Are you calling the zero_grad method on your optimizer in every step of your training loop?
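i.e. something along these lines (a sketch, reusing the names from the loop above):

for images, targets in train_loader:
    optimizer.zero_grad()  # clear gradients from the previous step;
                           # without this they accumulate across iterations
    _, loss = model(images.to(device), targets.to(device))
    loss.backward()
    optimizer.step()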