AKavun

AKavun OP t1_j3l4gb1 wrote

u/trajo123 u/FastestLearner u/trajo123

I am giving this as a general update. In my original post, I said "I am doing something very obvious wrong" and indeed I was. The reason my model did not learn at all was that the whole Python script, with the exception of my main method, was being re-executed every few seconds, which caused my model to reinitialize and reset. I believe this was caused by PyTorch's handling of the "num_workers" parameter in the DataLoader: it starts worker processes to load data in parallel, and on some platforms each worker re-imports the script, so any top-level code runs again.
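For anyone hitting the same thing, here is a minimal sketch of the usual fix, not my actual training script. The tiny linear model, the random data, and the names `build_model` and `train` are made up for illustration. The idea is that everything that creates or trains the model lives inside functions and only runs behind the `if __name__ == "__main__":` guard, so a worker process that re-imports the file does not rebuild the model.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def build_model():
    # Hypothetical stand-in for the real network.
    return nn.Linear(4, 2)


def train(num_workers=2, epochs=2):
    # Random data stands in for the real dataset (assumption for this sketch).
    dataset = TensorDataset(torch.randn(32, 4), torch.randint(0, 2, (32,)))
    loader = DataLoader(dataset, batch_size=8, num_workers=num_workers)

    model = build_model()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()

    last_loss = float("nan")
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
            last_loss = loss.item()
    return last_loss


# Worker processes that re-import this file see a different __name__,
# so they define the functions above but never call train() themselves.
if __name__ == "__main__":
    print("final batch loss:", train())
```

Setting `num_workers=0` loads data in the main process and is a quick way to check whether the workers are the cause.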

Fixing that allowed my model to learn, but it still performed poorly for the reasons all of you so generously explained in great detail. My first instinct was to switch to resnet18 and change the output layer. I also switched to cross-entropy loss, as I learned I can still apply softmax in postprocessing to obtain the prediction confidence, which I previously did not think was possible. Now my model reaches 90% accuracy on my test set, and the rest, I think, is just tweaking the hyperparameters, enlarging and augmenting the data, and maybe doing some partial training with different learning rates, etc.

However, I still want to learn how to design an architecture from scratch, so I am experimenting with that after carefully reading the answers you provided. I thank each of you so much and wish you all success in your careers. You are great people and we are a great community.

2

AKavun OP t1_irx83tz wrote

>I see. In the tutorial, for each output, a 1-dimensional Dense layer with a sigmoid activation function is used, along with binary crossentropy as the loss function. You could exchange that by an n-dimensional Dense layer with softmax activation, along with categorical crossentropy. So the basic architecture can remain similar, you just have to adapt the outputs.

I will first learn what these things mean, then I will get back to you. Thank you for your guidance.

2

AKavun OP t1_irwurmu wrote

Yeah, this is mostly similar to what I want to do but there is a difference.

In the tutorial, there are only binary attributes, like whether a celebrity is bald or not. But I want to handle multi-value attributes, like the color of the clothing, which can take many values, not just 1 or 0.
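To make the difference concrete, here is a small sketch of how I picture the two kinds of targets. The attribute names and color values are made up for illustration; they are not from the tutorial or my dataset.

```python
# Hypothetical attribute names and values, for illustration only.
BINARY_ATTRIBUTES = ["bald", "smiling", "wearing_hat"]
CLOTHING_COLORS = ["red", "green", "blue", "black", "white"]


def encode_binary(present):
    # One independent 0/1 flag per attribute, as in the tutorial.
    return [1 if name in present else 0 for name in BINARY_ATTRIBUTES]


def encode_color(color):
    # A single attribute with many mutually exclusive values: one class index.
    return CLOTHING_COLORS.index(color)


binary_target = encode_binary({"smiling"})  # [0, 1, 0]
color_target = encode_color("blue")         # 2
```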

With this in mind, is it still multilabel classification?

1