Hi,

I have been trying to draw bounding boxes around objects using an ML/NN approach.

The project uses transfer learning: a pretrained VGG16 with a regression head. I selected it because it seemed like a good architecture and was easy to implement in JavaScript, where you won't find it ready-made.

The first 16 layers use pretrained weights taken from the Keras creator's GitHub page.

I trained the output layers on Caltech101 classes: airplanes (800), faces (400), stop signs (60), etc.

It predicts reasonably well on images from the same dataset that the model has not seen before.

Yet for any new image (any picture with a face that I have on my laptop), the predictions are terrible.
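To turn "reasonably well" and "terrible" into numbers, the usual score for a predicted box is its intersection over union (IoU) with the ground-truth box; PASCAL VOC-style evaluation counts a detection as correct at IoU ≥ 0.5. A minimal sketch, assuming boxes are given as (x_min, y_min, x_max, y_max); `iou` is just an illustrative name.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap rectangle; width/height clamp to 0 when the boxes do not intersect
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, roughly 0.143
```

Averaging this over the held-out Caltech101 images and over a handful of hand-labelled laptop photos would show how large the gap actually is.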

Having run out of ideas, I am reaching out for help. I have tried:

changing the number of layers,
changing the number of units,
training some of the inner VGG16 layers.

Solution

I did it with YOLOv7-tiny using Google Colab. It took me a day to find a project that is actually usable.

This one works fine: https://github.com/WongKinYiu/yolov

The results are indeed great. I find Roboflow extremely handy: I used it to merge datasets, augment them, read tutorials, and that kind of thing.
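For reference when moving from the regression head to YOLOv7 and Roboflow exports: YOLO-style training code (as far as I know, including the YOLOv7 repo) reads one `.txt` label file per image, with one line per object in the form `class x_center y_center width height`, all normalized by the image size. A minimal sketch converting a pixel corner box (x_min, y_min, x_max, y_max); `to_yolo_label` is a hypothetical helper, not part of any of the linked projects.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert an (x_min, y_min, x_max, y_max) pixel box to one YOLO-format label line."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w  # box centre, normalized to [0, 1]
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w        # box size, normalized to [0, 1]
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


print(to_yolo_label(0, (100, 50, 300, 250), img_w=400, img_h=300))
# 0 0.500000 0.500000 0.500000 0.666667
```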

Thanks for your support! Especially to u/PaleontologistDue620.

Why didn't VGG work for this task?

PyTorch has several usable detection models, one of which is SSD-VGG, which also shows what my idea was lacking: a strategy to combine with the CNN.
Why doesn't PyTorch have YOLO? https://github.com/pytorch/vision/issues/6341

More options

Apart from SSD-VGG (source code: https://pytorch.org/vision/main/_modules/torchvision/models/detection/ssd.html), there are other models available there, plus the one I tried myself (YOLOv7: https://github.com/WongKinYiu/yolov).
A nice YOLO implementation under a permissive Apache 2.0 license (not GPL): https://github.com/Megvii-BaseDetection/YOLOX

Comments


[deleted] OP t1_j8rdy53 wrote on February 16, 2023 at 12:22 PM

#1,829,021

One possible reason (https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e): the VGG convolutional model on its own won't be good for bounding boxes, only for classification tasks.

trajo123 t1_j8rdzfd wrote on February 16, 2023 at 12:22 PM

#1,829,025

Some general things to try:

(More aggressive) data augmentation when training, to make your model behave better on data that is not in the dataset.
If by "the problem of bounding objects" you mean object detection / localization, then a single regression head on top of a classifier architecture is not a good way of solving this problem; there are specialized architectures for this, e.g. R-CNN, YOLO.
If you have to do it with the regression head, then go for at least ResNet50; it should get you better performance across the board, assuming it was pre-trained on a large dataset like ImageNet. VGG16 is quite small/weak by modern standards.
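On the augmentation point: for box regression, geometric augmentations must move the target box along with the pixels, otherwise the labels become wrong. A minimal NumPy sketch of a random horizontal flip, assuming pixel boxes (x_min, y_min, x_max, y_max) whose edges lie on pixel boundaries; `hflip_with_box` is a hypothetical helper.

```python
import numpy as np


def hflip_with_box(image, box):
    """Mirror an H x W x C image left-right and move its box along with it.

    `box` is (x_min, y_min, x_max, y_max) in pixels, with edges on pixel boundaries.
    """
    w = image.shape[1]
    flipped = image[:, ::-1]  # reverse the column (x) axis
    x_min, y_min, x_max, y_max = box
    # A point at x lands at w - x, so the old right edge becomes the new left edge
    return flipped, (w - x_max, y_min, w - x_min, y_max)


rng = np.random.default_rng(0)
image = np.zeros((224, 224, 3), dtype=np.float32)
box = (20, 30, 120, 200)
if rng.random() < 0.5:  # flip roughly half of the training samples
    image, box = hflip_with_box(image, box)
```

Crops, scaling, and color jitter follow the same rule: anything that moves pixels must apply the same transform to the box, while purely photometric changes leave it untouched.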

Why do you need to implement this in JavaScript? Wouldn't it make sense to decouple model development from deployment? Get a PyTorch or TensorFlow model working first, then worry about deployment. This way you can access a zoo of pre-trained models, at Hugging Face for instance.

[deleted] OP t1_j8re6mi wrote on February 16, 2023 at 12:24 PM

#1,829,038

Replying to trajo123 (#1,829,025)

> by "the problem of bounding objects" you mean object detection / localization then a single regression head on top a classifier architecture is not a good way of solving this problem, there

I just replied to myself with a similar point in a comment. You are correct.

Indeed, I am planning to do it in Keras, because there aren't ready-made models in TF.js and implementing them there is quite difficult.

I don't think PyTorch models can easily be used in TensorFlow.js afterwards, right?

PaleontologistDue620 t1_j8s39ou wrote on February 16, 2023 at 3:41 PM

#1,831,080

Do yourself a favor and go for good old YOLO; I think it has a TensorFlow.js version too.

erunim t1_j8s4hki wrote on February 16, 2023 at 3:49 PM

#1,831,167

Just curious: you did align the normalization methods, right?

[deleted] OP t1_j8s6viw wrote on February 16, 2023 at 4:05 PM

#1,831,368

Replying to erunim (#1,831,167)

I am not sure what you mean, but if it means normalizing both the images and the box dimensions, and keeping them in the right order, then yes.

Otherwise, maybe not.

[deleted] OP t1_j8s73tb wrote on February 16, 2023 at 4:07 PM

#1,831,389

Replying to PaleontologistDue620 (#1,831,080)

I am taking a look. But most TensorFlow.js versions are either untrainable or too large, so I will probably use Keras and export the models as Layers models, if I can manage to. I am still looking for lightweight NNs; it's a challenge, so I may ask elsewhere for candidates.

PaleontologistDue620 t1_j8s7qkn wrote on February 16, 2023 at 4:11 PM

#1,831,449

Replying to [deleted] (#1,831,389)

Train it with either Darknet or Python, then convert the weights if you need to. There are already scripts for everything you need to do; that's why I said YOLO in the first place.

[deleted] OP t1_j8s8e0x wrote on February 16, 2023 at 4:15 PM

#1,831,493

Replying to PaleontologistDue620 (#1,831,449)

Sorry for my ignorance, but what do you mean by Darknet here? All I was planning was to use TensorFlow's Keras, as I don't think TF.js will make it on its own. I started a week ago, so I am still digesting things.

PaleontologistDue620 t1_j8s9y0j wrote on February 16, 2023 at 4:25 PM

#1,831,607

Replying to [deleted] (#1,831,493)

You can train YOLO in any framework you want, convert the weights later, and then load them into your preferred inference framework (I'm not sure about JS, but in Python you can load YOLO models into OpenCV as well). Darknet is the original YOLO framework, and it gives you scripts for training the model.

[deleted] OP t1_j8sdmmp wrote on February 16, 2023 at 4:49 PM

#1,831,891

Replying to PaleontologistDue620 (#1,831,607)

Oh, that helps. I am not sure how true this is, though. For example, TF.Keras and TF.SavedModel models can't be converted into one another and have different features. Both can be used to predict, but only one can be retrained, "tweaked", or extended from JS itself. And I am not sure you can convert PyTorch weights to Keras, but I will investigate. Apparently ONNX can be used to do it. I just don't want to train something that cannot be converted and loaded into a browser.

What I have learnt so far is that sliding windows, regions of interest, and YOLO are more like ways to prepare your data, and almost any CNN could do the job with more or less precision; I may be wrong. I am following this series: https://www.youtube.com/watch?v=XXYG5ZWtjj0&list=PLhhyoLH6Ijfw0TpCTVTNk42NN08H6UvNq&index=2&ab_channel=AladdinPersson
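For context on the sliding-window idea: it turns localization into many classification problems by cropping windows across the image, scoring each crop with the classifier, and keeping high-scoring windows as boxes. A minimal sketch of the window enumeration only, with an arbitrary window size and stride; `sliding_windows` is a hypothetical helper and the classifier call is left out.

```python
import numpy as np


def sliding_windows(image, win, stride):
    """Yield (x, y, crop) for every win x win crop taken with the given stride."""
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, image[y:y + win, x:x + win]


img = np.zeros((224, 224, 3))
windows = list(sliding_windows(img, win=112, stride=56))
print(len(windows))  # 3 x 3 = 9 crops, each of which would go through the classifier
```

Even this coarse grid means 9 forward passes per image (many more with several window sizes), which is the cost that region proposals in R-CNN-style models and YOLO's single pass are designed to reduce.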

LuckyNumber-Bot t1_j8sdnn3 wrote on February 16, 2023 at 4:50 PM

#1,831,894

Replying to [deleted] (#1,831,891)

All the numbers in your comment added up to 69. Congrats!

  5
+ 6
+ 42
+ 8
+ 6
+ 2
= 69

^(Click here to have me scan all your future comments.)
^(Summon me on specific comments with u/LuckyNumber-Bot.)

StrasJam t1_j8sgfvm wrote on February 16, 2023 at 5:07 PM

#1,832,097

Replying to [deleted] (#1,831,368)

I think they mean: do you apply the same normalization to the new images you are trying to predict on as you applied to the images during training?

[deleted] OP t1_j8sop7t wrote on February 16, 2023 at 6:00 PM

#1,832,707

Replying to StrasJam (#1,832,097)

Yes: I pad them to make them square, resize them to 224 using bilinear interpolation, and divide by 255, then apply the inverse when predicting.
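A sketch of that pipeline, assuming the padding goes on the bottom/right (the thread does not say which side) and boxes are stored as (x_min, y_min, x_max, y_max). With that assumption, box coordinates normalized by the padded side are unchanged by the uniform resize, so one pair of functions serves both training and prediction. All function names are hypothetical; the bilinear resize to 224×224 (e.g. with `tf.image.resize`) would sit between `pad_to_square` and `scale_pixels` and is left out.

```python
import numpy as np


def pad_to_square(image):
    """Zero-pad an H x W x C image on the bottom/right to S x S x C, with S = max(H, W)."""
    h, w = image.shape[:2]
    side = max(h, w)
    padded = np.zeros((side, side) + image.shape[2:], dtype=image.dtype)
    padded[:h, :w] = image  # original pixels keep their coordinates
    return padded, side


def box_to_target(box, side):
    """Pixel box (x_min, y_min, x_max, y_max) -> [0, 1] coords relative to the padded square."""
    return tuple(v / side for v in box)


def target_to_box(pred, side):
    """Inverse for prediction: [0, 1] box from the model -> pixel coords in the original image."""
    return tuple(v * side for v in pred)


def scale_pixels(image):
    """Identical pixel scaling for training and prediction: [0, 255] -> [0, 1]."""
    return image.astype(np.float32) / 255.0
```

If the padding were centered instead, the offsets would have to be added and subtracted in both directions, which is an easy place for train/predict mismatches to creep in.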

tedmobsky t1_j8v9fnl wrote on February 17, 2023 at 4:45 AM

#1,839,710

Replying to LuckyNumber-Bot (#1,831,894)

Good bot

B0tRank t1_j8v9glq wrote on February 17, 2023 at 4:45 AM

#1,839,712

Replying to tedmobsky (#1,839,710)

Thank you, tedmobsky, for voting on LuckyNumber-Bot.

This bot wants to find the best and worst bots on Reddit. You can view results here.

^(Even if I don't reply to your comment, I'm still listening for votes. Check the webpage to see if your vote registered!)

tsgiannis t1_j8w2s8i wrote on February 17, 2023 at 10:46 AM

#1,841,543

Instead of rescale = 1./255, try using VGG16's own preprocess function and report back.
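One reading of this suggestion (an interpretation, not confirmed in the thread): Keras ships `tf.keras.applications.vgg16.preprocess_input` for these weights, and in its default "caffe" mode it converts RGB to BGR and subtracts per-channel ImageNet means instead of scaling to [0, 1]. Pretrained convolutional layers fed with /255 inputs therefore see values on a very different scale from what they were trained on. This NumPy sketch only reproduces that preprocessing for illustration; `vgg16_caffe_preprocess` is a hypothetical name, and in Keras code calling `preprocess_input` directly is the simpler choice.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, which Keras' VGG16 "caffe" preprocessing subtracts
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg16_caffe_preprocess(rgb):
    """RGB pixels in [0, 255] -> BGR, mean-subtracted, and NOT divided by 255."""
    bgr = np.asarray(rgb, dtype=np.float32)[..., ::-1]  # reverse channel axis: RGB -> BGR
    return bgr - VGG_MEAN_BGR


# A pure-red pixel: B and G end up below zero, R well above it
print(vgg16_caffe_preprocess([255.0, 0.0, 0.0]))
```

Whichever scheme is used, the same function has to be applied to the training images and to the new laptop photos.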

[deleted] OP t1_j8wd1cz wrote on February 17, 2023 at 12:42 PM

#1,842,259

Replying to tsgiannis (#1,841,543)

What do you mean? The values were normalized in that way.

After reading about it, I think VGG16 will never work for an object detection task. Do you disagree?

[deleted] OP t1_j8wddyy wrote on February 17, 2023 at 12:46 PM

#1,842,278

Replying to PaleontologistDue620 (#1,831,449)

Interesting that you commented this. I spent all of yesterday trying to train a YOLO model in Python, but all the implementations on GitHub are quite outdated, apart from YOLOv7.

Unless you are referring only to Ultralytics?

tsgiannis t1_j8wgwsj wrote on February 17, 2023 at 1:17 PM

#1,842,610

Replying to [deleted] (#1,842,259)

I had serious issues with the whole VGG training... this fixed it.

give it a spin.

[deleted] OP t1_j8wh57w wrote on February 17, 2023 at 1:19 PM

#1,842,631

Replying to tsgiannis (#1,842,610)

Yes, but I don't really understand what your solution is.

I can skip rescaling, but what then?

Would you mind telling me what you were using it for?

PaleontologistDue620 t1_j8wuk9b wrote on February 17, 2023 at 3:00 PM

#1,843,854

Replying to [deleted] (#1,842,278)

Go for YOLOv3 or YOLOv4. I promise they'll be good enough for you; don't be bothered by version numbers. (If you need lighter models, go for the tiny versions of v3 and v4.)

[deleted] OP t1_j8x413i wrote on February 17, 2023 at 4:03 PM

#1,844,664

Replying to PaleontologistDue620 (#1,843,854)

I did it with YOLOv7-tiny using Google Colab. My point is that there are barely any usable projects, unless you have found some?

Yes, the results were great. I am thinking of writing a little blog post for others; it is actually quite simple, because this time around I found a tutorial on Roboflow.

Thanks for your support!

[deleted] OP t1_j8zb580 wrote on February 18, 2023 at 12:56 AM

#1,851,155

Replying to PaleontologistDue620 (#1,843,854)

I have found out that VGG can indeed be used with SSD for the same task. I don't know exactly what the general idea is, but basically you can combine CNNs with something else to get the bounding box. PyTorch has an SSD-VGG model.

I wonder why PyTorch has no YOLO implementation that we can just use.
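As far as I understand the SSD paper, what SSD adds on top of the VGG16 backbone is a fixed set of default (anchor) boxes tiled over its feature maps: for each default box the network predicts class scores and box offsets, and the training labels come from matching default boxes to ground-truth boxes by IoU. That gives the model a built-in notion of "where", which a single regression head lacks. A simplified NumPy sketch of the matching step: one grid, one square box per cell, a 0.5 IoU threshold, no offset encoding, and no forced best match for each ground truth. The function names are hypothetical; the full version (several feature maps and aspect ratios) is what torchvision's `ssd300_vgg16` implements.

```python
import numpy as np


def default_boxes(fmap_size, scale):
    """One square default box per cell of an fmap_size x fmap_size grid, in [0, 1] coords.

    Returns an (N, 4) array of (x_min, y_min, x_max, y_max).
    """
    centres = (np.arange(fmap_size) + 0.5) / fmap_size
    cx, cy = np.meshgrid(centres, centres)
    cx, cy = cx.ravel(), cy.ravel()
    half = scale / 2
    return np.stack([cx - half, cy - half, cx + half, cy + half], axis=1)


def iou_matrix(boxes, gt):
    """IoU between every default box (N, 4) and every ground-truth box (M, 4) -> (N, M)."""
    x0 = np.maximum(boxes[:, None, 0], gt[None, :, 0])
    y0 = np.maximum(boxes[:, None, 1], gt[None, :, 1])
    x1 = np.minimum(boxes[:, None, 2], gt[None, :, 2])
    y1 = np.minimum(boxes[:, None, 3], gt[None, :, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_boxes[:, None] + area_gt[None, :] - inter)


def match(boxes, gt, threshold=0.5):
    """Index of the matched ground-truth box for each default box, or -1 for background."""
    ious = iou_matrix(boxes, gt)
    best = ious.argmax(axis=1)
    best[ious.max(axis=1) < threshold] = -1  # too little overlap: treat as background
    return best
```

Default boxes matched to an object are trained to predict its class and their offset to it; the rest are trained as background.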

[deleted] OP t1_j90ysce wrote on February 18, 2023 at 11:51 AM

#1,855,464

Replying to PaleontologistDue620 (#1,843,854)

Sorry to be annoying, but I thought it would be nice to give you some news as well. I was confused as to why there isn't YOLO in PyTorch; here is why: https://github.com/pytorch/vision/issues/6341

PaleontologistDue620 t1_j90z1yo wrote on February 18, 2023 at 11:54 AM

#1,855,479

Replying to [deleted] (#1,855,464)

No, you're not annoying at all, thanks for the update :)

[deleted] OP t1_ja32trh wrote on February 26, 2023 at 2:10 PM

#2,023,507

Replying to PaleontologistDue620 (#1,855,479)

Another update: I am reading the first YOLO paper:

>We also train YOLO using VGG-16. This model is more accurate but also significantly slower than YOLO. It is useful for comparison to other detection systems that rely on VGG-16 but since it is slower than real-time the rest of the paper focuses on our faster models.

This also explains my main error: using VGG16 without a clear idea of how to make it understand where the objects are, which is what they did.
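The YOLOv1 paper's answer to "where are the objects" is to split the image into an S × S grid and make the cell containing an object's centre responsible for predicting it; the paper uses S = 7, B = 2 boxes per cell and C = 20 PASCAL VOC classes, giving a 7 × 7 × 30 output. A simplified sketch of building a training target in that spirit, with one box per cell (if two centres fall in the same cell, the last one wins) and boxes given as (x_min, y_min, x_max, y_max) in [0, 1] image coordinates; `yolo_v1_target` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def yolo_v1_target(boxes, classes, S=7, num_classes=20):
    """Build a YOLOv1-style S x S x (5 + C) target from boxes in [0, 1] image coords.

    The grid cell containing each box centre becomes responsible for it and stores
    (x offset in cell, y offset in cell, width, height, objectness, one-hot class).
    """
    target = np.zeros((S, S, 5 + num_classes), dtype=np.float32)
    for (x0, y0, x1, y1), c in zip(boxes, classes):
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        col = min(int(cx * S), S - 1)  # grid column/row containing the centre
        row = min(int(cy * S), S - 1)
        target[row, col, 0] = cx * S - col  # centre position inside the cell, in [0, 1)
        target[row, col, 1] = cy * S - row
        target[row, col, 2] = x1 - x0       # width/height relative to the whole image
        target[row, col, 3] = y1 - y0
        target[row, col, 4] = 1.0           # "an object is centred in this cell"
        target[row, col, 5 + c] = 1.0
    return target
```

A plain VGG16 backbone can produce such a grid if its head is trained against targets like these, which matches the VGG-16 variant mentioned in the quote; what the single regression head was missing is this spatial assignment, not the convolutional features.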