Submitted by [deleted] t3_113o5up in deeplearning

Hi,

I have been trying to draw a bounding box around objects using a ML/NN approach.

The project uses transfer learning: a pretrained VGG16 with a regression head. I selected it because it seems a good architecture and was easy to implement in JavaScript, where you won't find a ready-made version.

The first 16 layers use pretrained weights taken from the Keras creator's GitHub page.

I have trained the output layers with Caltech101 categories: airplanes (800), faces (400), stop signs (60), etc.

It predicts reasonably well with images of the same dataset not seen before by the model.

Yet for any new image (for example, any picture with a face that I have on my laptop) the predictions are terrible.

After running out of ideas I am reaching out for some help. I have tried:

  • changing the number of layers,
  • changing the number of units,
  • training some of the VGG16 inner layers

Solution

I did it with tiny yolo v7, using google colab. Spent a day finding a project that is usable.

THIS ONE IS FINE > https://github.com/WongKinYiu/yolov

The results are indeed great. Most of the time I find Roboflow extremely handy; I used it to merge datasets, augment them, read tutorials, and that kind of thing.

Thanks for your support! Especially to u/PaleontologistDue620

Why didn't VGG work for this task?

6

Comments

trajo123 t1_j8rdzfd wrote

Some general things to try:

  • (more aggressive) data augmentation when training, to make your model behave better on data outside the dataset
  • if by "the problem of bounding objects" you mean object detection / localization, then a single regression head on top of a classifier architecture is not a good way of solving this problem. There are specialized architectures for this, e.g. R-CNN, YOLO.
  • If you have to do it with the regression head, then go for at least ResNet50. It should get you better performance across the board, assuming it was pre-trained on a large dataset like ImageNet. VGG16 is quite small/weak by modern standards.
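
A point worth spelling out about augmentation for box regression: any geometric transform applied to the image has to be applied to the box labels as well, otherwise the targets no longer match the pixels. Below is a minimal sketch of a horizontal flip, assuming boxes are stored as (x_min, y_min, x_max, y_max) in pixels and images as plain lists of pixel rows. The function names are hypothetical and not taken from any library.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def hflip_image(image: List[List[int]]) -> List[List[int]]:
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def hflip_box(box: Box, image_width: float) -> Box:
    """Mirror a box so that it still covers the same object after the flip.

    The left edge becomes the right edge and vice versa, so the new x_min
    is measured from the old x_max. The y coordinates are unchanged.
    """
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6]]
    print(hflip_image(image))
    print(hflip_box((10, 20, 30, 40), image_width=100))
```

The same rule holds for crops, rotations, and scaling: transform the image and the labels together.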

Why do you need to implement this in JavaScript? Wouldn't it make sense to decouple the model development from the deployment? Get a PyTorch or TensorFlow model working first, then worry about deployment. This way you can access a zoo of pre-trained models, at Hugging Face for instance.

2

[deleted] OP t1_j8re6mi wrote

> by "the problem of bounding objects" you mean object detection / localization then a single regression head on top a classifier architecture is not a good way of solving this problem, there

I just wrote something similar in a comment myself. You are correct.

Indeed, I am planning to do it in Keras because there aren't ready-made models in TF.js and implementing one is quite difficult.

I do not think PyTorch models can be easily used in TensorFlow.js afterwards, right?

1

PaleontologistDue620 t1_j8s39ou wrote

do yourself a favor and go for the good old yolo, i think it has a tensorflow.js version too.

5

erunim t1_j8s4hki wrote

Just curious. You did align the normalization methods right?

2

[deleted] OP t1_j8s6viw wrote

I am not sure what you mean, but if it is normalizing the images and also the box dimensions, plus having them in the right order then yes.

Otherwise maybe not
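
For reference, box normalization of the kind mentioned here can be sketched as follows, assuming boxes are stored as (x_min, y_min, x_max, y_max) in pixels. The helper names are hypothetical. The important part is that training and inference use the same coordinate order and the same scaling.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, width: float, height: float) -> Box:
    """Scale pixel coordinates to the [0, 1] range, keeping the same order."""
    x_min, y_min, x_max, y_max = box
    return (x_min / width, y_min / height, x_max / width, y_max / height)


def denormalize_box(box: Box, width: float, height: float) -> Box:
    """Map a normalized prediction back to pixels for the given image size."""
    x_min, y_min, x_max, y_max = box
    return (x_min * width, y_min * height, x_max * width, y_max * height)


if __name__ == "__main__":
    normalized = normalize_box((50, 25, 150, 75), width=200, height=100)
    print(normalized)
    print(denormalize_box(normalized, width=200, height=100))
```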

1

[deleted] OP t1_j8s73tb wrote

I am taking a look. But most tensorflow.js versions are either untrainable or too large, so I will probably use Keras and export the models as Layers, if I can manage to. I am still looking for lightweight NNs. It's a challenge, so I may ask somewhere for candidates.

0

PaleontologistDue620 t1_j8s9y0j wrote

you can train yolo on any framework you want and convert the weights later, then load them into your preferred inference framework. ( I'm not sure about js but in python you can load yolo models into opencv as well ). darknet is the original yolo framework which gives you scripts for training the model.

2

[deleted] OP t1_j8sdmmp wrote

Oh, that helps. I am not sure how true this is, though. For example, TF.Keras and TF.SavedModel can't be converted into one another and have different features. Both can be used to predict, but only one can be retrained and "tweaked" or extended from JS itself. And I am not sure you can convert PyTorch weights to Keras, but I will investigate. Apparently there is ONNX, which can be used to do it. I just don't want to train something that cannot be converted and loaded into a browser.

What I learnt so far is that Sliding Window, Region of Interest, and YOLO are more like ways to prepare your data, and almost any CNN could do the job, with more or less precision. I may be wrong. I am following this series: https://www.youtube.com/watch?v=XXYG5ZWtjj0&list=PLhhyoLH6Ijfw0TpCTVTNk42NN08H6UvNq&index=2&ab_channel=AladdinPersson

0

tsgiannis t1_j8w2s8i wrote

Instead of rescale = 1./255, try using the model's own preprocessing function (for VGG16 in Keras, its preprocess_input) and report back.
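
To illustrate why this matters: the pretrained VGG16 weights in Keras expect "caffe"-style inputs (channels reordered from RGB to BGR, ImageNet channel means subtracted, no division by 255), which is a very different value range from a plain 1/255 rescale. The sketch below mimics both on a single pixel in pure Python. The function names are hypothetical, and in a real project the library's own preprocess_input should be used rather than this re-implementation.

```python
from typing import Tuple

Pixel = Tuple[float, float, float]

# ImageNet per-channel means in BGR order, as used by the "caffe"-style
# preprocessing that Keras applies for VGG16.
IMAGENET_BGR_MEANS = (103.939, 116.779, 123.68)


def rescale_pixel(rgb: Pixel) -> Pixel:
    """Plain 1/255 rescale: values end up in [0, 1], channel order unchanged."""
    r, g, b = rgb
    return (r / 255.0, g / 255.0, b / 255.0)


def vgg16_style_pixel(rgb: Pixel) -> Pixel:
    """Reorder RGB to BGR and subtract the ImageNet channel means."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(c - m for c, m in zip(bgr, IMAGENET_BGR_MEANS))


if __name__ == "__main__":
    pure_red = (255, 0, 0)
    print(rescale_pixel(pure_red))      # small values in [0, 1]
    print(vgg16_style_pixel(pure_red))  # roughly in [-124, 152], BGR order
```

Feeding [0, 1] inputs to layers trained on mean-centered BGR inputs would make the frozen features much less useful, which is one plausible cause of poor generalization.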

1

[deleted] OP t1_j8wddyy wrote

It is interesting you commented this. I have spent a day yesterday trying to train some Yolo in python, but all the implementations on github are quite obsolete, apart from Yolov7.

Unless you refer to ultralytics only?

1

[deleted] OP t1_j8x413i wrote

I did it with tiny yolo v7, using google colab. My point is that there are barely any projects that are usable, unless you found some?

Yes the results were great, I am thinking of writing a little blogpost for others, it is actually quite simple because I found a tutorial in roboflow this time around.

Thanks for your support !

2

[deleted] OP t1_j8zb580 wrote

I have found out that VGG can indeed be used with SSD for the same task. I don't know exactly what the general idea is, but mostly you can combine CNNs with something else and get the bounding box. PyTorch has an SSD-VGG model.

I wonder why PyTorch has no YOLO implementation that we can just use.

1

[deleted] OP t1_ja32trh wrote

Another update, I am reading the first yolo paper:

>We also train YOLO using VGG-16. This model is more accurate but also significantly slower than YOLO. It is useful for comparison to other detection systems that rely on VGG-16 but since it is slower than real-time the rest of the paper focuses on our faster models.

Which also explains that my main error was to use VGG16 without a good idea of how to make it understand where the objects are, which is what they did..
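
As a rough illustration of how YOLO tells the network where objects are: the first YOLO paper divides the image into an S x S grid, and the cell containing the box center is responsible for that object. The center is encoded as an offset inside that cell, and the width and height are encoded relative to the whole image. The sketch below is a simplified version of that target encoding. It omits the confidence score, the class probabilities, and multiple boxes per cell, and the function name is hypothetical.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def encode_yolo_cell(
    box: Box, width: float, height: float, grid_size: int = 7
) -> Tuple[int, int, Tuple[float, float, float, float]]:
    """Return (row, col, (x_offset, y_offset, w, h)) for one ground-truth box.

    row/col index the grid cell containing the box center. x_offset and
    y_offset are the center position inside that cell, in [0, 1). w and h
    are the box size relative to the full image.
    """
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2 / width
    center_y = (y_min + y_max) / 2 / height

    # Clamp so a center lying exactly on the right/bottom edge stays in range.
    col = min(int(center_x * grid_size), grid_size - 1)
    row = min(int(center_y * grid_size), grid_size - 1)

    x_offset = center_x * grid_size - col
    y_offset = center_y * grid_size - row
    w = (x_max - x_min) / width
    h = (y_max - y_min) / height
    return row, col, (x_offset, y_offset, w, h)


if __name__ == "__main__":
    # A box centered in a 448x448 image lands in the middle cell of a 7x7 grid.
    print(encode_yolo_cell((96, 160, 352, 288), width=448, height=448))
```

A single regression head predicts one box for the whole image, whereas this per-cell encoding lets the same backbone predict several boxes tied to specific locations.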

1