Submitted by [deleted] t3_113o5up in deeplearning

Hi,

I have been trying to draw a bounding box around objects using a ML/NN approach.

The project uses transfer learning: a pretrained VGG16 with a regression head. I selected it because it seems like a good architecture and was easy to implement in JavaScript, where you won't find it ready-made.

The first 16 layers use pretrained weights taken from the Keras creator's GitHub page.

I have trained the output layers on Caltech101 categories: airplanes (800), faces (400), stop signs (60), etc.

It predicts reasonably well on images from the same dataset that the model has not seen before.

Yet for any new image (any picture with a face that I have on my laptop) the predictions are terrible.
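
One way to put a number on "reasonably well" versus "terrible" is the intersection over union (IoU) between a predicted box and its ground-truth box. This is a minimal sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) in pixels; the function name is illustrative and not taken from the original project.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Averaging this score separately over held-out Caltech101 images and over the new laptop pictures would show how large the gap between the two sets really is.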

After running out of ideas I am reaching out for some help. I have tried:

  • changing the number of layers,
  • changing the number of units,
  • training some of the VGG16 inner layers.

Solution

I did it with tiny YOLOv7, using Google Colab. I spent a day finding a project that is usable.

THIS ONE IS FINE > https://github.com/WongKinYiu/yolov

The results are indeed great. Most of the time I find Roboflow extremely handy; I used it to merge datasets, augment them, read tutorials, and that kind of thing.

Thanks for your support! Especially to u/PaleontologistDue620.

Why didn't VGG work for this task?

6

Comments

PaleontologistDue620 t1_j8s39ou wrote

do yourself a favor and go for the good old yolo, i think it has a tensorflow.js version too.

5

[deleted] OP t1_j8s73tb wrote

I am taking a look. But most TensorFlow.js versions are either untrainable or too large, so I will probably use Keras and export the models as Layers models, if I can manage to. I am still looking for lightweight NNs. It's a challenge, so I may ask somewhere for candidates.

0

PaleontologistDue620 t1_j8s7qkn wrote

train it with either darknet or python then convert the weights if you need to, there are already scripts for everything you need to do, that's why i said YOLO in the first place.

3

[deleted] OP t1_j8s8e0x wrote

Sorry for the ignorance, but wdym by darknet here? All I was planning was to use TensorFlow's Keras, as tf.js won't make it, I think. I started a week ago, so I am still digesting things.

1

PaleontologistDue620 t1_j8s9y0j wrote

you can train yolo on any framework you want and convert the weights later, then load them into your preferred inference framework (I'm not sure about js, but in python you can load yolo models into opencv as well). darknet is the original yolo framework, which gives you scripts for training the model.

2

[deleted] OP t1_j8sdmmp wrote

Oh, that helps. I am not sure how true this is though. For example, TF.Keras and TF.SavedModel can't be converted into one another and have different features. Both can be used to predict, but only one can be retrained and "tweaked" or extended from JS itself. And I am not sure you can convert PyTorch weights to Keras, but I will investigate. Apparently there is ONNX, which can be used to do it. I just don't want to train something that cannot be converted and loaded into a browser.

What I have learnt so far is that Sliding Window, Region of Interest, and YOLO are more like ways to prepare your data, and that almost any CNN could do the job with more or less precision. I may be wrong. I am following this series: https://www.youtube.com/watch?v=XXYG5ZWtjj0&list=PLhhyoLH6Ijfw0TpCTVTNk42NN08H6UvNq&index=2&ab_channel=AladdinPersson

0

[deleted] OP t1_j8wddyy wrote

It is interesting that you commented this. I spent a day yesterday trying to train a YOLO model in Python, but all the implementations on GitHub are quite obsolete, apart from YOLOv7.

Unless you are referring to Ultralytics only?

1

PaleontologistDue620 t1_j8wuk9b wrote

go for YOLO v3 or YOLO v4. i promise they'll be good enough for you, don't be bothered with version numbers. (if you need lighter models go for tiny versions of v3 and v4).

1

[deleted] OP t1_j8x413i wrote

I did it with tiny YOLOv7, using Google Colab. My point is that there are barely any projects that are usable, unless you found some?

Yes, the results were great. I am thinking of writing a little blog post for others; it is actually quite simple because I found a tutorial on Roboflow this time around.

Thanks for your support!

2

[deleted] OP t1_j90ysce wrote

Sorry to be annoying, but I thought it would be nice to give you some news as well. I was confused as to why there isn't a YOLO in PyTorch; here is why: https://github.com/pytorch/vision/issues/6341

2

PaleontologistDue620 t1_j90z1yo wrote

no you're not annoying at all, thanks for the update :)

1

[deleted] OP t1_ja32trh wrote

Another update, I am reading the first yolo paper:

>We also train YOLO using VGG-16. This model is more accurate but also significantly slower than YOLO. It is useful for comparison to other detection systems that rely on VGG-16 but since it is slower than real-time the rest of the paper focuses on our faster models.

This also explains my main error: I used VGG16 without a good idea of how to make it understand where the objects are, which is what they did.
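
The way the first YOLO paper tells the network where objects are can be sketched as a target encoding: the image is split into an S x S grid, the cell containing the box centre is responsible for that box, and the centre is expressed relative to that cell while width and height are relative to the whole image. This is a minimal sketch, assuming boxes are (x_min, y_min, x_max, y_max) in pixels and S = 7 as in the paper; the function name and return layout are illustrative.

```python
def encode_yolo_target(box, img_w, img_h, S=7):
    """Return (row, col, (x_off, y_off, w, h)) for one ground-truth box."""
    x_min, y_min, x_max, y_max = box

    # Box centre and size, normalized to [0, 1] by the image dimensions.
    cx = (x_min + x_max) / 2.0 / img_w
    cy = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h

    # Grid cell that contains the centre (clamped for centres on the far edge).
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)

    # Centre offset inside that cell, also in [0, 1].
    x_off = cx * S - col
    y_off = cy * S - row
    return row, col, (x_off, y_off, w, h)
```

A single regression head on VGG16 has no such structure: it outputs one box for the whole image regardless of where, or how many, objects there are.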

1

[deleted] OP t1_j8zb580 wrote

I have found out that VGG can indeed be used with SSD for the same task. Idk exactly what the general idea is, but mostly you can combine CNNs with something else and get the bounding box. PyTorch has an SSD-VGG model.

I wonder why PyTorch has no YOLO implemented that we can just use.

1

trajo123 t1_j8rdzfd wrote

Some general things to try:

  • (more aggressive) data augmentation when training, to make your model behave better on data outside the training set
  • if by "the problem of bounding objects" you mean object detection / localization, then a single regression head on top of a classifier architecture is not a good way of solving this problem; there are specialized architectures for this, e.g. R-CNN, YOLO.
  • If you have to do it with the regression head, then go for at least ResNet50; it should get you better performance across the board, assuming it was pre-trained on a large dataset like ImageNet. VGG16 is quite small/weak by modern standards.
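
For box regression, the augmentation in the first point has to transform the labels together with the pixels. This is a minimal sketch of a horizontal flip, assuming the image is a nested list of pixel rows and the box is (x_min, y_min, x_max, y_max) in pixel coordinates; the function name is illustrative, and a real pipeline would more likely use an augmentation library.

```python
def hflip_with_box(image, box):
    """Flip an image left-to-right and mirror its bounding box accordingly."""
    width = len(image[0])

    # Reverse every row of pixels.
    flipped = [row[::-1] for row in image]

    # Mirror the x coordinates; the old right edge becomes the new left edge.
    x_min, y_min, x_max, y_max = box
    flipped_box = (width - x_max, y_min, width - x_min, y_max)
    return flipped, flipped_box
```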

Why do you need to implement this in JavaScript? Wouldn't it make sense to decouple the model development from the deployment? Get a PyTorch or TensorFlow model working first, then worry about deployment. This way you can access a zoo of pre-trained models, at Hugging Face for instance.

2

[deleted] OP t1_j8re6mi wrote

> by "the problem of bounding objects" you mean object detection / localization then a single regression head on top a classifier architecture is not a good way of solving this problem, there

I just wrote something similar myself in a comment. You are correct.

Indeed, I am planning to do it in Keras because there aren't ready-made models in TF.js and implementing them is quite difficult.

I do not think PyTorch models can be easily used in TensorFlow.js afterwards, right?

1

erunim t1_j8s4hki wrote

Just curious. You did align the normalization methods, right?

2

[deleted] OP t1_j8s6viw wrote

I am not sure what you mean, but if it is normalizing the images and also the box dimensions, plus having them in the right order then yes.

Otherwise maybe not

1

StrasJam t1_j8sgfvm wrote

I think they mean: do you apply the same normalization to the new images you are trying to predict on as you applied to the images during training?

3

[deleted] OP t1_j8sop7t wrote

Yes, I pad them to make them square, resize to 224 using bilinear interpolation, and divide by 255, then do the inverse for predicting.
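
The pad, resize, and inverse steps can be sketched for the box coordinates. This is a minimal sketch, assuming the padding is added on the right and bottom edges (the comment does not say where) and boxes are (x_min, y_min, x_max, y_max) in pixels; the function names are illustrative.

```python
def box_to_model_space(box, img_w, img_h):
    """Map a pixel box on the original image to [0, 1] coordinates of the padded square.

    The later resize to 224 x 224 scales both axes uniformly, so the
    normalized coordinates are the same before and after it.
    """
    side = max(img_w, img_h)  # side of the square after padding (assumed right/bottom)
    return tuple(v / side for v in box)


def _clip(value, upper):
    """Clamp a coordinate to the range [0, upper]."""
    return min(max(value, 0.0), upper)


def box_to_image_space(pred, img_w, img_h):
    """Inverse mapping: normalized model output back to pixels on the original image."""
    side = max(img_w, img_h)
    x_min, y_min, x_max, y_max = (v * side for v in pred)
    # Predictions that land in the padded area are clipped to the real image.
    return (_clip(x_min, img_w), _clip(y_min, img_h),
            _clip(x_max, img_w), _clip(y_max, img_h))
```

If the training code padded symmetrically instead, the same offset would have to be added before dividing and subtracted after multiplying; a mismatch between the two conventions shifts every predicted box.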

1

tsgiannis t1_j8w2s8i wrote

Instead of rescale = 1./255, try using VGG16's own preprocess function and report back.
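
The difference between the two preprocessing schemes can be shown on a single pixel. This sketch assumes that "its own preprocess function" refers to `keras.applications.vgg16.preprocess_input`, which converts RGB to BGR and subtracts the ImageNet per-channel means without any scaling; the helpers below are illustrative re-implementations for one pixel, not the Keras API.

```python
# ImageNet channel means in BGR order, as used by the Caffe-style VGG16 preprocessing.
IMAGENET_BGR_MEAN = (103.939, 116.779, 123.68)


def rescale_pixel(rgb):
    """The rescale = 1./255 scheme: RGB order kept, values mapped to [0, 1]."""
    return tuple(c / 255.0 for c in rgb)


def vgg16_style_pixel(rgb):
    """VGG16-style scheme: reorder to BGR, subtract the channel means, no scaling."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(c - m for c, m in zip(bgr, IMAGENET_BGR_MEAN))
```

If the pretrained weights are the standard Keras ImageNet ones, the frozen layers were fit to inputs in the second range, so feeding them values in [0, 1] gives them a distribution they were not trained on.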

1

[deleted] OP t1_j8wd1cz wrote

What do you mean? The values were normalized in that way.

After reading about it, I think VGG16 will never work for an object detection task. Do you disagree?

1

tsgiannis t1_j8wgwsj wrote

I had serious issues with the whole training of VGG...this fixed it...

give it a spin.

1

[deleted] OP t1_j8wh57w wrote

Yes, but I don't really know what your solution is.

I can skip rescaling it, but what then?

Mind telling me what you were using it for?

1