vannak139 t1_j7w5otz wrote

So, the simple strategy here, which largely ignores your variable-length objects, is to classify CNN receptive fields directly, and then max-pool the multiple classification frames.

So, let's say your sequence length is 1024. You build a CNN with a receptive field of 32 and a stride of 16. Applied to the sequence, this network produces 63 "frames" ((1024 - 32) / 16 + 1 = 63). Typically, the CNN would expand this representation with a large number of channels, take a GlobalMaxPooling to merge the frames' information, and then classify the sample.

Instead, you should classify the frames directly, meaning your output looks like 63 separate sigmoid classifications, each associated with a region of the signal. Then you simply take the maximum over those classification likelihoods and use that as your sample-level classification.

After training, you can remove the GlobalMaxPooling layer and inspect the segment classifications directly.
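A minimal NumPy sketch of the idea (not the commenter's actual code): the linear-plus-sigmoid scorer below is a stand-in for the CNN's per-frame classifier, and the weights are random placeholders. It just shows the framing arithmetic (1024-sample signal, receptive field 32, stride 16 → 63 frames) and the max over per-frame likelihoods.

```python
import numpy as np

rng = np.random.default_rng(0)

SEQ_LEN, FIELD, STRIDE = 1024, 32, 16        # numbers from the comment above
N_FRAMES = (SEQ_LEN - FIELD) // STRIDE + 1   # (1024 - 32) / 16 + 1 = 63

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def frame_scores(signal, w, b):
    """Score each receptive field with a stand-in linear + sigmoid classifier."""
    starts = np.arange(N_FRAMES) * STRIDE
    frames = np.stack([signal[s:s + FIELD] for s in starts])  # shape (63, 32)
    return sigmoid(frames @ w + b)            # one likelihood per frame

signal = rng.standard_normal(SEQ_LEN)
w, b = rng.standard_normal(FIELD) * 0.1, 0.0  # hypothetical "learned" weights

scores = frame_scores(signal, w, b)   # per-frame classifications, keep for inspection
sample_score = scores.max()           # the "GlobalMaxPooling" step over frames
```

After training, dropping the final `max()` and reading `scores` directly gives you the per-segment predictions the comment describes.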

1

vannak139 t1_j7puon5 wrote

How you would approach this really depends on a few things. The most important question is: do you have the target data you want the network to produce? In some cases it is possible to highlight regions of interest using only sample-level classification labels, but that is usually very context-specific. If you have target data where these regions are already specified, a normal supervised learning method for waveforms should be perfectly workable, and will likely use 1D CNNs.

2

vannak139 t1_j4p7uu1 wrote

Buy a cheap car cover, it's like a sock you slide over the whole car. It'll help you with privacy and laying low in residential areas. Almost no one is going to actually touch the cover to lift it and check if someone's in there. And it's also not that weird to see a car with a cover running, remote start and all. Mind your car's exhaust though.

1