Per-frame object detection is certainly one of the simpler ways to do it, but it has some limitations depending on the use case you are trying to solve. It doesn't handle occlusion very well, for example, and because every frame is processed independently, nothing ties a detection in one frame to the same object in the next. Depending on the type of video you are operating on and the number of frames you are processing, it can also be pretty inefficient: almost nothing changes from one frame to the next, so in theory you should not have to fully re-analyze the entire frame and could instead carry forward some of the processing you did on the last one.
There are also quite a number of sub-tasks. Are you trying to track a single object only? Are you trying to track every object of a given class? Do you need to identify and track new objects as they enter the scene, or do you know everything you want to track from the first frame? Do you need to run in real time?
Multi-object tracking (MOT) is the computer vision term most commonly used for the task, so you can find a lot of algorithms under that name. DeepSORT was one I found pretty interesting, even though it is no longer among the best performers, just for the combination of methods it uses to accomplish the task: it detects the objects, estimates their velocity from frame to frame, predicts their most likely locations in the next frame with a Kalman filter, then uses a neural network to re-identify each target in that frame.
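To make the detect, predict, and associate loop a bit more concrete, here is a minimal sketch in plain Python. It is not DeepSORT: the Kalman filter is swapped for a raw constant-velocity guess, and the neural-network re-identification is swapped for greedy IoU matching. The names `SimpleTracker`, `Track`, `iou_threshold`, and `max_missed` are made up for illustration, and the detections are assumed to come from whatever per-frame detector you already have.

```python
from dataclasses import dataclass
from itertools import count

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(box: Box) -> tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


@dataclass
class Track:
    track_id: int
    box: Box
    velocity: tuple[float, float] = (0.0, 0.0)  # pixels per frame
    missed: int = 0  # consecutive frames without a matching detection

    def predict(self) -> Box:
        """Constant-velocity guess at where the box is in the next frame."""
        vx, vy = self.velocity
        x1, y1, x2, y2 = self.box
        return (x1 + vx, y1 + vy, x2 + vx, y2 + vy)


class SimpleTracker:
    def __init__(self, iou_threshold: float = 0.3, max_missed: int = 5):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed
        self.tracks: list[Track] = []
        self._ids = count(1)

    def update(self, detections: list[Box]) -> dict[int, Box]:
        """Feed one frame of detections; returns {track_id: box}."""
        predicted = [t.predict() for t in self.tracks]
        # Score every (track, detection) pair, best overlap first.
        pairs = sorted(
            ((iou(p, d), ti, di)
             for ti, p in enumerate(predicted)
             for di, d in enumerate(detections)),
            reverse=True,
        )
        used_tracks, used_dets = set(), set()
        for score, ti, di in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap even less
            if ti in used_tracks or di in used_dets:
                continue
            track, det = self.tracks[ti], detections[di]
            (ox, oy), (nx, ny) = center(track.box), center(det)
            track.velocity = (nx - ox, ny - oy)
            track.box, track.missed = det, 0
            used_tracks.add(ti)
            used_dets.add(di)
        # Unmatched tracks coast on their prediction (e.g. during occlusion).
        for ti, track in enumerate(self.tracks):
            if ti not in used_tracks:
                track.box = predicted[ti]
                track.missed += 1
        self.tracks = [t for t in self.tracks if t.missed <= self.max_missed]
        # Unmatched detections start new tracks (objects entering the scene).
        for di, det in enumerate(detections):
            if di not in used_dets:
                self.tracks.append(Track(next(self._ids), det))
        return {t.track_id: t.box for t in self.tracks}
```

Calling `update` once per frame with that frame's boxes returns the same ID for an object as long as its predicted box keeps overlapping a detection, including across a few frames where the detector misses it. A real tracker would usually replace the greedy loop with an optimal assignment and add an appearance model, which is roughly where DeepSORT's extra pieces come in.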
saynay t1_isplccn wrote
Reply to [D] Video Tracking vs Image detection by Dense-Smf-6032