I am trying to reproduce the dataset-creation steps followed in this research paper.

In the paper, they have stated they use sequential data as input for the model they are developing, wherein 20 frames are taken before every point-of-impact in a tennis match.

How can I go about doing that? Could anyone explain it in the simplest terms possible, or point me to an appropriate resource?

Quoting the paper:

>For this dataset, we used a tennis match video (1080 × 720 pixels, 25 fps) of a professional tennis match uploaded to YouTube. The video is input, making the players’ rectangular images of the time of impact of each player to 20 frames before impact

Comments

dual_carriageway t1_iueqojt wrote on October 30, 2022 at 7:42 PM

Another option could be tracking the audio (if there is any). The ball hitting the racket should make a similar-ish noise each time, and you may be able to automate detecting impacts using that.
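To make the audio idea concrete, here is a minimal sketch that flags short bursts of loud audio as candidate impacts. It is a plain energy threshold, not a trained detector, and everything in it is an assumption for illustration: the function name `detect_impacts`, the window length, the `ratio` threshold and the minimum gap between hits. It also assumes the audio track has already been extracted from the video into a list of mono samples.

```python
def detect_impacts(samples, sample_rate, window_s=0.02, ratio=8.0, min_gap_s=0.3):
    """Return times (in seconds) of windows whose energy far exceeds the median.

    samples: mono audio as a sequence of floats (assumed already extracted).
    All default parameter values are illustrative guesses, not tuned settings.
    """
    win = max(1, int(window_s * sample_rate))
    # Short-time energy: mean squared amplitude per non-overlapping window.
    energies = [
        sum(x * x for x in samples[i:i + win]) / win
        for i in range(0, len(samples) - win + 1, win)
    ]
    if not energies:
        return []
    # The median energy serves as a rough estimate of the background level.
    median = sorted(energies)[len(energies) // 2]
    threshold = ratio * max(median, 1e-12)

    impacts = []
    for k, energy in enumerate(energies):
        t = k * win / sample_rate
        # Keep only the first loud window of each burst.
        if energy > threshold and (not impacts or t - impacts[-1] >= min_gap_s):
            impacts.append(t)
    return impacts
```

Crowd noise, grunts and commentary will also trip a threshold like this, so the returned times are candidates to review, not confirmed impacts.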

michelin_chalupa t1_iuesch3 wrote on October 30, 2022 at 7:53 PM

Building off this: you could annotate a hundred or so impacts, fine-tune an audio embedding, and do a similarity search over the video's audio.

Could work pretty well, since players grunting would probably be pretty distinct from other audio.
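The similarity-search step could look roughly like the sketch below. It assumes the embeddings already exist as plain lists of floats (one per audio window, plus one per annotated impact) and does not cover training or fine-tuning the embedding model. The names `cosine` and `find_similar_windows` and the 0.9 threshold are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar_windows(window_embeddings, impact_embeddings, threshold=0.9):
    """Return indices of audio windows that resemble any annotated impact.

    impact_embeddings must contain at least one reference vector.
    """
    hits = []
    for i, window in enumerate(window_embeddings):
        # Compare against every annotated impact and keep the best match.
        best = max(cosine(window, ref) for ref in impact_embeddings)
        if best >= threshold:
            hits.append(i)
    return hits
```

For a single match video a brute-force loop like this is probably fast enough. The threshold would need tuning against the annotated examples.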

quichemiata t1_iuh4w01 wrote on October 31, 2022 at 8:30 AM

A player can grunt and miss, though.

michelin_chalupa t1_iui244a wrote on October 31, 2022 at 2:25 PM

Yep.

ChaosAdm OP t1_iuesrn5 wrote on October 30, 2022 at 7:55 PM

That's an interesting alternative. Finding all instances of a particular sound in a video will be another mini project I will start looking into to test this approach as well!

impossiblefork t1_iuhof3j wrote on October 31, 2022 at 12:36 PM

For hard baseline shots that would work, but not necessarily for slices and the touch shots that are intentionally soft.

ChaosAdm OP t1_iuiqrb5 wrote on October 31, 2022 at 5:13 PM

This worked quite wonderfully and saved me manual labor. The only exceptions were a few captured sequences from replays or side-camera replay views, where the frames don't show the standard view of the court. (I manually deleted the folders for those.)

michelin_chalupa t1_iuekl9p wrote on October 30, 2022 at 7:02 PM

The simple way would be to just annotate those impact frames. A more sophisticated way might involve tracking the ball and annotating the frames where its estimated velocity is low (which will of course be noisy, depending on the camera angle relative to its trajectory).

If it were me, I’d just hunker down for an afternoon and manually annotate those impact frames.
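For the tracking route, a rough sketch of the "low estimated velocity" check might look like this. It assumes a tracker has already produced one ball position per frame in pixel coordinates. The function name `low_speed_frames` and the `max_speed` cutoff are hypothetical, and the tracker itself is not shown.

```python
import math


def low_speed_frames(positions, fps, max_speed):
    """Return indices where the ball's estimated speed dips to a local minimum.

    positions: list of (x, y) ball centres in pixels, one per frame.
    max_speed: cutoff in pixels per second (an assumed, hand-tuned value).
    Index i refers to the motion between frame i and frame i + 1.
    """
    # Speed between consecutive frames, converted to pixels per second.
    speeds = [
        math.hypot(x1 - x0, y1 - y0) * fps
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]
    candidates = []
    for i in range(1, len(speeds) - 1):
        is_local_min = speeds[i] <= speeds[i - 1] and speeds[i] < speeds[i + 1]
        if is_local_min and speeds[i] <= max_speed:
            candidates.append(i)
    return candidates
```

Image-plane speed depends on the camera angle, so a ball travelling toward or away from the camera looks slow even at full pace. Expect false positives.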

ChaosAdm OP t1_iuekxi4 wrote on October 30, 2022 at 7:04 PM

So... I am supposed to screenshot 20 frames before every impact frame manually? o_o

nins_ t1_iuen1e3 wrote on October 30, 2022 at 7:18 PM

The manual annotation would involve noting down the timestamps in a CSV. Then you write a short script (I would do it with OpenCV) to read the video file, get the 20 frames prior to each timestamp, and save them as images in whatever directory structure you need.

Edit: MoviePy package will probably be easier than OpenCV for you.
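A minimal sketch of such a script using OpenCV is below. It assumes the CSV has a header row with a column named `impact_time_s` holding each impact time in seconds. That column name, the function names and the output folder layout are all made up for illustration. At 25 fps, 20 frames cover 0.8 seconds before each impact. The sketch saves full frames only; cropping the players' rectangles, as the paper describes, would be a separate step.

```python
import csv
from pathlib import Path


def frames_before_impact(impact_time_s, fps, n_frames=20):
    """Return the indices of the n_frames frames just before the impact frame."""
    impact_frame = round(impact_time_s * fps)
    start = max(0, impact_frame - n_frames)  # clamp near the start of the video
    return list(range(start, impact_frame))


def extract_sequences(video_path, csv_path, out_dir, n_frames=20):
    """Save the frames preceding each annotated impact, one folder per impact."""
    import cv2  # imported here so the helper above works without OpenCV installed

    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise IOError(f"could not open {video_path}")
    fps = cap.get(cv2.CAP_PROP_FPS)

    # Assumed CSV layout: a header row with an "impact_time_s" column.
    with open(csv_path, newline="") as f:
        impact_times = [float(row["impact_time_s"]) for row in csv.DictReader(f)]

    for i, t in enumerate(impact_times):
        indices = frames_before_impact(t, fps, n_frames)
        if not indices:
            continue
        seq_dir = Path(out_dir) / f"impact_{i:04d}"
        seq_dir.mkdir(parents=True, exist_ok=True)
        # Seek to the first frame of the window, then read sequentially.
        cap.set(cv2.CAP_PROP_POS_FRAMES, indices[0])
        for idx in indices:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imwrite(str(seq_dir / f"frame_{idx:06d}.jpg"), frame)

    cap.release()
```

Usage would be something like `extract_sequences("match.mp4", "impacts.csv", "sequences")`, where the file names are placeholders. Frame seeking can be off by a frame or so with some codecs, so it is worth eyeballing a few saved sequences.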

ChaosAdm OP t1_iueptr6 wrote on October 30, 2022 at 7:36 PM

Perfect! Thank you. I will try to do this tomorrow =)