Submitted by ChaosAdm t3_yhmlen in MachineLearning

I am trying to follow the dataset-creation steps described in this research paper.

In the paper, they state that they use sequential data as input for the model they are developing, where 20 frames are taken before every point of impact in a tennis match.

How can I go about doing that? Could anyone explain it in as simple terms as possible, or point me to an appropriate resource?

Quoting the paper:

>For this dataset, we used a tennis match video (1080 × 720 pixels, 25 fps) of a professional tennis match uploaded to YouTube. The video is input, making the players’ rectangular images of the time of impact of each player to 20 frames before impact

3

Comments


dual_carriageway t1_iueqojt wrote

Another option could be tracking the audio (if there is any). The ball hitting the racket should make a similar-ish noise each time, so you may be able to automate detecting impacts from that.
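A minimal sketch of that idea (not from the thread): find sharp peaks in the smoothed amplitude envelope of the audio track. The smoothing window, peak height, and minimum gap are all assumed values you would tune by listening to the results.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_impacts(samples, sr, min_gap_s=0.5, height_frac=0.5):
    """Return candidate impact times (seconds) from a mono audio signal.

    min_gap_s and height_frac are guesses; tune them on your match audio.
    """
    env = np.abs(samples.astype(np.float64))
    # Smooth with a ~10 ms moving average so single-sample spikes don't dominate.
    win = max(int(0.01 * sr), 1)
    env = np.convolve(env, np.ones(win) / win, mode="same")
    # Peaks must clear a fraction of the loudest event and be min_gap_s apart.
    peaks, _ = find_peaks(env,
                          height=height_frac * env.max(),
                          distance=int(min_gap_s * sr))
    return peaks / sr
```

This is crude (crowd noise and grunts will trigger false positives), but as a first pass it narrows the video down to candidate moments to check by hand.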

6

michelin_chalupa t1_iuesch3 wrote

Building off this: you could annotate a hundred or so impacts, fine-tune an audio embedding, and do a similarity search over the video's audio.

Could work pretty well, since players grunting would probably be pretty distinct from other audio.
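A toy sketch of the similarity-search part (not from the thread): here a raw magnitude spectrum stands in for the learned embedding, and each audio window is scored by its best cosine similarity to the annotated impact clips. The window length and the feature itself are placeholder assumptions.

```python
import numpy as np

def embed(window):
    """Crude stand-in for a learned audio embedding: unit-norm magnitude spectrum."""
    spec = np.abs(np.fft.rfft(window))
    return spec / (np.linalg.norm(spec) + 1e-9)

def rank_windows(audio, sr, impact_clips, win_s=0.1):
    """Score each non-overlapping window of `audio` by its max cosine
    similarity to the annotated impact clips. High scores = candidates."""
    win = int(win_s * sr)
    refs = np.stack([embed(c[:win]) for c in impact_clips])
    scores = []
    for start in range(0, len(audio) - win + 1, win):
        e = embed(audio[start:start + win])
        scores.append((refs @ e).max())  # embeddings are unit-norm, so dot = cosine
    return np.array(scores)
```

With a real fine-tuned embedding you would swap out `embed` and keep the same ranking loop, then threshold or take the top-k windows as candidate impacts.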

5

ChaosAdm OP t1_iuesrn5 wrote

That's an interesting alternative. Finding all instances of a particular sound in a video will be another mini-project; I'll start looking into it to test this approach as well!

2

impossiblefork t1_iuhof3j wrote

For hard baseline shots that would work, but not necessarily for slices and the touch shots that are intentionally soft.

1

ChaosAdm OP t1_iuiqrb5 wrote

This worked quite wonderfully and saved me manual labor, except for a few captured sequences from replays/side-camera views, where the frames are not the standard view of the court. (I manually deleted the folders for those exceptions.)

1

michelin_chalupa t1_iuekl9p wrote

The simple way would be to just annotate those impact frames. A more sophisticated way might involve tracking the ball and flagging those frames where its estimated velocity is low (which will of course be noisy, depending on the camera angle with respect to its trajectory).

If it were me, I’d just hunker down for an afternoon and manually annotate those impact frames.
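The velocity heuristic could look something like this (not from the thread; it assumes you already have per-frame ball positions from some tracker, and the speed threshold is an arbitrary placeholder):

```python
import numpy as np

def candidate_impacts(positions, speed_thresh=2.0):
    """Flag frames where the ball's estimated speed hits a local minimum.

    positions: (N, 2) array-like of pixel coordinates, one row per frame.
    speed_thresh (pixels/frame) is a made-up value to tune per video.
    """
    pos = np.asarray(positions, dtype=float)
    speed = np.linalg.norm(np.diff(pos, axis=0), axis=1)  # pixels/frame
    candidates = []
    for i in range(1, len(speed) - 1):
        # Local minimum of speed that also falls below the threshold.
        if speed[i] < speed[i - 1] and speed[i] <= speed[i + 1] and speed[i] < speed_thresh:
            candidates.append(i + 1)  # +1: speed[i] spans frames i and i+1
    return candidates
```

As noted above, apparent speed in pixels says little when the ball moves toward or away from the camera, so treat the output as candidates to verify, not ground truth.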

2

ChaosAdm OP t1_iuekxi4 wrote

So... I am supposed to screenshot 20 frames before every impact frame manually? o_o

2

nins_ t1_iuen1e3 wrote

The manual annotation would involve you noting down the timestamps in a CSV. Then you write a short script (I would do it with OpenCV) to read the video file, grab the 20 frames prior to each timestamp, and save them as images into whatever directory structure you need.

Edit: the MoviePy package will probably be easier than OpenCV for you.
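A minimal sketch of that script with OpenCV (not from the thread; the file names, 25 fps rate, and directory layout are assumptions to adapt):

```python
import csv
import os

FPS = 25        # the paper's video is 25 fps
N_FRAMES = 20   # frames to keep per impact

def frame_window(timestamp_s, fps=FPS, n=N_FRAMES):
    """Return the n frame indices ending at (and including) the impact frame."""
    impact = int(round(timestamp_s * fps))
    start = max(impact - n + 1, 0)
    return list(range(start, impact + 1))

def extract_sequences(video_path, csv_path, out_dir):
    """Read one timestamp (seconds) per CSV row and dump each 20-frame
    sequence into its own subdirectory."""
    import cv2  # pip install opencv-python; imported here so frame_window works without it
    cap = cv2.VideoCapture(video_path)
    with open(csv_path) as f:
        for i, row in enumerate(csv.reader(f)):
            seq_dir = os.path.join(out_dir, f"impact_{i:03d}")
            os.makedirs(seq_dir, exist_ok=True)
            for j, idx in enumerate(frame_window(float(row[0]))):
                cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the frame index
                ok, frame = cap.read()
                if ok:
                    cv2.imwrite(os.path.join(seq_dir, f"frame_{j:02d}.png"), frame)
    cap.release()
```

For example, `extract_sequences("match.mp4", "impacts.csv", "sequences/")` with a timestamp of 4.0 s would save frames 81 through 100. Note that seeking with `CAP_PROP_POS_FRAMES` can be slow or imprecise on some codecs; reading the video sequentially and keeping a rolling buffer of the last 20 frames is a more robust variant.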

5

ChaosAdm OP t1_iueptr6 wrote

Perfect! Thank you. I will try to do this tomorrow =)

2