eeng_ t1_iy82r1q wrote on November 29, 2022 at 1:24 PM

Reply to [D] Are problems with massive amount of input features feasible? by Vae94

This is probably obvious to you, but most of the frames in a long video are redundant and provide little additional information. You could easily extract some key frames (e.g., subtract the previous frame from the current frame and apply a fixed threshold to the difference), then run your network only on those key frames and ensemble the key-frame predictions into a single label per video.
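To make that concrete, here is a rough sketch of what I mean, using NumPy only. The function names, the choice of mean absolute difference as the change score, the default threshold of 10.0, always keeping the first frame, and averaging class probabilities as the ensembling step are all my own assumptions for illustration; tune or replace them for your data.

```python
import numpy as np


def select_key_frames(frames, threshold=10.0):
    """Return indices of frames that differ enough from the previous frame.

    frames: array-like of shape (T, H, W) or (T, H, W, C).
    threshold: assumed cutoff on the mean absolute pixel difference.
    """
    # Cast to float so uint8 video frames do not wrap around on subtraction.
    frames = np.asarray(frames, dtype=np.float32)
    if len(frames) == 0:
        return []
    # Mean absolute difference between each frame and its predecessor.
    pixel_axes = tuple(range(1, frames.ndim))
    diffs = np.abs(frames[1:] - frames[:-1]).mean(axis=pixel_axes)
    # Always keep the first frame (an assumption), then every frame whose
    # change relative to the previous frame exceeds the fixed threshold.
    return [0] + [i + 1 for i, d in enumerate(diffs) if d > threshold]


def ensemble_predictions(probs):
    """Average per-key-frame class probabilities into one label per video."""
    return int(np.mean(np.asarray(probs, dtype=np.float32), axis=0).argmax())
```

You would then run your network only on `frames[i]` for each returned index and pass the resulting per-frame probabilities to `ensemble_predictions`. Majority voting over per-frame labels would work just as well as averaging; I picked averaging only because it is the shortest to write.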