Fine-grained action recognition in multiplayer sports remains a challenging problem. Prior work on soccer, baseball, and tennis has used OCR to read the game clock in full-length, continuous game videos and align video time with game events (e.g., at 3 min 26 s into a video, Messi scores a goal, or Stephen Curry makes a three-pointer) by matching the game time against play-by-play commentary, thereby gathering large amounts of training data for action/event recognition. Inspired by this line of research, and equipped with rich play-by-play commentary data from our business partnerships, we set out to bring such techniques into production (annotating game videos with in-video markers of players and actions) across the sports leagues Yahoo! Sports reports on and covers (NBA/NFL/NHL/MLB/Soccer).
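As a minimal sketch of the clock-to-commentary alignment described above, the following parses OCR'd game-clock strings and matches play-by-play events to video timestamps. The function names, the `(video_second, clock_seconds)` reading format, and the event dicts are illustrative assumptions, not the production pipeline:

```python
import re

def parse_game_clock(text):
    """Parse an OCR'd scoreboard string (e.g. '3:26' or 'Q2 11:05')
    into seconds on the game clock; return None if no clock is found."""
    m = re.search(r"(\d{1,2}):(\d{2})", text)
    if m is None:
        return None
    minutes, seconds = int(m.group(1)), int(m.group(2))
    if seconds >= 60:  # reject OCR misreads like '3:75'
        return None
    return minutes * 60 + seconds

def align_events_to_video(events, clock_readings):
    """Given play-by-play events keyed by game-clock seconds and a list of
    (video_second, clock_seconds) OCR readings, return in-video markers
    at the reading whose clock value is closest to each event's clock time."""
    markers = []
    for event in events:
        best = min(clock_readings, key=lambda r: abs(r[1] - event["clock"]))
        markers.append({"video_s": best[0], "label": event["label"]})
    return markers
```

For example, with readings `[(10.0, 206), (11.0, 205)]` and an event at game-clock 3:26 labeled "goal", the event is placed at video second 10.0.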
Our primary contributions include:
(1) cost-effective training methods that, without extensive manual labeling, accurately identify bounding boxes for a specific type of object and the text inside them, and that let us expand quickly to several new sports domains,
(2) optimization of the core video-processing runtime to meet the computational demands of processing O&O videos and a massive corpus of longer-duration evergreen YouTube videos,
(3) novel high-quality basketball, football, and ice hockey datasets for action recognition research, and
(4) adaptation of the training data generation technique from full-length videos to sliced video highlights, by training high-precision models, so that it can enable in-video semantic search.
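The data generation steps behind contributions (1) and (4) can be sketched as follows: cutting labeled training windows around aligned event markers, and remapping full-game event timestamps into a sliced highlight clip's local timeline. The function names, window sizes, and marker format are hypothetical illustrations, not the paper's actual implementation:

```python
def training_windows(markers, pre_s=4, post_s=4, video_len_s=None):
    """Cut a labeled (start, end, label) training clip around each aligned
    event marker, clamped to the bounds of the source video."""
    clips = []
    for m in markers:
        start = max(0, m["video_s"] - pre_s)
        end = m["video_s"] + post_s
        if video_len_s is not None:
            end = min(end, video_len_s)
        clips.append((start, end, m["label"]))
    return clips

def clip_local_time(event_video_s, clip_start_s, clip_end_s):
    """Map a full-game event timestamp into a sliced highlight clip's
    local timeline; return None when the event falls outside the clip."""
    if clip_start_s <= event_video_s < clip_end_s:
        return event_video_s - clip_start_s
    return None
```

For instance, an event marker at full-game second 125.0 falls 5.0 s into a highlight clip spanning seconds 120.0-150.0, so the same label can supervise models trained on sliced highlights rather than full games.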