Fine-grained action recognition in multiplayer sports remains a challenging problem. Prior work on soccer, baseball, and tennis has used OCR to read the game clock in full-length, continuous game videos and align video time with game events (e.g., at 3 min 26 s into a video, Messi scores a goal, or Stephen Curry makes a three-pointer) by matching the game time against play-by-play commentary, thereby gathering large amounts of training data for action/event recognition. Inspired by this line of research, and equipped with rich play-by-play commentary data from our business partnerships, we set out to bring such techniques into production (annotating game videos with in-video markers of players and actions) across the sports leagues Yahoo! Sports reports on and covers (NBA/NFL/NHL/MLB/Soccer).
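As a minimal sketch of the clock-to-commentary alignment described above, the following parses OCR'd game-clock strings and matches play-by-play events to video timestamps. The function names, the `(video_second, clock_seconds)` reading format, and the event dicts are illustrative assumptions, not the production pipeline:

```python
import re

def parse_game_clock(text):
    """Parse an OCR'd scoreboard string (e.g. '3:26' or 'Q2 11:05')
    into seconds on the game clock; return None if no clock is found."""
    m = re.search(r"(\d{1,2}):(\d{2})", text)
    if m is None:
        return None
    minutes, seconds = int(m.group(1)), int(m.group(2))
    if seconds >= 60:  # reject OCR misreads like '3:75'
        return None
    return minutes * 60 + seconds

def align_events_to_video(events, clock_readings):
    """Given play-by-play events keyed by game-clock seconds and a list of
    (video_second, clock_seconds) OCR readings, return in-video markers
    at the reading whose clock value is closest to each event's clock time."""
    markers = []
    for event in events:
        best = min(clock_readings, key=lambda r: abs(r[1] - event["clock"]))
        markers.append({"video_s": best[0], "label": event["label"]})
    return markers
```

For example, with readings `[(10.0, 206), (11.0, 205)]` and an event at game-clock 3:26 labeled "goal", the event is placed at video second 10.0.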
Our primary contributions include:
(1) cost-effective training methods that, without extensive manual labeling, accurately identify bounding boxes for a specific type of object and the text inside them, and that let us expand quickly to several new sports domains,
(2) optimization of the core video-processing runtime to meet the computational demands of processing O&O videos and a massive corpus of longer-duration evergreen YouTube videos,
(3) novel high-quality basketball, football, and ice hockey datasets for action recognition research, and
(4) adaptation of the training data generation technique from full-length videos to sliced video highlights, by training high-precision models, so that it can enable in-video semantic search.
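The data generation steps behind contributions (1) and (4) can be sketched as follows: cutting labeled training windows around aligned event markers, and remapping full-game event timestamps into a sliced highlight clip's local timeline. The function names, window sizes, and marker format are hypothetical illustrations, not the paper's actual implementation:

```python
def training_windows(markers, pre_s=4, post_s=4, video_len_s=None):
    """Cut a labeled (start, end, label) training clip around each aligned
    event marker, clamped to the bounds of the source video."""
    clips = []
    for m in markers:
        start = max(0, m["video_s"] - pre_s)
        end = m["video_s"] + post_s
        if video_len_s is not None:
            end = min(end, video_len_s)
        clips.append((start, end, m["label"]))
    return clips

def clip_local_time(event_video_s, clip_start_s, clip_end_s):
    """Map a full-game event timestamp into a sliced highlight clip's
    local timeline; return None when the event falls outside the clip."""
    if clip_start_s <= event_video_s < clip_end_s:
        return event_video_s - clip_start_s
    return None
```

For instance, an event marker at full-game second 125.0 falls 5.0 s into a highlight clip spanning seconds 120.0-150.0, so the same label can supervise models trained on sliced highlights rather than full games.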