If learning methods are to scale to the massive sizes of modern datasets, it is essential for the field of machine learning to embrace parallel and distributed computing. Inspired by the recent development of matrix factorization methods with rich theory but poor computational complexity and by the relative ease of mapping matrices onto distributed architectures, we introduce Divide-Factor-Combine (DFC), a scalable divide-and-conquer framework for noisy matrix factorization. Our experiments with collaborative filtering, video background modeling, subspace segmentation, graph-based semi-supervised learning and simulated data demonstrate the near-linear to super-linear speed-ups attainable with our approach. Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.
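The abstract sketches the three stages of DFC: divide the input matrix into submatrices, factor each one independently (in parallel), and combine the partial results into a single low-rank estimate. A minimal illustrative sketch of that pattern follows; the use of a truncated SVD as the base factorization and column projection as the combine step are assumptions for illustration, not the paper's exact algorithms.

```python
import numpy as np

def truncated_svd(A, rank):
    """Rank-r approximation of A via SVD (a stand-in base factorization)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank]

def dfc(M, rank, num_blocks):
    # Divide: partition the columns into blocks (each block could be
    # factored on a separate machine).
    blocks = np.array_split(M, num_blocks, axis=1)
    # Factor: compute a low-rank estimate of each block independently.
    estimates = [U @ np.diag(s) @ Vt
                 for U, s, Vt in (truncated_svd(B, rank) for B in blocks)]
    # Combine: project every block estimate onto the column space of
    # the first block's estimate, then stitch the columns back together.
    U0, _, _ = truncated_svd(estimates[0], rank)
    return U0 @ (U0.T @ np.hstack(estimates))

# Noiseless sanity check on an exactly rank-4 matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 4)) @ rng.standard_normal((4, 80))
M_hat = dfc(M, rank=4, num_blocks=4)
print(np.linalg.norm(M_hat - M))
```

In the noiseless, exactly low-rank case above, each block is recovered exactly and the column projection reassembles the full matrix, so the reconstruction error is near zero; the speed-up comes from factoring the (much smaller) blocks in parallel rather than the full matrix.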
MLconf 2013
Ameet Talwalkar
UCLA
Assistant Professor, Computer Science