Unsupervised Learning on Huge Data with Apache Spark: Unsupervised learning refers to a class of algorithms that try to find structure in unlabeled data. Spark’s MLlib module contains implementations of several unsupervised learning algorithms that scale to large datasets. In this talk, we’ll discuss how to use and implement large-scale machine learning algorithms with the Spark programming model, diving into MLlib’s K-means clustering and Principal Component Analysis (PCA).
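As a rough illustration of how K-means fits a map/reduce-style programming model, here is a minimal sketch in plain Python (standard library only). It is not MLlib's implementation and does not use Spark; the function names `closest`, `add_pairs`, `kmeans_step`, and `kmeans` are hypothetical and chosen for this example. Each iteration maps every point to its nearest centroid, reduces per cluster to a vector sum and a count, and then recomputes the centroids as means.

```python
from collections import defaultdict
from functools import reduce


def closest(point, centroids):
    """Index of the centroid nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def add_pairs(a, b):
    """Combine two (vector_sum, count) pairs, as a reduce-by-key function would."""
    return ([x + y for x, y in zip(a[0], b[0])], a[1] + b[1])


def kmeans_step(points, centroids):
    """One K-means iteration phrased as map -> reduce-by-key -> update."""
    # map: emit (cluster_index, (point, 1)) for every point
    assigned = [(closest(p, centroids), (list(p), 1)) for p in points]
    # reduce by key: group values per cluster, then sum vectors and counts
    groups = defaultdict(list)
    for key, value in assigned:
        groups[key].append(value)
    totals = {k: reduce(add_pairs, vs) for k, vs in groups.items()}
    # update: new centroid is the cluster mean; an empty cluster keeps its old centroid
    return [
        [s / totals[i][1] for s in totals[i][0]] if i in totals else list(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of iterations from the given starting centroids."""
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids
```

In a distributed setting the map and reduce-by-key steps would run in parallel over partitions of the data, with only the small per-cluster sums and counts sent back to update the centroids. The fixed iteration count and the caller-supplied starting centroids are simplifications made for this sketch.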
Session Summary
Unsupervised Learning on Huge Data with Apache Spark
MLconf 2014 Atlanta
Sandy Ryza
Cloudera
Software Engineer