Spark is a new cluster computing engine that is rapidly gaining popularity. It is one of the most active open source projects in big data, surpassing even Hadoop MapReduce in activity. Spark was designed both to make traditional MapReduce programming easier and to support new types of applications, with machine learning among its earliest focus areas. MLlib is a Spark subproject providing machine learning primitives. In this talk, we’ll demonstrate how to use Spark’s high-level API to implement scalable machine learning algorithms, and how MLlib integrates with other components of the Spark distribution (Streaming, SQL, and GraphX) to create practical machine learning pipelines. We’ll also show new features in the upcoming v1.0 release.
Session Summary
MLconf 2014 New York City
Xiangrui Meng
Databricks
Software Engineer