MLconf 2014 San Francisco was held on Friday, November 14, 2014 at the Parc 55 Wyndham in San Francisco. MLconf was created to host the thought leaders in Machine Learning and Data Science to discuss their most recent experience with applying techniques, tools, algorithms and methodologies to the seemingly impossible problems that occur when dealing with massive and noisy data. MLconf is independent of any outside company or university – it’s simply a conference organized to gather the Machine Learning communities in various cities to share knowledge and create an environment for the community to coalesce.
Steffen Rendle, Research Scientist, Google
Steffen Rendle is a research scientist at Google. Previous to this, he was an assistant professor at the University of Konstanz, Germany. Steffen’s research interest is in large-scale machine learning using factorization models. His research received the best paper award at WWW 2010 and a best student paper award at WSDM 2010. Steffen has applied his research in various machine learning competitions, receiving awards at the ECML Discovery Challenges 2009 & 2013, both tasks in KDDCup 2012 and other contests.
Factorization Machines: Developing accurate recommender systems for a specific problem setting seems to be a complicated and time-consuming task: models have to be defined, learning algorithms derived and implementations written. In this talk, I present the factorization machine (FM) model, a generic factorization approach that can be adapted to new problems through feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD and MCMC inference, including automatic hyperparameter selection. I will show on several tasks, including the Netflix prize and KDDCup 2012, that FMs are flexible and generate highly competitive accuracy. With FMs these results can be achieved by simple data preprocessing and without any tuning of regularization parameters or learning rates.
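To make the FM model concrete, here is a minimal sketch of the standard second-order factorization machine prediction for a single feature vector. It is not taken from the talk: the function names, the toy weights and the feature layout are illustrative assumptions. The first function uses the well-known reformulation that computes all pairwise interactions in time linear in the number of features; the second spells out the pairwise sum directly so the two can be compared.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM prediction, computed in O(k * n).

    x  : list of n feature values (e.g. one-hot user and item indicators)
    w0 : global bias
    w  : list of n linear weights
    V  : n x k nested list; row i is the latent factor vector of feature i
    """
    n = len(x)
    k = len(V[0]) if V else 0
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2) equals the sum
        # over all pairs i < j of v_if * v_jf * x_i * x_j.
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, written as an explicit sum over feature pairs (O(k * n^2))."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


# Hypothetical toy parameters: features 0-1 encode the user, 2-3 the item.
x = [1, 0, 1, 0]
w0 = 0.5
w = [0.1, 0.2, 0.3, 0.4]
V = [[1, 2], [0, 1], [3, 1], [1, 1]]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Adapting the model to a new problem then amounts to changing how `x` is built (adding context, time or side-information features) rather than deriving a new model.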
Tamara Kolda, Distinguished Member of Technical Staff, Sandia National Laboratories
Tamara Kolda is a Distinguished Member of Technical Staff at Sandia National Laboratories in Livermore, California, where she works on a broad range of problems including network modeling and analysis, multilinear algebra and tensor decompositions, data mining, and cybersecurity. She has also worked in optimization, nonlinear solvers, parallel computing, and the design of scientific software. She has authored numerous software packages, including the well-known Tensor Toolbox for MATLAB. Before joining Sandia, Kolda held the Householder Postdoctoral Fellowship in Scientific Computing at Oak Ridge National Laboratory. She has received several awards including a 2003 Presidential Early Career Award for Scientists and Engineers (PECASE), two best paper awards (ICDM’08 and SDM’13), and Distinguished Member of the Association for Computing Machinery (ACM). She is an elected member of the Society for Industrial and Applied Mathematics (SIAM) Board of Trustees, Section Editor for the Software and High Performance Computing section of the SIAM Journal on Scientific Computing, and Associate Editor for SIAM Journal on Matrix Analysis. She received her Ph.D. in applied mathematics from the University of Maryland at College Park in 1997.
Tensor Analysis for Networks and Sparse Data: Tensors are higher-order or n-way arrays. They have proven useful in a wide variety of data analysis tasks in applications ranging from chemometrics to sociology to neuroscience, and much more. We consider the utility of canonical polyadic (aka CANDECOMP or PARAFAC) tensor decompositions and briefly survey. Tensors are useful for analyzing large-scale networks with attributed connections. For instance, a time-evolving network can be naturally expressed as a third-order tensor. We explore the applicability of tensor analysis, its connection to matrix-based methods, different statistical assumptions and corresponding optimization objective functions, and how to efficiently handle sparse data. We illustrate the utility of tensor decompositions with several examples.
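As a small illustration of what a canonical polyadic (CP) model expresses, the sketch below rebuilds a third-order tensor from three factor matrices: each entry is a sum over rank-one components of the product of one factor value per mode. For a time-evolving network the three modes could be source node, target node and time step. The function name and the toy factors are assumptions made for this example; fitting the factors (for instance by alternating least squares) is not shown.

```python
def cp_reconstruct(A, B, C):
    """Rebuild X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r].

    A, B, C are factor matrices (nested lists) with the same number of
    columns R, one matrix per tensor mode.
    """
    rank = len(A[0])
    return [
        [
            [sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
             for k in range(len(C))]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


# Hypothetical rank-2 factors for a 2 x 2 x 2 tensor.
A = [[1, 0], [0, 1]]   # mode 1, e.g. source nodes
B = [[1, 2], [3, 4]]   # mode 2, e.g. target nodes
C = [[1, 1], [2, 0]]   # mode 3, e.g. time steps
X = cp_reconstruct(A, B, C)
print(X)
```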
Xavier Amatriain, Director of Algorithms Engineering, Netflix
Xavier Amatriain (PhD) is Director of Algorithms Engineering at Netflix. He leads a team of researchers and engineers designing the next wave of machine learning approaches to power the Netflix product. Previous to this, he was a Researcher in Recommender Systems, and neighboring areas such as Data Mining, Machine Learning, Information Retrieval, and Multimedia. He has authored more than 50 papers including book chapters, journals, and articles in international conferences. He has also lectured in different universities including the University of California Santa Barbara and UPF in Barcelona, Spain.
10 Lessons Learned from building real-life large-scale ML systems: There are many good textbooks and courses where you can be introduced to machine learning and maybe even learn some of the most intricate details about a particular approach or algorithm. While understanding that theory is a very important base and starting point, there are many other practical issues related to building real-life ML systems that you don’t usually hear about. In this talk I will share some of the most important lessons learned in years of building the large-scale ML solutions that power the Netflix product and scale to millions of users across many countries. I will discuss issues such as model and feature complexity, sampling, regularization, distributing/parallelizing algorithms, or how to think about offline vs. online computation.
Lise Getoor, Professor, Computer Science, UC Santa Cruz
Lise Getoor is a professor in the Computer Science Department at UC Santa Cruz. Her research areas include machine learning and reasoning under uncertainty; in addition she works in data management, visual analytics and social network analysis. She has over 200 publications and extensive experience with machine learning and probabilistic modeling methods for graph and network data. She is a Fellow of the Association for the Advancement of Artificial Intelligence, an elected board member of the International Machine Learning Society, has served as Machine Learning Journal Action Editor, Associate Editor for the ACM Transactions on Knowledge Discovery from Data, JAIR Associate Editor, and she has served on the AAAI Council. She was co-chair for ICML 2011, and has served on the PC of many conferences including the senior PC of AAAI, ICML, KDD, UAI, WSDM and the PC of SIGMOD, VLDB, and WWW. She is a recipient of an NSF Career Award and eight best paper and best student paper awards. She was recently recognized as one of the top ten emerging research leaders in data mining and data science based on citation and impact, according to KDnuggets. She received her PhD from Stanford University in 2001, her MS from UC Berkeley, and her BS from UC Santa Barbara, and was a professor at the University of Maryland, College Park from 2001-2013.
Big Graph Data Science: One of the challenges in big data analytics lies in being able to reason collectively about extremely large, heterogeneous, incomplete, noisy interlinked data. We need data science techniques which can represent and reason effectively with this form of rich and multi-relational graph data. In this presentation, I will describe some common collective inference patterns needed for graph data including: collective classification (predicting missing labels for nodes in a network), link prediction (predicting potential edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will describe three key capabilities required: relational feature construction, collective inference, and scaling. Finally, I briefly describe some of the cutting edge analytic tools being developed within the machine learning, AI, and database communities to address these challenges.
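For readers new to link prediction, the following sketch shows about the simplest relational feature one can construct: the number of neighbors two unconnected nodes share. It is a common baseline, not the collective inference approach described in the talk, and the function name and example graph are assumptions made for illustration.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score each non-adjacent node pair by its number of shared neighbors.

    edges: iterable of undirected (u, v) pairs.
    Returns a dict mapping (u, v) with u < v to a positive integer score.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # the edge already exists, nothing to predict
        shared = len(neighbors[u] & neighbors[v])
        if shared:
            scores[(u, v)] = shared
    return scores


# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbor_scores(edges)
print(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
```

Higher-scoring pairs are the more plausible candidate edges; a collective method would additionally let predictions for one pair influence predictions for related pairs.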
Lorien Pratt, Cofounder/Chief Scientist, Quantellia
Pratt is co-founder and chief scientist of Mountain View-based Quantellia, which offers data, analytics, and decision intelligence software and services worldwide. Pratt previously served as global director of telecommunications research for Stratecast (a division of Frost & Sullivan) and also worked at Bellcore and IBM. A graduate of Dartmouth College and Rutgers University, she holds three degrees in computer science, and served on the computer science faculty at the Colorado School of Mines. A recipient of the CAREER award from the National Science Foundation, and the author of dozens of technical papers and articles, Pratt is also a well-known speaker, author, and co-editor (with Sebastian Thrun) of the book Learning to Learn.
Decision Intelligence is an emerging discipline that combines machine learning, complex systems, predictive analytics, causal reasoning, optimization, and more into a unified framework that overcomes limitations of the current data stack that are faced by organizations worldwide. Just as the Unified Modeling Language (UML), along with associated tool companies like Rational, brought the discipline of design to software development, decision intelligence is a methodology, supported by software, that overcomes a number of barriers that have limited the practical use cases of the analytic / data stack. In particular, Decision Intelligence brings engineering practices to decision making, treating the “decision” as an engineered artifact. This means that best practices from design, agile development, and more can now be used to evolve decisions over time, creating a continuous “organizational learning” framework in diverse settings such as the US government and transnational corporations.
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel
Ted Willke leads the Graph Analytics Operation within Intel’s Datacenter Group, which designs, develops, and deploys enterprise software for distributed parallel machine learning and data mining. He developed his expertise in datacenter systems over his 16 years with Intel. He has researched cluster computing technologies in Intel Labs and developed server technologies and standards within Intel’s product organizations. His work covers high-performance I/O, virtualization, next-gen microservers, Hadoop optimization tools, and open source libraries for distributed parallel computing. Ted holds a Doctorate in electrical engineering from Columbia University, where he graduated with Distinction. He has authored over 25 papers in book chapters, journals, and conferences, and he holds 10 patents. He won the MASCOTS Best Paper Award in 2013 for his work on Hadoop MapReduce performance modeling and an Intel Achievement Award this year for his work on graph processing systems.
How graphs became just another big data primitive: Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. And a growing number of these applications will benefit from parallel distributed processing for graph feature engineering, model training, and model serving. But today’s graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we’ll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We’ll examine some practical data science workflows to further motivate this argument and we’ll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
Oscar Celma, Director of Research, Pandora
Òscar Celma is currently Director of Research at Pandora, where he leads a team of scientists to provide the best personalized radio experience. From 2011 till 2014 Òscar was Senior Research Scientist at Gracenote. His work focused on music and video recommendation and discovery. Before that he was co-founder and Chief Innovation Officer at Barcelona Music and Audio Technologies (BMAT). Òscar published a book named “Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play in the Digital Music Space” (Springer, 2010). In 2008, Òscar obtained his Ph.D. in Computer Science and Digital Communication from Pompeu Fabra University (Barcelona, Spain). He holds a few patents from his work on music discovery as well as on Vocaloid, a singing voice-synthesizer bought by Yamaha in 2004.
Pandora internet radio is best known for the Music Genome Project, the most unique and richly labeled music catalog of 1.5 million+ tracks. While this content-based approach to music recommendation is extremely effective and still used today as the foundation of the leading online radio service, Pandora has also collected more than a decade of contextual listener feedback in the form of more than 45 billion thumbs from 76M+ monthly active users who have created more than 6 billion stations. This session will look at how the interdisciplinary team at Pandora goes about making sense of these massive data sets to successfully make large scale music recommendations to the masses.
Following this session the audience will have an in-depth understanding of how Pandora uses Big Data science to determine the perfect balance of familiarity, discovery, repetition and relevance for each individual listener, how it measures and evaluates user satisfaction, and how our online and offline architecture stack plays a critical role in our success.
Quoc Le, Software Engineer, Google
Quoc Le is a software engineer at Google and will become an assistant professor at Carnegie Mellon University in Fall 2014. At Google, Quoc works on large scale brain simulation using unsupervised feature learning and deep learning. His work focuses on object recognition, speech recognition and language understanding. Quoc obtained his PhD at Stanford, undergraduate degree with First Class Honours and Distinguished Scholar at the Australian National University, and was a researcher at National ICT Australia, Microsoft Research and Max Planck Institute of Biological Cybernetics. Quoc won a best paper award at ECML 2007.
Deep Learning for Language Understanding: Many current language understanding algorithms rely on expert knowledge to engineer models and features. In this talk, I will discuss how to use Deep Learning to understand texts without much prior knowledge. In particular, our algorithms will learn the vector representations of words. These vector representations can be used to solve word analogy or translate unknown words between languages. Our algorithms also learn vector representations of sentences and documents. These vector representations preserve the semantics of sentences and documents and therefore can be used for machine translation, text classification, information retrieval and sentiment analysis.
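The word analogy task mentioned above can be sketched with simple vector arithmetic: to answer "a is to b as c is to ?", compute b - a + c and return the nearest remaining word by cosine similarity. The three-dimensional vectors below are hand-made toy values chosen for this example, not learned representations, and the function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (word for word in vectors if word not in (a, b, c))
    return max(candidates, key=lambda word: cosine(vectors[word], target))


# Hand-made toy vectors; dimensions loosely mean (royalty, male, female).
TOY_VECTORS = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}
print(analogy(TOY_VECTORS, "man", "king", "woman"))
```

With learned embeddings the same arithmetic is applied to vectors of a few hundred dimensions over a large vocabulary.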
Arno Candel, Physicist & Hacker, 0xData
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
Distributed Deep Learning for Classification and Regression problems using H2O: Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Ameet Talwalkar, assistant professor of Computer Science, UCLA
Ameet Talwalkar is an assistant professor of Computer Science at UCLA and a technical advisor for Databricks. His research addresses scalability and ease-of-use issues in the field of statistical machine learning, with applications in computational genomics. He started the MLlib project in Apache Spark and is a co-author of the graduate-level textbook ‘Foundations of Machine Learning’ (2012, MIT Press). Prior to UCLA, he was an NSF post-doctoral fellow in the AMPLab at UC Berkeley. He obtained a B.S. from Yale University and a Ph.D. from the Courant Institute at NYU.
Model search at scale: Apache Spark’s MLlib is a terrific library for fitting large-scale machine learning models. However, translating high-level problem statements like “learn a classifier” into a working model presently requires significant manual effort (via ad hoc parameter tuning) and computational resources (to fit several models). We present our work on the MLbase optimizer – a system designed on top of Spark to quickly and automatically search through a hyperparameter space and find a good model. By leveraging performance enhancements, better search algorithms, and statistical heuristics, our system offers an order of magnitude speedup over standard methods.
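As a point of reference for what "searching a hyperparameter space" means, here is a minimal random search baseline. It is not the MLbase optimizer and does not use Spark; the function names, the search space and the toy loss (a stand-in for training a model and measuring validation error) are all assumptions made for this sketch.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials configurations uniformly and keep the best one.

    objective: callable taking a dict of parameters, returning a loss
    space    : dict mapping parameter name to a (low, high) range
    """
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_validation_loss(params):
    """Hypothetical stand-in for 'fit a model, return validation loss'."""
    return (params["log10_lr"] + 2.0) ** 2 + (params["log10_reg"] + 4.0) ** 2


# Hypothetical search space over log10 of learning rate and regularization.
space = {"log10_lr": (-5.0, 0.0), "log10_reg": (-6.0, -1.0)}
best_params, best_loss = random_search(toy_validation_loss, space, n_trials=200)
print(best_params, best_loss)
```

Every trial here costs one full model fit, which is the expense that smarter search strategies and early stopping heuristics aim to reduce.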
Johann Schleier-Smith, Co-Founder and CTO, if(we)
Johann Schleier-Smith is Co-Founder and CTO at if(we), the social network for meeting new people. Under Johann’s leadership, if(we) has produced highly scalable web and mobile products with its platform supporting 300 million users in over 200 countries. With an interest in machine learning, data science, analytics and software development and a passion for recommender systems, he works closely with teams to solve hard science problems, while meeting the trends of 21st century social life, adapting cutting-edge academic work to internet-size and internet-speed applications. Johann holds an A.B. in Physics and Mathematics from Harvard University and pursued a Ph.D. in Physics at Stanford for several years, before leaving to fully focus on if(we).
Agile Machine Learning for Recommender Systems: What can data scientists and machine learning engineers learn from software developers? When it comes to process and tools, and managing complexity, the answer is: quite a bit. When we first started to deploy machine learning at if(we), it felt like we hit a speed bump in the middle of the highway. Accustomed to shipping software to millions of members multiple times a day, to constantly iterating toward better products, we were stunned at how long it took us to try new ideas using available machine learning tools. I will share what we’ve learned from applying agile software development principles to building recommender systems, describing the tools and platforms that allow us to go from new ideas to proven product improvements in just a few days.
Andy Feng, Distinguished Architect, Yahoo
Andy Feng is a Distinguished Architect at Yahoo leading the architecture and design of nextgen Big Data platforms as well as machine learning initiatives. He is a PPMC member and committer of the Apache Storm project and a contributor to the Apache Spark project. He served as a track chair and program committee member at Hadoop Summit and Spark Summit in both 2013 and 2014. At Yahoo, he has also architected major platforms for personalization, ads serving, NoSQL, serving containers and messaging infrastructure. Prior to Yahoo, Andy served as Chief Architect at Netscape/AOL and Principal Scientist at Xerox.
Scalable Machine Learning at Yahoo: Yahoo scientists have developed a variety of machine learning libraries (supervised learning, unsupervised learning, deep learning) for online search, advertising and personalization. The emerging business needs require us to address two problems:
Can we apply these libraries against massive datasets (billions of training examples, and millions of features) using commodity hardware clusters? Can we reduce the learning time from days to minutes or seconds? We have thus examined system architecture options (including Hadoop, Spark and Storm), and developed a fault-tolerant MPI solution that allows hundreds of machines to jointly build a model. We are collaborating with the open source community on a better system architecture for next-gen machine learning applications. Yahoo ML libraries are being revised for much better scalability and latency. In the talk, we will share the system architecture of our ML platform and its use cases.
Ted Dunning, Chief Application Architect, MapR
Ted Dunning is Chief Application Architect at MapR and has held Chief Scientist positions at Veoh Networks, ID Analytics and at MusicMatch (now Yahoo Music). Ted is responsible for building the world’s most advanced identity theft detection system, as well as one of the largest peer-assisted video distribution systems and ground-breaking music and video recommendation systems. Ted has 24 issued and numerous pending patents and contributes to Apache Mahout, Zookeeper and Drill™. He is also a mentor for Apache Spark, Storm, DataFu and Stratosphere.
Near real-time Updates for Cooccurrence-based Recommenders: Most recommendation algorithms are inherently batch oriented and require all relevant history to be processed. In some contexts such as music, this does not cause significant problems because waiting a day or three before recommendations are available for new items doesn’t significantly change their impact. In other contexts, the value of items drops precipitously with time so that recommending day-old items has little value to users.
In this talk, I will describe how a large-scale multi-modal cooccurrence recommender can be extended to include near real-time updates. In addition, I will show how these real-time updates are compatible with delivery of recommendations via search engines.
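The contrast between batch and near real-time operation can be sketched with a toy item cooccurrence recommender whose counts are updated one interaction at a time, so a new item becomes recommendable as soon as someone has interacted with it. This is a deliberately simplified, single-machine illustration using raw counts; it is not the system described in the talk, it omits statistical filtering of cooccurrences and search-engine delivery, and the class and method names are assumptions.

```python
from collections import defaultdict


class CooccurrenceRecommender:
    """Keeps item-item cooccurrence counts and updates them incrementally."""

    def __init__(self):
        self.user_items = defaultdict(set)
        self.cooc = defaultdict(lambda: defaultdict(int))

    def add_interaction(self, user, item):
        """Record one (user, item) event and update counts immediately."""
        history = self.user_items[user]
        if item in history:
            return  # repeat events are ignored in this sketch
        for other in history:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        history.add(item)

    def recommend(self, user, top_n=3):
        """Rank unseen items by total cooccurrence with the user's history."""
        seen = self.user_items.get(user, set())
        scores = defaultdict(int)
        for item in seen:
            for other, count in self.cooc[item].items():
                if other not in seen:
                    scores[other] += count
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:top_n]]


# Hypothetical event stream.
rec = CooccurrenceRecommender()
for user, item in [("u1", "a"), ("u1", "b"), ("u2", "a"),
                   ("u2", "b"), ("u2", "c"), ("u3", "a")]:
    rec.add_interaction(user, item)
before = rec.recommend("u3")

# A brand-new item "d" arrives and is usable right after its first event.
rec.add_interaction("u4", "a")
rec.add_interaction("u4", "d")
after = rec.recommend("u3")
print(before, after)
```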
Anthony Bak, Principal Data Scientist and Mathematician, Ayasdi
Anthony Bak is a principal research scientist at Ayasdi where he designs machine learning and analytic solutions. Prior to Ayasdi he was a postdoc with Ayasdi co-founder Gunnar Carlsson in the Stanford University Mathematics Department. His PhD is on connections between algebraic geometry and string theory.
Topological Learning with Ayasdi: Ayasdi has a unique approach to machine learning and data analysis using topology. This framework represents a revolutionary way to look at and understand data that is orthogonal but complementary to traditional machine learning and statistical tools. In this presentation I will show you what is meant by this statement: How does topology help with data analysis? Why would you use topology? I will illustrate with both synthetic examples and problems we’ve solved for our clients.
Scott Clark, Software Engineer, Yelp
Since finishing my PhD in Applied Mathematics at Cornell University in 2012, I have been working on the Ad Targeting team at Yelp Inc. I’ve been applying a variety of machine learning and optimization techniques, from multi-armed bandits to Bayesian Global Optimization and beyond, to their vast dataset and problems. I have also been trying to lead the charge on academic research and outreach within Yelp by leading projects like the Yelp Dataset Challenge and open sourcing MOE.
Introducing the Metric Optimization Engine (MOE), an open source, black box, Bayesian Global Optimization engine for optimal experimental design: In this talk we will introduce MOE, the Metric Optimization Engine. MOE is an efficient way to optimize a system’s parameters when evaluating parameters is time-consuming or expensive. It can be used to help tackle a myriad of problems including optimizing a system’s click-through or conversion rate via A/B testing, tuning parameters of a machine learning prediction method or expensive batch job, designing an engineering system or finding the optimal parameters of a real-world experiment.
MOE is ideal for problems in which the optimization problem’s objective function is a black box, not necessarily convex or concave, derivatives are unavailable, and we seek a global optimum, rather than just a local one. This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access. To use MOE, we simply need to specify some objective function, some set of parameters, and any historical data we may have from previous evaluations of the objective function. MOE then finds the set of parameters that maximizes (or minimizes) the objective function, while evaluating the objective function as few times as possible. This is done internally using Bayesian Global Optimization on a Gaussian Process model of the underlying system and finding the points of highest Expected Improvement to sample next. MOE provides easy-to-use Python, C++, CUDA and REST interfaces to accomplish these goals and is fully open source. We will present the motivation and background, discuss the implementation and give real-world examples.
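The Expected Improvement criterion has a closed form when the model's prediction at a candidate point is Gaussian. The sketch below computes it for a maximization problem and picks the best of a few candidates. It does not use MOE's API: the function names are assumptions, and the posterior means and standard deviations are made-up numbers standing in for what a fitted Gaussian Process would supply.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """Closed-form EI for maximization under a Gaussian prediction.

    mu, sigma  : predicted mean and standard deviation at a candidate point
    best_so_far: best objective value observed so far
    """
    if sigma <= 0.0:
        return max(mu - best_so_far, 0.0)  # no uncertainty left
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best_so_far) * cdf + sigma * pdf


def pick_next(candidates, best_so_far):
    """Return the index of the (mu, sigma) candidate with the highest EI."""
    return max(
        range(len(candidates)),
        key=lambda i: expected_improvement(*candidates[i], best_so_far),
    )


# Hypothetical predictions: a safe point near the current best and an
# uncertain point with a slightly lower mean.
candidates = [(1.0, 0.1), (0.9, 1.0)]
next_index = pick_next(candidates, best_so_far=1.0)
print(next_index)
```

In this example the more uncertain candidate wins, which shows how the criterion trades off exploiting known good regions against exploring poorly understood ones.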