MLconf 2014 Parc 55 Wyndham, San Francisco. Friday, November 14th
MLconf was created to host the thought leaders in Machine Learning and Data Science to discuss their most recent experience with applying techniques, tools, algorithms and methodologies to the seemingly impossible problems that occur when dealing with massive and noisy data. MLconf is independent of any outside company or university – it’s simply a conference organized to gather the Machine Learning communities in various cities to share knowledge and create an environment for the community to coalesce.
Andy Feng, Distinguished Architect, Yahoo
Abstract: Scalable Machine Learning at Yahoo
Yahoo scientists have developed a variety of machine learning libraries (supervised learning, unsupervised learning, deep learning) for online search, advertising and personalization. Emerging business needs require us to address two problems:
- Can we apply these libraries against massive datasets (billions of training examples, and millions of features) using commodity hardware clusters?
- Can we reduce the learning time from days to minutes or seconds?
We have thus examined system architecture options (including Hadoop, Spark and Storm), and developed a fault-tolerant MPI solution that allows hundreds of machines to jointly build a model. We are collaborating with the open source community on a better system architecture for next-gen machine learning applications. Yahoo ML libraries are being revised for much better scalability and latency. In the talk, we will share the system architecture of our ML platform and its use cases.
Andy Feng is a Distinguished Architect at Yahoo leading the architecture and design of next-gen Big Data platforms as well as machine learning initiatives. He is a PPMC member and committer of the Apache Storm project and a contributor to the Apache Spark project. He served as a track chair and program committee member at Hadoop Summit and Spark Summit in both 2013 and 2014. At Yahoo, he has also architected major platforms for personalization, ads serving, NoSQL, serving containers and messaging infrastructure. Prior to Yahoo, Andy served as Chief Architect at Netscape/AOL and Principal Scientist at Xerox.
Oscar Celma, Director of Research, Pandora
Òscar Celma is currently Director of Research at Pandora, where he leads a team of scientists to provide the best personalized radio experience. From 2011 to 2014 Òscar was Senior Research Scientist at Gracenote. His work focused on music and video recommendation and discovery. Before that he was co-founder and Chief Innovation Officer at Barcelona Music and Audio Technologies (BMAT). Òscar published a book named “Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play in the Digital Music Space” (Springer, 2010). In 2008, Òscar obtained his Ph.D. in Computer Science and Digital Communication from Pompeu Fabra University (Barcelona, Spain). He holds a few patents from his work on music discovery as well as on Vocaloid, a singing voice-synthesizer bought by Yamaha in 2004.
Xavier Amatriain, Director of Algorithms Engineering, Netflix
Xavier Amatriain (PhD) is Director of Algorithms Engineering at Netflix. He leads a team of researchers and engineers designing the next wave of machine learning approaches to power the Netflix product. Previous to this, he was a Researcher in Recommender Systems, and neighboring areas such as Data Mining, Machine Learning, Information Retrieval, and Multimedia. He has authored more than 50 papers including book chapters, journals, and articles in international conferences. He has also lectured in different universities including the University of California Santa Barbara and UPF in Barcelona, Spain.
Steffen Rendle, Research Scientist, Google
Title: Factorization Machines
Steffen Rendle is a research scientist at Google. Previous to this, he was an assistant professor at the University of Konstanz, Germany. Steffen’s research interest is in large-scale machine learning using factorization models. His research received the best paper award at WWW 2010 and a best student paper award at WSDM 2010. Steffen has applied his research in various machine learning competitions, receiving awards at the ECML Discovery Challenges 2009 & 2013, both tasks in KDDCup 2012 and other contests.
Arno Candel, Physicist & Hacker, 0xData
Title: Distributed Deep Learning for Classification and Regression problems using H2O
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
Johann Schleier-Smith, Co-Founder and CTO, Tagged
Abstract: Agile Machine Learning for Recommender Systems
What can data scientists and machine learning engineers learn from software developers? When it comes to process and tools, and managing complexity, the answer is: quite a bit. When we first started to deploy machine learning at Tagged, it felt like we hit a speed bump in the middle of the highway. Accustomed to shipping software to millions of members multiple times a day, to constantly iterating toward better products, we were stunned at how long it took us to try new ideas using available machine learning tools. I will share what we’ve learned from applying agile software development principles to building recommender systems, describing the tools and platforms that allow us to go from new ideas to proven product improvements in just a few days.
Johann Schleier-Smith is Co-Founder and CTO at Tagged, the social network for meeting new people. Under Johann’s leadership, Tagged has produced highly scalable web and mobile products with its platform supporting 300 million users in over 200 countries. With an interest in machine learning, data science, analytics and software development and a passion for recommender systems, he works closely with teams to solve hard science problems, while meeting the trends of 21st century social life, adapting cutting-edge academic work to internet-size and internet-speed applications. Johann holds an A.B. in Physics and Mathematics from Harvard University and pursued a Ph.D. in Physics at Stanford for several years, before leaving to fully focus on Tagged.
Quoc Le, Software Engineer, Google
Quoc Le is a software engineer at Google and will become an assistant professor at Carnegie Mellon University in Fall 2014. At Google, Quoc works on large scale brain simulation using unsupervised feature learning and deep learning. His work focuses on object recognition, speech recognition and language understanding. Quoc obtained his PhD at Stanford, undergraduate degree with First Class Honours and Distinguished Scholar at the Australian National University, and was a researcher at National ICT Australia, Microsoft Research and Max Planck Institute of Biological Cybernetics. Quoc won a best paper award at ECML 2007.
Lorien Pratt, Cofounder/Chief Scientist, Quantellia
Decision Intelligence is an emerging discipline that combines machine learning, complex systems, predictive analytics, causal reasoning, optimization, and more into a unified framework that overcomes limitations of the current data stack that are faced by organizations worldwide. Just as the Unified Modeling Language (UML), along with associated tool companies like Rational, brought the discipline of design to software development, decision intelligence is a methodology, supported by software, that overcomes a number of barriers that have limited the practical use cases of the analytic / data stack. In particular, Decision Intelligence brings engineering practices to decision making, treating the “decision” as an engineered artifact. This means that best practices from design, agile development, and more can now be used to evolve decisions over time, creating a continuous “organizational learning” framework in diverse settings such as the US government and transnational corporations.
Pratt is co-founder and chief scientist of Mountain View-based Quantellia, which offers data, analytics, and decision intelligence software and services worldwide. Pratt previously served as global director of telecommunications research for Stratecast (a division of Frost & Sullivan) and also worked at Bellcore and IBM. A graduate of Dartmouth College and Rutgers University, she holds three degrees in computer science, and served on the computer science faculty at the Colorado School of Mines. A recipient of the CAREER award from the National Science Foundation, and the author of dozens of technical papers and articles, Pratt is also a well-known speaker, author, and co-editor (with Sebastian Thrun) of the book Learning to Learn.
Pinar Donmez, Chief Data Scientist, Kabbage
Pinar is a data scientist with a keen focus on finding meaning in data and turning the insights into data-driven businesses. As the Chief Data Scientist at Kabbage, she is passionately transforming raw data into valuable knowledge to improve underwriting for SMBs and help them succeed. She is leading a world-class team of data scientists in turning unusually rich data sources on how a business operates into predictive systems that can determine the risk, capacity, and character of the business. Kabbage uses this data-driven technology to lend over $1 million per day. Prior to Kabbage, she applied her machine learning skills to attrition prediction and sales propensity estimation at Salesforce’s Data and Analytics team, and user intention understanding through her work at Yahoo! Labs which led to patents and numerous publications. Pinar holds a Ph.D. in Computer Science from CMU, where her main interest lay in machine learning and its applications to active and unsupervised learning. She has published numerous articles in top-tier peer-reviewed ML journals and conferences such as JMLR, ICML and KDD, to name a few.
Anthony Bak, Principal Data Scientist and Mathematician, Ayasdi
Abstract: Topological Learning with Ayasdi
Ayasdi has a unique approach to machine learning and data analysis using topology. This framework represents a revolutionary way to look at and understand data that is orthogonal but complementary to traditional machine learning and statistical tools. In this presentation I will show you what is meant by this statement: How does topology help with data analysis? Why would you use topology? I will illustrate with both synthetic examples and problems we’ve solved for our clients.
Anthony Bak is a principal research scientist at Ayasdi where he designs machine learning and analytic solutions. Prior to Ayasdi he was a postdoc with Ayasdi co-founder Gunnar Carlsson in the Stanford University Mathematics Department. His PhD is on connections between algebraic geometry and string theory.
Scott Clark, Software Engineer, Yelp
Abstract: Introducing the Metric Optimization Engine (MOE), an open source, black box, Bayesian Global Optimization engine for optimal experimental design.
In this talk we will introduce MOE, the Metric Optimization Engine. MOE is an efficient way to optimize a system’s parameters, when evaluating parameters is time-consuming or expensive. It can be used to help tackle a myriad of problems including optimizing a system’s click-through or conversion rate via A/B testing, tuning parameters of a machine learning prediction method or expensive batch job, designing an engineering system or finding the optimal parameters of a real-world experiment.
MOE is ideal for problems in which the optimization problem’s objective function is a black box, not necessarily convex or concave, derivatives are unavailable, and we seek a global optimum, rather than just a local one. This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access. To use MOE, we simply need to specify some objective function, some set of parameters, and any historical data we may have from previous evaluations of the objective function. MOE then finds the set of parameters that maximize (or minimize) the objective function, while evaluating the objective function as few times as possible. This is done internally using Bayesian Global Optimization on a Gaussian Process model of the underlying system and finding the points of highest Expected Improvement to sample next. MOE provides easy to use Python, C++, CUDA and REST interfaces to accomplish these goals and is fully open source. We will present the motivation and background, discuss the implementation and give real-world examples.
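The Expected Improvement criterion mentioned above can be illustrated with a minimal sketch. It assumes that a Gaussian Process has already produced a posterior mean and standard deviation for each candidate parameter setting, and that the objective is being maximized. The function names and the candidate values are hypothetical and are not taken from MOE's actual interfaces.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """Expected Improvement (maximization) for a posterior N(mean, std^2)."""
    if std <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)


def next_point_to_sample(candidates, best_so_far):
    """Pick the candidate with the highest Expected Improvement.

    candidates: list of (params, posterior_mean, posterior_std) tuples.
    """
    best = max(
        candidates,
        key=lambda c: expected_improvement(c[1], c[2], best_so_far),
    )
    return best[0]


# Hypothetical posterior values: "a" is well explored, "b" is uncertain.
candidates = [("a", 0.9, 0.01), ("b", 0.8, 0.5)]
chosen = next_point_to_sample(candidates, best_so_far=1.0)  # picks "b"
```

In this toy case the uncertain candidate wins even though its mean is lower, which is the exploration behavior that lets the method search for a global rather than a local optimum.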
Since finishing my PhD in Applied Mathematics at Cornell University in 2012, I have been working on the Ad Targeting team at Yelp Inc. I’ve been applying a variety of machine learning and optimization techniques, from multi-armed bandits to Bayesian Global Optimization and beyond, to Yelp’s vast dataset and problems. I have also been trying to lead the charge on academic research and outreach within Yelp by leading projects like the Yelp Dataset Challenge and open sourcing MOE.
Ameet Talwalkar, assistant professor of Computer Science, UCLA
Ameet Talwalkar is an assistant professor of Computer Science at UCLA and a technical advisor for Databricks. His research addresses scalability and ease-of-use issues in the field of statistical machine learning, with applications in computational genomics. He started the MLlib project in Apache Spark and is a co-author of the graduate-level textbook ‘Foundations of Machine Learning’ (2012, MIT Press). Prior to UCLA, he was an NSF post-doctoral fellow in the AMPLab at UC Berkeley. He obtained a B.S. from Yale University and a Ph.D. from the Courant Institute at NYU.
Ted Dunning, Chief Application Architect, MapR
Ted Dunning is Chief Application Architect at MapR and has held Chief Scientist positions at Veoh Networks, ID Analytics and MusicMatch (now Yahoo Music). Ted is responsible for building the world’s most advanced identity theft detection system, as well as one of the largest peer-assisted video distribution systems and ground-breaking music and video recommendation systems. Ted has 24 issued and numerous pending patents and contributes to Apache Mahout, Zookeeper and Drill™. He is also a mentor for Apache Spark, Storm, DataFu and Stratosphere.
Tamara Kolda, Distinguished Member of Technical Staff, Sandia National Laboratories
Title: Tensor Analysis for Networks and Sparse Data
Tensors are higher-order or n-way arrays. They have proven useful in a wide variety of data analysis tasks in applications ranging from chemometrics to sociology to neuroscience, and much more. We consider the utility of canonical polyadic (aka CANDECOMP or PARAFAC) tensor decompositions and briefly survey. Tensors are useful for analyzing large-scale networks with attributed connections. For instance, a time-evolving network can be naturally expressed as a third-order tensor. We explore the applicability of tensor analysis, its connection to matrix-based methods, different statistical assumptions and corresponding optimization objective functions, and how to efficiently handle sparse data. We illustrate the utility of tensor decompositions with several examples.
Tamara Kolda is a Distinguished Member of Technical Staff at Sandia National Laboratories in Livermore, California, where she works on a broad range of problems including network modeling and analysis, multilinear algebra and tensor decompositions, data mining, and cybersecurity. She has also worked in optimization, nonlinear solvers, parallel computing, and the design of scientific software. She has authored numerous software packages, including the well-known Tensor Toolbox for MATLAB. Before joining Sandia, Kolda held the Householder Postdoctoral Fellowship in Scientific Computing at Oak Ridge National Laboratory. She has received several awards including a 2003 Presidential Early Career Award for Scientists and Engineers (PECASE), two best paper awards (ICDM’08 and SDM’13), and Distinguished Member of the Association for Computing Machinery (ACM). She is an elected member of the Society for Industrial and Applied Mathematics (SIAM) Board of Trustees, Section Editor for the Software and High Performance Computing section of the SIAM Journal on Scientific Computing, and Associate Editor for SIAM Journal on Matrix Analysis. She received her Ph.D. in applied mathematics from the University of Maryland at College Park in 1997.
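As a rough illustration of the third-order tensor view of a time-evolving network, the sketch below stores timestamped edges in a sparse coordinate form and evaluates one entry of a canonical polyadic (CP) model. The function names, the edge list, and the factor matrices are all made up for illustration; this is not the Tensor Toolbox API or the speaker's code.

```python
from collections import defaultdict


def build_sparse_tensor(edges):
    """Build a sparse third-order tensor from (source, target, time, weight).

    Only nonzero entries are stored, keyed by (i, j, t), so memory scales
    with the number of observed edges rather than nodes * nodes * steps.
    """
    tensor = defaultdict(float)
    for i, j, t, w in edges:
        tensor[(i, j, t)] += w  # repeated edges at the same time accumulate
    return dict(tensor)


def time_slice(tensor, t):
    """Return the adjacency entries {(i, j): weight} for one time step."""
    return {(i, j): w for (i, j, k), w in tensor.items() if k == t}


def cp_entry(A, B, C, i, j, t):
    """Value of a rank-R CP model at (i, j, t): sum_r A[i][r]*B[j][r]*C[t][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[t]))


# Hypothetical network: 3 nodes observed over 2 time steps.
edges = [(0, 1, 0, 1.0), (1, 2, 0, 1.0), (0, 1, 1, 2.0), (0, 1, 1, 1.0)]
tensor = build_sparse_tensor(edges)
first_step = time_slice(tensor, 0)

# Hypothetical rank-2 factor matrices (rows index nodes or time steps).
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[1.0, 1.0], [2.0, 0.0]]
value = cp_entry(A, B, C, 0, 1, 1)
```

Each time slice of the tensor is an ordinary adjacency matrix, which is one way to see the connection to matrix-based methods; a CP decomposition would fit the factor matrices so that `cp_entry` approximates the stored entries.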
Ted Willke, Senior Principal Engineer and Director, Datacenter Software Division, Intel
Ted Willke is the Director of Architecture & Technology for the Datacenter Software Division (DSD) at Intel Corporation. As the division’s “chief technology officer,” Ted is responsible for driving innovation into its big data, cloud computing, and HPC software products. Ted joined DSD this year after transforming his Intel Labs research on large-scale machine learning and data mining systems into a commercial operation. Prior to joining Intel Labs in 2010, Ted spent 12 years developing server I/O technologies and standards within Intel’s other product organizations. He holds a Doctorate in electrical engineering from Columbia University, where he graduated with Distinction. He has authored over 25 papers in book chapters, journals, and conferences, and he holds 10 patents. He won the MASCOTS Best Paper Award in 2013 for his work on Hadoop MapReduce performance modeling and an Intel Achievement Award this year for his work on graph processing systems.