Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon University
Alex Smola is a Professor in the Machine Learning Department of Carnegie Mellon University and co-founder and CEO of Marianas Labs. Prior to that he worked at Google Strategic Technologies, Yahoo Research, and National ICT Australia. Prior to joining CMU he was a professor at UC Berkeley and the Australian National University. Alex obtained his PhD at TU Berlin in 1998. He has published over 200 papers and written or coauthored 5 books.
Fast, Cheap and Deep – Scaling Machine Learning: Distributed high-throughput machine learning is both a challenge and a key enabling technology. Using a Parameter Server template, we are able to distribute algorithms efficiently over multiple GPUs and in the cloud. This allows us to design very fast recommender systems, factorization machines, classifiers, and deep networks. This degree of scalability allows us to tackle computationally expensive problems efficiently, yielding excellent results, e.g., in visual question answering.
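As a toy illustration of the parameter-server pattern the abstract refers to (a simplified sketch, not the actual Parameter Server implementation or its API), each worker computes a gradient on its own data shard and a central server aggregates the updates; the shard layout, learning rate, and least-squares objective here are all assumptions for the example:

```python
import numpy as np

def worker_gradient(w, X, y):
    """Least-squares gradient computed on one worker's data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def parameter_server_sgd(shards, dim, lr=0.1, steps=200):
    """Toy synchronous parameter server: the 'server' holds w and
    averages the gradients pushed by each worker shard."""
    w = np.zeros(dim)
    for _ in range(steps):
        grads = [worker_gradient(w, X, y) for X, y in shards]
        w -= lr * np.mean(grads, axis=0)  # server-side update
    return w

# Two workers, each holding a shard generated from w_true = [1, 2].
rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0])
shards = []
for _ in range(2):
    X = rng.normal(size=(50, 2))
    shards.append((X, X @ w_true))
w = parameter_server_sgd(shards, dim=2)
```

In a real deployment the push/pull of gradients and parameters happens over the network and asynchronously; this sketch only shows the division of labor.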
Braxton McKee, CEO & Founder, Ufora
Braxton is the technical lead and founder of Ufora, a software company that has built an adaptively distributed, implicitly parallel runtime. Before founding Ufora with backing from Two Sigma Ventures and others, Braxton led the ten-person MBS/ABS Credit Modeling team at Ellington Management Group, a multi-billion dollar mortgage hedge fund. He holds a BS (Mathematics), MS (Mathematics), and M.B.A. from Yale University.
Is Machine Learning Code for 100 Rows or a Billion the Same?: We have built an automatically distributed, implicitly parallel data science platform for running large-scale machine learning applications. By abstracting away the computer science required to scale machine learning models, the Ufora platform lets data scientists focus on building data science models in simple scripting code, without having to worry about building large-scale distributed systems, their race conditions, fault tolerance, etc. This automatic approach requires solving some interesting challenges, like optimal data layout for different ML models. For example, when a data scientist says “do a linear regression on this 100GB dataset”, Ufora needs to figure out how to automatically distribute and lay out that data across a cluster of machines in order to minimize travel over the wire. Running a GBM against the same dataset might require a completely different layout of that data. This talk will cover how the platform works, in terms of data and thread distribution, how it generates parallel processes out of single-threaded programs, and more.
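To make the data-layout point concrete, here is a hedged sketch (not Ufora's actual mechanism) of why distributed linear regression can need very little wire traffic: each machine reduces its slice of rows to small sufficient statistics, and only those small matrices cross the network:

```python
import numpy as np

def local_stats(X, y):
    """Each machine computes small sufficient statistics locally;
    only these (a d x d matrix and a d-vector) travel over the wire."""
    return X.T @ X, X.T @ y

def distributed_ols(partitions):
    """The 'driver' sums the partial statistics and solves the
    normal equations once, on small d x d data."""
    d = partitions[0][0].shape[1]
    XtX = np.zeros((d, d))
    Xty = np.zeros(d)
    for X, y in partitions:
        A, b = local_stats(X, y)
        XtX += A
        Xty += b
    return np.linalg.solve(XtX, Xty)

# Four simulated 'machines', each holding a horizontal slice of the rows.
rng = np.random.default_rng(1)
w_true = np.array([3.0, -1.0, 0.5])
parts = []
for _ in range(4):
    X = rng.normal(size=(100, 3))
    parts.append((X, X @ w_true))
w = distributed_ols(parts)
```

A row-partitioned layout works well here precisely because the reduction is tiny; a model like GBM, which repeatedly scans features, may favor a column-oriented layout instead, which is the kind of trade-off the abstract alludes to.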
Isabelle Guyon, President at ChaLearn
Isabelle Guyon is an independent consultant specialized in statistical data analysis, pattern recognition, and machine learning. Her areas of expertise include computer vision and bioinformatics. Her recent interest is in applications of machine learning to the discovery of causal relationships. Prior to starting her consulting practice in 1996, Isabelle Guyon was a researcher at AT&T Bell Laboratories, where she pioneered applications of neural networks to pen computer interfaces and co-invented Support Vector Machines (SVM), a machine learning technique which has become a textbook method. She is also the primary inventor of SVM-RFE, a variable selection technique based on SVM. The SVM-RFE paper has thousands of citations and is often used as a reference method against which new feature selection methods are benchmarked. She also authored a seminal paper on feature selection that has received thousands of citations. She has organized many challenges in machine learning over the past few years, supported by the EU network Pascal2, NSF, and DARPA, with prizes sponsored by Microsoft, Google, and Texas Instruments. Isabelle Guyon holds a Ph.D. in Physical Sciences from the University Pierre and Marie Curie, Paris, France. She is president of ChaLearn, a non-profit dedicated to organizing challenges, vice-president of the Unipen foundation, adjunct professor at New York University, action editor of the Journal of Machine Learning Research, and editor of the Challenges in Machine Learning book series of Microtome.
Network Reconstruction: The Contribution of Challenges in Machine Learning: Networks of influence are found at all levels of physical, biological, and societal systems: climate networks, gene networks, neural networks, and social networks are a few examples. These networks are not just descriptive of the “State of Nature”; they allow us to make predictions such as forecasting disruptive weather patterns, evaluating the possible effect of a drug, locating the focus of a neural seizure, and predicting the propagation of epidemics. This, in turn, allows us to devise adequate interventions or changes in policies to obtain desired outcomes: evacuate people before a region is hit by a hurricane, administer treatment, vaccinate, etc. But knowing the network structure is a prerequisite, and this structure may be very hard and costly to obtain with traditional means. For example, the medical community relies on clinical trials, which cost millions of dollars, and the neuroscience community engages in connection tracing with electron microscopy, which takes years to establish the connectivity of just 100 neurons (the brain contains billions).
This presentation will review recent progress in network reconstruction methods based solely on observational data. Great advances have recently been made using machine learning. We will analyze the results of several challenges we organized, which point us to new, simple, and practical methodologies.
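As a minimal, hypothetical example of reconstruction from observational data alone (offered for intuition, not as one of the challenge-winning methods), one can estimate direct dependencies from partial correlations, which vanish for variables that are only indirectly connected:

```python
import numpy as np

def partial_correlations(data):
    """Partial correlations from the inverse sample covariance; large
    off-diagonal entries suggest direct (conditional) dependencies,
    i.e., candidate edges in the reconstructed network."""
    P = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(P))
    pc = -P / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

# Toy chain network x0 -> x1 -> x2: x0 and x2 are marginally correlated
# but conditionally independent given x1, so no direct edge x0-x2.
rng = np.random.default_rng(2)
n = 20000
x0 = rng.normal(size=n)
x1 = x0 + 0.5 * rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
pc = partial_correlations(np.column_stack([x0, x1, x2]))
```

The chain example shows the key distinction the talk's setting depends on: plain correlation would put an edge between x0 and x2, while the conditional measure correctly suppresses it.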
Quoc Le, Software Engineer, Google
Quoc Le is a software engineer at Google and will become an assistant professor at Carnegie Mellon University in Fall 2014. At Google, Quoc works on large-scale brain simulation using unsupervised feature learning and deep learning. His work focuses on object recognition, speech recognition, and language understanding. Quoc obtained his PhD at Stanford, received his undergraduate degree with First Class Honours and as a Distinguished Scholar at the Australian National University, and was a researcher at National ICT Australia, Microsoft Research, and the Max Planck Institute of Biological Cybernetics. Quoc won the best paper award at ECML 2007.
Deep Learning: Overview and Latest Results: Deep Learning has made important progress in speech recognition and machine vision over the past few years.
However, in text understanding, Deep Learning is still facing many challenges. I will talk about our latest work in sequence-to-sequence learning to address this challenge. The model can now be used for translation, conversation modelling, speech recognition, and many other tasks.
Irina Rish, Research Staff, IBM T.J. Watson Research Center
Irina Rish is a research staff member at the IBM T.J. Watson Research Center. She received her MS in Applied Mathematics from the Moscow Gubkin Institute, Russia, and her PhD in Computer Science from the University of California, Irvine. Her areas of expertise include artificial intelligence and machine learning, with a particular focus on probabilistic graphical models such as Bayesian and Markov networks, sparsity and compressed sensing, information-theoretic experiment design and active learning, with numerous applications ranging from diagnosis and performance management of distributed computer systems (“autonomic computing”) to predictive modeling and statistical biomarker discovery in neuroimaging (functional MRI and EEG) and other biological data. Irina has published over 50 papers, several book chapters, two edited books, and a monograph on Sparse Modeling, taught several tutorials and organized numerous workshops at top machine-learning conferences, such as NIPS, ICML, and ECML. She holds 24 patents and several IBM awards, including the IBM Technical Excellence award, the IBM Technical Accomplishment award, and multiple Invention Achievement Awards. Also, as an adjunct professor at the EE Department of Columbia University, she taught several advanced graduate courses on statistical learning and sparse signal modeling.
Learning About Brain: Sparse Modeling and Beyond: Sparse modeling is a rapidly developing area at the intersection of statistical learning and signal processing, motivated by the age-old statistical problem of finding a relatively small subset of “important” variables in high-dimensional datasets. Variable selection is particularly important for improving the interpretability of predictive models in scientific applications such as computational biology and neuroscience, where the main objective is to gain better insight into the functioning of a biological system, beyond just learning “black-box” predictors. Moreover, variable selection provides an effective way of avoiding the “curse of dimensionality”, as it helps to prevent overfitting and reduce computational complexity in high-dimensional but relatively small-sample datasets such as functional MRI (fMRI), where the number of variables (brain voxels) can range from tens to hundreds of thousands, while the number of samples is typically limited to a few hundred.
In this talk, I will summarize our work on sparse models and other machine-learning approaches to “brain decoding” (aka “mind reading”), i.e., the prediction of mental states from functional MRI data, in a wide range of applications, from analyzing pain perception to discovering predictive patterns of brain activity associated with schizophrenia and cocaine addiction. I will mention several lessons learned from those applications that can hopefully generalize to other practical machine-learning problems. Finally, I will briefly discuss our recent project that focuses on inferring mental states from “cheap” (unlike fMRI), easily collected data, such as speech and wearable sensors, with applications ranging from clinical settings (“computational psychiatry”) to everyday life (“augmented human”).
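A minimal sketch of the sparse-modeling idea, assuming an l1-penalized linear model (the Lasso) fit by iterative soft-thresholding; the data below is a synthetic stand-in for the fMRI setting, with far more variables than samples and only a few truly relevant ones:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, steps=500):
    """Iterative soft-thresholding (ISTA) for
    min_w ||Xw - y||^2 / (2n) + lam * ||w||_1."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - grad / L, lam / L)
    return w

# High-dimensional, small-sample toy problem: 100 samples, 500 variables,
# only 3 of which matter (a synthetic stand-in for fMRI-style data).
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 500))
w_true = np.zeros(500)
w_true[[7, 42, 301]] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.1 * rng.normal(size=100)
w_hat = lasso_ista(X, y, lam=0.2)
support = [int(i) for i in np.flatnonzero(np.abs(w_hat) > 0.3)]
```

The l1 penalty drives most coefficients exactly to zero, so the recovered support is the "small subset of important variables" the abstract describes, despite having five times more variables than samples.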
Allison Gilmore, Data Scientist, Ayasdi
Dr. Gilmore is currently a data scientist at Ayasdi, where she specializes in highly complex, high-dimensional data across a variety of industries. Prior to joining Ayasdi, Allison served as a National Science Foundation Post-Doctoral Fellow and an Assistant Adjunct Professor in mathematics at the University of California, Los Angeles. Dr. Gilmore also did post-doctoral research at Princeton University. She received her Ph.D. in mathematics from Columbia University in New York in May 2011.
Allison completed her undergraduate and master's degrees at Washington University, where she was selected as a Rhodes Scholar. She studied at Green College, Oxford University, and graduated in 2006 with an M.Phil. (with distinction) in sociology.
Her research interests include topology, geometry, network analysis and social movements. Dr. Gilmore serves on the board of The Friends of the Mandela Rhodes Foundation whose mission is to fund the development of exceptional leadership capacity in southern Africa.
A Role for Topology in Data Science: The mathematical discipline of topology offers a new approach to data analysis that is especially important in today’s world of complex, high-dimensional, noisy data. Topological methods extend and enhance traditional machine learning techniques to enable more nuanced data exploration, more refined segmentation, and more effective modeling. This talk will describe a topological method for detecting the underlying shape of any dataset, then give examples applying this technique in practice.
Subutai Ahmad, VP of Research, Numenta
Subutai Ahmad is the VP of Research at Numenta, a company focused on Machine Intelligence. Numenta's technology, Hierarchical Temporal Memory (HTM), is a detailed computational framework based on principles of the brain. Its HTM learning algorithms are available through the NuPIC open source community and are embedded in the company's commercial streaming analytics applications.
Subutai’s experience includes computational neuroscience, machine learning, computer vision and building real time commercial systems. He has previously served as VP Engineering at YesVideo where he helped grow the company from a three-person start-up to a leader in automated digital media authoring. In 1997, Subutai co-founded ePlanet Interactive, a spin-off from Interval Research. ePlanet developed the IntelPlay Me2Cam, the first computer vision product developed for consumers. Subutai holds a B.S. in Computer Science from Cornell, and a Ph.D in Computer Science from the University of Illinois at Urbana-Champaign.
Real-time Anomaly Detection for Real-time Data Needs: Much of the world’s data is becoming streaming, time-series data, where anomalies give significant information in often-critical situations. Examples abound in domains such as finance, IT, security, medical, and energy. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real-time, not batches, and learn while simultaneously making predictions. Are there algorithms up for the challenge? Which are the most capable? The Numenta Anomaly Detection Benchmark (NAB) attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. The perfect detector would detect all anomalies as soon as possible, trigger no false alarms, work with real-world time-series data across a variety of domains, and automatically adapt to changing statistics. These characteristics are formalized in NAB, using a custom scoring algorithm to evaluate the detectors on a benchmark dataset with labeled, real-world time-series data. We present these components, and describe the end-to-end scoring process. We give results and analyses for several algorithms to illustrate NAB in action. The goal for NAB is to provide a standard, open-source framework with which we can compare and evaluate different algorithms for detecting anomalies in streaming data.
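For intuition about what a NAB-style harness evaluates, here is a deliberately simple streaming baseline (a rolling z-score detector, nothing like HTM, and not part of NAB itself): it processes one point at a time, learns while predicting, and adapts as its window slides:

```python
import numpy as np
from collections import deque

class StreamingZScoreDetector:
    """Flag a point whose distance from the rolling mean exceeds
    k rolling standard deviations. One point in, one verdict out."""
    def __init__(self, window=50, k=4.0):
        self.buf = deque(maxlen=window)
        self.k = k

    def step(self, x):
        anomalous = False
        if len(self.buf) >= 10:  # wait for a short warm-up period
            mu = np.mean(self.buf)
            sigma = np.std(self.buf) + 1e-9
            anomalous = abs(x - mu) > self.k * sigma
        self.buf.append(x)  # keep adapting to the stream's statistics
        return anomalous

rng = np.random.default_rng(4)
stream = list(rng.normal(0.0, 1.0, 300))
stream[200] = 15.0  # inject an obvious spike
det = StreamingZScoreDetector()
flags = [det.step(x) for x in stream]
```

Such a baseline fails on the subtler NAB requirements (concept drift, periodic patterns, rewarding early detection), which is exactly why a standardized benchmark and scoring algorithm are useful for separating detectors.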
Xavier Amatriain, VP of Engineering, Quora
Xavier Amatriain is VP of Engineering at Quora, where he leads the team building the best source of knowledge on the Internet. With over 50 publications in different fields, Xavier is best known for his work on Machine Learning in general, and Recommender Systems in particular. Before Quora, he was Research/Engineering Director at Netflix, where he led the team building the famous Netflix recommendation algorithms. Previously, Xavier was also a Research Scientist at Telefonica Research and a Research Director at UCSB. He has also lectured at different universities both in the US and Spain and is frequently invited as a speaker at conferences and companies.
10 More Lessons Learned from Building Real-Life ML Systems: A year ago I presented a collection of 10 lessons at MLconf. The goal of the presentation was to highlight some of the practical issues that ML practitioners encounter in the field, many of which are not included in traditional textbooks and courses. The original 10 lessons included some related to issues such as feature complexity, sampling, regularization, distributing/parallelizing algorithms, or how to think about offline vs. online computation.
Since that presentation and its associated material were published, I have been asked to complement it with more/newer material. In this talk I will present 10 new lessons that not only build upon the original ones, but also relate to my recent experiences at Quora. I will talk about the importance of metrics, training data, and debuggability of ML systems. I will also describe how to combine supervised and unsupervised approaches and the role of ensembles in practical ML systems.
Ben Hamner, Co-founder and CTO, Kaggle
Ben Hamner is Kaggle’s co-founder and CTO. At Kaggle, he is currently focused on creating tools that empower data scientists to frictionlessly collaborate on analytics and promote their results. He has worked with machine learning across many domains, including natural language processing, computer vision, web classification, and neuroscience. Prior to Kaggle, Ben applied machine learning to improve brain-computer interfaces as a Whitaker Fellow at the École Polytechnique Fédérale de Lausanne in Lausanne, Switzerland. He graduated with a BSE in Biomedical Engineering, Electrical Engineering, and Math from Duke University.
Lessons learned from Running Hundreds of Kaggle Competitions: At Kaggle, we’ve run hundreds of machine learning competitions and seen over 80,000 data scientists make submissions. One thing is clear: winning competitions isn’t random. We’ve learned that certain tools and methodologies work consistently well on different types of problems. Many participants make common mistakes (such as overfitting) that should be actively avoided. Similarly, competition hosts have their own set of pitfalls (such as data leakage).
In this talk, I’ll share what goes into a winning competition toolkit along with some war stories on what to avoid. Additionally, I’ll share what we’re seeing on the collaborative side of competitions. Our community is showing an increasing amount of collaboration in developing machine learning models and analytic solutions. I’ll showcase examples of this and discuss how these types of collaboration will improve how data science is learned and applied.
Justin Basilico, Research/ Engineering Manager at Netflix
Justin Basilico is a Research/Engineering Manager for Page Algorithms Engineering at Netflix. He leads an applied research team focused on developing the next generation of algorithms used to generate the Netflix homepage through machine learning, ranking, recommendation, and large-scale software engineering. Prior to Netflix, he worked on machine learning in the Cognitive Systems group at Sandia National Laboratories. He is also the co-creator of the Cognitive Foundry, an open-source software library for building machine learning algorithms and applications.
Recommendations for Building Machine Learning Software: Building a real system that uses machine learning can be difficult, both in terms of the algorithmic and engineering challenges involved. In this talk, I will focus on the engineering side and discuss some of the practical lessons we’ve learned from years of developing the machine learning systems that power Netflix. I will go over what it takes to get machine learning working in a real-life feedback loop with our users and how that imposes different requirements and a different focus than doing machine learning only within a lab environment. This involves lessons around challenges such as where to place algorithmic components, how to handle distribution and parallelism, what kinds of modularity are useful, how to support experimentation in production, and how to test machine learning systems.
Brad Klingenberg, Director of Styling Algorithms, Stitch Fix
Brad Klingenberg is the Director of Styling Algorithms at Stitch Fix in San Francisco. His team uses data and algorithms to improve the selection of merchandise sent to clients. Prior to joining Stitch Fix Brad worked with data and predictive algorithms at financial and technology companies. He studied applied mathematics at the University of Colorado at Boulder and earned his PhD in Statistics at Stanford University in 2012.
Combining Statistics and Expert Human Judgment for Better Recommendations: Most algorithmic recommendation engines target the consumer directly. Combining these recommendation algorithms with expert human selection and curation can make them more effective. But it also makes things more complicated. In this talk I’ll share lessons from combining statistics and human judgment for personal styling recommendations at Stitch Fix, where we are committed to our recommendations through the physical delivery of merchandise to clients. I’ll discuss both statistical and practical challenges of machine learning with humans in the loop: training with selection bias, making predictions for human consumption, and measuring success.
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine
Anima Anandkumar has been a faculty member in the EECS Department at UC Irvine since August 2010. Her research interests are in the area of large-scale machine learning and high-dimensional statistics. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a visiting faculty member at Microsoft Research New England in 2012 and a postdoctoral researcher in the Stochastic Systems Group at MIT from 2009 to 2010. She is the recipient of the Microsoft Faculty Fellowship, the ARO Young Investigator Award, the NSF CAREER Award, and the IBM Fran Allen PhD Fellowship.
Tensor Methods – A New Paradigm for Training Probabilistic Models, Neural Networks and Reinforcement Learning:
Alessandro Magnani, Data Scientist, @WalmartLabs
Alessandro Magnani received his Ph.D. and M.S. from Stanford University in Electrical Engineering. He is currently a data scientist at @WalmartLabs, working on product classification and attribute extraction. Prior to that, he was a research scientist at Adchemy where he worked on optimization algorithms for online advertising. His current interests are machine learning, large scale computing and natural language processing.
Classification Labels in a Fast Moving Environment: Classification problems are very common in ecommerce. Collecting and storing labels from different sources is key to training and evaluating such models.
Labels are expensive to obtain, thus selecting which products to get labels for is key to optimally use any available labeling budget, both when training and evaluating a model. At the same time, if available labels are not correctly used, incorrect or suboptimal results can be produced.
In this talk I will discuss some of the challenges and potential pitfalls of acquiring and using labels for classification in a quickly evolving environment. I will present a system that stores labels and provides a way to select which items to label, optimizing the budget while producing accurate and unbiased evaluations of the classification models.
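One common way to spend a limited labeling budget, offered here as a hedged sketch rather than the system described in the talk, is uncertainty sampling: send to labelers the items the current classifier is least sure about:

```python
import numpy as np

def select_for_labeling(probs, budget):
    """Uncertainty sampling for a binary classifier: spend the labeling
    budget on the items whose predicted probability is closest to 0.5."""
    uncertainty = -np.abs(np.asarray(probs) - 0.5)  # higher = less certain
    ranked = np.argsort(uncertainty)[::-1]          # most uncertain first
    return [int(i) for i in ranked[:budget]]

# Hypothetical model scores for 8 products and a budget of 3 labels.
probs = [0.97, 0.51, 0.10, 0.48, 0.88, 0.55, 0.02, 0.30]
chosen = sorted(select_for_labeling(probs, budget=3))
```

Note that labels gathered this way are a biased sample of the catalog, so using the same labels for model evaluation requires a correction, which is one of the pitfalls the abstract hints at.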
Narayanan Sundaram, Research Scientist, Intel Labs
Narayanan Sundaram is a research scientist in Intel’s Parallel Computing Lab. His research interests include large scale graph analytics and machine learning. He has worked on several big data and high performance computing problems, with recent emphasis on developing fast, distributed graph analytics frameworks. He has also worked on making computer vision and machine learning algorithms computationally efficient through numerical optimizations and parallelization on multicore architectures, GPUs, and clusters. He received his PhD from the University of California at Berkeley and has authored more than 15 papers in peer-reviewed CS conferences.
GraphMat: Bridging the Productivity-Performance Gap in Graph Analytics: With increasing interest in large-scale distributed graph analytics for machine learning and data mining, more data scientists and developers are struggling to achieve high performance without sacrificing productivity on large graph problems. In this talk, I will discuss our solution to this problem: GraphMat. Using generalized sparse matrix-based primitives, we are able to achieve performance that is very close to hand-optimized native code, while allowing users to write programs using the familiar vertex-centric programming paradigm. I will show how we optimized GraphMat to achieve this performance on distributed platforms and provide programming examples. We have integrated GraphMat with Apache Spark in a manner that allows the combination to outperform all other distributed graph frameworks. I will explain the reasons for this performance and show that our approach achieves very high hardware efficiency in both single-node and distributed environments using primitives that are applicable to many machine learning and HPC problems. GraphMat is open source software and available for download.
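To illustrate the matrix view of vertex programs that GraphMat generalizes (this uses dense NumPy arrays as a stand-in for its optimized sparse kernels, and is not GraphMat's API), each PageRank iteration is just one matrix-vector product over the normalized adjacency matrix:

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    """PageRank expressed as repeated (dense) matrix-vector products.
    Assumes every vertex has at least one outgoing edge."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    M = (adj / out_deg[:, None]).T      # column-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (M @ r)  # one 'SpMV' per iteration
    return r

# Tiny 4-vertex graph: vertices 1, 2, 3 all link to vertex 0,
# and vertex 0 links back to vertex 1.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0]], dtype=float)
ranks = pagerank(adj)
```

The point of the mapping is that once a vertex program is phrased this way, decades of sparse linear algebra optimizations (blocking, vectorization, distributed SpMV) apply directly, which is where the hand-tuned-level performance comes from.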
Melanie Warrick, Deep Learning Engineer, Skymind.io
Melanie Warrick is a Deep Learning Engineer at Skymind.io. Her previous experience includes data science and engineering at Change.org and a comprehensive consulting career. She has a passion for working on machine learning problems at scale and on AI.
Attention Neural Net Model Fundamentals: Neural networks have regained popularity over the last decade because they are demonstrating real-world value in different applications (e.g. targeted advertising, recommender engines, Siri, self-driving cars, facial recognition). Several model types are currently explored in the field, with recurrent neural networks (RNNs) and convolutional neural networks (CNNs) taking the top focus. The attention model, a recently developed RNN variant, has started to play a larger role in both natural language processing and image analysis research.
This talk will cover the fundamentals of the attention model structure and how it is applied to visual and speech analysis. I will provide an overview of the model's functionality and math, including a high-level differentiation between soft and hard attention types. The goal is to give you enough of an understanding of what the model is, how it works, and where to apply it.
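A minimal NumPy sketch of soft attention, assuming simple dot-product scoring (real attention models learn the scoring function jointly with the rest of the network): score each position against a query, softmax the scores into weights, and return the weighted sum of values:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_attention(query, keys, values):
    """Soft attention: score every position, turn scores into a
    probability distribution, and return the weighted sum of values.
    Differentiable end-to-end, unlike 'hard' attention, which samples
    a single position instead of averaging over all of them."""
    scores = keys @ query        # one relevance score per position
    weights = softmax(scores)    # normalized attention weights
    return weights @ values, weights

# Three encoder positions with 4-d keys and 2-d values; the query is
# most similar to the second key, so most weight should land there.
keys = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 1., 0.]])
values = np.array([[10., 0.],
                   [0., 10.],
                   [5., 5.]])
query = np.array([0., 3., 0., 0.])
context, weights = soft_attention(query, keys, values)
```

Because the output is a smooth average rather than a discrete choice, gradients flow through `weights`, which is why soft attention trains with plain backpropagation while hard attention typically needs reinforcement-learning-style estimators.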
Ted Willke, Sr Principal Engineer, Intel
Ted Willke leads a team that researches large-scale machine learning and data mining techniques in Intel Labs. Prior to returning to the Labs this year, he led the analytics team within Intel’s Datacenter Group, which develops cloud solutions for machine learning and data mining. He developed his expertise over his 17 years with Intel. He has developed both software and hardware technologies for datacenters in Intel Labs and Intel’s product organizations. Ted holds a Doctorate in electrical engineering from Columbia University. He won Intel’s highest award last year for starting a venture focused on graph-shaped data.
Eric Battenberg, Research Scientist, Baidu Silicon Valley Artificial Intelligence Lab (SVAIL)
Eric Battenberg is a research scientist at the Baidu Silicon Valley Artificial Intelligence Lab (SVAIL) where he works on applications of deep learning to machine perception and understanding. He received his MS and PhD in Electrical Engineering and Computer Sciences from UC Berkeley where he worked on machine learning, signal processing, and parallel computing as applied to problems in automatic music understanding and processing. Previously, Eric worked at Gracenote in Emeryville, CA on problems in music classification and audio event detection. His overall research goal is to enable more natural and efficient interactions with computers so that intelligent devices can serve as a seamlessly integrated tool rather than a distraction.