The 2016 Machine Learning Conference in SF took place on November 11th, 2016 at Hotel Nikko.



Rajat Monga, Engineering Director, TensorFlow, Google

Rajat Monga leads TensorFlow, an open source machine-learning library and the center of Google’s efforts at scaling up deep learning. He is one of the founding members of the Google Brain team and is interested in pushing Machine Learning research forward towards General AI. Prior to Google, as the chief architect and director of engineering at Attributor, Rajat led the labs and operations and built out the engineering team. A veteran developer, Rajat has worked at eBay, Infosys, and a number of startups.

Abstract summary

Machine Learning with TensorFlow: TensorFlow has enabled cutting-edge machine learning research at the top AI labs in the world. At the same time, it has made the technology accessible to a large audience, leading to some amazing uses. TensorFlow is used for classification, recommendation, text parsing, sentiment analysis and more. This talk will go over the design that makes it fast, flexible, and easy to use, and describe how we continue to make it better.
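A core design idea behind TensorFlow's speed and flexibility is the deferred-execution dataflow graph: the computation is described first and run later. As a rough illustration in plain Python (not TensorFlow's actual API), a minimal graph might look like:

```python
# Minimal sketch of a deferred-execution dataflow graph, the design idea
# behind TensorFlow-style frameworks. Names here are illustrative only.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def constant(v):
    return Node("const", value=v)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def run(node):
    # Evaluate the graph only when asked, recursing over its inputs.
    if node.op == "const":
        return node.value
    args = [run(n) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError(node.op)

# Build the graph first, execute later:
x = constant(3.0)
y = add(mul(x, constant(2.0)), constant(1.0))  # y = 2*x + 1
print(run(y))  # 7.0
```

Separating graph construction from execution is what lets a runtime optimize, parallelize, and distribute the computation before any of it runs.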

View the slides for this presentation »

Watch this presentation on YouTube »


Guy Lebanon, Director, Machine Learning and Data Science, Netflix

Guy Lebanon works on personalizing Netflix’s homepage and optimizing the assets that are used to represent different videos. Before that he led LinkedIn’s news-feed personalization team and Amazon’s Seattle machine learning team. Prior to that, Guy was a tenured professor at the Georgia Institute of Technology. Guy has published several books and over 70 refereed articles in machine learning and data science, and he received a PhD from Carnegie Mellon University. He chaired the 2012 ACM CIKM conference and the 2015 AI and Statistics conference and is an action editor of the Journal of Machine Learning Research. He received the NSF CAREER Award, the WWW best student paper award, the ICML best paper runner-up award, and the Yahoo Faculty Research and Engagement Award, and is a Siebel Scholar.

Abstract summary

Being Smart with Art: Classical recommendation systems focus on deciding which items (movies at Netflix) to recommend to the user. A complementary problem is learning from data how to present the items to the user in an attractive way that will facilitate high-quality engagement. We will explore in this talk some aspects of this challenge at Netflix and other companies, with a focus on image selection: learning from data which images to use to represent movies.

Watch this presentation on YouTube »


Stephanie deWet, Software Engineer, Pinterest

Stephanie deWet is a software engineer at Pinterest on the homefeed team, where she builds personalized blending models and recommendations. She earned an MS in Computer Science from the University of Wisconsin in 2014. Prior to that, she spent 4 years developing realtime simulations at L-3 Communications.

Abstract summary

Personalized Content Blending in the Pinterest Homefeed: The Pinterest Homefeed is a personalized feed of content (or “Pins”) drawn from many sources, including followed users, followed topics, and recommendations. Each type of content is ranked by its own specialized machine learning model, and then blended with a ratio-based round robin to create the final Homefeed.

This presentation dives into how the current system evolved, and describes in depth an approach for personalizing the content blending ratio. This method uses historical user action data and models the Pin action rates of each pin type as a Bernoulli distribution. Each content type’s overall utility is modeled as a sum of the Pin action rate distributions, weighted by action-specific reward constants. I will discuss different methods for assigning blending ratios based on the utility distribution.
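As a rough sketch of the idea described above (with made-up reward constants and action counts, not Pinterest's actual values), the utility-based blending ratios could be computed like this:

```python
import random

# Hedged sketch: each content type's per-action rates are modeled as
# Bernoulli, utility is a reward-weighted sum of sampled rates, and
# blending ratios follow the utilities. All numbers are illustrative.

REWARDS = {"repin": 1.0, "click": 0.5}  # hypothetical reward constants

def sample_rate(successes, trials):
    # Beta posterior over a Bernoulli rate (uniform prior); sampling
    # rather than taking the point estimate captures uncertainty.
    return random.betavariate(1 + successes, 1 + trials - successes)

def utility(stats):
    return sum(REWARDS[a] * sample_rate(s, n) for a, (s, n) in stats.items())

# Historical (successes, impressions) per action, per content type:
history = {
    "followed":    {"repin": (120, 1000), "click": (300, 1000)},
    "recommended": {"repin": (90, 1000),  "click": (400, 1000)},
}

utils = {ctype: utility(stats) for ctype, stats in history.items()}
total = sum(utils.values())
ratios = {ctype: u / total for ctype, u in utils.items()}
print(ratios)  # ratios sum to 1; exact values vary per posterior sample
```

Normalizing sampled utilities into ratios is only one of several assignment schemes one could use; the talk discusses alternatives.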

As we iterate on our blending systems, new questions have arisen as to how we measure success. Unlike traditional search ranking problems, Pinterest faces both short- and long-term optimization challenges as we balance immediate user-engagement metric movements and long-term ecosystem health. This talk concludes with an overview of some of the different dimensions of success we currently monitor as we continue to work on blending.

View the slides for this presentation »

Watch this presentation on YouTube »


Josh Wills, Head of Data Engineering, Slack

Josh Wills is the head of data engineering at Slack. Prior to Slack, he built and led data science teams at Cloudera and Google. He is the founder of the Apache Crunch project, co-authored an O’Reilly book on advanced analytics with Apache Spark, and wrote a popular tweet about data scientists.

Abstract summary

Several People Are Tuning: Data and Machine Learning at Slack: Slack has only been publicly available for a little over two years, but during that time we have been able to create one of the most interesting data sets in the world about how teams form, grow, and collaborate in order to get things done. Our next great challenge is to find ways to leverage what we’ve learned in order to help people find the information they need faster and to help large organizations work together more effectively.

Although our mission is broad, we are still at the very start of our journey; our machine learning team was only created six months ago, and our data team just six months before that. I would like to talk for a bit about what that journey has been like: how we hired our team, how we chose our tools and developed our initial data infrastructure and machine learning models, and how we’ve framed the product problems that we’re trying to solve.

View the slides for this presentation »

Watch this presentation on YouTube »


Daria Sorokina, Applied Scientist, Amazon

Daria Sorokina is a machine learning scientist at Amazon Search who enjoys working with large, complex data sets. She joined Amazon’s search subsidiary in Palo Alto, California, in 2014 and is currently a member of the relevance features team, where she is involved in projects for browse ranking and search for digital and fashion products.
Daria graduated with a PhD from Cornell in 2008, followed by a postdoc at the Auton Lab at CMU. She was introduced to search as a relevance scientist at Yandex Labs, and continued in this field as a data scientist at LinkedIn, where she developed spam detection and static rank algorithms for LinkedIn people search.
Daria is the author of Additive Groves, an off-the-shelf machine learning algorithm that performs strongly across a variety of tasks. It has held winning positions in multiple data mining competitions, most notably the Yahoo Learning to Rank Challenge in 2010.

Abstract summary

Amazon Search: The Joy of Ranking Products: Amazon is one of the world’s largest e-commerce sites and Amazon Search powers the majority of Amazon’s sales. As a consequence, even small improvements in relevance ranking both positively influence the shopping experience of millions of customers and significantly impact revenue. In the past, Amazon’s product search engine consisted of several hand-tuned ranking functions using a handful of input features. A lot has changed since then.
In this talk we are going to cover a number of relevance algorithms used in Amazon Search today. We will describe a general machine learning framework used for ranking within categories, blending separate rankings in All Product Search, techniques used for matching queries and products, and algorithms targeted at unique tasks of specific categories — books and fashion.
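One common building block behind relevance ranking systems of this kind is pairwise learning to rank. The sketch below (toy features and a linear scoring model, not Amazon's actual framework) trains a scorer from pairwise preferences:

```python
import math

# Illustrative sketch of pairwise learning to rank: learn a linear
# scoring function so that preferred items outscore less-preferred ones.
# Features and pairs are made up for the example.

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train(pairs, dim, lr=0.1, epochs=200):
    # Each pair (x_pos, x_neg) says x_pos should outrank x_neg.
    w = [0.0] * dim
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            margin = score(w, x_pos) - score(w, x_neg)
            g = 1.0 / (1.0 + math.exp(margin))  # gradient of logistic pair loss
            for i in range(dim):
                w[i] += lr * g * (x_pos[i] - x_neg[i])
    return w

# Hypothetical features: (text-match signal, sales-rank signal)
pairs = [((1.0, 0.2), (0.1, 0.9)), ((0.8, 0.5), (0.3, 0.4))]
w = train(pairs, dim=2)
print(score(w, (1.0, 0.2)), score(w, (0.1, 0.9)))
```

After training, the learned weights rank the preferred item of each pair above its alternative; production systems use far richer features and nonlinear models.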

Watch this presentation on YouTube »


Alex Smola, Machine Learning Director, Amazon

Alex Smola is the Manager of the Cloud Machine Learning Platform at Amazon. Prior to his role at Amazon, Smola was a Professor in the Machine Learning Department of Carnegie Mellon University and cofounder and CEO of Marianas Labs. Prior to that he worked at Google Strategic Technologies, Yahoo Research, and National ICT Australia. Prior to joining CMU, he was professor at UC Berkeley and the Australian National University. Alex obtained his PhD at TU Berlin in 1998. He has published over 200 papers and written or coauthored 5 books.

Abstract summary

Personalization and Scalable Deep Learning with MXNet: User return times and movie preferences are inherently time dependent. In this talk I will show how such temporal effects can be modeled efficiently using deep learning, by employing an LSTM (Long Short-Term Memory) network. Moreover, I will show how to train large-scale distributed parallel models efficiently using MXNet. This includes a brief overview of the key components of defining networks and of optimization, and a walkthrough of the steps required to allocate machines and train a model.
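To make the LSTM recurrence concrete, here is a single cell step written out in NumPy. This is a didactic sketch of the standard update equations with random weights; a real MXNet model would use its optimized operators instead:

```python
import numpy as np

# One LSTM (Long Short-Term Memory) cell step in plain NumPy, the
# recurrence used to model time-dependent behavior such as user
# return times. Weights and inputs here are random placeholders.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    z = W @ x + U @ h + b
    H = h.shape[0]
    i, f, o, g = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell state update
    h_new = sigmoid(o) * np.tanh(c_new)                # new hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h = c = np.zeros(H)
for t in range(5):  # unroll over a short event sequence
    x = rng.normal(size=D)
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

The gating (forget, input, output) is what lets the cell carry information across long gaps between events, which is exactly what irregular return times require.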

Watch this presentation on YouTube »


Scott Clark, Co-Founder and CEO, SigOpt

Scott is a co-founder and CEO of SigOpt, providing optimization tools as a service, helping experts optimally tune their machine learning models. Scott has been applying optimal learning techniques in industry and academia for years, from bioinformatics to production advertising systems. Before SigOpt, Scott worked on the Ad Targeting team at Yelp leading the charge on academic research and outreach with projects like the Yelp Dataset Challenge and open sourcing MOE. Scott holds a PhD in Applied Mathematics and an MS in Computer Science from Cornell University and BS degrees in Mathematics, Physics, and Computational Physics from Oregon State University. Scott was chosen as one of Forbes’ 30 under 30 in 2016.

Abstract summary

Using Bayesian Optimization to Tune Machine Learning Models: In this talk we briefly introduce Bayesian Global Optimization as an efficient way to optimize machine learning model parameters, especially when evaluating different parameters is time-consuming or expensive. We will motivate the problem and give example applications.

We will also talk about our development of a robust benchmark suite for our algorithms including test selection, metric design, infrastructure architecture, visualization, and comparison to other standard and open source methods. We will discuss how this evaluation framework empowers our research engineers to confidently and quickly make changes to our core optimization engine.

We will end with an in-depth example of using these methods to tune the features and hyperparameters of a real world problem and give several real world applications.
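For intuition, here is a bare-bones version of the approach on a 1-D toy objective, with a hand-rolled Gaussian-process surrogate and expected improvement; SigOpt's production engine is of course far more sophisticated:

```python
import math
import numpy as np

# Minimal 1-D Bayesian optimization: fit a GP surrogate to the points
# evaluated so far, then evaluate next wherever expected improvement
# (EI) is largest. Objective and settings are illustrative.

def rbf(a, b, ls=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # GP regression posterior mean/std at test points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.sum(Ks * (Kinv @ Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for minimization.
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

f = lambda x: (x - 0.7) ** 2           # pretend this is expensive to evaluate
grid = np.linspace(0.0, 1.0, 201)
X = np.array([0.0, 0.5, 1.0])          # initial evaluations
y = f(X)
for _ in range(6):                     # sequential BO iterations
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X, y = np.append(X, x_next), np.append(y, f(x_next))
print(X[np.argmin(y)])  # near the true optimum at 0.7
```

The point of the machinery is sample efficiency: each new evaluation is chosen to balance exploiting the current best region against exploring uncertain ones, which matters precisely when evaluations are slow or costly.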

View the slides for this presentation »

Watch this presentation on YouTube »


Elena Grewal, Data Science Manager, Airbnb

Elena Grewal is a Data Science Manager at Airbnb, currently leading Airbnb’s team of data scientists, analysts, and visualization specialists. The team partners with the product team to understand and optimize all parts of the product, using techniques of machine learning in a wide variety of contexts. Examples include price prediction, payments routing optimization, lead scoring for operations teams, and LTV modeling. Prior to working at Airbnb Elena completed a doctorate in education at Stanford where she estimated complex models of friendship networks in schools and then modeled the impact of friendships on educational outcomes.

Abstract summary

Before the Model: How Machine Learning Products Start, with Examples from Airbnb: Often the most important part of building a machine learning product is the formulation of the problem; the most elegant model is rendered useless without the right application and model architecture. Airbnb is an online marketplace for accommodations that has found many interesting applications for machine learning by taking a data-driven approach to investment in machine learning products. Come hear about how the Airbnb team generates and vets ideas for machine learning products and tailors them to business problems, with some examples of success and lessons learned along the way.

View the slides for this presentation »

Watch this presentation on YouTube »


Daniel Shank, Data Scientist, Talla

Daniel Shank is a Senior Data Scientist at Talla, a company developing a platform for intelligent information discovery and delivery. His focus is on developing machine learning techniques to handle various business automation tasks, such as scheduling, polls, and expert identification, as well as work on NLP. Before joining Talla as the company’s first employee in 2015, Daniel worked with TechStars Boston and did consulting work for ThriveHive, a small-business-focused marketing company in Boston. He studied economics at the University of Chicago.

Abstract summary

Neural Turing Machines: Perils and Promise: Neural Turing Machines are a landmark architecture in the field of machine learning. A differentiable version of a classic model of computation designed by Alan Turing, NTMs open up the possibility of using machine learning to learn algorithms that can access an external memory. However, more so than many other popular deep learning architectures, NTMs are notoriously difficult to implement effectively. This presentation will provide an overview of the NTM architecture, along with tips and tricks for implementing it using conventional machine learning frameworks. It will also describe how NTMs can be used for standard machine learning tasks, and will touch on Differentiable Neural Computers, the follow-up architecture recently published in Nature.
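The differentiable external-memory access that makes NTMs trainable end to end can be illustrated with a content-based read. This is a small NumPy sketch with made-up memory contents, not a full NTM:

```python
import numpy as np

# Content-based addressing: attention weights come from a softmax over
# similarities between a query key and each memory slot, so the read
# is smooth and can be trained by gradient descent.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def content_read(memory, key, beta=5.0):
    # memory: (N, M) matrix of N slots; key: (M,) query emitted by the
    # controller; beta sharpens the attention distribution.
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = softmax(beta * sims)   # differentiable addressing weights
    return w @ memory, w       # soft (weighted) read of memory

memory = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
read, w = content_read(memory, key=np.array([0.9, 0.1]))
print(w.round(2))  # most weight lands on the first slot
```

Because every read and write is a weighted blend rather than a discrete lookup, gradients flow through memory access, which is both the source of NTMs' power and, as the talk notes, of their training difficulties.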

View the slides for this presentation »

Watch this presentation on YouTube »


Virginia Smith, Researcher, UC Berkeley

Virginia Smith is a 5th-year Ph.D. student in the EECS Department at UC Berkeley, where she works jointly with Michael I. Jordan and David Culler as a member of the AMPLab. Her research interests are in large-scale machine learning and distributed optimization. She is actively working to increase the presence of women in computer science, most recently by co-founding the Women in Technology Leadership Round Table (WiT). Virginia has won several awards and fellowships while at Berkeley, including the NSF fellowship, Google Anita Borg Memorial Scholarship, NDSEG fellowship, and Tong Leong Lim Pre-Doctoral Prize.

Abstract summary

A General Framework for Communication-Efficient Distributed Optimization: Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. In light of this, we propose a general framework, CoCoA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. Our framework enjoys strong convergence guarantees and exhibits state-of-the-art empirical performance in the distributed setting. We demonstrate this performance with extensive experiments in Apache Spark, achieving speedups of up to 50x compared to leading distributed methods on common machine learning objectives.
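The communication pattern CoCoA exploits (lots of cheap local computation per round, only a small parameter vector exchanged) can be sketched as follows; note this is a simplified local-update-then-average stand-in, not CoCoA's actual primal-dual scheme:

```python
import numpy as np

# Toy illustration of communication-efficient distributed optimization:
# each simulated worker runs many gradient steps on its own data shard,
# and only the parameter vector is averaged once per round.

def local_steps(w, X, y, lr=0.1, steps=50):
    w = w.copy()
    for _ in range(steps):                 # lots of local computation...
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(400, 2))
y = X @ w_true                             # noiseless toy regression target
shards = np.array_split(np.arange(400), 4)  # 4 simulated workers

w = np.zeros(2)
for round_ in range(10):                   # ...but few communication rounds
    locals_ = [local_steps(w, X[idx], y[idx]) for idx in shards]
    w = np.mean(locals_, axis=0)           # one small vector averaged per round
print(w.round(3))  # converges toward [2., -1.]
```

The contrast with naive distributed gradient descent is that one communication round here buys 50 local updates instead of one, which is the trade-off CoCoA formalizes with convergence guarantees.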

View the slides for this presentation »

Watch this presentation on YouTube »


Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineering, University of Texas at Austin

Alex Dimakis is an Associate Professor in the Electrical & Computer Engineering department at The University of Texas at Austin. Prof. Dimakis received his Ph.D. in 2008 and M.S. degree in 2005 in electrical engineering and computer sciences from UC Berkeley and the Diploma degree from the National Technical University of Athens in 2003. During 2009 he was a CMI postdoctoral scholar at Caltech. He received an NSF Career award in 2011, a Google faculty research award in 2012 and the Eli Jury dissertation award in 2008. He is the co-recipient of several best paper awards including the joint Information Theory and Communications Society Best Paper Award in 2012.

Abstract summary:

A Friendly Introduction To Causality: Causality has been studied under several frameworks in statistics and artificial intelligence. We will briefly survey Pearl’s Structural Equation model and explain how interventions can be used to discover causality. We will also present a novel information theoretic framework for discovering causal directions from observational data when interventions are not possible. The starting point is conditional independence in joint probability distributions and no prior knowledge on causal inference is required.
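The role of interventions can be seen in a two-variable toy structural equation model: intervening on the cause shifts the effect, while intervening on the effect leaves the cause untouched. This is an illustrative simulation, not an example from the talk:

```python
import numpy as np

# Structural equation model with X -> Y: X is exogenous noise and
# Y = 2X + noise. Observationally X and Y are simply correlated;
# interventions (do-operations) reveal the causal direction.

rng = np.random.default_rng(0)
n = 100_000

def simulate(do_x=None, do_y=None):
    x = rng.normal(size=n) if do_x is None else np.full(n, do_x)
    if do_y is None:
        y = 2.0 * x + rng.normal(size=n)
    else:
        y = np.full(n, do_y)  # do(Y) severs Y from its mechanism; X unchanged
    return x, y

x, y = simulate()
print(np.corrcoef(x, y)[0, 1])   # strong observational correlation
_, y_int = simulate(do_x=1.0)
print(y_int.mean())              # ~2.0: intervening on X moves Y
x_int, _ = simulate(do_y=1.0)
print(x_int.mean())              # ~0.0: intervening on Y leaves X alone
```

The asymmetry between the two interventions is exactly what observational correlation cannot show, and it motivates the information-theoretic criteria the talk proposes for the cases where interventions are impossible.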

View the slides for this presentation »

Watch this presentation on YouTube »


Anjuli Kannan, Software Engineer, Google

Anjuli Kannan is a software engineer on the Google Brain team. She is interested in machine learning and its application to problems in natural language understanding. Recently she was a core member of the team that brought Smart Reply to Inbox by Gmail.

Abstract summary:

Smart Reply: Learning a Model of Conversation from Data: Smart Reply is a text assistance feature that was recently introduced to Inbox by Gmail. Given an incoming email message, the Smart Reply system analyzes its contents and suggests complete responses that the recipient can send with just one tap. This talk will cover how we built Smart Reply using a combination of deep learning and semantic clustering, as well as what we learned along the way and why we think it shows promise for the future of dialogue models.

View the slides for this presentation »

Watch this presentation on YouTube »


Jennifer Prendki, Principal Data Scientist, @WalmartLabs

Jennifer Prendki is a principal data scientist at @WalmartLabs. Prior to joining the Walmart team, she used all kinds of machine learning and data mining techniques to solve very diverse business problems, mainly in Finance and Advertising, and turned ideas and algorithms into real-time commercial systems. Before joining the data science community, she used her mathematical modeling and software engineering skills to study the mysteries of the universe. She obtained her PhD in Particle Physics from UPMC – Sorbonne University in Paris, France, in 2009, with an emphasis on the study of the asymmetry between matter and antimatter, a subject on which she also conducted post-doctoral research at Duke University.

While she enjoys working on lots of different problems, her research is consistently driven by both a passion for innovation, and a boundless taste for a challenge. She is particularly fascinated with the analysis of high-dimensional, noisy and unstructured datasets, and enjoys working on real-life problems. And of course, the more data, the better!

Abstract summary:

Review Analysis: an Approach to Leveraging User-Generated Content in the Context of Retail: What are customers really thinking? What are they looking for specifically when shopping for a product? And if they are satisfied with their purchase, what is the main reason?

Today’s technology offers many different avenues for customers to express themselves, set their expectations in writing, and share their opinion, frustration or satisfaction regarding all kinds of products and services. ‘Leaving a review’ has become an integral part of the purchase process. Through reviews, customers are volunteering invaluable information that can be turned into insights that would help drive business decisions (if you are a retailer), or help you make a successful purchase (if you are a customer). Yet the amount of data available to make these decisions is oftentimes extremely large, and it might be difficult for a human to read and synthesize all that has been said about their product of interest.

Review analysis and opinion mining offer solutions to automate the analysis of customer feedback through large-scale machine learning, natural language processing and sentiment analysis, and allow retailers to better understand their customers… as well as their data.

In this talk, I will present the various ways in which machine learning techniques can be used to extract the most significant features for a given category of products. I will then dig into a process aimed at identifying the sentiments relative to these features, and a useful way to aggregate this information into insights that are both usable and readable by any user. I will end by mentioning some of the challenges encountered when trying to extract objective information from a data source likely tainted with human subjectivity in an ever-changing market.
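As a deliberately tiny illustration of this pipeline, with hand-written feature and sentiment lexicons standing in for the learned models the talk describes, feature-level sentiment aggregation might look like:

```python
from collections import defaultdict

# Sketch: find product-feature mentions in reviews, attach a sentiment
# score from the surrounding context window, and aggregate per feature.
# The lexicons and reviews below are illustrative stand-ins.

FEATURES = {"battery", "screen", "price"}
POSITIVE = {"great", "excellent", "cheap", "bright"}
NEGATIVE = {"poor", "terrible", "dim", "expensive"}

def analyze(review):
    words = review.lower().replace(",", "").split()
    hits = []
    for i, w in enumerate(words):
        if w in FEATURES:
            window = words[max(0, i - 3):i + 4]  # small context window
            score = sum(t in POSITIVE for t in window) - sum(t in NEGATIVE for t in window)
            hits.append((w, score))
    return hits

reviews = [
    "Great battery but the screen is dim",
    "Screen is bright, battery is poor",
]
totals = defaultdict(int)
for r in reviews:
    for feature, score in analyze(r):
        totals[feature] += score
print(dict(totals))
```

Real systems replace the lexicons with trained feature extractors and sentiment classifiers, but the aggregation step, turning thousands of scattered opinions into a few per-feature scores, is the same shape.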

View the slides for this presentation »

Watch this presentation on YouTube »


Jean-François Puget, Distinguished Engineer, Machine Learning and Optimization, IBM

Jean-François is currently the technical lead for IBM Machine Learning and Optimization offerings. He holds a PhD in machine learning from Paris IX University and has spent his entire career turning scientific ideas into innovative software. Jean-François joined IBM as part of the ILOG acquisition and since then has held various technical executive positions, including CTO for IBM Analytics Solutions. While with ILOG, he helped establish the company as a market leader in mathematical optimization with his work on constraint programming for CPLEX, which was acquired by ILOG. ILOG and CPLEX collectively are now known as IBM Decision Optimization.

Jean-François has a strong background in mathematics. He is an alumnus of Ecole Normale Supérieure de Paris Ulm and has published over 60 scientific papers in refereed journals and peer-reviewed international conferences. He is passionate about turning scientific ideas into innovative software that is as powerful as it is easy to use. He is also passionate about evangelizing scientific and technical ideas via his blog and social media. You can reach him via Twitter or LinkedIn.

Abstract summary:

Why Machine Learning Algorithms Fall Short (And What You Can Do About It): Many think that machine learning is all about the algorithms. Want a self-learning system? Get your data, start coding, or hire a PhD who will build you a model that will stand the test of time. Of course we know that this is not enough. Models degrade over time, algorithms that work great on yesterday’s data may not be the best option, and new data sources and types keep becoming available. In short, your self-learning system may not be learning anything at all. In this session, we will examine how to overcome challenges in creating self-learning systems that perform better and are built to stand the test of time. We will show how to apply mathematical optimization algorithms that often prove superior to the local optimization methods favored by typical machine learning applications, and discuss why these methods can create better results. We will also examine the role of smart automation in the context of machine learning and how it can create self-learning systems that are built to last.

View the slides for this presentation »

Watch this presentation on YouTube »


Harm van Seijen, Research Scientist, Maluuba

Harm van Seijen is a research scientist at Maluuba. His research is centered on reinforcement learning, with the goal of addressing the fundamental machine learning challenges related to natural conversations between humans and bots. Prior to Maluuba, Harm worked as a postdoc at the University of Alberta, working alongside professor Richard Sutton on novel reinforcement learning techniques.

Abstract summary:

Using Deep Reinforcement Learning for Dialogue Systems:

View the slides for this presentation »

Watch this presentation on YouTube »


Nikhil Garg, Engineering Manager, Quora

Nikhil Garg is an engineering manager at Quora, where he leads two teams: one focused on building Quora’s ML platform to support company-wide ML innovation and productization, and the other focused on building ML/NLP systems to understand the quality of Quora’s content. He is very interested in distributed systems and product design in addition to machine learning.

Abstract summary

Building a Machine Learning Platform at Quora: Each month, over 100 million people use Quora to share and grow their knowledge. Machine learning has played a critical role in enabling us to grow to this scale, with applications ranging from understanding content quality to identifying users’ interests and expertise. By investing in a reusable, extensible machine learning platform, our small team of ML engineers has been able to productionize dozens of different models and algorithms that power many features across Quora.

In this talk, I’ll discuss the core ideas behind our ML platform, as well as some of the specific systems, tools, and abstractions that have enabled us to scale our approach to machine learning.

View the slides for this presentation »

Watch this presentation on YouTube »


Brian Lucena, Senior Data Scientist, Metis

Brian Lucena is currently a Senior Data Scientist at Metis, where he teaches data science bootcamps and conducts corporate training. He previously was the Senior VP of Analytics at PCCI where he led a team of data scientists building real-time predictive models for clinical decision support. Before that he was Chief Mathematician at Guardian Analytics, where he pioneered the development of fraud detection algorithms based on Bayesian behavioral modeling. He has an A.B. from Harvard and a Ph.D. from Brown, and has held teaching and research positions at the University of Washington, UC-Berkeley, and the American University in Cairo.

Abstract summary

Interpreting Black-Box Models with Applications to Healthcare: Complex and highly interactive models such as Random Forests, Gradient Boosting, and Deep Neural Networks demonstrate superior predictive power compared to their high-bias counterparts, Linear and Logistic Regression. However, these more complex and sophisticated methods lack the interpretability of the simpler alternatives. In some areas of application, such as healthcare, model interpretability is crucial both to build confidence in the model predictions as well as to explain the results on individual cases. This talk will discuss recent approaches to explaining “black-box” models and demonstrate some recently developed tools that aid this effort.
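One widely used model-agnostic technique in this space is permutation importance: shuffle one feature at a time and measure how much predictive accuracy degrades. A self-contained sketch, with a hand-made threshold function standing in for a trained Random Forest or neural network:

```python
import numpy as np

# Permutation importance: a feature matters to a black-box model if
# breaking its relationship to the target (by shuffling the column)
# hurts the model's accuracy.

def model(X):
    # Pretend black-box model: depends only on feature 0.
    return (X[:, 0] > 0).astype(int)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    rng = np.random.default_rng(seed)
    base = (predict(X) == y).mean()     # baseline accuracy
    drops = []
    for j in range(X.shape[1]):
        accs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])       # break feature j's link to the target
            accs.append((predict(Xp) == y).mean())
        drops.append(base - np.mean(accs))
    return drops

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)
drops = permutation_importance(model, X, y)
print([round(d, 3) for d in drops])  # feature 0 shows a large drop, others ~0
```

In a clinical setting, a ranking like this is a first sanity check that a model is leaning on medically plausible signals rather than artifacts, before digging into per-patient explanations.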

View the slides for this presentation »

Watch this presentation on YouTube »


Michelle Casbon, Director of Data Science, Qordoba

Michelle Casbon is Director of Data Science at Qordoba, a platform that uses machine learning to help companies globalize their products. Her focus is on scalable NLP that generalizes across (natural) languages. Previously, she was a Senior Data Science Engineer at Idibon, where she built tools for generating predictions on text datasets. Michelle completed a Masters at the University of Cambridge, focusing on NLP, speech recognition, speech synthesis, and machine translation. She loves working with open source projects and has contributed to Apache Spark and Apache Flume.



Rumman Chowdhury, Senior Data Scientist, Metis

Rumman comes to data science from a quantitative social science background. Prior to joining Metis, she was a data scientist at Quotient Technology, where she used retailer transaction data to build an award-winning media targeting model. Her industry experience ranges from public policy to economics and consulting. Her prior clients include the World Bank, the Vera Institute of Justice, and the Los Angeles County Museum of Art. She holds two undergraduate degrees from MIT and a Masters in Quantitative Methods in the Social Sciences from Columbia, and she is currently finishing a Political Science PhD at the University of California, San Diego.



Alex Korbonits, Data Scientist, Remitly, Inc.

Alex Korbonits is a Data Scientist at Remitly, Inc., where he works extensively on feature extraction and putting machine learning models into production. Outside of work, he loves Kaggle competitions, is diving deep into topological data analysis, and is exploring machine learning on GPUs. Alex is a graduate of the University of Chicago with degrees in Mathematics and Economics.