The 2017 Machine Learning Conference in New York City is scheduled for March 24th, 2017, at 230 5th Avenue. Located in the heart of picturesque Midtown Manhattan, the venue boasts a large meeting space with natural light, views of the city, and several screens, so you won’t miss a single slide of your favorite ML presentations.


Corinna Cortes, Head of Research, Google

Corinna Cortes is a Danish computer scientist known for her contributions to machine learning. She is currently the Head of Google Research, New York. Cortes is a recipient of the Paris Kanellakis Theory and Practice Award for her work on theoretical foundations of support vector machines.

Cortes received her M.S. degree in physics from Copenhagen University in 1989. That same year she joined AT&T Bell Labs as a researcher, where she remained for about ten years. She received her Ph.D. in computer science from the University of Rochester in 1993. She is an Editorial Board member of the journal Machine Learning.

Cortes’ research covers a wide range of topics in machine learning, including support vector machines and data mining. In 2008, she and Vladimir Vapnik received the Paris Kanellakis Theory and Practice Award for developing a highly effective algorithm for supervised learning known as support vector machines (SVM). Today, SVMs are among the most frequently used machine learning algorithms, with practical applications including medical diagnosis and weather forecasting.

Abstract Summary:

Harnessing Neural Networks:
Deep learning has demonstrated impressive performance gains in many machine learning applications. However, unveiling and realizing these gains is not always straightforward. Discovering the right network architecture is critical for accuracy and often requires a human in the loop. Some network architectures occasionally produce spurious outputs, so the outputs have to be restricted to meet the needs of an application. Finally, realizing the performance gain in a production system can be difficult because of long inference times.

In this talk we discuss methods for making neural networks efficient in production systems. We also discuss an efficient method for automatically learning the network architecture, called AdaNet. We provide theoretical arguments for the algorithm and present experimental evidence for its effectiveness.


Aaron Roth, Associate Professor, University of Pennsylvania

Aaron Roth is an Associate Professor of Computer and Information Sciences at the University of Pennsylvania, affiliated with the Warren Center for Network and Data Science, and co-director of the Networked and Social Systems Engineering (NETS) program. Previously, he received his PhD from Carnegie Mellon University and spent a year as a postdoctoral researcher at Microsoft Research New England. He is the recipient of a Presidential Early Career Award for Scientists and Engineers (PECASE) awarded by President Obama in 2016, an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and a Yahoo! ACE award. His research focuses on the algorithmic foundations of data privacy, algorithmic fairness, game theory and mechanism design, learning theory, and the intersections of these topics. Together with Cynthia Dwork, he is the author of the book “The Algorithmic Foundations of Differential Privacy.”

Abstract Summary:

Differential Privacy and Machine Learning:
In this talk, we will give a friendly introduction to Differential Privacy, a rigorous methodology for analyzing data subject to provable privacy guarantees, which has recently been widely deployed in several settings. The talk will specifically focus on the surprisingly rich relationship between differential privacy and machine learning. This includes both the ability to do machine learning subject to differential privacy and tools arising from differential privacy that can be used to make learning more reliable and robust (even when privacy is not a concern).
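To make "provable privacy guarantee" concrete, here is a minimal sketch of the Laplace mechanism, one standard way to achieve differential privacy (not necessarily the technique covered in the talk). It assumes a counting query: adding or removing one record changes the true answer by at most 1 (sensitivity 1), so adding Laplace noise with scale 1/ε gives ε-differential privacy. The function names and the toy dataset are hypothetical.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two Exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one record changes
    the true answer by at most 1, so Laplace noise of scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 29, 62, 57, 33]          # toy dataset (hypothetical)
    # The true answer is 3; every release adds fresh random noise.
    print(private_count(ages, lambda a: a > 40, epsilon=0.5, rng=rng))
```

A smaller ε means more noise and a stronger guarantee. Answering several queries consumes privacy budget: under basic composition, the ε values of the individual releases add up.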

Alexandra Johnson, Software Engineer, SigOpt

Alexandra works on everything from infrastructure to product features to blog posts. Previously, she worked on growth, APIs, and recommender systems at Polyvore (acquired by Yahoo). She majored in computer science at Carnegie Mellon University with a minor in discrete mathematics and logic, and during the summers she A/B tested recommendations at internships with Facebook and Rent the Runway.

Abstract Summary:

Common Problems In Hyperparameter Optimization: All large machine learning pipelines have tunable parameters, commonly referred to as hyperparameters. Hyperparameter optimization is the process of finding the values for these parameters that make our system perform best. SigOpt provides a Bayesian optimization platform that is commonly used for hyperparameter optimization, and I’m going to share some of the common problems we’ve seen when integrating it into machine learning pipelines.
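SigOpt's platform uses Bayesian optimization. As a simpler point of comparison, the sketch below shows random search, a common baseline for hyperparameter optimization. The `objective` function is a hypothetical stand-in for training a model and returning its validation loss, and the parameter names and ranges are illustrative assumptions. The learning rate is sampled on a log scale because parameters like it typically matter across orders of magnitude.

```python
import math
import random

def objective(params):
    """Hypothetical validation loss standing in for a real train-and-evaluate run."""
    lr, depth = params["learning_rate"], params["max_depth"]
    # Pretend the best settings are learning_rate = 0.1 and max_depth = 6.
    return (math.log10(lr) + 1.0) ** 2 + 0.05 * (depth - 6) ** 2

def sample_params(rng):
    """Draw one candidate configuration from the (assumed) search space."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, 0),   # log-uniform in [1e-4, 1]
        "max_depth": rng.randint(2, 12),             # integer in [2, 12]
    }

def random_search(n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    for _ in range(n_trials):
        params = sample_params(rng)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

if __name__ == "__main__":
    print(random_search(100))
```

A Bayesian optimizer would replace `sample_params` with a model that proposes the next configuration based on the results observed so far, which usually needs fewer expensive evaluations.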

Erik Bernhardsson, CTO, Better Mortgage

Erik Bernhardsson is the CTO at Better, a small startup in NYC working with mortgages. Before Better, he spent five years at Spotify managing teams working with machine learning and data analytics, in particular music recommendations.

Abstract Summary:

Nearest Neighbor Methods And Vector Models: Vector models are being used in many different fields: natural language processing, recommender systems, computer vision, and more. They are fast and convenient and are often state of the art in terms of accuracy. One of the challenges with vector models is that as the number of dimensions increases, finding similar items gets harder. Erik developed a library called “Annoy” that uses a forest of random trees to do fast approximate nearest neighbor queries in high-dimensional spaces. We will cover some specific applications of vector models and how Annoy works.
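The core idea behind a forest of random trees for approximate nearest neighbor search fits in a few dozen lines of plain Python. This is a simplified illustration of the approach, not Annoy's implementation or API: each tree recursively splits the points with the hyperplane equidistant from two randomly sampled points, a query descends each tree to one leaf, and the union of those leaves is re-ranked by exact distance. Real Annoy additionally searches across trees with a priority queue and a tunable search budget; the class and function names here are hypothetical.

```python
import math
import random

def _side(v, normal, offset):
    """Signed position of v relative to the hyperplane normal . x = offset."""
    return sum(n * x for n, x in zip(normal, v)) - offset

def build_tree(points, indices, rng, leaf_size=8):
    """Recursively split `indices`; leaves are lists, inner nodes are tuples."""
    if len(indices) <= leaf_size:
        return indices
    a, b = rng.sample(indices, 2)
    pa, pb = points[a], points[b]
    # Hyperplane equidistant from the two sampled points.
    normal = [x - y for x, y in zip(pa, pb)]
    midpoint = [(x + y) / 2 for x, y in zip(pa, pb)]
    offset = sum(n * m for n, m in zip(normal, midpoint))
    left = [i for i in indices if _side(points[i], normal, offset) >= 0]
    right = [i for i in indices if _side(points[i], normal, offset) < 0]
    if not left or not right:          # degenerate split, e.g. duplicate points
        return indices
    return (normal, offset,
            build_tree(points, left, rng, leaf_size),
            build_tree(points, right, rng, leaf_size))

def _leaf_for(node, q):
    """Descend one tree to the leaf whose region contains q."""
    while not isinstance(node, list):
        normal, offset, left, right = node
        node = left if _side(q, normal, offset) >= 0 else right
    return node

class RandomProjectionForest:
    def __init__(self, points, n_trees=10, leaf_size=8, seed=0):
        rng = random.Random(seed)
        self.points = points
        all_indices = list(range(len(points)))
        self.trees = [build_tree(points, all_indices, rng, leaf_size)
                      for _ in range(n_trees)]

    def query(self, q, k):
        """Approximate k nearest neighbors: union of leaves, exact re-rank."""
        candidates = set()
        for tree in self.trees:
            candidates.update(_leaf_for(tree, q))
        return sorted(candidates, key=lambda i: math.dist(q, self.points[i]))[:k]

if __name__ == "__main__":
    rng = random.Random(42)
    points = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
    forest = RandomProjectionForest(points, n_trees=10)
    # Index 0 is returned first: a stored point is its own nearest neighbor.
    print(forest.query(points[0], k=5))
```

More trees raise recall at the cost of memory and query time, while a larger leaf size trades faster builds for more exact distance computations per query.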

Soumith Chintala, Artificial Intelligence Research Engineer, Facebook

Soumith Chintala is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games and large-scale high-performance deep learning. Prior to joining Facebook in August 2014, he worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. He holds a master’s degree in CS from NYU, and spent time in Yann LeCun’s NYU lab building deep learning models for pedestrian detection, natural image OCR, and depth images, among others.

Abstract Summary:

Dynamic Deep Learning: a paradigm shift in AI research and tools:
AI research has seen many shifts in the last few years. We’ve seen research move from static datasets such as ImageNet to more dynamic, online settings in self-driving cars, robots, and game-playing. Many dynamic environments such as Universe and StarCraft are being used in AI research to solve problems pertaining to reinforcement learning and online learning. In this talk, I shall discuss these shifts in research. Tools such as PyTorch, DyNet and Chainer have popped up to cope with the paradigm shift, enabling cutting-edge AI, and I shall discuss these as well.

Ross Goodwin, Technologist – Creator, Sunspring

Ross Goodwin is a creative technologist, artist, hacker, data scientist, and former White House ghostwriter. Ross helped conceive Sunspring, a 2016 experimental science fiction short film entirely written by an artificial intelligence bot using neural networks. He employs machine learning, natural language processing, and other computational tools to realize new forms and interfaces for written language.

Abstract Summary:

Narrated Reality:
Can machine intelligence enable new forms and interfaces for written language, or does it merely reveal an “uncanny valley” of text? Join Ross Goodwin as he discusses his work with neural networks for creative applications, including expressive image captioning, narration devices for your home and car, and a film (Sunspring) created from a computer-generated screenplay.

Irina Rish, Researcher, AI Foundations Department, IBM T.J. Watson Research Center

Irina Rish is a researcher at the AI Foundations department of the IBM T.J. Watson Research Center. She received an MS in Applied Mathematics from Moscow Gubkin Institute, Russia, and a PhD in Computer Science from the University of California, Irvine. Her areas of expertise include artificial intelligence and machine learning, with a particular focus on probabilistic graphical models, sparsity and compressed sensing, active learning, and their applications to various domains, ranging from diagnosis and performance management of distributed computer systems (“autonomic computing”) to predictive modeling and statistical biomarker discovery in neuroimaging and other biological data. Irina has published over 60 research papers, several book chapters, two edited books, and a monograph on Sparse Modeling; she has also taught several tutorials and organized multiple workshops at machine-learning conferences, including NIPS, ICML and ECML. She holds 24 patents and has received several IBM awards. Irina currently serves on the editorial board of the Artificial Intelligence Journal (AIJ). As an adjunct professor in the EE Department of Columbia University, she taught several advanced graduate courses on statistical learning and sparse signal modeling.

Abstract Summary:

Learning About the Brain and Brain-Inspired Learning:
Quantifying mental states and identifying statistical biomarkers of mental disorders from neuroimaging data is an exciting and rapidly growing research area at the intersection of neuroscience and machine learning, with a particular focus on the interpretability and reproducibility of learned models. We will discuss the promises and limitations of machine-learning methods in such applications, focusing on recent applications of deep learning methods such as recurrent convnets to the analysis of “brain movies” (EEG data). Beyond this “AI to Brain” direction, we will also discuss “Brain to AI”: borrowing ideas from neuroscience to improve machine learning, with a specific focus on adult neurogenesis and online model adaptation in representation learning.

Evan Estola, Lead Machine Learning Engineer, Meetup

Evan is a Lead Machine Learning Engineer working on the Data Team at Meetup. Combining product design, machine learning research and software engineering, Evan builds systems that help Meetup’s members find the best thing in the world: real local community. Before Meetup, Evan worked on hotel recommendations at Orbitz Worldwide, and he began his career in the Information Retrieval Lab at the Illinois Institute of Technology.

Abstract Summary:

Machine Learning Heresy and the Church of Optimality:
As Machine Learning continues to grow in both usage and impact on people’s lives, there has been growing concern around the ethics of using these systems. In application areas such as hiring selection, loan review, and even prison sentencing, ML is being used in ways that raise questions about the fairness of these algorithms. But what does it mean for an algorithm to be fair? An algorithm will consistently make the same decision when given the same data, leading some people to argue that building an optimal algorithm is inherently fair. Even in the case of using sensitive features like age, race and gender, if the data is predictive, aren’t we just modeling reality?

In this talk, I will argue that these questions do not let us off the hook with regard to the impact of the systems we build as Machine Learning engineers. I think it is important to question how ‘optimal’ a model can even be in the first place. Finally, I will discuss what kinds of organizational resistance engineers might run into, and how to deal with ethically questionable decisions made for the sake of being ‘optimal’.

Yi Wang, Tech Lead of AI Platform, Baidu

Yi Wang is the tech lead of AI Platform at Baidu. The team is a primary contributor to PaddlePaddle, the open source deep learning platform originally developed at Baidu. Before Baidu, he was a founding member of ScaledInference, a Palo Alto-based AI startup. Before that, he was a senior staff member at LinkedIn, engineering director of the advertising system at Tencent, and a researcher at Google.

Abstract Summary:

Fault-tolerant Deep Learning on General-purpose Clusters:
Researchers are used to running deep learning jobs on clusters. In industrial applications, however, AI is built on top of big data, and deep learning is only one stage of the data pipeline. That is where MPI-based clusters fall short: general-purpose cluster management systems are needed to run web servers like Nginx, log collection and streaming tools like fluentd and Kafka, data processors on top of Hadoop, Spark, and Storm, and the deep learning jobs that improve web service quality. This talk explains how we integrate PaddlePaddle and Kubernetes to provide an open source, fault-tolerant, large-scale deep learning platform.
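One generic way to let distributed training survive trainer failures, such as a Kubernetes pod being preempted, is to have a coordinator hand out data shards as leases and re-queue any shard whose trainer stops reporting back. The sketch below illustrates that pattern only; it is not PaddlePaddle's code, and the `TaskQueue` class, its methods, the shard names, and the timeout value are all hypothetical.

```python
import time

class TaskQueue:
    """Hypothetical coordinator that leases data shards to trainers and
    re-dispatches any shard whose lease expires, so work lost with a killed
    trainer pod is eventually redone by a surviving one."""

    def __init__(self, shards, timeout_s):
        self.todo = list(shards)      # shards not yet handed out
        self.pending = {}             # shard -> time its lease started
        self.done = set()
        self.timeout_s = timeout_s

    def _requeue_expired(self, now):
        for shard, started in list(self.pending.items()):
            if now - started > self.timeout_s:
                del self.pending[shard]
                self.todo.append(shard)

    def get_task(self, now=None):
        """Lease the next shard to a trainer, or return None if none is free."""
        now = time.monotonic() if now is None else now
        self._requeue_expired(now)
        if not self.todo:
            return None
        shard = self.todo.pop(0)
        self.pending[shard] = now
        return shard

    def finish(self, shard):
        """Mark a leased shard as completed."""
        if shard in self.pending:
            del self.pending[shard]
            self.done.add(shard)

if __name__ == "__main__":
    q = TaskQueue(["shard-0", "shard-1"], timeout_s=10)
    print(q.get_task(now=0))    # shard-0 leased to trainer A
    print(q.get_task(now=1))    # shard-1 leased to trainer B
    q.finish("shard-1")         # trainer B completes; trainer A dies silently
    print(q.get_task(now=20))   # shard-0's lease expired, so it is re-leased
```

Passing `now` explicitly keeps the logic deterministic and testable. A production coordinator would also need to persist its own state so that it survives restarts, and would have to tolerate a slow trainer finishing a shard that was already re-leased.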

Claudia Perlich, Chief Scientist, Dstillery

Claudia Perlich leads the machine learning efforts that power Dstillery’s digital intelligence for marketers and media companies. With more than 50 published scientific articles, she is a widely acclaimed expert on big data and machine learning applications, and an active speaker at data science and marketing conferences around the world.
Claudia is a past winner of the Advertising Research Foundation’s (ARF) Grand Innovation Award and has been selected for Crain’s New York’s 40 Under 40 list, Wired Magazine’s Smart List, and Fast Company’s 100 Most Creative People.
Claudia holds multiple patents in machine learning. She has won many data mining competitions and awards at Knowledge Discovery and Data Mining (KDD) conferences, and served as the conference’s General Chair in 2014.
Prior to joining Dstillery in 2010, Claudia worked at IBM’s Watson Research Center, focusing on data analytics and machine learning. She holds a PhD in Information Systems from New York University (where she continues to teach at the Stern School of Business), and an MA in Computer Science from the University of Colorado.

Abstract Summary:

Predictability and other Predicaments:
In the context of building predictive models, predictability is usually considered a blessing. After all, that is the goal: build the model that has the highest predictive performance. The rise of ‘big data’ has in fact vastly improved our ability to predict human behavior thanks to the introduction of much more informative features. In practice, however, things are more nuanced. For many applications, the relevant outcome is observed for very different reasons. In such mixed scenarios, the model will automatically gravitate toward the scenario that is easiest to predict at the expense of the others. This holds even if the predictable scenario is far less common or relevant. We present a number of applications where this happens: clicks on ads being performed ‘intentionally’ vs. ‘accidentally’, consumers visiting store locations vs. their phones pretending to be there, and finally customers filling out online forms vs. bots defrauding the advertising industry. In conclusion, the combination of mixed scenarios and highly informative features can have a significant negative impact on the usefulness of predictive modeling.

Ben Lau, Quantitative Researcher, Hobbyist

Ben Lau is a quantitative researcher at a macro hedge fund in Hong Kong, where he applies mathematical models and signal processing techniques to study the financial market. Prior to joining the financial industry, he used his mathematical modelling skills to explore the mysteries of the universe at the Stanford Linear Accelerator Centre, a national accelerator laboratory, where he studied the asymmetry between matter and antimatter by analysing tens of billions of collision events created by the particle accelerators. Ben received his Ph.D. in Particle Physics from Princeton University and his undergraduate degree (with First Class Honours) from the Chinese University of Hong Kong.

Abstract Summary:

Deep Reinforcement Learning: Developing a robotic car with the ability to form long-term driving strategies is key to enabling fully autonomous driving in the future. Reinforcement learning is considered a strong AI paradigm that can teach machines through interaction with the environment and by learning from their mistakes. In this talk, we will discuss how to apply deep reinforcement learning techniques to train a self-driving car in TORCS, an open source racing car simulator. I am going to share how this is implemented and discuss various challenges in this project.
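Deep reinforcement learning builds on the basic reinforcement learning update, in which an agent improves its action-value estimates from the rewards it observes. The sketch below shows tabular Q-learning on a deliberately tiny, hypothetical lane-keeping task (seven lateral positions, with a penalty for distance from the center line); it is a stand-in for illustration, not the TORCS setup from the talk. Deep RL replaces the table `q` with a neural network so the agent can handle high-dimensional sensor input, and actor-critic methods such as DDPG are a common choice when steering is a continuous action.

```python
import random

N_POSITIONS = 7          # lateral positions across a hypothetical track, 0..6
CENTER = 3               # the center line we want the car to hold
ACTIONS = (-1, 0, 1)     # steer left, go straight, steer right

def step(pos, action):
    """Toy dynamics: steering shifts the car one position; walls clip it."""
    next_pos = min(max(pos + action, 0), N_POSITIONS - 1)
    reward = -abs(next_pos - CENTER)     # penalize distance from the center line
    return next_pos, reward

def train(episodes=2000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_POSITIONS)]
    for _ in range(episodes):
        pos = rng.randrange(N_POSITIONS)             # random starting position
        for _ in range(steps):
            if rng.random() < epsilon:               # explore
                a = rng.randrange(len(ACTIONS))
            else:                                    # exploit current estimates
                a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])
            next_pos, r = step(pos, ACTIONS[a])
            # Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = r + gamma * max(q[next_pos])
            q[pos][a] += alpha * (target - q[pos][a])
            pos = next_pos
    return q

def greedy_policy(q):
    """Best learned action at each position."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]

if __name__ == "__main__":
    print(greedy_policy(train()))   # steer toward the center from either side
```

The learned policy steers right from the left side of the track, goes straight at the center, and steers left from the right side.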

Layla El Asri, Research Scientist, Maluuba

Layla El Asri is a research scientist at Maluuba. Her work explores artificial intelligence in the context of language understanding, dialogue and human-machine interaction. Layla leads a team seeking to build artificial intelligence systems that are knowledgeable and can exchange information with users to help them accomplish tasks or gain knowledge. Layla completed her PhD at Université de Lorraine in France.

Abstract Summary:

Teaching AI To Make Decisions and Communicate:
Many advances have been made in the area of artificial intelligence, with the goal of building agents that understand how they can interact with their environments, reason and solve complex tasks, and communicate their findings to humans. In this talk, I will focus on efficient decision-making and communication. For decision-making, I will present some work on building an efficient representation of the environment and breaking down tasks into generalizable subtasks. For communication, I will focus on dialogue through natural language and present some of our work in this area.

Ben Hamner, CTO, Kaggle

Ben Hamner is Kaggle’s co-founder and CTO. At Kaggle, he is currently focused on creating tools that empower data scientists to collaborate frictionlessly on analytics and promote their results. He has worked with machine learning across many domains, including natural language processing, computer vision, web classification, and neuroscience. Prior to Kaggle, Ben applied machine learning to improve brain-computer interfaces as a Whitaker Fellow at the École Polytechnique Fédérale de Lausanne in Lausanne, Switzerland. He graduated with a BSE in Biomedical Engineering, Electrical Engineering, and Math from Duke University.

Abstract Summary:

The Future of Kaggle: Where We Came From and Where We’re Going:
Kaggle started off running supervised machine learning competitions. This attracted a talented and diverse community that now has nearly one million members. It’s exposed us to hundreds of machine learning use cases, introduced hundreds of thousands of people to machine learning, and helped push the state of the art forward. We’ve expanded by launching an open data platform, Kaggle Datasets, along with a reproducible and collaborative machine learning platform, Kaggle Kernels. Both have already been widely adopted by our community because they make it simpler to get started with, share, and collaborate on data and code.

We’ve achieved less than 1% of what we’re capable of. Several weeks ago we announced our acquisition by Google. This enables us to move forward more rapidly and ambitiously. Working with analytics and machine learning is fraught with pain right now. It’s the software engineering equivalent of programming in assembly. It’s tough to access data. It’s tough to collaborate. It’s tough to reproduce results. We’ve seen these pain points over, and over, and over again. We’ve seen them in how our customers’ internal teams function. We’ve experienced them collaborating with our customers. We’ve seen them as people approach our competitions individually, and they become even more pronounced when our users team up. We want to solve this, and foster an era of intelligent services that improve your life every single day.

In this talk, I’ll go into depth on the lessons we’ve learned from running Kaggle and the most frustrating pain points we’ve seen. I’ll discuss how you can ameliorate these by leveraging current open source tools and technologies, and wrap up by painting a picture of the future we’re building towards.

Byron Galbraith, Chief Data Scientist, Talla

Byron Galbraith is the Chief Data Scientist and co-founder of Talla, where he works to translate the latest advancements in machine learning and natural language processing to build AI-powered conversational agents. Byron has a PhD in Cognitive and Neural Systems from Boston University and an MS in Bioinformatics from Marquette University. His research expertise includes brain-computer interfaces, neuromorphic robotics, spiking neural networks, high-performance computing, and natural language processing. Byron has also held several software engineering roles including back-end system engineer, full stack web developer, office automation consultant, and game engine developer at companies ranging in size from a two-person startup to a multi-national enterprise.

Abstract Summary:

Bayesian Bandits:
What color should that button be to convert more sales? What ad will most likely get clicked on? What movie recommendations should be displayed to keep subscribers engaged? What should we have for lunch? These are all examples of iterated decision problems — the same choice has to be made repeatedly with the goal being to arrive at an optimal decision strategy by incorporating the results of the previous decisions. In this talk I will describe the Bayesian Bandit solution to these types of problems, how it adaptively learns to minimize regret, how additional contextual information can be incorporated, and how it compares to the more traditional A/B testing solution.
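The Bayesian Bandit approach is commonly implemented as Thompson sampling. Below is a minimal sketch for click-or-no-click rewards, assuming a Beta(1, 1) prior on each arm's success rate; the arm rates passed in are hypothetical values used only to simulate feedback, and the algorithm never sees them directly. Each round draws one plausible rate per arm from its posterior and plays the arm with the highest draw, so traffic shifts toward better arms as evidence accumulates, whereas a classic A/B test keeps splitting traffic evenly until the test ends.

```python
import random

def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over len(true_rates) arms.

    true_rates are hypothetical click/conversion probabilities used only to
    simulate each arm's reward; the policy learns from observed outcomes.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        # Sample a plausible rate for each arm from its Beta(1 + s, 1 + f) posterior.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Play the chosen arm and observe a simulated Bernoulli reward.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    pulls = [successes[i] + failures[i] for i in range(n_arms)]
    return pulls, successes

if __name__ == "__main__":
    # Three hypothetical button colors with different conversion rates.
    print(thompson_sampling([0.04, 0.05, 0.07], 10000))
```

Contextual information can be added by conditioning the posterior on features of the user or situation, for example with a Bayesian regression model per arm instead of a single Beta distribution.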

Mayur Thakur, Managing Director, Goldman Sachs

Mayur is head of the Data Analytics Group in the Global Compliance Division. He joined Goldman Sachs as a managing director in 2014.
Prior to joining the firm, Mayur worked at Google, where he designed search algorithms for more than seven years. Previously, he was an assistant professor of computer science at the University of Missouri.
Mayur earned a PhD in Computer Science from the University of Rochester in 2004 and a BTech in Computer Science and Engineering from the Indian Institute of Technology, Delhi, in 1999.

Abstract Summary:

Surveillance platforms for bank compliance:
Bank compliance uses models to look for outlier events such as insider trading, spoofing, and front running. With the exponential increase in the size of the data and a growing need to use such models, a key question is: How do we scale these models so they run efficiently while still detecting outlier events with good precision and recall?
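Precision and recall are the two quantities a surveillance model has to balance. As a small worked example, suppose (hypothetically) that a model flags four trades and that three trades are confirmed violations, two of which were flagged: precision is 2/4 = 0.5 (half of the alerts are real) and recall is 2/3 ≈ 0.67 (two of the three violations were caught). The helper below computes both from sets of event IDs; the IDs are invented for illustration.

```python
def precision_recall(flagged, actual):
    """Return (precision, recall) for flagged vs. truly anomalous event IDs."""
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

if __name__ == "__main__":
    flagged = {"t1", "t4", "t7", "t9"}    # alerts raised by the model
    actual = {"t4", "t9", "t12"}          # confirmed violations
    print(precision_recall(flagged, actual))
```

Raising an alert threshold typically trades recall for precision, which is why both numbers are reported together.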

In this talk, we will describe our experience building, from scratch, a Hadoop-based platform for surveillance.

Yuri M. Brovman, Data Scientist, eBay

Yuri is a Member of Technical Staff / Data Scientist at eBay in New York City. He is currently focused on developing scalable machine learning algorithms to produce high-quality item recommendations. Yuri holds a Ph.D. from the Department of Applied Physics and Applied Mathematics at Columbia University and an undergraduate degree in Physics from UC Berkeley.

Abstract Summary:

Innovations in Recommender Systems for a Semi-structured Marketplace:
eBay has over 1 billion live items on the site at any given time. The lack of structured information about listings, as well as variable inventory, makes traditional collaborative filtering algorithms difficult to use in eBay’s large semi-structured marketplace. We will discuss approaches to overcome these challenges using machine learning and deep learning (both text- and image-based models). The details of the sampling strategy, feature engineering, and machine-learned ranking model are all important for delivering improved operational metrics in A/B tests. We will cover both system architecture engineering and the data science and machine learning methods that were developed to generate high-quality recommendations.


Jeff Bradshaw, Founder, Adaptris

Jeff Bradshaw is the founder of Adaptris and Group CTO of Adaptris/F4F/DBT within Reed Business Information. He has spent his career integrating data wherever it resides and in-flight across a number of industries including Agriculture, Airlines, Telecommunications, Healthcare, Government and Finance.
Jeff has worked with and contributed to a number of international standards bodies and continues to work with large enterprises to help them extract value from their data silos and share data seamlessly with their trading partners for business benefit. For the last few years, Jeff has been focusing on Big Data and how to gather it from a wide range of sources to gain insight into the agri-food supply chain.

Abstract Summary:

Precision agriculture – Predicting outcomes for farmers using machine learning to help feed the world:
Agricultural data is vast and often unstructured, and it poses many challenges when working with legacy on-premises farm systems in rural areas. For instance, traditional farm equipment such as tractors, sprayers, and combines often comes from different vendors, which makes moving data between them complex. This is further complicated by the vast array of other systems our farmers use. Furthermore, the number of sensors in agriculture is astonishing, from sensors that measure the gait of cows walking into the dairy parlor to sensors that track chickens pecking. All this data needs to be turned into usable information on a global scale to improve farmers’ yields and provide greater visibility into what’s going on both on and off the farm. In this session, we will share a case study on how data from remote Farm Management Systems (used by farmers to manage their farms) was collected, normalized, and analyzed using the open source HPCC Systems platform. Merged with weather data, soil data, and actual machinery data, the resulting predictions are fed to agronomists and crop protection/seed manufacturers, who send recommendations back. The goal is to deliver a precision agriculture solution that helps farmers increase their yield, which in turn helps feed the world’s growing population.