The 2016 Machine Learning Conference in NYC was scheduled for April 15, 2016 at 230 Fifth Avenue.
Kaheer Suleman, CTO, Maluuba
As CTO and co-founder, Kaheer Suleman led the creation of Maluuba’s deep-learning-based Natural Language Understanding platform and is the technology visionary behind the algorithms that power voice search across millions of devices. Kaheer’s background is in information retrieval and artificial intelligence, and he previously worked in the artificial intelligence and information retrieval labs at the University of Waterloo under Information Retrieval experts – Professor Pascal Poupart and Olga Vechtomova. Kaheer’s expertise in building language understanding and conversational systems has served as a guide for Maluuba’s research and development team through some of the toughest challenges in machine comprehension and spoken dialogue.
Conversational Language Understanding: Recent advances in deep learning have all but solved speech recognition and image processing. The next frontier is natural language. Maluuba’s vision is intelligent machines that can think, reason, and communicate, and we believe that language and intelligence are inextricable. We’ll describe steps taken toward next-generation conversational systems. Such systems should include the capacity to retain memory across dialogue turns and from past conversations, to clarify user intents through dynamic, back-and-forth speech, and to acquire new knowledge through interaction with humans. By taking a deep-learning approach, we have achieved state-of-the-art performance in both question answering and conversational language understanding.
Jennifer Marsman, Principal Developer Evangelist, Microsoft
Jennifer Marsman is a Principal Developer Evangelist in Microsoft’s Developer and Platform Evangelism group, where she educates developers on Microsoft’s new technologies. In this role, Jennifer is a frequent speaker at software development conferences around the world. In 2009, Jennifer was chosen as “Techie whose innovation will have the biggest impact” by X-OLOGY for her work with GiveCamps, a weekend-long event where developers code for charity. She has also received many honors from Microsoft, including the Central Region Top Contributor Award, Heartland District Top Contributor Award, DPE Community Evangelist Award, CPE Champion Award, MSUS Diversity & Inclusion Award, and Gold Club. Prior to becoming a Developer Evangelist, Jennifer was a software developer in Microsoft’s Natural Interactive Services division. In this role, she earned two patents for her work in search and data mining algorithms. Jennifer has also held positions with Ford Motor Company, National Instruments, and Soar Technology. Jennifer holds a Bachelor’s Degree in Computer Engineering and Master’s Degree in Computer Science and Engineering from the University of Michigan in Ann Arbor. Her graduate work specialized in artificial intelligence and computational theory. Jennifer blogs at https://blogs.msdn.com/jennifer and tweets at https://twitter.com/jennifermarsman.
Using EEG and Azure Machine Learning to Perform Lie Detection: Today, we have the technology to “read minds” (well, EEG waves!). Using an EPOC headset from Emotiv, I have captured 14 channels of EEG (brain waves) while subjects lied and answered truthfully to a series of questions. I fed this labelled dataset into Azure Machine Learning to build a classifier which predicts whether a subject is telling the truth or lying. In this session, I will share my results on this “lie detector” experiment. I will show my machine learning models, data cleaning process, and results, along with discussing the limitations of my approach and next steps/resources. This session will be a fun look inside your brain waves along with demonstrations of data processing and predictive analytics. Attendees will gain exposure to the Emotiv EPOC+ headset and Azure Machine Learning.
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook
Soumith Chintala is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games and large-scale high-performance deep learning. Prior to joining Facebook in August 2014, he worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. He holds a Master’s in CS from NYU, and spent time in Yann LeCun’s NYU lab building deep learning models for pedestrian detection, natural image OCR, and depth images, among others.
Predicting the Future Using Deep Adversarial Networks: Learning With No Labeled Data: Labeling data to solve a certain task can be expensive and slow, and it does not scale. If unsupervised learning works, then one needs very little labeled data to help a machine solve a particular task. Most traditional unsupervised learning methods such as PCA and K-means clustering do not work well for complicated data distributions, making them useless for a lot of tasks. In this talk, I’ll go over recent advances in a technique for unsupervised learning called Generative Adversarial Networks, which can learn to generate very complicated data distributions such as images and videos. These trained adversarial networks are then used to solve new tasks with very little labeled data, making them an attractive class of algorithms for many domains where there is limited labeled data but unlimited unlabeled data.
Erich Elsen, Research Scientist, Baidu
Erich Elsen is a Research Scientist at Baidu’s Silicon Valley Artificial Intelligence Lab. He leads the High Performance Computing (HPC) group which focuses on techniques for scaling the training of neural networks and increasing the efficiency of their use in deployment. Prior to joining Baidu in September 2014 he was the founder of Royal Caliber, an HPC consulting company that was responsible for Shazam’s music recognition engine running on GPUs. He teaches a course at Stanford University in the Spring on parallel computing.
Sergei Vassilvitskii, Research Scientist, Google
I am a Research Scientist at Google New York. Previously I was a Research Scientist at Yahoo! Research and an Adjunct Assistant Professor at Columbia University. I completed my PhD at Stanford University under the supervision of Rajeev Motwani. Prior to that I was an undergraduate at Cornell University.
Teaching K-Means New Tricks: Over 50 years old, the k-means algorithm remains one of the most popular clustering algorithms. In this talk we’ll cover some recent developments, including better initialization, the notion of coresets, clustering at scale, and clustering with outliers.
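To make "better initialization" concrete, here is a minimal sketch of one well-known seeding scheme, k-means++. The abstract does not say which initialization methods the talk covers, so treating k-means++ as the example is an assumption, and the function names below are hypothetical. The first center is picked uniformly at random; each later center is picked with probability proportional to its squared distance from the nearest center chosen so far.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng=None):
    """Choose k initial centers with k-means++ style D^2 sampling (a sketch)."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a chosen center: fall back to uniform.
            centers.append(rng.choice(points))
        else:
            threshold = rng.uniform(0, total)
            cumulative = 0.0
            chosen = None
            for i, weight in enumerate(d2):
                if weight <= 0:
                    continue  # points already covered by a center are never re-picked
                cumulative += weight
                chosen = i
                if cumulative >= threshold:
                    break
            centers.append(points[chosen])
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, squared_distance(p, centers[-1]))
              for p, old in zip(points, d2)]
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
    print(kmeanspp_seeds(data, 3, random.Random(42)))
```

The seeds would then be handed to an ordinary k-means loop. Far-away points are favored, so the initial centers tend to be spread across the data instead of clumping together.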
Braxton McKee, CEO & Founder, Ufora
Braxton is the technical lead and founder of Ufora, a software company that has built an adaptively distributed, implicitly parallel runtime. Before founding Ufora with backing from Two Sigma Ventures and others, Braxton led the ten-person MBS/ABS Credit Modeling team at Ellington Management Group, a multi-billion dollar mortgage hedge fund. He holds a BS (Mathematics), MS (Mathematics), and M.B.A. from Yale University.
Say What You Mean: Scaling Machine Learning Algorithms Directly from Source Code: Scaling machine learning applications is hard. Even with powerful systems like Spark, TensorFlow, and Theano, the code you write has more to do with getting these systems to work at all than it does with your algorithm itself. But it doesn’t have to be this way!
In this talk, I’ll discuss an alternate approach we’ve taken with Pyfora, an open-source platform for scalable machine learning and data science in Python. I’ll show how it produces efficient, large-scale machine learning implementations directly from the source code of single-threaded Python programs. Instead of programming to a complex API, you can simply say what you mean and move on. I’ll show some classes of problem where this approach truly shines, discuss some practical realities of developing the system, and talk about some future directions for the project.
Geetu Ambwani, Principal Data Scientist, Huffington Post
Data Science in the Newsroom:
Edo Liberty, Research Director, Yahoo
Edo Liberty is a Research Director at Yahoo and leads its Scalable Machine Learning group. His research interests include fast dimensionality reduction, clustering, streaming and online algorithms, text and pattern mining, machine learning, and large scale numerical linear algebra. Before joining Yahoo in 2009, Edo was a postdoctoral fellow in the Program in Applied Mathematics at Yale University, where he also received his PhD in Computer Science. Prior to that he received his BS in Physics and Computer Science from Tel Aviv University.
Online Data Mining: PCA and K-Means: Algorithms for data mining, unsupervised machine learning and scientific computing were traditionally designed to minimize running time for the batch setting (random access to memory).
In recent years, a significant amount of research has been devoted to producing scalable algorithms for the same problems. A scalable solution assumes some limitation on data access and/or compute model. Some well known models include map reduce, message passing, local computation, pass efficient, streaming and others. In this talk we argue for the need to consider the online model in data mining tasks. In an online setting, the algorithm receives data points one by one and must make some decision immediately (without examining the rest of the input). The quality of the algorithm’s decisions is compared to the best possible in hindsight. While practitioners are well aware of the need for such algorithms, this setting was mostly overlooked by the academic community. Here, we will review new results on online k-means clustering and online Principal Component Analysis (PCA).
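The online setting described above can be illustrated with a small sketch. This is not the online k-means algorithm from the talk, which the abstract does not specify. It is a deliberately simple threshold rule, with a hypothetical `radius` parameter, chosen only to show the constraint that each point must be labeled immediately and irrevocably, using nothing but the points seen so far.

```python
def online_cluster(stream, radius):
    """Assign each arriving point to a cluster immediately (illustrative only).

    A point joins its nearest existing center if that center lies within
    `radius`; otherwise the point itself becomes a new center. Decisions
    are never revisited, which is the defining constraint of the online model.
    """
    centers = []
    labels = []
    for x in stream:
        best, best_d2 = None, None
        # Only centers opened so far are visible; future points are unknown.
        for idx, c in enumerate(centers):
            d2 = sum((a - b) ** 2 for a, b in zip(x, c))
            if best_d2 is None or d2 < best_d2:
                best, best_d2 = idx, d2
        if best is None or best_d2 > radius ** 2:
            centers.append(x)
            best = len(centers) - 1
        labels.append(best)
    return centers, labels


if __name__ == "__main__":
    points = [(0, 0), (0.5, 0), (10, 10), (0, 0.4), (10.2, 9.9)]
    print(online_cluster(points, radius=2))
```

A batch algorithm could inspect all five points before placing any center. The online version commits after each point, and its result would be judged against that best-in-hindsight clustering.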
Lei Yang, Senior Engineering Manager, Quora
Lei is an engineering manager at Quora, leading the feed ranking and content distribution team. She also oversees Quora’s machine learning engineer guild, consisting of machine learning experts and software engineers who use machine learning to solve many challenging problems across the product, such as home feed ranking, related questions, answer ranking, and topic inference. Prior to Quora, Lei grew and managed a number of engineering teams at Google: Google Now recommendations, Google+ recommendations and personalization, and Google Ads Quality. She has years of experience in machine learning and is passionate about its application across different fields, such as data mining, user modeling, content recommendation, and spam detection. Lei holds a Ph.D. degree in Computer Engineering from Northwestern University.
Sharing and Growing the World’s Knowledge with Machine Learning: At Quora our mission is to “share and grow the world’s knowledge”. To accomplish this, we need to build a complex ecosystem which requires us to understand and solve a variety of problems like content quality, demand, user engagement, personalization, and author reputation. In this talk, we will go over several exciting challenges of applying machine learning to these problems. We will give examples such as our ranking and recommendation approaches, as well as systems and tools we built to support experimentation and integration of machine learning models in the product.
Samantha Kleinberg, Assistant Professor of Computer Science, Stevens Institute of Technology
Samantha Kleinberg is an Assistant Professor of Computer Science at Stevens Institute of Technology. She received her PhD in Computer Science from New York University in 2010 and was a Computing Innovation Fellow at Columbia University in the Department of Biomedical Informatics from 2010 to 2012. She is the recipient of NSF CAREER and JSMF Complex Systems Scholar Awards. She is the author of “Causality, Probability, and Time” (Cambridge University Press, 2012) and “Why: A Guide to Finding and Using Causes” (O’Reilly Media, 2015), a nontechnical introduction to causality.
Causal Inference and Explanation to Improve Human Health: Massive amounts of medical data such as from electronic health records and body-worn sensors are being collected and mined by researchers, but translating findings into actionable knowledge remains difficult. The first challenge is finding causes, rather than correlations, when the data are highly noisy and often missing. The second is using these to explain specific cases, such as why an individual’s blood glucose is raised. In this talk I discuss new methods for both causal inference and explanation, and show how these could be used to provide individualized feedback to patients.
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consulting
Mathias Brandewinder has been developing software professionally for about 10 years, with a focus on forecasting and risk analysis models. He is a Microsoft F# MVP, and speaks regularly on functional programming and related topics at conferences worldwide. He is the author of “Machine Learning Projects for .NET Developers” (Apress), and the founder of Clear Lines Consulting. Mathias is based in San Francisco, blogs at www.clear-lines.com/blog, and can be found on Twitter as @brandewinder. Mathias holds degrees in Business from ESSEC, Economics from University Paris X, and Operations Research from Stanford University.
Scripts that Scale with F# and mbrace.io:
Nothing beats interactive scripting for productive data exploration and rapid prototyping: grab data, run code, and iterate based on feedback. However, that story starts to break down once you need to process large datasets or expensive computations. Your local machine becomes the bottleneck, and you are left with a slow and unresponsive environment.
In this talk, we will demonstrate on live examples how you can have your cake and eat it, too, using mbrace.io, a free, open-source engine for scalable cloud programming. Using a simple programming model, you can keep working from your favorite scripting environment, and execute code interactively against a cluster on the Azure cloud. We will discuss the relevance of F# and mbrace in a data science and machine learning context, from parallelizing code and data processing in a functional style, to leveraging F# type providers to consume data or even run R packages.
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Prediction Machine Learning team, Criteo
Damien Lefortier is a Senior Machine Learning Engineer and Tech Lead in the Prediction Machine Learning team at Criteo where he has been actively involved in the development of Criteo’s large scale distributed machine learning library as well as in improving Criteo’s predictive algorithms for ad targeting. Before Criteo, Damien worked 3 years in the Search team at Yandex where he focused both on search quality and on infrastructure. At the same time, he started his PhD in information retrieval at the University of Amsterdam. His research work has been published at top tier conferences, such as WWW and CIKM.
Machine Learning for Display Advertising @ Scale: In this talk, we will briefly introduce the display advertising marketplace, its stakeholders and the key performance metrics. We will then present the models we have developed at Criteo for bidding in real-time auctions, product recommendation, and look & feel optimization at scale (1B+ monthly users, 3B+ products in our catalog, and 30K ads displayed / sec at peak traffic). For these tasks, we’ve moved over time from predicting rare, binary events (clicks) to predicting very rare events (sales) and continuous events (sales amounts), all of them being quite noisy, and we’ll discuss the different methods that we have tried to build these models (such as generalized linear models, trees or factorization machines). We’ll continue by discussing how we evaluate these models both offline and online. We will describe the infrastructure for large-scale distributed data processing that these algorithms rely upon and discuss different optimization techniques we have experimented with (such as SGD, L-BFGS, SVRG). Finally, we will conclude with future areas of research and discuss open challenges we are currently facing.
Yael Elmatad, Senior Data Scientist, Tapad
Yael Elmatad is a Senior Data Scientist at Tapad. Prior to Tapad, Dr. Elmatad was a Faculty Fellow and Assistant Professor at the NYU Physics Department, specializing in the use of high-performance computing to study model space parameter optimization. Dr. Elmatad holds a Ph.D. in Physical Chemistry from the University of California, and a B.S. in Chemistry with a focus on Mathematics and Computer Science from New York University.
Beyond the Classifier, Inspiration from Engineering Algorithms: Many data scientists work within the realm of Machine Learning and their problems are often addressable with techniques such as classifiers and recommendation engines. At Tapad, we have often had to look outside that standard toolkit to find inspiration from more traditional engineering algorithms. This has included solving our Device Graph’s connected component problem at scale as well as maintaining our Device Graph’s time-consistency in our cluster identification week over week.
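As a small illustration of the connected component problem mentioned above, the sketch below groups nodes of a graph with a union-find (disjoint-set) structure. This is a single-machine toy under stated assumptions: the abstract does not describe how Tapad solves the problem at scale, and the device names and edge list are hypothetical.

```python
def connected_components(edges):
    """Group nodes into connected components using union-find (a sketch)."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical edges: each pair says two devices were linked.
    links = [("phone", "laptop"), ("laptop", "tablet"), ("tv", "console")]
    print(connected_components(links))
```

Each resulting set is one component: here the phone, laptop, and tablet end up together, and the TV and console form a second group.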
Furong Huang, Ph.D. Candidate, UC Irvine – Winner of MLconf Industry Impact Student Research Award
Furong Huang is a 6th year Ph.D. Candidate from UC Irvine working with Professor Anima Anandkumar. Her research interests lie in developing scalable and parallel algorithms for large-scale data using statistical models. She has worked on non-convex function optimization, such as finding tensor decompositions using stochastic gradient descent; on developing a fast detection algorithm to discover hidden and overlapping user communities in social networks; and on designing a parallel spectral tensor decomposition algorithm for topic modeling in Map-Reduce frameworks. Besides pure statistical computation, Furong has applied her machine learning techniques to biology. Recently she worked on neuronal cell types and their gene expression profiles in mouse brain by extracting a mixture of spatial point processes from large-scale high-resolution brain images. The project started during her internship at Microsoft Research New England with Jennifer Chayes and Christian Borgs, along with Srinivas Turaga from Janelia Labs.
Discovery of Latent Factors in High-dimensional Data Using Tensor Methods: Learning latent variable mixture models in high dimensions is applicable in numerous domains where low-dimensional latent factors need to be extracted from high-dimensional observations. Popular likelihood-based methods optimize over a non-convex likelihood, which is computationally challenging due to the high dimensionality of the data, and they are usually not guaranteed to converge to a global or even a local optimum without additional assumptions. We propose a framework to overcome the problem of unscalable and non-convex likelihood by harnessing the power of the inverse method of moments. By matching the moments, the problem of learning latent variable mixture models is reduced to a tensor (higher order matrix) decomposition problem in low-dimensional space. This framework incorporates dimensionality reduction and thus is scalable to high-dimensional data. Moreover, we show that the algorithm is highly parallel, and we implemented a distributed version on Spark in Scala.
Ike Nassi, Founder, TidalScale
Dr. Ike Nassi is the founder of TidalScale, an Adjunct Professor of Computer Science at UC Santa Cruz, and a founding trustee at the Computer History Museum. Previously, Dr. Nassi was an executive vice president and chief scientist at SAP. Before joining SAP, Ike helped start three companies: Encore Computer Corporation, a pioneer in symmetric multiprocessors; InfoGear Technology, which developed Internet appliances and services; and Firetide, a wireless mesh networking company. He has also held executive positions at Cisco Systems, Apple Computer, Visual Technology, and Digital Equipment Corporation. Ike has been a visiting scholar at Stanford University, a research scientist at MIT, and a visiting scholar at University of California, Berkeley. He has served on the board of the Anita Borg Institute for Women and Technology and is currently affiliated with other non-profit organizations. He is an avid San Jose Sharks fan.
Scaling Spark – Vertically: The mantra of Spark technology is divide and conquer, especially for problems too big for a single computer. The more you divide a problem across worker nodes, the more total memory and processing parallelism you can exploit. This comes with a trade-off. Splitting applications and data across multiple nodes is nontrivial, and more distribution results in more network traffic which becomes a bottleneck. Can you achieve scale and parallelism without those costs?
We’ll show results from a variety of Spark application domains, including structured data, graph processing and common machine learning, in a single, high-capacity scaled-up system versus a more distributed approach, and discuss how virtualization can be used to define node size flexibly, achieving the best balance for Spark performance.
Michael Galvin, Sr. Data Scientist, Metis
Michael comes to Metis from General Electric where he worked to establish the company’s data science strategy and capabilities for field services and to build solutions supporting global operations, risk, engineering, sales, and marketing. He also taught data science and machine learning for General Assembly. Prior to GE, Michael spent several years as a data scientist working on problems in credit modeling at Kabbage and corporate travel and procurement at TRX. Michael holds a Bachelor’s degree in Mathematics and a Master’s degree in Computational Science and Engineering from the Georgia Institute of Technology where he also spent 3 years working on machine learning research problems related to computational biology and bioinformatics. Additionally, Michael spent 12 years in the United States Marine Corps where he held various leadership roles within aviation, logistics, and training units. In his spare time, he enjoys running, traveling, and reading.
An Introduction to Word2vec and Working With Text Data: This session will introduce the basics of working with textual data using various Python libraries and give several examples of real-world applications. It will start by introducing the basic bag-of-words representation and move on to other models with an emphasis on word2vec. By the end of this talk, participants will be able to use Python to explore and build their own models on text data.
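For readers unfamiliar with the bag-of-words representation mentioned above, here is a minimal sketch using only the Python standard library. The abstract does not name the libraries used in the session, so the tokenizer and function names here are hypothetical. Each document becomes a vector of word counts over a shared vocabulary, and word order is discarded.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, vectors): one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct token across all documents, sorted.
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = bag_of_words(["The cat sat", "The cat saw the dog"])
    print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
    print(vecs)   # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

These count vectors are sparse and treat every word as unrelated to every other. Models such as word2vec instead learn a dense vector per word, so that words used in similar contexts end up close together.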
Welch Labs, Short Tutorial Videos on Mutual Information & Decision Trees
Learning new technical material is tough. Theory from textbooks can be dense and intimidating, and tutorials often gloss over the why to focus on the how. Being proficient at technical topics requires not only “how-to” skills, but also an understanding of the broader context in which these topics exist.
At Welch Labs, we strive to make the whole picture click – from math, to concepts, to code. We do this with fun and practical short video series and accompanying resources, covering topics from the high school to the graduate school level.
We’re excited to announce that for all 2016 events, Welch Labs will be revealing original tutorial videos on various ML topics to be played before breaks at each event. In 2016, topics covered include: Mutual Information, Decision Trees, Convolutional Neural Networks, Deep Learning, Bayesian Inference, MCMC, HMM, Support Vector Machines, The Kernel Trick, and Dimensionality Reduction. Welch Labs played numerous videos at 2015 events, which were positively received by the MLconf audience. These videos can be found on the Welch Labs website.