The 2015 Machine Learning Conference in NYC is scheduled for March 27, 2015 at 230 Fifth Avenue. The venue boasts a large meeting space with several screens, ensuring you won’t miss a single slide of your favorite ML presentations. During breaks and meals, you’ll be energized by the postcard-ready Midtown views and delicious coffee.


Corinna Cortes

Corinna Cortes, Head of Research at Google

Corinna Cortes is the Head of Google Research, NY, where she works on a broad range of theoretical and applied large-scale machine learning problems. Prior to Google, Corinna spent more than ten years at AT&T Labs – Research, formerly AT&T Bell Labs, where she held a distinguished research position. Corinna’s research is particularly well known for her contributions to the theoretical foundations of support vector machines (SVMs), for which she and Vladimir Vapnik jointly received the 2008 Paris Kanellakis Theory and Practice Award, and for her work on data mining in very large data sets, for which she was awarded the AT&T Science and Technology Medal in 2000. Corinna received her MS degree in Physics from the University of Copenhagen and joined AT&T Bell Labs as a researcher in 1989. She received her Ph.D. in Computer Science from the University of Rochester in 1993. Corinna is also a competitive runner and a mother of two.

Abstract summary

Finding Structured Data at Scale and Scoring its Quality: Recently Google launched Structured Snippets in its core search results. In this talk we will discuss techniques for extracting structured data at scale from the web, both from tables and free text. We will also discuss Lattice Regression, a powerful non-linear ML method that provides interpretable models.

View the slides for this presentation »

Watch this presentation on YouTube »

Ted Willke

Ted Willke, Senior Principal Engineer at Intel Labs

Ted Willke leads a team that researches large-scale machine learning and data mining techniques at Intel Labs. Prior to returning to the Labs this year, he led the analytics team within Intel’s Datacenter Group, which develops cloud solutions for machine learning and data mining. Over his 17 years with Intel, he has developed both software and hardware technologies for datacenters, in Intel Labs and in Intel’s product organizations. Ted holds a doctorate in electrical engineering from Columbia University. He won Intel’s highest award last year for starting a venture focused on graph-shaped data.

Abstract summary

You Thought What?! The Promise of Real-Time Brain Decoding: What can faster machine learning and new model-based approaches tell us about what someone is really thinking? Recently, Intel joined up with some of the pioneers of brain decoding to understand exactly that. Using functional MRI as our microscope, we began analyzing large amounts of high-dimensional 4-D image data to uncover brain networks that support cognitive processes. But existing image preprocessing, feature selection, and classification techniques are too slow and inaccurate to facilitate the most exciting breakthroughs. In this talk, we’ll discuss the promise of accurate real-time brain decoding and the computational headwinds. And we’ll look at some of the approaches to algorithms and optimization that Intel Labs and its partners are taking to reduce the barriers.

View the slides for this presentation »

Watch this presentation on YouTube »


Jeff Johnson, Research Engineer at Facebook

Jeff is a research engineer at Facebook AI Research, currently building new CPU- and GPU-based systems for machine learning. At Facebook, Jeff previously worked on Apollo, a novel large-scale distributed database system built using strong and weak consensus protocols. Prior to Facebook, he worked for over a decade in the video game industry, primarily on real-time physics simulation, distributed systems, and low-level optimization. Jeff studied mathematics and computer science at Princeton University, earning a BSE in Computer Science.

Abstract summary

Hacking GPUs for Deep Learning: GPUs have revolutionized machine learning in recent years, and have made both massive and deep multi-layer neural networks feasible. However, misunderstandings about why they seem to be winning persist. Many of deep learning’s workloads are in fact “too small” for GPUs, and require significantly different approaches to take full advantage of their power. There are many differences between traditional high-performance computing workloads, long the domain of GPUs, and those used in deep learning. This talk will cover these issues by looking into various quirks of GPUs, how they are exploited (or not) in current model architectures, and how Facebook AI Research is approaching deep learning programming through our recent work.

View the slides for this presentation »

Watch this presentation on YouTube »

Alina Beygelzimer

Alina Beygelzimer, Senior Research Scientist at Yahoo Labs

Alina Beygelzimer is a Senior Research Scientist at Yahoo Labs in New York, working on theoretical and applied machine learning. Before joining Yahoo, she was at the IBM Thomas J. Watson Research Center, where she received the Pat Goldberg Best Paper Award for her work on nearest neighbor search. Alina received a Ph.D. in Computer Science from the University of Rochester in 2003. Her work has more than 1,500 citations.

Abstract summary

Learning through exploration: I will talk about interactive learning applied to several core problems at Yahoo. Solving these problems well requires learning from user feedback. The difficulty is that only the feedback for what is actually shown to the user is observed. The need for exploration makes these problems fundamentally different from standard supervised learning problems—if a choice is not explored, we can’t optimize for it. Through examples, I will discuss the importance of gathering the right data. I will then discuss how to reuse data collected by production systems for offline evaluation and direct optimization. Being able to reliably measure performance offline allows for much faster experimentation, shifting from guess-and-check with A/B testing to direct optimization.
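The offline-evaluation idea described above (reusing logged exploration data to estimate how a different policy would have performed) is commonly implemented with an inverse-propensity-score estimator. The sketch below is illustrative only, not Yahoo’s code; the function names and the toy uniform logging policy are assumptions:

```python
import random

def ips_estimate(logs, target_policy):
    """Inverse-propensity-score estimate of a target policy's average reward,
    from logs of (context, action, reward, propensity) recorded by a
    production policy that explored with known action probabilities."""
    total = 0.0
    for context, action, reward, propensity in logs:
        if target_policy(context) == action:
            total += reward / propensity  # reweight the matching logged action
    return total / len(logs)

# Toy example: the production policy chose uniformly between two actions
# (propensity 0.5), so every action was explored and can be evaluated offline.
random.seed(0)
logs = []
for _ in range(10000):
    context = random.random()
    action = random.randint(0, 1)
    reward = 1.0 if action == (context > 0.5) else 0.0
    logs.append((context, action, reward, 0.5))

good_policy = lambda c: int(c > 0.5)   # always picks the rewarded action
print(ips_estimate(logs, good_policy)) # close to 1.0, the policy's true reward
```

The key point from the abstract shows up directly in the estimator: only actions the production system actually explored (and logged with their propensities) can be evaluated offline.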

View the slides for this presentation »

Bryan Thompson

Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC

Mr. Bryan Thompson (SYSTAP, LLC) has more than 30 years of experience as a technologist, inventor, and researcher in cloud computing and big data. He is the principal investigator for an XDATA research team investigating GPU-accelerated distributed architectures for graph databases and graph mining, and the lead architect for the MapGraph and bigdata® platforms. MapGraph is a disruptive technology for graph analytics on NVIDIA GPUs, with over 30 billion traversed edges per second on a 64-GPU cluster. Bigdata is an open-source distributed graph database used by Fortune 500 companies including EMC (SYSTAP provides the graph engine for the topology server used in their host and storage management solutions) and Autodesk (SYSTAP provides their cloud solution for graph search). His expertise spans cloud computing; graph databases; the semantic web; web architecture; relational, object, and RDF database architectures; knowledge management and collaboration; artificial intelligence and connectionist models; natural language processing; metrics, scalability studies, benchmarks, and performance tuning; and decision support systems. Mr. Thompson has been instrumental as PI and Senior Scientist in multiple government programs, including DARPA’s XDATA, ODNI’s RDEC RDF Experiment, ONR’s Artificial Intelligence and Cognitive Research, and Training and Simulation for the General Staff College, Fort Leavenworth, Kansas. He is the co-founder and Chief Scientist of SYSTAP, LLC. Previous positions include co-founder, President, and CTO of GlobalWisdom, Inc., and Executive Vice President and Senior Scientist at Cognitive Technologies, Inc. He is an expert in Java, C, and C++, with an emphasis on concurrent programming.

Abstract summary

Graph Traversal at 30 billion edges per second with NVIDIA GPUs: I will discuss current research on the MapGraph platform. MapGraph is a new and disruptive technology for ultra-fast processing of large graphs on commodity many-core hardware. On a single GPU, you can analyze the bitcoin transaction graph in 0.35 seconds. With MapGraph on 64 NVIDIA K20 GPUs, you can traverse a scale-free graph of 4.3 billion directed edges in 0.13 seconds, for a throughput of 32 billion traversed edges per second (32 GTEPS). I will explain why GPUs are an interesting option for data-intensive applications, how we map graphs onto many-core processors, and what the future looks like for the MapGraph platform.

MapGraph provides a familiar vertex-centric abstraction, but its GPU acceleration is hundreds of times faster than main-memory, CPU-only technologies and up to 100,000 times faster than graph technologies based on MapReduce or key-value stores such as HBase, Titan, and Accumulo.
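MapGraph itself is a CUDA platform, but the level-synchronous, frontier-based traversal pattern that vertex-centric GPU frameworks parallelize can be sketched in a few lines of single-threaded Python. The `traversed` counter is the quantity behind the TEPS figures quoted above; the code is an illustration of the pattern, not MapGraph’s implementation:

```python
from collections import defaultdict

def bfs_levels(edges, source):
    """Level-synchronous BFS: expand the whole frontier each step, the
    access pattern that vertex-centric GPU frameworks run in parallel."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    level = {source: 0}
    frontier = [source]
    traversed = 0  # edges touched: the count behind the TEPS metric
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                traversed += 1
                if v not in level:
                    level[v] = level[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return level, traversed

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
level, traversed = bfs_levels(edges, 0)
print(level)      # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
print(traversed)  # 5
```

Dividing `traversed` by the wall-clock time of the search gives traversed edges per second; 4.3 billion edges in 0.13 seconds is how the 32 GTEPS figure above is derived.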

View the slides for this presentation »

Michal Malohlava

Michal Malohlava, Software Engineer

Michal is a geek, developer, and Java, Linux, and programming-language enthusiast who has been developing software for over 10 years.
He obtained his PhD from Charles University in Prague in 2012 and completed a post-doc at Purdue University.

During his studies he was interested in the construction of distributed, embedded, and real-time component-based systems using model-driven methods and domain-specific languages. He participated in the design and development of various systems, including the SOFA and Fractal component systems and the jPapabench control system.

Abstract summary

Building Machine Learning Applications with Sparkling Water: Writing applications that process and analyze large amounts of data is still hard. It often requires designing and running machine learning experiments at small scale, then consolidating them into an application that runs at large scale. Several distributed machine learning platforms try to mitigate this effort. In this talk we will focus on Sparkling Water, which combines the benefits of two platforms: H2O and Spark. H2O is an open-source distributed math engine providing a tuned machine learning library; Spark is an execution platform for processing large amounts of data. The talk will demonstrate Sparkling Water’s features and show its benefits for building rich and robust machine learning applications.

View the slides for this presentation »

Dan Mallinger

Dan Mallinger, Data Science Practice Manager, Think Big Analytics

Dan Mallinger is the Data Science Practice Manager for Think Big Analytics. He has deep experience enabling analytics at enterprises and implementing data science solutions, having helped many of the Fortune 100. Dan has extensive experience working with product, business, and marketing teams across a wide variety of industries. His work with them has been focused on driving value from multi-structured and unstructured data sets. He is formally trained in statistics, computer science, and organizational psychology & leadership.

Abstract summary

Analytics Communication: Re-Introducing Complex Modeling: Despite the wide array of advanced techniques available today, too many practitioners are forced to return to their old toolkit of approaches deemed “more interpretable.” Whether because of internal policy or the difficulty of executive presentation, these restraints result from poor analytics communication and an inability to explain model risks and outcomes, not from a failing of the techniques.

From sampling to feature reduction to supervised modeling, the toolbox and communications of data scientists are limited by these constraints. But, instead of simplifying models, data scientists can re-introduce often ignored statistical practices to describe the models, their risk, and the impact of changes in the customer environment.

Even in situations without restrictions, these approaches will improve how practitioners select models and communicate results. Through measurement and simulation, reviewed approaches can be used to articulate the promises, risks, and assumptions of developed models, without requiring deep statistical explanations.

View the slides for this presentation »

Ilona Murynets

Ilona Murynets, Senior Member of Technical Staff at AT&T Security Research Center

Ilona Murynets is a scientist in the Chief Security Office at AT&T. She obtained her Ph.D. in Systems Engineering from the School of Systems and Enterprises, Stevens Institute of Technology; her dissertation received an Outstanding Dissertation Award. Ilona holds a B.Sc. degree in Mathematics and an M.S. degree in Statistics, Financial & Actuarial Mathematics from Kiev National Taras Shevchenko University, Ukraine. Her research is in data mining, optimization, and statistical analysis applied to malware and fraud detection and to mobile and network security.

Abstract summary

Mobile Network Fraud Analysis and Detection: Over the last several years, wireless devices connected to the mobile network have been leveraged for various fraudulent and illegal activities, such as the massive dissemination of spam messages and elaborate voice-call bypass schemes. Besides causing economic loss to cellular operators, fraudsters degrade local service where they operate. Often, cells are overloaded, and fraudulently re-routed voice calls have poor quality, which results in customer dissatisfaction.

The large amount of daily cellular traffic and continuously increasing number of mobile devices connecting to the network make detecting fraudulent activities extremely challenging. This talk will introduce the huge potential that network-based machine learning algorithms have in detecting fraudulent activities in cellular networks. Specifically, it will demonstrate the effectiveness of combining predictions of multiple classifiers in detecting SMS spam and SIMbox call bypass fraud in hundreds of millions of anonymized SMS and voice call detail records from one of the main cellular operators in the United States.
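The core technique named above, combining the predictions of multiple classifiers, can be illustrated with a minimal majority-vote combiner. The individual classifiers and their scores below are hypothetical; the actual detectors applied to AT&T’s call detail records are not public:

```python
def majority_vote(predictions):
    """Combine per-classifier 0/1 predictions (one inner list per
    classifier) by majority vote, yielding one label per sample."""
    n_classifiers = len(predictions)
    combined = []
    for sample_preds in zip(*predictions):
        # A sample is flagged when more than half the classifiers agree.
        combined.append(int(sum(sample_preds) * 2 > n_classifiers))
    return combined

# Three hypothetical spam detectors scoring five messages (1 = spam).
clf_a = [1, 0, 1, 1, 0]
clf_b = [1, 0, 0, 1, 0]
clf_c = [0, 0, 1, 1, 1]
print(majority_vote([clf_a, clf_b, clf_c]))  # [1, 0, 1, 1, 0]
```

Production ensembles typically weight classifiers by validated accuracy or combine calibrated scores rather than hard votes, but the principle of overruling any single detector’s mistakes is the same.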


Irina Rish, Research Staff, IBM T.J. Watson Research Center

Irina Rish is a research staff member at the IBM T.J. Watson Research Center. She received an MS in Applied Mathematics from the Moscow Gubkin Institute, Russia, and a PhD in Computer Science from the University of California, Irvine. Her areas of expertise include artificial intelligence and machine learning, with a particular focus on probabilistic graphical models such as Bayesian and Markov networks, sparsity and compressed sensing, information-theoretic experiment design, and active learning, with numerous applications ranging from diagnosis and performance management of distributed computer systems (“autonomic computing”) to predictive modeling and statistical biomarker discovery in neuroimaging (functional MRI and EEG) and other biological data. Irina has published over 50 papers, several book chapters, two edited books, and a monograph on Sparse Modeling; taught several tutorials; and organized numerous workshops at top machine learning conferences such as NIPS, ICML, and ECML. She holds 24 patents and several IBM awards, including the IBM Technical Excellence Award, the IBM Technical Accomplishment Award, and multiple Invention Achievement Awards. As an adjunct professor in the EE Department of Columbia University, she has taught several advanced graduate courses on statistical learning and sparse signal modeling.

Abstract summary

Learning About the Brain: Sparse Modeling and Beyond: Sparse modeling is a rapidly developing area at the intersection of statistical learning and signal processing, motivated by the age-old statistical problem of finding a relatively small subset of “important” variables in high-dimensional datasets. Variable selection is particularly important for improving the interpretability of predictive models in scientific applications such as computational biology and neuroscience, where the main objective is to gain better insight into the functioning of a biological system, beyond just learning “black-box” predictors. Moreover, variable selection provides an effective way of avoiding the “curse of dimensionality,” as it helps to prevent overfitting and reduce computational complexity in high-dimensional but relatively small-sample datasets such as functional MRI (fMRI), where the number of variables (brain voxels) can range from 10,000 to 100,000 while the number of samples is typically limited to a few hundred.

In this talk, I will summarize our work on sparse models and other machine learning approaches to “brain decoding” (a.k.a. “mind reading”), i.e., the prediction of mental states from functional MRI data, in a wide range of applications, from analyzing pain perception to discovering predictive patterns of brain activity associated with schizophrenia and cocaine addiction. I will mention several lessons learned from those applications that will hopefully generalize to other practical machine learning problems. Finally, I will briefly discuss our recent project that focuses on inferring mental states from “cheap” (unlike fMRI), easily collected data, such as speech and wearable sensors, with applications ranging from clinical settings (“computational psychiatry”) to everyday life (“augmented human”).
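A standard concrete instance of the sparse modeling described above is the Lasso, which selects a small subset of variables in a high-dimensional, small-sample regression. The sketch below uses a basic iterative soft-thresholding (ISTA) solver on synthetic data shaped like the fMRI setting (far more variables than samples); it is a generic illustration of the technique, not the speaker’s code, and the dimensions and regularization strength are arbitrary:

```python
import numpy as np

def lasso_ista(X, y, lam, steps=3000):
    """Sparse regression via iterative soft-thresholding (ISTA), a basic
    solver for the Lasso: min_w 0.5*||Xw - y||^2 + lam*||w||_1."""
    n, d = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz const. of gradient
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y)
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrink
    return w

# Synthetic "small-sample, high-dimensional" data: 50 samples, 200
# variables, only 3 of which actually drive the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
true_w = np.zeros(200)
true_w[[3, 50, 120]] = [2.0, -3.0, 1.5]
y = X @ true_w + 0.01 * rng.standard_normal(50)

w = lasso_ista(X, y, lam=1.0)
print(np.flatnonzero(np.abs(w) > 0.5))  # the informative variables stand out
```

The L1 penalty drives uninformative coefficients exactly to zero, which is what makes the resulting model interpretable: the surviving variables are the candidate “important” voxels.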

View the slides for this presentation »

Claudia Perlich

Claudia Perlich, Chief Scientist at Dstillery

Claudia Perlich is Chief Scientist at Dstillery (previously m6d), where she designs, develops, analyzes, and optimizes the machine learning that drives digital advertising. She has published more than 50 scientific articles and holds multiple patents in machine learning. She has won many data mining competitions and best paper awards at KDD, and served as General Chair for KDD 2014. Before joining m6d in February 2010, Perlich worked in the Predictive Modeling Group at IBM’s T. J. Watson Research Center, concentrating on data analytics and machine learning for complex real-world domains and applications. She holds a PhD in Information Systems from NYU and an MA in Computer Science from the University of Colorado, and teaches in the Stern MBA program at NYU.

Abstract summary

All the Data and Still Not Enough!: There is a deeply symbiotic relationship between machine learning/predictive modeling and Big Data. Machine learning theory asserts that the more data the better. Empirical observations suggest that more granular data, a hallmark of Big Data, further improves performance. Predictive modeling is one of the core techniques that measurably delivers value across many industries and demonstrates the value of Big Data.

However, there is a surprising paradox of predictive modeling: when you need models most, even all the data is not enough or just not suitable. The foundation of predictive modeling requires that you have enough training data with the respective outcomes, preferably IID. But often this data is not available: there are only so many people buying luxury cars online to inform my targeting models. I can never observe what happens BOTH when I treat you AND when I don’t – which is what I need to make causal claims and measure the impact of strategic decisions. To allocate sales resources, I would love to know what a customer’s budget is – but maybe even he does not know.

So in this day and age of Big Data, there remains an art to machine learning in situations where the right data is scarce. This talk will present a number of cases where enough of the right data is fundamentally not obtainable, and how creative data science can still solve them.

View the slides for this presentation »

Watch this presentation on YouTube »

Jeremy Schiff

Jeremy Schiff, Senior Manager, Data Science at OpenTable

Jeremy Schiff earned an undergraduate degree in Electrical Engineering and Computer Science from the University of California in 2005, and a Ph.D. in Electrical Engineering and Computer Science in 2009, with a focus on applying machine learning and statistical inference to robotics. In 2006, Jeremy co-founded an online photo editing company that powered companies such as MySpace and Photobucket. In 2009, Jeremy joined Ness Computing, a personalized search and recommendation company. As VP of Machine Learning, he oversaw the efforts around personalized recommendations and other data-driven features. Ness was sold to OpenTable in 2014, where Jeremy now leads Data Science.

Abstract summary

Recommendation Architecture: Understanding the Components of a Personalized Recommendation System: When we talk about recommendation systems, we typically focus on specific novel algorithms and formulations for performing collaborative filtering. However, building a system to recommend items to a user in a personalized way often involves many more components than just a collaborative filter; it requires a much broader ecosystem of functionality, tools, and development pipelines. This presentation will discuss a holistic approach to building recommendation systems, including 1) how A/B testing works with machine learning to iterate toward better recommendations, 2) how to couple an information-retrieval-based search stack with collaborative filtering to capture user intent in a personalized way, and 3) making recommendations more relevant and interpretable.
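The collaborative-filtering component mentioned above can be illustrated with a minimal item-item similarity computation, one common building block of such systems. The rating matrix and numbers below are hypothetical, not OpenTable data:

```python
import numpy as np

def item_similarities(ratings):
    """Item-item cosine similarities from a user x item rating matrix,
    a core building block of an item-based collaborative filter."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for unrated items
    normalized = ratings / norms
    return normalized.T @ normalized

# Four users, three restaurants (rows: users, columns: items; 0 = unrated).
ratings = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 0.0, 4.0],
])
sims = item_similarities(ratings)
# Items 0 and 1 are rated similarly by the same users, so they come out
# as the most similar pair; item 2 is liked by a different audience.
print(sims.round(2))
```

In a full system, as the abstract argues, this matrix is only one component: its output would be blended with search relevance and evaluated via A/B tests before reaching users.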

View the slides for this presentation »

Watch this presentation on YouTube »

Ronald Menich

Ronald Menich, Chief Data Scientist at Predictix, LLC

Dr. Ronald P. Menich, Chief Data Scientist at Predictix, LLC, describes his firm’s experience in applying machine learning to retail demand forecasting, utilizing many different attributes/features/causal factors/demand drivers.

Ron Menich is EVP and Chief Data Scientist at Predictix, LLC, where he leads a team of machine learning and retail implementation personnel, delivering demand forecasting, promotional lift estimation, and related services to a variety of customers and prospects. Dr. Menich has over 20 years of industrial experience in retail, revenue management, travel & hospitality, price optimization, and related fields. He received his Ph.D. in Industrial & Systems Engineering and M.S. in Operations Research, both from Georgia Tech, and his undergraduate degree in Math from the University of Illinois at Urbana-Champaign.

Abstract summary

Retail Demand Forecasting with Machine Learning: For over two decades, time-series methods, in combination with hierarchical spreading/aggregation via location and product hierarchies and subsequent manual user adjustments, have been a standard means by which retailers and the software vendors who serve them create demand forecasts. The forecasts so produced are used as inputs to store and vendor replenishment, regular and markdown pricing, and other downstream decision support systems. The rise of machine learning (the advent of high-powered commercial product recommender systems such as books at Amazon and movies at Netflix, of powerful search (e.g., Google), text processing (e.g., Facebook), and sentiment analysis capabilities, IBM Watson, self-driving cars, and the like) is a real phenomenon based on academically sound and industrially proven techniques whose application to retail demand forecasting is ripe.

View the slides for this presentation »

Watch this presentation on YouTube »

Juliet Hougland

Juliet Hougland, Data Scientist at Cloudera

Juliet is a data scientist at Cloudera, where she builds tools that make data analysis on Hadoop easier and helps customers build models and data pipelines on huge data sets. Prior to Cloudera, she spent three years working on a variety of Big Data applications, from building a real-time model application platform for e-commerce recommendations to designing predictive models for oil and gas pipeline operators. She holds an MS in Applied Mathematics from the University of Colorado Boulder and graduated Phi Beta Kappa from Reed College with a BA in Math-Physics.

Abstract summary

Matrix Decomposition at Scale: Matrix decomposition is an incredibly common task in machine learning, appearing everywhere, including recommendation algorithms (SVD++), dimensionality reduction (PCA), and natural language processing (Latent Semantic Analysis). Many well-known existing libraries can compute matrix decompositions when matrices fit in memory on a single machine. When the matrix no longer fits in memory and distributed computation is required, the computation becomes more complex and the details of the implementation become much more important. In this talk I will focus on the three major open source implementations of distributed eigen/singular value decomposition: LanczosSolver and StochasticSVD in Mahout, and the SVD implementation in Spark MLlib. I will discuss the tradeoffs of these implementations from the perspective of real-world performance (beyond big-O notation for flops) and accuracy. I will conclude with some guidelines for choosing which implementation to use based on accuracy, performance, and scale requirements.
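As a point of reference for the distributed solvers discussed above, the single-machine, in-memory baseline they scale out can be sketched with NumPy. The low-rank test matrix below is synthetic, and the sketch is a baseline illustration, not Mahout or MLlib code:

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k singular value decomposition: the in-memory computation that
    distributed solvers (Mahout's Lanczos/SSVD, Spark MLlib) scale out."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# A rank-2 structure in a 100x40 matrix, plus a little noise.
rng = np.random.default_rng(42)
A = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 40))
A += 0.01 * rng.standard_normal(A.shape)

U, s, Vt = truncated_svd(A, 2)
A_hat = U @ np.diag(s) @ Vt
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
print(rel_err)  # small: two singular values capture the structure
```

The distributed implementations the talk compares solve exactly this problem when `A` has billions of rows; the tradeoffs are in how they approximate this computation without materializing `A` in one machine’s memory.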

View the slides for this presentation »

Watch this presentation on YouTube »

Òscar Celma

Òscar Celma, Director of Research, Pandora

Òscar Celma is currently Director of Research at Pandora, where he leads a team of scientists to provide the best personalized radio experience. From 2011 to 2014, Òscar was a Senior Research Scientist at Gracenote, where his work focused on music and video recommendation and discovery. Before that, he was co-founder and Chief Innovation Officer at Barcelona Music and Audio Technologies (BMAT). Òscar published a book named “Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play in the Digital Music Space” (Springer, 2010). In 2008, Òscar obtained his Ph.D. in Computer Science and Digital Communication from Pompeu Fabra University (Barcelona, Spain). He holds a few patents from his work on music discovery, as well as on Vocaloid, a singing voice synthesizer bought by Yamaha in 2004.

Abstract summary

Inside Pandora: Practical Application of Big Data in Music: Pandora internet radio is best known for the Music Genome Project, a uniquely and richly labeled music catalog of more than 1.5 million tracks. While this content-based approach to music recommendation is extremely effective and still used today as the foundation of the leading online radio service, Pandora has also collected more than a decade of contextual listener feedback: more than 50 billion thumbs from 80M+ monthly active users who have created more than 7 billion stations.

This session will look at how the interdisciplinary team at Pandora goes about making sense of these massive data sets to successfully make large scale music recommendations to the masses.


Jeremy Stanley, Chief Data Scientist and EVP of Engineering at Sailthru

Prior to Sailthru, Jeremy was the CTO at Collective where he led a team of data scientists, product managers and engineers in creating technology platforms that used machine learning and big data to solve digital advertising challenges. Before joining Collective, Jeremy founded and led the Global Markets Analytics Group at Ernst & Young, analyzing the firm’s markets, financial and personnel data to inform executive decision making.

Jeremy holds a bachelor’s degree in mathematics from Wichita State University, and an MBA from Columbia Business School.

Abstract summary

Cost Effectively Scaling Machine Learning Systems in the Cloud: E-commerce and publishing clients use Sailthru to personalize billions of digital experiences for their customers weekly. Earlier this year, Sailthru launched Sightlines to allow clients to predict the future behavior of individual users. In this talk we cover how we scaled Sightlines cost effectively in the cloud by combining inexpensive computing resources with an efficient architecture and an implementation that is easy to maintain and evolve.

To access computing resources cost effectively, we utilize Amazon spot instances and Apache Mesos to pool together large quantities of CPU and memory. This approach can be orders of magnitude more cost effective than traditional deployments, but it requires sophisticated automation and orchestration tools and a fine-grained, fault-tolerant application architecture.

Given cost effective resources, the next challenge was to design the application to be efficient. Simple sampling and data pre-processing techniques significantly limit the computational requirements without adversely impacting model performance. Further, by controlling how often we run various components of the pipeline, we minimize cost while keeping models up to date.

The final challenge is to make such a system maintainable and easy to evolve. This includes removing single points of failure, automating infrastructure management, building distributed logging and monitoring capabilities, and running identical A/B production environments to enable aggressive, iterative changes to the code base and architecture in production.

We hope to demonstrate that the challenges faced in scaling a complex machine learning system in the cloud are at least as interesting as the science behind it, and to provide some insight into modern tools and methods for addressing these scalability challenges.

View the slides for this presentation »

Watch this presentation on YouTube »









Coffee Sponsorship: