The 2016 Machine Learning Conference in Seattle was held on May 20, 2016, at the Columbia Tower Club.
Ted Willke, Sr Principal Engineer, Intel
Ted Willke leads a team that researches large-scale machine learning and data mining techniques in Intel Labs. His research interests include parallel and distributed systems, image processing, machine learning, graph analytics, and cognitive neuroscience. Ted is also a co-principal investigator in a multi-year grand challenge project on real-time brain decoding with the Princeton Neuroscience Institute. Previously, he founded an Intel venture focused on graph analytics for data science that is now an Intel-supported open source project. In 2014, he won Intel’s highest award for this effort. In 2015, he was appointed to the Science & Technology Advisory Committee of the US Department of Homeland Security. Ted holds a doctorate in electrical engineering from Columbia University, a master’s from the University of Wisconsin-Madison, and a bachelor’s from the University of Illinois.
Can Cognitive Neuroscience Provide a Theory of Deep Learning Capacity?: Deep neural networks have achieved learning feats for video, image, and speech recognition that leave other techniques far behind. For example, the error rate on the ImageNet 2012 object recognition challenge was halved with the introduction of deep convolutional nets and now they dominate these competitions. At the same time, the industry is busy putting them to use on applications spanning autonomous driving to product recommenders and researchers continue to propose more elaborate topologies and intricate training techniques. But our theoretical understanding of how these networks encode representations of the “things they see” is far behind, as is our understanding of their limitations.
To advance deep neural network design from “black magic” to an engineering discipline, we need to understand the impact that the choice of topology and parameters has on learned representations and on the processing a network is capable of. How many representations can a given network store? How does representation “reuse” affect learning rate and learning capacity? How many tasks can a given network perform?
In this talk, I’ll describe why the human brain, with its seemingly unlimited parallel distributed processing, is downright terrible at multi-tasking and why this is totally logical. And I’ll describe the theoretical implications this may have for artificial neural networks. I’ll also describe very recent work that sheds some light on how representations are encoded and how our research team is extending this work to create practical best practices for network design.
Ewa Dominowska, Engineering Manager, Facebook
Ewa Dominowska joined Facebook in the spring of 2014, where she has managed teams focused on Science, Optimization, and Identity Modeling for Online Advertising. Before coming to Facebook, she designed a large-scale predictive analytics platform for mobile devices as Chief Architect at Medio Systems (acquired by Nokia). Prior to her start-up days, Ewa spent 10 years in various roles at Microsoft, where she joined the Online Services Division to help found adCenter (now Bing Ads). Her work focused on real-time ad ranking, targeting, content analysis, click prediction, and pricing models. As part of the small yet dynamic original team, Ewa designed, architected, and built the alpha version of the contextual advertising product. In 2007, Ewa founded the Open Platform Research and Development team. As part of this effort, she organized the Beyond Search academic program, the TROA WWW Workshop, and the IRA SIGIR Workshop, resulting in a number of very successful collaborations between academia and industry. During her tenure in the Online Services Division, Ewa spent a year serving as the TA for Satya Nadella, where she advised and assisted in operations and planning for the division. The role encompassed architecture, technology, large-scale data services, and cross-organizational efficiency. Ewa was responsible for the intellectual property process, long-term strategy, and prioritization for the division. In 2010, Ewa started the adCenter Marketplace team, responsible for all aspects of advertising marketplace health and tuning. She architected and built a petabyte-scale distributed data and analytics platform and created a suite of marketplace and experimentation tools. Ewa earned her degrees in Electrical Engineering/Computer Science and Mathematics from MIT. Her research focused on machine learning, natural language processing, and predictive, context-aware systems applied in the medical field.
Ewa authored several papers and dozens of patents in the areas of online advertising, search, pricing models, predictive algorithms and user interaction.
Generating a Billion Personal News Feeds: With the exponential growth of information and improved access to it, there is more and more data and not enough time to digest it. Facebook’s News Feed attempts to solve this by showing the most relevant content to each individual person. We create billions of personalized experiences by ranking stories for each person. Over the years, News Feed ranking has evolved to use large-scale machine learning techniques, aiming to maximize the value created for each individual. Ranking and organizing content in a unique way for over a billion users poses unique challenges. Each time a person visits their News Feed, we need to find the best piece of content out of all the available stories and put it at the top of Feed, where people are most likely to see it. To accomplish this, we model each person, attempting to figure out which friends, pages, and topics they care most about, and pick the stories and ordering they will find most interesting. In addition to the machine learning problems involved in directing those choices, another primary area of research is understanding the value we are creating for people. These joint problems of selection and evaluation are essential for delivering continued value in personalized Feeds, and they would not be tractable at the huge scale of content and users that Facebook operates at without powerful machine learning and analytics.
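As a rough, hypothetical sketch of the selection problem described in this abstract (not Facebook’s actual system), a per-person affinity model can be combined with story-level signals to order candidate stories; all field names and weights below are illustrative assumptions:

```python
# Hypothetical sketch of personalized feed ranking: score each candidate
# story against a per-person affinity model, then order by predicted value.

def score_story(story, affinities, weights):
    """Combine per-person affinity with story-level signals into one score."""
    return (weights["affinity"] * affinities.get(story["author"], 0.0)
            + weights["engagement"] * story["predicted_engagement"])

def rank_feed(stories, affinities, weights):
    return sorted(stories, key=lambda s: score_story(s, affinities, weights),
                  reverse=True)

stories = [
    {"id": 1, "author": "page_a", "predicted_engagement": 0.10},
    {"id": 2, "author": "friend_b", "predicted_engagement": 0.05},
]
affinities = {"friend_b": 0.9, "page_a": 0.2}
weights = {"affinity": 1.0, "engagement": 0.5}
feed = rank_feed(stories, affinities, weights)
# The story from the high-affinity friend ranks first despite lower
# predicted engagement.
```

In a real system, both the affinities and the engagement predictions would themselves be outputs of learned models, and evaluation of the resulting value to people is a separate problem, as the abstract notes.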
Igor Markov, Software Engineer, Google
Igor L. Markov is currently working at Google on search infrastructure, while holding appointments at Michigan and occasionally teaching at Stanford. He received his M.A. in Mathematics and Ph.D. in Computer Science from UCLA. He is an IEEE Fellow and an ACM Distinguished Scientist.
Igor is interested in computers that make computers, including algorithms and mathematical models, as well as software and hardware. Some of his results led to order-of-magnitude improvements in practice, and many of them are used in commercial tools and open-source software. During the 2011 redesign of the ACM Computing Classification System, he led the effort on the Hardware tree. At the University of Michigan, he chaired the undergraduate program in Computer Engineering (ranked #7 in the US) for a number of years.
Igor co-authored five books and over 200 refereed publications. He served on program committees and chaired tracks at top conferences in Electronic Design Automation. Twelve Ph.D. degrees were defended under his guidance. Current and former students interned at or have been employed by AMD, Altera, Amazon, Berkeley, Cadence, Calypto, Columbia University, the US Department of Defense, the US Department of Energy, the Howard Hughes Medical Inst., Google, IBM Research, Lockheed Martin, Microsoft, MIT, Qualcomm, Samsung, Synopsys, Texas Instruments.
Igor led student teams that won first place in multi-month research competitions in optimization software (organized by IBM Research and Intel Labs) in 2007, 2009, 2010, 2012, and 2013.
Can AI Become a Dystopian Threat to Humanity? – A Hardware Perspective: Viewing future AI as a possible threat to humanity has long been common in the movie industry, and some serious thinkers (Hawking, Musk) have promoted this perspective, even though prominent ML experts don’t see it happening any time soon. Why is this topic attracting so much attention? What can we learn from the past? This talk draws attention to the physical limitations of possible threats, such as energy sources and the ability to reproduce. These limitations can be made more reliable and harder to circumvent, and the hardware of future AI systems can be designed with particular attention to physical limits.
Carlos Guestrin, CEO of Dato Inc, Amazon Professor of Machine Learning at the University of Washington
Carlos Guestrin is the Amazon Professor of Machine Learning in Computer Science & Engineering at the University of Washington, and co-founder and CEO of Dato. He also co-teaches the Machine Learning Specialization through UW and Coursera. His previous positions include the Finmeccanica Associate Professor at Carnegie Mellon University and senior researcher at the Intel Research Lab in Berkeley. Carlos received his PhD and Masters from Stanford University, and a Mechatronics Engineer degree from the University of Sao Paulo, Brazil. Carlos’ work has been recognized by over a dozen best-paper awards at top conferences and journals. He is also a recipient of the Alfred P. Sloan Fellowship, IBM Faculty Fellowship, IJCAI Computers and Thought Award and the Presidential Early Career Award for Scientists and Engineers (PECASE). Carlos was named one of the 2008 `Brilliant 10′ by Popular Science Magazine and is a former member of the Information Sciences and Technology (ISAT) advisory group for DARPA.
How Can We Trust Machine Learning? Exploration, Evaluation and Explanation for ML Models: Machine learning technologies are at the core of a new generation of intelligent applications that differentiate disruptive businesses from established players. Today, business tasks like product recommendation, image tagging, sentiment analysis, churn prediction, fraud detection and lead scoring can only be achieved using machine learning (ML). To build these applications at scale, companies are fast adopting tools such as Dato’s GraphLab Create and Predictive Services, enabling developers to accelerate the innovation cycle, and quickly take their ideas from inspiration to production.
Industry practitioners understand that in order to secure adoption of intelligent applications, they must build trust in their models and predictions – that is, gain confidence that their models are achieving their desired outcomes and a good understanding of how predictions are made. In this talk, I’ll describe both: a) Recent research done at the University of Washington to provide a formal framework that explains why a machine learning model makes a particular prediction, and how even non-experts can use these explanations to improve the performance of a model. b) New tools introduced by Dato to help industry practitioners build trust and confidence in machine learning by making it easy to evaluate, explore, and explain models and predictions.
With these techniques, companies can start to have the means to gain trust and confidence in the models and predictions behind their core business applications.
Jake Mannix, Lead Data Engineer, Lucidworks
Jake lives at the intersection of search, recommender systems, and applied machine learning, with an eye for horizontal scalability and distributed systems. He is currently Lead Data Engineer in the Office of the CTO at Lucidworks, doing research and development of data-driven applications on Lucene/Solr and Spark.
Previously, he built out LinkedIn’s search engine and was a founding member of the Recommender Systems team there. After that, at Twitter, he built their user/account search system and led that team before creating the Personalization and Interest Modeling team, focused on text classification and graph-based authority and influence propagation.
He is an Apache Mahout committer and PMC member (and former PMC Chair).
In a past life, he studied algebraic topology and particle cosmology.
Smarter Search With Spark-Solr: Search gets smarter when you know more about your documents and their relationship to each other (think: PageRank) and the users (i.e. popularity), in addition to what you already know about their content (text search). It also gets smarter when you know more about your users (personalization) and both their affinity for certain kinds of content and their similarities to each other (collaborative filtering recommenders).
Building all of these pieces typically requires a big mix of batch workloads for log processing, as well as training machine-learned models to use during real-time querying. These systems are highly domain specific, but many of the techniques are fairly universal. We will discuss how Spark can interface with a SolrCloud cluster to efficiently perform many of the pieces of this puzzle in one relatively self-contained package (no HDFS/S3; all data stored in Solr!), and introduce “spark-solr”, an open-source JVM library to facilitate this.
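As a language-agnostic sketch of the signal blending described above (not the spark-solr API itself, and with made-up weights and field names), text relevance, popularity, and a personalization signal can be combined into a single ranking score:

```python
# Illustrative only: blending a text-relevance score with document
# popularity and a per-user affinity signal into one ranking score.

import math

def blended_score(text_score, popularity, user_affinity,
                  w_text=1.0, w_pop=0.3, w_user=0.5):
    # Log-damp popularity so viral documents don't swamp text relevance.
    return (w_text * text_score
            + w_pop * math.log1p(popularity)
            + w_user * user_affinity)

docs = [
    {"id": "a", "text_score": 2.0, "popularity": 10, "user_affinity": 0.0},
    {"id": "b", "text_score": 1.5, "popularity": 500, "user_affinity": 0.8},
]
ranked = sorted(docs, key=lambda d: blended_score(
    d["text_score"], d["popularity"], d["user_affinity"]), reverse=True)
# Document "b" overtakes "a" on popularity and personalization despite
# its lower text score.
```

In the architecture the talk describes, the popularity and affinity signals would typically be computed in Spark batch jobs and stored back into Solr for use at query time.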
Franziska Bell, Data Science Manager, Uber Technologies
Franziska Bell is the lead data scientist of the Intelligent Decision Systems team at Uber, which focuses on developing new models for real-time outage and outlier detection. Since she joined Uber in late 2014, these models have broken new ground in detection accuracy and speed whilst being sufficiently computationally tractable to be applied to 100,000s of time series in real-time.
Before Uber, Franziska was a Postdoc at Caltech where she developed a novel, highly accurate approximate quantum molecular dynamics theory to calculate chemical reactions for large, complex systems, such as enzymes. Franziska earned her Ph.D. in theoretical chemistry from UC Berkeley focusing on developing highly accurate, yet computationally efficient approaches which helped unravel the mechanism of non-silicon-based solar cells and properties of organic conductors.
Towards 99.99% Availability via Intelligent Real-time Monitoring: The Intelligent Real-time Monitoring team at Uber focuses on developing novel time series models for real-time outage and outlier detection. These models have broken new ground in detection accuracy and speed whilst being sufficiently computationally tractable to be applied to 100,000s of time series in real-time. This talk will give an overview of prerequisites, challenges and approaches to intelligent real-time monitoring.
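To give a sense of the per-point cost such a system must keep low, here is a minimal sketch of one common real-time outlier-detection approach, a rolling z-score test; Uber’s actual models are far more sophisticated, and the window size and threshold below are arbitrary assumptions:

```python
# Minimal rolling z-score detector: flag points that deviate from the
# recent mean by more than `threshold` rolling standard deviations.

from collections import deque
import math

class RollingZScoreDetector:
    def __init__(self, window=20, threshold=3.0):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, x):
        """Return True if x is an outlier relative to the recent window."""
        is_outlier = False
        if len(self.window) >= 5:  # need some history before judging
            mean = sum(self.window) / len(self.window)
            var = sum((v - mean) ** 2 for v in self.window) / len(self.window)
            std = math.sqrt(var)
            if std > 0 and abs(x - mean) / std > self.threshold:
                is_outlier = True
        self.window.append(x)
        return is_outlier

det = RollingZScoreDetector()
flags = [det.observe(v) for v in [10, 11, 9, 10, 10, 11, 9, 10, 100]]
# Only the final spike to 100 is flagged.
```

The appeal of this family of methods for monitoring at scale is that each new observation costs O(window) arithmetic, so hundreds of thousands of series can be checked in real time.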
Evan Estola, Lead Machine Learning Engineer, Meetup
Evan is a Lead Machine Learning Engineer working on the Data Team at Meetup. Combining product design, machine learning research and software engineering, Evan builds systems that help Meetup’s members find the best thing in the world: real local community. Before Meetup, Evan worked on hotel recommendations at Orbitz Worldwide, and he began his career in the Information Retrieval Lab at the Illinois Institute of Technology.
When Recommendation Systems Go Bad: Machine learning and recommendation systems have changed the way we interact with not just the internet, but some of the basic products and services that we use to run our lives.
While the reach and impact of big data and algorithms will continue to grow, how do we ensure that people are treated justly? Certainly there are already algorithms in use that determine if someone will receive a job interview or be accepted into a school. Misuse of data in many of these cases could have serious public relations, legal, and ethical consequences.
As the people that build these systems, we have a social responsibility to consider their effect on humanity, and we should do whatever we can to prevent these models from perpetuating some of the prejudice and bias that exist in our society today.
In this talk I intend to cover some examples of recommendation systems that have gone wrong across various industries, as well as why they went wrong and what can be done about it. The first step towards solving this larger issue is raising awareness, but there are concrete technical approaches that can be employed as well. Three that will be covered are:
- Accepting simplicity with interpretable models.
- Data segregation via ensemble modelling.
- Designing test data sets for capturing unintended bias.
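As a toy illustration of the third approach above, a model can be probed with paired test examples that differ only in a sensitive attribute; any prediction gap reveals that the attribute influences the output. The model and feature names here are entirely hypothetical:

```python
# Counterfactual-pair audit: vary only the sensitive attribute and
# measure the largest resulting change in the model's prediction.

def biased_model(features):
    # Stand-in model that (wrongly) uses a sensitive attribute.
    return 0.6 * features["experience"] + 0.4 * (features["group"] == "A")

def audit_pairs(model, base_examples, attribute, values):
    """Return the max prediction gap across counterfactual pairs."""
    worst_gap = 0.0
    for ex in base_examples:
        scores = []
        for v in values:
            probe = dict(ex)
            probe[attribute] = v
            scores.append(model(probe))
        worst_gap = max(worst_gap, max(scores) - min(scores))
    return worst_gap

examples = [{"experience": 3, "group": "A"}, {"experience": 7, "group": "B"}]
gap = audit_pairs(biased_model, examples, "group", ["A", "B"])
# A nonzero gap indicates the sensitive attribute changes predictions.
```

The same probe can be run against a black-box model; it needs only prediction access, which is why designed test sets are a practical complement to interpretable models.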
Amanda Casari, Senior Data Scientist, Concur Technologies
Amanda Casari is a Senior Data Scientist on the Data Science Team @ Concur, where she builds products for the Perfect Trip. She randomly walked her way into data science + computational storytelling through control systems engineering, operations research analysis, technology consulting, and studying complex systems. Amanda actively speaks and works with several organizations to foster an inclusive Seattle data science community, including Ada Academy, Seattle Spark Meetup, PyLadies, and Women Who Code.
Scaling Global Data Science Products, Not Teams: Congratulations! Your data science feature works! Your metrics are outstanding. Your data scientists and engineers have created useful products for your customers with proven results. Now your product and marketing teams are ready to move into new markets. Your underlying population changes. The skewed statistics of your data shift depending on the data center you analyze. Your product is now localized and your NLP methods must adjust for a greater range of languages. Your requirements have grown. Your team has not.
How do you scale success for global products in multiple data centers with small teams?
The Data Science team at Concur has grown our products into international markets without requiring a global scaling of resources. This talk will share our lessons learned in creating modular, reusable data science products deployable to international, segregated data centers.
Sam Steingold, Lead Data Scientist, Magnetic Media Online
Sam Steingold has been doing data science since before it got that swanky name. He is the lead data scientist at Magnetic Media Online and holds a PhD in Math from UCLA. He contributed to various open source projects (e.g., GNU Emacs, CLISP, Vowpal Wabbit).
An Information Theoretic Metric for Multi-Class Categorization: The most common metrics used to evaluate a classifier are accuracy, precision, recall, and $F_1$-score. These metrics are widely used in machine learning, information retrieval, and text analysis (e.g., text categorization). Each of these metrics is imperfect in some way: it captures only one aspect of predictor performance and can be fooled by a pathological data set. None of them can be used to compare predictors across different datasets. This talk presents an information-theoretic performance metric which does not suffer from the aforementioned flaws and can be used in both classification (binary and multi-class) and categorization (each example can be placed in several categories) settings. The code to compute the metric is available under the Apache open-source license.
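One generic information-theoretic construction in this spirit (a sketch of the idea, not necessarily the exact metric of the talk) is the mutual information between true and predicted labels, normalized by the entropy of the true labels, computed from a confusion matrix:

```python
# Normalized mutual information of a confusion matrix: 1.0 for a perfect
# predictor, 0.0 for a predictor independent of the true labels.

import math

def proficiency(confusion):
    """confusion[i][j] = count of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    p_true = [sum(row) / total for row in confusion]
    p_pred = [sum(confusion[i][j] for i in range(len(confusion))) / total
              for j in range(len(confusion[0]))]
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, c in enumerate(row):
            if c > 0:
                p_ij = c / total
                mi += p_ij * math.log2(p_ij / (p_true[i] * p_pred[j]))
    h_true = -sum(p * math.log2(p) for p in p_true if p > 0)
    return mi / h_true

perfect = [[50, 0], [0, 50]]    # always right
useless = [[25, 25], [25, 25]]  # prediction independent of truth
```

Unlike accuracy, this score is unchanged by relabeling classes and is not inflated by a dominant majority class, which is one motivation for information-theoretic metrics.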
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai
Erin is a Statistician and Machine Learning Scientist at H2O.ai. Before joining H2O, she was the Principal Data Scientist at Wise.io and Marvin Mobile Security (acquired by Veracode in 2012) and the founder of DataScientific, Inc.
Erin received her Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from UC Berkeley. Her research focuses on ensemble machine learning, learning from imbalanced binary-outcome data, influence curve based variance estimation and statistical computing. She also holds a B.S. and M.A. in Mathematics.
Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches: Multi-algorithm ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as they require training and cross-validating multiple base learning algorithms.
We will demonstrate a variety of software- and hardware-based approaches that lead to more scalable ensemble learning software, including a highly scalable implementation of stacking called “H2O Ensemble”, built on top of the open source, distributed machine learning platform, H2O. H2O Ensemble scales across multi-node clusters and allows the user to create ensembles of deep neural networks, Gradient Boosting Machines, Random Forests, and others. As for algorithm-based approaches, we will present two algorithmic modifications to the original stacking algorithm that further reduce computation time — the Subsemble algorithm and the Online Super Learner algorithm. This talk will also include benchmarks of the implementations of these new stacking variants.
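To make the metalearning step concrete, here is a toy sketch (not H2O Ensemble’s implementation): given cross-validated predictions from two base learners, the metalearner chooses convex combination weights minimizing squared error; real systems use proper base learners and solvers rather than this grid search:

```python
# Toy metalearning step of stacking / Super Learner: grid-search a convex
# weight on two base learners' cross-validated predictions.

def metalearn(cv_preds, y, steps=100):
    """cv_preds: list of per-learner prediction lists (cross-validated).
    Returns the weight on learner 0 minimizing squared error."""
    best_w, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        err = sum((w * a + (1 - w) * b - t) ** 2
                  for a, b, t in zip(cv_preds[0], cv_preds[1], y))
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# Learner 0 is accurate, learner 1 is noisy: weight should favor learner 0.
y = [1.0, 2.0, 3.0, 4.0]
cv_preds = [[1.1, 1.9, 3.0, 4.1], [2.0, 1.0, 4.0, 2.0]]
w = metalearn(cv_preds, y)
```

The computational cost the abstract mentions comes from producing `cv_preds` honestly: every base learner must be re-trained once per cross-validation fold before the metalearner ever runs, which is what Subsemble and online variants aim to cheapen.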
Avi Pfeffer, Principal Scientist, Charles River Analytics
Dr. Avi Pfeffer is a leading researcher on a variety of computational intelligence techniques including probabilistic reasoning, machine learning, and computational game theory. Avi has developed numerous innovative probabilistic representation and reasoning frameworks, such as probabilistic programming, which enables the development of probabilistic models using the full power of programming languages, and statistical relational learning, which provides the ability to combine probabilistic and relational reasoning. He is the lead developer of Charles River Analytics’ Figaro probabilistic programming language. As an Associate Professor at Harvard, he developed IBAL, the first general-purpose probabilistic programming language. While at Harvard, he also produced systems for representing, reasoning about, and learning the beliefs, preferences, and decision making strategies of people in strategic situations. Avi received his Ph.D. in computer science from Stanford University and his B.A. in computer science from the University of California, Berkeley.
Practical Probabilistic Programming with Figaro: Probabilistic reasoning enables you to predict the future, infer the past, and learn from experience. Probabilistic programming enables users to build and reason with a wide variety of probabilistic models without machine learning expertise. In this talk, I will present Figaro, a mature probabilistic programming system with many applications. I will describe the main design principles of the language and show example applications. I will also discuss our current efforts to fully automate and optimize the inference process.
Kristian Kersting, Associate Professor for Computer Science, TU Dortmund University, Germany
Kristian Kersting is an Associate Professor for Computer Science at the TU Dortmund University, Germany. He received his PhD from the University of Freiburg, Germany, in 2006. After a PostDoc at MIT, he moved to the Fraunhofer IAIS and the University of Bonn using a Fraunhofer ATTRACT Fellowship. His main research interests are data mining, machine learning, and statistical relational AI, with applications to medicine, plant phenotyping, traffic, and collective attention. Kristian has published over 130 technical papers, and his work has been recognized by several awards, including the ECCAI Dissertation Award for the best AI dissertation in Europe.
He gave several tutorials at top venues and serves regularly on the PC (often at the senior level) of the top machine learning, data mining, and AI venues. Kristian co-founded the international workshop series on Statistical Relational AI and co-chaired ECML PKDD 2013, the premier European venue for Machine Learning and Data Mining, as well as the Best Paper Award Committee of ACM KDD 2015. Currently, he is an action editor of DAMI, MLJ, AIJ, and JAIR as well as the editor of JAIR’s special track on Deep Learning, Knowledge Representation, and Reasoning.
Declarative Programming for Statistical ML: The democratization of complex data does not mean dropping the data on everyone’s desk and saying, “good luck”! It means making machine learning methods usable in such a way that people can easily instruct machines to have a “look” at complex data and help them understand and act on it. Existing statistical relational learning and probabilistic programming languages provide only partial solutions; most of them do not support the convex optimization commonly used in machine learning.
In this talk I will present RELOOP, a declarative mathematical programming language embedded in Python. It allows the user to specify mathematical programs before she knows what individuals are in the domain and, therefore, before she knows what variables and constraints exist. It facilitates the formulation of abstract, general knowledge. And it reveals the rich logical structure underlying many machine learning problems to the solver, which, in turn, may make it go faster.
With this, people can start to rapidly develop statistical machine learning approaches for complex data. For instance, adding just three lines of RELOOP code makes a linear support vector machine aware of any underlying network that connects the objects to be classified.
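The core idea of specifying a program before the domain is known can be sketched without RELOOP’s actual API (everything below is a hypothetical illustration): a constraint template is written once over an abstract domain and grounded into concrete variables and constraints only when the individuals are supplied.

```python
# Declarative template grounded late: one nonnegative variable per
# individual, with a shared capacity constraint. The template is fixed;
# the variables and constraints it produces depend on the domain.

def ground_program(domain, capacity):
    """Returns grounded variable names and (coeffs, op, rhs) constraints."""
    variables = ["x_%s" % d for d in domain]
    constraints = [
        # sum of all x_i <= capacity
        ([1.0] * len(variables), "<=", capacity),
    ] + [
        # x_i >= 0 for each individual
        ([1.0 if j == i else 0.0 for j in range(len(variables))], ">=", 0.0)
        for i in range(len(variables))
    ]
    return variables, constraints

# The same abstract template grounds differently for different domains:
v1, c1 = ground_program(["ann", "bob"], 10.0)
v2, c2 = ground_program(["ann", "bob", "cat", "dan"], 10.0)
```

RELOOP goes much further, in particular by exposing the logical structure of the grounded program to the solver so symmetric variables can be handled together.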
Joint work with Martin Mladenov and many others.
Jason Baldridge, Associate Professor of Computational Linguistics, University of Texas at Austin
Jason is co-founder and Chief Scientist of People Pattern and Associate Professor of Computational Linguistics at the University of Texas at Austin. As a professor, Jason works on probabilistic models for categorization and syntax, with a particular emphasis on low-resource languages. He also focuses on methods and applications for connecting linguistic objects to geography and time. He has been active in the creation and promotion of open source software for natural language processing: he is one of the co-creators of the Apache OpenNLP Toolkit, and he has contributed many others, including ScalaNLP, Junto, TextGrounder and OpenCCG. Jason received his Ph.D. from the University of Edinburgh in 2002, where his doctoral dissertation was awarded the 2003 Beth Dissertation Prize from the European Association for Logic, Language and Information. His main academic research interests include categorial grammars, parsing, semi-supervised learning, co-reference resolution and geo-referencing.
Disambiguating Explicit and Implicit Geographic References in Natural Language: When people speak, both they and their utterances are situated in place and time. Our utterances reflect where we are from, where we are right now, and where we are talking about—among many other things, including personality, social status, and the topics under discussion. As hearers, we naturally incorporate locational awareness into our understanding of what we are told. That is to say, we geographically ground the meaning of natural language utterances. In this talk, I will discuss both toponym resolution (e.g., identifying which Springfield is intended in a passage) and general text geolocation—deriving geographical gists from free-running text that perhaps has no explicit mentions of places. I’ll cover supervised and semi-supervised methods for solving these tasks. I’ll also briefly discuss how this work might generalize the now commonplace treatment of words-as-vectors into computational models of word meaning that use multi-dimensional representations over words, geography, time, and images.
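A toy version of the toponym-resolution task makes the setup concrete (the candidates, populations, and word lists below are made up, and real systems use richer supervised models): choose among candidate referents by combining a population prior with overlap between the context and words associated with each candidate.

```python
# Toy toponym resolution: disambiguate "Springfield" using a population
# prior plus context-word overlap.

import math

CANDIDATES = {
    "Springfield, IL": {"population": 114_000,
                        "context": {"illinois", "lincoln", "capital"}},
    "Springfield, MA": {"population": 155_000,
                        "context": {"massachusetts", "basketball"}},
}

def resolve(context_words):
    def score(name):
        info = CANDIDATES[name]
        prior = math.log(info["population"])  # prefer bigger places a priori
        overlap = len(info["context"] & context_words)
        return prior + 5.0 * overlap  # context evidence outweighs the prior
    return max(CANDIDATES, key=score)

best = resolve({"the", "lincoln", "museum", "in", "springfield"})
# "lincoln" in context pulls the decision to the Illinois Springfield,
# despite the Massachusetts one being larger.
```

The harder problem the talk also covers, text geolocation, drops the explicit mention entirely and must infer a geographic gist from word distributions alone.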
Florian Tramèr, Researcher, EPFL
Florian Tramèr is a research assistant at EPFL in Switzerland, hosted by Prof. J-P. Hubaux. He received his Masters in Computer Science from EPFL in 2015 and will be joining Stanford University as a PhD student this Fall. During his Master Thesis, Florian collaborated with researchers at EPFL, Columbia University and Cornell Tech to design and implement FairTest, a testing toolkit to discover unfair and unwarranted behaviours in modern data-driven algorithms. Florian’s interests lie primarily in Cryptography and Security, with recent work devoted to the study of security and privacy in the areas of genomics, ride-hailing services, and Machine-Learning-as-a-Service platforms.
Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit: In today’s data-driven world, programmers routinely incorporate user data into complex algorithms, heuristics, and application pipelines. While often beneficial, this practice can have unintended and detrimental consequences, such as the discriminatory effects identified in Staples’ online pricing algorithm and the racially offensive labels recently found in Google’s image tagger.
We argue that such effects are bugs that should be tested for and debugged in a manner similar to functionality, performance, and security bugs. We describe FairTest, a testing toolkit that detects unwarranted associations between an algorithm’s outputs (e.g., prices or labels) and user subpopulations, including protected groups (e.g., defined by race or gender). FairTest reports any statistically significant associations to programmers as potential bugs, ranked by their strength and likelihood of being unintentional, rather than necessary effects.
We designed FairTest for ease of use by programmers and integrated it into the evaluation framework of SciPy, a popular library for data analytics. We used FairTest experimentally to identify unfair disparate impact, offensive labeling, and disparate rates of algorithmic error in six applications and datasets. As examples, our results reveal subtle biases against older populations in the distribution of error in a real predictive health application, and offensive racial labeling in an image tagger.
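The kind of check FairTest automates can be sketched in miniature (FairTest itself does far more: ranking associations, searching subpopulations, and supporting multiple metrics): test whether an algorithm’s output is statistically associated with a protected attribute via a chi-squared statistic on a contingency table.

```python
# Chi-squared statistic on a (group x output) contingency table; large
# values indicate the output is associated with group membership.

def chi_squared(table):
    """table[i][j]: count of (group i, output j). Returns X^2."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(len(table)))
                for j in range(len(table[0]))]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Group A receives the high price far more often than group B:
skewed = [[80, 20], [40, 60]]
balanced = [[50, 50], [50, 50]]
# The skewed table yields a large statistic; the balanced one yields 0.
```

Turning such a statistic into a ranked bug report, corrected for multiple testing across many subpopulations, is the part that requires a dedicated toolkit.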
Suxin Guo, Researcher, eBay
Suxin Guo received her Ph.D. degree in Computer Science from the State University of New York at Buffalo in 2015. She is an applied researcher in the Traffic Science team at eBay Inc. Her research interests include machine learning, data mining and information retrieval, with a focus on paid internet marketing.
Iron: Keyword Grouping Model for Text Ads Bidding in Paid Search: We present a keyword grouping model named Iron that helps determine keyword bidding prices for text ads auctions in Google paid search. In the grouping process, we first train a Gradient Boosting Machine to predict the conversion rate of every keyword based on eBay search log data, paid search performance, and Google feedback for the keywords; then calculate the expected value of each keyword based on the predicted conversion rate and the price; and finally group keywords according to the expected value. After that, we set the bids of keywords based on group-level data. We demonstrate that the Iron model greatly improves incremental revenue compared with the previous keyword grouping model, which grouped keywords solely by category.
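The grouping step of the pipeline described above can be sketched as follows (a hedged illustration only: the conversion-rate model, which the talk says is a GBM, is stubbed out, and the keywords, prices, and thresholds are invented):

```python
# Sketch of expected-value keyword grouping: EV = conversion rate * price,
# then bucket keywords by descending EV thresholds and bid per group.

def expected_value(conv_rate, price):
    return conv_rate * price

def group_keywords(keywords, boundaries):
    """Assign each keyword to the first group whose EV threshold it meets.
    boundaries: descending EV thresholds, one per group; the last group
    catches everything below the final threshold."""
    groups = {i: [] for i in range(len(boundaries) + 1)}
    for kw, conv_rate, price in keywords:
        ev = expected_value(conv_rate, price)
        for i, b in enumerate(boundaries):
            if ev >= b:
                groups[i].append(kw)
                break
        else:
            groups[len(boundaries)].append(kw)
    return groups

keywords = [("golf clubs", 0.05, 200.0),   # EV 10.0
            ("used phone", 0.02, 100.0),   # EV 2.0
            ("phone case", 0.10, 5.0)]     # EV 0.5
groups = group_keywords(keywords, boundaries=[5.0, 1.0])
```

Bidding at the group level pools sparse per-keyword conversion data, which is the advantage over bidding on each keyword’s noisy individual estimate.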