Misha Bilenko, Principal Researcher, Microsoft
Misha Bilenko leads the Machine Learning Algorithms team in the Cloud+Enterprise division of Microsoft, working on a broad range of ML algorithms and systems for Azure Machine Learning and other product groups. Before that, he spent seven years in the Machine Learning Group at Microsoft Research, which he joined after receiving his Ph.D. in Computer Science from the University of Texas at Austin. He co-edited “Scaling Up Machine Learning,” published by Cambridge University Press, and his work has received best paper awards from KDD and SIGIR. His research interests include parallel and distributed learning algorithms, accuracy debugging methods, and trainable representations and similarity functions.
Many Shades of Scale: Big Learning Beyond Big Data: In the machine learning research community, much of the attention devoted to ‘big data’ in recent years has been manifested in the development of new algorithms and systems for distributed training on many examples. This focus has led to significant advances in the field, from basic but operational implementations on popular platforms to highly sophisticated prototypes in the literature. In the meantime, other aspects of scaling up learning have received relatively little attention, although they are often more pressing in practice. The talk will survey these less-studied facets of big learning: scaling to an extremely large number of features, to many components in predictive pipelines, and to multiple data scientists collaborating on shared experiments.
Xavier Amatriain, VP of Engineering, Quora
Xavier Amatriain is VP of Engineering at Quora, where he leads the team building the best source of knowledge on the Internet. With over 50 publications in different fields, Xavier is best known for his work on machine learning in general, and recommender systems in particular. Before Quora, he was Research/Engineering Director at Netflix, where he led the team building the famous Netflix recommendation algorithms. Previously, Xavier was also a Research Scientist at Telefonica Research and Research Director at UCSB. He has also lectured at different universities in both the US and Spain and is frequently invited as a speaker at conferences and companies.
Machine learning applications for growing the world’s knowledge at Quora: At Quora, our mission is to “share and grow the world’s knowledge”. We want to do this by getting the right questions to the right people to answer them, but also by getting existing answers to the people who are interested in them. To accomplish this, we need to build a complex ecosystem in which we weigh factors such as content quality, engagement, demand, interests, and reputation. It is not possible to build a system like this unless most of these processes are highly automated and scalable. We are fortunate, though, to have a wealth of high-quality data on which to build machine learning solutions that can address all of these requirements.
In this talk I will describe some interesting uses of machine learning at Quora, ranging from recommendation approaches such as personalized ranking to classifiers built to detect duplicate questions or spam. I will describe some of the modeling and feature engineering approaches that go into building these systems. I will also share some of the challenges faced when building such a large-scale repository of human-generated knowledge.
Ewa Dominowska, Engineering Manager, Facebook
Ewa Dominowska joined Facebook in spring of 2014 as an Engineering Manager focused on Science and Metrics for Online Advertising. Before coming to Facebook, she designed a large-scale predictive analytics platform for mobile devices as Chief Architect at Medio Systems (acquired by Nokia). Prior to her start-up days, Ewa spent 10 years in various roles at Microsoft, where she joined the Online Services Division to help found adCenter, the second largest online advertising platform in the US. Her work focused on real-time ad ranking, targeting, content analysis, click prediction, and pricing models. As part of the small yet dynamic original team, Ewa designed, architected, and built the alpha version of the contextual advertising product. In 2007, Ewa founded the Open Platform Research and Development team. As part of this effort, she organized the Beyond Search academic program, the TROA WWW Workshop, and the IRA SIGIR Workshop, resulting in a number of very successful collaborations between academia and industry. During her tenure in the Online Services Division, Ewa spent a year serving as the TA for Satya Nadella, where she advised and assisted in operations and planning for the division. The role encompassed architecture, technology, large-scale data services, and cross-organizational efficiency. Ewa was responsible for the intellectual property process, long-term strategy, and prioritization for the division. In 2010, Ewa started the adCenter Marketplace team, responsible for all aspects of advertising marketplace health and tuning. She architected and built a petabyte-scale distributed data and analytics platform and created a suite of marketplace and experimentation tools. Ewa earned her degrees in Electrical Engineering/Computer Science and Mathematics from MIT. Her research focused on machine learning, natural language processing, and predictive, context-aware systems applied in the medical field.
Ewa authored several papers and dozens of patents in the areas of online advertising, search, pricing models, predictive algorithms, and user interaction.
Managing Machine Learning Projects in Industry: As the use of machine learning techniques to analyze and find value in ‘big data’ is adopted more broadly by industry, we see an increasing need to build teams that can execute on large and complex projects. It is not possible for a single machine learning expert to cover problems of the scope and magnitude that are encountered. The scale of these projects requires teams of researchers and engineers to coordinate and collaborate to deliver impact. In this talk I will touch on some lessons learned and considerations when building or expanding such a team. I will cover building a group, framing the problem, finding a solution, and evaluating the results. I will illustrate these points with examples drawn from my experience in large companies and startups. I hope to provoke consideration and discussion of the challenges in this area, as well as to illustrate some of the complexities.
Josh Wills, Director of Data Science, Cloudera
Josh Wills is Cloudera’s Senior Director of Data Science, working with customers and engineers to develop Hadoop-based solutions across a wide range of industries. He is the founder and VP of the Apache Crunch project for creating optimized MapReduce pipelines in Java, and lead developer of Cloudera ML, a set of open-source libraries and command-line tools for building machine learning models on Hadoop. Prior to joining Cloudera, Josh worked at Google, where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+.
Brainwashed: Building an IDE for Feature Engineering: Feature engineering (writing code to map raw input data into a set of signals that will be fed into a machine learning algorithm) is the dark art of data science. Although the process of crafting new features is tedious and failure-prone, the key to a successful model is a diverse set of high-quality features that are informed by domain experts. Recently, academic researchers have begun to focus on the problem of feature engineering, and have started to publish research that addresses the relative lack of tools designed to support the feature engineering process. In this talk, I will review some of my favorite papers and present some efforts to convert these ideas into tools that leverage the principles of reactive application design in order to make feature engineering (dare I say it) fun.
Shiva Amiri, Chief Product Officer, RTDS Inc.
Shiva Amiri is the Chief Product Officer at Real Time Data Solutions Inc. (RTDS Inc.), where the company is developing a unique and robust machine learning technology for the analysis and modelling of massive data sets.
Prior to RTDS Inc., she led the Informatics and Analytics team at the Ontario Brain Institute, where they developed Brain-CODE, a large-scale neuroinformatics platform for the management, processing, and analytics of big data in neuroscience across the province of Ontario. Shiva is also the President and CEO of Modecular Inc., a computational biochemistry start-up developing next-generation drug screening methodologies.
She was also the Team Lead for the UK’s Science and Innovation Network in Canada, where she facilitated research, innovation, and commercialization between the UK and Canada. Shiva completed her D.Phil. (Ph.D.) in Computational Biochemistry at the University of Oxford, where she focused her work on computational studies of membrane proteins involved in neuronal diseases. Shiva is involved with several organisations, including Let’s Talk Science and Shabeh Jomeh International.
Incorporating the Real Time Component into Analytics and Machine Learning: Many industries and organizations today want to harness the power of big data analytics and machine learning for its potential to improve margins, enhance discoveries, give insight into the business, and enable fast, data-driven decisions. The challenges include difficulty using available systems, not knowing where to start or which tools make sense for a particular problem, and data sets that are too big, too fast, or too complicated to handle with traditional systems.
RTDS Inc. has developed SymetryML™, a technology for zero-latency machine learning and analytics/exploration of very large datasets in real time, with a focus on speed, accuracy, and simplicity. Our goals have been to cut the memory footprint required to learn from large data sets, to provide “reducer” functionality that automatically selects the best attributes for model creation, and to build models on the fly. SymetryML™ is also designed for easy integration into existing business processes via either an easy-to-use web UI or RESTful APIs.
This talk will explore some of the functionality of these systems, including real-time exploration of data, fast multivariate model prototyping, and our use of GPUs and parallelization. An example of brain-related data and the complexities of its analytics will be discussed, as well as a brief overview of other verticals we are exploring. Our work is geared toward making big data make sense in real time, enabling users to gain insights faster than traditional methods allow.
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine
Anima Anandkumar has been a faculty member in the EECS Department at UC Irvine since August 2010. Her research interests are in the area of large-scale machine learning and high-dimensional statistics. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a visiting faculty member at Microsoft Research New England in 2012 and a postdoctoral researcher in the Stochastic Systems Group at MIT from 2009 to 2010. She is the recipient of the Microsoft Faculty Fellowship, ARO Young Investigator Award, NSF CAREER Award, and IBM Fran Allen PhD Fellowship.
Tensor Methods: A New Paradigm for Training Probabilistic Models and Feature Learning: Tensors are rich structures for modeling complex higher-order relationships in data-rich domains such as social networks, computer vision, the internet of things, and so on. Tensor decomposition methods are embarrassingly parallel and scalable to enormous datasets. They are guaranteed to converge to the global optimum and yield consistent estimates of parameters for many probabilistic models such as topic models, community models, hidden Markov models, and so on. I will show the results of these methods for learning topics from text data, communities in social networks, disease hierarchies from healthcare records, cell types from mouse brain data, etc. I will also demonstrate how tensor methods can yield rich discriminative features for classification tasks and can serve as an alternative method for training neural networks.
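The decomposition idea can be made concrete with a toy sketch (an illustration for this program, not code from the talk): the tensor power method recovers the components of a symmetric 3-way tensor with orthogonal factors by repeated contraction, the tensor analogue of matrix power iteration, with deflation to extract subsequent components.

```python
import numpy as np

def tensor_power_method(T, n_iter=50):
    """Recover the top (eigenvalue, eigenvector) pair of a symmetric
    3-way tensor T via repeated contraction v <- T(I, v, v)."""
    v = np.ones(T.shape[0]) / np.sqrt(T.shape[0])  # deterministic start
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, v, v)       # contract two modes
        v /= np.linalg.norm(v)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)     # eigenvalue T(v, v, v)
    return lam, v

# Build T = 3 * e1^(x3) + 2 * e2^(x3) with orthonormal components.
outer3 = lambda u: np.einsum('i,j,k->ijk', u, u, u)
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
T = 3.0 * outer3(e1) + 2.0 * outer3(e2)

lam, v = tensor_power_method(T)                    # recovers (3, e1)
lam2, v2 = tensor_power_method(T - lam * outer3(v))  # deflate -> (2, e2)
```

For orthogonally decomposable tensors this iteration provably converges to a true component rather than a spurious local optimum, which is the guarantee the abstract refers to.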
Sergey A. Razin Ph.D., Chief Technology Officer, SIOS Technology
As CTO, Sergey is responsible for driving product strategy and innovation at SIOS Technology Corp. A noted authority in advanced analytics and machine learning, Sergey pioneered the application of these technologies in the areas of IT security, media, and speech recognition. He is currently leading the development of innovative solutions based on these technologies that enable simple, intelligent management of applications in complex virtual and cloud environments.
Prior to joining SIOS, Sergey was an architect for EMC storage products and EMC CTO office where he drove initiatives in areas of network protocols, cloud and storage management, metrics, and analytics. Sergey has also served as Principal Investigator (PI), leader in research, development and architecture in areas of big data analytics, speech recognition, telephony, and networking.
Sergey holds a PhD in computer science from the Moscow State Scientific Center of Informatics. He also holds a BS in computer science from the University of South Carolina.
Machine learning as the key ingredient for making the “self-driving data center” a reality: Workloads are moving away from traditional physical servers toward virtual and cloud environments that are quite complex, with many layers and end-points that constantly grow. This presentation will discuss how principles of graph theory can be applied to operations data to analyze existing relationships and discover hidden inner relationships in complex virtual and cloud environments. It will also discuss how semi-supervised machine learning principles can be applied to address the need for a single, easy-to-use way to identify and resolve problems, explore infrastructure improvements, and tune the efficiency of operations in large, complex virtual and cloud environments, ultimately delivering the vision of the “self-driving data center”.
Carlos Guestrin, CEO of Dato Inc., Amazon Professor of Machine Learning at the University of Washington
Carlos Guestrin is the Amazon Professor of Machine Learning in the Computer Science & Engineering Department of the University of Washington. He is also a co-founder and CEO of Dato, Inc. (formerly GraphLab), focusing on large-scale machine learning and graph analytics. His previous positions include Finmeccanica Associate Professor at Carnegie Mellon University and senior researcher at the Intel Research Lab in Berkeley. Carlos received his PhD and Master’s from Stanford University, and a Mechatronics Engineer degree from the University of Sao Paulo, Brazil. Carlos’ work has been recognized by awards at a number of conferences and two journals: KDD 2007 and 2010, IPSN 2005 and 2006, VLDB 2004, NIPS 2003 and 2007, UAI 2005, ICML 2005, AISTATS 2010, JAIR in 2007 & 2012, and JWRPM in 2009. He is also a recipient of the ONR Young Investigator Award, NSF CAREER Award, Alfred P. Sloan Fellowship, IBM Faculty Fellowship, the Siebel Scholarship, and the Stanford Centennial Teaching Assistant Award. Carlos was named one of the 2008 ‘Brilliant 10’ by Popular Science Magazine, and received the IJCAI Computers and Thought Award and the Presidential Early Career Award for Scientists and Engineers (PECASE). He is a former member of the Information Sciences and Technology (ISAT) advisory group for DARPA.
Deploying Machine Learning in Production
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineering Group at Netflix
Ehtsham Elahi is a Senior Research Engineer in the Personalization Science and Engineering group at Netflix, where his focus is on personalized video ranking algorithms. His main research area is probabilistic graphical models for modeling rich user behavior, and their large-scale implementation for use in Netflix personalization algorithms. Prior to Netflix, Ehtsham worked as a Data Scientist at American Express, NYC and Change.org, San Francisco. He attended the University of Michigan, Ann Arbor for his graduate studies, focusing on machine learning and probabilistic modeling.
Spark and GraphX in the Netflix Recommender System: We at Netflix strive to deliver maximum enjoyment and entertainment to our millions of members across the world. We do so by having great content and by constantly innovating on our product. A key strategy to optimize both is to follow a data-driven method. Data allows us to find optimal approaches to applications such as content buying or our renowned personalization algorithms. But, in order to learn from this data, we need to be smart about the algorithms we use, how we apply them, and how we can scale them to our volume of data (over 50 million members and 5 billion hours streamed over three months). In this talk we describe how Spark and GraphX can be leveraged to address some of our scale challenges. In particular, we share insights and lessons learned on how to run large probabilistic clustering and graph diffusion algorithms on top of GraphX, making it possible to apply them at Netflix scale.
Xia Zhu, Research Scientist, Intel
As a research scientist at Intel Corporation, Xia (Ivy) Zhu works on graph analytics, providing users with an end-to-end solution that includes, but is not limited to, graph ETL, graph building, and machine learning. Prior to joining Intel Labs in 2005, Ivy worked as a senior scientist at Philips Research East Asia. She holds a doctorate in Computer Science and 13 patents.
Model-Based Machine Learning for Real-Time Brain Decoding: Neurofeedback derived from real-time functional magnetic resonance imaging (rtfMRI) is promising for both scientific applications, such as uncovering hidden brain networks that respond to stimulus, and clinical applications, such as helping people cope with brain disorders ranging from addiction to autism. One of the greatest challenges in applying machine learning to real time brain “decoding” is that traditional methods fit per-voxel parameters, leading to large computational problems on relatively small datasets. As such, it is easy to over-fit parameters to noise rather than the desired signals. Bayesian model-based hierarchical topographical factor analysis (HTFA) solves this problem by uncovering low-dimensional representations (latent factors) of brain images, fitting parameters for latent factors (rather than voxels) while removing the false assumption that all voxels are independent. In this talk, we’ll discuss the promise of using this and other model-based machine learning to better understand full-brain activity and functional connectivity. And we’ll show how Intel Labs and its partners are combining neuroscience and computer science expertise to further extend such algorithms for real-time brain decoding.
Mark Zangari, CEO, Quantellia
As Quantellia’s CEO, Mark Zangari led design and development of the company’s award-winning World Modeler platform for decision intelligence. He was formerly CTO for SPATIALinfo, Inc., where he launched the company’s US operations and growth into a leading provider of network modeling solutions for the telecommunications and utilities sectors. In addition to his management experience, Zangari is an accomplished software architect, designer, and developer.
During the early 1990s, Zangari was principal architect of the STATPLAY software system—a ground-breaking tool that improved the performance of decision makers when reasoning with statistical data.
Zangari holds degrees in physics, philosophy, and computer science. In 1993, he was awarded the British Council Postgraduate Bursary for his research, leading to an advanced physics fellowship at Cambridge University. He co-founded Quantellia with Lorien Pratt, and works out of its Silicon Valley and Denver offices.
Agency Theory: The existence of massive data sets in many arenas is creating new challenges. Most people know about the issue of spurious correlations that do not represent true cause and effect. However, a second challenge is more insidious and costly: the expense (which can run into the billions of dollars) of managing data that does not lead to actionable and valuable outcomes for an organization. For this reason, organizations that can identify the 20% of data that represents 80% of the value realize a substantial advantage.
In this talk, I introduce Agency Theory, a mathematical framework for analyzing decision models to solve this problem. Agency Theory borrows key ideas from machine learning to serve a different purpose: rather than finding the set of parameters that best fits a data set, the objective is to find the set of decisions that leads to the most favorable set of outcomes, along with the data that is most valuable in supporting those decisions. Just as many foundational aspects of machine learning can be understood using information theory, I’ll describe how entropy and related concepts underlie Agency, and how to use this approach to prioritize data management and improve decision making.
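As a rough illustration of how entropy can rank data by decision value (a generic information-gain computation on hypothetical data, not Quantellia's actual Agency formulation), the sketch below scores attributes by how much they reduce uncertainty about a decision's outcome:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of outcomes, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, outcome):
    """Reduction in outcome entropy from knowing one attribute --
    a simple proxy for how much that data supports a decision."""
    base = entropy([r[outcome] for r in records])
    remainder = 0.0
    for value in set(r[attribute] for r in records):
        subset = [r[outcome] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

# Hypothetical decision data: does offering a discount lead to a renewal?
data = [
    {"discount": "yes", "region": "east", "renewed": "yes"},
    {"discount": "yes", "region": "west", "renewed": "yes"},
    {"discount": "no",  "region": "east", "renewed": "no"},
    {"discount": "no",  "region": "west", "renewed": "no"},
]
# "discount" fully determines the outcome; "region" tells us nothing.
print(information_gain(data, "discount", "renewed"))  # 1.0
print(information_gain(data, "region", "renewed"))    # 0.0
```

In this toy example, an organization following the 80/20 logic above would keep investing in the discount data and could safely deprioritize the region data.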
Ray Richardson, Chief Technology Officer at Simularity
Ray has 25 years of experience in the software industry as a programmer and software architect. As CTO of Simularity, a predictive analytics software company, Ray has created Simularity’s High Performance Correlation Engine, Dynamic Classifier, and Time Series Analysis products. Prior to Simularity, Ray was a Senior Principal Technologist at Wind River, where he designed embedded operating systems and distributed processing systems. Ray has also worked on such varied systems as the Unix kernel, machine learning systems, intelligent distributed control systems, and large parallel data collection and processing systems. Ray holds a number of patents in the field of parallel processing and has several predictive analytics patents pending.
Practical Predictive Analytics on Time Series Data using SAX: The potential to do machine learning on the data generated by connected sensors is a key factor that is driving the spread of the Internet of Things. Predictive analytics on time series data can be used to anticipate adverse events, enable early-warning systems, improve outcomes, reduce costs, and increase efficiency. In this talk you’ll learn how to use Symbolic Aggregate approXimation (SAX) to determine normal behavior, recognize behavior that is anomalous, quantify it, and classify it based on known patterns.
In this presentation, Ray Richardson will walk you through the basics of SAX, and cover a predictive maintenance example in detail. One of the key advantages to using SAX is that it yields an explainable model. When the result of an analysis is designed to get someone to take an action, the importance of having an explainable model should not be underestimated. Key takeaways from this talk include:
- two methods to determine what normal is
- how to do time series classification
- how to predict time to failure
- open source SAX tools
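To make the SAX idea concrete, here is a minimal sketch (an illustration for this program, not Simularity's implementation): z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA), then map each segment mean to a symbol using equiprobable breakpoints of the standard normal distribution.

```python
import statistics

# Breakpoints dividing the standard normal into equal-probability
# regions; shown here for an alphabet of size 4 (quartiles).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"

def sax(series, n_segments):
    """Convert a numeric time series into a SAX word:
    z-normalize, reduce with PAA, then discretize."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series) or 1.0   # guard constant series
    z = [(x - mu) / sigma for x in series]
    seg = len(z) // n_segments                 # assumes even divisibility
    word = ""
    for i in range(n_segments):
        avg = sum(z[i * seg:(i + 1) * seg]) / seg  # PAA segment mean
        idx = sum(avg > b for b in BREAKPOINTS)    # which region?
        word += ALPHABET[idx]
    return word

# A steadily rising series maps to an increasing word; in anomaly
# detection, rare words flag behavior that departs from the norm.
print(sax([1, 2, 3, 4, 5, 6, 7, 8], 4))  # "abcd"
```

Because each window of the series becomes a short string, "normal" behavior is simply the set of frequent words, and classification reduces to comparing words against known patterns, which is what makes the resulting model explainable.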
Anthony Bak, Principal Data Scientist at Ayasdi
Anthony Bak is Principal Data Scientist at Ayasdi, where he works on building Ayasdi’s machine learning framework and developing data analytics tools leveraging topology. He also works on a variety of problems for Ayasdi customers. He has a Ph.D. in Mathematics from the University of Pennsylvania and, prior to Ayasdi, did research at Stanford University, the American Institute for Mathematics, and the Max Planck Institute.
Topology as Framework for Data Science: Ayasdi has a unique approach to machine learning and data analysis using topology. This framework represents a revolutionary way to look at and understand data that is orthogonal but complementary to traditional machine learning and statistical tools. In this presentation I will show you what is meant by this statement: How does topology help with data analysis? Why would you use topology? I will illustrate with both synthetic examples and problems we’ve solved for our clients.
Robert Moakler, Data Science Intern/Integral Ad Science
Robert is currently a third-year Ph.D. student at the NYU Stern School of Business in the Information Systems group, where he works on causal inference and machine learning problems with Professor Foster Provost. Simultaneously, he is a data science intern at Integral Ad Science, where he explores the causal effects of online advertisements using large-scale observational data sets.
Efficient Measurement of Causal Impact in Digital Advertising Using Online Ad Viewability: Online display ads offer a level of granularity in observable metrics that is impossible to achieve for traditional, non-digital advertisers. However, as advertising budgets comprise an increasing amount of marketing spend, true return on investment (ROI) is increasingly important but often goes unmeasured. An important question to answer is how much incremental revenue was generated by an online campaign. In general, there are two common approaches to measuring the causal impact of a campaign: (1) a randomized experiment and (2) using observational data. The first technique is preferred due to its ability to give an unbiased estimate of a campaign’s effect, but is usually prohibitively costly. The second requires no additional ad spend, but is plagued by complex modeling choices and biases. Using a unique position in the online advertising pipeline to create a “natural experiment”, we propose a novel approach to measuring campaign effectiveness that utilizes detailed measurements of whether ads were actually viewed by a user. Treating users that have never been exposed to a viewable ad as a control group, we are able to mimic the setup of a randomized experiment without any additional cost while avoiding the biases that are typical when using observational data.
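The natural-experiment setup described above can be sketched in a few lines (hypothetical field names; a deliberate simplification that ignores weighting, repeated exposures, and variance estimation):

```python
from collections import defaultdict

def viewability_lift(impressions):
    """Estimate incremental conversion rate by comparing users who saw
    at least one viewable ad against users who were served ads that
    were never viewable (the 'natural' control group).
    Each record: {"user": ..., "viewable": bool, "converted": bool},
    where "converted" is the user's overall conversion flag."""
    saw_viewable = defaultdict(bool)
    converted = {}
    for imp in impressions:
        saw_viewable[imp["user"]] |= imp["viewable"]
        converted[imp["user"]] = imp["converted"]
    treated = [converted[u] for u in saw_viewable if saw_viewable[u]]
    control = [converted[u] for u in saw_viewable if not saw_viewable[u]]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(treated) - rate(control)

# Hypothetical log: users 1-2 saw viewable ads, users 3-4 did not.
log = [
    {"user": 1, "viewable": True,  "converted": True},
    {"user": 2, "viewable": True,  "converted": False},
    {"user": 3, "viewable": False, "converted": False},
    {"user": 4, "viewable": False, "converted": False},
]
print(viewability_lift(log))  # 0.5
```

The key point of the design is that both groups were served ads, so the comparison avoids the selection bias of comparing ad-exposed users against the general population.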
Joseph K. Bradley, Software Engineer, Databricks Inc.
Joseph is an Apache Spark committer and works at Databricks on MLlib, Spark’s Machine Learning library. Previously, he received his Ph.D. in Machine Learning from Carnegie Mellon University, where he worked on probabilistic graphical models and parallel sparse regression. He also researched peer grading systems and sparse models during postdoctoral study at the University of Washington and at the University of California, Berkeley.
Spark DataFrames and ML Pipelines: In this talk, we will discuss two recent efforts in Spark to scale up data science: distributed DataFrames and Machine Learning Pipelines. These components allow users to manipulate distributed datasets and handle complex ML workflows, using intuitive APIs in Python, Java, and Scala (and R in development).
Data frames in R and Python have become standards for data science, yet they do not work well with Big Data. Inspired by R and Pandas, Spark DataFrames provide concise, powerful interfaces for structured data manipulation. DataFrames support rich data types, a variety of data sources and storage systems, and state-of-the-art optimization via the Spark SQL Catalyst optimizer.
On top of DataFrames, we have built a new ML Pipeline API. ML workflows often involve a complex sequence of processing and learning stages, including data cleaning, feature extraction and transformation, training, and hyperparameter tuning. With most current tools for ML, it is difficult to set up practical pipelines. Inspired by scikit-learn, we built simple APIs to help users quickly assemble and tune practical ML pipelines.
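The fit/transform chaining that such pipeline APIs rely on can be sketched generically (a toy illustration of the pattern with made-up stages, not Spark's actual API):

```python
class Pipeline:
    """Chain stages with fit/transform semantics: fit each stage on
    the output of the previous one, scikit-learn style."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, data):
        for stage in self.stages:
            stage.fit(data)
            data = stage.transform(data)
        return self

    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data

class Scaler:
    """Toy stage: scale values to [0, 1] using the fitted maximum."""
    def fit(self, data):
        self.max = max(data)
    def transform(self, data):
        return [x / self.max for x in data]

class Threshold:
    """Toy stage: binarize at 0.5 (stateless; fit is a no-op)."""
    def fit(self, data):
        pass
    def transform(self, data):
        return [int(x >= 0.5) for x in data]

pipe = Pipeline([Scaler(), Threshold()]).fit([1, 2, 3, 4])
print(pipe.transform([1, 2, 3, 4]))  # [0, 1, 1, 1]
```

Hyperparameter tuning then becomes a loop over pipeline configurations, refitting the whole chain each time, which is why a uniform stage interface matters for the complex workflows the talk describes.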