
Jennifer Marsman, Principal Developer Evangelist, Microsoft

Jennifer Marsman is a Principal Developer Evangelist in Microsoft’s Developer and Platform Evangelism group, where she educates developers on Microsoft’s new technologies. In this role, Jennifer is a frequent speaker at software development conferences across the United States. In 2009, Jennifer was chosen as “Techie whose innovation will have the biggest impact” by X-OLOGY for her work with GiveCamps, a weekend-long event where developers code for charity. She has also received many honors from Microsoft, including the Central Region Top Contributor Award, Heartland District Top Contributor Award, DPE Community Evangelist Award, CPE Champion Award, MSUS Diversity & Inclusion Award, and Gold Club. Prior to becoming a Developer Evangelist, Jennifer was a software developer in Microsoft’s Natural Interactive Services division, where she earned two patents for her work in search and data mining algorithms. Jennifer has also held positions with Ford Motor Company, National Instruments, and Soar Technology. She holds a Bachelor’s degree in Computer Engineering and a Master’s degree in Computer Science and Engineering from the University of Michigan in Ann Arbor. Her graduate work specialized in artificial intelligence and computational theory.

Abstract summary

Fun with Mind Reading: Today, we have the technology to “read minds” (well, EEG waves!). Using a headset from Emotiv, I’ve captured the big data stream of EEG coming out of our heads. In this session, I will show the detection of facial expressions and emotional state, and demonstrate training a machine learning model to employ mental commands (making a virtual object move via thought). Then, I will walk through and share my results from a “lie detector” experiment comparing brain waves when telling the truth versus lying. I have built classifiers on the EEG data using Azure Machine Learning to predict whether a subject is telling the truth, which I will demonstrate, and the effectiveness of multiple classifiers can be easily compared. This session will be a fun look inside your brain waves along with demonstrations of data processing and predictive analytics. Attendees will gain exposure to the Emotiv EPOC headset and Azure Machine Learning.

View the slides for this presentation »

Watch this presentation on YouTube »


Maja Kabiljo, Software Engineer, Facebook Inc.

Maja is a software engineer at Facebook on the Data Infrastructure team for iterative and graph processing. She is a PMC member of Apache Giraph, the open-source project Facebook uses for large-scale analytics. She works on scaling Giraph to Facebook’s data sizes and develops high-performance algorithms on top of it, with a current focus on collaborative filtering.

Abstract Summary

Large-scale item recommendations with Apache Giraph – This is joint work with Aleksandar Ilic, Facebook Inc.: Recommendation systems try to make personalized item recommendations to users based on available historical information. One of the best-known recommendation techniques is Collaborative Filtering, which is often solved with matrix factorization of a sparse user-item matrix of known ratings. In this talk, we will describe our scalable implementation of the SGD and ALS methods for Collaborative Filtering on top of Apache Giraph (an iterative graph processing system built for high scalability on big data).

In order to scale our implementation to over a billion users and tens of millions of items, we developed novel methods for distributing the problem and added several extensions to the Giraph framework. Experiments show that our implementation is up to 10x faster than some of the leading open source implementations in this domain (e.g. Spark MLlib) on the Amazon benchmark data while maintaining the same output quality.

We will describe several additional techniques for handling Facebook’s data (e.g. implicit and skewed item data, different offline metrics) that are required in page and group recommendations. To complete our comprehensive approach for computing recommendations at Facebook, we also implemented an efficient method for finding top-k recommendations per user and item-based recommendations with pairwise item similarities that is easily extendable with different formulas.
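To make the matrix-factorization formulation concrete, here is a minimal single-machine ALS sketch in numpy. The toy ratings, rank, and regularization are invented for illustration; it shares only the math, not the distributed design, with the Giraph implementation described above:

```python
import numpy as np

# Toy alternating least squares (ALS) for matrix factorization.
# A single-machine sketch of the idea only -- the Giraph version
# partitions users and items across many workers.

def als(R, mask, k=2, reg=0.1, iters=20, seed=0):
    """Factor R ~ U @ V.T using only the observed entries in mask."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    I = np.eye(k)
    for _ in range(iters):
        for u in range(n_users):          # fix V, ridge solve per user
            obs = mask[u]
            if obs.any():
                Vu = V[obs]
                U[u] = np.linalg.solve(Vu.T @ Vu + reg * I, Vu.T @ R[u, obs])
        for i in range(n_items):          # fix U, ridge solve per item
            obs = mask[:, i]
            if obs.any():
                Ui = U[obs]
                V[i] = np.linalg.solve(Ui.T @ Ui + reg * I, Ui.T @ R[obs, i])
    return U, V

# 3 users x 4 items; zeros mark unknown ratings.
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0]])
mask = R > 0
U, V = als(R, mask)
max_err = np.abs((U @ V.T)[mask] - R[mask]).max()
```

Each inner solve is an independent small ridge regression, which is exactly what makes the method easy to parallelize per user and per item.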

View the slides for this presentation »

Watch this presentation on YouTube »


Ce Zhang, Postdoctoral Researcher, Stanford University

Ce is a postdoctoral researcher in the Department of Computer Science at Stanford University. He is working with Christopher Ré on problems related to data management and database systems. Ce is especially interested in understanding how to build a general platform for the next generation of analytics tasks that are probabilistic, Web-scale, and declarative to developers. With indispensable help from many collaborators, his PhD work produced the system DeepDive, a trained data system for automatic knowledge-base construction. As part of his PhD thesis, the joint work on feature selection won the 2014 SIGMOD Best Paper Award; PaleoDeepDive, a machine-reading system for paleontologists, was featured in Nature; and he was also a member of the Stanford team that produced the top-performing machine-reading system for the TAC-KBP 2014 slot-filling evaluations using DeepDive. Ce obtained his PhD from the University of Wisconsin-Madison, advised by Christopher Ré, and his bachelor of science degree from Peking University, advised by Bin Cui.

Abstract summary

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning – We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 4.5x throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs.
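The core trick behind standard CPU convolution batching is "lowering" convolution to one large matrix multiply (im2col + GEMM) so a tuned BLAS does the heavy lifting. The single-channel toy below illustrates that idea only; it is an invented sketch, not CcT's implementation:

```python
import numpy as np

def im2col(x, k):
    """Stack every k-by-k patch of a 2-D array as a column."""
    H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv2d_gemm(x, kernel):
    """Valid 2-D cross-correlation expressed as a single matrix product."""
    k = kernel.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    # One GEMM replaces the nested convolution loops.
    return (kernel.ravel() @ im2col(x, k)).reshape(out_h, out_w)

x = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))        # sums each 2x2 window
y = conv2d_gemm(x, kernel)      # y[0, 0] = 0 + 1 + 4 + 5 = 10
```

With many filters and a batch of images, the same lowering turns the whole layer into one big GEMM, which is why throughput becomes proportional to the FLOPS the CPU can deliver.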

View the slides for this presentation »

Watch this presentation on YouTube »


Allison Gilmore, Data Scientist, Ayasdi

Dr. Gilmore is currently a data scientist at Ayasdi, where she specializes in highly complex, high-dimensional data across a variety of industries. Prior to joining Ayasdi, Allison served as a National Science Foundation Post-Doctoral Fellow and an Assistant Adjunct Professor in mathematics at the University of California, Los Angeles. Dr. Gilmore also did post-doctoral research at Princeton University. She received her Ph.D. in mathematics from Columbia University in New York in May 2011.

Allison completed her undergraduate and master’s degrees at Washington University, where she was selected as a Rhodes Scholar. She studied at Green College, Oxford University, and graduated in 2006 with an M.Phil. (with distinction) in sociology.

Her research interests include topology, geometry, network analysis and social movements. Dr. Gilmore serves on the board of The Friends of the Mandela Rhodes Foundation whose mission is to fund the development of exceptional leadership capacity in southern Africa.

Abstract summary

Topological Learning with Ayasdi: Ayasdi has a unique approach to machine learning and data analysis using topology. This framework represents a revolutionary way to look at and understand data that is orthogonal but complementary to traditional machine learning and statistical tools. In this presentation I will show you what is meant by this statement: How does topology help with data analysis? Why would you use topology? I will illustrate with both synthetic examples and problems we’ve solved for our clients.

View the slides for this presentation »

Watch this presentation on YouTube »


Jorge Silva, Sr. Research Statistician Developer, SAS

Jorge Silva received his PhD in Electrical and Computer Engineering from Instituto Superior Técnico (IST), Lisbon, Portugal, in 2007. From 2007 to 2012 he was a research scientist at Duke University, where he applied statistical models to large-scale problems, e.g., unsupervised learning, analysis of multi-modal data, recommender systems, and social networks. Since 2012 he has been a Senior Research Statistician Developer at SAS, where he develops high-performance distributed algorithms for Enterprise Miner. He has filed two US patents and authored numerous articles in scholarly journals.

Abstract summary

Estimating the Number of Clusters in Big Data with the Aligned Box Criterion: Finding the number, k, of clusters in a dataset is a fundamental problem in unsupervised learning. It is also an important business problem, e.g. in market segmentation. Existing approaches include the silhouette measure, the gap statistic and Dirichlet process clustering. For thirty years SAS procedures have included the option of using the cubic clustering criterion (CCC) to estimate k. While CCC remains competitive, we propose a significant and original improvement, referred to herein as the aligned box criterion (ABC). Like CCC, ABC is based on a hypothesis-testing framework, but instead of a heuristic measure we use data-adaptive reference distributions to generate more realistic null hypotheses in a scalable and easily parallelizable manner. We have implemented ABC using SAS’ High Performance Analytics platform, and achieve state-of-the-art accuracy in the estimation of k.
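ABC itself is a SAS method, but the hypothesis-testing framework it shares with the gap statistic can be sketched in a few lines: compare the data's within-cluster dispersion to that of reference datasets drawn under a "no cluster structure" null. The toy below uses the classical uniform-box reference (not ABC's data-adaptive references) and a tiny k-means; all parameters are invented for illustration:

```python
import numpy as np

def kmeans_wss(X, k, iters=30):
    """Within-cluster sum of squares: farthest-point seeding + Lloyd's."""
    C = [X[0]]
    for _ in range(k - 1):                       # greedy farthest-point init
        d = np.min([((X - c) ** 2).sum(1) for c in C], axis=0)
        C.append(X[d.argmax()])
    C = np.array(C)
    for _ in range(iters):
        lab = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        C = np.array([X[lab == j].mean(0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return ((X - C[lab]) ** 2).sum()

def estimate_k(X, k_max=6, n_ref=10, seed=0):
    """Smallest k whose gap beats the next gap minus one standard error."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(0), X.max(0)
    gap, s = [], []
    for k in range(1, k_max + 1):
        # Null hypothesis: uniform data over the bounding box.
        ref = [np.log(kmeans_wss(rng.uniform(lo, hi, X.shape), k))
               for _ in range(n_ref)]
        gap.append(np.mean(ref) - np.log(kmeans_wss(X, k)))
        s.append(np.std(ref) * np.sqrt(1 + 1 / n_ref))
    for k in range(1, k_max):
        if gap[k - 1] >= gap[k] - s[k]:
            return k
    return k_max

# Three well-separated blobs: the estimate should be k = 3.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in (0.0, 5.0, 10.0)])
k_hat = estimate_k(X)
```

ABC's improvement, as the abstract describes it, is to replace the fixed uniform null with data-adaptive reference distributions, generated in a scalable, parallelizable way.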

View the slides for this presentation »

Watch this presentation on YouTube »


Dale Smith, Data Scientist, Nexidia

Abstract summary

Tensor Decompositions and Machine Learning: We know about vectors and matrices (linear transformations) from Linear Algebra. But tensors are not so familiar. Think of a hypercube in your data warehouse – can you do a tensor decomposition into lower-rank objects that reveal hidden features or hierarchies?

View the slides for this presentation »

Watch this presentation on YouTube »



Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT Austin

I work as a Data Scientist at TACC, a supercomputing center at UT Austin, on machine learning algorithms for predictive modeling, sentiment analysis, recommender systems, statistical modeling, natural language processing, and text mining of unstructured big data. I previously worked as a Data Architect for three years for the state of Utah. I received my M.S. in Computer Science from the University of Utah. My research interests are in machine learning, NLP, information retrieval, and database systems.

Abstract summary

Building a Recommender System for Publications using Vector Space Model and Python: In recent years, it has become common to have access to a large number of publications on similar or related topics. Recommendation systems are needed to locate appropriate articles among the many publications on the same or similar topics. In this talk, I will describe a recommender system framework for PubMed articles. PubMed is a free search engine that primarily accesses the MEDLINE database of references and abstracts on life-sciences and biomedical topics. The proposed system produces two types of recommendations: (i) content-based recommendations and (ii) recommendations based on similarities with other users’ search profiles. The first type can efficiently find material that is similar in context or topic to the input publication; it uses a Vector Space Model to rank PubMed articles by the similarity of their content. The second mechanism finds the users whose search profiles are most similar to the current user’s and recommends additional publications based on the most similar user’s history; we implement it with Python libraries and frameworks. In the talk I will present the background and motivation for these recommendation systems and discuss the implementation of this PubMed recommender with examples.
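The content-based half of such a system can be sketched with a tiny vector space model: TF-IDF weighting plus cosine similarity. The documents below are invented stand-ins for PubMed abstracts; a real system would use sparse matrices and a library vectorizer such as scikit-learn's TfidfVectorizer:

```python
import math
from collections import Counter

# Invented stand-ins for article abstracts, keyed by article id.
docs = {
    "a": "gene expression in influenza infection",
    "b": "influenza vaccine gene response",
    "c": "deep learning for image segmentation",
}

def tfidf(docs):
    """Map each document to a {word: tf * idf} weight vector."""
    n = len(docs)
    df = Counter(w for text in docs.values() for w in set(text.split()))
    return {doc_id: {w: c * math.log(n / df[w])
                     for w, c in Counter(text.split()).items()}
            for doc_id, text in docs.items()}

def cosine(u, v):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = tfidf(docs)
# Rank the other articles by similarity to article "a".
ranked = sorted((d for d in docs if d != "a"),
                key=lambda d: cosine(vecs["a"], vecs[d]), reverse=True)
```

Here article "b" ranks first because it shares the discriminative terms "gene" and "influenza" with the query article, while "c" shares none.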


View the slides for this presentation »

Watch this presentation on YouTube »


Pedro Domingos, Professor, University of Washington

Pedro Domingos is a professor of computer science at the University of Washington in Seattle. He is a winner of the SIGKDD Innovation Award, the highest honor in data science. He is a Fellow of the Association for the Advancement of Artificial Intelligence, and has received a Fulbright Scholarship, a Sloan Fellowship, the National Science Foundation’s CAREER Award, and numerous best paper awards. He received his Ph.D. from the University of California at Irvine and is the author or co-author of over 200 technical publications. He has held visiting positions at Stanford, Carnegie Mellon, and MIT. He co-founded the International Machine Learning Society in 2001. His research spans a wide variety of topics in machine learning, artificial intelligence, and data science, including scaling learning algorithms to big data, maximizing word of mouth in social networks, unifying logic and probability, and deep learning.

Abstract summary

The Five Tribes of Machine Learning, and What You Can Take from Each: There are five main schools of thought in machine learning, and each has its own master algorithm – a general-purpose learner that can in principle be applied to any domain. The symbolists have inverse deduction, the connectionists have backpropagation, the evolutionaries have genetic programming, the Bayesians have probabilistic inference, and the analogizers have support vector machines. What we really need, however, is a single algorithm combining the key features of all of them. In this talk I will describe my work toward this goal, including in particular Markov logic networks, and speculate on the new applications that such a universal learner will enable, and how society will change as a result.

View the slides for this presentation »

Watch this presentation on YouTube »


David Talby, SVP Engineering, Atigeo

David Talby is Atigeo’s senior vice president of engineering, leading the R&D, product management, and operations teams. David has extensive experience in building and operating web-scale analytics and business platforms, as well as building world-class, agile, distributed teams. Previously he was with Microsoft’s Bing group, where he led business operations for Bing Shopping in the US and Europe; earlier he worked at Amazon both in Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in Computer Science along with two master’s degrees, in Computer Science and Business Administration.

Abstract summary

Fraud detection is a classic adversarial analytics challenge: as soon as an automated system successfully learns to stop one scheme, fraudsters move on to attack in another way. Each scheme requires looking for different signals (i.e., features) to catch, is relatively rare (one in millions in finance or e-commerce, for example), and can take months to investigate a single case (in healthcare or tax, for example), making quality training data scarce.

This talk will cover, via live demo & code walk-through, the key lessons we’ve learned while building such real-world software systems over the past few years. We’ll incrementally build a hybrid machine learned model for fraud detection, combining features from natural language processing, topic modeling, time series analysis, link analysis, heuristic rules & anomaly detection. We’ll be looking for fraud signals in public email datasets, using Python & popular open-source libraries for data science and Apache Spark as the compute engine for scalable parallel processing.
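A toy version of such a hybrid score, combining a heuristic rule, a crude text signal, and a statistical anomaly signal into one ensemble. Every field name, term list, and threshold below is made up for illustration and is not from the talk:

```python
import math
import statistics

# Invented list of terms that a rule-writer might flag as suspicious.
SUSPICIOUS_TERMS = {"urgent", "wire", "confidential"}

def hybrid_score(msg, amounts):
    """Equal-weight blend of three fraud signals, each in [0, 1]."""
    # 1) Heuristic rule: off-hours activity.
    rule = 1.0 if msg["hour"] < 6 else 0.0
    # 2) Text signal: fraction of suspicious terms in the body.
    words = msg["body"].lower().split()
    text = sum(w in SUSPICIOUS_TERMS for w in words) / max(len(words), 1)
    # 3) Anomaly signal: z-score of the amount vs. historical amounts,
    #    squashed into [0, 1).
    mu, sd = statistics.mean(amounts), statistics.stdev(amounts)
    z = abs(msg["amount"] - mu) / sd if sd else 0.0
    anomaly = 1 - math.exp(-z)
    return (rule + text + anomaly) / 3

history = [100, 120, 95, 110, 105]
normal = {"hour": 14, "body": "please see the attached invoice",
          "amount": 105}
shady = {"hour": 3, "body": "urgent confidential wire needed now",
         "amount": 5000}
```

In a real system each signal would itself be a learned model (topic model, time-series detector, link-analysis score) and the blend would be learned rather than hand-weighted, but the composition pattern is the same.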

View the slides for this presentation »

Watch this presentation on YouTube »


Ted Dunning, Chief Application Architect, MapR

Ted Dunning is Chief Application Architect at MapR and has held Chief Scientist positions at Veoh Networks, ID Analytics, and MusicMatch (now Yahoo Music). Ted is responsible for building the world’s most advanced identity theft detection system, as well as one of the largest peer-assisted video distribution systems and ground-breaking music and video recommendation systems. Ted has 24 issued and numerous pending patents and contributes to Apache Mahout, ZooKeeper, and Drill. He is also a mentor for Apache Spark, Storm, DataFu, and Stratosphere.

Abstract summary

Complement Deep Learning with Cheap Learning: Recent results of deep learning on hard problems have set the data world atwitter and made deep learning the fashion of the time.

But it is important to remember that as data expands, the learning problems encountered are often nearly greenfield problems, and it is often possible to solve them using remarkably simple techniques. Indeed, on many problems these simple techniques give results as good as more complex ones, not because they are profound, but because many problems become simpler at scale.

That said, it isn’t always obvious how to do this. I will describe some of these techniques and show how they can be applied in practice.

View the slides for this presentation »

Watch this presentation on YouTube »


Hank Roark, Data Scientist, H2O

Hank is a Data Scientist & Hacker at H2O. Hank comes to H2O with a background in turning data into products and system solutions, and he loves helping others find value in their data. He has deep experience in the application domains of telematics, remote sensing, logistics, manufacturing, agriculture, and the Internet of Things. Before becoming focused on machine intelligence, Hank led international software product teams and worked as an IT consultant. Hank has an SM in Engineering and Management from MIT and a BS in Physics from Georgia Tech.

Abstract summary

The Internet of Things is about data, not things: Some forecasts predict that by 2018 the number of connected things will exceed the combined number of personal computers, smartphones, and tablets. Each ’thing’ can produce a tremendous stream of data from sensors and other sources. This presentation will discuss progress, examples, challenges, and opportunities with machine learning for the IoT, including a short look at recent applications of ML (using H2O) to machine prognostics and health management (PHM) and to agriculture.

View the slides for this presentation »

Watch this presentation on YouTube »


Jorge A. Castañón, Data Scientist, IBM

Jorge Castañón hails from Mexico City and received his Ph.D. in Computational and Applied Mathematics from Rice University. He has a genuine passion for data science and machine learning applications of any kind, especially imaging problems. Since 2007, he has been developing numerical optimization models and algorithms for regularization and inverse problems. At IBM, Jorge joined the Big Data Analytics team at the Silicon Valley Laboratory, where he is building the future of machine learning and text analytics tools using Apache Spark and Hadoop.

Abstract Summary

What is RedRock? Data + ML + Design: What happens when you take IBM Design Thinking, 2 data scientists, 3 designers and 3 developers in a time frame of 10 days? RedRock, a beautiful and easy to use app that can process, analyze and visualize terabytes of data in seconds with Apache Spark. Come hear the story about how we built RedRock and the details about the machine learning algorithms we used.

View the slides for this presentation »

Watch this presentation on YouTube »


Raji Balasuubramaniyan, Senior Data Scientist, Manheim

Dr. Raji Balasuubramaniyan works as a senior data scientist at Manheim in Atlanta. Previously she worked for the Centers for Disease Control and Prevention as a contract bioinformatics scientist, doing research on influenza viruses. She received her PhD in Bioinformatics from the Max Planck Institute for Terrestrial Microbiology in Marburg, Germany. Her interests span a wide variety of topics in machine learning, bioinformatics, deep learning, and data science.

Abstract summary

Leveraging Machine Learning Techniques for the Vehicle Auction Industry: Online shopping has grown in popularity over the years, and many shoppers now turn to online shopping sites. By recommending content relevant to each shopper, we minimize the time they spend searching and maximize the business success of the site. Many online sites use recommendation systems for this purpose, leveraging content-based and/or context-based collaborative filtering machine learning techniques. At Manheim, we have applied several machine learning techniques, including collaborative filtering, neural networks, and Bayesian learning for relevant vehicle recommendation, and time series forecasting for vehicle auctions. My talk will focus on some of these techniques and their use in relevant content recommendation.
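As one concrete example of the time-series side, here is a minimal simple-exponential-smoothing forecast. The smoothing constant and the toy series are invented; this sketches the general technique, not Manheim's model:

```python
# Simple exponential smoothing: the forecast is a running average that
# weights recent observations more heavily (controlled by alpha).

def ses_forecast(series, alpha=0.5):
    """One-step-ahead forecast: the final smoothed level of the series."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Toy weekly sales counts; the forecast smooths out the noise.
sales = [100, 104, 98, 102, 100]
next_week = ses_forecast(sales)  # -> 100.5
```

A production forecaster would add trend and seasonality terms and fit alpha from data, but the recursive level update is the common core.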

View the slides for this presentation »

Watch this presentation on YouTube »


Jason Huang, Solutions Engineer, Qubole

Jason is a Senior Solutions Architect at Qubole, where he helps prospects and customers perform massive ad hoc queries in the public cloud. Prior to joining Qubole, Jason served as a technical architect for enterprise solutions spanning centralized logging, business intelligence, data analytics, private cloud infrastructure, distributed systems, and computational grids within financial services and Fortune 500 organizations. He was previously a Senior Solutions Consultant with TIBCO and DataSynapse.

Jason holds an AB in Computer Science from Brown University.

Abstract summary

Sparking Data in the Cloud: Data isn’t useful until it’s used to drive decision-making. Companies like Pinterest are using machine learning to build data-driven recommendation engines and perform advanced cluster analysis. In this talk, Jason Huang will cover best practices for running Spark in the cloud and common challenges in iterative design and interactive analysis.

View the slides for this presentation »

Watch this presentation on YouTube »


Sven Kreiss, Lead Data Scientist, Wildcard

Sven Kreiss (PhD) is the Lead Data Scientist at Wildcard, where he is in charge of the company’s data science efforts, including structured content extraction from unstructured websites, and architects Wildcard’s data analysis tools and machine learning models. Sven’s background is in model building and large-scale statistical inference for particle physics; he holds a PhD from NYU. His work is incorporated in CERN’s analysis tool ROOT and its statistics extension RooStats.

Abstract summary

Deep ML Architecture at Wildcard: At Wildcard we think about technologies for a future native mobile web experience through cards. Cards are a new UI paradigm for mobile content, for which we schematize unstructured web content. Part of the challenge is to develop an understanding of online content through machine learning algorithms. The extracted information is used to create cards that are surfaced in the Wildcard iOS app and in other card ecosystems. I will describe the challenge and how we structure content extraction as a deep architecture of classification and optimization algorithms, one that combines the traditionally factorized sub-problems of content extraction and allows the various stages to inform each other. The talk will include an overview of the data and features we use and our training strategy with a partly human-powered labeling system. This ML system, called sic, is used in production, and I will show our approach of using only fast features, or a mix of fast and slow features, depending on the use case in the app.

View the slides for this presentation »

Watch this presentation on YouTube »

Sponsors

Platinum:

IBM

Gold:

MapR

H2O.ai

Qubole

Insightpool

SAS

Silver:

LogicBlox

Media:

Basic Books

CRC Press

MIT Press