Enjoy a full-day conference this September and appreciate The Academy’s tribute to the Southern reputation for charm and elegance. During coffee breaks and meals, take in the old-world elegance and style of days gone by in the heart of bustling Midtown Atlanta.


Aran Khanna, Software Engineer, Amazon Web Services

Aran Khanna is a software engineer on the deep learning research team at Amazon Web Services, led by Professor Alex Smola. Aran is the technical lead for the development of the Apache MXNet framework for mobile, IoT, and edge devices, working to enable the deployment and management of efficient deep network models across a broad set of devices outside the data center, from Raspberry Pis to smartphones to NVIDIA Jetsons. Before joining the AWS team, Aran graduated from Harvard’s Computer Science department.

Abstract summary

High Performance Deep Learning on Edge Devices With Apache MXNet:
Deep network based models are marked by an asymmetry between the large amount of compute power needed to train a model and the relatively small amount needed to deploy a trained model for inference. This is particularly true in computer vision tasks such as object detection or image classification, where millions of labeled images and large numbers of GPUs are needed to produce an accurate model that can then be deployed for inference on low-powered devices with a single CPU. The challenge when deploying vision models on these low-powered devices, though, is getting inference to run efficiently enough for near real-time processing of a video stream. Fortunately, Apache MXNet provides the tools to solve these issues, allowing users to create highly performant models with techniques like separable convolutions, quantized weights, and sparsity exploitation, and providing custom hardware kernels that accelerate inference as far as the target hardware allows. This is demonstrated through a state-of-the-art MXNet-based vision network running in near real time on a low-powered Raspberry Pi. Finally, we discuss how running inference at the edge and leveraging MXNet’s efficient modeling tools can massively drive down the compute costs of deploying deep networks in a production system at scale.
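The savings that separable convolutions and quantized weights buy on a device like a Raspberry Pi can be estimated with simple arithmetic. The sketch below is plain Python, not the MXNet API: it counts multiply-accumulate operations (MACs) for a standard versus a depthwise-separable convolution and applies a basic symmetric int8 quantization to a list of weights. The layer dimensions and the per-tensor quantization scheme are illustrative assumptions, not details taken from the talk.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def separable_conv_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one float scale."""
    max_abs = max(abs(x) for x in weights) or 1.0  # avoid a zero scale for all-zero weights
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(x / scale))) for x in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    # Illustrative layer: 56 x 56 feature map, 128 -> 128 channels, 3 x 3 kernel.
    std = standard_conv_macs(56, 56, 128, 128, 3)
    sep = separable_conv_macs(56, 56, 128, 128, 3)
    print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio {std / sep:.1f}x")

    codes, scale = quantize_int8([0.5, -1.0, 0.25])
    print(codes, dequantize(codes, scale))
```

For this layer the ratio reduces to c_out·k² / (k² + c_out) = 1152 / 137, roughly 8.4 times fewer MACs. Storing int8 codes takes a quarter of the memory of float32, at the cost of a rounding error of at most half the scale per weight.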


Le Song, Assistant Professor, College of Computing, Georgia Institute of Technology

Le Song is an assistant professor in the College of Computing, Georgia Institute of Technology. He received his Ph.D. in Machine Learning from the University of Sydney and NICTA in 2008 and then conducted postdoctoral research in the Department of Machine Learning at Carnegie Mellon University between 2008 and 2011. Before joining Georgia Institute of Technology, he was a research scientist at Google. His principal research direction is machine learning, especially nonlinear methods and probabilistic graphical models for large-scale and complex problems arising from artificial intelligence, social network analysis, healthcare analytics, and other interdisciplinary domains. He is the recipient of the NSF CAREER Award’14, AISTATS’16 Best Student Paper Award, IPDPS’15 Best Paper Award, NIPS’13 Outstanding Paper Award, and ICML’10 Best Paper Award. He has also served as an area chair for leading machine learning conferences such as ICML, NIPS, and AISTATS, and as an action editor for JMLR.

Abstract summary

Understanding Deep Learning for Big Data:
The complexity and scale of big data impose tremendous challenges for analysis. Yet big data also offer great opportunities. Some nonlinear phenomena, features, or relations that are unclear or cannot be inferred reliably from small and medium data now become clear and can be learned robustly from big data. Typically, the form of the nonlinearity is unknown and must be learned from data as well. Being able to harness the nonlinear structures in big data could allow us to tackle problems that were previously impossible, or to obtain results far better than the previous state of the art.

Nowadays, deep neural networks are the methods of choice for large-scale nonlinear learning problems. What makes deep neural networks work? Is there a general principle for tackling high-dimensional nonlinear problems that we can learn from deep neural networks? Can we design competitive or better alternatives based on such knowledge? To make progress on these questions, my machine learning group performed both theoretical and experimental analyses of existing and new deep learning architectures, investigating three crucial aspects: the usefulness of fully connected layers, the advantage of the feature learning process, and the importance of compositional structure. Our results point to some promising directions for future research and provide guidelines for building new deep learning models.


Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georgia Institute of Technology

Jacob Eisenstein is an Assistant Professor in the School of Interactive Computing at Georgia Tech. He works on statistical natural language processing, focusing on computational sociolinguistics, social media analysis, discourse, and machine learning. He is a recipient of the NSF CAREER Award, a member of the Air Force Office of Scientific Research (AFOSR) Young Investigator Program, and was a SICSA Distinguished Visiting Fellow at the University of Edinburgh. His work has also been supported by the National Institutes of Health, the National Endowment for the Humanities, and Google. Jacob was a postdoctoral researcher at Carnegie Mellon and the University of Illinois. He completed his Ph.D. at MIT in 2008, winning the George M. Sprowls dissertation award. Jacob’s research has been featured in the New York Times, National Public Radio, and the BBC. Thanks to his brief appearance in If These Knishes Could Talk, Jacob has a Bacon number of 2.

Abstract summary

Making Natural Language Processing Robust to Sociolinguistic Variation:
Natural language processing on social media text has the potential to aggregate facts and opinions from millions of people all over the world. However, language in social media is highly variable, making it more difficult to analyze than conventional news text. Fortunately, this variation is not random; it is often linked to social properties of the author. I will describe two machine learning methods for exploiting social network structure to make natural language processing more robust to socially linked variation. The key idea behind both methods is linguistic homophily: the tendency of socially linked individuals to use language in similar ways. This idea is captured using embeddings of node positions in social networks. By integrating node embeddings into neural networks for language analysis, we obtained language processing systems customized for individual writers, even for individuals for whom we have no labeled data. The first application shows how to apply this idea to tweet-level sentiment analysis. The second targets the problem of linking spans of text to known entities in a knowledge base.
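To make the idea concrete, here is a toy sketch of one way node embeddings can personalize a classifier: the author’s embedding decides, through a softmax, how much weight each of a few basis sentiment scorers receives. Everything here (the two-dimensional embeddings, the basis word scores, the key vectors) is hypothetical, and the sketch is only loosely in the spirit of the methods described, not the speaker’s actual models.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical basis models: linear bag-of-words sentiment scorers that
# disagree about the slang word "sick".
BASIS_MODELS = [
    {"sick": -1.0, "great": 1.0, "awful": -1.0},  # "sick" read as negative
    {"sick": 1.0, "great": 1.0, "awful": -1.0},   # "sick" read as positive slang
]
# Hypothetical key vectors, one per basis model, in the node-embedding space.
BASIS_KEYS = [[1.0, 0.0], [0.0, 1.0]]


def personalized_score(tokens, author_embedding):
    """Sentiment score whose basis-model mixture depends on the author's network position."""
    weights = softmax([dot(author_embedding, key) for key in BASIS_KEYS])
    basis_scores = [sum(model.get(t, 0.0) for t in tokens) for model in BASIS_MODELS]
    return sum(w * s for w, s in zip(weights, basis_scores))


if __name__ == "__main__":
    print(personalized_score(["sick"], [3.0, 0.0]))  # author in community A
    print(personalized_score(["sick"], [0.0, 3.0]))  # author in community B
```

Because linguistic homophily places socially linked authors close together in embedding space, an author with no labeled tweets still inherits a sensible mixture from their region of the network.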

Jennifer Marsman, Principal Software Development Engineer, Microsoft

Jennifer Marsman is a Principal Software Development Engineer in Microsoft’s Developer and Platform Evangelism group, where she educates developers on Microsoft’s new technologies. In this role, Jennifer is a frequent speaker at software development conferences around the world. In 2016, Jennifer was recognized as one of the “top 100 most influential individuals in artificial intelligence and machine learning” by Onalytica. She has been featured in Bloomberg for her work using EEG and machine learning to perform lie detection. In 2009, Jennifer was chosen as “Techie whose innovation will have the biggest impact” by X-OLOGY for her work with GiveCamps, a weekend-long event where developers code for charity. She has also received many honors from Microsoft, including the “Best in Role” award for Technical Evangelism, Central Region Top Contributor Award, Heartland District Top Contributor Award, DPE Community Evangelist Award, CPE Champion Award, MSUS Diversity & Inclusion Award, Gold Club, and Platinum Club. Prior to becoming a Developer Evangelist, Jennifer was a software developer in Microsoft’s Natural Interactive Services division. In this role, she earned two patents for her work in search and data mining algorithms. Jennifer has also held positions with Ford Motor Company, National Instruments, and Soar Technology. Jennifer holds a Bachelor’s Degree in Computer Engineering and Master’s Degree in Computer Science and Engineering from the University of Michigan in Ann Arbor. Her graduate work specialized in artificial intelligence and computational theory. Jennifer blogs at http://blogs.msdn.com/jennifer and tweets at http://twitter.com/jennifermarsman.

Abstract summary

Anna Kiefer, Software Engineer, Kevala Analytics

Anna is a full stack software engineer and sustainability professional originally from Washington, DC. She currently works at Kevala Analytics. She has a penchant for developing software for social good, including energy, climate, and health applications. She is a graduate of New York University and lives in San Francisco, CA.

Abstract summary

Forecasting Local Epidemics of Dengue Fever in Latin America:
Dengue fever is a mosquito-borne disease that occurs in tropical and subtropical parts of the world, affecting as many as 400 million people yearly. Because dengue is carried by mosquitoes, its transmission is related to climate variables such as temperature and precipitation. Climate change is likely to produce distributional shifts that may increase dengue outbreaks and have significant public health implications worldwide. This increased risk heightens the need for accurate models that predict the timing, location, and severity of dengue outbreaks in Latin America. A predictive model to forecast dengue fever outbreaks in Latin America was built using training data from NOAA’s Global Historical Climatology Network temperature data, PERSIANN satellite precipitation measurements, NOAA’s NCEP Climate Forecast System Reanalysis precipitation measurements, and NOAA’s satellite vegetation index for two dengue-prone cities: San Juan, Puerto Rico, and Iquitos, Peru. This talk discusses the machine learning algorithms that gave the greatest accuracy in predicting dengue outbreaks and emphasizes the important role machine learning can play in epidemiology and global health.
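Because transmission is tied to climate, a natural first step is to give the model climate values from earlier weeks, on the assumption that mosquito populations and case counts respond to rainfall and temperature with a delay. The sketch below builds such lagged features from weekly records and scores forecasts by mean absolute error. The column names, lag lengths, and choice of metric are hypothetical; the abstract does not list the features or the evaluation criterion.

```python
from typing import Dict, List, Optional, Sequence


def add_lagged_features(
    rows: Sequence[Dict[str, float]], columns: Sequence[str], lags: Sequence[int]
) -> List[Dict[str, Optional[float]]]:
    """Copy weekly rows, adding '<col>_lag<k>' fields with the value from k weeks earlier.

    Weeks without enough history get None so the caller can drop or impute them.
    """
    out: List[Dict[str, Optional[float]]] = []
    for i, row in enumerate(rows):
        new_row: Dict[str, Optional[float]] = dict(row)  # copy; leave the input untouched
        for col in columns:
            for k in lags:
                new_row[f"{col}_lag{k}"] = rows[i - k][col] if i - k >= 0 else None
        out.append(new_row)
    return out


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical weekly observations for one city.
    weeks = [
        {"precip_mm": 10.0, "temp_c": 27.0},
        {"precip_mm": 35.0, "temp_c": 28.5},
        {"precip_mm": 50.0, "temp_c": 28.0},
        {"precip_mm": 20.0, "temp_c": 27.5},
    ]
    for row in add_lagged_features(weeks, ["precip_mm", "temp_c"], [1, 2]):
        print(row)
```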

Tim Chartier, Chief Academic Officer, Tresata

Dr. Tim Chartier, Chief Researcher for Tresata and Professor of Mathematics and Computer Science at Davidson College, specializes in sports analytics. He frequently consults on data analytics questions, including projects with ESPN Magazine, ESPN’s Sport Science program, NASCAR teams, the NBA, and fantasy sports sites. In 2014, Tim was named the inaugural Math Ambassador for the Mathematical Association of America, which also recognized his ability to communicate math with a national teaching award. His research and scholarship were recognized with the prestigious Alfred P. Sloan Research Fellowship. Tim authored Math Bytes: Google Bombs, Chocolate-Covered Pi, and Other Cool Bits in Computing, published by Princeton University Press. Through the Teaching Company, he taught a 24-lecture series entitled Big Data: How Data Analytics Is Transforming the World. In K-12 education, Tim has also worked with Google and Pixar on their educational initiatives. Dr. Chartier has served as a resource for a variety of media inquiries, including appearances with Bloomberg TV, NPR, the CBS Evening News, USA Today, and The New York Times.

Abstract summary

Beyond a Bit Fit
An emerging and important avenue of sports analytics is biometric data. From casual athletes tracking steps and sleep to professional athletes tracking heart rate and impact data, biometric data can improve performance and prevent injury. What can we learn from biometric data? How can it aid athletes and coaches? How can you be a bit fitter by analyzing your body’s data? This talk will discuss the data, analyses, and insights that are available, and still evolving, in the sports analytics of biometric data.

Alexandra Johnson, Software Engineer, SigOpt

Alexandra works on everything from infrastructure to product features to blog posts. Previously, she worked on growth, APIs, and recommender systems at Polyvore (acquired by Yahoo). She majored in computer science at Carnegie Mellon University with a minor in discrete mathematics and logic, and during the summers she A/B tested recommendations at internships with Facebook and Rent the Runway.

Abstract summary

Best Practices for Hyperparameter Optimization:
All machine learning and artificial intelligence pipelines, from reinforcement learning agents to deep neural nets, have tunable hyperparameters. Optimizing these hyperparameters provides tremendous performance gains, but only if the optimization is done correctly. This presentation will discuss topics including selecting performance criteria, why you should always use cross-validation, and choosing between state-of-the-art optimization methods.
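A minimal sketch of the "always cross-validate" advice: each candidate hyperparameter is scored by its held-out error across k folds, and the best-scoring value is kept. The model (one-dimensional ridge regression through the origin) and the candidate grid are illustrative assumptions; exhaustive grid search is used only for simplicity and stands in for the more advanced optimization methods the talk compares.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle range(n) with a fixed seed and split it into k near-equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Closed-form 1-D ridge regression through the origin: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs, ys, lam, k=5) -> float:
    """Mean squared error on held-out folds, training on the remaining folds each time."""
    total = 0.0
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = fit_ridge_1d(train_x, train_y, lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in fold)
    return total / len(xs)


def select_hyperparameter(xs, ys, grid, k=5) -> Tuple[float, Dict[float, float]]:
    """Score every candidate by cross-validated error and return the best one."""
    scores = {lam: cv_error(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores
```

The performance criterion lives in one place (`cv_error`), so changing the metric or replacing the grid with a smarter search strategy leaves the rest of the loop unchanged.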

Robert Morris, CTO and Co-Founder, Predikto, Inc.

Robert Morris, Ph.D., is Co-founder and CTO of Predikto, Inc. He is also an award-winning academic, formerly a tenured Associate Professor of Criminology at the University of Texas at Dallas. At UTD, he taught a variety of courses covering advanced data analytics and machine learning for the social sciences and for operations research. He has published over 50 peer-reviewed journal articles across many disciplines in outlets such as PLOS One, Journal of Quantitative Criminology, Justice Quarterly, and Intelligence.
Robert’s expertise lies in machine learning approaches for longitudinal processes to predict and explain human (criminal) behavior. He now applies this philosophy to Predikto’s patent-pending automated machine learning platform, which has successfully predicted unplanned events across a range of equipment classes within the IoT space, including freight locomotives (electric and diesel), high-speed commuter trains, quay cranes, rail cars, commercial aircraft, datacenter HVAC, and more.

Abstract summary

Machine Learning for Predictive Maintenance: Insights on the Journey From a Good Model to Business Value
Prospective applications of machine-learning-driven predictive maintenance abound across many industries. Data have become rich, and appetites for insight have become commonplace. Yet most large corporations across industrial verticals are only beginning their transformational journey to improve operational efficiency using big data and machine learning. Success has been limited in spite of well-performing models and a tremendous investment in human capital (i.e., data scientists). Each scenario shares the requirement of demonstrating a return on investment from deployed predictive maintenance solutions. Achieving this, however, requires much more than a sound machine learning algorithm and clever feature engineering. In fact, the larger hurdles involve user trust in the output, enabling change management, triaging results, accounting for feedback in response to predictive output, and much more. This presentation will discuss insights, approaches, experiences, and lessons learned from deploying machine learning solutions across several industrial verticals, including rail, aviation, and shipping, where the end goal is to demonstrate business value by providing insight into unplanned asset downtime before it occurs.

Venkatesh Ramanathan, Data Scientist, PayPal

Venkatesh is a senior data scientist at PayPal, where he works on building state-of-the-art tools for payment fraud detection. He has more than 20 years of experience designing, developing, and leading teams to build scalable server-side software. In addition to being an expert in big data technologies, Venkatesh holds a Ph.D. in Computer Science with a specialization in Machine Learning and Natural Language Processing (NLP), and has worked on various problems in the areas of anti-spam, phishing detection, and face recognition.

Abstract summary

Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud Prevention:
PayPal is at the forefront of applying large-scale graph processing and machine learning algorithms to keep fraudsters at bay. In this talk, I’ll present how advanced graph processing and machine learning algorithms such as deep learning and gradient boosting are applied at PayPal for fraud prevention. I’ll elaborate on the specific challenges of applying large-scale graph processing and machine learning techniques to payment fraud prevention, and explain how we employ sophisticated machine learning tools, both open source and developed in-house.
I will also present results from experiments conducted on a very large graph data set containing millions of edges and vertices.
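As an illustration of the kind of graph processing involved, the sketch below links accounts that share an attribute (a device or a card, for example) and groups them into connected components with union-find; the size of an account’s component could then feed a downstream model such as gradient boosting. The node naming scheme and the use of component size as a feature are hypothetical, not a description of PayPal’s pipeline, and a graph with millions of vertices would need a distributed implementation rather than this single-process one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def link_accounts(edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group 'acct:' nodes into components; each edge joins an account to a shared attribute node."""
    uf = UnionFind()
    for account, attribute in edges:
        uf.union(account, attribute)
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in list(uf.parent):
        if node.startswith("acct:"):
            groups[uf.find(node)].append(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    edges = [
        ("acct:A", "device:1"), ("acct:B", "device:1"),
        ("acct:B", "card:9"), ("acct:C", "card:9"),
        ("acct:D", "device:2"),
    ]
    for component in link_accounts(edges):
        print(len(component), component)
```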

Talha Obaid, Email Security, Symantec

Talha Obaid is an AntiSpam engineer for Email Security.cloud at Symantec, which he joined from MIT’s CENSAM research centre. In his current role, he uses data science to fight spam and malware. He is passionate about democratizing machine learning and recently spoke at ‘Google ML Experts Day’ in 2017 and at ‘Google GDG DevFest’ in 2016. At Symantec, Talha has delivered several sessions about machine learning. He has been recognized many times at Symantec, earning the ‘Symantec Innovator’ title twice, winning ‘Symantec STAR Innovation Day’, and clinching numerous ‘Symantec Applause Awards’. At MIT’s CENSAM research centre, while working on hydraulic modelling and simulation, he became a founding member of a spinoff, Visenti, which was acquired by Xylem. Earlier, Talha held a technical leadership position at Mentor Graphics. Over his career, his contributions have led to four spinoffs, five patents, a trade secret, and a few publications. Talha holds a Bachelor’s degree in Computer Science with Honors and a Master’s degree in Information Systems from the National University of Singapore, where his dissertation specialized in business analytics and cluster computing. Outside work, Talha actively contributes to the data science community as lead co-organizer of PyDataSG, a group of more than 2,000 members that holds regular monthly meetups. He also conducts TeachEdison workshops and is a certified first aider. @ObaidTal

Abstract summary

A Machine Learning Approach for Detecting Malware:
The project aims to improve the way we detect script-based malware using machine learning. Malware has become one of the most active channels for delivering threats such as banking Trojans and ransomware, and the talk presents a new and effective way to detect it. We started by acquiring both malicious and clean samples. Next, we identified features, building on an existing knowledge base of malware, and then performed automated feature extraction. Once we had a feature set, we teased out features that are categorical, interdependent, or composite. We applied various machine learning models, producing both binary and categorical outcomes. We cross-validated our results and re-tuned our feature set and model until we obtained satisfactory results with the fewest false positives. We concluded that not all the extracted features are significant; in fact, some features are detrimental to model performance. Factoring out such features not only yields better matches but also provides a significant gain in performance.
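The finding that some features hurt the model suggests a pruning loop. The sketch below is a generic greedy backward elimination: it repeatedly drops the feature whose removal most improves a score, stopping when no single removal helps. In practice `score` would be a cross-validated metric that penalizes false positives; here it is a toy function, and the feature names are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable


def backward_elimination(
    features: Iterable[str], score: Callable[[FrozenSet[str]], float]
) -> FrozenSet[str]:
    """Greedily remove features while doing so strictly improves score(feature_set)."""
    current = frozenset(features)
    best = score(current)
    while len(current) > 1:
        # Score every one-feature-smaller subset; ties break on feature name.
        candidate_score, dropped = max((score(current - {f}), f) for f in sorted(current))
        if candidate_score <= best:
            break  # no single removal helps any more
        current, best = current - {dropped}, candidate_score
    return current


if __name__ == "__main__":
    # Toy stand-in for a cross-validated metric: each feature adds or subtracts a fixed amount.
    contribution = {"eval_calls": 0.3, "string_entropy": 0.2, "file_size": -0.1}

    def toy_score(subset):
        return 0.5 + sum(contribution[f] for f in subset)

    print(sorted(backward_elimination(contribution, toy_score)))
```

Each pass costs one model evaluation per remaining feature, which is affordable for modest feature counts; larger sets usually call for cheaper importance estimates first.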

Yashar Mehdad, Data Scientist Manager, Airbnb

Yashar Mehdad leads the machine learning and natural language processing efforts to improve customer support at Airbnb. Before that, he led the science team for Yahoo’s publisher products (Yahoo homepage, Yahoo Finance, and Yahoo Sports), working on content understanding, personalization, and ranking. Yashar has published over 40 refereed articles and filed more than 10 patents in machine learning and natural language processing. He received his PhD from the University of Trento and completed his postdoctoral research at the University of British Columbia. Yashar has also served as a chair, co-organizer, programme committee member, and reviewer for various top-tier academic workshops, conferences, and journals, and has received a few academic and industrial awards.

Abstract summary

Airbnb: Driving a Higher Level of Customer Support with Machine Learning
“The initial Airbnb customer service operation was humble: one guy and his cell phone.” Now, Airbnb has thousands of agents available via phone, chat, and email, 24/7, in every time zone and in 30 different languages. This level of above-and-beyond customer support is neither scalable nor sustainable without the power of data and the insights we glean from it with the help of machines. At Airbnb, machine learning (ML) and natural language processing (NLP) hold the promise to facilitate, optimize, and improve customer experiences. Applications range from understanding customer feedback and issues to providing a more reachable and efficient service that resolves issues more effectively. In this talk, I will highlight various ways in which ML and NLP techniques are used to support our customers. I will then dig into one or two use cases in more detail, explaining the challenges, our approaches, lessons learned, and our future directions. Come hear how the Airbnb customer support team embeds machine learning to provide a higher level of support, customized care, and love.

Jessica Rudd, PhD Student, Analytics and Data Science, Kennesaw State University

Jessica is currently a PhD student in Analytics and Data Science at Kennesaw State University. She received a B.A. in Anthropology from Emory University and an M.P.H. in Global Epidemiology from the Rollins School of Public Health at Emory University. She has seven years of experience as an Epidemiologist/Biostatistician for the Division of Viral Diseases at the Centers for Disease Control and Prevention.

Abstract summary

Application of Support Vector Machine Modeling and Graph Theory Metrics for Disease Classification:
Disease classification is a crucial element of biomedical research. Recent studies have demonstrated that machine learning techniques, such as Support Vector Machine (SVM) modeling, produce similar or improved predictive capabilities compared with the traditional method of Logistic Regression. In addition, social network metrics have been found to provide useful predictive information for disease modeling. In this study, we combine simulated social network metrics with SVM to predict diabetes in a sample of data from the Behavioral Risk Factor Surveillance System. In this dataset, Logistic Regression outperformed SVM, with ROC indices of 81.8 and 81.7 for models with and without graph metrics, respectively. SVM with a polynomial kernel had ROC indices of 72.9 and 75.6 for models with and without graph metrics, respectively. Although SVM did not perform as well as Logistic Regression, the results are consistent with previous studies that used SVM to classify diabetes.
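Two pieces of this study are easy to sketch. The ROC index is read here as the area under the ROC curve (AUC) scaled by 100, which is an assumption about the reporting convention; AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. Degree centrality is shown as one example of a graph metric that could be appended to each respondent’s features; the abstract does not say which metrics were simulated.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly, ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def degree_centrality(edges: Iterable[Tuple[str, str]], nodes: List[str]) -> Dict[str, float]:
    """Degree of each node divided by the maximum possible degree, n - 1."""
    degree = {v: 0 for v in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    return {v: d / (n - 1) for v, d in degree.items()}


if __name__ == "__main__":
    auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
    print(f"AUC {auc:.2f}, ROC index {100 * auc:.1f}")
    print(degree_centrality([("a", "b"), ("a", "c")], ["a", "b", "c"]))
```

The pairwise loop is quadratic in sample size; for a survey the size of BRFSS, a rank-based formula over sorted scores gives the same value far faster.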

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center

Jeremy is a Machine Learning Engineer at the Spark Technology Center (STC), focused on scalable deep learning. His most recent major contribution was a deep neural network for regression in Spark, which he had the opportunity to speak about at Apache Big Data. He has contributed to MLlib at the STC, which he joined after graduating from Harvard College with a concentration in Applied Mathematics and Computer Science.

Abstract summary

Convolutional Neural Networks at scale in Spark MLlib:
Jeremy Nixon will discuss the engineering and applications of a new algorithm built on top of MLlib. The presentation will focus on the methods the algorithm uses to automatically generate features that capture nonlinear structure in data, as well as the process by which it is trained. Major aspects include compositional transformations over the data, convolution, and distributed backpropagation via SGD with adaptive gradients and an adaptive learning rate. The applications portion will show how to use convolutional neural networks to model data in computer vision, natural language, and signal processing. Details on optimal preprocessing, the type of structure that can be learned, and managing the model’s ability to generalize will inform developers looking to apply nonlinear modeling tools to the problems they face.
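One well-known optimizer that adapts each parameter’s step size from its gradient history is AdaGrad; the abstract does not name the optimizer used, so it serves here as an illustrative stand-in. The single-machine sketch below shows the core idea: each parameter accumulates its squared gradients, and its effective learning rate shrinks as that sum grows, so steep and shallow directions are scaled differently. In a distributed setting such as Spark, the gradient would be aggregated across partitions before this update.

```python
import math
from typing import Callable, List, Sequence


def adagrad(
    grad_fn: Callable[[List[float]], List[float]],
    w0: Sequence[float],
    lr: float = 0.5,
    steps: int = 500,
    eps: float = 1e-8,
) -> List[float]:
    """Minimize a function given its gradient, using per-parameter AdaGrad step sizes."""
    w = list(w0)
    accum = [0.0] * len(w)  # running sum of squared gradients, one per parameter
    for _ in range(steps):
        g = grad_fn(w)
        for i in range(len(w)):
            accum[i] += g[i] * g[i]
            w[i] -= lr * g[i] / (math.sqrt(accum[i]) + eps)
    return w


def quadratic_grad(w: List[float]) -> List[float]:
    """Gradient of f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2, steep in w1 and shallow in w0."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


if __name__ == "__main__":
    print(adagrad(quadratic_grad, [0.0, 0.0]))  # approaches [3.0, -1.0]
```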

Ryan West, Machine Learning Engineer, Nexosis

Ryan West is a Machine Learning Engineer at Nexosis, Inc. He has worked on implementing traditional data science workflows into Nexosis’ automated machine learning platform and generalizing time series models to forecast on datasets across multiple industries. Ryan is also a Venture for America fellow, a program for young professionals that provides the resources and ongoing support to prepare fellows to become successful entrepreneurs. He holds a B.S. in Systems Science and Engineering from Washington University in St. Louis as well as a B.A. in Physics from Knox College where he coauthored a publication on the electronic structure of aryl-substituted BIAN complexes of iron dibromide.

Abstract summary

Codifying Data Science Intuition: Using Decision Theory to Automate Time Series Model Selection:
While models built from cross-sectional data can use cross-validation for model selection, most time series models cannot be cross-validated in the usual way because of the temporal structure of the data used to create them. It is possible to employ a rolling cross-validation technique; however, this process is computationally expensive and provides no indication of the models’ long-term forecast accuracy.

The purpose of this talk is to explain how decision theory can be used to automate time series model selection and streamline the manual process of validation and testing. By creating consecutive, temporally independent holdout sets, performance metrics for each model’s predictions on each holdout set are fed into a decision function that selects an unbiased model. The decision function aims to minimize each model’s poorest performance across all holdout sets, counteracting the possibility of choosing a model that overfits or underfits the holdout sets. Not only does this process improve forecast accuracy, it also reduces computation time by requiring only a fixed number of proposed forecasting models.
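A minimal sketch, assuming mean absolute error as the performance metric and a minimax rule as the decision function (pick the model whose worst holdout error is smallest), which is one reading of minimizing the poorest performance. Each model forecasts every holdout window using only the data before it. The two forecasters are placeholders for real candidate models, and the holdout sizes are arbitrary.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float], int], List[float]]


def consecutive_holdouts(n: int, holdout_len: int, num_holdouts: int) -> List[Tuple[int, int]]:
    """(train_end, test_end) pairs for back-to-back, non-overlapping windows ending at n."""
    first = n - holdout_len * num_holdouts
    if first <= 0:
        raise ValueError("series too short for the requested holdouts")
    return [(first + i * holdout_len, first + (i + 1) * holdout_len) for i in range(num_holdouts)]


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def minimax_select(series, models: Dict[str, Forecaster], holdout_len: int, num_holdouts: int):
    """Return the model whose worst holdout MAE is smallest, plus every model's worst MAE."""
    worst: Dict[str, float] = {}
    for name, forecast in models.items():
        errors = []
        for train_end, test_end in consecutive_holdouts(len(series), holdout_len, num_holdouts):
            predicted = forecast(series[:train_end], test_end - train_end)
            errors.append(mae(series[train_end:test_end], predicted))
        worst[name] = max(errors)  # the model's poorest performance
    return min(worst, key=worst.get), worst


def naive_last(history: Sequence[float], horizon: int) -> List[float]:
    """Placeholder forecaster: repeat the last observed value."""
    return [history[-1]] * horizon


def naive_mean(history: Sequence[float], horizon: int) -> List[float]:
    """Placeholder forecaster: repeat the historical mean."""
    return [sum(history) / len(history)] * horizon


if __name__ == "__main__":
    series = [float(t) for t in range(20)]  # toy upward trend
    best, worst = minimax_select(series, {"last": naive_last, "mean": naive_mean}, 2, 3)
    print(best, worst)
```

Only the fixed set of candidate models is fit per holdout window, which is where the computational saving over a full rolling cross-validation comes from.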

Qiaoling Liu, Lead Data Scientist, CareerBuilder

Qiaoling Liu is a lead data scientist on CareerBuilder’s Information Extraction and Retrieval team in the Data Science R&D group. Her team owns the Company Name Normalization, School Name Normalization, Skill Identification and Normalization, and Recruitment Edge Signals projects at CareerBuilder. Her research interests include information retrieval, text mining, and the semantic web. She received a Ph.D. in Computer Science and Informatics from Emory University and a B.S. in Computer Science and Technology from Shanghai Jiao Tong University in China. During her PhD studies, she was a student recipient of the Yahoo! Faculty Research and Engagement Program (FREP) Award in 2011, 2012, and 2013.

Abstract summary

CompanyDepot: Employer Name Normalization in the Online Recruitment Industry
In the recruitment domain, employer name normalization, the task of linking employer names in job postings or resumes to entities in an employer knowledge base (KB), is important to many business applications. It has several unique challenges: handling employer names from both job postings and resumes, leveraging the corresponding location and URL context, and handling name variations, irrelevant input data, and noise in the KB. In this talk, we present a system called CompanyDepot, which uses machine learning techniques to address these challenges. Across multiple real-world datasets, the system achieves 2.5% to 21.4% higher coverage at the same precision level than a legacy system used at CareerBuilder. After applying it to several applications at CareerBuilder, we faced a new challenge: how to avoid duplicate normalization results when the KB is noisy and contains many duplicate entities. To address this challenge, we extended CompanyDepot to normalize employer names not only at the entity level but also at the cluster level, by mapping a query to the cluster in the KB that best matches it. The extended system performs efficient graph-based clustering based on external knowledge from five mapping sources. We also propose a new metric, based on success rate and diversity reduction ratio, for evaluating cluster-level normalization. Through experiments and applications, we demonstrate a large improvement in normalization quality from entity-level to cluster-level normalization.
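CompanyDepot itself uses machine learning with location and URL context, which this sketch does not attempt. It only illustrates the difference between the two levels of output: a rule-based name normalizer with exact lookup (a hypothetical baseline, not CompanyDepot’s matcher) returns a KB entity, and a connected-components pass over known duplicate pairs lifts that entity to a cluster, so duplicate entities collapse to a single result. The suffix list, KB contents, and duplicate pairs are invented for illustration.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical list of legal suffixes to strip from the end of a name.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "corp", "corporation", "co", "company"}


def normalize_name(raw: str) -> str:
    """Lowercase, spell out '&', keep alphanumeric tokens, and drop trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower().replace("&", " and "))
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def build_index(kb: Dict[str, str]) -> Dict[str, str]:
    """Map normalized name -> entity id; colliding duplicates keep the last id seen."""
    return {normalize_name(name): entity_id for entity_id, name in kb.items()}


def cluster_entities(entity_ids: Iterable[str], duplicate_pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each entity to a cluster id: the smallest id in its connected component."""
    ids = sorted(entity_ids)
    adjacency: Dict[str, List[str]] = {e: [] for e in ids}
    for a, b in duplicate_pairs:
        adjacency[a].append(b)
        adjacency[b].append(a)
    cluster: Dict[str, str] = {}
    for start in ids:  # sorted order makes the first node visited the smallest id
        if start in cluster:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster[node] = start
            stack.extend(n for n in adjacency[node] if n not in cluster)
    return cluster


def link_to_cluster(raw: str, index: Dict[str, str], cluster: Dict[str, str]) -> Optional[str]:
    """Exact match on the normalized name, then lift the matched entity to its cluster."""
    entity = index.get(normalize_name(raw))
    return cluster.get(entity) if entity is not None else None


if __name__ == "__main__":
    kb = {"E1": "Acme Corporation", "E2": "ACME Corp.", "E3": "Globex LLC"}
    cluster = cluster_entities(kb, [("E1", "E2")])
    index = build_index(kb)
    print(link_to_cluster("Acme, Inc.", index, cluster))
```

Here “Acme Corporation” and “ACME Corp.” are duplicate entities; whichever one the lookup happens to hit, the cluster-level answer is the same id.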