MLconf Atlanta

Friday, September 19, 2014, from 8:00 AM to 6:00 PM (EDT)

Join us on September 19th for an exciting event: the first Atlanta-based MLconf, held at the Academy of Medicine, a block away from Georgia Tech. This year MLconf will focus on ML platforms, tools and algorithms. We'll host a speaker from Facebook who will give us an overview of how machine learning shapes the biggest social network. We are very happy to have speakers from emerging platforms: Revolution Analytics, one of the top players in industrial-strength parallel R; Skytree, with its super fast and scalable machine learning server; Context Relevant, with its scalable platform that finds structure in data in record time; and 0xdata, which will present its open source platform and new deep learning toolbox. Come and find out more about Systap’s graph analytics and machine learning platform on GPUs. Finally, if you play in the traditional RDBMS field, Oracle will tell you how to do machine learning, and if you are in the NoSQL arena, Cloudera will present the current trends. Professor Manos Antonakakis will draw on his experience in Internet security to explain why machine learning alone cannot solve problems without domain expertise. Professor Amy Langville, the guru of ranking and co-author of “Who’s #1?” and “Google’s PageRank and Beyond”, will tell us how to rank all kinds of data, from sports teams to movies. If you want to find out how Netflix and Meetup do their recommendations, you will be surprised by the differences in the constraints and data availability they face.


Friday at 8:00 AM

Breakfast and Registration



Friday at 9:00 AM


Ewa Dominowska - Engineering Manager, Facebook

Ewa Dominowska joined Facebook in spring 2014 as an Engineering Manager focused on Science and Metrics for Online Advertising. Before coming to Facebook, she designed a large-scale predictive analytics platform for mobile devices as Chief Architect at Medio Systems (acquired by Nokia). Prior to her start-up days, Ewa spent 10 years in various roles at Microsoft, where she joined the Online Services Division to help found adCenter, the second-largest online advertising platform in the US. Her work focused on real-time ad ranking, targeting, content analysis, click prediction, and pricing models. As part of the small yet dynamic original team, Ewa designed, architected, and built the alpha version of the contextual advertising product. In 2007, Ewa founded the Open Platform Research and Development team. As part of this effort, she organized the Beyond Search academic program, the TROA WWW Workshop, and the IRA SIGIR Workshop, resulting in a number of very successful collaborations between academia and industry. During her tenure in the Online Services Division, Ewa spent a year serving as technical assistant (TA) to Satya Nadella, advising and assisting in operations and planning for the division. The role encompassed architecture, technology, large-scale data services, and cross-organizational efficiency. Ewa was responsible for the intellectual property process, long-term strategy, and prioritization for the division. In 2010, Ewa started the adCenter Marketplace team, responsible for all aspects of advertising marketplace health and tuning. She architected and built a petabyte-scale distributed data and analytics platform and created a suite of marketplace and experimentation tools. Ewa earned her degrees in Electrical Engineering/Computer Science and Mathematics from MIT. Her research focused on machine learning, natural language processing, and predictive, context-aware systems applied in the medical field.
Ewa has authored several papers and dozens of patents in the areas of online advertising, search, pricing models, predictive algorithms and user interaction.



Friday at 9:45 AM

Evan Estola - Data Scientist, Meetup

Abstract: Beyond Collaborative Filtering: using Machine Learning to power recommendations at Meetup
Collaborative filtering and other common recommendation algorithms are powerful techniques in some scenarios. I will cover how to design a recommendation system from the ground up, using an ensemble classifier and supervised learning to avoid some of the pitfalls of collaborative filtering. From sampling to deployment, we’ve had to invent our approach with few non-academic, non-toy examples to follow. At Meetup we’re all about sharing information and empowering communities, so I’ll present the details of our model as well as some of the new features we are still developing.

Evan is a Machine Learning Engineer at Meetup, where he is responsible for building intelligent systems that directly affect the user experience. Evan owns the recommendation engine at Meetup from data collection to production. Previously, Evan was on the Machine Learning Team at Orbitz Worldwide and he got his start in the Information Retrieval Lab at the Illinois Institute of Technology.



Friday at 10:10 AM

Amy Langville - Associate Professor of Mathematics, The College of Charleston in South Carolina

My talk will cover four ranking and clustering projects that I consulted on this past year. The projects range from ranking Olympic athletes, mixed martial arts fighters, and cell phone carriers to clustering sentences to rank individuals by how much humility they evidence in their written language. For each project, I will address the particular data challenges and the solutions and techniques we proposed.
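The abstract does not say which ranking methods these projects used. As a rough illustration of rating competitors from head-to-head results (as with mixed martial arts fighters), here is a minimal sketch of Colley's method, one of the rating methods covered in Langville's book “Who’s #1?”. It assumes each game or fight is recorded as a hypothetical (winner, loser) pair with no ties, and it solves the Colley system C r = b with plain Gaussian elimination rather than a linear algebra library.

```python
from typing import Dict, List, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def colley_ratings(teams: List[str], games: List[Tuple[str, str]]) -> Dict[str, float]:
    """Colley ratings from (winner, loser) pairs (hypothetical input format)."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    # C starts as 2I and b as a vector of ones.
    c = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser in games:
        wi, li = idx[winner], idx[loser]
        c[wi][wi] += 1.0  # diagonal: 2 + games played
        c[li][li] += 1.0
        c[wi][li] -= 1.0  # off-diagonal: minus the number of meetings
        c[li][wi] -= 1.0
        b[wi] += 0.5      # b_i = 1 + (wins_i - losses_i) / 2
        b[li] -= 0.5
    r = _solve(c, b)
    return {t: r[idx[t]] for t in teams}


ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("B", "C"), ("A", "C")])
print(ratings)  # approximately A: 0.7, B: 0.5, C: 0.3
```

Because every column of the Colley matrix sums to 2, the ratings always average 1/2, which keeps them on a comparable scale across datasets of different sizes.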

Amy is an Associate Professor of Mathematics at The College of Charleston in South Carolina, where she regularly teaches graduate courses in Operations Research and Optimization and undergraduate courses in calculus and linear algebra. Her research focuses on ranking and clustering. She also enjoys solving applied mathematics problems from industry and has consulted with a variety of companies, from large search engines and software companies to small start-ups and law firms engaged in patent infringement cases. Amy studied Operations Research for her PhD and web information retrieval for her postdoctoral work at N.C. State University. When the surf’s up, Amy’s riding it. When it’s not, she’s training jiu-jitsu, peppering a volleyball, or biking around Folly Beach.



Coffee break provided by Insightpool, 10:35 AM - 10:50 AM



Friday at 10:50 AM


Elizabeth Elhassani - Director of Marketing Analytics and Insights, LexisNexis

Elizabeth Elhassani joined LexisNexis Risk Solutions as Director, Marketing Analytics & Insights in January 2012. In this newly created position within Marketing, Elizabeth is responsible for leading the design and implementation of short- and long-term analytic strategies to benefit all of the company's businesses. This includes targeting and segmenting client and prospect databases for effective demand generation, as well as working closely with Marketing and Sales colleagues to track, analyze and report the results of all customer-facing initiatives, both online and offline. An experienced marketing professional, Elizabeth brings more than 10 years of B2B and B2C analytics marketing experience, with emphasis on designing statistical models, CRM strategies, segmentation schemes and cost-benefit analyses. She was Associate Director for dunnhumby USA, where she was responsible for scoping, pricing and designing consumer analytic insight projects for 10+ key consumer packaged goods clients, using many statistical methodologies to study customer behavior, including linear and nonlinear regression, CART/CHAID, ANOVA and cluster analysis. Prior to her work at dunnhumby, she was a Statistical Project Director for ChoicePoint Precision Marketing, where she was responsible for consulting on and directing marketing analytics and acquisition modeling projects for external clients. In addition to her analytics expertise, she also brings an understanding of the industry from previous experience at Experian and Advanta Bank Business Cards.



Friday at 11:15 AM

Parikshit Ram - Senior Machine Learning Scientist, Skytree

Abstract: Max-kernel search: How to search for just about anything?

Nearest neighbor search is a well-studied and widely used task in computer science and is quite pervasive in everyday applications. While search is not synonymous with learning, search is a crucial tool for the most nonparametric form of learning. Nearest neighbor search can be used directly for all kinds of learning tasks: classification, regression, density estimation, and outlier detection. Search is also the computational bottleneck in various other learning tasks, such as clustering and dimensionality reduction. Key to nearest neighbor search is the notion of "near"-ness, or similarity. Mercer kernels form a class of general nonlinear similarity functions and are widely used in machine learning. They can define a notion of similarity between pairs of objects of any arbitrary type and have been successfully applied to a wide variety of object types: fixed-length data, images, text, time series, and graphs. I will present a provably efficient technique for nearest neighbor search with this class of similarity functions, enabling faster learning on larger data.
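To make the problem concrete, here is the naive linear-scan baseline for max-kernel search: return the reference object whose kernel value with the query is largest. This is not the provably efficient technique from the talk, whose details the abstract does not give; it only defines what that technique must compute faster. The two kernels and their parameters (bandwidth, degree, c) are illustrative assumptions.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """exp(-||x - y||^2 / (2 h^2)); larger values mean more similar."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def polynomial_kernel(x: Vector, y: Vector, degree: int = 2, c: float = 1.0) -> float:
    """(<x, y> + c)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


def max_kernel_search(query: Vector, references: Sequence[Vector],
                      kernel: Kernel) -> Tuple[int, float]:
    """Linear scan: (index, value) of the reference maximizing kernel(query, p)."""
    best_idx, best_val = -1, -math.inf
    for i, p in enumerate(references):
        val = kernel(query, p)
        if val > best_val:
            best_idx, best_val = i, val
    return best_idx, best_val


refs = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(max_kernel_search((0.9, 1.2), refs, gaussian_kernel)[0])  # 2
print(max_kernel_search((1.0, 0.0), refs, polynomial_kernel))   # (1, 16.0)
```

With a Gaussian kernel, max-kernel search coincides with ordinary Euclidean nearest neighbor search, so the first query returns the closest point (1, 1). The polynomial kernel instead rewards a large inner product, so the second query picks the distant point (3, 4). Either way the scan costs one kernel evaluation per reference object per query, and that per-query cost is what a faster search method has to cut.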

Parikshit Ram is a member of the technical staff at the machine learning startup Skytree, where he develops enterprise-grade machine learning algorithms. Prior to this, Pari completed his doctorate in Computer Science at Georgia Tech in the School of Computational Science and Engineering, where he was a member of the FASTlab and focused on developing fundamental algorithms and statistical tools for machine learning and data mining. Pari joined Georgia Tech in 2007 after completing his BS and MS in Mathematics and Computing in the Department of Mathematics at the Indian Institute of Technology, Kharagpur, India. Pari has also contributed to the open source machine learning library MLPACK.



Friday at 11:40 AM


Sri Ambati - CEO, 0xdata

Sri is co-founder and CEO of 0xdata (@hexadata), the builders of H2O. H2O democratizes big data science and makes Hadoop do math for better predictions. Before 0xdata, Sri spent time scaling R over big data with researchers at Purdue and Stanford. Prior to that, Sri co-founded Platfora and was the Director of Engineering at DataStax. Before that, Sri was Partner & Performance Engineer at the Java multi-core startup Azul Systems, tinkering with the entire ecosystem of enterprise apps at scale.

Before that, Sri was on sabbatical pursuing Theoretical Neuroscience at Berkeley. Prior to that, Sri worked on a NoSQL trie-based index for semi-structured data at the in-memory index startup RightOrder. Sri is known for his knack for envisioning killer apps in fast-evolving spaces and assembling stellar teams to productize that vision. Sri is a regular speaker on the Big Data, NoSQL and Java circuit.



Friday at 12:05 PM

Sandy Ryza - Software Engineer, Cloudera

Abstract: Unsupervised Learning on Huge Data with Apache Spark
Unsupervised learning refers to a class of algorithms that try to find structure in unlabeled data. Spark's MLlib module contains implementations of several unsupervised learning algorithms that scale to large datasets. In this talk, we'll discuss how to use and implement large-scale machine learning algorithms with the Spark programming model, diving into MLlib's K-means clustering and Principal Component Analysis (PCA).
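As a hedged, single-machine illustration of the K-means clustering mentioned above, here is Lloyd's algorithm in plain Python. It does not use Spark or MLlib's API; the comments only point out, as a general observation rather than a description of MLlib's internals, which step parallelizes as an independent map over points and which as a reduce over per-cluster sums. Random initialization from the data, the iteration cap and the seed are illustrative choices.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, max_iters: int = 20, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        # Distributed, this is an independent map over the points.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its assigned points.
        # Distributed, this is a reduce over per-cluster sums and counts.
        new_centers: List[Point] = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    return centers


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(kmeans(data, k=2)))  # centers near (0.33, 0.33) and (10.33, 10.33)
```

Lloyd's algorithm only finds a local optimum that depends on the initial centers; on these two well-separated groups every choice of two starting points converges to the same pair of centers.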



LUNCH 12:30 PM - 1:00 PM



Friday at 1:00 PM




Friday at 2:00 PM


Justin Basilico - Senior Researcher/Engineer in Recommendation Systems, Netflix

Justin Basilico is a Research/Engineering manager for Page Algorithms Engineering at Netflix. He leads an applied research team focused on developing the next generation of algorithms used to generate the Netflix homepage through machine learning, ranking, recommendation, and large-scale software engineering. Prior to Netflix, he worked on machine learning in the Cognitive Systems group at Sandia National Laboratories. He is also the co-creator of the Cognitive Foundry, an open-source software library for building machine learning algorithms and applications.



Friday at 2:25 PM

Tao Ye - Senior Scientist, Pandora

Tao Ye has been a Senior Scientist on the Pandora playlist team since 2010, working on research-driven system building for recommendation systems, measurements and user modeling. She has 15 years of experience in the software industry, holding research scientist and lead engineer positions in social media, networking and mobile systems. She holds 11 granted patents and has published 12 peer-reviewed papers. She received a master's degree in Computer Science from UC Berkeley and dual bachelor's degrees in Computer Science and Engineering Chemistry from the State University of New York at Stony Brook.



Friday at 2:50 PM

Xia Zhu - Intel




Friday at 3:15 PM

Jacob Mundt - Chief Technology Officer, eBrevia

Jacob Mundt is the CTO at legal tech startup eBrevia, applying information extraction and summarization to the text of legal documents and contracts. eBrevia provides software tools that help attorneys speed their review of legal documents while increasing accuracy. Previously, Jacob researched summarization, machine translation, and information extraction under Kathleen McKeown at Columbia University, and led the Research and Development team at Outcome Sciences (acquired by Quintiles) to improve patient health outcomes through the collection of clinical data from hundreds of hospitals. He holds a Bachelor of Science from Rice University and a Master of Science from Columbia.



Coffee break provided by Insightpool (book giveaways), 3:40 PM - 4:10 PM



Friday at 4:10 PM


Manos Antonakakis - Assistant Professor of Computer Systems and Software, Georgia Tech

Abstract: So, you think you can model Internet abuse with machine learning?

Abuse on the Internet is an everyday problem. Illicit actors victimize people, causing a variety of significant problems, from the loss of private information to the use of victims' resources in other criminal activities. The common denominator behind Internet abuse is a network of infected machines (a.k.a. a botnet) under the control of a criminal entity (a.k.a. the botmaster). Needless to say, detecting such "botnet communications" is at the heart of the security problem that a large organization faces every day. Detection based on static methods is doomed to fail, simply because it will always lag behind the threat. Thus, the community is in great need of scalable abuse detection solutions.

Unsurprisingly, such newly proposed solutions are often based on machine learning. In this talk I will argue that, operationally, a fancy machine learning algorithm (and the pretty graph pictures derived from it) will simply not "cut it". This is especially true when the problem you are trying to solve is not your company's marketing problem but the security problem your network and security operations center faces every day. Domain knowledge and constant counterintelligence on malicious actors are fundamental to crafting generic detection and attribution solutions that can keep up with constantly changing malicious methodologies while minimizing false and missed detections.

Manos Antonakakis received his engineering diploma in 2004 from the University of the Aegean, Department of Information and Communication Systems Engineering. From November 2004 to July 2006, he worked as a guest researcher at the National Institute of Standards and Technology (NIST-DoC), in the area of wireless ad hoc network security, in the Computer Security Division. Before joining the ECE faculty, Dr. Antonakakis was chief scientist at Damballa, where he was responsible for advanced research projects, university collaborations, and technology transfer efforts. He currently serves as co-chair of the Academic Committee for the Messaging Anti-Abuse Working Group (MAAWG). In May 2012, he received his Ph.D. in computer science from the Georgia Institute of Technology under Wenke Lee's supervision. In his free time, he enjoys watching and playing soccer.



Friday at 4:35 PM


Danai Koutra - CMU/Technicolor Researcher, Carnegie Mellon University

Networks naturally capture a host of real-world interactions, ranging from friendships to brain activity. But given a massive graph, like the Facebook social graph, what can be said about its structure? Which are its most important structures? How does it compare to other networks, like Twitter? This talk will focus on my work developing scalable algorithms and models that help us make sense of large graphs via pattern discovery and similarity analysis.

I will begin by presenting VoG, an approach that efficiently summarizes large graphs by finding their most interesting and semantically meaningful structures. Starting from a clutter of millions of nodes and edges, such as the Enron who-mails-whom graph, our Minimum Description Length (MDL)-based algorithm disentangles the complex graph connectivity and spotlights the structures that ‘best’ describe the graph.

Then, for similarity analysis at the graph level, I will introduce the problems of graph comparison and graph alignment. I will conclude by showing how to apply my methods to temporal anomaly detection, brain graph clustering, deanonymization of bipartite (e.g., user-group membership) and unipartite graphs, and more.

Danai Koutra is a final-year Ph.D. candidate in the Computer Science Department at Carnegie Mellon University. Her research interests include large-scale graph mining, graph similarity and matching, graph summarization, and anomaly detection. Danai's research has been applied mainly to social, collaboration and web networks, as well as brain connectivity graphs. She holds one “rate-1” patent and has six pending patents on bipartite graph alignment. Danai has multiple papers in top data mining conferences, including 2 award-winning papers, and her work has been covered by the popular press, such as MIT Technology Review. She has also worked at IBM Hawthorne, Microsoft Research Redmond, and Technicolor Palo Alto/Los Altos. She earned her M.S. in Computer Science from CMU in 2013 and her diploma in ECE from the National Technical University of Athens in 2010.



Friday at 5:10 PM


Hassan Chafi - Research Manager, Oracle Labs

Abstract: PGX: An In-Memory, Parallel Graph Analytic and Query Engine

Brief Description:
In-memory (and distributed) graph analytic engine that is tightly coupled with a relational database.

We present a graph processing system in which a graph database is tightly integrated with a graph analytic engine. Our graph database, built on existing NoSQL and relational databases, provides scalable management of graph data for transactional workloads. Our graph analytic engine, on the other hand, enables rapid execution of analytic workloads. We first introduce PGX, our in-memory graph analytic engine, which initially loads the graph data from the database and periodically synchronizes with it afterward. PGX's parallel execution engine is very efficient: for example, it counts triangles in billion-edge graphs in 2 minutes. Users can also submit custom graph algorithms written in a domain-specific language (DSL), which PGX automatically parallelizes for execution. We then introduce PGX.DIST, our distributed graph analytic engine, and show that it is up to orders of magnitude faster than the state-of-the-art graph analytic engine. The DSL compiler lets the same algorithm run transparently on both PGX and PGX.DIST.
* Graph database tightly integrated with graph analytic engine
* Fast, parallel in-memory graph analytic engine
* Distributed graph analytic engine
* Use of a Domain-Specific Language for graph analytics
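The abstract quotes triangle counting as its performance benchmark. PGX's own implementation and DSL are not shown, so the following is only a single-threaded Python sketch of a standard triangle-counting approach (an ordered node iterator): each triangle u < v < w is counted exactly once, at edge (u, v), by intersecting the two endpoints' neighbor sets. Vertex IDs are assumed to be integers. A parallel engine would typically split the outer loop over vertices across threads; that is a general observation, not a description of PGX.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_triangles(edges: Iterable[Tuple[int, int]]) -> int:
    """Count triangles in an undirected graph given as an edge list."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; sets also absorb duplicate edges
            adj[u].add(v)
            adj[v].add(u)
    count = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:
                # every common neighbor w > v closes exactly one triangle u < v < w
                count += sum(1 for w in nbrs & adj[v] if w > v)
    return count


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(count_triangles(k4), count_triangles(square))  # 4 0
```

Ordering the vertices is what keeps each triangle from being counted six times (once per ordered pair of its edges' endpoints), so no final division is needed.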



Friday at 5:35 PM




Friday at 5:55 PM

Thank Yous, Book Giveaways, and Wrap-Up
