Enjoy a full-day conference this September at The Academy, a tribute to the South's reputation for charm and elegance. During coffee breaks and meals, take in the old-world style of days gone by in the heart of bustling Midtown Atlanta.

Speakers

Hussein Mehanna, Engineering Director – Core ML, Facebook

I am the Director of the Core Machine Learning group at Facebook. Our team focuses on building state-of-the-art ML/AI platforms, combined with applied research in event prediction and text understanding. We work closely with product teams in Ads, Feed, Search, Instagram, and others to improve their user experiences.

In 2012, I joined Facebook as the original developer on the Ads ML platform, which quickly grew into a Facebook-wide platform serving more than 30 teams. Prior to Facebook, I worked at Microsoft on search query alterations and suggestions in Bing and on communication technologies in Lync. I hold a master's degree in Speech Recognition from the University of Cambridge, UK, where I worked on noise robustness modeling.

Abstract summary

Applying Deep Learning at Facebook Scale: Facebook leverages deep learning at very large scale for applications including event prediction, machine translation, natural language understanding, and computer vision. More than a billion users log on to Facebook every day, generating thousands of posts per second and uploading more than a billion images and videos daily. This talk will explain how Facebook scaled deep learning inference for real-time applications with latency budgets in the milliseconds.
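
As a rough illustration of one common tactic for serving models under millisecond latency budgets, here is a minimal micro-batching sketch in Python. It is an editorial illustration only, not Facebook's implementation; the model, batch size, and budget are placeholder assumptions.

```python
import time
import numpy as np

# Placeholder "model": a batched matrix multiply standing in for a network's
# forward pass. Batching amortizes per-request overhead on real hardware.
W = np.random.rand(128, 10)

def predict_batch(batch):
    return np.asarray(batch) @ W

def serve(request_stream, max_batch=32, budget_ms=5.0):
    """Group requests into batches, flushing when full or past the deadline."""
    pending, deadline = [], None
    for features in request_stream:  # in a real server, a concurrent queue
        pending.append(features)
        if deadline is None:
            deadline = time.monotonic() + budget_ms / 1000.0
        if len(pending) >= max_batch or time.monotonic() >= deadline:
            yield predict_batch(pending)
            pending, deadline = [], None
    if pending:
        yield predict_batch(pending)

requests = (np.random.rand(128) for _ in range(100))
for predictions in serve(requests):
    pass  # each row of predictions goes back to its caller
```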

View the slides for this presentation »

Terri Larsen, Founder & Director, ScientificLiteracy.org

Throughout her academic training, Dr. Larsen perceived a conspicuous void in guidelines on how we present our data and results, and she has dedicated her career to thinking about how to do this as well as possible. She respects Edward Tufte's perspective on data visualization but finds his critiques frustrating because they tell us what's wrong rather than how to do it right. So she has created a curriculum that teaches the process of going from raw data to a clear visual message.

Abstract summary

Achieving Better Visual Communication – Best Practices in Data Visualization: Visual messages make the most impact when conveying complex information and its significance. Accurate, clear, and compelling data visualization has never been more important. To help you compose powerful visual messages conveying complex information, this talk will give you:

  • Simple tips to make your data visualizations clearer (one such tip is sketched in code below)
  • A powerful approach to enhance your message so that your audience grasps its significance

Make your data visualizations much clearer and more intuitive by learning to take advantage of the visual vocabulary your audience already knows. Hone the message for maximum impact by applying common-sense rules of visual “grammar” to make complex information readily understandable. You will come away from this talk with a fresh perspective on composing images through a lens of visual literacy:

  • the grammar and vocabulary of visual communication that can transform your data visualizations
  • the most common mistakes and how to correct them
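
As a concrete taste of one such “simple tip” (an editorial sketch with made-up data, not material from the talk): direct labels and less non-data ink often read more clearly than legends and gridlines.

```python
import matplotlib.pyplot as plt

# Hypothetical survey results.
labels = ["Never", "Rarely", "Sometimes", "Often", "Always"]
values = [4, 11, 27, 38, 20]

fig, ax = plt.subplots()
ax.barh(labels, values, color="steelblue")
for spine in ("top", "right"):        # remove non-data ink
    ax.spines[spine].set_visible(False)
for label, v in zip(labels, values):  # direct labels replace a legend
    ax.text(v + 0.5, label, str(v), va="center")
ax.set_xlabel("Respondents (%)")
plt.show()
```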

View the slides for this presentation »

Josh Patterson, Advisor, Skymind.io – Deep learning for Industry

Josh Patterson is currently Director of Field Engineering for Skymind. Previously, Josh worked as a Principal Solutions Architect at Cloudera and as a machine learning / distributed systems engineer at the Tennessee Valley Authority, where he brought Hadoop into the smart grid with the openPDC project. Josh has a master's degree in Computer Science from the University of Tennessee at Chattanooga, where he published research on mesh networks (TinyOS) and social insect optimization algorithms. Josh has over 18 years in software development and is very active in the open source space, contributing to projects such as DL4J, Apache Mahout, Metronome, IterativeReduce, openPDC, and JMotif.

Abstract summary

DL4J and DataVec for Enterprise Deep Learning Workflows: Applications in NLP, sensor processing (IoT), image processing, and audio processing have all emerged as prime deep learning applications. In this session we will take a practical look at building secure deep learning workflows in the enterprise. We'll see how DL4J's DataVec tool enables scalable ETL and vectorization pipelines to be created for a single machine or scaled out to Spark on Hadoop. We'll also see how deep networks such as recurrent neural networks are able to leverage DataVec to more quickly process data for modeling.
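
DataVec itself is a JVM library, so the snippet below is only a schematic Python/pandas sketch of the records-to-vectors step such a pipeline performs. The columns and data are invented; none of this is DataVec's actual API.

```python
import pandas as pd

# Tiny stand-in dataset; a DataVec pipeline would read this with a RecordReader.
raw = pd.DataFrame({
    "age":     [34, 51, 28],
    "country": ["US", "DE", "US"],
    "clicked": [1, 0, 1],
})

# Schema-style transforms: one-hot encode categoricals, normalize numerics.
features = pd.get_dummies(raw[["age", "country"]], columns=["country"])
features["age"] = (features["age"] - features["age"].mean()) / features["age"].std()

X = features.to_numpy(dtype="float32")  # vectors ready for a neural network
y = raw["clicked"].to_numpy()
```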

View the slides for this presentation »

Tanvi Motwani, Lead Data Scientist, Guided Search at A9.com

Tanvi Motwani currently leads the data science efforts within the Guidance team at Amazon Search. Tanvi has been working with Amazon's search engine technology for the past three years, and her primary focus has been understanding users' queries to improve the ranking of products on Amazon. Prior to that, she worked at Adchemy (later acquired by Walmart Labs) and developed its keyword recommendation system for online advertisers. She completed her master's in Computer Science at The University of Texas at Austin, publishing papers with Prof. Raymond Mooney on problems integrating NLP and computer vision (NIPS 2011 and ECAI 2012). During her master's, she worked at Facebook to improve the ranking of Facebook app stories on News Feed. She has been developing scalable, end-to-end machine learning systems for search and ads technology for the past 8 years.

Abstract summary

E-commerce Query Tagging System Using Unsupervised Training Methods: Amazon is one of the world's largest e-commerce sites, and Amazon Search powers the majority of Amazon's sales. A key component of Amazon Search is the query understanding pipeline, which extracts the semantic information used to precisely display products for billions of queries every day. In this talk, we will go through the primary building blocks of the query understanding pipeline.
Amazon Search enables users to search against structured products, so it is necessary to extract information from queries in a format consistent with the structured information about the products. Query tagging is the task of semantically annotating query terms with pre-defined labels (such as brand, product-type, and color). We propose a scalable system to train large-scale machine learning algorithms to solve this problem. Our system improved precision over the baseline, a dictionary-lookup-based tagger, by 10% and approximately doubled recall.
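
For concreteness, the baseline mentioned above can be pictured as a simple dictionary lookup over query terms. The sketch below is a toy illustration with an invented lexicon, not A9's production system.

```python
# Toy lexicon; a production system would mine these entries from catalog data.
LEXICON = {
    "nike": "brand", "sony": "brand",
    "red": "color", "blue": "color",
    "shoes": "product-type", "headphones": "product-type",
}

def tag_query(query):
    """Baseline dictionary-lookup tagger: one label per known term."""
    return [(term, LEXICON.get(term, "other")) for term in query.lower().split()]

print(tag_query("Red Nike shoes"))
# [('red', 'color'), ('nike', 'brand'), ('shoes', 'product-type')]
```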

View the slides for this presentation »

Le Song, Assistant Professor, College of Computing, Georgia Institute of Technology

Le Song is an assistant professor in the College of Computing, Georgia Institute of Technology. He received his Ph.D. in Machine Learning from the University of Sydney and NICTA in 2008, and then conducted his post-doctoral research in the Department of Machine Learning at Carnegie Mellon University between 2008 and 2011. Before he joined Georgia Institute of Technology, he was a research scientist at Google. His principal research direction is machine learning, especially nonlinear methods and probabilistic graphical models for large-scale and complex problems arising from artificial intelligence, social network analysis, healthcare analytics, and other interdisciplinary domains. He is the recipient of the NSF CAREER Award ('14), the AISTATS '16 Best Student Paper Award, the IPDPS '15 Best Paper Award, the NIPS '13 Outstanding Paper Award, and the ICML '10 Best Paper Award. He has also served as an area chair for leading machine learning conferences such as ICML, NIPS, and AISTATS, and as an action editor for JMLR.

Abstract summary

Understanding Deep Learning for Big Data: The complexity and scale of big data impose tremendous challenges for analysis. Yet big data also offer us great opportunities: some nonlinear phenomena, features, or relations which are not clear or cannot be inferred reliably from small and medium data now become clear and can be learned robustly from big data. Typically, the form of the nonlinearity is unknown to us and needs to be learned from data as well. Being able to harness the nonlinear structures in big data could allow us to tackle problems that were previously impossible and obtain results far better than the previous state of the art.

Nowadays, deep neural networks are the methods of choice for large-scale nonlinear learning problems. What makes deep neural networks work? Is there any general principle for tackling high-dimensional nonlinear problems that we can learn from deep neural networks? Can we design competitive or better alternatives based on such knowledge? To make progress on these questions, my machine learning group performed both theoretical and experimental analysis on existing and new deep learning architectures, investigating three crucial aspects: the usefulness of the fully connected layers, the advantage of the feature learning process, and the importance of the compositional structures. Our results point to some promising directions for future research and provide guidelines for building new deep learning models.

View the slides for this presentation »

Michael Galvin, Sr. Data Scientist, Metis

Michael comes to Metis from General Electric where he worked to establish the company’s data science strategy and capabilities for field services and to build solutions supporting global operations, risk, engineering, sales, and marketing. He also taught data science and machine learning for General Assembly. Prior to GE, Michael spent several years as a data scientist working on problems in credit modeling at Kabbage and corporate travel and procurement at TRX. Michael holds a Bachelor’s degree in Mathematics and a Master’s degree in Computational Science and Engineering from the Georgia Institute of Technology where he also spent 3 years working on machine learning research problems related to computational biology and bioinformatics. Additionally, Michael spent 12 years in the United States Marine Corps where he held various leadership roles within aviation, logistics, and training units. In his spare time, he enjoys running, traveling, and reading.

Abstract summary

Machine Learning in Business: Data science has been one of the fastest-growing jobs of the past 10 years, and companies are rapidly integrating it into their businesses. In this talk I will discuss the practical skills and techniques needed to successfully integrate data science into a business, as well as some struggles and pitfalls that commonly occur.

View the slides for this presentation »

Erin LeDell, Machine Learning Scientist, H2O.ai

Erin is a Statistician and Machine Learning Scientist at H2O.ai. Before joining H2O, she was the Principal Data Scientist at Wise.io and Marvin Mobile Security (acquired by Veracode in 2012) and the founder of DataScientific, Inc.

Erin received her Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from UC Berkeley. Her research focuses on ensemble machine learning, learning from imbalanced binary-outcome data, influence curve based variance estimation and statistical computing. She also holds a B.S. and M.A. in Mathematics.

Abstract summary

Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches: Multi-algorithm ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as they require training and cross-validating multiple base learning algorithms.
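
To make the metalearning step concrete, here is a minimal generic stacking sketch in Python with scikit-learn. It illustrates the Super Learner idea only; it is not H2O Ensemble, and the particular base learners and metalearner are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_breast_cancer(return_X_y=True)
base_learners = [RandomForestClassifier(n_estimators=100, random_state=0),
                 GradientBoostingClassifier(random_state=0)]

# Level-one data: out-of-fold (cross-validated) predictions from each base
# learner, which is exactly the computational cost the abstract refers to.
Z = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_learners
])

# The metalearner combines the base learners into one prediction function.
meta = LogisticRegression().fit(Z, y)
for m in base_learners:  # refit base learners on all data for future predictions
    m.fit(X, y)
```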

We will demonstrate a variety of software- and hardware-based approaches that lead to more scalable ensemble learning software, including a highly scalable implementation of stacking called “H2O Ensemble”, built on top of the open source, distributed machine learning platform H2O. H2O Ensemble scales across multi-node clusters and allows the user to create ensembles of deep neural networks, Gradient Boosting Machines, Random Forests, and others. As for algorithm-based approaches, we will present two algorithmic modifications to the original stacking algorithm that further reduce computation time: the Subsemble algorithm and the Online Super Learner algorithm. This talk will also include benchmarks of the implementations of these new stacking variants.

View the slides for this presentation »

Beverly Wright, Executive Director, Business Analytics Center, Georgia Institute of Technology

Dr. Beverly Wright leads the Business Analytics Center at Georgia Institute of Technology's Scheller College of Business. Beverly brings over twenty years of marketing analytics and insights experience from corporate, consulting, and academic settings. In her consultative roles for both nonprofits and for-profit businesses, she has solved critical issues through the use of modeling and advanced analytics. Her academic experience spans over a decade, with a strong emphasis on community engagement and experiential learning. She has also worked within, or led, marketing analytics departments at several companies.

Beverly earned a PhD in Marketing Analysis, a Master of Science degree in Analytical Methods, and a Bachelor of Business Administration degree in Decision Sciences from Georgia State University. She has also received a Professional Research Certification from the Marketing Research Association and CAP certification from INFORMS. Dr. Beverly Wright regularly presents at professional and academic conferences and publishes articles in various business journals.

Abstract summary

Solving for Why: Impact of Machine Learning for Business Decision-Making: Our ability to create and harness large volumes of data has raised expectations for the development and application of more advanced modeling to support decision-making at every level. The business problems and opportunities we can, and strive to, solve are poised to increase in number, breadth, and complexity.

As many organizations forge into uncomfortable and foreign territory to build more advanced analytics capabilities, the need for an academic partner tends to become increasingly apparent. Forming meaningful and active partnerships with academic institutions provides a number of benefits for the business community, rising talent, faculty, and other constituents.

View the slides for this presentation »

Jonathan Lenaghan, VP of Science and Technology, PlaceIQ

Jonathan Lenaghan is the Vice President of Science and Technology at PlaceIQ. He joined PlaceIQ in 2012 and has since been engaged in building the high-scale geospatial analytics platform that powers PlaceIQ’s media and enterprise businesses.

After receiving his PhD in physics at Yale University, Jonathan worked as a researcher studying phase transitions in the early universe at the Niels Bohr Institute in Copenhagen, Denmark, and at the University of Virginia. His academic work focused on developing theoretical and computational tools to understand the statistical properties of very hot, strongly interacting matter. After leaving academia, Jonathan worked in scientific publishing as an assistant editor for the Physical Review, a leading physics journal. Prior to joining PlaceIQ, Jonathan spent several years in the quantitative trading industry, where he applied large-scale optimization and machine learning techniques to tick-level equity pricing data and developed real-time trading models and platforms.

Abstract summary

Discerning Human Behavior from Mobility Data: Mobility data encompasses many elements, including location histories, latitude and longitude coordinates, anonymized mobile device IDs, and timestamps. Such data are generated, for instance, by automobile navigation applications and by the mobile advertising ecosystem. Typical sources of mobility data contain extensive inaccuracies that result from a variety of sources, ranging from shortcomings in location services on mobile devices to the intentional misrepresentation of spatial coordinates by bad ecosystem actors. In this talk, we describe a production data pipeline, Darwin, which analyzes the location quality of mobility data to measure how accurately a set of mobility data represents true movement patterns. Darwin uses a number of measures that are ultimately combined into two quality scores: hyper-locality and clusterability. These measurements include techniques from information theory, the mean number of spatial clusters, the compactness of the clusters, and the differences between the empirical distribution of digits in the spatial coordinates and reference distributions.
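
As a toy illustration of the last of these measures, the sketch below compares the empirical distribution of one decimal digit of latitude against a uniform reference using a chi-squared test. The function and parameters are editorial assumptions, not Darwin's internals.

```python
import numpy as np
from scipy import stats

def digit_quality(latitudes, decimal_place=4):
    """Compare the distribution of one decimal digit to a uniform reference.

    Truncated or fabricated coordinates (e.g. centroids, zero-padded digits)
    show up as a skewed digit distribution; raw GPS fixes look near-uniform.
    """
    digits = (np.abs(latitudes) * 10 ** decimal_place).round().astype(int) % 10
    observed = np.bincount(digits, minlength=10)
    expected = np.full(10, observed.sum() / 10.0)
    return stats.chisquare(observed, expected)  # (chi2 statistic, p-value)

rng = np.random.default_rng(0)
good = rng.uniform(33.0, 34.0, 10000)  # plausible raw latitudes
bad = np.round(good, 2)                # truncated: the 4th digit is always 0
print(digit_quality(good))  # large p-value: digits look uniform
print(digit_quality(bad))   # tiny p-value: flagged as low quality
```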

View the slides for this presentation »

Tom Peters, Software Engineer, Ufora

Tom Peters is a software engineer at Ufora, Inc. He has worked on multiple aspects of Ufora's auto-parallel, multi-host, open source Python project, Pyfora. He has a PhD in mathematics from Columbia University, where he specialized in low-dimensional topology, using Heegaard Floer homology to compute invariants of manifolds, and a BA in mathematics from Rutgers University.

Abstract summary

Say What You Mean: Scaling Machine Learning Algorithms Directly from Source Code: Scaling machine learning applications is hard. Even with powerful systems like Spark, TensorFlow, and Theano, the code you write has more to do with getting these systems to work at all than with your algorithm itself. But it doesn't have to be this way!

In this talk, I’ll discuss an alternate approach we’ve taken with Pyfora, an open-source platform for scalable machine learning and data science in Python. I’ll show how it produces efficient, large scale machine learning implementations directly from the source code of single-threaded Python programs. Instead of programming to a complex API, you can simply say what you mean and move on. I’ll show some classes of problem where this approach truly shines, discuss some practical realities of developing the system, and I’ll talk about some future directions for the project.
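
Based on the project's published examples, usage looks roughly like the sketch below; treat the endpoint address and API details as assumptions rather than a definitive reference.

```python
import pyfora

# Connect to a running Ufora backend (address is a placeholder).
executor = pyfora.connect('http://localhost:30000')

with executor.remotely.downloadAll():
    # Ordinary single-threaded Python; the platform parallelizes it
    # by analyzing the source rather than through a special API.
    total = sum(x ** 2 for x in range(10 ** 9))

print(total)
```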

View the slides for this presentation »

Funda Gunes, Senior Research Statistician Developer at SAS Institute Inc.

Funda is a Senior Research Statistician and Machine Learning Scientist at SAS Institute, where she researches and implements new data mining and machine learning approaches for SAS Enterprise Miner, SAS's data mining software. She received her Ph.D. in statistics from North Carolina State University. Her research focuses on penalized regression methods, ensemble machine learning techniques, Bayesian statistics, and mixed models.

Abstract summary

Local Search Optimization for Hyper-Parameter Tuning: Many machine learning algorithms are sensitive to their hyper-parameter settings and lack good universal rule-of-thumb defaults. In this talk we discuss the use of black-box local search optimization (LSO) for machine learning hyper-parameter tuning. Viewed as black-box objective functions of their hyper-parameters, machine learning algorithms create a difficult class of optimization problems: the corresponding objective functions tend to be nonsmooth, discontinuous, and unpredictably expensive to evaluate, and they require support for continuous, categorical, and integer variables. Further, evaluations can fail for a variety of reasons, such as early exits due to node failure or hitting a maximum time limit. Additionally, not all hyper-parameter combinations are compatible (creating so-called “hidden constraints”). In this context, we apply a parallel hybrid derivative-free optimization algorithm that can make progress despite these difficulties, providing significantly improved results over default settings with minimal user interaction. Further, we will address efficient parallel paradigms for different types of machine learning problems, exploring the importance of validation to avoid overfitting and emphasizing that, even for small-data problems, the need to perform cross-validation can create computationally intense functions that benefit from a distributed/threaded environment.
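
The talk's hybrid algorithm is considerably more sophisticated, but a greedy coordinate-wise local search over a discrete grid conveys the basic black-box setup. The sketch below is an editorial toy example using scikit-learn, not SAS's implementation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Discrete search space; the learner is treated as a black-box objective.
space = {
    "n_estimators":  [50, 100, 200, 400],
    "max_depth":     [2, 3, 4, 6],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

def score(cfg):
    model = GradientBoostingClassifier(random_state=0, **cfg)
    return cross_val_score(model, X, y, cv=5).mean()  # the expensive part

current = {k: v[len(v) // 2] for k, v in space.items()}  # start near defaults
best, improved = score(current), True
while improved:  # greedy coordinate-wise local search
    improved = False
    for k, values in space.items():
        i = values.index(current[k])
        for j in (i - 1, i + 1):  # try the neighboring values
            if 0 <= j < len(values):
                cand = dict(current, **{k: values[j]})
                s = score(cand)
                if s > best:
                    best, current, improved = s, cand, True
print(current, best)
```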

View the slides for this presentation »

Patrick Koch, Principal Data Scientist, SAS Institute Inc.

Dr. Patrick Koch is a Principal Data Scientist with the Operations Research R&D team in the Advanced Analytics division at SAS Institute Inc. His focus is on optimization strategies in machine learning – developing hybrid search methods within a distributed/parallel framework for effective and efficient training and tuning of predictive models. Before joining the SAS OR team, Patrick led a team designing, developing, and applying collaborative decision support technologies for engineering design at Dassault Systèmes – combining sampling/designed-experiment strategies for computer simulations, predictive modeling surrogates for complex and expensive black-box simulations, and nonlinear optimization methods with interactive post-processing and visualization. There, Patrick's expertise centered on coupling statistics and probabilistic methods with search methods for optimization under uncertainty (stochastic, robust, and reliability-based optimization). Patrick received his Ph.D. in Mechanical Engineering in 1998 from Georgia Institute of Technology, where he collaborated with the Aerospace Engineering and Industrial and Systems Engineering departments while developing a hierarchical approach to robust optimization of large-scale complex systems.

Abstract summary

Local Search Optimization for Hyper-Parameter Tuning: Many machine learning algorithms are sensitive to their hyper-parameter settings and lack good universal rule-of-thumb defaults. In this talk we discuss the use of black-box local search optimization (LSO) for machine learning hyper-parameter tuning. Viewed as black-box objective functions of their hyper-parameters, machine learning algorithms create a difficult class of optimization problems: the corresponding objective functions tend to be nonsmooth, discontinuous, and unpredictably expensive to evaluate, and they require support for continuous, categorical, and integer variables. Further, evaluations can fail for a variety of reasons, such as early exits due to node failure or hitting a maximum time limit. Additionally, not all hyper-parameter combinations are compatible (creating so-called “hidden constraints”). In this context, we apply a parallel hybrid derivative-free optimization algorithm that can make progress despite these difficulties, providing significantly improved results over default settings with minimal user interaction. Further, we will address efficient parallel paradigms for different types of machine learning problems, exploring the importance of validation to avoid overfitting and emphasizing that, even for small-data problems, the need to perform cross-validation can create computationally intense functions that benefit from a distributed/threaded environment.

View the slides for this presentation »

Ryan Curtin, Principal Research Scientist, Symantec

Ryan Curtin is the primary maintainer and developer of mlpack and currently a Principal Research Scientist at Symantec. Previously, he was awarded a Ph.D. at Georgia Tech for research involving fast tree-based algorithms and their applications to machine learning. He doesn't like writing biographies, and once lit his hair on fire while operating a homemade blast furnace.

Abstract summary

mlpack: Exploiting C++ to Produce Fast Machine Learning Algorithms: mlpack is a cutting-edge C++ machine learning library containing fast implementations of both standard machine learning algorithms and recently-published algorithms. In this talk, I will introduce mlpack, its design philosophy, and discuss how C++ is helpful for making implementations fast, as well as the pros and cons of C++ as a language choice. I will briefly review the capabilities of mlpack, then focus on mlpack’s flexibility by demonstrating the k-means clustering code (and maybe some other algorithms too, like nearest neighbor search), and how it might be used in a production environment. The project website can be found at https://www.mlpack.org/.
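
For reference, the k-means (Lloyd's) iteration that mlpack accelerates looks like this in plain numpy. This is an illustration of the algorithm only; it skips details mlpack handles, such as empty clusters and tree-based neighbor search.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: assign points, then move centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids (empty clusters are not handled here).
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

X = np.random.default_rng(1).normal(size=(300, 2))
centroids, labels = kmeans(X, k=3)
```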

View the slides for this presentation »

Arun Rathinasabapathy, Senior Software Engineer, LexisNexis

Arun Rathinasabapathy is a Senior Software Engineer working as a big data scientist at LexisNexis Risk Solutions Inc. He is currently responsible for analyzing big data problems at LexisNexis, making performance engineering recommendations, and developing new migration projects. Arun joined LexisNexis in 2013 as a Senior Software Analyst. He has developed multiple projects migrating legacy enterprise applications to HPCC and managed many teams to successfully launch new solutions and products. He has 10 years of experience in the software industry in varied positions.

Prior to LexisNexis, Arun held several positions with Cognizant Technology Solutions, including module lead, team lead, and project manager. Over the years he worked closely with many businesses, including Kohl's, Discover Financial Services, and ACE Insurance.

Arun holds various software certifications and has spoken at industry conferences representing LexisNexis's proprietary HPCC software and Enterprise Control Language (ECL).

He holds a bachelor's degree in Computer Engineering from Anna University and an MBA in Finance from Alagappa University, and is currently pursuing a Doctor of Business Administration at California Southern University.

Abstract summary

Big Data Processing Above and Beyond Hadoop: Data-intensive computing represents a new computing paradigm that addresses Big Data processing requirements with high-performance architectures supporting scalable parallel processing, allowing government, commercial, and research organizations to process massive amounts of data and implement applications previously thought impractical or infeasible. The fundamental challenges of data-intensive computing are managing and processing exponentially growing data volumes, significantly reducing associated data analysis cycles to support practical, timely applications, and developing new algorithms that can scale to search and process massive amounts of data. The open source HPCC (High-Performance Computing Cluster) Systems platform offers a unified approach to these requirements: (1) a scalable, integrated computer systems hardware and software architecture designed for parallel processing of data-intensive computing applications, and (2) a new programming paradigm in the form of a high-level, declarative, data-centric programming language designed specifically for big data processing. This presentation explores the challenges of data-intensive computing from a programming perspective, and describes the ECL programming language and the HPCC architecture designed for data-intensive computing applications. HPCC is an alternative to the Hadoop platform, and ECL is compared to Pig Latin, a high-level language developed for the Hadoop MapReduce architecture.

View the slides for this presentation »

Kaz Sato, Evangelist, Google

Kaz Sato is a Staff Developer Advocate on the Cloud Platform team at Google Inc. He leads the developer advocacy team for machine learning and data analytics products such as TensorFlow, the Vision API, and BigQuery, and has spoken at major events including Strata + Hadoop World 2016 San Jose, Google Next 2015 NYC and Tel Aviv, and DevFest Berlin. Kaz has also been leading and supporting developer communities for Google Cloud for over 7 years. He is also interested in hardware and IoT, and has been hosting FPGA meetups since 2013.

Abstract summary

Machine Intelligence at Google Scale: TensorFlow and Cloud Machine Learning: The biggest challenge of deep learning technology is scalability. As long as you are using a single GPU server, you have to wait hours or days to get the results of your work. This doesn't scale to a production service, so you eventually need distributed training in the cloud. Google has been building infrastructure for training large-scale neural networks on the cloud for years, and has now started to share that technology with external developers. In this session, we will introduce new pre-trained ML services, such as the Cloud Vision API and Speech API, that work without any training. We will also look at how TensorFlow and Cloud Machine Learning can accelerate custom model training by 10x–40x with Google's distributed training infrastructure.
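
For flavor, the TensorFlow 1.x API of the era expressed distributed training with a cluster spec and device placement, roughly as below. The host addresses and shapes are placeholders.

```python
import tensorflow as tf  # written against the TF 1.x API current at the time

# Hypothetical two-node cluster; addresses are placeholders.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Variables land on the parameter server; ops run on the worker.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    w = tf.Variable(tf.zeros([10, 1]))
    x = tf.placeholder(tf.float32, [None, 10])
    y = tf.matmul(x, w)
```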

View the slides for this presentation »

Amy Langville, Professor of Mathematics, The College of Charleston in South Carolina

Amy is a Professor of Mathematics at the College of Charleston where she is the Operations Research specialist. She works on ranking, clustering, recommendation systems, and more recently stable matching for applications from web search to legal consulting to sports analytics.

Abstract summary

Learning to Play Sports: Sports analytics is an active and growing field. With large datasets from biometric devices and player-tracking equipment, sports teams can benefit from techniques in data analytics and machine learning. This talk will discuss work in the areas of March Madness and game-to-game analysis. With the emergence of algorithms to study such dynamics as player performance and fan engagement, the collection of data also becomes paramount. Professional sports organizations have access to premium technology. This talk will also discuss how such work can be transferred to the college and secondary levels. Machine learning allows cutting-edge technology to play from the bench.

View the slides for this presentation »

Tim Chartier, Chief Researcher, Tresata

Chief Researcher for Tresata and Professor of Mathematics and Computer Science at Davidson College, Dr. Tim Chartier specializes in sports analytics. He frequently consults on data analytics questions, including projects with ESPN Magazine, ESPN's Sport Science program, NASCAR teams, the NBA, and fantasy sports sites. In 2014, Tim was named the inaugural Math Ambassador for the Mathematical Association of America, which also recognized Dr. Chartier's ability to communicate math with a national teaching award. His research and scholarship were recognized with the prestigious Alfred P. Sloan Research Fellowship. Published by Princeton University Press, Tim authored Math Bytes: Google Bombs, Chocolate-Covered Pi, and Other Cool Bits in Computing. Through the Teaching Company, he taught a 24-lecture series entitled Big Data: How Data Analytics Is Transforming the World. In K-12 education, Tim has also worked with Google and Pixar on their educational initiatives. Dr. Chartier has served as a resource for a variety of media inquiries, including appearances with Bloomberg TV, NPR, the CBS Evening News, USA Today, and The New York Times.

Abstract summary

Learning to Play Sports: Sports analytics is an active and growing field. With large datasets from biometric devices and player-tracking equipment, sports teams can benefit from techniques in data analytics and machine learning. This talk will discuss work in the areas of March Madness and game-to-game analysis. With the emergence of algorithms to study such dynamics as player performance and fan engagement, the collection of data also becomes paramount. Professional sports organizations have access to premium technology. This talk will also discuss how such work can be transferred to the college and secondary levels. Machine learning allows cutting-edge technology to play from the bench.

View the slides for this presentation »

Chris Fregly, Research Scientist, Pipeline.io

Chris Fregly is a Research Scientist at Pipeline.io, a streaming analytics and machine learning startup in San Francisco. He is also an Apache Spark contributor, a Netflix Open Source committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the upcoming book Advanced Spark.

Previously, Chris was a Streaming Data Engineer at Databricks and Netflix – as well as an early member of the IBM Spark Technology Center in San Francisco.

Abstract summary

Comparing TensorFlow NLP Options: word2vec, GloVe, RNN/LSTM, SyntaxNet, and Penn Treebank: Through code samples and demos, we'll compare the architectures and algorithms of the various TensorFlow NLP options. We'll explore both feed-forward and recurrent neural network approaches such as word2vec, GloVe, RNN/LSTM, SyntaxNet, and the Penn Treebank using the latest TensorFlow libraries.
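
As a taste of the first option, the core of a skip-gram word2vec model with NCE loss looks roughly like this under the TensorFlow 1.x API of the time; the vocabulary size and dimensions are arbitrary, and the data pipeline is omitted.

```python
import tensorflow as tf  # TF 1.x API

vocab_size, embed_dim, num_sampled = 10000, 128, 64
inputs = tf.placeholder(tf.int32, [None])     # center word ids
labels = tf.placeholder(tf.int32, [None, 1])  # context word ids

# Embedding matrix: the word vectors we actually want to learn.
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, inputs)

# Noise-contrastive estimation avoids a full softmax over the vocabulary.
nce_w = tf.Variable(tf.truncated_normal([vocab_size, embed_dim], stddev=0.1))
nce_b = tf.Variable(tf.zeros([vocab_size]))
loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_w, biases=nce_b,
                                     labels=labels, inputs=embed,
                                     num_sampled=num_sampled,
                                     num_classes=vocab_size))
train_op = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
```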

View the slides for this presentation »

Sponsors

Gold:

Cloudera

SAS

MapR

Spare5

Silver:

HPCC Systems

LogicBlox

H2O.ai

Lanyard:

HiringSolved

Media:

O'Reilly

Galvanize

TechTank

CRC Press

Women Who Code

Cambridge

Springer