AI is going to replace 50% of human jobs, according to a new CNBC article quoting Kai-Fu Lee, founder of the venture capital firm Sinovation Ventures and former head of Google China. Peter Norvig, Director of Research at Google, has also recently commented on his concerns about AI decimating the job market. I think about these topics often, as I am the founder of HiringSolved, an AI-based software company that makes recruiting more efficient. So, when will AI come to take your job away, and what can you do about it?
Don’t panic! This is our destiny. Building Artificial Intelligence to take away our jobs is a natural part of the evolution of the human race. Humans are unique because we can create and use tools. The tools we create enable us to work more efficiently, which means that a job requires fewer people with tools than it did without them. The technology we create does some of the work for us, eventually replacing us and taking our jobs.
This isn’t a new thing. Consider the humble wheelbarrow, a 2,000-year-old technology that enables one person to carry the same amount of weight as ten people. In making the task of carrying things easier and more efficient, the wheelbarrow is an example of technology taking away jobs. For a more recent example, take a look at the computer itself. Most people don’t know that the word “computer” was a job title before it was a type of machine: it used to be the title of a human worker who made calculations manually. Machines took that job from us so quickly that most people don’t even realize we used to be the computers. And yet nobody wants to give up their spam filter, or abandon Excel and go back to doing long division by hand.
OK, so we have established that we as a species are special because we make technology, that our technology eventually takes our jobs, and that this has been happening for a very long time. Even the AI programmers themselves are not safe: DARPA, the good people who brought us the Internet, are working to automate some of what ML/AI engineers do. So, when will technology take your job? Let’s break this question down into a few hypothetical scenarios:
About The Author
Shon Burton is the founder and CEO of HiringSolved, which builds AI-based recruiting software like RAI. Mr. Burton is also the co-founder of MLconf, a leading independent conference on Machine Learning, and of The Artificial Intelligence Conference, which is hosting an AI Startup Competition with Spark Capital for new companies applying AI in new products. If AI does take all of our jobs, it will be partially his fault. As a concession, Shon would like to offer the reader a 20% discount code to The AI Conference in June in San Francisco.
Interview with Adam Omidpanah, Biostatistician, Washington State University, by Sarah Braden
One of our Program Committee members, Sarah Braden, recently interviewed Adam Omidpanah, Biostatistician, Washington State University. This interview covers how recent advances in machine learning have impacted research in health care delivery.
SB) Tell us briefly about yourself and your work.
AO) I am a biostatistician with Washington State University’s College of Nursing. I work with a variety of junior collaborators, helping them with study design, data analysis, and manuscript preparation. Our work mainly focuses on health care delivery and disparities in marginalized US populations, particularly American Indians and Alaska Natives. One of the greatest challenges of my work is shaping a complex causal model that incorporates psychosocial medicine along with overt medical conditions, such as the relation between sleep and depression, or between PTSD and diabetes complications.
SB) This past month Google released a paper on ArXiv (https://arxiv.org/abs/1703.02442) demonstrating a convolutional neural network (CNN) model that has better small tumor detection rates than human pathologists. The context for this work was improving breast cancer metastasis detection in lymph nodes. With better metastasis detection, more appropriate patient treatment plans could be chosen, potentially improving patient outcomes. How are other advances in Machine Learning improving the accuracy of cancer diagnoses and risk prediction?
AO) I think ROC regression is a promising area of research: it combines covariates to directly maximize an ROC curve, though its operating characteristics are very irregular. I think boosting and bagging are also promising, and it is only a matter of time before a diagnostic tool is developed that requires knowledge about several hundred genes simultaneously. I am also happy that risk stratification table indices, like the integrated discrimination index, have recently fallen out of favor. I always found their performance to be overly sensitive to assumptions, and recent work from Dr. Katie Kerr and Dr. Margaret Pepe has shown this (and other issues) to be a cause for concern.
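As a rough illustration of the bagging point above, here is a minimal sketch (entirely synthetic data, not any clinical dataset or Dr. Omidpanah’s code) comparing a single decision tree to a bagged ensemble of trees, scored by the area under the ROC curve:

```python
# Toy comparison: single decision tree vs. a bagged ensemble, evaluated
# by ROC AUC on a synthetic stand-in for a diagnostic classification task.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 1000 "patients", 50 features, 10 of them informative.
X, y = make_classification(n_samples=1000, n_features=50,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single fully grown tree tends to overfit...
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# ...while averaging many bootstrap-trained trees smooths its predictions.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=0).fit(X_tr, y_tr)

auc_tree = roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1])
auc_bag = roc_auc_score(y_te, bag.predict_proba(X_te)[:, 1])
```

On data like this, the bagged ensemble’s AUC typically exceeds the single tree’s, which is the variance-reduction effect that makes bagging attractive for diagnostic scoring.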
SB) What are the barriers to introducing machine learning into clinical practice? Is there a protocol for using machine learning models in the field of oncology?
AO) Randomized clinical trials are a gold standard for demonstrating the effectiveness of any new cancer detection or treatment tool. Any machinery involving ML algorithms should be subjected to the same rigorous testing, and also confirmed in secondary trials to reduce false positive findings. In that sense I don’t see these as barriers. The clinical perception of these tools is very positive: generally, there is a desire to reduce human error. The protocol for using machine learning models is pragmatic.
SB) What datasets are openly available for researchers to work on cancer risk prediction?
AO) SEER is perhaps the most widely known. The Centers for Medicare & Medicaid Services (CMS) has recently merged SEER data with Medicare data and provided a 10% subsample of the US population without cancer. However, I would encourage interested investigators to research the National Cancer Institute (NCI)’s many accessible databases. Open data are useful, but closed data aren’t inaccessible.
SB) How do concerns about data privacy affect Machine Learning research in oncology?
AO) Most concerns relate specifically to the technology: genetics in particular has been a problematic area of research. Biospecimen donation rates are relatively low, especially for racial and ethnic minorities. And yet, these are the people often most adversely affected by cancer. Promoting biospecimen donation among racial and ethnic minorities could improve precision medicine and reduce disparities.
SB) Clinical science datasets often have large p, small n problems where the number of predictor variables (or features) is larger than the number of observations. Missing data and unbalanced classes are other common issues. What techniques do you use to overcome these challenges in your own work?
AO) An unpopular approach to p >> n is to simply refine the scope of the problem. Very rarely do I encounter a dataset where all features can be assigned equal weight. Combining relevant features in practical ways, and excluding others that have no relation to a pre-specified scientific question, usually points clearly to a way forward. Missing data and unbalanced classes, while unrelated problems, can both be addressed by applying a high-dimensional prediction algorithm, to impute in the one case and to propensity match in the other, which improves the performance of the regression models using those data. Recently I’ve had some success estimating such a high-dimensional prediction algorithm using, simply, log-linear models with splines and BIC for model selection.
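The splines-plus-BIC recipe mentioned above can be sketched in a few lines. The toy below is an assumption-laden simplification: a Gaussian least-squares model on synthetic data with a truncated-linear spline basis (Dr. Omidpanah’s actual models are log-linear), choosing the number of knots by BIC:

```python
# Toy sketch: pick spline complexity (number of knots) by minimizing BIC.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

def spline_basis(x, n_knots):
    # Truncated linear spline basis: [1, x, (x - k)_+ for interior knots k].
    knots = np.linspace(x.min(), x.max(), n_knots + 2)[1:-1]
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0, None) for k in knots]
    return np.column_stack(cols)

def bic(x, y, n_knots):
    # Gaussian BIC: n*log(RSS/n) + p*log(n), with p basis columns.
    X = spline_basis(x, n_knots)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, p = X.shape
    return n * np.log(rss / n) + p * np.log(n)

# Scan candidate knot counts and keep the one with the lowest BIC.
best = min(range(1, 15), key=lambda k: bic(x, y, k))
```

The log(n) penalty in BIC grows with sample size, so it favors smaller models than AIC would; that conservatism is often what you want when the fitted model feeds downstream imputation or matching.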
Adam Omidpanah is a biostatistician whose interest in machine learning began during his graduate studies. Adam holds a Bachelor of Science in Mathematics from Portland State University and a Master of Science in Biostatistics from the University of Washington.
Sarah Braden is currently a Data Scientist at the startup HireIQ Solutions, Inc. There she specializes in developing predictive models for HireIQ’s automated interviewing platform. She also writes tools for HireIQ using automated speech recognition. Sarah is a fan of open source technology. She holds a PhD in Geological Sciences from the School of Earth and Space Exploration at Arizona State University and a Bachelors in Physics from Northwestern University.
MLconf NYC 2017 Speaker Resources
Aaron Roth, Associate Professor, University of Pennsylvania
The Algorithmic Foundations of Differential Privacy
The Reusable Holdout: Preserving Validity in Adaptive Data Analysis
Alexandra Johnson, Software Engineer, SigOpt
Intro
Ian Dewancker. SigOpt for ML: TensorFlow ConvNets on a Budget with Bayesian Optimization
Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization
Ian Dewancker. SigOpt for ML: Bayesian Optimization for Collaborative Filtering with MLlib
#1 Trusting the Defaults
Keras recurrent layers documentation
#2 Using the Wrong Metric
Ron Kohavi et al. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained
Xavier Amatriain. 10 Lessons Learned from Building ML Systems (Video at 19:03)
Image from PhD Comics.
See also: SigOpt in Depth: Intro to Multicriteria Optimization
#4 Too Few Hyperparameters
Image from TensorFlow Playground
Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization
#5 Hand Tuning
On algorithms beating experts: Scott Clark, Ian Dewancker and Sathish Nagappan. Deep Neural Network Optimization with SigOpt and Nervana Cloud
#6 Grid Search
Nogridsearch.com
#7 Random Search
James Bergstra and Yoshua Bengio. Random Search for Hyper-parameter Optimization
Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke. A Stratified Analysis of Bayesian Optimization Methods
Learn More
Blog.sigopt.com
sigopt.com/research
Byron Galbraith, Chief Data Scientist, Talla
https://github.com/bgalbraith/bandits
Corinna Cortes, Head of Research, Google
https://arxiv.org/pdf/1611.00068.pdf
http://www.kdd.org/kdd2016/papers/files/Paper_1069.pdf
Erik Bernhardsson, CTO, Better Mortgage
https://github.com/spotify/annoy
https://github.com/erikbern/ann-benchmarks
https://github.com/erikbern/ann-presentation
https://erikbern.com/
Layla El Asri, Research Scientist, Maluuba
Improving Scalability of Reinforcement Learning by Separation of Concerns
Towards Information-Seeking Agents
Frames: A Corpus For Adding Memory To Goal-Oriented Dialogue Systems
Book Giveaway at MLconf NYC, this Friday!
Our generous publishers are sending books again to MLconf NYC. Make sure to grab coupons, as they’ll be offering exclusive book discounts at MLconf! We’ll be hosting book giveaways at the conclusion of the event for the most unique tweets that mention @mlconf and/or #mlconfnyc. Meet some of the authors: Andreas Mueller will be signing his book Introduction to Machine Learning with Python, and our speaker Irina Rish is the author of Practical Applications of Sparse Modeling. Participating publishers include MIT Press, Cambridge University Press, CRC Press, and O’Reilly Media.
Cambridge University Press:
- Agarwal/Chen, Statistical Methods for Recommender Systems
- Barabasi, Network Science
- Bennett & Hugen, Financial Analytics with R
- Braun & Murdoch, Introduction to Statistical Programming with R
- Castillo, Big Crisis Data
- Efron/Hastie, Computer Age Statistical Inference
- Flach, Machine Learning
- Fouss, Algorithms and Models for Network Data and Link Analysis
- Guenin et al, A Gentle Introduction to Optimization
- Leskovec et al, Mining of Massive Data Sets
- Liu, Sentiment Analysis
- Roughgarden, Twenty Lectures on Algorithmic Game Theory
- Wainer, Truth or Truthiness
- Warwick & Shah, Turing’s Imitation Game
- Watt, Machine Learning Refined
CRC Press:
*Save 20% when ordering online and enter promo code: AWR96
- A First Course in Machine Learning, Second Edition
- Autonomous Vehicle Navigation: From Behavioral to Hybrid Multi-Controller Architectures
- Big Data and Social Science: A Practical Guide to Methods and Tools
- Data Mining: A Tutorial-Based Primer, Second Edition
- Data Mining with R: Learning with Case Studies, Second Edition
- Machine Learning: An Algorithmic Perspective, Second Edition
- Machine Learning: Algorithms and Applications
- Modern Data Science with R
- Statistical Learning with Sparsity: The Lasso and Generalizations
- Sparse Modeling: Theory, Algorithms, and Applications
MIT:
- Introduction to Machine Learning, Third Edition
- Deep Learning
- Perturbations, Optimization, and Statistics
- Fundamentals of Machine Learning for Predictive Data Analytics
- Decision Making Under Uncertainty
- Foundations of Machine Learning
- Machine Learning
- Advanced Structured Prediction
- Practical Applications of Sparse Modeling, MLconf NYC Speaker, Irina Rish
- Boosting
- Optimization for Machine Learning
- Machine Learning in Non-Stationary Environments
O’Reilly Media:
* Use discount code: PCBW and save 40% on books, 50% on ebooks and videos.
Additional Machine Learning Books on Display:
- How to Create a Mind: The Secret of Human Thought Revealed, Kurzweil, Ray
- Overcomplicated: Technology at the Limits of Comprehension, Arbesman, Samuel
- Artificial Intelligence Simplified: Understanding Basic Concepts, George, Dr Binto
- Rise of the Robots: Technology and the Threat of a Jobless Future, Ford, Martin
- Superintelligence: Paths, Dangers, Strategies, Bostrom, Nick
- Our Final Invention: Artificial Intelligence and the End of the Human Era, Barrat, James
- The Age of Spiritual Machines: When Computers Exceed Human Intelligence, Kurzweil, Ray
- The Singularity Is Near: When Humans Transcend Biology, Kurzweil, Ray
- The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future, Kelly, Kevin
- Our Robots, Ourselves: Robotics and the Myths of Autonomy, Mindell, David A.
Why is MLconf starting The AI Conference?
MLconf started in 2012 as a collaboration with Danny Bickson and Carlos Guestrin from the Machine Learning department at Carnegie Mellon University. Courtney asked Danny to speak at our geekSessions event on Big Data in 2011, along with Eric Bieschke, the CTO of Pandora, and a few other people tackling ML problems at scale. We went on to develop MLconf as an independent, vendor-neutral event series focused on combining practical and theoretical presentations on ML in real-world applications, while continuing to collaborate with Carlos and Danny on their Data Science Summit events, which were acquired along with Turi in 2016.
The market has changed dramatically in the last few years. In 2010, people were working in data science but it was being called everything from analytics to big data. When we started MLconf in 2012, the ML space was just starting to organize behind a few tool sets and open source frameworks. We wanted to create a community where everyone who was generally working in the space could meet and collaborate. Today, we host annual MLconf events in New York, San Francisco, Seattle and Atlanta and we see the MLconf events and the Machine Learning space growing dramatically each year with major support from top technology companies and a vibrant startup ecosystem.
When we look out a few more years, we see some form of ML-based technology becoming a pervasive component of most applications, in the same way that database technology is today. We’re thrilled to help create environments and communities where people connect, share, and collaborate around this fast-growing technology. At the same time, we believe that AI deserves its own focus. Although there is invariably overlap between the two, we think that MLconf will continue to support the Machine Learning community as it expands into every industry, improving existing applications, creating new software capabilities that have yet to be discovered, and supporting the new industry being created around Machine Learning technology.
How will the AI Conference be different?
The AI Conference will focus on emerging technology in Artificial Intelligence, with a specific focus on projects, teams, and people working on Artificial General Intelligence and related topics. In addition to deeply technical presentations on AI, we will also host presentations on law, ethics, safety, and governance, as we believe these are important dimensions to address in this growing field. As we have always done with MLconf, we will engage the community to help us define what The AI Conference should be. Our first event will be in June of 2017. If you would like to participate, please contact us here. We’re looking forward to the conversation!
About the Blogger:
Shon Burton is the Co-Founder and Chairman of the Board of Directors at MLconf, and CEO at HiringSolved.
Interview with Halim Abbas, VP of Data Science, Cognoa, by Alex Korbonits
One of our Program Committee members, Alex Korbonits, recently interviewed Halim Abbas, Vice President of Data Science at Cognoa, on how recent advances in machine learning have impacted research in childhood development, and his work at Cognoa.
AK) Recently, Nature published a groundbreaking article on the application of advanced machine learning techniques to model early childhood development. Specifically, researchers leveraged artificial neural networks to predict diagnoses for autism with high sensitivity well before behavioral characteristics correlated with ASD usually appear. How have recent advances in machine learning impacted cognitive clinical science generally and research in early childhood development specifically?
HA) Machine learning is a transformative technology that has helped disrupt or completely reinvent every vertical it has been applied to (including health and wellness) in the last decade or two. Cognitive clinical science is relatively late to the party and is only recently beginning to benefit from the power of ML. From leveraging phenotypic data toward reliable assessment, to mining genomic data for meaningful signal, or even building bridges between the two sources, the sky’s the limit.
AK) What excites you the most about applying machine learning to early childhood development?
HA) I worked across many verticals before joining Cognoa. It is hard to beat the excitement you feel when working on a solution to put parents’ minds at ease, or alert them to take action early enough to make a meaningful difference in their children’s quality of life. The field is ripe for technological advancement, and the potential benefit couldn’t be more urgently needed. With developmental delay affecting 1 in 6 U.S. children and a national shortage of diagnosticians, anxious parents often wait over a year to get in to see a specialist; this means that many children miss out on important early interventional therapies. Being in a position to help with a problem so personal to so many people feels like such a privilege.
AK) With all prediction problems, there is a natural tension between maximizing accuracy vs. maintaining interpretability. At Cognoa, what kinds of prediction problems do you encounter that require interpretability? Are there some prediction problems for which black-box models are acceptable or encouraged?
HA) Anything we build that is designed to interact with or influence the medical diagnostic process is required to be interpretable by medical professionals, and understandably so. At a minimum, this means that the most relevant factors to the prediction must be knowable, and the features be tied to meaningful semantic concepts. While this makes certain ML techniques (like PCA or SVM) unfavorable, it doesn’t pose an insurmountable limitation in practice. Models that are peripheral to the diagnostic process (like patient clustering, signal processing, anomaly detection, and time series analysis techniques) tend to remain “black-boxy”.
AK) How do you and your team communicate complex machine learning concepts to parents and cognitive clinical scientists?
HA) The trick is to keep the messaging firmly grounded in the application domain and avoid drifting into specifics that are not directly interpretable in the problem space. A parent isn’t interested in learning whether the underlying screening model was trained with ensemble techniques or which kernel method was used in the SVM classifier. The aspects that matter in this case include meaningful measures of reliability of assessment and information about the factors that significantly contributed to the conclusion. We also found that our users greatly value any information we can give them about the statistical significance of their experience relative to their respective demographic bin. Decile placements, false positive/negative rates, and confidence ranges are good examples.
AK) To what extent is further research in early childhood development influenced by the use of predictive machine learning models?
HA) Today, the typical age of diagnosis for a condition like autism remains over four, even though it has long been established that earlier diagnosis dramatically improves the impact of intervention. A new breed of clinical science and data science experts is currently at work looking for ways to apply predictive modeling to younger and younger children. The younger they are, the more subtle and fragmented the relevant signals are, which puts the challenge right up the alley of data-driven modeling. The fruit of this wide collaboration might be reliably diagnosing developmental conditions within the first year of life.
AK) With many medical applications, modeling can be extremely difficult due to the so-called “p >> n” problem, where you may have very rich “wide” data but not enough instances to learn effectively. Furthermore, you may have to rely on inconclusive screening, missing data, or noisy measurements. Do you regularly experience these phenomena at Cognoa, and if so, do you have any preferred techniques to circumvent them?
HA) We call it the wide-and-shallow dataset problem, and it is perennial in the field of clinical science. One approach we use to mitigate that limitation is to avail ourselves of two different but complementary sources of data. Clinical patient records are labeled by experts and hence relatively clean and reliable, but sparse, shallow, heavily unbalanced, and very expensive to acquire. Data we accrue from our app user-base is orders of magnitude more voluminous, cheaper to amass, timelier, and denser, but inherently noisy and relatively unreliable. At Cognoa we developed a multi-pronged approach in which each data source is put to proper use. For example, we might mine our user-base data to better understand the dimensions and/or segments that are most relevant to the problem at hand, and the nature of the (heavily non-linear) relationships and dependencies interconnecting the relevant dimensions. These insights would then influence the way we collect, filter, and balance the clinical patient records used for training our behavioral health screening models.
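One standard way to offset the heavy class imbalance described above is class weighting. The sketch below is a hypothetical illustration on synthetic data (not Cognoa’s pipeline): a plain logistic regression versus one with balanced class weights, compared on recall for a rare positive class:

```python
# Toy illustration: class weighting to recover recall on a rare class.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic "clinical" data where only ~5% of cases are positive.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Unweighted model: the loss is dominated by the majority class.
plain = LogisticRegression().fit(X_tr, y_tr)
# "balanced" reweights each class inversely to its frequency.
weighted = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)

recall_plain = recall_score(y_te, plain.predict(X_te))
recall_weighted = recall_score(y_te, weighted.predict(X_te))
```

The reweighted model trades some precision for recall on the minority class, which is often the right trade-off in screening, where a missed positive is costlier than a false alarm that triggers a specialist referral.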
*We would love to see you at our next MLconf in New York. Mention “Halim18” and save 18% on a ticket to the event!
Halim Abbas, VP of Data Science at Cognoa, is a high-tech innovator who spearheaded world-class data science projects at game-changing tech firms such as eBay and Quixey. Formally educated in Machine Learning, he has professional expertise spanning Information Retrieval, Natural Language Processing, and Big Data. Halim has a proven track record of applying state-of-the-art data science techniques across industry verticals such as eCommerce, web and mobile services, airlines, BioPharma, and medical technology.
He currently leads the Data Science department at Cognoa, a data-driven behavioral health care startup in Palo Alto.
Alex Korbonits is a Data Scientist at Remitly, Inc., where he works extensively on feature extraction and putting machine learning models into production. Outside of work, he loves Kaggle competitions, is diving deep into topological data analysis, and is exploring machine learning on GPUs. Alex is a graduate of the University of Chicago with degrees in Mathematics and Economics.