The MLconf Blog

Interview with Andreas Mueller

One of our Program Committee members, Reshama Shaikh, recently interviewed Andreas Mueller, a Lecturer in Data Science at Columbia University and a core developer of the Python library scikit-learn, about his recent work with the scikit-learn open source community. Andreas and Reshama (an organizer of the meetup group Women in Machine Learning and Data Science) are co-organizing a scikit-learn sprint in NYC on March 4th to increase women’s participation in open source. Check it out here.

RS) Tell us briefly about yourself

AM) I’m currently a lecturer in Data Science at Columbia University, where I teach applied machine learning. I have been a core developer of the Python library scikit-learn for the past 6 years. I recently published the book Introduction to Machine Learning with Python.

RS) How did you get involved in scikit-learn and open source in general?

AM) While I was working on my Ph.D. in computer vision and learning, the scikit-learn library became an essential part of my toolkit. I was an ardent user of the library, and I wanted to take part in its development. My involvement in open source began in 2011 at a scikit-learn sprint held at the NIPS conference in Granada, Spain. The scikit-learn release manager at the time had to leave, and the project leads asked me to become release manager; that’s how it all got started.

RS) Last year, you reached out to me, as an organizer of the meetup group Women in Machine Learning and Data Science, to ask whether our group would be interested in doing a sprint. You were working on a grant proposal to the NSF to fund the sprint for my meetup group. Where did you get the idea to submit a grant proposal to increase women’s participation in open source?

AM) It was part of a bigger grant submission to the NSF. It is very obvious that in academia, and in computer science in particular, there are very few women; there is gender bias. This is apparent at conferences, where there are noticeably few women. Unfortunately, the gender imbalance in open source is even worse, and in academic open source, women’s participation is even lower. There is only one woman among the top 100 contributors to the scikit-learn library. Fortunately, there are lots of funding agencies that are happy to fund diversity and research.

RS) What are your long-term goals for increasing women’s participation in open source?

AM) My goal is to have more women actively involved in scikit-learn. Right now, there are 1 or 2, so any number greater than that is progress. Ultimately, we would like to have more women involved in other central Python open source projects, such as NumPy, matplotlib and Jupyter.

RS) What do you think women bring to open source that is missing?

AM) This is a complicated question, and I want to avoid generalizations, such as claims that one gender does something that another doesn’t. My ultimate goal is to make sure that everyone in the community participates. Since both men and women use open source, it would be beneficial for the entire ecosystem if both men and women were contributors.

RS) Why do you think women are not as involved in open source?

AM) There could be a number of hypotheses. Maybe we are just so unfriendly to women that they drop out; I don’t think that’s it, though. Gender disparity is a substantial problem elsewhere in tech, too. It’s possible that it is a funnel problem, where women do not get the opportunity to start being involved. A female friend of mine, a high-profile machine learning researcher, told me she was anxious about posting on the scikit-learn issue tracker. We need to find these barriers and remove them.

RS) Why is contributing to open source so important?

AM) This is an easier question. Many tech applications and a great deal of research are built on open source software. Basically the whole internet runs on Linux, and that is open source.
Some software projects receive corporate funding, either as money or as developer time; this is very true for the Apache ecosystem. In contrast, the scientific Python ecosystem, as well as other scientific programming languages such as R and Julia, is mostly the product of volunteer labor. Most scientific packages don’t have support from industry at all. There are so many people (including students and self-learners) who would not be able to do their work without it. Accessibility to open source is fundamental for education and research. This accessibility creates opportunities for users and has led to profound advances in many sectors of society. The startup community has flourished as a result of this access.

RS) How does one get involved in contributing to open source?

AM) People can reach out to a project on its mailing list. Projects have guidelines on how to contribute and how to get started, and newcomers can also sign up for the mailing list. There is an issue tracker on GitHub that lists things people can work on, such as fixing a bug or making a small addition. The entire process of submitting a contribution might seem complicated, so my advice is: start small, and then move on to more interesting stuff. Small contributions really help. Details here: http://scikit-learn.org/dev/developers/contributing.html

RS) What are other open source projects?

AM) Other open source Python data science projects include NumPy, matplotlib, Jupyter, pandas and SciPy. More details can be found at scikit-learn.org.
*Both Reshama and Andreas will be attending MLconf NYC on Friday, March 24th. Andreas will be discussing scikit-learn and his O’Reilly book at a table in the networking space during the conference. Mention “Andreas18” and save 18% on a ticket to the event!

About Andreas Mueller

Andreas Mueller is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python”, which describes a practical approach to machine learning with Python and scikit-learn. Dr. Mueller is one of the core developers of the scikit-learn machine learning library and has been co-maintaining it for several years. Dr. Mueller is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon.

Call for Speakers: 2017 Events

MLconf is calling for speakers for 2017 events!

Algorithms that have graduated from academic conferences such as NIPS and ICML and have proven to be effective, robust and scalable in production
Novel data science practices, such as data transformations, new data sources, novel representations, etc.
Machine Learning/AI case studies (lessons learned) demonstrating challenges in the wild and new ways to handle them
New platforms and tools for machine learning, with emphasis on the technical challenges, benchmarks and motivation for their development
New business practices for managing and growing data science teams and expanding machine learning to new domains
Tutorials and novel ways of presenting and simplifying machine learning domains, including deep learning, kernel methods, Bayesian nonparametrics, tensor algebra, etc.
Up-and-coming areas of machine learning that you think will dominate the industry in the future, such as probabilistic programming

Topics we’re looking for:

Natural Language Processing
Deep Learning
Reinforcement Learning
Neural Turing Machines/Neural Theorem Provers
Generative Models
Probabilistic Programming
Probabilistic Logic and Reasoning
Chatbots/All Bots
Bayesian Inference
One Shot Learning
Markov Logic Networks
Structure Learning
Synthetic Art, Biology
Ethics in Machine Learning
Sketching Randomized Algorithms
Game Theory
Community Detection
Large-Scale Clustering
Time Series
Image Analysis
Bayesian Non-Parametrics
Topic Models

Submit Abstract – The deadline schedule is as follows:

MLconf NYC – 01/31/2017 Submission Deadline
MLconf SEA – 03/01/2017 Submission Deadline
MLconf ATL – 06/01/2017 Submission Deadline
MLconf SF – 07/01/2017 Submission Deadline

Presentations will be selected based on:

Clarity and novelty of the presentation
Diversity of the topics for the conference
Speaker’s experience in the industry

Guest Blog Post: "Yes, Virginia, You Can Address Causality in Your Dataset: Alex Dimakis on Causality"

MLconf SF 2016 featured many wonderful speakers who took us on a tour of the business-sensitive aspects of modeling Big Data: choosing the right problem to solve; recognizing that it’s not always about deep learning, since logistic regression, SVMs and gradient boosting still play a major role in solving problems and deriving insights; and accepting that many of the best business solutions require different approaches for different aspects of a problem. Causality is an elusive quality of the factors in a model, yet it speaks to many business goals and to the core reasons for starting a query, solving a problem or creating a model. At the most recent MLconf in San Francisco, Alex Dimakis outlined one computational framework for ascertaining causality.
What is Causal Inference?
There are many frameworks for investigating causality, including philosophical, statistical and machine learning ones. A few of the most popular are Granger causality (used widely in neuroscience and economics), Hume’s counterfactual framework, and Pearl’s structural equation causality (which grew out of Fisher’s and Wright’s work on genetic heredity), which uses graphs to represent our understanding of causal relationships. The variety of frameworks partly reflects the fact that causality is complex: evidence for a cause can differ across contexts, and some causes are deterministic while others are probabilistic.
Statistically, causation has a history tied to model interpretability and model construction. Causal interest in a model relates to the mechanism, that is, to understanding how the mechanism gives rise to the phenomenon under study. This requires defining what we mean by mechanism, and understanding what we know and how we know it, in order to ascertain whether two observations are causally related.

In statistical causality, we try to address the question of whether or not there is a dependency between two variables. Correlations are usually the first blunt tool used to assess whether two variables are related, and many would argue that with sufficient amounts of data, causation isn’t necessary, because vast amounts of data (petabytes, etc.) increase the reliability of correlations while drowning out noise. So why would we care about ascertaining causality when we have so much data?
One motivation for causal analyses is that while correlations show how two variables increase or decrease together, sometimes that is all the two variables are: two time series with similar trends. In decision-making, we do not want to rely on two things that correlate by chance or share similar trends but are causally unrelated. A recent example of this effect is Google Flu Trends, whose estimates, based on flu-related Google searches, kept pace with the Centers for Disease Control and Prevention (CDC) estimates of flu incidence. Flu incidence and flu-related searches were correlated and seemed to predict one another, until they didn’t. There could be many reasons for this, such as searches being magnified by friends, neighbors and relatives looking for information about someone else’s flu, or the impact of flu news coverage on search queries. We know that the Google search predictor of flu incidence is not causal, and it was useful for a time (not all predictors need to be causal in order to be useful), but in an ideal world we would understand the causal factors in order to make decisions.
In assessing independence of observations, we ordinarily consider two uncorrelated variables to be independent of each other (orthogonal). This assumption holds when the variables are jointly Gaussian, but in general zero correlation does not imply independence, and correlation is not a natural tool for categorical or event-based observations such as “having a cancer diagnosis”, “buying a coat” or “starting a war.” With these variables, you must consider the joint probability of the two observations co-occurring, which can be arranged in a contingency table. Two variables are independent exactly when this joint probability table is rank one, that is, when it equals the product of the two marginal distributions. To check for independence, one compares the observed rates at which the two events occur together with the rates expected if they were generated independently from their marginals. If the contingency table is well approximated by that product, the variables are independent; if not, the events may have a dependent relationship that ought to be explored.
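The Google Flu Trends data are not reproduced here, but the underlying trap is easy to demonstrate. The sketch below is a hedged illustration on simulated data (the trend slopes, noise level, series length and seed are arbitrary assumptions): two series generated independently, sharing nothing but an upward trend, show a correlation close to 1, while their first differences, which strip out the trend, are essentially uncorrelated.

```python
import numpy as np

def trend_correlations(n=1000, seed=0):
    """Correlation of two independently generated trending series,
    before and after differencing.

    Hypothetical data: the series share only an upward time trend; their
    noise terms are independent draws, so neither influences the other.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    x = 0.5 * t + rng.normal(0.0, 1.0, n)
    y = 0.3 * t + rng.normal(0.0, 1.0, n)
    level_corr = np.corrcoef(x, y)[0, 1]
    # First differences remove the shared trend, leaving only independent noise.
    diff_corr = np.corrcoef(np.diff(x), np.diff(y))[0, 1]
    return level_corr, diff_corr

level_corr, diff_corr = trend_correlations()
print(f"levels: {level_corr:.3f}, differences: {diff_corr:.3f}")
```

Differencing is only one common remedy for shared trends; a near-zero correlation after differencing does not by itself rule out a causal link, and a strong one would not establish it.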
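A minimal sketch of the rank-one view, assuming 2x2 tables of hypothetical counts (rows: event A no/yes, columns: event B no/yes); the function names are illustrative. It builds the table expected under independence from the two marginals, measures the gap with Pearson’s chi-squared statistic, and checks how close the observed table is to rank one via its singular values.

```python
import numpy as np

def expected_under_independence(table):
    """Expected counts if rows and columns were independent:
    N times the outer product of the two marginal distributions (a rank-one table).
    Assumes every row and column has at least one observation."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    row_marginal = table.sum(axis=1) / n
    col_marginal = table.sum(axis=0) / n
    return n * np.outer(row_marginal, col_marginal)

def chi_squared_statistic(table):
    """Pearson's chi-squared: sum of (observed - expected)^2 / expected."""
    table = np.asarray(table, dtype=float)
    expected = expected_under_independence(table)
    return float(((table - expected) ** 2 / expected).sum())

def singular_value_ratio(table):
    """Second / first singular value; 0 exactly when the table is rank one."""
    s = np.linalg.svd(np.asarray(table, dtype=float), compute_uv=False)
    return float(s[1] / s[0])

independent = [[120, 180], [280, 420]]  # exactly the product of its margins
dependent = [[300, 0], [0, 700]]         # A and B always co-occur
for name, t in [("independent", independent), ("dependent", dependent)]:
    print(name, chi_squared_statistic(t), singular_value_ratio(t))
```

In practice, `scipy.stats.chi2_contingency` computes this statistic together with a p-value (applying Yates’ continuity correction to 2x2 tables by default). With real, noisy counts, the second singular value of an independent table is small rather than exactly zero, which is why a significance test is needed.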
Exploiting Conditional Dependency Graphs in order to infer causality: Interventions
Alex Dimakis introduces Pearl’s structural equation framework for causal reasoning by first introducing directed graphs, which can be used to represent conditional dependencies. For example, A -> B -> C represents variables A and C that are related to each other only through B: A and C are dependent, but the dependence is mediated by B, so A and C become independent once we condition on B. Learning a directed graphical model is a first step to learning the causal relationships among variables. With enough data, one can learn all of the conditional independencies in the graph and then query the graph about the directions of effects. For Pearl, the central question is the direction of effects: did A cause B, did B cause A, or are the two variables unrelated? However, the joint probability distribution of two random variables can be factorized in both ways, as P(A)P(B|A) or as P(B)P(A|B), so the same distribution is consistent both with a world in which A causes B and with one in which B causes A. Joint probability distributions alone are therefore not enough, and we need to intervene. An intervention can be a control group, a known precondition, or a temporal factor that appears or is forced upon the observations before the event. If A has no causal effect on B, then forcing A to a value (for example, by random assignment) leaves B unchanged, so the intervened cases will not differ from the other cases in the incidence of B. If you do observe a difference, then A causally affects B, because random assignment rules out the two merely co-occurring.
One way to learn the causal directions in a complex graph is to start from its skeleton graph (the undirected version of the causal graph) and intervene on a set of variables (see graph representation). Intervening on a set of variables reveals the directions of the edges that connect the intervened variables to variables that were not intervened on. After learning these directions, one can design another intervention to learn more directions of the graph. There are two ways to proceed: either decide a priori in which order the interventions will be done, or choose each intervention adaptively depending on the outcome of previous ones (possibly using randomized adaptive interventions). This is an active area of research, including work in Alex’s group, that measures how efficiently causal directions can be learned. Some groups have found that fixed interventions are enough to learn everything the skeleton graph allows about causal directions, with adaptive interventions adding no further information, but there is some evidence that randomized adaptive interventions can learn the directions with fewer interventions.
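A small simulation makes the asymmetry concrete. The sketch below assumes a hypothetical structural causal model in which A causes B (A is a fair coin, and B copies A but is flipped 10% of the time); the function names are illustrative. Observationally, P(B=1 | A=1) and P(A=1 | B=1) are both about 0.9, so the joint distribution alone cannot reveal the direction. Forcing A (an intervention) shifts B, while forcing B leaves A untouched.

```python
import numpy as np

def sample(n, rng, do_a=None, do_b=None):
    """Draw n samples from the hypothetical model A -> B.

    A := Bernoulli(0.5); B := A XOR Bernoulli(0.1).
    do_a / do_b force a variable to a fixed value (an intervention),
    cutting it off from its usual causes.
    """
    a = rng.random(n) < 0.5
    if do_a is not None:
        a = np.full(n, bool(do_a))
    b = a ^ (rng.random(n) < 0.1)   # B depends on A, plus independent noise
    if do_b is not None:
        b = np.full(n, bool(do_b))  # forcing B does not feed back into A
    return a, b

def compare(n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    a, b = sample(n, rng)
    return {
        # Observational conditionals look symmetric (both near 0.9).
        "p_b_given_a": b[a].mean(),
        "p_a_given_b": a[b].mean(),
        # Interventional contrasts reveal the direction.
        "effect_a_on_b": sample(n, rng, do_a=True)[1].mean()
                         - sample(n, rng, do_a=False)[1].mean(),
        "effect_b_on_a": sample(n, rng, do_b=True)[0].mean()
                         - sample(n, rng, do_b=False)[0].mean(),
    }

print(compare())
```

In expectation, the effect of A on B is 0.9 - 0.1 = 0.8, and the effect of B on A is 0; the simulated values differ from these only by sampling noise.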
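The fixed, a priori strategy can be sketched in a few lines. The sketch is hypothetical: it takes a made-up true DAG as given purely to simulate what each experiment would reveal, and it assumes that intervening on a set S reveals the direction of every edge with one endpoint inside S and one outside. Labeling the n variables with ⌈log2 n⌉-bit binary codes and intervening, for each bit, on the variables whose code has that bit set guarantees that every pair of variables is separated by at least one intervention, so every edge gets oriented. This is a classic separating-system construction, not necessarily the design studied in Alex’s group, and it ignores practical limits such as how many variables one experiment may intervene on.

```python
from math import ceil, log2

def oriented_by_intervention(dag_edges, intervened):
    """Directed edges revealed by intervening on the set `intervened`.

    Assumption: an intervention on S reveals the direction of every
    edge crossing the cut between S and the remaining variables.
    """
    return {(u, v) for (u, v) in dag_edges if (u in intervened) != (v in intervened)}

def separating_interventions(nodes):
    """Non-adaptive design: intervention i holds the nodes whose index has bit i set."""
    nodes = list(nodes)
    k = max(1, ceil(log2(len(nodes))))
    return [{node for idx, node in enumerate(nodes) if (idx >> bit) & 1}
            for bit in range(k)]

# Hypothetical ground-truth DAG; in practice we only start from its skeleton.
dag = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
design = separating_interventions(["A", "B", "C", "D"])
learned = set()
for s in design:
    learned |= oriented_by_intervention(dag, s)
print(design, learned == dag)
```

An adaptive strategy would instead look at `learned` after each experiment and choose the next set S only among variables whose incident edges are still unoriented.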

The slides of Alex’s talk are available here and you can watch his full talk here. Dimakis recommends a course by Richard Scheines available online free of charge, here. If you are interested in learning more about this topic, Samantha Kleinberg, who spoke at MLconf NYC this year, has a book on the subject “Why: A Guide to Finding and Using Causes” and a video talk available here. MLconf has a code you can use to get this book with O’Reilly.

About the blogger: Ana Maria Fernandez is a PhD Candidate in Clinical and Molecular Medicine at the University of Edinburgh and an active member of the MLconf community.

The 2016 Winner of the MLconf Industry Impact Student Research Award, Sponsored by Google, Has Been Announced!

The winner of the 2016 MLconf Industry Impact Student Research Award, which is sponsored by Google, has been announced. Our committee reviewed several nominees and found Tianqi Chen’s research on XGBoost and MXNet to be the most impactful and interesting for future developments in industry.
Tianqi Chen is the winner of the 2016 MLconf Industry Impact Student Research Award! This announcement was made on Friday, November 11th, 2016 in San Francisco. Tianqi accepted via a video acceptance speech (available here).
In 2015, there were two winners of the award: Virginia Smith (UC Berkeley), who presented on November 11, 2016 at MLconf SF, and Furong Huang (UC Irvine), who presented at MLconf NYC in April 2016.
Tianqi has been invited to present his work on XGBoost in Seattle at MLconf in 2017. His advisor, Dr. Carlos Guestrin, has presented at MLconf numerous times as well.
Tianqi works at the intersection of learning and systems and has built many scalable learning systems. His research focuses on scalable boosted trees, including the XGBoost package, which is widely used in competitive ML and in industry for supervised learning problems (training on data to predict a target variable) because it provides parallelized boosted trees that are efficient and accurate. XGBoost runs in many distributed production environments, such as Hadoop, MPI, SGE, Flink and Spark, and is available in many languages, such as Python, R, Julia, Java and Scala. The framework constructs tree ensembles. Because it is hard to train all the trees at once, XGBoost uses an additive model: it trains one tree, uses the information from it, and adds another tree that improves on the current ensemble. The model is also regularized: the complexity of each tree is defined explicitly and included in the training objective, which helps control and understand what is being learned. Regularization is a part that most tree packages treat less carefully, or ignore, because the traditional treatment of tree learning emphasized improving impurity while leaving complexity control to heuristics. By defining complexity formally, we understand it better, and it works well in practice. From this objective, one can derive a structure score that measures the goodness of fit of a tree structure in the ensemble.
Tianqi is also well known for his work on MXNet. The name MXNet stands for “mix and maximize”, and at its core is a dynamic dependency scheduler that automatically parallelizes both declarative (symbolic) and imperative operations. The heart of MXNet is NNVM, an intermediate representation layer analogous to LLVM. The NNVM abstraction enables several just-in-time code optimizations that significantly boost performance. MXNet is widely recognized as a competitor to TensorFlow, not least because Amazon has invested heavily in it.
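The regularized objective can be made concrete with a small from-scratch sketch (this is not XGBoost’s API). Following the formulation in the XGBoost documentation, each leaf j collects the sums G_j and H_j of the first and second derivatives of the loss, tree complexity is Ω = γT + ½λ Σ w_j² for T leaves with weights w_j, the optimal leaf weight is w_j* = -G_j / (H_j + λ), and the structure score is -½ Σ G_j² / (H_j + λ) + γT. The sketch assumes squared-error loss, a single feature and depth-one trees (stumps); the function names and the toy data are illustrative.

```python
import numpy as np

def grad_hess(y, y_pred):
    """Derivatives of the squared error 0.5 * (y_pred - y)**2 w.r.t. y_pred."""
    return y_pred - y, np.ones_like(y_pred)

def leaf_weight(g, h, lam):
    # Optimal leaf weight: w* = -G / (H + lambda)
    return -g.sum() / (h.sum() + lam)

def structure_score(leaves, lam, gamma):
    """-1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T over (g, h) leaves; lower is better."""
    total = sum(g.sum() ** 2 / (h.sum() + lam) for g, h in leaves)
    return -0.5 * total + gamma * len(leaves)

def split_gain(g, h, goes_left, lam, gamma):
    """Reduction in structure score from splitting one leaf in two, minus gamma."""
    parent = structure_score([(g, h)], lam, 0.0)
    children = structure_score(
        [(g[goes_left], h[goes_left]), (g[~goes_left], h[~goes_left])], lam, 0.0)
    return parent - children - gamma

def fit_stump(x, g, h, lam, gamma):
    """Exact greedy search over thresholds on one feature; None if no split has positive gain."""
    best = None
    for t in np.unique(x)[1:]:
        left = x < t
        gain = split_gain(g, h, left, lam, gamma)
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, t, leaf_weight(g[left], h[left], lam),
                    leaf_weight(g[~left], h[~left], lam))
    return None if best is None else best[1:]

def boost(x, y, rounds=20, eta=0.3, lam=1.0, gamma=0.0):
    """Additive training: each round fits a new stump to the current gradients."""
    y_pred = np.zeros(len(y))
    for _ in range(rounds):
        g, h = grad_hess(y, y_pred)
        stump = fit_stump(x, g, h, lam, gamma)
        if stump is None:
            break
        t, w_left, w_right = stump
        y_pred = y_pred + eta * np.where(x < t, w_left, w_right)  # shrink each tree by eta
    return y_pred

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 3.0, 3.0])
print(boost(x, y))
```

Raising `gamma` makes the complexity penalty reject splits whose gain is too small, which is how the formal regularizer prunes trees. Real XGBoost adds many features on top (multiple features, deeper trees, approximate and histogram-based split finding, parallelism); the sketch only shows how the additive loop, the regularized leaf weights and the gain-based split choice fit together.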
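The idea behind a dependency scheduler can be illustrated with a toy, static version. The sketch below is hypothetical and is not MXNet’s engine: each operation declares the variables it reads and writes, an operation must wait for any earlier operation that writes what it reads or writes, or that reads what it writes, and operations placed in the same “wave” have no such conflicts and could run in parallel.

```python
def schedule_waves(ops):
    """ops: list of (name, reads, writes) in program order.

    Returns a list of waves; operations within a wave have no
    read/write conflicts with each other and could run in parallel.
    """
    wave_of = []
    for j, (_, reads_j, writes_j) in enumerate(ops):
        wave = 0
        for i in range(j):
            _, reads_i, writes_i = ops[i]
            # Read-after-write, write-after-write, or write-after-read conflict.
            if (writes_i & (reads_j | writes_j)) or (reads_i & writes_j):
                wave = max(wave, wave_of[i] + 1)
        wave_of.append(wave)
    waves = [[] for _ in range(max(wave_of, default=-1) + 1)]
    for (name, _, _), w in zip(ops, wave_of):
        waves[w].append(name)
    return waves

# Hypothetical program mixing pure (declarative-style) and in-place (imperative) ops.
program = [
    ("a = load()", set(), {"a"}),
    ("b = load()", set(), {"b"}),
    ("c = a + b", {"a", "b"}, {"c"}),
    ("d = a * 2", {"a"}, {"d"}),
    ("a += 1", {"a"}, {"a"}),  # must wait until both reads of a are done
]
for i, wave in enumerate(schedule_waves(program)):
    print(i, wave)
```

Tracking write-after-read conflicts (the in-place update of `a` must wait for the two operations that read it) is what lets a scheduler like this mix in-place updates with other computation safely; a real engine would track these dependencies dynamically as operations arrive rather than over a fixed list.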

Bio:
Tianqi holds a bachelor’s degree in Computer Science from Shanghai Jiao Tong University (SJTU), where he was a member of the ACM Class, now part of SJTU’s Zhiyuan College. He did his master’s degree at Shanghai Jiao Tong University in the Apex Data and Knowledge Management Lab before joining the University of Washington as a PhD student. He has held several prestigious internships and visiting scholar positions, including with the Brain Team at Google, at GraphLab (where he authored the boosted tree and neural net toolkits), in the Machine Learning Group at Microsoft Research Asia, and at the Digital Enterprise Institute in Galway, Ireland. What really excites Tianqi is the processes and goals that can be enabled when advanced learning techniques and systems are brought together. He pushes the envelope on deep learning, knowledge transfer and lifelong learning. His PhD is supported by a Google PhD Fellowship.

MLconf SF 2016 Speaker Resources

We recently asked the speakers of MLconf San Francisco 2016 to share their favorite articles, books & papers with the MLconf audience. We hope you find this list interesting and educational!

Daria Sorokina, Applied Scientist, A9 (Amazon)

Amazon Search: The Joy of Ranking Products

Stephanie deWet, Software Engineer, Pinterest

Yunsong Guo. Pinnability: Machine Learning in the Pinterest Home Feed. https://engineering.pinterest.com/blog/pinnability-machine-learning-home-feed
Deepak Agarwal, Bee-Chung Chen, Rupesh Gupta, Joshua Hartman, Qi He, Anand Iyer, Sumanth Kolar, Yiming Ma, Pannagadatta Shivaswamy, Ajit Singh, and Liang Zhang. 2014. Activity ranking in LinkedIn feed. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’14). ACM, New York, NY, USA, 1603-1612. DOI: http://dx.doi.org/10.1145/2623330.2623362
Hao Ma, Xueqing Liu, and Zhihong Shen. 2016. User Fatigue in Online News Recommendation. In Proceedings of the 25th International Conference on World Wide Web (WWW ’16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1363-1372. DOI: http://dx.doi.org/10.1145/2872427.2874813
Ewa Dominowska. Generating a Billion Personal News Feeds. MLconf SEA 2016. Live talk. https://www.youtube.com/watch?v=iXKR3HE-m8c

Virginia Smith, Researcher, UC Berkeley

CoCoA: A General Framework for Communication-Efficient Distributed Optimization. V. Smith, S. Forte, C. Ma, M. Takac, M. I. Jordan, M. Jaggi. Preprint, 2016. https://arxiv.org/abs/1611.02189.
Adding vs. Averaging in Distributed Primal-Dual Optimization. C. Ma, V. Smith, M. Jaggi, M. I. Jordan, P. Richtarik, M. Takac. International Conference on Machine Learning (ICML ’15). https://arxiv.org/abs/1502.03508.
Communication-Efficient Distributed Dual Coordinate Ascent. M. Jaggi, V. Smith, M. Takac, J. Terhorst, S. Krishnan, T. Hofmann, M. I. Jordan. Neural Information Processing Systems (NIPS ’14). https://arxiv.org/abs/1409.1458.

Guy Lebanon, Director of Machine Learning & Data Science, Netflix

Blog Post: Selecting the best artwork for videos through A/B testing

Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineering, University of Texas at Austin

Software and datasets:

Tuebingen Benchmark:
https://webdav.tuebingen.mpg.de/cause-effect/
Tetrad project:
http://www.phil.cmu.edu/projects/tetrad/
Entropic Causality:
https://github.com/mkocaoglu/Entropic-Causality

Video Tutorials:

https://www.youtube.com/watch?v=9yEYZURoE3Y&feature=youtu.be
CCD Summer Short Course 2016
CMU Center for Causal Discovery short course on Causality and Tetrad.
https://www.youtube.com/watch?v=PpY7Slo57XQ
Tutorial: All of Causal Discovery (by Frederick Eberhardt)

Books and Papers:

P. Spirtes, C. Glymour and R. Scheines, Causation, Prediction, and Search. Bradford Books, 2001.
https://www.amazon.com/Causation-Prediction-Adaptive-Computation-Learning/dp/0262194406
Causality by J. Pearl
Cambridge University Press, 2009.
https://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X
Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction,
G. Imbens and D. Rubin
https://www.amazon.com/Causal-Inference-Statistics-Biomedical-Sciences/dp/0521885884
Jonas Peters, Peter Buehlmann and Nicolai Meinshausen (2016)
Causal inference using invariant prediction: identification and confidence intervals
Journal of the Royal Statistical Society, Series B
https://www.statslife.org.uk/files/rss-preprint-causal-inference-may-2016.pdf
Frederick Eberhardt, Clark Glymour, and Richard Scheines.
On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables.
http://www.jmlr.org/proceedings/papers/v6/eberhardt10a/eberhardt10a.pdf
Alain Hauser and Peter Buhlmann. Two optimal strategies for active learning of causal models from interventional data.
International Journal of Approximate Reasoning, 55(4):926–939, 2014.
http://leo.ugr.es/pgm2012/submissions/pgm2012_submission_11.pdf
Learning Causal Graphs with Small Interventions
K. Shanmugam, M. Kocaoglu, A.G. Dimakis, S. Vishwanath (NIPS 2015)
https://papers.nips.cc/paper/5909-learning-causal-graphs-with-small-interventions.pdf
Nonlinear causal discovery with additive noise models,
Patrik O Hoyer, Dominik Janzing, Joris M Mooij, Jonas Peters, Bernhard Scholkopf (NIPS 2008)
http://is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/NIPS2008-Hoyer-neu_5406[0].pdf

Daniel Shank, Data Scientist, Talla

Implementations:

TensorFlow: https://github.com/carpedm20/NTM-tensorflow
Go: https://github.com/fumin/ntm
Torch: https://github.com/kaishengtai/torch-ntm
Node.js: https://github.com/gcgibson/NTM
Lasagne: https://github.com/snipsco/ntm-lasagne
Theano: https://github.com/shawntan/neural-turing-machines

Papers:

Graves et al. 2016 – Hybrid computing using a neural network with dynamic external memory
Graves et al. 2014 – Neural Turing Machines
Yu et al. 2015 – Empirical Study on Deep Learning Models for Question Answering
Rae et al. 2016 – Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

Harm van Seijen, Research Scientist, Maluuba

Code Examples:

Simple DQN Example In Python:
https://edersantana.github.io/articles/keras_rl/
Tool For Testing/Developing RL Algorithms:
https://gym.openai.com/

Books On Display At MLconf San Francisco

Morgan & Claypool:

Use “MLCON16” to save 20% through December 31st, 2016: http://store.morganclaypool.com
Cambridge University Press:

Additional Machine Learning & Artificial Intelligence Books on Display