We’ve been quite busy preparing for MLconf Seattle, now just 3 ½ weeks away, but I wanted to take a few moments to share the highlights from our 3rd MLconf NYC, which took place on April 15th. This year, 400 people gathered at 230 5th Avenue in Midtown NY. We were quite pleased that the AV and video quality turned out great, and everything seemed to flow smoothly except for some glitches here and there. Lunch lines were a bit too long, but thanks to the heroic efforts of the crew the problem was resolved. Attendees were excited that we had more than 100 books to give away, thanks to our generous publishers. The general consensus was that attendees had a great time, and the presentation room stayed full through to the last presentation. The floating photographer caught many nice moments, which you can find on our Facebook page here.
EEG technology seems to be a sweet spot: last year Ted Willke and Irina Rish astounded the crowd with their stories, and this time it was Jennifer Marsman’s turn to excite the audience with her energetic presentation and futuristic EEG device. Braxton McKee’s talk, about how he envisions scalable platforms with more automation and help from compilers and program analysis, sparked discussions in the hallways; this MLconf raised a healthy bit of controversy and debate during the breaks.
One of the core values of MLconf has always been algorithms. We are always looking into new applications and platforms, but algorithms remain an audience favorite. This year it was online algorithms’ turn, in the keynote presentation by Yahoo’s Research Director, Edo Liberty. Sergei Vassilvitskii, Research Scientist at Google, presented on the good old k-means algorithm and recent advances there. Along the same lines, Samantha Kleinberg’s time-series talk, about finding causes rather than correlations when the data are highly noisy and often missing, captivated the audience early on. And that was just the start: Furong Huang’s tensors, Yael Elmatad’s stable marriages, and more kept the spirit going.
But machine learning is not only theory and algorithms. This time, speakers provided their favorite papers and their GitHub repositories, listed here. We’re also creating a new X-ray quiz that will contain all the essentials and key points of MLconf NYC 2016! We’re hoping this “Lessons Learned” tool will aid in content retention for attendees following each conference. Details to follow soon.
– Nikolaos Vasiloglou, Technical Chair, MLconf
MLconf NYC Speaker Suggested Papers
We recently asked the speakers of MLconf NYC 2016 to share their favorite papers with the MLconf audience. We hope you find this list interesting and educational!
Kaheer Suleman, CTO, Maluuba
Pointer Networks
Oriol Vinyals, Meire Fortunato, Navdeep Jaitly
http://arxiv.org/abs/1506.03134
Grammar as a Foreign Language
Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton
http://arxiv.org/abs/1412.7449
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau
http://arxiv.org/abs/1507.04808
Skip-Thought Vectors
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler
http://arxiv.org/abs/1506.06726
Minimally Constrained Multilingual Word Embeddings via Artificial Code Switching, Michael Wick, Pallika Kanani, Adam Pocock
https://blogs.oracle.com/IRML/entry/minimally_constrained_word_embeddings_via
Samantha Kleinberg, Assistant Professor of Computer Science, Stevens Institute of Technology
Deming, data and observational studies
Young, S. Stanley, and Alan Karr, Significance 8.3 (2011): 116-120.
Homophily and contagion are generically confounded in observational social network studies
Shalizi, Cosma Rohilla, and Andrew C. Thomas., Sociological methods & research 40.2 (2011): 211-239.
How to grow a mind: Statistics, structure, and abstraction
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011), Science 331(6022):1279–1285.
(computational models of cognition, which give some inspiration to machine learning methods)
Personalized nutrition by prediction of glycemic responses
Zeevi, David, et al. Cell 163.5 (2015): 1079-1094.
Ike Nassi, Founder, TidalScale
Computing Marginals Using MapReduce
Ullman et al.
http://arxiv.org/abs/1509.08855
Framework for an In-depth Comparison of Scale-up and Scale-out
Sevilla, Ioannidou, Nassi, et al.
https://issdm.soe.ucsc.edu/sites/default/files/sevilla-discs13.pdf
Lei Yang, Senior Engineering Manager, Quora
Hidden technical debt in machine learning systems
Mastering the game of Go with deep neural networks and tree search
Predictability of popularity
Jennifer Marsman, Principal Developer Evangelist, Microsoft
Paper behind the Emotion Detection API in Microsoft Cognitive Services
Deep Neural Decision Forests. [Winner of the David Marr Prize 2015]
Microsoft Research publications
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Prediction Machine Learning team, Criteo
Simple and scalable response prediction for display advertising
O. Chapelle et al.
One-Pass Ranking Models for Low-Latency Product Recommendations
A. Freno et al.
Ad Click Prediction: a View from the Trenches
H. B. McMahan et al.
Yael Elmatad, Senior Data Scientist, Tapad
Finding Connected Components in Map-Reduce in Logarithmic Rounds
http://arxiv.org/pdf/1203.5387.pdf
College Admissions and the Stability of Marriage
Gale, D.; Shapley, L. S. (1962). American Mathematical Monthly 69: 9–14. doi:10.2307/2312726. JSTOR 2312726.
MLconf in 30 minutes!
You now have the opportunity to go through MLconf content in less than 30 minutes. We created an interactive quiz with the most relevant questions from the presentations at MLconf SF. Take a guess at the answer, and we will give you feedback by pointing you to the video/slides snippet that contains it. This is a great opportunity for those of you who couldn’t attend to get an X-ray of the conference. And if you did attend and want to test your comprehension, taking the quiz is the way to go.
For a limited time, we’re offering the X-ray of 2015 MLconf San Francisco for $25. Click here to start!
Interview with Ike Nassi from TidalScale
I came across TidalScale two years ago and was very impressed with their vision of synthesizing very large shared-memory virtual machines from commodity servers. A product like that would eliminate the need to build distributed software. Let’s see what they had to say in this interview.
What was the motivation for starting TidalScale? What was the gap that you found in the market?
Let me answer that from two different directions. The first is that I was the chief scientist at SAP, and we had developed this product called SAP HANA. One of the things that I observed, and had to convince people at SAP of, was that if you’re interested in in-memory databases, you need a lot of memory! It’s just that simple. So before I left SAP I wound up building around 15 large-memory systems. After I left SAP I became a professor, and for a couple of months I didn’t think too much more about that problem, and then one day I’m thinking about it more and more and I’m saying, “You know, my observations were the right observations, my instincts were correct,” and so I started TidalScale. So that’s one answer: I thought in-memory computing and in-memory databases for big data and data science were absolutely crucial, and if you can do it without having to modify your database or software, that’s even better. The second thing, at a more fundamental level, is that processor core densities have been going up quite nicely over the years, and memory density is also going up, but not at the same rate. The core densities (the number of cores on a chip) are increasing at a much greater rate than the memory densities. What I realized was that the ratio between the two has been going down, and that’s not what people want, especially for enterprise software applications. They want more control over that ratio, and they don’t want it to go down; at best they want it to stay the same or even go up. And the reason you can’t just keep adding more and more memory is a function of the pin count on the processors. These processors are getting more and more pins: you need pins to communicate with other processors, to address memory, to transfer data on the data buses. The results, as I’m sure you know, have been staggering. We have one benchmark, well, not a benchmark but a real customer workload, where one of our first beta customers saw a 60x performance improvement the very first time they tried our software.
In what sense was it 60x? What was their system benchmarked against?
It was MySQL. It was three SQL queries on a large MySQL database, and what we found, which should not be surprising, is that if you configure MySQL with a large InnoDB cache, you have basically converted MySQL into an in-memory database, and so you can reduce the amount of paging you have to do.
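To make that point concrete, here is a minimal sketch, assuming a reachable MySQL 5.7+ server and the mysql-connector-python package; the host, credentials, and the 16 GB figure below are placeholders we made up, not the customer’s actual setup. Sizing the InnoDB buffer pool so the hot tables fit in RAM is what turns MySQL into an effectively in-memory database:

```python
# A minimal sketch (assumed setup: a reachable MySQL server and the
# mysql-connector-python package; credentials below are placeholders).
import mysql.connector

conn = mysql.connector.connect(user="admin", password="secret",
                               host="db.example.com")
cur = conn.cursor()

# Check the current buffer pool size (in bytes).
cur.execute("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
print(cur.fetchone())

# Raise it to 16 GB so the working tables stay resident in RAM.
# (Requires privileges; resizable online only in MySQL >= 5.7.)
cur.execute("SET GLOBAL innodb_buffer_pool_size = %d" % (16 * 1024**3))
conn.close()
```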
What is the main idea behind the TidalScale product, and what is the science behind it?
Back in 1968, Peter Denning wrote a paper in the Communications of the ACM in which he defined the term ‘working set’. A working set at the time was just memory, but it had a profound effect if you used working sets to help schedule processes in computer systems. If you could keep track of the memory working sets, you had the ability to anticipate which pages a processor would probably need, and if you could guarantee those pages were in memory, it could speed things up quite a lot. So what we did at TidalScale was virtualize not only memory but all of the resources in the system: the processors, the memory, the Ethernet, the disks, the storage controllers. We basically virtualized everything, and then we did something that nobody else is doing. We built in the code to do dynamic migration not only of memory but of processors as well. So if you have a processor that is trying to access a page that is not local to it, we can either move the processor or move the memory, and we can make that choice dynamically, in microseconds. If we do the job of managing these working sets, then there’s no network traffic on the interconnect and the machine works at speed, and we do that in such a way that it’s compatible with everything.
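For readers unfamiliar with Denning’s model, here is a minimal sketch in Python of the working set W(t, τ): the set of distinct pages a process referenced in its last τ memory accesses. The reference string below is made up for illustration; the intuition is that if a system can keep each processor’s current working set in local memory, accesses stay local, which is the reasoning behind choosing to migrate either the page or the processor.

```python
# A minimal sketch of Denning's working set: W(t, tau) is the set of
# distinct pages a process referenced in the window (t - tau, t].
# The page reference string below is made up for illustration.

def working_set(references, t, tau):
    """Return the working set at time t with window size tau."""
    window = references[max(0, t - tau + 1): t + 1]
    return set(window)

# Each entry is the page touched at that time step.
refs = [1, 2, 1, 3, 2, 2, 4, 5, 4, 4, 1]

for t in range(len(refs)):
    ws = working_set(refs, t, tau=4)
    print(f"t={t:2d}  page={refs[t]}  W(t,4)={sorted(ws)}")
```

A hypervisor that tracks these sets can notice when a thread’s working set lives mostly on a remote node and decide, in Nassi’s terms, whether it is cheaper to migrate the pages or the processor.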
So was Jim Gray halfway right when he said to move computations to data? It seems to me that you believe you can move data to computations.
Well, we do that at a very, very low level.
So you believe that you can still move the computations to the data, but you can also move the data to the computations if necessary.
Correct. But we do that dynamically. The pattern of memory access is not something we anticipate or build in at the beginning; we just react to whatever happens.
What kinds of applications are ideal for TidalScale? What are the ones that do not perform well?
Let me answer the first question first. We like applications that need a lot of memory. Those might be programs written in R or Python, or applications using graph databases, in-memory SQL databases, or even non-SQL databases. We are in active discussions with people doing biomedical engineering, specifically computational genomics, and people doing large-scale simulations, whether electronic design automation or other discrete-event simulations. Customers consistently tell us they couldn’t run simulations this large before, and we have been able to run very, very large simulations for them. There has been only one case where we did not do well. After doing some analysis, we realized that their algorithm had a lot of random accesses, for which we could not manage working sets. We actually helped them rewrite it with better memory access patterns, and it worked.
What is the software complexity of TidalScale? Would it be easy for the open source community to replicate it?
I doubt it would be that easy. We are a team of people highly specialized in operating systems, with kernel-hacking skills, who have worked together for several years to get the product together. There are a lot of tricks and heuristics needed to make the system work well, and of course a lot of machine learning behind predicting the memory access patterns.
One of the problems with deep learning systems like Theano, TensorFlow, Torch, etc., is that they are hard to run on multi-GPU hardware. Most of them don’t support it, and even when they do, the user has to manage the distribution manually. How easy would it be for TidalScale to virtualize GPU systems?
We have not tried to integrate GPUs at this time, although there is no strong technical difficulty in doing so. We plan to do it when we can convince ourselves there’s sufficient customer demand. In order to emulate a GPU, we have to do some very low-level interface emulation work that so far hasn’t made it to the top of our priority list.
Ike Nassi, Founder, TidalScale
Book Giveaways at MLconf NYC!
Our generous publishers are sending books again to MLconf NYC. Make sure to grab coupons, as they’ll be offering exclusive book discounts at MLconf! We’ll be hosting book giveaways at the conclusion of the event for the most unique tweets that mention @mlconf and/or #mlconfnyc. Participating publishers include Now Publishers, MIT Press, Cambridge University Press, CRC Press, Springer, and O’Reilly Media.
Now Publishers
Learning Deep Architectures for AI, Yoshua Bengio, now publishers
Bayesian Reinforcement Learning: A Survey, Mohammad Ghavamzadeh | Shie Mannor | Joelle Pineau | Aviv Tamar, now publishers
An Introduction to Conditional Random Fields, Charles Sutton | Andrew McCallum, now publishers
Kernels for Vector-Valued Functions: A Review, Mauricio A. Álvarez | Lorenzo Rosasco | Neil D. Lawrence, now publishers
Online Learning and Online Convex Optimization, Shai Shalev-Shwartz, now publishers
Convex Optimization: Algorithms and Complexity, Sébastien Bubeck, now publishers
An Introduction to Matrix Concentration Inequalities, Joel A. Tropp, now publishers
Explicit-Duration Markov Switching Models, Silvia Chiappa, now publishers
Adaptation, Learning, and Optimization over Networks, Ali H. Sayed, now publishers
Theory of Disagreement-Based Active Learning, Steve Hanneke, now publishers
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, Rémi Munos, now publishers
Learning with Submodular Functions: A Convex Optimization Perspective, Francis Bach, now publishers
Backward Simulation Methods for Monte Carlo Statistical Inference, Fredrik Lindsten | Thomas B. Schön, now publishers
Graphical Models, Exponential Families, and Variational Inference, Martin J. Wainwright | Michael I. Jordan, now publishers
MIT Press
Introduction to Machine Learning, third edition, hc, Alpaydin, MIT Press
Building Ontologies with Basic Formal Ontology, pb, Arp, MIT Press
Signals and Boundaries, pb, Holland, MIT Press
Fundamentals of Machine Learning for Predictive Data Analytics, hc, Kelleher, MIT Press
Foundations of Machine Learning, hc, Mohri, MIT Press
Advanced Structured Prediction, hc, Nowozin, MIT Press
Practical Applications of Sparse Modeling, hc, Rish, MIT Press
Boosting, pb, Schapire, MIT Press
Optimization for Machine Learning, hc, Sra, MIT Press
Machine Learning in Non-Stationary Environments, hc, Sugiyama, MIT Press
Artificial Cognitive Systems, hc, Vernon, MIT Press
Cambridge University Press
Statistical Methods for Recommender Systems, Agarwal | Chen, Cambridge University Press
Interactions with Search Systems, White, Cambridge University Press
Introduction to Random Graphs, Frieze | Karonski, Cambridge University Press
Computational Social Science, Alvarez, Cambridge University Press
Privacy, Big Data, and the Public Good, Lane et al, Cambridge University Press
Machine Learning, Flach, Cambridge University Press
Understanding Machine Learning, Shalev-Shwartz | Ben-David, Cambridge University Press
Data Mining and Analysis, Zaki | Meira, Cambridge University Press
Bayesian Reasoning and Machine Learning, Barber, Cambridge University Press
Mining of Massive Data Sets, Leskovec et al, Cambridge University Press
Social Media Mining, Zafarani et al, Cambridge University Press
Truth or Truthiness, Wainer, Cambridge University Press
Twitter: A Digital Socioscope, Mejova et al, Cambridge University Press
Causal Inference, Imbens | Rubin, Cambridge University Press
A Gentle Introduction to Optimization, Guenin et al, Cambridge University Press
CRC Press
Text Mining and Visualization: Case Studies Using Open Source Tools, Hoffman | Chisholm, CRC Press
Handbook of Big Data, Bühlmann | Drineas | Kane | Laan, CRC Press
Accelerating Discovery: Mining Unstructured Information for Hypothesis Generation, Spangler, CRC Press
Statistical Learning with Sparsity: The Lasso and Generalizations, Hastie | Tibshirani | Wainwright, CRC Press
Statistical Reinforcement Learning: Modern Machine Learning Approaches, Sugiyama, CRC Press
Machine Learning: An Algorithmic Perspective, Second Edition, Marsland, CRC Press
Sparse Modeling: Theory, Algorithms, and Applications, Rish | Grabarnik, CRC Press
Computational Trust Models and Machine Learning, Liu | Datta | Lim, CRC Press
Regularization, Optimization, Kernels, and Support Vector Machines, Suykens | Signoretto | Argyriou, CRC Press
Data Classification: Algorithms and Applications, Aggarwal, CRC Press
Springer
Data Mining: The Textbook, Aggarwal, Springer
Bayesian Computation with R, Albert, Springer
Learning with Partially Labeled and Interdependent Data, Amini | Usunier, Springer
Text Mining with MATLAB®, Banchs, Springer
Pattern Recognition and Machine Learning, Bishop, Springer
Principles of Data Mining, Bramer, Springer
Machine Learning in Medicine – Cookbook, Cleophas | Zwinderman, Springer
Robotics, Vision and Control: Fundamental Algorithms in MATLAB, Corke, Springer
Introduction to Evolutionary Computing, Eiben | Smith, Springer
Chance Rules: An Informal Guide to Probability, Risk and Statistics, Everitt, Springer
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie | Tibshirani | Friedman, Springer
An Introduction to Statistical Learning, James | Witten | Hastie | Tibshirani, Springer
Statistical Analysis of Network Data with R, Kolaczyk | Csárdi, Springer
An Introduction to Machine Learning, Kubat, Springer
Twitter Data Analytics, Kumar | Morstatter | Liu, Springer
Web Data Mining, Liu, Springer
Big Data: A Primer, Mohanty | Bhuyan | Chenthati, Springer
Big Data Imperatives, Mohanty | Jagadeesh | Srivatsa, Springer
Bayesian Networks in R, Nagarajan | Scutari | Lèbre, Springer
Emerging Paradigms in Machine Learning, Ramanna | Jain | Howlett, Springer
Diffusion in Social Networks, Shakarian | Bhatnagar | Aleali | Shaabani | Guo, Springer
All of Statistics, Wasserman, Springer
Data Mining with Rattle and R, Williams, Springer
A Beginner’s Guide to R, Zuur | Ieno | Meesters, Springer
Misc Publishers
How to Create a Mind: The Secret of Human Thought Revealed, Kurzweil
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, Domingos
Big Data: Principles and Best Practices of Scalable Realtime Data Systems, Marz | Warren
Superforecasting: The Art and Science of Prediction, Tetlock | Gardner
Data Smart: Using Data Science to Transform Information into Insight, Foreman
Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, Provost | Fawcett
Data Science from Scratch: First Principles with Python, Grus
Competing on Analytics: The New Science of Winning, Harris | Davenport
Naked Statistics: Stripping the Dread from the Data, Wheelan
The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t, Silver
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Siegel
The End of Average: How We Succeed in a World That Values Sameness, Rose
Breaking into Machine Learning
As more professionals clamor to enter the fields of data science and machine learning, a new generation of educational offerings has arrived on the scene. Business schools offer new executive education programs, and CS departments offer specialized programs in data science. We’re also seeing a surge of organizations offering intensive training programs that last only a few weeks, which are gaining prominence and popularity. Having several options is great, but it can often be confusing to decide which direction to follow. One must ask the question: “Which program is right for me?”
We feel that a sampling approach is a great way to discover which program is the best fit for each individual. This April, MLconf NYC attendees will have the opportunity to experience a demo class from METIS. Mike Galvin will present “An introduction to word2vec and working with text data.” In this session, Galvin will introduce the basics of working with textual data using various Python libraries, while offering several examples of real-world applications. The class will start by introducing the basic bag-of-words representation and move on to other models, with an emphasis on word2vec. By the end of the talk, participants will be able to use Python to explore and build their own models on text data.
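To give a flavor of the material, here is a minimal sketch of the jump from a bag-of-words representation to word2vec embeddings. This is our sketch, not the METIS class code; the toy corpus is made up, and it assumes scikit-learn (1.0+) and gensim (4.x) are installed.

```python
# A minimal sketch contrasting bag-of-words with word2vec.
# The toy corpus below is made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from gensim.models import Word2Vec

corpus = [
    "machine learning conferences are fun",
    "deep learning models need lots of data",
    "word embeddings capture word meaning",
]

# Bag of words: each document becomes a sparse vector of word counts,
# so word order and word similarity are ignored.
bow = CountVectorizer()
counts = bow.fit_transform(corpus)
print(bow.get_feature_names_out())   # scikit-learn >= 1.0
print(counts.toarray())

# word2vec: each word gets a dense vector trained from its contexts,
# so words that appear in similar contexts get similar vectors.
# (Toy-sized parameters; gensim >= 4 uses vector_size, older uses size.)
sentences = [doc.split() for doc in corpus]
w2v = Word2Vec(sentences, vector_size=20, window=2, min_count=1, epochs=50)
print(w2v.wv.most_similar("learning"))
```

The contrast is the usual motivation for the move: bag-of-words treats “learning” and “embeddings” as unrelated columns, while word2vec places words that share contexts near each other in vector space.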
If you are attending MLconf, sign up here to reserve your spot!
MLconf will be adding a training session soon! Stay tuned for more details!
-Nik Vasiloglou, Technical Chair, MLconf