Our past Technical Chair discussed Erin LeDell’s upcoming talk, Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches, at MLconf Seattle, scheduled for May 20th.
Xavier Amatriain responded to Pedro Domingos with a tweet claiming that ensembling is the master algorithm. Would you agree?
EL) If there is one supervised learning algorithm that I’d consider a “master algorithm,” then yes, I would consider ensembling — particularly stacking, also known as “Super Learning” — to be that algorithm. The “best” supervised learning algorithm that exists now or is invented in the future can always be incorporated into a Super Learner to achieve better performance. That’s the power of stacking. I don’t believe that there’s a single algorithm that can consistently outperform all other algorithms on all types of data — we’ve all heard the saying that “there’s no free lunch”; that’s why ensembles are generally more useful than a single algorithm. Incidentally, the “No Free Lunch Theorem” is credited to David Wolpert, the inventor of stacking.
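The stacking idea can be sketched in a few lines (a hedged illustration using scikit-learn, not H2O Ensemble or the SuperLearner package): out-of-fold predictions from a set of diverse base learners become the features on which a metalearner is trained.

```python
# Minimal stacking (Super Learner) sketch: cross-validated base-learner
# predictions form the "level-one" data for a metalearner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, random_state=0)

base_learners = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# Out-of-fold predictions avoid leaking training labels into the
# metalearner's training set.
Z = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_learners
])

# The metalearner (here a logistic regression) learns how to combine
# the base learners; its coefficients weight each one.
meta = LogisticRegression().fit(Z, y)
print(meta.coef_)
```

In a full Super Learner the base library would be larger and more diverse, which is exactly what makes the ensemble hard to beat with any single constituent.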
The way Pedro Domingos describes the “master algorithm” is more than just a supervised learning algorithm — he talks about the importance of having a feedback loop, or a holistic learning system that includes machine learning algorithms at the center. Then there’s also Artificial General Intelligence (AGI), which is probably more of a master algorithm than anything else. For more information about the current state of AGI, look into the work of Ben Goertzel.
Does ensembling destroy model interpretability? A decision tree offers a nice path to explaining predictions, but a forest?
EL) Ensembles are inherently more complex and opaque than their singleton counterparts. However, anyone who cares about model performance will have a hard time justifying the use of a decision tree or linear model over an ensemble learner.
Some use-cases value model interpretability over model performance, and that’s where I’ve seen people make this trade-off. However, there is a lot of work going into black-box model interpretability, and I am optimistic about some of the recent advancements in that area. For example, Local Interpretable Model-Agnostic Explanations (LIME) is a method for explaining the predictions of any classifier.
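The idea behind LIME can be sketched without the lime package itself (everything below is an illustrative toy, not the library’s API): perturb the instance to be explained, query the black-box model on the perturbations, and fit a proximity-weighted linear surrogate whose coefficients serve as a local explanation.

```python
# Sketch of the local-surrogate idea behind LIME.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] ** 2 + X[:, 1]                       # nonlinear ground truth
black_box = RandomForestRegressor(random_state=0).fit(X, y)

x0 = X[0]                                        # instance to explain
Z = x0 + rng.normal(scale=0.3, size=(200, 4))    # local perturbations
weights = np.exp(-np.sum((Z - x0) ** 2, axis=1)) # proximity kernel

# Weighted linear fit to the black box's local behavior; coefficients
# act as local feature importances around x0.
surrogate = Ridge().fit(Z, black_box.predict(Z), sample_weight=weights)
print(surrogate.coef_)
```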
What was the impact of your recent scalable implementation at H2O? What was the speedup versus the legacy implementation? What was the impact for your current customers?
EL) The H2O Ensemble project is a truly scalable ensemble learner because it is built on top of H2O, an open source, scalable, distributed machine learning platform. H2O Ensemble implements the stacking algorithm for big data.
I suppose the “legacy” stacking implementation is the SuperLearner R package. Other than Weka’s Stacking function, the SuperLearner and subsemble R packages contain the only proper stacking implementations that I’m aware of. The speed-up of H2O Ensemble over the SuperLearner package is essentially unlimited because there is no limit to the size of an H2O cluster. There are more details in my dissertation.
I think it’s fairly common for companies to deploy simple linear models into production because they are the easiest algorithms to scale. Some of the more sophisticated teams deploy a GBM or Random Forest. This leaves a lot of model performance (and associated revenue) on the table. When model performance translates directly to revenue, the advantages of ensembling are very convincing.
Have you ever seen a case where ensembling was not a good choice?
EL) Yes, there are cases where I’ve been able to achieve equal or slightly better performance using a single Gradient Boosting Machine (GBM), for example, than I have with an ensemble containing that same GBM. This means that the GBM is able to approximate the true prediction function well, and I haven’t done a good enough job of creating a diverse set of base learners for my ensemble, or I’ve made a poor choice of metalearning algorithm for the Super Learner.
This situation is far less likely when using stacking (vs some more basic ensembling scheme), however it is important to acknowledge that it happens from time to time. In addition to the causes above, I’ve seen this happen when I’ve spent a lot of time fine-tuning one or more of the constituent algorithms, or when there is not enough training data.
How easy is it to jump from Biostatistics to Data Science? Is the reverse path harder?
EL) For me, there was no division between the two. As a Biostatistics PhD student, I used real-life, clinical datasets as my motivation for developing new algorithms and software, which is very close to what I do in industry now. Not all biostatisticians spend as much time developing software as I did, but I think there is a good overlap between the two. People who chose a Biostatistics PhD program over a Statistics PhD program may be more interested in developing methodology and software for messy real-life data and applications.
As for the reverse path, as long as you have a math/stats background, then I think the transition from Data Science to Biostatistics/Statistics would be easy.
Is the machine learning/data science hype stagnating life sciences by attracting the best talent? Is there a way to steer scientists into curing cancer rather than increase ad clicking?
EL) It’s true that a lot of data scientists come from a pure science background — there are many astronomers, physicists, and engineers who have left their respective fields for a career in data science. I have many friends with PhDs (from very good schools!) who can’t find a decent academic or industry job in their area of research. That is a shame, no doubt. The issue is not that the data science hype is causing these folks to leave their fields and enter data science — it’s that our country/society does not prioritize basic research enough to support research-based career paths. Instead of going to work at a consulting firm or applying for their third post-doc, these folks find that data science offers a career using many of their highly valuable skills, such as data analysis and programming (although often in a different subject area).
I think how you apply your data science / machine learning skillset matters a great deal. Some people say that “data is just data” and therefore it doesn’t matter how they apply their skills, but whether that’s a belief you subscribe to or not — everyone is responsible for what they create in this world.
We are at a point where Data Scientists and Computer Scientists hold quite a bit of power and we have an opportunity to decide how and where we’d like to wield this power. Perhaps this will upset some people who work in this industry (or maybe they will even agree with me), but I believe that ad-click related jobs are a horrible waste of our collective human potential.
The healthcare industry, at least in the United States, is one of the least nimble and technologically backwards industries that we have. It is hard to innovate in this area, which might be a reason why smart, talented people stay away. On the other hand, we are starting to see new companies applying machine learning to healthcare in highly innovative and exciting ways. For example, Jeremy Howard’s Enlitic is applying deep learning to medical imaging. The more successful examples that we have of this type of thing, the more that will inspire data scientists to leave their ad-click jobs for careers that lead to curing cancer.

Erin LeDell, Machine Learning Scientist, h2o.ai
Erin is a Statistician and Machine Learning Scientist at H2O.ai. Before joining H2O, she was the Principal Data Scientist at Wise.io and Marvin Mobile Security (acquired by Veracode in 2012) and the founder of DataScientific, Inc.
Erin received her Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from UC Berkeley. Her research focuses on ensemble machine learning, learning from imbalanced binary-outcome data, influence curve based variance estimation and statistical computing. She also holds a B.S. and M.A. in Mathematics.
Interview with Kristian Kersting, Associate Professor for Computer Science, TU Dortmund University
Our past Technical Chair discussed Kristian Kersting’s upcoming talk, Declarative Programming for Statistical ML, at MLconf Seattle, scheduled for May 20th.
Why do you think expressing machine learning in a relational way would democratize machine learning?
KK) Consider a typical machine learning user in action, solving a problem for some data. She selects a model for the underlying phenomenon to be learned (choosing a learning bias), formats the raw data according to the chosen model, and then tunes the model parameters by minimizing some objective function induced by the data and the model assumptions. Often, the optimization problem solved in the last step falls within a class of mathematical programs for which efficient and robust solvers are available. Unfortunately, however, today’s solvers for mathematical programs typically require that the program be presented in some canonical algebraic form, or offer only a very restricted modeling environment. The process of turning the intuition that defines the model “on paper” into a canonical form can be quite cumbersome. Moreover, the reusability of such code is limited, as relatively minor model changes can require large modifications of the code. This is where declarative modeling languages such as RELOOP enter the stage. They free the machine learning user from thinking about the canonical algebraic form and instead help her focus on the model “on paper”. They allow the user to abstract over entities and in turn to formulate general objectives and constraints that hold across different situations. All this increases the ability to rapidly combine, deploy, and maintain existing algorithms. To be honest, however, relations are not everything. This is why we embedded the relational language into an imperative language featuring for-loops.
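The gap between the model “on paper” and the canonical form can be seen with a tiny LP. A raw solver such as SciPy’s linprog demands the canonical `c, A_ub, b_ub` encoding; declarative languages like RELOOP aim to let the user state the readable version directly. A sketch of the hand-translation a raw solver requires:

```python
# Model "on paper":
#   minimize  x0 + 2*x1
#   s.t.      x0 + x1 >= 1,  x0 - x1 <= 0.5,  x >= 0
# Canonical form for linprog (all constraints as A_ub @ x <= b_ub):
from scipy.optimize import linprog

c = [1, 2]                       # objective coefficients
A_ub = [[-1, -1],                # x0 + x1 >= 1  rewritten as  -x0 - x1 <= -1
        [1, -1]]                 # x0 - x1 <= 0.5
b_ub = [-1, 0.5]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.fun, res.x)            # optimum 1.25 at (0.75, 0.25)
```

Even for two variables, every “>=” must be flipped by hand; for a relational model with constraints quantified over entities, this translation is exactly the bookkeeping a declarative layer automates.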
The big data hype has led a lot of people to focus on improving the scalability of exotic algorithms. You have chosen two, Linear Programming and Quadratic Programming, to do machine learning, and focus more on feeding the data in. What are the pros and cons of this approach?
KK) We just started with LPs and QPs since they are the workhorses of classical machine learning. This way we build a basis for what might be called relational statistical learning. Afterwards, we will move on to other machine learning approaches, maybe even deep learning approaches.
Do you think it’s more important for a data scientist to easily customize the objective and encode constraints rather than use a fancy ML algorithm?
KK) Good question. I think this depends on the application. We are collaborating a lot with plant physiologists. They once asked me, “What is the biological meaning of an eigenvector?” Quite difficult to answer. Or consider stochastic gradients. They argued, “So you want me to throw away 90% of my data? How do I explain this to my students, who spent 200 days in the field to gather the data?” It is similar with deep learning. They want to trust the algorithm, at least at the beginning of a data science project, and to gain insights into their data. Here is where focusing on the constraints can help; they might be easier to understand, at least when encoded in a high-level language. Or consider collective classification, i.e., where the classification of one entity may change the classification of a related entity. Typically one uses a kernel to realize this when using support vector machines. However, just adding some constraints encoding that related entities should be on the same side of the hyperplane can do the job, too, as this also captures the manifold structure in the high-dimensional feature space. Unfortunately, in contrast to the AI community, the ML community has not really developed a methodology for constraints yet.
There is a big crowd from engineering with expert skills in optimization that has struggled to get into data science and earn the corresponding salary. Do you think you are opening a door for them?
KK) Hopefully we can at least help. Optimisation is definitely one of the foundations of statistical machine learning. High-level languages for optimization hopefully help to talk about the models and hence to bridge the disciplines even further.
Do you see the solvers becoming scalable enough so that your approach can be applied to big data? Is there a different path?
KK) Hmm, scalability is always an issue. However, it is not just the solver but the way the solver interacts with the modeling language. Consider, e.g., relational models. They often have symmetries that can be used to reduce the model automatically. This is sometimes called lifted inference. And we have just started to exploit structure within statistical machine learning. Imagine a cutting-plane solver that computes not simply the most violated constraint but the most violated one that is also fastest to compute. As yet another example, one of my PhD students, Martin Mladenov, just found a nice way to combine matrix-free optimization methods with relational languages such as RELOOP. With this we can solve problems involving a billion non-zero variables faster than Gurobi. So at least there is strong evidence that a new generation of solvers can scale well, even better than existing ones. Moreover, instead of compiling to an intermediate structure, why not compile directly into a low-level C/C++ program that implements a problem-specific solver? In a sense, I envision an “-O” flag for machine learning, very much like we know it from C/C++ compilers.
How does your Relational Linear Programming play along with Relational Databases?
KK) Relational DBs have been the home of high-value, data-driven applications for over four decades. This may explain why you see a push in industry to marry statistical analytics frameworks like R and Python with almost every data processing engine. As a machine learner this is nice, as you do not have to worry about data management and retrieval anymore. However, it is tricky to just map the data from a relational DB into a single table, the traditional representation for machine learning. You are likely to change the statistics. We need relational machine learning that can deal with entities and relations. We just started with LPs and QPs since they are the workhorses of classical machine learning. In the long run, we want to develop a tight integration of Relational Databases and Machine Learning, maybe even something like Deep Relational Machines.

Kristian Kersting, Associate Professor for Computer Science, TU Dortmund University
Kristian Kersting is an Associate Professor for Computer Science at the TU Dortmund University, Germany. He received his PhD from the University of Freiburg, Germany, in 2006. After a PostDoc at MIT, he moved to the Fraunhofer IAIS and the University of Bonn using a Fraunhofer ATTRACT Fellowship. His main research interests are data mining, machine learning, and statistical relational AI, with applications to medicine, plant phenotyping, traffic, and collective attention. Kristian has published over 130 technical papers, and his work has been recognized by several awards, including the ECCAI Dissertation Award for the best AI dissertation in Europe.
He gave several tutorials at top venues and serves regularly on the PC (often at the senior level) of the top machine learning, data mining, and AI venues. Kristian co-founded the international workshop series on Statistical Relational AI and co-chaired ECML PKDD 2013, the premier European venue for Machine Learning and Data Mining, as well as the Best Paper Award Committee of ACM KDD 2015. Currently, he is an action editor of DAMI, MLJ, AIJ, and JAIR as well as the editor of JAIR’s special track on Deep Learning, Knowledge Representation, and Reasoning.
Interview with Avi Pfeffer, Principal Scientist, Charles River Analytics
Our past Technical Chair discussed Avi Pfeffer’s upcoming talk, Practical Probabilistic Programming with Figaro, at MLconf Seattle, scheduled for May 20th.
How would you explain what probabilistic programming is to an Excel user or to a company executive?
AP) Probabilistic programming uses programming languages to enable probabilistic machine learning applications. In probabilistic machine learning, you use a probabilistic model of your domain along with inference and learning algorithms to predict the future, infer past causes of current observations, and learn from experience to produce better predictions. Using programming languages, probabilistic programming enables probabilistic machine learning applications to be developed with far less effort than before by providing an expressive language for representing models and general-purpose inference and learning algorithms that automatically apply to models written in the language.
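As a rough, toy-scale illustration of that pattern (plain Python, not Figaro): write a generative model, condition on an observation, and let a generic inference routine, here naive rejection sampling, recover the posterior over the hidden cause.

```python
# A toy probabilistic program: infer the past cause (rain) of a
# current observation (wet grass) with generic rejection sampling.
import random

random.seed(0)

def model():
    rainy = random.random() < 0.2       # prior on the hidden cause
    sprinkler = random.random() < 0.1
    grass_wet = rainy or sprinkler      # observable consequence
    return rainy, grass_wet

# Condition on the observation by keeping only consistent samples;
# the surviving fraction estimates P(rainy | grass_wet).
samples = [model() for _ in range(100_000)]
consistent = [rainy for rainy, wet in samples if wet]
p = sum(consistent) / len(consistent)
print(p)   # ≈ 0.2 / (1 - 0.8 * 0.9) ≈ 0.71
```

A real probabilistic programming system replaces the brute-force sampler with efficient, reusable inference algorithms, which is precisely what spares the modeler from writing inference code per application.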
What is the holy grail, the long-term vision, of Probabilistic Programming?
AP) Our long term vision is to provide a clear, English-like language that a domain expert who has little knowledge of machine learning can use to describe data. A probabilistic programming system would automatically learn a probabilistic model of the domain, without the user needing to choose or configure inference algorithms.
We’ve recently been hearing that big data is a big headache and often big noise. Is probabilistic programming the ibuprofen of big data headaches?
AP) Probabilistic programming is sometimes called “big model” rather than “big data”. The idea is that you can use richer, more detailed models than you would be able to use otherwise. This applies no matter how much data you have.
You are leading a consulting company that offers solutions to clients based on PP. How difficult is it to sell it when everybody lives in the mania of deep learning?
AP) We don’t find it hard to sell probabilistic programming. There are several features in particular that set it apart from deep learning methods. (1) It’s easy to include domain knowledge (and lots of it) in models; (2) probabilistic programming can work well in domains where you don’t have a lot of data; (3) probabilistic programming models are explainable and understandable, whereas deep learning models can be hard to interpret; (4) probabilistic programming can predict outputs that belong to rich data types of variable size, such as sentences or social networks.
Tell us about your new book on Practical Probabilistic Programming. What is the target audience? How come we got a practical book about PP before an academic book on the subject was published?
AP) Practical Probabilistic Programming aims at helping users, who could be programmers, students, or experts in other areas, understand and use probabilistic programming. I wrote a practical book because my interest over the last few years has been in developing practical tools and applications. The reason we got a practical book rather than an academic book is that I’m not in academia anymore, so I feel no pressure to write a theoretical textbook. As a small company, we’re interested in developing applications our customers can use, and I wanted to share that experience.
Can you tell us about a success story with PP? Maybe one where deep learning would have failed 🙂
AP) We developed an application that learns the lineage of malware samples. When someone writes a new piece of malware, they often borrow code they or someone else has written in the past. So malware has a lineage or family history. We built an application that extracts all sorts of features of malware, clusters the malware into families, and then computes the most likely lineage of each family. It’s this last step that used probabilistic programming. We combined separate probabilistic models of the timestamps of each malware sample in the family, along with a connecting model of the lineage of each family, and ran our inference algorithms to compute the most likely lineage. The lineages we learned using our probabilistic reasoning technique were much better than ones we got with our previous algorithm. And this is the kind of probabilistic reasoning application that would have been really difficult to program without probabilistic programming.
How hard do you think it is to teach PP to a domain expert?
AP) With the current state of the art, a domain expert can pick up the basics and be writing simple probabilistic programs in a couple of hours. We’ve had some Figaro novices (Figaro is our probabilistic programming system) develop quite impressive applications in a short time using simple models. However, it takes longer to obtain the knowledge and experience to put together sophisticated applications, particularly in knowing how to set up inference algorithms to work on a given problem. This is why our current research focus is on making inference optimization as automatic as possible. You just press a button and we automatically figure out how to decompose and optimize the problem. We’re also working on developing a language that doesn’t require any knowledge of programming languages to use.

Avi Pfeffer, Principal Scientist, Charles River Analytics
Dr. Avi Pfeffer is a leading researcher on a variety of computational intelligence techniques including probabilistic reasoning, machine learning, and computational game theory. Avi has developed numerous innovative probabilistic representation and reasoning frameworks, such as probabilistic programming, which enables the development of probabilistic models using the full power of programming languages, and statistical relational learning, which provides the ability to combine probabilistic and relational reasoning. He is the lead developer of Charles River Analytics’ Figaro probabilistic programming language. As an Associate Professor at Harvard, he developed IBAL, the first general-purpose probabilistic programming language. While at Harvard, he also produced systems for representing, reasoning about, and learning the beliefs, preferences, and decision making strategies of people in strategic situations. Avi received his Ph.D. in computer science from Stanford University and his B.A. in computer science from the University of California, Berkeley.
Interview with Evan Estola, Lead Machine Learning Engineer, Meetup
Our past Technical Chair discussed Evan Estola’s upcoming talk, When Recommendations Systems Go Bad, at MLconf Seattle, scheduled for May 20th.
Will machine learning fix machine learning? (I mean the ethical side)
EE) I don’t think that machine learning alone will help us make more ethical algorithms. From the most basic view, how can an algorithm ever know that using a particular feature is unethical without a human saying so? I’m definitely excited about progress that can be made in this area, and certainly there are tools to be developed that will help us make better decisions, but in the end ethics is a human decision.
Ethics is something cultural; actions that are unethical in some cultures are ethical in others. What is the culture of an algorithm? Is it the culture of the author? Most of the recent fallouts did not really reflect the programmers’ views; the algorithms acted spontaneously. Are algorithms allowed to have their own culture?
EE) Algorithms are a reflection of the people that use them. As machine learners, we hold the keys to a part of the business that is rarely well understood by leadership. If there is a risk that your algorithm is doing something unethical or even illegal, you have an obligation to let people know. If your organizational values have not been defined, you should make sure to define them before you launch a model that could compromise them.
Expert psychologists often resort to manipulative techniques in order to target vulnerable social groups such as kids in order to push them to spend more. Yes, I am a parent and I have felt that a lot of ads try to do that. Why is it fair for a group of humans to do this and not for a group of algorithms?
EE) Advertising to children is unethical and should be illegal! But this is just an opinion and not expert judgement. We know that targeting a product at a particular group of people is a key area of study in marketing and it has been used to great effect. We also know that discrimination exists. We know that social injustice exists. As computers are trained to make more and more decisions for us, we can influence whether they should encapsulate the bias that exists in our society, or if they should be better.
When does statistical inference become unethical?
EE) Statistical inference is just a tool; it is how we use a tool that makes it good or bad. Statistics will tell us that women make less than men. If we use this to infer that we should pay women less, we are in the wrong. If we use that information to stand up and say “This is wrong and we should fix it,” then the math has done good.
Companies are required by law to show that they take preventative measures with regard to ethics and compliance. Could the FairTest algorithm presented at this MLconf become the new ethics regulation for high tech? Or maybe sentiment analysis monitoring is a better way?
EE) I love the FairTest algorithm because it helps us solve the difficult problem of identifying when we have a feature that is a proxy for a feature we know we shouldn’t be using. A name can accidentally be used as a proxy for gender or race; a zip code, as a proxy for race or income. This is a difficult problem and one worth attacking.
Critically, the user must still determine the features that aren’t allowed. We still have to have that difficult conversation about our values, our ethics, and what we will do about it. In terms of enforcing compliance, it is great that we have a tool that will show that an algorithm is biased against a particular group. Now let’s make sure everyone knows about it and what they can do to reverse it.
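The proxy problem can be made concrete with a toy check (an illustration of the idea only, not FairTest’s actual method, and the data below is synthetic): a feature correlated with a protected attribute can smuggle that attribute into a model even when the attribute itself is excluded.

```python
# Toy proxy-feature check: a simple association test flags the
# proxy feature but not a genuinely neutral one.
import numpy as np

rng = np.random.default_rng(0)
protected = rng.integers(0, 2, size=1000)             # hypothetical protected group
proxy = protected + rng.normal(scale=0.5, size=1000)  # feature leaking the group
neutral = rng.normal(size=1000)                       # unrelated feature

r_proxy = np.corrcoef(protected, proxy)[0, 1]
r_neutral = np.corrcoef(protected, neutral)[0, 1]
print(r_proxy, r_neutral)   # high correlation vs. near zero
```

A tool like FairTest goes well beyond pairwise correlation, searching subpopulations for unwarranted associations, but the core question is the same: does this feature carry information about a group we decided not to use?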

Evan Estola, Lead Machine Learning Engineer, Meetup
Evan is a Lead Machine Learning Engineer working on the Data Team at Meetup. Combining product design, machine learning research and software engineering, Evan builds systems that help Meetup’s members find the best thing in the world: real local community. Before Meetup, Evan worked on hotel recommendations at Orbitz Worldwide, and he began his career in the Information Retrieval Lab at the Illinois Institute of Technology.
Interview with Florian Tramèr, Researcher, EPFL
Our past Technical Chair discussed Florian Tramèr’s upcoming talk, Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit, at MLconf Seattle, scheduled for May 20th.
Will machine learning fix machine learning? (I mean the ethical side)
FT) Can a machine learn to be fair or just? Answering this question seems to first require an apt and complete definition of what constitutes unethical behavior. For instance, given two models, by what metrics would we compare their “ethicality”? Another challenge is to agree upon what constitutes the ground truth for fair behaviour. Researchers from the machine learning and data mining communities have started looking into these questions, but formally defining fairness has proven to be a very difficult task, given the notion’s contextual (and even cultural, see below) aspect.
With FairTest we have set out to first tackle the task of efficiently and systematically detecting spurious inferences made by machine learning algorithms. Our goal is to help point developers to underlying fairness issues, and hopefully provide preliminary paths to fixing them. I am confident that further progress will be made by the community in understanding ethical issues related to machine learning, as well as in fixing or regulating these issues.
Ethics is something cultural; actions that are unethical in some cultures are ethical in others. What is the culture of an algorithm? Is it the culture of the author? Most of the recent fallouts did not really reflect the programmers’ views; the algorithms acted spontaneously. Are algorithms allowed to have their own culture?
FT) An algorithm can definitely be influenced by the culture of its author, or the cultural setting underlying its training examples. For instance, a model trained over data mirroring historical social biases (e.g. on race or gender) will perceive such biases as ground truth and likely perpetuate them. The problem is in controlling which aspects of our culture (e.g. its ethics rather than its biases) end up “absorbed” by the algorithm. On a different note, algorithms can also bring us to rethink some of our cultural norms. This is being evidenced, for instance, by debates around privacy, a notion that is being significantly reshaped by the advent of the digital world.
Expert psychologists often resort to manipulative techniques in order to target vulnerable social groups such as kids in order to push them to spend more. Yes, I am a parent and I have felt that a lot of ads try to do that. Why is it fair for a group of humans to do this and not for a group of algorithms?
FT) In principle, I don’t believe there should be a distinction between the two. Something that is deemed unfair for an algorithm should be deemed unfair for a human being, and vice versa. On a side note, in my opinion, some forms of advertising to children (e.g. for junk food) are unethical. However, one factor that absolutely should be taken into account here is scale.
Algorithmic decision making (including targeted advertisement) is predicted to have an ever-growing impact on our everyday lives, implying that even minor unfair effects will have the potential to cause tremendous harm. Quoting Stephen Hawking, “the trend seems to be toward […] technology driving ever-increasing inequality”. In particular, the impact of algorithms on the employment market (e.g. hiring processes, automation, etc.) will likely be tremendous.
This will require us to rethink certain of our notions and regulations around fairness and ethicality, which were not necessarily defined with internet-level scale in mind. This nuance in scale can perhaps be better understood with an analogy to digital privacy. Compare innocently reading a text over someone’s shoulder, to reading everyone’s texts by means of a mass surveillance system. The difference in scale introduced by algorithms may bring a vastly different meaning to unfairness, in the same way that it completely changed our notion of privacy invasion.
When does statistical inference become unethical?
FT) A fundamental tension is that machine learning is, by definition, about discrimination (in the statistical sense): a model is built to learn to classify (i.e. discriminate) new observations. So when is this discrimination not okay? A general concern is when a pattern or rule is generalized to a non-homogeneous group (maybe for lack of sufficient data or features).
A prominent example of this phenomenon is the notion of redlining: a service being offered or denied based solely on geographical area, without consideration for relevant discrepancies inside these areas. From a statistical point of view, sensitive features such as race or gender (or features strongly correlated with these) end up being used as proxies for unmeasurable, yet actually relevant, quantities.
Companies are required by law to show that they take preventative measures with regard to ethics and compliance. Could the FairTest algorithm presented at this MLconf become the new ethics regulation for high tech? Or maybe sentiment analysis monitoring is a better way?
FT) It seems somewhat early to answer such a question, as the foremost challenge to overcome remains awareness, in my opinion. Concerns about the ethics of machine learning are starting to be recognized (as evidenced by this event and others), and tools such as FairTest will help us in better understanding how prominent and widespread these issues currently are.
However, machine learning methods are increasingly being applied in extremely diverse settings, for which domain-specific regulations will probably be needed. It is thus likely that more than a single tool or method will be necessary to meet the broad needs or requirements for ensuring ethical usage of machine learning.

Florian Tramèr, Researcher, EPFL
Florian Tramèr is a research assistant at EPFL in Switzerland, hosted by Prof. J-P. Hubaux. He received his Masters in Computer Science from EPFL in 2015 and will be joining Stanford University as a PhD student this Fall. During his Master Thesis, Florian collaborated with researchers at EPFL, Columbia University and Cornell Tech to design and implement FairTest, a testing toolkit to discover unfair and unwarranted behaviours in modern data-driven algorithms. Florian’s interests lie primarily in Cryptography and Security, with recent work devoted to the study of security and privacy in the areas of genomics, ride-hailing services, and Machine-Learning-as-a-Service platforms.
Interview with Jake Mannix, Lead Data Engineer, Lucidworks
Our past Technical Chair discussed Jake Mannix’s upcoming talk, Smarter Search With Spark-Solr, at MLconf Seattle, scheduled for May 20th.
As an ML executive, you have to deal with different algorithms, systems, and platforms. How do you manage this technical complexity?
JM) Typically: aim for simplicity and tried-and-true effectiveness w.r.t algorithms, ease of modification and extensibility for systems, and a balance between openness, stability, and scalability for platforms.
Is it worth buying ML, building it in-house, or using open source?
JM) Yes…. LoL. Ok, seriously now: depending on your organization and its needs, all three of these can be the right choice. If you have a team of 3–5+ experienced data scientists / engineers with graduate training in ML, building some in-house can work. If you’re an org without a lot of direct internal ML training, first see what out-of-the-box OSS ML can get you (especially after engaging with the community to see where the rough edges are, and whether they’re surmountable). If you’re a large organization with at least one internal ML expert who can vet closed-source vendors, then buying ML products can be viable (but I would still warn against going with anything *too* “black-box” — I translate that in my head as “black magic,” a.k.a. hard to distinguish from “snake oil”).
How do you introduce new algorithms into production? Do you proactively follow the literature, or do you develop and modify algorithms based on your own result analysis?
JM) The literature, when academic, is rarely applicable in production any time soon. When it comes out of serious industry work, it’s worth looking at to see whether you have the requisite inputs. For instance, if Google publishes a paper showing what you can do with 100M labeled images and a collaborative filter over billions of clicks, well, it doesn’t really matter to most of us: we don’t have those things. Always trying to “keep up with the latest” developments tends to be useful, IMO, only for maintaining the broadest view of the field: see the next question, for example.
Everybody is thrilled with deep learning. We have seen great results in image/video/translation. Have you seen the same revolution in your domain?
JM) I’ve heard inklings of DL being useful for text, and while I’ve seen people play around with, e.g., word2vec-style models, I haven’t heard of a single production usage of them. Yes, perhaps for machine translation, but even there, I don’t see it *replacing* all the older techniques just yet.
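The core idea behind the word2vec-style models mentioned above is that words become dense vectors, and semantic relatedness becomes cosine similarity between those vectors. A minimal sketch with hand-made toy vectors (real models learn hundreds of dimensions from large corpora; the words and numbers here are invented for illustration):

```python
import numpy as np

# Toy 4-dimensional "word vectors" standing in for learned embeddings.
vectors = {
    "search": np.array([0.9, 0.1, 0.0, 0.2]),
    "query":  np.array([0.8, 0.2, 0.1, 0.1]),
    "banana": np.array([0.0, 0.9, 0.8, 0.1]),
}

def most_similar(word):
    """Return the nearest other word by cosine similarity."""
    v = vectors[word]
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vectors if w != word), key=lambda w: cos(v, vectors[w]))

print(most_similar("search"))  # "query" is the closest vector
```

In a search setting, this kind of nearest-neighbor lookup over embeddings is what lets a query like “search” also surface documents about “query” — the sort of capability people experiment with before it proves itself in production.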
Is there new upcoming research that you think will change the game in the way we search and understand documents?
JM) Syntactic parsing of text documents, when achievable at scale, certainly has the capability of significantly improving search – see for example the effect that Google Knowledge Vault cards have on factual queries, or keep an eye out for advancements in the Allen Institute for AI’s Semantic Scholar ( https://www.semanticscholar.org ).
What is your opinion of knowledge base technology (DeepDive, Knowledge Vault, etc.)? How close are we to offering a Watson out of a directory of text data?
JM) Most of this technology is both a) fantastic and b) completely locked behind closed doors. None of it is terribly open, and I’m not sure I see that changing any time soon.

Jake Mannix, Lead Data Engineer, Lucidworks
Living in the intersection of search, recommender-systems, and applied machine learning, with an eye for horizontal scalability and distributed systems. Currently Lead Data Engineer in the Office of the CTO at Lucidworks, doing research and development of data-driven applications on Lucene/Solr and Spark.
Previously built out LinkedIn’s search engine and was a founding member of the Recommender Systems team there; after that, at Twitter, built their user/account search system and led that team before creating the Personalization and Interest Modeling team, focused on text classification and graph-based authority and influence propagation.
Apache Mahout committer, PMC Member (and former PMC Chair).
In a past life, studied algebraic topology and particle cosmology.