Our past Technical Chair discussed Kristian Kersting’s upcoming talk, “Declarative Programming for Statistical ML”, at MLconf Seattle, scheduled for May 20th.
Why do you think expressing machine learning in a relational way would democratize machine learning?
KK) Consider a typical machine learning user in action, solving a problem for some data. She selects a model for the underlying phenomenon to be learned (choosing a learning bias), formats the raw data according to the chosen model, and then tunes the model parameters by minimizing some objective function induced by the data and the model assumptions. Often, the optimization problem solved in the last step falls within a class of mathematical programs for which efficient and robust solvers are available. Unfortunately, however, today’s solvers for mathematical programs typically require that the mathematical program be presented in some canonical algebraic form, or they offer only a very restricted modeling environment. The process of turning the intuition that defines the model “on paper” into a canonical form can be quite cumbersome. Moreover, the reusability of such code is limited, as relatively minor modifications of the model can require large modifications of the code. This is where declarative modelling languages such as RELOOP enter the stage. They free the machine learning user from thinking about the canonical algebraic form and instead help her focus on the model “on paper”. They allow the user to abstract from entities and, in turn, to formulate general objectives and constraints that hold across different situations. All this increases the ability to rapidly combine, deploy, and maintain existing algorithms. To be honest, however, relations are not everything. This is why we embedded the relational language into an imperative language featuring for-loops.
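The gap between a model “on paper” and its canonical form can be made concrete with a minimal sketch. It assumes a soft-margin linear classifier with hinge loss and no regularizer, chosen here only because it stays a linear program; it is not RELOOP code and not taken from the talk. On paper the model reads: minimize the sum of slacks xi_i subject to y_i (w·x_i + b) >= 1 - xi_i and xi_i >= 0. The hypothetical helpers `svm_to_canonical_lp` and `is_feasible` below do the index bookkeeping by hand.

```python
def svm_to_canonical_lp(X, y):
    """Hand-build the canonical LP  min c.z  s.t.  A_ub z <= b_ub  for a
    hinge-loss linear classifier (illustrative assumption, not RELOOP).

    Variable layout of z: [w_1 .. w_d, b, xi_1 .. xi_n].
    """
    n, d = len(X), len(X[0])
    num_vars = d + 1 + n

    # Objective: only the slack variables xi_i carry a cost.
    c = [0.0] * (d + 1) + [1.0] * n

    A_ub, b_ub = [], []
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        # y_i (w.x_i + b) >= 1 - xi_i   rewritten as
        # -y_i x_i.w - y_i b - xi_i <= -1
        row = [0.0] * num_vars
        for j in range(d):
            row[j] = -y_i * x_i[j]
        row[d] = -y_i            # coefficient of the bias b
        row[d + 1 + i] = -1.0    # coefficient of the slack xi_i
        A_ub.append(row)
        b_ub.append(-1.0)

    # w and b are free, slacks are non-negative.
    bounds = [(None, None)] * (d + 1) + [(0.0, None)] * n
    return c, A_ub, b_ub, bounds


def is_feasible(z, A_ub, b_ub, bounds, tol=1e-9):
    """Check a candidate z against the canonical constraints and bounds."""
    rows_ok = all(
        sum(a * v for a, v in zip(row, z)) <= rhs + tol
        for row, rhs in zip(A_ub, b_ub)
    )
    bounds_ok = all(
        (lo is None or v >= lo - tol) and (hi is None or v <= hi + tol)
        for v, (lo, hi) in zip(z, bounds)
    )
    return rows_ok and bounds_ok
```

The returned lists follow the argument layout that generic LP solvers such as `scipy.optimize.linprog` accept (`c`, `A_ub`, `b_ub`, `bounds`). Any change to the model, for example one extra constraint per pair of related examples, means revisiting every index offset above, which is the kind of bookkeeping a declarative language aims to remove.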
The big data hype has led a lot of people to focus on improving the scalability of exotic algorithms. You have chosen two, Linear Programming and Quadratic Programming, to do machine learning, and you focus more on feeding the data in. What are the pros and cons of this approach?
KK) We just started with LPs and QPs since they are the workhorses of classical machine learning. This way we build a basis for what might be called relational statistical learning. Afterwards, we will move on to other machine learning approaches, maybe even deep learning approaches.
Do you think it’s more important for a data scientist to easily customize the objective and encode constraints rather than use a fancy ML algorithm?
KK) Good question. I think this depends on the application. We are collaborating a lot with plant physiologists. They once asked me, ‘What is the biological meaning of an eigenvector?’ Quite difficult to answer. Or consider stochastic gradients. They argued, ‘So you want me to throw away 90% of my data? How do I explain this to my students, who spent 200 days in the field to gather the data?’ It is similar with deep learning. They want to trust the algorithm, at least at the beginning of a data science project, and to gain insights into their data. Here is where focusing on the constraints can help; they might be easier to understand, at least when encoded in a high-level language. Or consider collective classification, i.e., the case where the classification of one entity may change the classification of a related entity. Typically one uses a kernel to realize this when using support vector machines. However, just placing some additional constraints encoding that related entities should be on the same side of the hyperplane can do the job, too, as this also captures the manifold structure in the high-dimensional feature space. Unfortunately, in contrast to the AI community, the ML community has not really developed a methodology for constraints yet.
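One way to write down the “same side of the hyperplane” idea is sketched below. This is an assumed encoding for illustration, not necessarily the formulation used in RELOOP or in the talk. Here L is the set of labeled examples, R is a hypothetical set of pairs (i, j) in which a labeled example i is related to an example j, and C, C' are trade-off constants. Each related pair adds one soft linear constraint that pulls x_j to the side of its labeled neighbour x_i, with slack eta_ij.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\eta}\quad
  & \tfrac{1}{2}\lVert w\rVert_2^2
    + C\sum_{i\in L}\xi_i
    + C'\sum_{(i,j)\in R}\eta_{ij} \\
\text{s.t.}\quad
  & y_i\,(w^\top x_i + b) \ge 1 - \xi_i,   && i \in L, \\
  & y_i\,(w^\top x_j + b) \ge 1 - \eta_{ij}, && (i,j) \in R, \\
  & \xi_i \ge 0,\quad \eta_{ij} \ge 0 .
\end{aligned}
```

Under these assumptions the problem remains a quadratic program with linear constraints, so the relational information enters through extra constraints rather than through a kernel.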
There is a big crowd from engineering with expert skills in optimization that has struggled to get into data science and earn the corresponding salary. Do you think you are opening a door for them?
KK) Hopefully we can at least help. Optimisation is definitely one of the foundations of statistical machine learning. High-level languages for optimization hopefully help to talk about the models and hence to bridge the disciplines even further.
Do you see the solvers becoming scalable enough so that your approach can be applied to big data? Is there a different path?
KK) Hmm, scalability is always an issue. However, it is not just the solver but the way the solver interacts with the modelling language. Consider, e.g., relational models. They often have symmetries that can be used to reduce the model automatically. This is sometimes called lifted inference. And we have just started to exploit structure within statistical machine learning. Imagine a cutting plane solver that computes not the most violated constraint but one that is both highly violated and fast to compute. As yet another example, one of my PhD students, Martin Mladenov, just found a nice way to combine matrix-free optimization methods with relational languages such as RELOOP. With this we can solve problems involving a billion non-zero variables faster than Gurobi. So at least there is strong evidence that a new generation of solvers can scale well, even better than existing ones. Moreover, instead of compiling to an intermediate structure, why not compile directly into a low-level C/C++ program that implements a problem-specific solver? In a sense, I envision a “-O” flag for machine learning, very much like we know it from C/C++ compilers.
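The term “matrix-free” can be illustrated with a small, generic sketch. It is not Martin Mladenov’s method and not RELOOP code. It assumes a linear system (I + L) x = b, where L is the Laplacian of a graph given only as a list of related pairs (edges). The hypothetical functions below never build the matrix: conjugate gradient only needs a routine that applies the operator to a vector, and that routine walks the relation directly.

```python
def laplacian_plus_identity_matvec(edges, x):
    """Return (I + L) x, with L the graph Laplacian, using only the edge list."""
    out = list(x)  # identity part: start from a copy of x
    for i, j in edges:
        diff = x[i] - x[j]
        out[i] += diff  # row i of L contributes (x_i - x_j)
        out[j] -= diff  # row j of L contributes (x_j - x_i)
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator A that is
    available only through matvec(v) = A v."""
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x for the starting point x = 0
    p = list(r)   # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

A possible call is `conjugate_gradient(lambda v: laplacian_plus_identity_matvec(edges, v), b)`. Memory use grows with the number of edges rather than with the square of the number of entities, which is one reason such methods pair naturally with models that are specified through relations.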
How does your Relational Linear Programming play along with relational databases?
KK) Relational DBs have been the home of high-value, data-driven applications for over four decades. This may explain why you see a push in industry to marry statistical analytic frameworks like R and Python with almost every data processing engine. As a machine learner this is nice, as you do not have to worry about data management and retrieval anymore. However, it is tricky to just map the data from a relational DB into a single table, the traditional representation for machine learning. You are likely to change the statistics. We need a relational machine learning that can deal with entities and relations. We just started with LPs and QPs since they are the workhorses of classical machine learning. In the long run, we want to develop a tight integration of relational databases and machine learning, maybe even something like Deep Relational Machines.
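A toy example shows how flattening can change the statistics. The schema and numbers are invented for illustration: two plants, one of which has three measurement rows. Joining the tables into a single flat table repeats that plant three times, so the average height over rows is no longer the average height over plants. The hypothetical function `entity_vs_flattened_mean` uses Python’s built-in `sqlite3` module.

```python
import sqlite3


def entity_vs_flattened_mean():
    """Compare a per-entity mean with the mean over the flattened join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE plant (id INTEGER PRIMARY KEY, height REAL);
        CREATE TABLE measurement (plant_id INTEGER, day INTEGER);
        INSERT INTO plant VALUES (1, 10.0), (2, 20.0);
        INSERT INTO measurement VALUES (1, 1), (2, 1), (2, 2), (2, 3);
        """
    )
    # One row per plant: each entity counts once.
    entity_mean = conn.execute("SELECT AVG(height) FROM plant").fetchone()[0]
    # One row per (plant, measurement): plant 2 is counted three times.
    flat_mean = conn.execute(
        "SELECT AVG(p.height) "
        "FROM plant p JOIN measurement m ON m.plant_id = p.id"
    ).fetchone()[0]
    conn.close()
    return entity_mean, flat_mean
```

With these invented rows the per-plant mean is 15.0, while the mean over the flattened table is 17.5, because the taller plant dominates the joined rows.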
Kristian Kersting, Associate Professor for Computer Science, TU Dortmund University
Kristian Kersting is an Associate Professor for Computer Science at the TU Dortmund University, Germany. He received his PhD from the University of Freiburg, Germany, in 2006. After a PostDoc at MIT, he moved to the Fraunhofer IAIS and the University of Bonn using a Fraunhofer ATTRACT Fellowship. His main research interests are data mining, machine learning, and statistical relational AI, with applications to medicine, plant phenotyping, traffic, and collective attention. Kristian has published over 130 technical papers, and his work has been recognized by several awards, including the ECCAI Dissertation Award for the best AI dissertation in Europe.
He gave several tutorials at top venues and serves regularly on the PC (often at the senior level) of the top machine learning, data mining, and AI venues. Kristian co-founded the international workshop series on Statistical Relational AI and co-chaired ECML PKDD 2013, the premier European venue for Machine Learning and Data Mining, as well as the Best Paper Award Committee of ACM KDD 2015. Currently, he is an action editor of DAMI, MLJ, AIJ, and JAIR as well as the editor of JAIR’s special track on Deep Learning, Knowledge Representation, and Reasoning.