You have pioneered research on finding global optima in nonconvex problems, something that has been a big headache for every machine learning user. You have proved optimality in tensor factorization problems. Can you mention other areas where the community has found algorithms that find global optima?
AA) It turns out that many non-convex machine learning problems can be solved using computationally efficient algorithms. In addition to tensor factorization, this includes matrix completion, robust PCA, phase retrieval, dictionary learning, and so on: the list is growing. In contrast to traditional computer science theory, where the focus is on solving worst-case instances, in machine learning our goal is to solve only a limited class of problem instances. Thus, instead of studying the hardness of solving the worst-case instance, the focus is on characterizing conditions under which finding global optima becomes tractable. For the problems mentioned above, those conditions turn out to be quite natural and mild.
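As a small illustration of the point about tractable conditions, here is a minimal sketch of tensor power iteration on a synthetic symmetric third-order tensor. It assumes the tensor has an exact orthogonal decomposition, which is one example of a condition under which this nonconvex problem becomes easy. The dimensions, weights, and variable names are illustrative choices, not taken from the interview or from any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 3

# Assumption: the components are orthonormal (columns of A).
A, _ = np.linalg.qr(rng.standard_normal((d, k)))
w = np.array([3.0, 2.0, 1.0])

# Build T = sum_r w[r] * a_r (x) a_r (x) a_r.
T = np.einsum("r,ir,jr,kr->ijk", w, A, A, A)

# Power iteration: u <- T(I, u, u) / ||T(I, u, u)|| from a random start.
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
for _ in range(50):
    u = np.einsum("ijk,j,k->i", T, u, u)
    u /= np.linalg.norm(u)

# u should now align with one of the true components.
overlaps = A.T @ u
best = int(np.argmax(np.abs(overlaps)))
print("recovered component", best, "overlap", overlaps[best])
```

Which component is recovered depends on the random start; repeating the iteration after subtracting the recovered term (deflation) is the usual way to obtain the remaining ones.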
Can you reference your favorite papers in that area? Tell us why you like them.
AA) There is so much new research being done, so I feel that listing just a few papers may not be the right thing to do. Instead, I recommend reading some of the blogs and newsletters that are reporting the latest research in an accessible manner. These include:
Deep learning is a highly nonconvex problem. The community has found several tricks to get good solutions by using variants of stochastic gradient descent. Does it make sense to apply your research to this domain? What can the theory of global optimality for nonconvex problems offer?
AA) With any non-convex problem, in order to reach the global optimum in bounded time, one needs to avoid the saddle points efficiently. Some of the recent work done by me and other researchers in the area has addressed this question. These works can lead to algorithms with faster training times, since they can overcome the plateau phenomenon in training, where the algorithm makes no progress in the objective value for long periods of time. In addition, the issue of local optima is also being addressed. One of my recent results showed that there are alternative algorithms for training neural networks based on tensor decomposition. There is also recent work on homotopy methods that can overcome the problem of local optima in nonconvex optimization. I believe that we will soon have practical algorithms that can take the guesswork out of deep learning.
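To make the saddle-point issue concrete, here is a toy sketch comparing plain gradient descent with a noisy variant on a hand-picked two-dimensional function. The function f(x, y) = x^2 + y^4/4 - y^2/2, the step size, and the noise level are assumptions chosen for illustration; this is not the algorithm from any particular paper, only a demonstration that a small perturbation is enough to leave a saddle.

```python
import numpy as np

def grad(p):
    # Gradient of f(x, y) = x^2 + y^4/4 - y^2/2.
    # (0, 0) is a saddle point; (0, 1) and (0, -1) are the minima.
    x, y = p
    return np.array([2.0 * x, y**3 - y])

def descend(start, steps=2000, lr=0.05, noise=0.0, seed=0):
    # Gradient descent with optional isotropic Gaussian noise per step.
    rng = np.random.default_rng(seed)
    p = np.array(start, dtype=float)
    for _ in range(steps):
        p = p - lr * grad(p) + noise * rng.standard_normal(2)
    return p

# Starting on the line y = 0, plain descent slides into the saddle.
plain = descend([1.0, 0.0])
# A little noise pushes the iterate off that line and toward a minimum.
noisy = descend([1.0, 0.0], noise=1e-3)

print("plain:", plain)
print("noisy:", noisy)
```

Without noise, the y-coordinate stays exactly zero because its gradient vanishes there, so the iterate stalls at the saddle. With noise, it ends near one of the two minima.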
One of the problems of deep learning systems is that they require a lot of computing power. Replicating the results from a paper by Google on a DNN for image understanding required $13,000 in cloud time. Is there any hope that theory can give us faster and more efficient algorithms?
AA) There is indeed a concern that large-scale learning will only be possible in big companies with lots of resources. One of the ways around this is to make all the pre-trained models publicly available (which is already the case), and to employ efficient transfer learning algorithms. In computer vision, it is now standard to use pre-trained ImageNet features, and in text processing, word embeddings are being used extensively. This is an area of active research and I am sure we will see many new outcomes in this space.
Big ML companies seem to drain talent from academia. Should we worry that in the future we will not have enough professors to mentor students? Is this a valid fear? Do you think big companies should give sabbaticals to professors to go back to academia? 🙂
AA) I think that these are exciting times to be in machine learning, whether it be in academia or industry. I view it as a big positive that many professors are now in industry to propel research forward. (Full disclosure: I am on leave from UCI and working for Amazon Web Services.) This will ensure better technology transfer and closer connections between academia and industry. In fact, many universities are looking into closer partnerships with industry, allowing the faculty to engage in both academic and industrial research.
Tell us about your future research plans. Is there a new area that you find exciting? Tell us about papers you saw this year that you would have liked to author.
AA) As I mentioned earlier, I am currently on leave from UCI and am a principal scientist at Amazon Web Services. This is an exciting opportunity to translate some of my theoretical research into practical algorithms running at scale. My team is dedicated to building large-scale machine learning solutions on AWS and I am proud to be part of it. I am now looking at research questions that lie at the intersection of machine learning and systems.
What is the difference between Machine Learning and AI? What has changed since 2010 when ML seemed to have been the buzzword?
AA) There are many strong opinions on what constitutes machine learning and AI. I feel that the fields have converged more than ever before, and for researchers working in the field, the focus should be on solving challenging problems rather than on terminology.
Amazon Web Services
Anima Anandkumar is a principal scientist at Amazon Web Services, and is currently on leave from U.C. Irvine, where she is an associate professor. Her research interests are in the areas of large-scale machine learning, non-convex optimization and high-dimensional statistics. In particular, she has been spearheading the development and analysis of tensor algorithms. She is the recipient of several awards such as the Alfred P. Sloan Fellowship, Microsoft Faculty Fellowship, Google research award, ARO and AFOSR Young Investigator Awards, NSF CAREER Award, Early Career Excellence in Research Award at UCI, Best Thesis Award from the ACM SIGMETRICS society, IBM Fran Allen PhD fellowship, and several best paper awards. She has been featured in a number of forums such as the Quora ML session, Huffington Post, Forbes, O’Reilly Media, and so on. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a postdoctoral researcher at MIT from 2009 to 2010, an assistant professor at U.C. Irvine between 2010 and 2016, and a visiting researcher at Microsoft Research New England in 2012 and 2014.