Our past Technical Chair interviewed Hussein Mehanna, Engineering Director – Core ML, Facebook, regarding his upcoming presentation, Applying Deep Learning at Facebook Scale, scheduled for 09/23/16 at MLconf Atlanta.
One of the criticisms of deep learning models has been the cost of inference. In your talk you will explain how you reduced inference time. Does this mean shallow models no longer have any advantage over deep models?
HM) No, I don’t think so. In fact, one of the tricks growing in popularity these days is using deep, expensive models to learn and then using those to teach shallow models (knowledge distillation, sometimes called “dark knowledge” transfer). There seems to be a theory that during learning you need more capacity and complexity, but that can be reduced at inference. At times it even improves accuracy. So I think shallow models will stay.
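The teacher–student trick described above can be made concrete with a small sketch. This is not Facebook’s implementation, just a minimal illustration of the standard distillation loss (in the style of Hinton et al.): the student is trained against the teacher’s temperature-softened outputs blended with the ordinary hard-label loss. All names and the toy logits are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature: higher T produces softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Blend cross-entropy on the hard label with cross-entropy against
    the teacher's softened distribution (the 'dark knowledge')."""
    soft_targets = softmax(teacher_logits, temperature)
    soft_preds = softmax(student_logits, temperature)
    soft_loss = -sum(t * math.log(p) for t, p in zip(soft_targets, soft_preds))
    # Standard cross-entropy against the ground-truth class.
    hard_preds = softmax(student_logits)
    hard_loss = -math.log(hard_preds[true_label])
    # T^2 scaling keeps the soft-target gradients comparable in magnitude.
    return alpha * hard_loss + (1 - alpha) * (temperature ** 2) * soft_loss

# A student whose logits match the teacher's incurs a lower loss than one
# whose logits are far from both the teacher and the true label.
loss_far = distillation_loss([0.1, 0.2, 0.3], [5.0, 0.0, 0.0], true_label=0)
loss_near = distillation_loss([5.0, 0.0, 0.0], [5.0, 0.0, 0.0], true_label=0)
```

In practice the student is a much smaller network, so this loss is what lets it inherit the large model’s behavior while staying cheap at inference time.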
Do we need to compromise accuracy to make deep learning inference fast?
HM) Actually, not necessarily; at times it may even improve generalization, since complex models overfit. That said, figuring out how to reduce the computational load of a model is still non-trivial. Making this simpler and more automatic is something that will help the industry.
TensorFlow, Torch, Theano, MXNet, CNTK… Can you help us survive the Babel of deep learning platforms? Can you advise the MLconf audience on what to choose, or how to choose?
HM) Yes, I probably can. That said, the MLconf audience should feel happy, because diversity increases the chances that they will find tools close to their needs. We are in a creative-chaos phase in AI, but things are converging.
I have noticed that some deep learning platforms are good with dense data and others with sparse data. At Facebook you deal with both. How did you manage to unify both under one platform?
HM) Good question – I will need to check with our legal team before I answer that. All I can say now is that we treat both as first-class citizens, and we are investing in algorithms that operate at the intersection. This is especially beneficial for sparse scenarios, since traditional deep learning has been dense-focused, as it’s easier to get hold of images than social data.
You implemented deep learning at scale. What is the gap between theory and practice? What are the tricks that make the difference but are never written in a paper?
HM) That’s a fantastic question. Any ML algorithm is really dependent on the data. If you change the data, you change the problem completely. That’s the biggest difference, in my opinion, between academia and industry. It makes a lot of sense for academia to standardize its datasets, but most of those don’t represent what the industry uses. Think about the overlap between the ImageNet dataset and the data of a system that needs to recognize consumer products. Probably very different. The other difference is that industrial systems receive continuous improvements that accumulate over time, so baselines in industry are much more tuned.
What was the most surprising fact that you have discovered about deep learning? Can you share a paper with us that had a great influence on you?
HM) I am going to seem a bit biased towards Facebook AI Research, but I adore the character-level deep learning work for NLP. The fact that you can learn from raw textual input with no preprocessing, much as you would with images, is just extremely powerful. In my early college days, I could not bear all the special rules that riddled NLP, and I always believed there was a better solution. This paper provides a good basis for that; we now have more sophisticated approaches in the team, but that paper was a great start.
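The appeal of the character-level approach is that the input representation needs no tokenizer, stemmer, or hand-written rules. A minimal sketch of that representation, assuming a toy alphabet and fixed length (the function name and parameters are illustrative, not from any particular paper):

```python
def one_hot_chars(text, alphabet="abcdefghijklmnopqrstuvwxyz ", max_len=16):
    """Encode raw text as a max_len x len(alphabet) one-hot matrix,
    the kind of input a character-level conv net consumes.
    Characters outside the alphabet map to an all-zero row;
    text is truncated or zero-padded to max_len."""
    index = {c: i for i, c in enumerate(alphabet)}
    matrix = []
    for ch in text[:max_len].lower():
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        matrix.append(row)
    # Pad short inputs with all-zero rows up to the fixed length.
    while len(matrix) < max_len:
        matrix.append([0] * len(alphabet))
    return matrix

encoded = one_hot_chars("hi there")
```

From here a convolutional or recurrent network learns directly on the matrix, with no feature engineering between the raw string and the model.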

Hussein Mehanna, Engineering Director – Core ML, Facebook
I am the Director of the Core Machine Learning group at Facebook. Our team focuses on building state-of-the-art ML/AI platforms, combined with applied research in event prediction and text understanding. We work closely with product teams in Ads, Feed, Search, Instagram and others to improve their user experiences.
In 2012, I joined Facebook as the original developer on the Ads ML platform. That quickly developed into a Facebook-wide platform serving more than 30 teams. Prior to Facebook, I worked at Microsoft on search query alterations and suggestions in Bing and on communication technologies in Lync. I hold a master’s degree in speech recognition from the University of Cambridge, UK, where I worked on noise-robustness modeling.