In the past decade, Deep Learning research has focused on improving the state of the art, and as a result model complexity, parameter counts, latency, and training resources required have all grown rapidly. Models like GPT-3 have hundreds of billions of parameters and cost millions of dollars to train. It has therefore become essential to consider the cost of training and deploying these models in production. What if you could achieve the same model accuracy with a fraction of the GPUs during training, and an order of magnitude less latency at inference? This talk introduces attendees to Deep Learning techniques for achieving both training efficiency (faster training with less data and fewer resources) and inference efficiency (faster predictions, smaller model size, less compute required). Attendees will be able to apply these learnings to their existing models, and will gain insights for future research and experimentation to achieve further efficiency improvements.
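As a concrete illustration of the inference-efficiency goals mentioned above (smaller model size, less compute), one widely used technique is post-training weight quantization: storing float32 weights as int8 with a per-tensor scale factor. The sketch below is not from the talk; it is a minimal NumPy illustration with made-up function names, showing the 4x size reduction and the bounded reconstruction error that symmetric int8 quantization gives.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map float32 weights to int8
    plus a single float scale. (Illustrative sketch, not the talk's code.)"""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values and scale."""
    return q.astype(np.float32) * scale

# Example: a random 256x256 float32 weight matrix (hypothetical layer).
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32.
print(w.nbytes, q.nbytes)

# Rounding error is bounded by half the quantization step.
err = np.max(np.abs(w - dequantize(q, scale)))
print(err <= scale / 2 + 1e-6)
```

Per-tensor symmetric quantization is the simplest variant; production systems often use per-channel scales and calibration data to reduce the accuracy loss further.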
Session Summary
Efficient Deep Learning
MLconf Online 2021 – AI/ML Ops
Gaurav Menghani, Staff Software Engineer, Google Research