In this talk, I will give a quick overview of the autonomous vehicle (AV) stack, introduce the unique challenges we face in Cruise’s ML Infra, and share our approach: One Platform with clean system layers, and the Continuous Learning Machine for addressing the long-tail challenge. We will then dive deep into our distributed data processing and training architecture, which enables reliable, large-scale deep learning (DL) at Cruise. As part of that, we will look at how novel approaches to orchestrating distributed applications enable the next level of scale, fault tolerance, and coordination in the distributed training process.
Session Summary
ML Infrastructure for Autonomous Vehicles @ Cruise
MLconf 2023 New York City
Aleksandr Sidorov
Cruise
Principal Software Engineer