Most organizations have machine learning experts and DevOps experts, but few have individuals who possess both skill sets. As a result, machine learning experts must constantly rely on DevOps to spin up and manage the clusters used to train models, and DevOps professionals must extract as much context as possible from the machine learning experts so that each cluster is optimized for the training job at hand. At best, this process introduces friction in model development. At worst, it reduces the volume and variety of models that a team can train. This talk focuses on the operational challenges that inhibit model training and tuning at scale.