Many customers of Benchling (a cloud-based platform for biotechnology research and development) use it to track experiments in which cells are grown to produce a biological component of a drug or therapeutic. To help our customers model and understand their growth processes, we built a modeling pipeline tailored to the growth of biologics. While the goals and processes involved in growing biologics are similar across customers, the exact configuration and composition of the data tracked for the process are completely custom for each customer. The modeling pipeline we built automatically adjusts to different data formats and distributions while capturing commonalities of the underlying systems being modeled. As a result, we can produce high-quality models across a range of customers without needing a specialist per customer to build them. I will describe both the automation steps (and libraries) used and the domain-specific modeling adjustments needed to create this pipeline for modeling the growth of biologics.
Session Summary
Humans helping machines help humans run machines: Combining automation and domain knowledge to enable productized modeling of the growth of biologics
MLconf 2022 San Francisco
Leah McGuire
Benchling
Machine Learning Engineer
In this talk I will describe how having specialists inject domain knowledge about the problem being modeled, combined with automation of modeling steps, can produce good-quality models across highly varied small datasets. A large amount of mission-critical, small, tabular data is produced across many domains. There are not enough specialists to analyze this data; it is too small and too varied to tackle with the deep learning techniques that have taken over vision and speech modeling; and blindly applying ML or AutoML to it generally produces less-than-useful results. However, by having a specialist use AutoML tools in model building, it is possible to scale that individual's expertise to cover many instances of a use case.