Nearly two decades have passed since the Human Genome Project was completed. The discoveries in genomics that followed have illuminated the complex story of the human past and provided clues about the heritable origins of disease. Whereas some diseases have narrow genetic causes, the majority of common conditions, such as coronary artery disease or type 2 diabetes, have been shown to be “polygenic,” involving hundreds if not thousands of genes in addition to lifestyle and other non-genetic factors.
As we advance our mission to help people access, understand, and benefit from the human genome, 23andMe has been a leader in using machine learning to develop polygenic scores (PGS) in a direct-to-consumer model. In this talk, a scientist, an engineer, and a product manager discuss a collaborative project to scale up the infrastructure required to rapidly develop polygenic models and deliver them into the consumer product. This end-to-end analytic pipeline leverages 23andMe’s unprecedented genetic and phenotypic database, with over twelve million genotyping kits sold and over three billion survey questions answered. In this analysis flow, we first identify links between genetic variation and diseases or traits in a genome-wide association study (GWAS). We then select sets of genetic variants linked to these outcomes and use them as features in dozens of candidate models, which are fit in a parallelized machine learning service. The most powerful models are promoted for immediate use in the consumer product. This internally developed software has drastically shortened development time (from months to days) and enabled the delivery of personalized results based on much larger models (20,000+ genetic features) to millions of customers in seconds.
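As a rough illustration of the flow described above, the sketch below builds candidate models from GWAS results and scores one person. It is not 23andMe's implementation: the abstract does not say how variants are selected or which model family is fit, so this assumes simple p-value thresholding with GWAS effect sizes as weights. All names (`select_variants`, `build_candidates`, `polygenic_score`), variant IDs, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical GWAS summary: variant id -> (effect size, p-value).
Gwas = Dict[str, Tuple[float, float]]


def select_variants(gwas: Gwas, p_threshold: float) -> List[str]:
    """Keep variants whose GWAS p-value is below the threshold."""
    return sorted(v for v, (_, p) in gwas.items() if p < p_threshold)


def build_candidates(gwas: Gwas, thresholds: List[float]) -> Dict[float, Dict[str, float]]:
    """One candidate model per threshold, mapping variant -> weight.

    Assumption: weights are taken directly from GWAS effect sizes.
    """
    return {
        t: {v: gwas[v][0] for v in select_variants(gwas, t)}
        for t in thresholds
    }


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of allele dosages (0, 1, or 2 copies) over model variants.

    Variants missing from the genotype are treated as dosage 0 here.
    """
    return sum(w * dosages.get(v, 0) for v, w in weights.items())


# Made-up toy data for illustration only.
gwas = {
    "rs1": (0.30, 1e-9),
    "rs2": (-0.10, 1e-4),
    "rs3": (0.05, 0.20),
}
candidates = build_candidates(gwas, [5e-8, 1e-3])
person = {"rs1": 2, "rs2": 1, "rs3": 0}
scores = {t: polygenic_score(person, w) for t, w in candidates.items()}
```

Each threshold yields an independent candidate, so the candidates could be fit and evaluated in parallel, which is the property the parallelized service described above relies on.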
The speakers will also highlight company efforts to improve genomic interpretation across a diversity of ancestral backgrounds. Unfortunately, the majority of research in genomics has included only participants of European ancestry. Some of the same genomic features that enable us to trace population migrations (and that power 23andMe’s popular Ancestry Composition tool) also make it more difficult to generalize PGS developed in one population to another. For the first time, 23andMe’s end-to-end PGS pipeline can automatically test the performance of candidate models across multiple populations, optimizing our model selection process to ensure that we can deliver the most equitable product possible.
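One way to picture cross-population model selection is the sketch below. The abstract does not state the actual selection rule, so this assumes a "best worst case" criterion: pick the candidate whose lowest per-population performance is highest. The function name, model names, population labels, and metric values are all hypothetical.

```python
from typing import Dict


def select_equitable_model(metrics: Dict[str, Dict[str, float]]) -> str:
    """Return the model whose lowest per-population metric is highest.

    `metrics` maps model name -> {population label -> performance metric},
    where a higher metric is better (for example, AUC).
    """
    return max(metrics, key=lambda m: min(metrics[m].values()))


# Made-up values for illustration only; not real results.
metrics = {
    "model_a": {"pop_1": 0.72, "pop_2": 0.58, "pop_3": 0.61},
    "model_b": {"pop_1": 0.69, "pop_2": 0.64, "pop_3": 0.65},
}
best = select_equitable_model(metrics)
```

Under this rule, a model with a slightly lower peak but a higher floor is preferred over one that performs well in only a single population. Other criteria, such as a weighted average across populations, would be equally plausible.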