The emergence of generative AI (GenAI) has transformed many domains, enabling creative content generation across text, synthetic images, video, and more. However, the effectiveness of GenAI models depends heavily on the quality of the data used during fine-tuning. Large volumes of raw data are available on the web today; what is needed are the skills to identify and extract meaningful datasets and present them to GenAI models to unlock their full potential. This talk presents the power of the most fundamental aspect of AI, data curation, which often does not get its due limelight. It also walks the audience through constructing good-quality datasets with hands-on Pythonic examples. By emphasizing that quality data is indispensable, the talk underscores the need for robust data collection and preprocessing practices to advance GenAI.
Session Summary
Sculpting Data for Machine Learning: Generative AI edition
MLconf Online 2023
Jigyasa Grover
Faire
Senior Data Scientist
Rishabh Misra
Attentive Mobile
Senior Machine Learning Engineer