Rishabh Mehrotra is a Research Scientist at Spotify Research in London. He obtained his PhD in the field of Machine Learning and Information Retrieval from University College London where he was partially supported by a Google Research Award. His PhD research focused on inference of search tasks from query logs and their applications. His current research focuses on bandit based recommendations, counterfactual analysis and experimentation. Some of his recent work has been published at top conferences including WWW, SIGIR, NAACL, CIKM, RecSys and WSDM. He has co-taught a number of tutorials at leading conferences (WWW & CIKM) & was recently invited to teach a course on “Learning from User Interactions” at the Russian Summer School on Information Retrieval and the ACM SIGKDD Africa Summer School on Machine Learning for Search.
Upcoming Abstract Summary
Recommendations in a Marketplace: Personalizing Explainable Recommendations with Multi-objective Contextual Bandits:
In recent years, two sided marketplaces have emerged as viable business models in many real world applications (e.g. Amazon, AirBnb, Spotify, YouTube), wherein the platforms have customers not only on the demand side (e.g. users), but also on the supply side (e.g. retailer, artists). Such multi-sided marketplace involves interaction between multiple stakeholders among which there are different individuals with assorted needs. While traditional recommender systems focused specifically towards increasing consumer satisfaction by providing relevant content to the consumers, two-sided marketplaces face an interesting problem of optimizing their models for supplier preferences, and visibility.
In this talk, we begin by describing a contextual bandit model developed for serving explainable music recommendations to users and showcase the need for explicitly considering supplier-centric objectives during optimization. To jointly optimize the objectives of the different marketplace constituents, we present a multi-objective contextual bandit model aimed at maximizing long-term vectorial rewards across different competing objectives. Finally, we discuss theoretical performance guarantees as well as experimental results with historical log data and tests with live production traffic in a large-scale music recommendation service.