We apply text mining techniques to earnings call transcripts to extract features that capture signals from management and the investment community. Our study demonstrates that investors have ample opportunity to apply Natural Language Processing (NLP) techniques to financial documents to differentiate between outperforming and underperforming stocks.
Using a corpus of earnings call transcripts for global companies from 2010 to 2021, we create fundamentally driven features spanning document attributes, readability, and sentiment for different sections of the transcripts, using both bag-of-words approaches and context-driven deep learning language models such as BERT.
We test the efficacy of these features in predicting future stock returns and find that investors can use these signals in stock selection. Specifically, readability and sentiment-based features enhance an investor’s ability to differentiate between outperformers and underperformers, and these results are robust across market capitalization as well as investment universes (US Large Cap, US Small Cap, World ex-US and Emerging Markets). By analyzing performance patterns across the various call participants, we find evidence that analyst questions may contain more information than the executive sections. Finally, we observe that sentiment features derived from BERT are promising and more effective than those from bag-of-words approaches.
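
To make the bag-of-words feature family concrete, the following minimal Python sketch computes a net sentiment score and a simple readability proxy (average sentence length) separately for an executive section and an analyst section. It is an illustration under stated assumptions, not the study's implementation: the word lists, the function names, the scoring formula, and the two sample passages are all hypothetical, and a real application would rely on a full financial sentiment dictionary and an established readability index.

```python
import re

# Hypothetical, tiny word lists for illustration only; a real study would
# use a complete financial sentiment dictionary.
POSITIVE = {"growth", "strong", "improved", "record", "gain"}
NEGATIVE = {"decline", "weak", "loss", "impairment", "headwinds"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bow_sentiment(text):
    """Net sentiment in [-1, 1]: (positive - negative) / (positive + negative)."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def avg_sentence_length(text):
    """Readability proxy: mean number of tokens per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return len(tokenize(text)) / len(sentences)


# Hypothetical transcript snippets, one per call section.
executive = "We delivered record growth this quarter. Margins improved despite headwinds."
analyst = "Can you explain the decline in bookings? Is the weak demand a loss of share?"

features = {
    "exec_sentiment": bow_sentiment(executive),
    "analyst_sentiment": bow_sentiment(analyst),
    "exec_avg_sentence_len": avg_sentence_length(executive),
    "analyst_avg_sentence_len": avg_sentence_length(analyst),
}
```

Scoring each section separately is what allows the information content of analyst questions to be compared with that of executive remarks. A context-driven model such as BERT would replace `bow_sentiment` with a classifier that reads each sentence in context rather than counting dictionary words.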