Old habits die hard. At the heart of supervised classification in machine learning lies one such old habit: one-hot encoding (OHE) of target class labels. Despite a substantial body of literature demonstrating the shortcomings of this simple technique, it is still considered the default way to represent categorical labels in a standard machine learning pipeline. In this talk, we explore how to replace one-hot-encoded label vectors with dense, lower-dimensional, continuous-valued smooth label vectors, and what happens when we do. In cases where we lack any semantic insight into the label space, the intuitive goal is to keep the dense label vectors as far apart as possible, which maps to solving the equiangular Grassmannian line-packing problem. We demonstrate the efficacy of this new technique across many classification problems arising in computer vision, natural language processing, time-series analysis, and speech processing. We conclude by sifting through the open-sourced Python implementation of this idea and carrying out a quick live demo that showcases how easy it is to use in a machine learner's pipeline.
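To make the idea concrete, here is a minimal sketch of how such dense label vectors might be constructed and used. This is an illustration only, not the talk's open-sourced implementation: the function names and the frame-potential gradient descent used to approximate the line packing are assumptions.

```python
import numpy as np

def pack_labels(n_classes, dim, n_steps=5000, lr=0.01, seed=0):
    """Approximate a Grassmannian line packing: n_classes unit vectors in
    R^dim whose pairwise inner products are pushed toward minimal coherence,
    here via projected gradient descent on the frame potential."""
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(n_classes, dim))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    for _ in range(n_steps):
        G = V @ V.T                          # Gram matrix of inner products
        np.fill_diagonal(G, 0.0)             # ignore self-similarity
        V -= lr * 4.0 * (G @ V)              # gradient of sum_{i!=j} <v_i, v_j>^2
        V /= np.linalg.norm(V, axis=1, keepdims=True)  # project back to unit sphere
    return V

def encode(labels, V):
    """Replace integer class labels with their dense target vectors."""
    return V[labels]

def decode(outputs, V):
    """Map model outputs back to class indices via the most-aligned label vector."""
    return (outputs @ V.T).argmax(axis=1)

# 10 classes squeezed into 5 dimensions instead of a 10-dim one-hot basis
V = pack_labels(n_classes=10, dim=5)
y = np.array([0, 3, 7, 9])
dense_targets = encode(y, V)          # shape (4, 5): regression-style targets
recovered = decode(dense_targets, V)  # exact targets decode back to the labels
```

A classifier would then be trained to regress these dense targets (e.g. with an MSE or cosine loss) rather than emitting softmax probabilities over a one-hot basis, and predictions are decoded by nearest label vector.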
Session Summary
Still using one hot encoding? Here’s some ‘HELP’!
MLconf Online 2020
Vinay Prabhu
UnifyID AI Labs
Chief Scientist