At the core of any e-commerce product catalog is product categorization. Accurate product categorization not only drives revenue growth but is also key to a good customer experience. However, research has shown that even though machine learning classification models now make far fewer mistakes, the severity of those mistakes has not changed much: when a mistake does happen, the top model prediction can still be embarrassingly far from the true category. To address this issue, we will first offer a plausible explanation from the perspective of semantic label representation. Then, to organize product categories in a semantically meaningful way, we will formulate two novel label representations. Next, we will propose how to construct an auxiliary objective from these semantic label representations in the context of optimizing multimodal product categorization. Finally, we will discuss experimental results showing significant improvements to product categorization, in terms of both accuracy and, more importantly, the semantic similarity of the top predictions.
Labels are human-generated signals that are highly semantic and information-dense. In addition, the similarity ranking among labels is human-interpretable, with room for inherent ambiguity. A convenient way to represent labels in supervised multi-class training is through one-hot encoding (OHE). However, OHE ignores the intrinsic similarity between classes: each class is effectively treated as an anonymous class with no relationship to any other. In contrast to OHE, label smoothing constructs a target that is a probability distribution. This simple but effective method is probably well known to the MLconf audience, with wide adoption by many state-of-the-art models and well-documented benefits to both generalization and model calibration. However, uniform label smoothing discards information about the resemblance between related classes.

Our first proposed approach builds on the label smoothing concept. We leverage a-priori semantic information to model the probability distribution over classes, so that the target label is represented by probability weights reflecting the similarity between the classes. Whereas our first approach aims to encode semantic similarities in the label representation, our second approach designs a label representation that captures feature similarities. The intuition is that if we embed the features shared across classes into the label representation, the labels are effectively binarized and the multi-class task is replaced by a multi-label task (see the second sketch below).

For our semantic label smoothing approach, we will demonstrate how to construct soft targets using CLIP embeddings (a minimal sketch follows below). Many use cases beyond the e-commerce domain could benefit from this approach. Based on our experimental results, we will recommend the most faithful implementation of our semantic label smoothing approach, as well as a good approximation for cases where the semantic relationship is hard to quantify due to ambiguity in the taxonomy.
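To make the semantic label smoothing idea concrete, here is a minimal sketch of one way such soft targets could be built. It assumes the class similarity matrix comes from cosine similarity between CLIP text embeddings of the category names; the `temperature` parameter and the row-wise softmax are our illustrative choices, not necessarily the exact construction presented in the talk.

```python
import numpy as np

def semantic_soft_targets(class_embeddings: np.ndarray,
                          temperature: float = 0.1) -> np.ndarray:
    """Row i is the soft target distribution for class i, with probability
    mass concentrated on semantically similar classes."""
    # L2-normalize so the dot product below is cosine similarity.
    emb = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T                                # pairwise cosine similarity
    logits = sim / temperature                       # lower temperature -> sharper targets
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)  # row-wise softmax

# Random vectors standing in for CLIP text embeddings of the category names
# (e.g. obtained via CLIPModel.get_text_features in Hugging Face transformers).
rng = np.random.default_rng(0)
fake_clip_embeddings = rng.normal(size=(5, 512))     # 5 classes, 512-dim CLIP space
soft_targets = semantic_soft_targets(fake_clip_embeddings)
assert np.allclose(soft_targets.sum(axis=1), 1.0)    # each row is a distribution
```

During training, the row for the true class would replace the one-hot target in the cross-entropy loss, so the leftover probability mass lands on semantically close categories rather than being spread uniformly as in standard label smoothing.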
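The abstract does not spell out how the feature-based binarized labels are constructed. One plausible reading, sketched below under our own assumptions, is to activate one bit per taxonomy node on a category's path, so that related categories share active bits; the toy taxonomy and the `binarize` helper are hypothetical.

```python
import numpy as np

# Toy taxonomy (hypothetical): each leaf category is a path of nodes,
# and related leaves share ancestor nodes.
CATEGORY_PATHS = {
    "espresso_machine": ["home", "kitchen", "coffee", "espresso_machine"],
    "coffee_grinder":   ["home", "kitchen", "coffee", "coffee_grinder"],
    "desk_lamp":        ["home", "office", "lighting", "desk_lamp"],
}

# Index every distinct taxonomy node once.
NODES = sorted({node for path in CATEGORY_PATHS.values() for node in path})
NODE_INDEX = {node: i for i, node in enumerate(NODES)}

def binarize(category: str) -> np.ndarray:
    """Multi-hot label: one bit per taxonomy node on the category's path."""
    target = np.zeros(len(NODES), dtype=np.float32)
    for node in CATEGORY_PATHS[category]:
        target[NODE_INDEX[node]] = 1.0
    return target

# "espresso_machine" and "coffee_grinder" share three of four active bits,
# so a per-bit multi-label loss (e.g. binary cross-entropy) penalizes
# confusing them less than predicting the unrelated "desk_lamp".
print(binarize("espresso_machine"))
```

Under such a representation, each class label becomes a set of shared bits rather than a single anonymous index, which is the multi-class-to-multi-label replacement the abstract describes.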