Semantic Label Representation with an Application on Multimodal Product Categorization

Semantic Label Representation
  • Class 1: [1, 0, 1, 0]
  • Class 2: [1, 0, 0, 1]
  • Class 3: [0, 1, 1, 0]
  • Class 4: [0, 1, 0, 1]
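These multi-label vectors encode each class as a combination of shared attributes rather than a single one-hot index, so classes that share an attribute are measurably closer to each other. A minimal sketch (using NumPy, not the production code) shows how pairwise cosine similarity over the four vectors above recovers that neighborhood structure:

```python
import numpy as np

# The four multi-label class vectors from above: each class is described
# by two shared attributes instead of one exclusive one-hot index.
classes = np.array([
    [1, 0, 1, 0],  # Class 1
    [1, 0, 0, 1],  # Class 2
    [0, 1, 1, 0],  # Class 3
    [0, 1, 0, 1],  # Class 4
], dtype=float)

# Pairwise cosine similarity: classes sharing one attribute score 0.5,
# classes sharing no attribute score 0.0.
unit = classes / np.linalg.norm(classes, axis=1, keepdims=True)
sim = unit @ unit.T

print(np.round(sim, 2))
```

Under this encoding, Class 1 and Class 2 (similarity 0.5) are semantic neighbors, while Class 1 and Class 4 (similarity 0.0) are not; a one-hot encoding would treat all four classes as equally dissimilar.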
Multi-label vs. multi-class (solid lines are TensorBoard-smoothed curves; gray lines are the raw data)
Baseline model for multimodal product categorization
Multimodal product categorization with semantic label smoothing
  1. One-hot encoding (OHE, the baseline model)
  2. Uniform label smoothing with α=0.1 (semantic-agnostic)
  3. Uniform label smoothing with α=0.1 for the top 20 nearest neighbors (semantic-aware)
  4. Label smoothing based on semantic similarities for the top 20 nearest neighbors (semantic-aware)
  5. Approach #4 combined with curriculum learning (semantic-aware)
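Approach #4 can be sketched as follows: the true class keeps 1 − α of the probability mass, and the remaining α is split among its top-k nearest neighbor classes in proportion to their semantic similarity. This is a hypothetical NumPy helper illustrating the idea, not the production implementation; `sim` is assumed to be a precomputed class-similarity matrix.

```python
import numpy as np

def semantic_soft_targets(sim, alpha=0.1, k=20):
    """Build one soft target distribution per class from a similarity matrix.

    The true class keeps 1 - alpha of the probability mass; the remaining
    alpha is distributed over the k most similar classes in proportion to
    their semantic similarity.
    """
    n = sim.shape[0]
    targets = np.zeros((n, n))
    for c in range(n):
        s = sim[c].copy()
        s[c] = -np.inf                 # exclude the class itself
        k_eff = min(k, n - 1)
        nbrs = np.argsort(s)[-k_eff:]  # indices of the top-k neighbors
        weights = sim[c, nbrs]
        total = weights.sum()
        # Fall back to uniform smoothing if no neighbor has any similarity.
        weights = weights / total if total > 0 else np.full(k_eff, 1.0 / k_eff)
        targets[c, c] = 1.0 - alpha
        targets[c, nbrs] = alpha * weights
    return targets
```

Setting the neighbor weights uniform instead of similarity-proportional recovers approach #3, and letting k cover all classes with uniform weights recovers approach #2, so the three smoothing variants differ only in how this α mass is allocated.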
Table 1: Improvement of Top-1 and Top-5 Accuracies
Table 2: Number of Top Predictions as Close Neighbors of the True Label






Binwei Yang

Binwei is a Distinguished Data Scientist at Walmart Global Tech. His current interests span computer vision and tooling for better ROI on data science.