💡 Embedding learning for DLRM

What is embedding learning?

Jaideep Ray
Better ML
1 min read · Jan 27, 2022

--

  1. Embedding learning is a technique used in deep learning recommendation models (DLRMs) to map categorical features to dense vectors.
  2. DLRMs often require extremely large embedding tables, which in turn demand high memory capacity and memory bandwidth. To tackle this problem, distributed training solutions partition the embedding tables across multiple devices.
  3. However, embedding tables differ widely in their characteristics, so a careless partition can easily leave devices unevenly loaded, leading to poor training efficiency.
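To make the mapping from categorical features to dense vectors concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `EmbeddingTable` class, the modulo used to fold raw IDs into rows, and the sum pooling of multi-valued features. Real systems use trainable tensors and a proper hash function.

```python
import random
from typing import List


class EmbeddingTable:
    """A toy embedding table: num_rows dense vectors of length dim (hypothetical helper)."""

    def __init__(self, num_rows: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_rows = num_rows
        self.dim = dim
        # Real tables are trainable parameters; here they are fixed random values.
        self.rows = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                     for _ in range(num_rows)]

    def row_index(self, category_id: int) -> int:
        # Assumption: raw IDs are folded into the table with a simple modulo.
        return category_id % self.num_rows

    def lookup(self, category_id: int) -> List[float]:
        # One categorical value -> one dense vector.
        return self.rows[self.row_index(category_id)]

    def pooled_lookup(self, category_ids: List[int]) -> List[float]:
        # Sum-pool the vectors of a multi-valued feature into one dense vector.
        pooled = [0.0] * self.dim
        for cid in category_ids:
            for j, value in enumerate(self.lookup(cid)):
                pooled[j] += value
        return pooled


table = EmbeddingTable(num_rows=8, dim=4)
vec = table.lookup(42)                      # dense vector for a single ID
pooled = table.pooled_lookup([3, 11, 42])   # one vector for a multi-valued feature
```

With only 8 rows, IDs 3 and 11 land on the same row and therefore share a vector. Production tables have millions of rows per feature, which is where the memory pressure comes from.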

What are the challenges?

  1. Embedding memory footprints are on the order of terabytes and account for over 99% of total model capacity. Embedding memory capacity and bandwidth demands of DLRMs have been growing super-linearly and now exceed the memory available on typical training hardware.
  2. Embedding tables act as hash tables for DLRMs: categorical IDs are hashed into a fixed number of rows. As a result, they show characteristic usage patterns. By the birthday paradox, for example, hash collisions appear long before a table is full, so some rows are shared by several IDs while others are never accessed.
  3. The challenge is to devise a sharding strategy for a set of embedding tables based on the distributions of the model's training data and the characteristics of the underlying memory.
  4. An embedding planner chooses where to place embedding tables on devices so as to minimize training time (maximize QPS) and balance the embedding load across devices.
A mix of Data and Model Parallel strategies for DLRM training.
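The hash-table behaviour and the planner idea above can be sketched together. This is a rough illustration under stated assumptions, not the method of the referenced paper: IDs are assumed to hash uniformly into rows, the cost of a table is taken to be the bytes of the rows it is expected to touch, and placement uses a simple greedy heuristic. The function names and the example tables are hypothetical.

```python
from typing import Dict, List, Tuple


def expected_used_rows(num_rows: int, num_distinct_ids: int) -> float:
    """Expected number of rows hit when IDs hash uniformly into num_rows slots."""
    # Each row is missed by one ID with probability (1 - 1/num_rows).
    return num_rows * (1.0 - (1.0 - 1.0 / num_rows) ** num_distinct_ids)


def greedy_plan(table_costs: Dict[str, float],
                num_devices: int) -> Tuple[Dict[str, int], List[float]]:
    """Place the most expensive tables first, each on the least-loaded device."""
    loads = [0.0] * num_devices
    placement: Dict[str, int] = {}
    ordered = sorted(table_costs.items(), key=lambda kv: kv[1], reverse=True)
    for name, cost in ordered:
        device = min(range(num_devices), key=lambda d: loads[d])
        placement[name] = device
        loads[device] += cost
    return placement, loads


# Hypothetical tables: name -> (hash rows, distinct IDs in training data, embedding dim)
tables = {
    "user_id": (1_000_000, 1_000_000, 64),
    "item_id": (500_000, 2_000_000, 64),
    "country": (1_000, 200, 16),
    "category": (10_000, 5_000, 32),
}
BYTES_PER_FLOAT = 4  # assumption: fp32 embeddings

# Cost proxy (assumption): bytes of the rows a table is expected to touch.
costs = {
    name: expected_used_rows(rows, ids) * dim * BYTES_PER_FLOAT
    for name, (rows, ids, dim) in tables.items()
}
placement, loads = greedy_plan(costs, num_devices=2)
```

Hashing 1,000,000 distinct IDs into 1,000,000 rows touches only about 63% of the rows in expectation, which is the birthday-paradox effect in action. A real planner would also weigh memory bandwidth, per-row access frequency, and device capacity limits. The greedy rule here only shows how training-data statistics can feed a placement decision.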

References:

RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation
