šŸ’”Challenges in bulk inference

What is bulk inference?

Jaideep Ray
Better ML
1 min read Ā· Jan 23, 2022


  • With model scaling (model sizes increasing significantly year over year) and compute scaling (FLOPS increasing in hardware accelerators like GPUs), bulk inference poses interesting challenges.

Why do we care?

  • Bulk inference (applying a model to examples to extract predictions) is a fundamental operation required for model evaluation, comparison, debugging, and analysis.

Challenges in bulk inference:

  • Can the model be loaded on a single device, or does it require multiple devices?
  • Can the setup be scaled out as needed, i.e., can more nodes be added to gain throughput? Both single-node and multi-node bulk inference should work seamlessly.
  • Is model apply being check-pointed, i.e., are batches of predictions being uploaded to permanent storage such as Hive or a file system?
  • Can resources such as memory and CPU usage be monitored throughout?
  • Can hardware accelerators (GPUs / ASICs) be used with simple config changes and no boilerplate code changes?
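The check-pointing concern above can be sketched in a few lines: write each batch of predictions to durable storage as it completes, so a crashed or preempted job resumes from the last finished batch instead of starting over. This is a minimal illustration using the local file system; the function and directory names are hypothetical, not from the post.

```python
import json
import os

def run_inference(model, batches, checkpoint_dir):
    """Apply `model` batch by batch, check-pointing each batch of
    predictions to permanent storage so a restarted job can resume."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    for i, batch in enumerate(batches):
        out_path = os.path.join(checkpoint_dir, f"predictions_{i:05d}.json")
        if os.path.exists(out_path):
            continue  # batch already check-pointed; skip on resume
        preds = [model(x) for x in batch]
        # Write to a temp file, then rename: an interrupted write
        # never leaves a partial checkpoint that looks complete.
        tmp_path = out_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(preds, f)
        os.rename(tmp_path, out_path)
```

In a production setup the JSON files would be replaced by uploads to Hive or an object store, but the resume logic (skip batches whose output already exists) is the same.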

Components in bulk inference:

  • Data loader: reads examples in batches from permanent storage such as Hive. It should scale to a multi-node setup to read data in parallel.
  • Model loader: loads the model into memory and primes it for bulk inference.
  • Prediction exporter: uploads predictions to permanent storage.
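The three components above can be wired together as a simple pipeline. This is a toy sketch under loose assumptions: the class and method names are illustrative, storage is stubbed with in-memory lists, and "priming" is shown as a single warm-up call.

```python
from typing import Callable, Iterator, List

class DataLoader:
    """Reads examples in batches (here: from an in-memory list
    standing in for permanent storage like Hive)."""
    def __init__(self, examples: List, batch_size: int):
        self.examples = examples
        self.batch_size = batch_size

    def batches(self) -> Iterator[List]:
        for i in range(0, len(self.examples), self.batch_size):
            yield self.examples[i:i + self.batch_size]

class ModelLoader:
    """Loads the model into memory and primes it for inference."""
    def __init__(self, model_fn: Callable):
        self.model_fn = model_fn

    def load(self) -> Callable:
        self.model_fn(0)  # warm-up call to prime caches / compilation
        return self.model_fn

class PredictionExporter:
    """Uploads predictions to permanent storage (stubbed as a list)."""
    def __init__(self):
        self.exported: List = []

    def export(self, preds: List) -> None:
        self.exported.extend(preds)

def bulk_inference(loader: DataLoader,
                   model_loader: ModelLoader,
                   exporter: PredictionExporter) -> None:
    """Run the full pipeline: load model, predict per batch, export."""
    model = model_loader.load()
    for batch in loader.batches():
        exporter.export([model(x) for x in batch])
```

Keeping the components behind these narrow interfaces is what makes the scale-out and accelerator questions from the challenges list config-level concerns: swapping a single-node reader for a parallel one, or a CPU model for a GPU one, does not change the pipeline loop.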

https://aws.amazon.com/blogs/architecture/batch-inference-at-scale-with-amazon-sagemaker/
