Challenges in bulk inference
What is bulk inference?
Published Jan 23, 2022
- With model scaling (model sizes increasing significantly year over year) and compute scaling (FLOPs increasing in hardware accelerators like GPUs), bulk inference poses interesting challenges.
Why do we care?
- Bulk inference (applying a model to examples to extract predictions) is a fundamental operation required for model evaluation, comparison, debugging, and analysis.
Challenges in bulk inference:
- Can the model be loaded on a single device, or does it require multiple devices?
- Can the setup be scaled out as needed, i.e., can more nodes be added to gain throughput? Both single-node and multi-node bulk inference should be supported seamlessly.
- Is the model apply being checkpointed, i.e., are batches of predictions being uploaded to permanent storage like Hive or a file system?
- Can resources like memory and CPU usage be monitored throughout the run?
- Can hardware accelerators (GPUs/ASICs) be used with simple config changes and no boilerplate code changes?
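The checkpointing concern above can be sketched in a few lines. This is a minimal, hypothetical example (the function and file names are illustrative, not from any real framework): each batch of predictions is written to durable storage as soon as it is computed, and a restarted run skips batches that already have a checkpoint file.

```python
import json
import os
import tempfile

def bulk_infer(model_fn, batches, checkpoint_dir):
    """Apply model_fn to each batch, checkpointing predictions to disk.

    On restart, batches whose checkpoint file already exists are skipped,
    so a failed run resumes where it left off instead of recomputing.
    """
    os.makedirs(checkpoint_dir, exist_ok=True)
    for i, batch in enumerate(batches):
        path = os.path.join(checkpoint_dir, f"preds_{i:05d}.json")
        if os.path.exists(path):          # already checkpointed: skip
            continue
        preds = [model_fn(x) for x in batch]
        tmp = path + ".tmp"               # write-then-rename for atomicity
        with open(tmp, "w") as f:
            json.dump(preds, f)
        os.replace(tmp, path)

# Demo with a toy "model" (a stand-in for a real model apply)
demo_dir = tempfile.mkdtemp()
bulk_infer(lambda x: x * 2, [[1, 2], [3, 4]], demo_dir)
demo_files = sorted(os.listdir(demo_dir))
```

In a production setup the JSON files would be replaced by uploads to a store like Hive, but the resume-from-checkpoint pattern is the same.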
Components in bulk inference:
- Data loader: reads examples in batches from permanent storage like Hive. It should scale out to a multi-node setup to read data in parallel.
- Model loader: loads the model into memory and primes it for bulk inference.
- Prediction exporter: uploads predictions to permanent storage.
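The three components above can be wired together as a simple pipeline. The sketch below is illustrative only: the loader batches an in-memory list instead of reading from Hive, the "model" is a toy scoring function, and the exporter writes to an in-memory sink instead of permanent storage.

```python
from typing import Callable, Iterator, List

def data_loader(rows: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yields examples in batches; a real loader would stream from Hive or files."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def model_loader() -> Callable[[dict], dict]:
    """Loads the model into memory; here a toy scoring function stands in."""
    return lambda row: {"id": row["id"], "score": len(row["text"])}

def prediction_exporter(sink: list, preds: List[dict]) -> None:
    """Uploads predictions to permanent storage; here an in-memory sink."""
    sink.extend(preds)

def run_pipeline(rows: List[dict], batch_size: int = 2) -> List[dict]:
    model = model_loader()
    sink: List[dict] = []
    for batch in data_loader(rows, batch_size):
        prediction_exporter(sink, [model(r) for r in batch])
    return sink

out = run_pipeline([{"id": 1, "text": "hi"}, {"id": 2, "text": "abc"}])
```

Keeping the three stages behind separate interfaces is what lets each one scale independently, e.g. running many data-loader workers against a single replicated model.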
Further reading: https://aws.amazon.com/blogs/architecture/batch-inference-at-scale-with-amazon-sagemaker/