Challenges in bulk inference
What is bulk inference?
Published Jan 23, 2022
- With model scaling (model sizes increasing significantly year over year) and compute scaling (FLOPs increasing in hardware accelerators like GPUs), bulk inference poses interesting challenges.
Why do we care?
- Bulk inference (applying a model to examples to extract predictions) is a fundamental operation required for model evaluation, comparison, debugging, and analysis.
Challenges in bulk inference:
- Can the model be loaded on a single device, or does it require multiple devices?
- Can the setup be scaled out as needed, i.e., can more nodes be added to gain throughput? Both single-node and multi-node bulk inference should be supported seamlessly.
- Is the model apply being checkpointed, i.e., are batches of predictions being uploaded to permanent storage like Hive or a file system?
- Can resources like memory and CPU usage be monitored throughout the run?
- Can hardware accelerators (GPUs/ASICs) be used with simple config changes and no boilerplate code changes?
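The checkpointing concern above can be sketched in a few lines. This is a minimal, hypothetical example (the function and file names are illustrative, not from any real framework): each batch of predictions is written to durable storage as soon as it is computed, and a restarted run skips batches that already have a checkpoint file.

```python
import json
import os
import tempfile

def bulk_infer(model_fn, batches, checkpoint_dir):
    """Apply model_fn to each batch, checkpointing predictions to disk.

    On restart, batches whose checkpoint file already exists are skipped,
    so a failed run resumes where it left off instead of recomputing.
    """
    os.makedirs(checkpoint_dir, exist_ok=True)
    for i, batch in enumerate(batches):
        path = os.path.join(checkpoint_dir, f"preds_{i:05d}.json")
        if os.path.exists(path):          # already checkpointed: skip
            continue
        preds = [model_fn(x) for x in batch]
        tmp = path + ".tmp"               # write-then-rename for atomicity
        with open(tmp, "w") as f:
            json.dump(preds, f)
        os.replace(tmp, path)

# Demo with a toy "model" (a stand-in for a real model apply)
demo_dir = tempfile.mkdtemp()
bulk_infer(lambda x: x * 2, [[1, 2], [3, 4]], demo_dir)
demo_files = sorted(os.listdir(demo_dir))
```

In a production setup the JSON files would be replaced by uploads to a store like Hive, but the resume-from-checkpoint pattern is the same.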
Components in bulk inference:
- Data loader: reads examples in batches from permanent storage like Hive. It should scale out to a multi-node setup to read data in parallel.
- Model loader: loads the model into memory and primes it for bulk inference.
- Prediction exporter: uploads predictions to permanent storage.
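The three components above can be wired together as a simple pipeline. The sketch below is illustrative only: the loader batches an in-memory list instead of reading from Hive, the "model" is a toy scoring function, and the exporter writes to an in-memory sink instead of permanent storage.

```python
from typing import Callable, Iterator, List

def data_loader(rows: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yields examples in batches; a real loader would stream from Hive or files."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def model_loader() -> Callable[[dict], dict]:
    """Loads the model into memory; here a toy scoring function stands in."""
    return lambda row: {"id": row["id"], "score": len(row["text"])}

def prediction_exporter(sink: list, preds: List[dict]) -> None:
    """Uploads predictions to permanent storage; here an in-memory sink."""
    sink.extend(preds)

def run_pipeline(rows: List[dict], batch_size: int = 2) -> List[dict]:
    model = model_loader()
    sink: List[dict] = []
    for batch in data_loader(rows, batch_size):
        prediction_exporter(sink, [model(r) for r in batch])
    return sink

out = run_pipeline([{"id": 1, "text": "hi"}, {"id": 2, "text": "abc"}])
```

Keeping the three stages behind separate interfaces is what lets each one scale independently, e.g. running many data-loader workers against a single replicated model.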
Further reading: https://aws.amazon.com/blogs/architecture/batch-inference-at-scale-with-amazon-sagemaker/