--

One possible method is to run inference for the whole batch up to the route selector, split the batch according to the selector's output, run each expert separately on its own slice, and then scatter the results back into the original batch order. This would preserve the speedup even for batched inference, and it is fairly easy to implement in vanilla PyTorch as well (see the sketch below).
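A minimal sketch of the idea in vanilla PyTorch, assuming a top-1 route selector and linear layers standing in for the experts; the names `router`, `experts`, and the dimensions are illustrative, not from any specific model:

```python
import torch
import torch.nn as nn

hidden_dim, num_experts, batch = 16, 4, 8

# Hypothetical router and expert modules for illustration.
router = nn.Linear(hidden_dim, num_experts)
experts = nn.ModuleList(nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts))

x = torch.randn(batch, hidden_dim)

# 1. Run the shared path up to the route selector for the whole batch.
logits = router(x)
expert_ids = logits.argmax(dim=-1)  # top-1 expert per batch element

# 2. Split the batch by selected expert and run each expert on its slice only.
out = torch.empty_like(x)
for e, expert in enumerate(experts):
    mask = expert_ids == e
    if mask.any():
        # 3. Scatter each expert's outputs back to their original positions.
        out[mask] = expert(x[mask])

# `out` now holds the combined expert outputs in the original batch order.
```

With top-k routing the same pattern would apply per routing slot, with the per-expert outputs combined by the router's weights instead of written back directly.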

--