Uncovering MLFlow’s Spark UDF

How it works under the hood for isolated conda environments, and what the caveats are

Published in

Israeli Tech Radar

6 min read · Apr 24, 2023

In our previous article, we discussed a way out of the ML dependency-syncing black hole, that is, the challenge of keeping the dependencies used during model training in sync with those used during model serving. The solution was to run the model and the inference service together, yet completely isolated from each other.

The end of the conflicts. Peace on earth.

In practical terms, it meant three things:

Models will be registered to MLFlow Model Registry with their training environment defined in a conda file;
The inference will be made using Apache Spark;
The prediction in Spark will be executed using MLFlow’s Spark UDF with an isolated conda environment.
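
As a rough illustration of the third point, here is a minimal sketch of loading a registered model as a Spark UDF with conda isolation. It assumes a recent MLflow version in which `mlflow.pyfunc.spark_udf` accepts the `env_manager` argument; the model name `churn_model`, the stage `Production`, and the helper names `model_uri` and `build_predict_udf` are hypothetical and only there for the example.

```python
def model_uri(name: str, version_or_stage: str) -> str:
    """Build a Model Registry URI such as 'models:/churn_model/Production'."""
    return f"models:/{name}/{version_or_stage}"


def build_predict_udf(spark, name: str, version_or_stage: str, result_type: str = "double"):
    """Return a Spark UDF that runs the model inside its own conda environment.

    `spark` is an active SparkSession. The names passed in are placeholders.
    """
    # Imported lazily so the URI helper above can be used without MLflow installed.
    import mlflow.pyfunc

    return mlflow.pyfunc.spark_udf(
        spark,
        model_uri=model_uri(name, version_or_stage),
        result_type=result_type,
        # Ask MLflow to restore the environment recorded with the model
        # (its conda file) instead of using the driver's Python environment.
        env_manager="conda",
    )


# Hypothetical usage, assuming a DataFrame `df` whose columns are the model inputs:
#   from pyspark.sql import SparkSession
#   from pyspark.sql.functions import struct
#   spark = SparkSession.builder.getOrCreate()
#   predict = build_predict_udf(spark, "churn_model", "Production")
#   scored = df.withColumn("prediction", predict(struct(*df.columns)))
```

With `env_manager="conda"`, the model's dependencies are resolved from the environment logged with the model rather than from the Spark job's own packages, which is what keeps the two from conflicting.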

ML inference service with Spark and MLFlow on environment isolation mode — image by the author

How does the magic happen? What are the…

Written by Yerachmiel Feltzman