Uncovering MLFlow’s Spark UDF
How it works under the hood for isolated conda environments, and what the caveats are
6 min read · Apr 24, 2023
In our previous article, we discussed a way out of the ML dependency-syncing black hole, that is, the challenge of keeping the dependencies used during model training in sync with those used during model serving. The solution was to have the model and the inference service run together but completely isolated from each other.
The end of the conflicts. Peace on earth.
In practical terms it meant three things:
- Models will be registered to MLFlow Model Registry with their training environment defined in a conda file;
- The inference will be made using Apache Spark;
- The prediction in Spark will be executed using MLFlow’s Spark UDF with an isolated conda environment.
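To make the three points above concrete, here is a minimal sketch of how they could fit together. It assumes a scikit-learn model and an MLFlow version whose `mlflow.pyfunc.spark_udf` accepts the `env_manager` argument. The helper names (`build_conda_env`, `registry_model_uri`, `register_model`, `predict_isolated`), the environment name, and the `"double"` result type are illustrative assumptions, not something taken from the previous article.

```python
def build_conda_env(python_version, pip_deps):
    """Build a conda environment spec as a dict (same shape as a conda.yaml file).

    Hypothetical helper: the environment name and channel are placeholders.
    """
    return {
        "name": "training-env",
        "channels": ["conda-forge"],
        "dependencies": [
            f"python={python_version}",
            "pip",
            {"pip": list(pip_deps)},
        ],
    }


def registry_model_uri(model_name, version_or_stage):
    """Return a Model Registry URI of the form models:/<name>/<version or stage>."""
    return f"models:/{model_name}/{version_or_stage}"


def register_model(model, model_name, conda_env):
    """Log a scikit-learn model together with its training environment
    and register it in the Model Registry (assumes a reachable tracking server)."""
    import mlflow
    import mlflow.sklearn  # imported lazily so the helpers above work without MLflow

    with mlflow.start_run():
        mlflow.sklearn.log_model(
            model,
            "model",
            conda_env=conda_env,
            registered_model_name=model_name,
        )


def predict_isolated(spark, df, model_uri, feature_cols):
    """Add a 'prediction' column computed by the model inside its own conda env."""
    import mlflow.pyfunc
    from pyspark.sql.functions import struct

    # env_manager="conda" asks MLflow to restore the model's logged conda
    # environment and run predictions there, instead of in the Spark
    # workers' own Python environment.
    predict = mlflow.pyfunc.spark_udf(
        spark,
        model_uri=model_uri,
        result_type="double",  # assumption: the model returns one float per row
        env_manager="conda",
    )
    return df.withColumn("prediction", predict(struct(*feature_cols)))
```

With an active `SparkSession` and a registered model, a call could look like `predict_isolated(spark, df, registry_model_uri("my_model", "Production"), ["f1", "f2"])`, where the model name, the stage, and the column names are placeholders.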
How does the magic happen? What are the…