Uncovering MLFlow’s Spark UDF

How it works under the hood for isolated conda environments, and what the caveats are

Yerachmiel Feltzman
Israeli Tech Radar


Photo by Kevin Bidwell on Pexels

In our previous article, we discussed a solution for avoiding the ML dependency-syncing black hole. That is, we showed a way to sidestep the challenge of keeping the dependencies used during model training in sync with those used during model serving: the model and the inference service run together, but completely isolated from each other.

The end of the conflicts. Peace on earth.

Photo by mali maeder on Pexels

In practical terms, it meant three things:

  1. Models will be registered to MLFlow Model Registry with their training environment defined in a conda file;
  2. The inference will be made using Apache Spark;
  3. The prediction in Spark will be executed using MLFlow’s Spark UDF with an isolated conda environment.

ML inference service with Spark and MLFlow on environment isolation mode (image by the author)
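
To make the three points concrete, here is a minimal sketch of what the scoring side could look like. It assumes a recent MLFlow version in which `mlflow.pyfunc.spark_udf` accepts the `env_manager` argument, and a model that was registered together with its conda file. The helper names (`registry_uri`, `score_with_isolated_env`), the `prediction` column, the `"double"` result type and the `churn_model` name in the usage note are made up for illustration, not taken from our service.

```python
from typing import Any


def registry_uri(model_name: str, version_or_stage: str = "Production") -> str:
    """Build a Model Registry URI of the form models:/<name>/<version-or-stage>."""
    return f"models:/{model_name}/{version_or_stage}"


def score_with_isolated_env(
    spark: Any, df: Any, model_name: str, version_or_stage: str = "Production"
) -> Any:
    """Return `df` with a `prediction` column computed by the registered model.

    Hypothetical helper: `spark` is a SparkSession and `df` a Spark DataFrame
    whose columns are exactly the model's input features.
    """
    # Imported inside the function so the sketch can be imported without
    # mlflow or pyspark being installed.
    import mlflow.pyfunc
    from pyspark.sql.functions import struct

    predict = mlflow.pyfunc.spark_udf(
        spark,
        model_uri=registry_uri(model_name, version_or_stage),
        result_type="double",  # assumption: the model returns one float per row
        env_manager="conda",   # rebuild the conda env stored with the model
    )
    # Pass all columns to the model as a single struct column.
    return df.withColumn("prediction", predict(struct(*df.columns)))
```

With a running Spark session, calling `score_with_isolated_env(spark, features_df, "churn_model")` would load the `Production` stage of that model and score `features_df` inside the model's own conda environment rather than the one the Spark job runs in.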

How does the magic happen? What are the…
