Cross-platform deployment of machine learning models in Python

--

My current workflow involves three different operating systems. I develop the model on our internal cluster (Linux machines), which can only be accessed from inside the corporate network. The model is then deployed on a Heroku server (also Linux) so that it is publicly accessible via an API. However, the model binary cannot be transferred directly from the private network to the public network, so I have to download it from a Bitbucket repository onto my local Windows machine, upload it to OneDrive, then download it to my personal laptop (macOS), from which I publish it to the Heroku server. The data flow looks like this:

EC2 machine (Linux) => Bitbucket server => Local machine (Windows) => OneDrive => Personal machine (macOS) => Heroku server (Linux)

As you can imagine, this is a recipe for compatibility issues across platforms. Here's what made the cross-platform deployment work.

  • Use pickle.dump() with the highest protocol available (see the example code below). In Python 2 this method defaults to protocol 0, which writes an ASCII representation even if the file is opened in binary mode. That causes problems when the pickled file is moved between Unix and Windows, because the two platforms use different end-of-line characters. To avoid the issue, pass the highest protocol so that the file is always written in binary format.
import pickle

# model = ... a scikit-learn model ...
# 'wb' plus HIGHEST_PROTOCOL ensures the pickle is written as pure binary
with open('path_to_file.pkl', 'wb') as f:
    pickle.dump(model, f, pickle.HIGHEST_PROTOCOL)
  • Use consistent package versions. This is especially important for machine learning models, where objects from different versions of a package (scikit-learn and scipy in my case) may be serialised differently. The best way to do this is with virtualenv or pipenv (see the sketches after this list).
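
To make the second point concrete, the package versions used on the training cluster can be captured with pip freeze > requirements.txt (or a Pipfile.lock when using pipenv) and reinstalled on the deployment side with pip install -r requirements.txt. The sketch below is one additional safeguard: fail fast at startup if the deployed environment drifts from the training environment. The pinned version numbers here are placeholders, not the ones from my actual project.

import sklearn
import scipy

# Placeholder pins -- substitute the versions recorded on the training cluster.
PINNED = {'scikit-learn': '0.24.2', 'scipy': '1.6.3'}
INSTALLED = {'scikit-learn': sklearn.__version__, 'scipy': scipy.__version__}

for package, pinned in PINNED.items():
    if INSTALLED[package] != pinned:
        raise RuntimeError(
            f'{package} {INSTALLED[package]} is installed, but the model was '
            f'pickled with {pinned}; unpickling may fail or behave differently.'
        )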
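
For completeness, the loading side on the Heroku server simply mirrors the dump: open the file in binary mode and let pickle detect the protocol from the stream itself. This is only a sketch; it reuses the path_to_file.pkl name from the example above and assumes the unpickled object is a fitted scikit-learn estimator.

import pickle

# 'rb' matches the 'wb' used when the model was dumped; the protocol is
# read from the pickle stream, so nothing else needs to be specified.
with open('path_to_file.pkl', 'rb') as f:
    model = pickle.load(f)

# Inside the API handler (features is whatever the endpoint receives):
# prediction = model.predict(features)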

--

Trung Nguyen | ML | AI | Data Science

I write about data science, machine learning, productivity and other good stuff