Using a SageMaker XGBoost model in scikit-learn

Julien Simon
May 11 · 1 min read

This is a quick post answering a question I get a lot: “How can I use an XGBoost model that I trained on SageMaker in scikit-learn?”.

Here it goes. Once you’ve trained your XGBoost model in SageMaker (examples here), grab the training job name and the location of the model artifact.

I’m using the CLI here, but you can of course use any of the AWS language SDKs.

$ export TRAINING_JOB_NAME='xgboost-190511-0830-010-14f41137'
$ export MODEL_ARTIFACT=`aws sagemaker describe-training-job \
    --training-job-name $TRAINING_JOB_NAME \
    --query ModelArtifacts.S3ModelArtifacts \
    --output text`


Then, download the artifact and extract the model.

$ aws s3 cp $MODEL_ARTIFACT .
$ tar xvfz model.tar.gz
x xgboost-model

The model is a pickled Python object, so let’s now switch to Python and load the model.

$ python3
>>> import sklearn, pickle
>>> model = pickle.load(open("xgboost-model", "rb"))
>>> type(model)
<class 'xgboost.core.Booster'>

You’re done. From now on, you can use the model as if you’d trained it locally. For example, you can dump it and visualize it.

>>> model.dump_model('model.txt')
>>> exit()
$ head model.txt
0:[f2<512] yes=1,no=2,missing=1
1:[f1<3.5] yes=3,no=4,missing=3
3:[f2<1.5] yes=7,no=8,missing=7
7:[f42<0.5] yes=15,no=16,missing=15
4:[f52<0.5] yes=9,no=10,missing=9

See? That was super easy :)

Thanks for reading. Happy to answer questions here or on Twitter.
