Kubeflow v0.5 simplifies model development with enhanced UI and Fairing library

Thea Lamkin
Apr 30, 2019 · 8 min read

The Kubeflow Product Management Working Group is excited to announce the release of Kubeflow v0.5, which brings significant improvements to users’ model development experience!

New features in Kubeflow 0.5 include:

  • A Go binary, kfctl, to simplify configuring and deploying Kubeflow
  • An improved UI for managing notebooks that makes it easy to:
    • Run multiple notebooks simultaneously
    • Attach volumes to notebooks
  • The Fairing library to build, train, and deploy models from notebooks or your favorite Python IDE

Demo Kubeflow 0.5 by building, training and deploying an XGBoost model

Deploy Kubeflow

The screenshot below shows an example deployment form. Note that the “Project” field takes a GCP project ID, and the deployment name is one of our choice. In this example, we opted to log in with a username and password, and picked a username and password for the deployment to use. We also left the Kubeflow version at the default, v0.5.0. Then we clicked “Create Deployment,” which kicked off the deployment of Kubeflow to the project. The deployment takes roughly 10 minutes to be ready. Click “Show Logs” to view the progress messages. If you run into errors, please see the detailed instructions for deployment.

Image for post

Once the deployment is ready, the deployment web app page automatically redirects to the login page of the newly deployed Kubeflow cluster, as shown below.

Image for post

Create a notebook server in Kubeflow

Image for post

Notice the build version displayed at the bottom left of the dashboard. This gives a quick confirmation of the version of Kubeflow deployed in your cluster.

In this demo we’ll focus on notebooks. Clicking on Notebooks in the left nav takes us to the new Notebooks management interface:

Image for post

This is a new Kubernetes-native web app developed by the Kubeflow community to improve the experience of creating and managing notebook servers in a Kubeflow deployment.

We’ll create a new TensorFlow 1.13 notebook server using one of the pre-configured images in Kubeflow by clicking “New Server” at the top-right.

Image for post

Now we’ll provide a name for the notebook server (myserver in this example), pick the default Kubeflow namespace, and pick one of the standard TensorFlow notebook server images. We picked 1.0 for CPU and 5.0Gi for the memory. The new UI makes it really easy to create and attach new volumes, as well as existing volumes, to the Notebook Server. If you have a pre-configured NFS Server volume (your Admin team might have done that), you can easily discover it and attach the existing volume(s).
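As a rough illustration of what the CPU and memory values mean, here is a minimal Python sketch that converts a Kubernetes-style memory quantity such as 5.0Gi into bytes and assembles a resource block in the shape of a Kubernetes container spec. The helper names (memory_to_bytes, notebook_resources) are hypothetical, and whether the notebook UI sets requests, limits, or both is an assumption here, not something the UI documents in this post.

```python
# Binary (power-of-two) suffixes used by Kubernetes memory quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_to_bytes(quantity):
    """Convert a quantity such as '5.0Gi' to a number of bytes.

    Hypothetical helper; it only handles the binary suffixes listed above.
    """
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    # No known suffix: treat the value as a plain number of bytes.
    return int(float(quantity))


def notebook_resources(cpu, memory):
    """Build a resource block shaped like a Kubernetes container spec.

    Assumption: the values are placed under 'requests'.
    """
    return {"requests": {"cpu": cpu, "memory": memory}}


resources = notebook_resources("1.0", "5.0Gi")
print(resources)
print(memory_to_bytes(resources["requests"]["memory"]), "bytes")
```

The Gi suffix is a power-of-two unit, so 5.0Gi is 5 × 2^30 bytes, slightly more than 5 GB in decimal units.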

Image for post

Once configured, we click “Spawn” and wait for the notebook server to get ready.

Image for post

At this point, the pod is getting ready and pulling the specified container image. Once ready, the “Connect” button is highlighted on the notebook server, as shown below.

Image for post

Clicking on “Connect” takes us to the Jupyter notebooks:

Image for post

Note that, initially, there are no notebooks or terminals running.

Run an example notebook with Kubeflow Fairing

For this example, we’ll try running through one of the new Fairing example notebooks. In order to do that easily, here are the steps we follow:

  1. Create a new terminal.
Image for post

2. Clone the fairing repo in the terminal.

$ bash
$ git clone https://github.com/kubeflow/fairing
Image for post

3. In the terminal, run the following commands:

$ cd fairing/examples/prediction
$ pip3 install -r requirements.txt

4. Switch back to the notebooks view. Notice the fairing directory that shows up.

Image for post

5. Browse to the fairing/examples/prediction directory and click on xgboost-high-level-apis.ipynb.

Image for post

6. This opens the notebook in your notebook server.

Image for post

7. Study the notebook and run through its cells.

Explore the notebook

Here are some of the core features to note:

  1. You can iterate and develop your model within the Notebook container. You can train your model within the Notebook server running on Kubeflow with a subset of the data.
  2. When you are ready to train a full-scale model, Fairing lets you switch to a Kubeflow backend configuration and kick off a training job as a separate set of containers within the Kubeflow deployment. You can use this to train a single-node XGBoost model or run a distributed TensorFlow job. With a single Python call to train, Fairing handles the following transparently for the user:
    • The Fairing library automatically extracts your training code.
    • It builds a Docker container image automatically, without requiring you to write a Docker configuration file.
    • Once the updated Docker image is ready, it kicks off a training job on the Kubeflow cluster.
  3. The notebook also illustrates how a trained model can be deployed as a service in the Kubeflow cluster using a single Python call. This leverages Seldon to wrap the Python model into a container image for a Flask application that exposes the prediction endpoint.
  4. Finally, you can easily make predictions from within the notebook against the model just deployed in the previous step.
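To make the workflow above more concrete, here is a minimal sketch of the kind of model class this pattern relies on: one object with a train method for training and a predict method for serving. It uses a toy mean-value regressor instead of XGBoost and does not import Fairing or Seldon. The class name, the method signatures, and the data are all assumptions made for illustration; the actual Fairing calls are in the example notebook and are not reproduced here.

```python
import random


class MeanRegressor:
    """Toy stand-in for a real model (hypothetical, illustration only).

    It exposes train() and predict(), the two entry points a
    build/train/deploy workflow needs from a model class.
    """

    def __init__(self, rows):
        self.rows = rows  # list of (features, target) pairs
        self.mean = None

    def train(self):
        # "Training" here is just computing the mean target value.
        targets = [target for _, target in self.rows]
        self.mean = sum(targets) / len(targets)

    def predict(self, X, feature_names=None):
        # Return one prediction per input row.
        if self.mean is None:
            raise RuntimeError("call train() before predict()")
        return [[self.mean] for _ in X]


# Synthetic data: the target is twice the single feature value.
full_data = [([float(i)], 2.0 * i) for i in range(100)]

# Step 1: iterate quickly inside the notebook on a subset of the data.
random.seed(0)
subset = random.sample(full_data, 10)
quick_model = MeanRegressor(subset)
quick_model.train()

# Step 2: train the same class on the full dataset. In the notebook,
# this is the step that would run as a separate job on the cluster.
full_model = MeanRegressor(full_data)
full_model.train()

# Steps 3 and 4: the trained object answers prediction requests.
print(full_model.predict([[1.0], [2.0]]))
```

Keeping training and prediction on one class means the same code can be exercised locally on a subset and then reused unchanged for the full-scale job and the serving step.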

We hope it’s now clear how Fairing allows users to work through the entire build/train/deploy lifecycle of a model from within a Jupyter notebook. If you have any feedback on this tutorial, or if something didn’t work as expected, please let us know by filing an issue in the Fairing repo.

More Details on Kubeflow v0.5: TFJob, PyTorch, Katib

  • Support Status subresource in CRD (#927, #924)
  • Add ActiveDeadlineSeconds and BackoffLimit (#550)
  • Use pod group instead of PDB for gang scheduling (#916)
  • Supporting multiple versions of CRD (#932)

v0.5 also includes valuable operational updates and improvements to hyper-parameter tuning in Katib:

  • Katib status should return optimal parameter values (#356)
  • An end-to-end test (#1946)
  • Make Katib generic for operator support (#341)
  • Removing Operator specific handling during a StudyJob run (#387)
  • Katib v1alpha2 API for CRDs (#381)
  • Katib job status should contain all conditions (#344)

You can see everything included in this release in the Kubeflow CHANGELOG.

What’s next

For more information on what we’re working on, take a look at our Multi-User Critical User Journey (CUJ) and our roadmap for a stable and enterprise-ready Kubeflow 1.0.

Community-driven development

Finally, thanks to all who contributed to v0.5! Kubeflow is home to 100+ contributors from 20+ organizations working together to build a Kubernetes-native, portable and scalable ML stack, and we need even more help. Here’s how to get involved:

Thanks to Josh Bottum (Arrikto), Abhishek Gupta (Google), and Karthik Ramasamy (Google) for contributing to this post.

kubeflow

Official Kubeflow Blog.
