Kubeflow’s 2nd Doc Sprint: 10+ new docs & samples ahead of Kubeflow 1.0
Sarah Maddox, Kubeflow technical writer
For three days in mid-February, the Kubeflow community held its second Kubeflow Doc Sprint. Thirty people got together in Sunnyvale, in Kirkland, and around the world to work on the Kubeflow docs and code samples. Participating organizations included Arrikto, AWS, Cisco, Google, IBM, Red Hat, TeamDev, and more.
Our achievements during this doc sprint:
- More than 10 new docs and samples added (details below)
- 35 issues fixed
- 49 pull requests created
The focus of the doc sprint
Kubeflow is an open source platform for developing and deploying machine learning (ML) systems on Kubernetes. Kubeflow is for data scientists who want to build and experiment with ML pipelines. It’s also for ML engineers and operational teams who want to deploy ML systems to various environments for development, testing, and production-level serving.
The launch of Kubeflow 1.0 is coming up fast. The doc sprint presented an ideal opportunity to update and improve the docs for this milestone release. We focused on the top priorities and frequently asked questions in the issue tracker.
We sent out a quick survey to Kubeflow users in January, asking people what type of documentation they find most useful and where they see a need for improvement. Thank you to everyone who completed the survey!
Based on the survey results, code samples with related tutorials are the most valuable type of document. In particular, people are looking for end-to-end training and serving of a particular model type (for example, image recognition or recommendation). The next most helpful type of tutorial is end-to-end deployment of Kubeflow on a particular platform. Survey respondents also found that the code samples are sometimes out of date or that they cause errors and thus cannot be completed.
For more about the survey results, see the lightning talk given at the doc sprint.
New and updated docs and samples
Addressing the feedback from the survey, we created or updated these docs during the sprint:
- Added a guide to use cases: the problems that Kubeflow helps you solve.
- Added an end-to-end tutorial for Kubeflow on AWS.
- Added a tutorial for deploying Kubeflow pipelines using GitHub Actions.
- Added a Jupyter Notebook as an end-to-end tutorial for training and serving an image recognition (MNIST) model on Google Cloud Platform (GCP).
- Clarified and tested the Kubeflow installation guides.
- Added docs for Kubeflow on OpenShift.
- Streamlined the docs for Kubeflow on IBM Cloud.
We created the following docs in response to frequently asked questions from users:
- Istio usage in Kubeflow.
- Helm and Kubeflow.
- How to use environment variables in Kubeflow Pipelines.
- An example of using the Kubeflow Pipelines API.
We added and improved docs to provide a shiny user experience for Kubeflow 1.0:
- Refactored guides to multi-tenancy in Kubeflow.
- An accessibility refresh for the Kubeflow home page.
- New reference for the Kubeflow Notebooks custom resource.
- A new guide to generating the Pipelines API reference docs.
And more! See all the action on the Doc Sprint Kanban board.
Learning on the sprint
A doc sprint provides the perfect opportunity to absorb technical writing and UX best practices! Tech writers and a UX researcher presented lightning talks, giving sprinters the opportunity to learn new techniques:
- Understanding points of view in engineering docs.
- Documentation survey results and best practices.
- How to cheat at technical writing.
- You got started. Now what?
Kubeflow 1.0 is on its way
Kubeflow is moving towards its version 1.0 launch. The website currently reflects the latest release candidate (RC) for Kubeflow 1.0. In particular, we’ve published a new Kubeflow overview, the Kubeflow versioning policies, and a support guide.
Try Kubeflow 1.0 for yourself, and give the community your feedback.
About the Kubeflow docs
The Kubeflow docs receive around 220,000 page views per month. Page views increased by 6.8% from Q3 to Q4 2019 (from 589,933 in Q3 to 630,228 in Q4).
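For readers who want to check the quarter-over-quarter arithmetic, here is a minimal Python sketch. It uses only the two page-view totals quoted above; the helper name `percent_change` and the variable names are illustrative, not part of any Kubeflow tooling. Measured against the Q3 baseline, the growth works out to about 6.8%. Dividing the same increase by the Q4 total instead gives about 6.4%, so the choice of denominator matters.

```python
def percent_change(old: float, new: float) -> float:
    """Return the percentage change from old to new, relative to old."""
    return (new - old) / old * 100


# Page-view totals quoted in the post.
q3_views = 589_933
q4_views = 630_228

# Absolute increase between the two quarters.
increase = q4_views - q3_views

# Growth measured against the earlier quarter (the usual convention).
growth_vs_q3 = percent_change(q3_views, q4_views)

# The same increase expressed as a share of the later quarter.
share_of_q4 = increase / q4_views * 100

print(f"Increase: {increase:,} page views")
print(f"Growth relative to Q3: {growth_vs_q3:.1f}%")
print(f"Increase as a share of Q4: {share_of_q4:.1f}%")
```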
This graph shows the number of commits to the Kubeflow docs on GitHub from February 2019 to February 2020. The peaks reflect the July 2019 doc sprint, the release of Kubeflow v0.7 in November 2019, and the February 2020 doc sprint.
Contributions remained high for a while after the July doc sprint. Time will tell whether the same happens as a result of the February doc sprint. I think contributions will remain higher than usual for a while, due partly to the doc sprint and partly to the activity around the Kubeflow 1.0 launch.
The doc sprint shows people that doc contributions are welcome, and gives people the opportunity to practice the contribution process in a welcoming environment.
Your contributions are welcome
The Kubeflow community welcomes contributors to the docs, samples, and code. Take a look at the issue tracker, in particular for issues labeled “good first issue”. Read the docs guide for tips on working with the docs and GitHub. Send us a pull request or log an issue!
Thank you to everyone who has contributed to the Kubeflow docs, either as part of the doc sprint or in general as an open source contribution. As the doc sprint organizer, I found it energizing to see how keen the contributors are to help improve the Kubeflow user experience while at the same time learning about open source, tech writing best practices, and Kubeflow.