Kubeflow Community User Survey Fall 2019 Results

Elvira Dzhuraeva
Published in kubeflow
Dec 20, 2019

We recently ran our Fall 2019 Kubeflow user survey to better understand the trends and issues that matter most to our community and users. As we head toward Kubeflow 1.0, we wanted to hear directly from our audience and make the results available to the Kubeflow community, both to guide feature development and to clarify our priorities.

This is our second survey this year, and it received over 75 responses. The results of the first survey can be found here. In this analysis, we focus on the 50 responses from people who use Kubeflow in their current workflow. The complete, cleaned-up charts are available here.

Who are our users?

The majority of survey respondents are Data Scientists and Machine Learning Engineers, followed by DevOps and Software Engineers. In addition, a small percentage of respondents are non-technical, such as product managers interested in Kubeflow adoption.

We also wanted to know how respondents are currently involved with Kubeflow. It turned out that about 60% of them are Kubeflow users and around 30% are Kubeflow integrators.

Pic. 1. What is your primary role? N = 50

The respondents represent two clearly distinguishable organization types: enterprise companies with 5000+ employees and mid-size businesses with 50–500 employees. For the rest of this post, we will call them Enterprises and Disruptors.

Pic. 2. How many people work for your company / organization? N = 50

What is their Ops background?

Although Kubeflow is a Kubernetes-native platform, it was still important to understand what type of infrastructure our users rely on. Three options lead in this area: on-premise (50%), GCP (46%), and AWS (42%). Because respondents could select multiple options, these percentages add up to more than 100%.

Pic. 3. Where do you run your AI/ML workloads? (Multiple select) N = 50

In the 0.6 release, Kubeflow switched from Ambassador to Istio. As a community, it was crucial for us to understand how this decision resonated with our adopters. The results are promising: 70% of respondents are on board with the decision, and only 20% are not planning to use Istio in the near future.

Pic. 4. Do you use Istio? N = 50

The machine learning lifecycle is closely bound to data. Survey respondents were asked where they store their training data. Unexpectedly, almost 60% of our users keep their datasets in cloud provider storage, compared to 30% who prefer on-premise storage.

Pic. 5. Where do you store your training data? N = 50

Another key aspect of model training and development is the high demand for compute power. Our audience clearly takes full advantage of hardware accelerators: 72% of respondents use GPUs for their AI/ML workloads.

Pic. 6. What hardware do you use for your AI/ML workloads? (Multiple select) N = 50

Enterprises vs Disruptors

As outlined above, respondents fall into two groups: companies with 5000+ employees (Enterprises) and companies with 50–500 employees (Disruptors). We took a closer look at the data to see how the two groups differ.

The first question was whether their infrastructure differs. We initially assumed that on-premise usage would be higher in Enterprises than in mid-size organizations. Surprisingly, the results paint a different picture: on-premise usage is about the same in both groups. As for cloud usage, Enterprises tend to use GCP, while Disruptors prefer AWS.

Pic. 7. Where do you run your AI/ML workloads? (Multiple select) N = 33

Next, we looked at how Kubeflow component usage differs between organizations with 5000+ employees and those with 50–500 employees. As expected, Pipelines and Notebooks remain the top priority for both groups. However, Enterprises tend to use the full range of Kubeflow components, whereas Disruptors focus on a smaller set.

Pic. 8. Critical Kubeflow components that you use in your current workflows. (Multiple select) N = 33

In Kubeflow 0.6 and 0.7, we started working on role-based access control (RBAC) and had many discussions about which technology stack to use and which architecture would let Kubeflow work with different identity providers. As the bar chart below shows, most survey respondents use AD/LDAP (mostly Enterprises) and GitHub (mostly Disruptors).

Pic. 9. Please select the identity providers that you utilize/like to utilize with Kubeflow? (Multiple select) N = 33

From a scalability standpoint, it is clear that larger organizations need to support more users per cluster than smaller organizations do.

Pic. 10. What is the maximum number of simultaneous users that you expect to support on a single Kubeflow cluster? N = 33

For notebooks and pipelines, however, the trend is roughly the same in both groups: the majority of respondents expect to run, on average, 50 notebooks and 50 pipeline runs simultaneously on a single cluster.

Pic. 11. What is the maximum number of simultaneous notebooks that you expect to support on a single Kubeflow cluster? N = 30
Pic. 12. What is the maximum number of simultaneous pipeline runs that you expect to support on a single Kubeflow cluster? N = 31

Kubeflow today

To understand which Kubeflow components are used most, we asked respondents to choose the top 3 critical components in their current workflow and, in a follow-up question, the top 3 critical components they plan to use in the future.

Pic. 13. Critical Kubeflow components that you use in your current workflows. N = 50

Pipelines and Notebooks are the undisputed leaders at 82% and 70%, respectively, followed by a roughly equal distribution across the other components.

Documentation is a vital part of any open-source project. Survey participants were asked which documentation sections they think need improvement. The results show that people want more Kubeflow end-to-end tutorials and more Pipelines tutorials. Interestingly, the "Other" option came in third, so let's see what it contains:

  • Custom installation, e.g. using a VPC
  • Architecture and installation
  • Kubeflow on-premise installation
  • CI/CD workloads, e.g. GitLab

As these responses show, installation documentation is an area the community needs to improve, and we will make sure to reach out for volunteers in our upcoming Docs Sprint. More end-to-end tutorials covering various use cases may also help close this gap.

Pic. 14. Which topic needs improved documentation the most? N = 50

Summary

There are clear positive signs that, as a community, we are heading in the right direction. A tremendous amount of work went into the 0.7 release, bringing Kubeflow closer to end-user expectations. We will continue working on Kubeflow quality and improvements.

Key takeaways:

  • Data Scientists and Machine Learning Engineers remain our main target audience.
  • The Kubeflow on-premise experience should be on par with today's cloud experience.
  • Pipelines and Notebooks are key components for Kubeflow users, but users plan to adopt more targeted Kubeflow components in the future.
  • A Kubeflow cluster should support an average of 50 simultaneous pipeline runs and 50 notebooks.
  • End-to-end tutorials need more investment, and installation should be intuitive and easy enough to lower the entry barrier.
  • Kubeflow needs to support GPU integration and the Community should invest in improved operations and documentation for GPUs.

Thanks to Josh Bottum (Arrikto), Gaurav Karnataki (Google), Katie O’Leary (Google), Thea Lamkin (Google), and Abhishek Gupta (Google) for contributing to this post and for their tremendous work on the survey.

If you find this article helpful and would like to learn more about Kubeflow capabilities or become a contributor, feel free to:

Elvira Dzhuraeva

Technical Product Engineer AI/ML at Cisco and Community Product Manager at Kubeflow https://www.linkedin.com/in/dzhuraeva/