Kubeflow Community User Survey Spring 2019

Published in kubeflow, Dec 20, 2019

Today we are excited to release the results of our first Kubeflow community user survey. The survey took place this spring and received 89 responses in total.

The goal was to get familiar with our users, understand their use cases, and learn how Kubeflow has helped them overcome their problems.

Let’s get started!

Understanding our audience, their job duties, what companies they work for, what technology they use and what challenges they face is very important.

Almost half of the survey respondents identified themselves as Data and Machine Learning Engineers. The second largest group was Software Engineers, and third place was shared between Data Scientists and DevOps Engineers.

Pic. 1. How do you classify your primary role? N = 89. Top 3: Data or ML Engineer, Software Engineer, Data Scientist.

In terms of organization size, most interest came from Enterprise companies, with almost 50% of the total responses, followed by Small and Medium Businesses and Startups.

Pic. 2. Your organization type? Top 3: Enterprise, Small/Medium Business, Startup.

As for infrastructure type, on-premise was the most used across all users, with close to 60% of replies. However, the total usage of all clouds combined showed the same trend as on-premise usage.

Pic. 3. Where do you run your ML workloads? Top 3: On-premise, GCP, AWS.

If we dig a bit deeper and compare the infrastructure preferences of Enterprise and Startup organizations, we see that Enterprises prefer on-premise (53%) over cloud (18%), with the rest choosing hybrid. The opposite holds for Startups, where cloud usage is dominant with 60% of answers and only 20% are purely on-premise users.

Pic. 4. Where do you run your ML workloads? (Enterprise). N = 40. Top 3: On-premise, AWS, Bare Metal.

Pic. 5. Where do you run your ML workloads? (Startup). N = 17. Top 3: GCP, AWS, On-premise.

One of the main goals of Kubeflow is to help Data Scientists by providing them with tools they can use efficiently at work. Support for AI/ML frameworks is one part of that. We asked our users to choose the frameworks they use most in their organizations and received the following picture.

The most used framework across all responses is TensorFlow, which is used by our users in almost 90% of cases. Second and third are scikit-learn and PyTorch, with approximately 60% of all replies. It is important to mention that Kubeflow supports all of these top tools, including XGBoost.

Pic. 6. What ML frameworks are typically used in your organization? Top 3: TensorFlow, scikit-learn, PyTorch.

Last but not least, we asked our users about their impression of Kubeflow, the blockers they hit, and the improvements they have seen. The results clearly show that Kubeflow did a great job of alleviating ML infrastructure management problems, improved model deployment, and enabled tracking of models and datasets.

Pic. 7. Which, if any, of these activities has Kubeflow improved when compared to your previous process? Top 3: Managing ML infra, Deploying models, Model Tracking.

To summarize, Machine Learning Engineers and Data Scientists will remain our main target audience, along with DevOps and Data Engineers. Back in the spring, Kubeflow wasn’t ready to support on-premise infrastructure and was far from Enterprise grade. Once we received the results, we decided to move in this direction.

Today we are happy to see that Kubeflow is closer than ever to Enterprise readiness and to supporting on-premise installation. The documentation has been greatly enriched and more tutorials can be found online. I am happy to see that we are trying to be a data-driven open source community.

Huge thanks to every survey participant; your response matters to us.

If you find this article helpful and would like to learn more about Kubeflow capabilities or become a contributor, feel free to:

Visit our Kubeflow website or Kubeflow GitHub Page
Join the Kubeflow Slack channel
Join the kubeflow-discuss mailing list
Attend a weekly community meeting

Thanks to Josh Bottum (Arrikto), Thea Lamkin (Google), and Katie O’Leary (Google) for contributing to the survey.

Written by Elvira Dzhuraeva