Cross-cloud MLOps with Tanzu Application Platform and VMware Data Solutions: A 4-Minute Primer
tl;dr: With Tanzu Application Platform and VMware Data Solutions, ML engineers can implement end-to-end machine learning operations in a cloud-agnostic manner, using open source-first tooling, industry-leading data products, and a GitOps-ready foundation.
MLOps: Streamlining the path to production-ready machine learning
Today, the world of AI grapples with a strange paradox. Thanks to state-of-the-art model architectures, ultra-large-scale data, and recent advances in compute, storage and networking, the impact of AI and machine learning on modern industry has been revolutionary. Yet the operational process behind AI itself continues to lag significantly behind. Surveys show that up to 90% of ML models fail to make it to production, and those that do take an average of three months to deploy. Moving from pilot to production remains a formidable challenge for many enterprises.
MLOps was designed to address this problem. Using mechanisms like automation (complete and partial), continuous improvement, continuous monitoring and shared collaboration, it provides a framework for managing and deploying machine learning models with greater efficiency, agility and security. If this sounds familiar, it’s because it is: MLOps is an offshoot of DevOps. When DevOps was first conceived in the 2000s, enterprise developers faced similar struggles releasing their application workloads to production at scale; today, high-velocity, agile software development teams are commonplace. MLOps aims to reproduce DevOps successes from the software development world by incorporating the same foundational principles, while also managing ML-specific concerns like drift, governance, trust and continuous training.
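To make one of those ML-specific concerns concrete: a minimal drift monitor can compare a summary statistic of live inference inputs against the training-time baseline and trigger retraining when the gap grows too large. The sketch below is plain Python with an illustrative threshold, not part of any particular MLOps product.

```python
import statistics

def mean_shift_drift(baseline, live, threshold=0.5):
    """Flag drift when the live mean moves more than `threshold`
    baseline standard deviations away from the baseline mean."""
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    shift = abs(statistics.mean(live) - base_mean) / base_std
    return shift > threshold, shift

# Feature values seen at training time vs. in production
baseline = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
live = [1.6, 1.7, 1.5, 1.65, 1.8]

drifted, score = mean_shift_drift(baseline, live)
if drifted:
    print(f"drift detected (score={score:.2f}) -- trigger retraining")
```

Production systems typically use richer statistics (e.g., population stability index or KS tests) over windows of data, but the shape of the check, and the retraining trigger it feeds, is the same.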
Why Tanzu Application Platform for MLOps?
As a DevOps-native platform, Tanzu Application Platform (TAP) can be leveraged as a central hub for deploying and managing MLOps tools, pipelines and artifacts across teams. Some of its benefits include:
- Cross-cloud AI: TAP lets you use the same basic operations to manage ML workloads and artifacts, irrespective of the cloud environment. Unlike many proprietary-first platforms, TAP is cloud-agnostic, supporting on-premises, hybrid cloud and public cloud environments. It does this by leveraging Kubernetes under the hood, behind a convenient abstraction layer. Using tools like the Tanzu CLI, Carvel tools, Bitnami Services and Crossplane with TAP, ML engineers can discover, deploy, build and integrate ML frameworks, tools, pipelines, custom apps and data sources without extensive knowledge of Kubernetes.
- Open source-first: TAP is interoperable with major open source MLOps solutions and tools such as Kubeflow Pipelines, MLflow, Argo Workflows, JupyterHub and TensorFlow Serving. This aligns with the ML strategy of a growing number of enterprises: for many, the community-driven innovation, cost effectiveness and flexibility of open source ML make it the preferred approach for avoiding vendor lock-in, especially as open source ML options continue to proliferate and mature. TAP provides easy on-ramps for integration with ML accelerators, Carvel packages and VMware Application Catalog, and lets you mix and match frameworks and tools from different vendors seamlessly.
- CI/CD-native: TAP provides several built-in primitives for automating the continuous deployment of ML apps, models and pipelines in a GitOps-friendly manner. Using Supply Chains, AppCR for lightweight pipelines, Cloud Native Buildpacks for automated container builds, and Knative Services for serverless deployments, TAP lets ML engineers supplement MLOps workflows and inferencing options with DevOps-ready automation for all of their ML workload deployments.
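The promote-only-when-healthy idea behind a supply chain can be sketched in plain Python: each stage hands its artifact to the next, and a failing quality gate stops promotion before deployment. The stage names, registry URL and accuracy threshold below are invented for illustration; they are not TAP APIs.

```python
def build(source):
    # Stand-in for an automated container build (e.g., via buildpacks)
    return {"image": f"registry.example.com/{source}:v1"}

def evaluate(artifact):
    # Stand-in model evaluation stage; 0.92 is an arbitrary example score
    artifact["accuracy"] = 0.92
    return artifact

def deploy(artifact, min_accuracy=0.90):
    # Promote to a (hypothetical) serverless service only if the gate passes
    if artifact["accuracy"] < min_accuracy:
        raise RuntimeError("quality gate failed; deployment blocked")
    return f"deployed {artifact['image']} as a serverless revision"

# Stages chained like a supply chain: build -> evaluate -> gated deploy
print(deploy(evaluate(build("fraud-model"))))
```

In a real GitOps flow, the chain is declared once and triggered by commits; the point here is only that deployment is an automated, gated consequence of upstream stages rather than a manual step.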
VMware Data Solutions: Integrating the twin pillars of data and ML
Data is not always treated as a first-class concern when MLOps comes up. Yet among the essentials for successful MLOps, perhaps none carries as much significance as data. Indeed, data and ML are inextricably linked in production: even the most state-of-the-art models can only perform as well as the data they are trained on. Factors like data size, distribution, scalability, recency, latency, consistency and locality have substantial bearing on a model’s power, and many of these factors are directly or indirectly tied to the performance, capability and scalability of the underlying data platform. With VMware Data Solutions, data scientists and ML engineers can leverage an industry-leading portfolio of scalable, high-throughput, low-latency data platforms for many advanced AI/ML use cases, including MPP data warehousing, in-database analytics, scalable feature stores, real-time stream processing, and more. These solutions are also cloud-native, cloud-agnostic, and interoperable with TAP.
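To make "in-database analytics" concrete, the sketch below pushes feature computation down into the database with SQL aggregates instead of pulling raw rows into application code. Here the stdlib sqlite3 module stands in for an MPP warehouse, and the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 20.0), (1, 35.0), (2, 5.0), (2, 7.5), (2, 12.5)],
)

# Compute per-user spend features inside the database; only the small
# feature rows travel back to the training or serving process
features = conn.execute(
    """
    SELECT user_id,
           COUNT(*)    AS txn_count,
           AVG(amount) AS avg_amount
    FROM transactions
    GROUP BY user_id
    """
).fetchall()

for row in features:
    print(row)
```

At warehouse scale, the same pattern lets the data platform's parallelism do the heavy lifting, which is why factors like locality and platform scalability bear so directly on model quality.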
Stay tuned for a new blog series that will take a deeper dive into the many possibilities of cross-cloud AI with TAP and VMware Data Solutions, from standard MLOps to LLMOps.