DEMO: Tackle AI Pipeline Challenges with Cloud-Native Tools

Intel · Published in Intel Tech · Jan 23, 2024 · 4 min read

Help meet scalability, security, and sustainability demands.


Presented by Dr. Malini Bhandaru

Building an effective AI pipeline presents developers with unique scalability, security, and sustainability challenges. Developers must ensure that AI pipelines can scale to meet dynamic demands while balancing costs. Developers must also protect the data — which is often sensitive, proprietary, or regulated — used in AI models, as well as the valuable AI models themselves. Lastly, developers must build pipelines that use resources responsibly and that help advance corporate sustainability goals.

In this demo, we’ll show you how our reference solution addresses these challenges to help developers deliver improved scalability, security, and sustainability to AI pipelines for a variety of use cases. Watch the full demo here.

A complete set of resources

The Intel® reference solution combines a set of cloud-native tools to help overcome the challenges of building your AI pipeline.

· Scalability tools: Kubernetes pod autoscaling, both the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler, improves scalability, and the Kubernetes Event-driven Autoscaling (KEDA) project lets developers use its operator to watch multiple event sources and scale workloads per policy (see the autoscaler sketch after this list).

· Security tools: To secure your containers, we recommend Kubernetes role-based access control (RBAC), policy engines such as Kyverno, and container hardening best practices to maintain strong security policies. Additionally, confidential computing adds an extra layer of security, protecting data in use through hardware-supported memory encryption, integrity, attestation, and protection from access by privileged processes.

· Sustainability tools: The Kepler project gives you visibility into the energy consumption of pipelines, pods, and containers. Intel also offers special-purpose instructions and accelerators, such as Intel® Advanced Matrix Extensions (Intel® AMX), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and Intel® QuickAssist Technology (Intel® QAT), along with software that leverages them for greater energy efficiency.
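
To make the scalability piece concrete, here is a minimal sketch of creating a Horizontal Pod Autoscaler with the official Kubernetes Python client. The deployment name "inference-engine", the namespace, and the CPU target are illustrative assumptions rather than part of the reference solution; KEDA layers event-driven triggers on top of the same scaling target.

```python
# Minimal sketch: create a Horizontal Pod Autoscaler for a hypothetical
# "inference-engine" Deployment using the official Kubernetes Python client.
# Names, namespace, and thresholds are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside a pod

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="inference-engine-hpa", namespace="default"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="inference-engine"
        ),
        min_replicas=1,
        max_replicas=10,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization", average_utilization=70),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

KEDA extends the same idea by scaling on external signals, such as queue depth or latency, rather than CPU utilization alone.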

An overview of an AI pipeline

To illustrate an AI pipeline, let’s walk through the stages of a video-analytics workload with object identification, which is common in retail and surveillance use cases.

First, cameras collect the video streams and feed them into a media transcoding gateway microservice. From there, each frame is queued for processing. Next, an inference microservice processes the frames using one or more purpose-built inference engines, such as a facial recognition application or a retail application that identifies grocery store inventory. Finally, the microservices update the dashboard with the results.

An example AI vision pipeline that includes microservices for media transcoding, frame queuing, and inference.
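
As a rough, single-process analogue of that flow, the sketch below models the gateway, frame queue, and inference stages with Python threads and an in-memory queue. The run_object_identification and update_dashboard functions are placeholders I've stubbed out; in the reference solution each stage is its own containerized microservice.

```python
# Toy single-process analogue of the pipeline: gateway -> frame queue -> inference -> dashboard.
# In the reference solution these are separate microservices, not threads in one process.
import queue
import threading
import time

frame_queue = queue.Queue(maxsize=100)

def transcoding_gateway(stream_id: str, num_frames: int) -> None:
    """Decode an incoming camera stream and enqueue frames for inference."""
    for i in range(num_frames):
        frame = {"stream": stream_id, "frame_no": i, "pixels": b"<placeholder>"}
        frame_queue.put(frame)
        time.sleep(1 / 25)  # roughly 25 FPS input, matching the demo

def run_object_identification(frame: dict) -> list:
    # Placeholder: a real engine would run a vision model on frame["pixels"].
    return [{"label": "product", "confidence": 0.9}]

def update_dashboard(worker_id: int, frame: dict, detections: list) -> None:
    # Placeholder: a real service would push results to the dashboard backend.
    print(f"worker {worker_id}: {frame['stream']} frame {frame['frame_no']} -> {detections}")

def inference_worker(worker_id: int) -> None:
    """Pop frames off the queue, run object identification, update the dashboard."""
    while True:
        frame = frame_queue.get()
        if frame is None:  # shutdown sentinel
            break
        detections = run_object_identification(frame)
        update_dashboard(worker_id, frame, detections)
        frame_queue.task_done()

if __name__ == "__main__":
    workers = [threading.Thread(target=inference_worker, args=(i,), daemon=True) for i in range(2)]
    for w in workers:
        w.start()
    transcoding_gateway("camera-1", num_frames=50)
    frame_queue.join()
    for _ in workers:
        frame_queue.put(None)
```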

Demo: A vision-enabled retail camera

In an example video from a grocery store, the pipeline is running five inference engines. At the top left of the dashboard is the inference engine running in our confidential enclave, built on Intel® Trust Domain Extensions (Intel® TDX). With an inference speed of 25 frames per second (FPS), the engine handles all 25 input frames per second with no added latency. The engine uses runtime measurement registers (RTMRs) to capture measurements of the kernel, BIOS, and other firmware, attesting that this is a genuine Intel enclave running the right code.
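
At its core, that attestation step compares the RTMR values reported in the enclave’s quote against known-good measurements of the firmware, kernel, and workload. The sketch below shows only that comparison; the register-to-component mapping, the digests, and the reported_rtmrs input are illustrative, and obtaining and cryptographically verifying the actual TD quote (for example, with the CCNP project mentioned below) is outside this snippet.

```python
# Illustrative check of RTMR values against expected ("golden") measurements.
# The digests and register mapping below are placeholders; real values come from
# a trusted build of the firmware, kernel, and container image, and the quote
# itself must first be verified by an attestation service.
EXPECTED_RTMRS = {
    0: "expected-firmware-digest",  # BIOS/firmware measurement (illustrative mapping)
    1: "expected-kernel-digest",    # kernel measurement (illustrative mapping)
}

def verify_rtmrs(reported_rtmrs: dict[int, str]) -> bool:
    """Return True only if every expected RTMR matches the reported value."""
    for index, expected in EXPECTED_RTMRS.items():
        if reported_rtmrs.get(index) != expected:
            print(f"RTMR{index} mismatch: enclave is not running the expected code")
            return False
    return True

# Example: a quote whose kernel measurement was tampered with fails the check.
print(verify_rtmrs({0: "expected-firmware-digest", 1: "tampered-kernel-digest"}))  # False
```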

The other four inference engine instances on the left-hand side of the dashboard are standard containerized instances that are not confidential. Just like the secure enclave, each containerized instance captures how many frames it’s receiving and how many it’s able to handle. You can see these instances are dropping more frames than the Intel TDX instance.

On the right half of the dashboard, the top graph represents the number of frames the pipeline is handling and how well it’s handling them. The middle graph displays the number of active instances of the inference engine. When the number of streams increases or latency peaks, the pipeline launches another instance as part of the Kubernetes Pod Autoscaling and KEDA feedback loop. Lastly, the lower right quadrant of the dashboard displays the energy savings enabled by Intel AMX.

Our secure enclave (in the top left corner), based on Intel® TDX, has lower latency than standard containerized instances, while the Intel® AMX accelerator enables energy savings.
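
The energy numbers in that lower-right quadrant are the kind of data Kepler surfaces. As a rough sketch, assuming a Prometheus server is scraping Kepler’s exporter, you could pull a container’s cumulative energy counter as shown below; the Prometheus address, container name, and exact metric and label names are assumptions and may differ across Kepler versions.

```python
# Sketch: query a Prometheus server that scrapes Kepler for a container's
# cumulative energy counter. The URL, container name, and metric/label names
# (kepler_container_joules_total, container_name) are assumptions that may
# vary across Kepler and Prometheus setups.
import requests

PROMETHEUS_URL = "http://prometheus.monitoring:9090"  # hypothetical in-cluster address

def container_energy_joules(container: str) -> float:
    """Return the summed cumulative energy (in joules) reported for one container."""
    query = f'sum(kepler_container_joules_total{{container_name="{container}"}})'
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query", params={"query": query}, timeout=10
    )
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return float(results[0]["value"][1]) if results else 0.0

if __name__ == "__main__":
    print("inference-engine energy (J):", container_energy_joules("inference-engine"))
```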

When there’s an uptick in foot traffic in the store, the pipeline adds another inference engine instance to process the additional input frames, and power consumption increases. As the number of input frames decreases, the number of active inference engines automatically drops back down to five, and power use decreases accordingly.

Get started

To enhance scalability, security, and sustainability for your AI pipeline, try experimenting with our reference solution on GitHub. You can also take advantage of our Confidential Cloud-Native Primitives (CCNP) project, which provides trust primitives that work across the full range of confidential compute deployment scenarios, including confidential virtual machines, confidential clusters, and confidential containers.

· Try our AI pipeline reference solution

· Visit the CCNP project

About the author

Dr. Malini Bhandaru, Open Source Cloud Architect, Intel

Malini Bhandaru is an innovator, leader, and problem solver delivering solutions and strategies that span IoT/edge, cloud, and hardware platforms. She’s effective at working with cross-functional and global teams and passionate about STEM, diversity, and mentoring.
