AI is coming to the PC — AI PC Essentials

Published in OpenVINO™ toolkit · 7 min read · Apr 19, 2024

Authors: Dmitriy Pastushenkov, Ria Cheruvu, Max Domeika, Paula Ramos

AI applications are exploding in prevalence in our daily lives. Whether it’s using social networks or buying goods from online retailers, most of the content you see on your screen is likely recommended by AI.

While working from home and meeting with colleagues via collaboration tools, you can also be sure that many features, like background blurring, are implemented using AI.

Another popular AI application is chatbots. These assistants communicate with us effectively, solving many tasks such as answering questions, providing recommendations, doing summarization, and even writing technical blogs. 😉

These types of AI use cases can be broadly classified into two groups: Conventional AI and Generative AI.

Conventional AI vs. Generative AI

Figure 1. Conventional AI vs Generative AI: Use case comparison.

Conventional AI has existed for many years in the form of machine learning (ML). Generative AI is relatively new but has surged in popularity in recent years after the introduction of foundation models; particularly popular are Large Language Models (LLMs) and Visual Models.

Let’s compare these two types of AI use cases and corresponding models. In the case of Conventional AI, the output prediction is based on the model itself and the input. For example, using a low-resolution input, the model can predict what a high-resolution video will look like. In contrast, Generative AI models can generate unique content like text, code, and video.

Conventional AI models are normally rather small, occupying at most a few hundred megabytes of disk space and RAM. Meanwhile, Generative AI models can contain billions or even trillions of parameters and require tens of gigabytes of storage and RAM.

Furthermore, Generative AI applications are interactive: a user can ask a chatbot questions about a research document or generate a relevant image for a presentation, creating a back-and-forth thread with the model. Here, longer latency can be acceptable in exchange for high-quality outputs. For Conventional AI, many algorithms must run in real time; we wouldn’t want to wait several seconds while our conferencing application blurs the background during a meeting.

Now, in the past (the 2000s “pre-cloud” era), it was rather common to run Conventional AI on PCs or at the edge, but of course, those algorithms weren’t very complex.

With growing compute and memory requirements for AI models (particularly Gen AI), it has become rather common to run models in the cloud ☁️ , where huge compute resources can be allocated, as shown in Figure 2. PCs send requests to the cloud and display the results of the AI model, while the AI inference itself runs in the cloud.

Figure 2. AI workloads can span across client devices and cloud servers.

However, running AI workloads on the cloud leads to some limitations.

First: an Internet connection is mandatory, which hinders the portability of the AI assistant.

The second issue is high latency. Hosting an AI solution in the cloud requires a round trip from your PC to the cloud and back, adding latency that can bother even the most patient user.

And third, which can be critical for many use cases, is data privacy. Imagine that you are working on a very innovative product and just need to summarize the transcript of a meeting with your colleagues or external stakeholders. Per your company policy, you’d want to ensure this data is not shared with any public service. Because of this, transferring your private enterprise data to the cloud doesn’t always make sense.

But we have good news for you: AI is coming back to the PC, and we call it the AI PC. With the release of the Intel® Core™ Ultra processor in December 2023, we have ensured that users can run even advanced AI applications locally on their PCs, with more control over their data; it’s all on-device and portable, so you can take it with you on the go.

Now, imagine taking an AI assistant with you on the go: a vacation-planning chatbot that helps you pick the next stop to visit, no Internet connection needed, and that can still connect to the cloud to obtain additional information when it suits your use case. You can accomplish this today with the AI PC.

How does the AI PC work?

Three compute engines work together to deliver the AI PC experience:

Figure 3. Exploring the AI Engines in the Intel® Core™ Ultra processor and their capabilities.
  1. The CPU: The backbone that lets us run AI models flexibly and at low latency, allowing you to do more. This type of compute is optimal for workloads like YOLO object detection, for example, detecting nearby pedestrians or noticing when an object is removed from a shelf.
  2. The built-in GPU: The powerhouse that lets us run compute-intensive workloads fast to save time. This engine is perfect for larger Gen AI systems, like the image-generation models Stable Diffusion and Latent Consistency Models, which produce high-quality visuals.
  3. The small and agile Neural Processing Unit (NPU): Compute specialized to accelerate AI models so they run fast and efficiently, always on in the background and always ready to use. This type of compute is optimal for sustained workloads like background blurring or noise removal!

We can use these compute engines together to maximize the throughput of our AI applications, keeping energy consumption low when efficiency matters and drawing more power when we can afford it.

How do we leverage this compute to create intelligent apps on the AI PC?

Building Smart Apps on the AI PC

To get started and take advantage of the computing power the AI PC offers, we’ll first need to install the required drivers:

- NPU driver installation

- GPU driver installation

To run AI models on the AI PC, we can use the OpenVINO™ toolkit from Intel®, a free, open-source toolkit for optimizing and running AI models. Models from different frameworks, like PyTorch and TensorFlow, can be converted with OpenVINO™ to accelerate performance on different types of hardware.
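For example, here is a minimal sketch of that flow, converting a PyTorch model and compiling it for a device on the AI PC. The torchvision model and the target device are illustrative assumptions, not specifics from this post:

```python
# Minimal sketch: convert a PyTorch model to OpenVINO and run it locally.
# The torchvision model and target device are illustrative assumptions.
import numpy as np
import torch
import torchvision
import openvino as ov

pt_model = torchvision.models.resnet50(weights="DEFAULT").eval()

# Convert the PyTorch module into an OpenVINO model
ov_model = ov.convert_model(pt_model, example_input=torch.rand(1, 3, 224, 224))

core = ov.Core()
compiled = core.compile_model(ov_model, device_name="CPU")  # or "GPU" / "NPU"

# Run inference on a dummy image
logits = compiled(np.random.rand(1, 3, 224, 224).astype(np.float32))[0]
print(logits.shape)  # (1, 1000) ImageNet class scores
```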

We can use OpenVINO™ to create applications with the AI PC in three simple steps:

1. We start with the command ‘pip install openvino’ to install the latest version.

2. We can then clone the OpenVINO™ Notebooks GitHub repo, a repository with hundreds of examples you can run on the AI PC today.

3. Let’s select the tflite-selfie-segmentation example to blur our background on a web conference call, and run this simple AI application on the NPU, as sketched below.
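To give a feel for what the notebook does, here is a condensed sketch of the pipeline. The model filename, the single-channel mask output, and the 0.5 threshold are simplifying assumptions; the actual notebook also handles model download and its specific output format:

```python
# Condensed sketch of background blurring with a TFLite segmentation model.
# Filename, mask layout, and threshold are simplifying assumptions.
import cv2
import numpy as np
import openvino as ov

core = ov.Core()
# OpenVINO can read TensorFlow Lite models directly
model = core.read_model("selfie_segmentation.tflite")
compiled = core.compile_model(model, device_name="NPU")  # or "CPU" / "GPU"

frame = cv2.imread("frame.jpg")
_, h, w, _ = compiled.input(0).shape  # assuming an NHWC input
blob = cv2.resize(frame, (int(w), int(h))).astype(np.float32)[None] / 255.0

# Assume the model outputs a per-pixel person probability
mask = compiled(blob)[0].squeeze()
mask = cv2.resize(mask, frame.shape[1::-1]) > 0.5

# Keep the person sharp, blur everything else
blurred = cv2.GaussianBlur(frame, (55, 55), 0)
result = np.where(mask[..., None], frame, blurred)
cv2.imwrite("blurred.jpg", result)
```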

Figure 4. A snippet from the OpenVINO Notebooks example 243-tflite-selfie-segmentation, for live background blurring.

We can look at the available devices by running a quick snippet of code, as shown below. In our case, we have an Intel® Core™ Ultra 7 processor, a built-in Intel® Arc™ GPU, and an NPU.
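For reference, here is a minimal version of that device query (the notebook’s exact code may differ):

```python
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g., ['CPU', 'GPU', 'NPU'] on an AI PC
```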

Figure 5. A snippet of the available hardware devices for running background blurring.

After completing the necessary model download, inference device selection, and model compilation steps, we can run the application.

Here is a screenshot of background blurring taking place. Below, we’ve opened Task Manager, and you can see the NPU compute usage quickly peak in the plot as it accelerates this workload. You can also run the same application on the CPU or GPU, without any code modifications, depending on your use case.

Figure 6. A demo of background blurring in action!

If you’re not sure which device to use, you can select the AUTO option and let OpenVINO decide which device offers the most efficient inference. To learn more about this feature, please refer to the OpenVINO documentation.
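Switching to AUTO is a one-line change; here is a minimal sketch (the model path and the optional performance hint are illustrative):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder model path

# Let OpenVINO pick the best available device; optionally hint the priority.
compiled = core.compile_model(model, "AUTO", {"PERFORMANCE_HINT": "LATENCY"})
```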

What’s next?

Whether you’re a developer, customer, domain expert, or an executive decision-maker, you can get started with building today for your use case.

Join the AI Developer Program to stay updated. You can also try this example and many other tutorials yourself in the openvino_notebooks GitHub repository, on the CPU, GPU, and, selectively, the NPU.

We would love your feedback!

Stay tuned for our next blog, where we’ll see how to develop a Generative AI chatbot with Large Language Models and Retrieval Augmented Generation on the AI PC, and share more resources there.

Notices & Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
