Building a Gen AI Assistant Chatbot on AI PC

Published in OpenVINO™ toolkit · May 13, 2024

Author: Dmitriy Pastushenkov, Ria Cheruvu
Contributors:
Max Domeika, Paula Ramos

Travelling quite a lot, either for business or leisure, I (Dmitriy) always have some ideas in mind about what my next trip could be. While it’s easy to plan ahead, spontaneously planning my next stop when I’m in a new area can be difficult due to spotty Internet connectivity and other constraints. I would also like personalized recommendations for attractions that might interest me, and to learn more about them, without the cost of booking time with a travel agent.

What if we could have a virtual travel advisor on our AI PC? We can ask questions and create trip plans on the go while we’re on vacation, just using our laptop! ✈️

Let’s go ahead and look at how we can do this: creating a virtual travel assistant that can help plan my next trip to Rome!

Figure 1: Photograph “Sunlight is waking up Rome” by Jorgen Hendriksen.

Steps to Create a Virtual Travel Assistant on AI PC

First, we’ll use the OpenVINO™ toolkit from Intel®, a free open-source toolkit, to build our application on the AI PC.

Step 1: Clone the OpenVINO Notebooks repo and use the llm-chatbot notebook. We’ll retrieve the llm-chatbot notebook from the OpenVINO™ notebooks repo, where we’ve got hundreds of examples, including Gen AI models such as chatbots, image generation models like Stable Diffusion, and much more.
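Getting the repository onto your machine is a standard git clone; a minimal sketch is below (the exact notebook path is an assumption, as it has moved between repository versions):

```bash
# Clone the OpenVINO notebooks repository
git clone https://github.com/openvinotoolkit/openvino_notebooks.git
cd openvino_notebooks

# The chatbot example lives under the notebooks directory,
# e.g. notebooks/llm-chatbot/ (path may vary by release)
```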

Step 2: We’ll then select the LLM we’ll use for this application; in this case, let’s pick Llama-3-8B-Instruct, a new and popular AI model used for chatbots. Llama 3 is an incredibly powerful model that can provide detailed answers on almost any topic with great accuracy, so it makes perfect sense for our travel assistant application.

Figure 2: List of LLM model options available in OpenVINO — snippet from llm-chatbot notebook in OpenVINO Notebooks repository.

But to fit this model locally and have it run on the go, we’ll need to perform some optimizations.

Step 3: To optimize our Llama3 model to make it memory-efficient and fast, we can quantize it!

Quantization is the process of compressing a model to reduce its size and memory footprint. For the quantization process, we’ll use the OpenVINO™ Neural Network Compression Framework (NNCF) and apply a weight compression technique. This allows us to largely preserve the model’s accuracy while improving its speed and reducing its size.

Let’s look at quantizing our model to INT4 precision, which we can accomplish with the code snippet below (shortened for brevity).
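The notebook’s full snippet isn’t reproduced here; a minimal sketch of INT4 weight compression with NNCF follows. The model paths, group size, and ratio are illustrative assumptions rather than the notebook’s exact values:

```python
import openvino as ov
import nncf

core = ov.Core()
# Hypothetical path: the exported FP16 OpenVINO IR of Llama-3-8B-Instruct
model = core.read_model("llama-3-8b-instruct/FP16/openvino_model.xml")

# Compress weights to INT4; group_size and ratio trade size
# reduction against accuracy (values here are assumptions)
compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,
    ratio=0.8,
)

ov.save_model(compressed_model, "llama-3-8b-instruct/INT4/openvino_model.xml")
```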

In this case, quantizing our Llama-3-8B-Instruct model to INT4 precision leads to a model size of 5149.7 MB.

Step 4: As our penultimate step, we’ll select the compute engine (the CPU, GPU, or Neural Processing Unit) onto which we’ll load our AI model. In this case, we’ll execute our LLM on the CPU, but we could just as easily execute it on the GPU!

Figure 3: (Left) The available devices for executing our model. (Right) Selecting the compute engine we’d like to run on.
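To check which devices OpenVINO can see on your machine, a quick query of the runtime works; this is a minimal sketch using OpenVINO’s Python API:

```python
import openvino as ov

core = ov.Core()
# On an AI PC this typically reports something like ['CPU', 'GPU', 'NPU']
print(core.available_devices)
```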

Step 5: Finally, we’re ready to start execution and run our chatbot! To load and run inference with our model, we can use the following code snippet; notice our use of the OVModelForCausalLM instantiation to load our now-optimized model.
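The notebook’s snippet isn’t reproduced here; a minimal sketch along the same lines uses OVModelForCausalLM from optimum-intel. The model path, prompt, and generation parameters are illustrative assumptions:

```python
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

# Hypothetical path to the INT4-compressed model produced earlier
model_dir = "llama-3-8b-instruct/INT4"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
# Load the OpenVINO model onto the chosen device ("CPU" here; "GPU" also works)
model = OVModelForCausalLM.from_pretrained(model_dir, device="CPU")

inputs = tokenizer("Plan a three-hour walking tour of Rome.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```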

The llm-chatbot notebook wraps our model in a neat Gradio interface. Let’s go ahead and ask our optimized model some questions! Figures 4 and 5 show GIFs of the model in action (the GIFs have been trimmed).
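As an illustration of the idea (not the notebook’s actual interface code), a bare-bones Gradio wrapper around the generate call above might look like this:

```python
import gradio as gr

def chat_fn(message, history):
    # Reuses the tokenizer and model loaded above;
    # chat history handling is omitted for brevity
    inputs = tokenizer(message, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

gr.ChatInterface(chat_fn, title="Virtual Travel Assistant").launch()
```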

Figure 4: Asking the optimized LLM model about a nearby museum to visit (GIF trimmed for brevity).
Figure 5: Asking the optimized LLM model to create a three-hour travel plan (GIF trimmed for brevity).

Figure 6 presents the final three-hour travel plan the model created:

Figure 6: Final three-hour travel plan created by the LLM model.

What’s next?

Here are a few places to get started:

  • Try this chatbot on your setup today by cloning the notebook here! You might consider enhancing the chatbot with weather forecasts, accommodation recommendations, or even an external knowledge base (stay tuned for a blog on that shortly!)
  • Find other examples on the OpenVINO™ Notebooks repo, all of which are open-source and free, including LLMs with RAG, multimodal LLMs, and LLM AI Agents!
  • Want more resources? To read more about the OpenVINO™ toolkit and LLMs, check out our new whitepaper here.

As you’re building your AI solutions with Intel, we’re here to help!

You can reach out via the OpenVINO Notebooks GitHub repository to ask questions and share examples of your projects, and join the AI PC Developer Program at this page to stay updated on new resources!

We’re excited to see the extraordinary things you can accomplish with AI PC. 🚀

About Dmitriy Pastushenkov

Dmitriy Pastushenkov is a passionate AI PC Evangelist at Intel Germany with more than 20 years of comprehensive international experience in industrial automation, the industrial Internet of Things (IIoT), real-time operating systems, and AI. Dmitriy has held various roles in software development and enablement, software architecture, and technical management. As an AI PC Evangelist, Dmitriy focuses on OpenVINO and other components of the AI software stack for the new Intel AI PC. Dmitriy has a Master’s degree in Computer Science from Moscow Power Engineering Institute (Technical University).

About Ria Cheruvu

Ria Cheruvu is an AI Software Architect and Evangelist at Intel. She has a master’s degree in data science from Harvard University and is an instructor of data science curricula. Ria holds multiple patents and publications on AI and trustworthy AI and is an accomplished industry speaker, having delivered keynotes and technical talks for Women in Data Science, QS EduData Summit, TEDx, DEF CON IoT Village, and other communities to inform on challenges and solutions in the space. As a technical pathfinder, she is passionate about the importance of open-source communities and women in STEM, and enjoys learning about and contributing to disruptive technology spaces.

About Max Domeika

Max Domeika is a principal engineer at Intel focusing on AI Software Application Development. Max holds multiple patents as a result of his innovation work and is the author of two books, “Software Development for Embedded Multi-core Systems” from Elsevier and “Break Away with Intel Atom Processors” from Intel Press. Max earned a BS in Computer Science from the University of Puget Sound, an MS in Computer Science from Clemson University, and an MS in Management in Science and Technology from Oregon Graduate Institute. Max has been with Intel for 28 years.

About Paula Ramos

Paula Ramos has a PhD in Computer Vision, with more than 19 years of experience in the technology field. Since the early 2000s in Colombia, she has been developing novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture. During her PhD and postgraduate research, she deployed multiple low-cost, smart edge and IoT computing technologies that can be operated by people without expertise in computer vision systems, such as farmers. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. Currently, she is an AI Evangelist at Intel. You can find her LinkedIn profile and her most recent talks here.

Notices & Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
