Introducing OpenVINO™ 2024.1: Enable Your Generative AI Workloads with Enhanced LLM Performance and Broadened Support

OpenVINO™ toolkit · May 6, 2024

Welcome to the latest advancement in the OpenVINO™ toolkit, where each new release not only broadens the horizon of possibilities but also strengthens its position in the ever-evolving landscape of AI inferencing and deployment. As we unveil the newest features in OpenVINO™ 2024.1, we continue our commitment to providing enhanced performance, flexibility, and ease of use for developers worldwide.

The release of OpenVINO™ 2024.1 is shaped by robust community feedback and a clear vision to empower developers with further large language model (LLM) performance improvements, expanded model support, and more streamlined deployment across diverse platforms, in the cloud or locally. Let’s walk through the most important updates that we have made.

Improvements in Large Language Model inference

The development of LLMs is still advancing at an astonishing speed. OpenVINO's optimization and inference acceleration refine the execution of these complex models, enabling faster and more efficient processing, less computational overhead, and fuller use of the hardware's potential. This directly translates into LLMs achieving higher throughput and lower latency.

LLM compilation time and memory footprint are reduced through additional optimizations with compressed embeddings. Improved first-token performance of LLMs is achieved on 4th and 5th generation Intel® Xeon® platforms with Intel® Advanced Matrix Extensions (Intel® AMX) and on Intel® Arc™ GPUs.

Better LLM compression and improved performance are attained with oneDNN. LLMs quantized or compressed to INT4 and INT8 precision are now supported on Intel® Arc™ GPUs. Significant memory reduction for select smaller GenAI models is achieved on Intel® Core™ Ultra processors with integrated GPU.
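As an illustration, the sketch below compresses an LLM's weights to INT4 with NNCF and compiles the result for an Intel® Arc™ GPU. The model path is a placeholder, and this is just one possible compression flow rather than the exact recipe behind the results above.

```python
# Minimal weight-compression sketch; "llm/openvino_model.xml" is a
# placeholder for a real OpenVINO IR file.
import openvino as ov
import nncf

core = ov.Core()
model = core.read_model("llm/openvino_model.xml")

# Compress weights to INT4; activations remain in floating point.
compressed_model = nncf.compress_weights(
    model, mode=nncf.CompressWeightsMode.INT4_ASYM
)
ov.save_model(compressed_model, "llm/openvino_model_int4.xml")

# "GPU" selects the first available Intel GPU, e.g., an Arc discrete GPU.
compiled = core.compile_model(compressed_model, "GPU")
```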

Also, it is now possible to fine-tune INT8 PyTorch models after post-training quantization to improve model accuracy, making it easier to move from post-training to training-aware quantization. An example demonstrating this flow has been added.
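Here is a minimal sketch of that flow, assuming a small torchvision model and synthetic data in place of a real training set: post-training quantization with NNCF first, then an ordinary PyTorch fine-tuning loop over the INT8 model.

```python
import nncf
import torch
import torchvision

# Placeholder model and data; substitute your own trained model and dataset.
model = torchvision.models.resnet18(weights="DEFAULT")
data = [(torch.randn(1, 3, 224, 224), torch.tensor([0])) for _ in range(8)]

def transform_fn(batch):
    images, _ = batch  # NNCF only needs the model inputs for calibration
    return images

# Step 1: post-training INT8 quantization on a small calibration set.
quantized_model = nncf.quantize(model, nncf.Dataset(data, transform_fn))

# Step 2: fine-tune the quantized model with a regular training loop
# (training-aware quantization) to recover accuracy.
optimizer = torch.optim.Adam(quantized_model.parameters(), lr=1e-5)
loss_fn = torch.nn.CrossEntropyLoss()
quantized_model.train()
for images, labels in data:
    optimizer.zero_grad()
    loss = loss_fn(quantized_model(images), labels)
    loss.backward()
    optimizer.step()
```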

More Gen AI coverage and framework integrations

In this new release, OpenVINO™ dives deeper into the realm of Generative AI, broadening coverage across a wider spectrum of neural architectures and applications.

The newly released state-of-the-art Llama 3 and Phi-3 models are supported and optimized by OpenVINO™. Mixtral, an LLM with a Mixture of Experts (MoE) architecture, and URLNet models are optimized for performance improvements on Intel® Xeon® processors. The text-to-image model Stable Diffusion 1.5 and the LLMs ChatGLM3-6B and Qwen-7B are optimized for improved inference speed on Intel® Core™ Ultra processors with integrated GPU.

Support for Falcon-7B-Instruct, a ready-to-use GenAI chat/instruct model with strong performance metrics, is now available with OpenVINO™.
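Running any of these LLMs through OpenVINO™ typically takes only a few lines with Optimum Intel, the Hugging Face integration for OpenVINO. A minimal sketch follows; the model ID is illustrative, and any supported causal LM can be substituted.

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the original checkpoint to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```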

Other models that are now supported include YOLOv9, YOLOv8 Oriented Bounding Box detection (OBB), Stable Diffusion in Keras, MobileCLIP, RMBG-v1.4 Background Removal, Magika, TripoSR, AnimateAnyone, LLaVA-Next, and a RAG system with OpenVINO and LangChain, for which we also provide Jupyter notebook examples in the OpenVINO Notebooks repository.

Changes for new platforms and enhancements for existing ones

The preview NPU plugin for Intel® Core™ Ultra processors is now available in the OpenVINO open-source GitHub repository, in addition to the main OpenVINO package on PyPI.
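Once the driver is set up, the NPU appears as a regular OpenVINO device target. A minimal sketch, with an illustrative model path:

```python
import openvino as ov

core = ov.Core()
# "NPU" should appear here when the plugin and driver are present.
print(core.available_devices)

model = core.read_model("model.xml")  # placeholder IR file
compiled = core.compile_model(model, "NPU")
```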

The JavaScript API is now more easily accessible through the npm repository, giving JavaScript developers seamless access to the OpenVINO API. Documentation has been extended to help developers start integrating their JavaScript applications with OpenVINO™.

FP16 inference is now enabled by default for convolutional neural networks (CNNs) on ARM processors, and performance has improved significantly for a wide set of models on ARM devices. FP16 is now the default inference precision for all model types on ARM devices. A CPU architecture-agnostic build has been implemented to enable unified binary distribution across different ARM devices.
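If you need to verify the selected precision, or force FP32 back on while debugging accuracy, the inference precision hint can be set explicitly at compile time. A minimal sketch, with an illustrative model path:

```python
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
model = core.read_model("model.xml")  # placeholder IR file

# Override the default precision (e.g., back to f32) via the hint property.
compiled = core.compile_model(
    model, "CPU", {hints.inference_precision: ov.Type.f32}
)
print(compiled.get_property(hints.inference_precision))
```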

New and modified notebooks

OpenVINO notebooks remain a valuable resource for demonstrating how OpenVINO can be applied to the most important advancements in the AI field. Recently, we made some changes to the OpenVINO notebooks repository, including changing the default branch from ‘main’ to ‘latest’ and improving the naming structure of the notebooks within the “notebooks” folder.

Use the local README.md file and the OpenVINO™ Notebooks page on GitHub Pages to navigate through the content.

The following notebooks have been updated or newly added:

· Grounded Segment Anything

· Visual Content Search with MobileCLIP

· YOLOv9 optimization

· YOLOv8 Oriented Bounding Box Detection optimization

· Magika: AI-powered fast and efficient file type identification

· Keras Stable Diffusion

· RMBG background removal

· AnimateAnyone: pose-guided image-to-video generation

· LLaVA-Next visual-language assistant

· TripoSR: single-image 3D reconstruction

· RAG system with OpenVINO and LangChain

· Hello, NPU!

Thank you to our contributors!

As we celebrate the latest milestones achieved with OpenVINO™, it’s the collective effort of our contributors that truly deserves the spotlight. Your invaluable contributions have not only enriched the OpenVINO™ toolkit but have also propelled the community forward, fostering an environment of innovation and collaboration. We extend our heartfelt gratitude to every individual who has contributed, whether through direct code submissions or the vibrant exchange of ideas in our community.

As we look ahead, we are more excited than ever to continue this journey with you. We encourage developers, both seasoned and new, to keep contributing. Your unique perspectives and innovative ideas are what shape OpenVINO™ into a great tool that empowers developers to turn their AI visions into reality.

Notices & Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
