Part One: Crafting an AI-Powered Medical Assistant: Transforming Healthcare with OpenVINO™

Published in OpenVINO™ toolkit · Aug 1, 2024 · 6 min read

Authors: Anisha Udayakumar and Zhuo Wu

We’ve all been there: sitting in a doctor’s waiting room, time dragging on, anxiety building with each passing minute. This wait can be particularly stressful for patients, especially those with serious or chronic conditions. Meanwhile, doctors face their own challenges: tight schedules and the pressure to provide thorough care in limited time. This is where AI can make a significant impact. Imagine an AI assistant that engages with patients before they enter the doctor’s office. This assistant listens to their concerns, gathers comprehensive information about their symptoms and medical history, and provides a detailed summary for the doctor. By the time the patient sees the doctor, much of the groundwork is already laid, making the consultation faster and more focused.

Creating such an AI assistant involves deploying multiple advanced AI models, including Automatic Speech Recognition (ASR) and Large Language Models (LLM). These models are computationally intensive and require significant resources to run efficiently. Without optimization, running these models in real-time can be slow and resource-heavy, which is not practical in a fast-paced healthcare setting. This is where model optimization and efficient deployment become crucial. To address these challenges, we leverage the OpenVINO™ toolkit. This powerful framework optimizes and deploys AI models on Intel hardware, boosting performance and ensuring efficient use of resources. OpenVINO™ also provides the flexibility to deploy these models across various devices, making it a versatile solution for real-world applications. Here’s how we built a custom AI medical assistant using this toolkit.
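At a high level, the assistant chains two models: speech goes in, text comes out. The flow can be sketched as follows, with stand-in functions where the real ASR and LLM inference would run (the function names here are illustrative, not the project's actual API):

```python
# Conceptual pipeline: audio -> transcript -> doctor-facing summary.
# The two inference functions are stubs standing in for the real
# OpenVINO-optimized ASR and chat models.

def transcribe(audio: bytes) -> str:
    """Stand-in for the Distil-Whisper ASR model."""
    return "I have had a headache and mild fever for three days."

def generate_summary(transcript: str) -> str:
    """Stand-in for the Llama-based chat model summarizing for the doctor."""
    return f"Patient reports: {transcript}"

def assistant_pipeline(audio: bytes) -> str:
    transcript = transcribe(audio)
    return generate_summary(transcript)

print(assistant_pipeline(b"fake-audio"))
```

The real application replaces both stubs with optimized OpenVINO inference, but the data flow between the two models is exactly this chain.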

Step-by-Step Implementation

Given the functionalities described above, the assistant's workflow is straightforward: capture the patient's speech, transcribe it with the ASR model, and pass the transcript to the LLM, which conducts the dialogue and produces a summary for the doctor. The step-by-step instructions for setting up the environment and running the application are as follows.

1. Setting the Stage

First things first: the environment. The assistant runs on Python 3.8 or higher. The following instructions cover setting up on both Ubuntu and Windows.

For Ubuntu, you’ll need to install essential libraries and tools:

sudo apt install git gcc python3-venv python3-dev

NOTE: If you are using Windows, you may also need to install Microsoft Visual C++ Redistributable.
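Before going further, it is worth confirming the interpreter meets the Python 3.8+ requirement; a quick convenience check (not part of the project itself):

```python
import sys

# The kit targets Python 3.8 or newer; fail fast if the interpreter is older.
if sys.version_info < (3, 8):
    raise SystemExit(f"Python 3.8+ required, found {sys.version.split()[0]}")
print(f"Python {sys.version.split()[0]} OK")
```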

2. Creating a Virtual Environment

To keep things clean and manageable, we create a virtual environment. This isolated environment ensures that our dependencies are neatly contained.

For Ubuntu:

python3 -m venv venv
source venv/bin/activate

For Windows:

python3 -m venv venv
venv\Scripts\activate

3. Cloning the Repository

Next, we clone the repository housing our project. This step is consistent across both operating systems:

git clone https://github.com/openvinotoolkit/openvino_build_deploy.git
cd openvino_build_deploy/ai_ref_kits/custom_ai_assistant

4. Installing Dependencies

With the virtual environment activated, we install the necessary packages:

python -m pip install --upgrade pip 
pip install -r requirements.txt

5. Accessing and Setting Up Models

We utilize Meta’s Llama model for natural language understanding. Accessing the model requires authentication via Hugging Face:

huggingface-cli login

Follow the prompts and authenticate using the same email address you used on Meta AI’s website. This step is required to download and use the Llama model.

6. Conversion and Optimization with OpenVINO™

To make the models efficient for real-world applications, we need to convert and optimize them. Here’s how we did it:

  • Automatic Speech Recognition (ASR) Model:
python convert_and_optimize_asr.py --asr_model_type distil-whisper-large-v2 --precision int8

This script converts and optimizes the ASR model, applying weight quantization for better performance.

  • Chat Model (LlaMA):
python convert_and_optimize_chat.py --chat_model_type llama3-8B --precision int4

By quantizing the weights, this script ensures the chat model runs efficiently on Intel hardware.
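The weight quantization both scripts perform can be illustrated with a toy example. This is a conceptual sketch of symmetric int8 quantization only, not OpenVINO's actual implementation (NNCF applies considerably more sophisticated schemes under the hood):

```python
# Toy symmetric int8 weight quantization: map float weights to integers
# in [-127, 127] via a single per-tensor scale, then dequantize to see
# how closely the originals are recovered.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)         # integer codes: 1 byte each instead of 4
print(restored)  # close to the originals, up to rounding error
```

Storing one byte per weight instead of four (or two, for int4) is what shrinks the model and speeds up inference on Intel hardware.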

7. Running the Application

With models ready, we launch the application using Gradio, a user-friendly interface for interacting with our AI assistant:

python app.py --asr_model_dir path/to/asr_model --chat_model_dir path/to/chat_model

Gradio provides a local URL, typically http://127.0.0.1:XXXX, to interact with the assistant. For public accessibility, use the --public_interface flag.

Navigate to the Gradio URL, and you’ll find an interface with a microphone icon. Click it, speak your query, and watch as the assistant processes and responds in text. This interactive experience showcases the assistant’s ability to understand and engage in meaningful dialogue.
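Based on the invocations above, app.py's command-line interface can be sketched with argparse as follows; the flag names come from the commands shown, while the defaults and help text are illustrative:

```python
import argparse

# Sketch of the flags app.py is invoked with above; help text is illustrative.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Custom AI medical assistant")
    parser.add_argument("--asr_model_dir", type=str, required=True,
                        help="Directory of the converted ASR model")
    parser.add_argument("--chat_model_dir", type=str, required=True,
                        help="Directory of the converted chat model")
    parser.add_argument("--public_interface", action="store_true",
                        help="Expose a public Gradio share link")
    return parser

args = build_parser().parse_args(
    ["--asr_model_dir", "model/asr", "--chat_model_dir", "model/chat"]
)
print(args.asr_model_dir, args.public_interface)
```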

8. Extension for multi-lingual support

Given the extensive support OpenVINO provides for optimizing and accelerating inference for generative AI models, the healthcare AI assistant can easily be extended to support languages beyond English. Taking Chinese as an example, the following steps show how to extend the assistant to work in Chinese.

Adding a New Model for ASR and Chat:
To change the ASR model, follow these steps:

  • Modify MODEL_MAPPING: add the desired model to the MODEL_MAPPING dictionary.
MODEL_MAPPING = {
    "distil-whisper-large-v2": "distil-whisper/distil-large-v2",
    "new-model": "path/to/your/new-model",
}
  • Modify the configuration for model selection so the new key is a valid choice (note that choices takes the model-type key, not the model path):
parser.add_argument("--asr_model_type", type=str,
                    choices=["distil-whisper-large-v2", "new-model"],
                    default="distil-whisper-large-v2",
                    help="Speech recognition model to be converted")
  • To run the ASR model for speech recognition in Chinese, use the following command:
python convert_and_optimize_asr.py --asr_model_type belle-distilwhisper-large-v2-zh --precision int8
  • Similar changes can be made to the chat model to add new models. To run the chat model in Chinese, use:
python convert_and_optimize_chat.py --chat_model_type qwen2-7B --precision int4
  • Finally, for the AI assistant to work in Chinese, run app.py with the following command:
python app.py --asr_model_dir path/to/belle-distilwhisper-large-v2-zh --chat_model_dir path/to/qwen2-7B
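The selection mechanism used in these steps boils down to a dictionary lookup keyed by the --asr_model_type (or --chat_model_type) value. A self-contained sketch of that pattern, with an error message when the key is missing (the Chinese entry's value is a placeholder here, not a verified Hugging Face repo ID):

```python
# Dictionary-based model selection: the CLI value is a key into
# MODEL_MAPPING, whose value is the Hugging Face model ID (or a path).
MODEL_MAPPING = {
    "distil-whisper-large-v2": "distil-whisper/distil-large-v2",
    "belle-distilwhisper-large-v2-zh": "path/to/belle-distilwhisper-large-v2-zh",
}

def resolve_model(model_type: str) -> str:
    """Return the model ID for a CLI model-type key, or raise with choices."""
    try:
        return MODEL_MAPPING[model_type]
    except KeyError:
        raise ValueError(
            f"Unknown model type {model_type!r}; choices: {sorted(MODEL_MAPPING)}"
        )

print(resolve_model("distil-whisper-large-v2"))
```

Keeping the mapping in one place means adding a language is a two-line change: one new dictionary entry and one new entry in the argparse choices list.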

Conclusion

You now have the foundational steps for launching and running your AI-powered medical assistant. If you’re interested in learning how to customize this assistant for different industries and integrate various models using the foundational kit, check out Part Two for detailed tips and additional guidance. Happy coding!

Additional Resources

Part Two: Customizing the AI-Powered Medical Assistant with OpenVINO™
Edge AI Reference Kit
OpenVINO™ Model Server GitHub repository
OpenVINO Documentation
Jupyter Notebooks
Installation and Setup

Product Page

About the Authors

Anisha Udayakumar is an AI Software Evangelist at Intel, specializing in the OpenVINO™ toolkit. At Intel, she enriches the developer community by showcasing the capabilities of OpenVINO, helping developers elevate their AI projects. With a background as an Innovation Consultant at a leading Indian IT firm, she has guided business leaders in leveraging emerging technologies for innovative solutions. Her expertise spans AI, Extended Reality, and 5G, with a particular passion for computer vision. Anisha has developed vision-based algorithmic solutions that have advanced sustainability goals for a global retail client. She is a lifelong learner and innovator, dedicated to exploring and sharing the transformative impact of technology. Connect with her on LinkedIn.

Zhuo Wu is an AI evangelist at Intel focusing on the OpenVINO™ toolkit. Her work ranges from deep learning technologies to 5G wireless communication technologies. She has made contributions to computer vision, machine learning, edge computing, IoT systems, and wireless communication physical layer algorithms. She has delivered end-to-end machine learning and deep learning-based solutions to business customers in different industries, such as automobile, banking, insurance, etc. She also has carried out extensive research in 4G-LTE and 5G wireless communication systems, and filed multiple patents when she was working as a research scientist at Bell Labs (China). She has led several research projects as the principal investigator when she was an associate professor at Shanghai University.

Notices & Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
