Building a Medical Diagnosis Assistant: A Cloud-Powered Q&A System

- Using Google Cloud Services

Bhavyareddyseerapu
Google Cloud - Community
5 min read · Aug 5, 2024


Introduction:

Have you ever wondered what it’s like to walk in a doctor’s shoes? Last month, I had the incredible opportunity to shadow a physician, and it was an eye-opener. The sheer volume of information they need to recall at a moment’s notice is staggering. This experience sparked an idea: what if there was a tool to help medical professionals and students practice and reinforce their knowledge through scenario-based questions?

This idea aligned perfectly with my participation in Code Vipassana Season 6 alongside my friend Vamsi Kummaragunta. We delved into Large Language Models (LLMs) and their practical applications, realizing we could create something impactful by combining our medical assistance idea with Google Cloud and Vertex AI.

Thus began our journey to develop a voice-enabled Medical Q&A simulator. Our goal was to create an interactive platform that generates realistic medical scenarios, asks pertinent questions, and provides instant feedback, all through voice interaction.

In this blog post, we’re excited to share our experience building this application. We’ll explore how we used Flask, integrated Google Cloud AI services, implemented text-to-speech and speech-to-text functionalities, and deployed our application using Cloud Run.

So, whether you’re a developer exploring AI-powered web applications, a medical professional interested in innovative learning tools, or simply curious about the intersection of technology and healthcare, buckle up! We’re about to embark on an exciting journey through the development of our Voice Med Q&A Simulator.

Design:

Our application is designed with a user-friendly interface that allows for seamless interaction through both text and voice. The architecture combines frontend technologies (HTML, CSS, JavaScript) with a robust backend powered by Flask and Google Cloud services.

Key components of the design include:

1. Frontend: A responsive web interface built with HTML, CSS, and JavaScript, featuring a chat-like UI for Q&A interactions.
2. Backend: A Flask server that handles user requests, generates medical scenarios, and evaluates user responses.
3. AI Integration: Utilization of Google’s Gemini Pro for generating medical scenarios and evaluating answers.
4. Voice Capabilities: Integration of Google Cloud Speech-to-Text for transcribing user voice inputs and Text-to-Speech for audio playback of scenarios and evaluations.
5. Data Management: Use of Google Cloud Storage for storing and accessing a medical dataset.

This design allows for a highly interactive and accessible application, catering to various learning styles and preferences in medical education.

Prerequisites

Before starting, ensure you have the following:

- Our code, which you can get from our GitHub repository.
- A Google Cloud Platform account.
- The sample dataset, which you can download from Kaggle.

Step-by-step Instructions:

Open the Google Cloud console, create a project, and select it.

1. Once connected to Cloud Shell, confirm that you are authenticated and set the active project to your project ID using the following commands:
gcloud auth list
gcloud config set project <YOUR_PROJECT_ID>

2. Then clone the git repository using the below command:

git clone https://github.com/seerapubhavyareddy/medicalQ-A.git

3. Change to the project directory:

cd medicalQ-A

The project is structured as follows:

```
medicalQ-A/
├── app.py
├── templates/
│ └── index.html
├── static/
│ └── styles.css
└── uploads/
```

4. Enable the required APIs:

Cloud Storage, Text-to-Speech, Speech-to-Text, Cloud Run

You can enable them in the Cloud console, or from the shell:

gcloud services enable storage.googleapis.com
gcloud services enable texttospeech.googleapis.com
gcloud services enable speech.googleapis.com
gcloud services enable run.googleapis.com

5. Create a service account and download the keys.json file.

6. Configure Environment:

Place the keys.json file in the project root.

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/keys.json
Set GEMINI_API_KEY = 'YOUR_API_KEY'
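
A quick way to catch a misconfigured environment before starting the app is a small check like the one below. This is only a sketch: `find_config_problems` is a hypothetical helper, not part of our repository, and it assumes that `GEMINI_API_KEY` is supplied as an environment variable in the same way as `GOOGLE_APPLICATION_CREDENTIALS`.

```python
import os
from pathlib import Path

# Variable names come from the step above. Treating GEMINI_API_KEY as an
# environment variable is an assumption of this sketch.
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "GEMINI_API_KEY")


def find_config_problems(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    # The credentials variable must point at an existing key file.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path and not Path(key_path).is_file():
        problems.append(f"service account key not found at {key_path}")
    return problems
```

Calling `find_config_problems()` with no arguments inspects the current process environment; printing the returned list at startup makes a missing key obvious before any Google Cloud call fails.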

7. Prepare the Dataset:

Upload your healthcare_dataset.csv to a Google Cloud Storage bucket and make it publicly readable.

Update the gcs_path variable in app.py with your bucket path
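
To illustrate what a bucket path means in practice, here is a minimal sketch of how a public `gs://` path could be turned into a download URL and read as CSV rows. It assumes `gcs_path` has the form `gs://<bucket>/<object>`; the helper names `split_gcs_path`, `public_url`, and `load_rows` are hypothetical, and app.py may well read the file differently.

```python
import csv
import io
import urllib.request
from urllib.parse import quote


def split_gcs_path(gcs_path):
    """Split 'gs://bucket/path/to/file.csv' into (bucket, object_name)."""
    if not gcs_path.startswith("gs://"):
        raise ValueError(f"expected a gs:// path, got {gcs_path!r}")
    bucket, _, object_name = gcs_path[len("gs://"):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"missing bucket or object name in {gcs_path!r}")
    return bucket, object_name


def public_url(gcs_path):
    """HTTPS URL at which a publicly readable object can be downloaded."""
    bucket, object_name = split_gcs_path(gcs_path)
    # quote() keeps '/' separators and escapes spaces and similar characters.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


def load_rows(gcs_path):
    """Download the public CSV and return its rows as dictionaries."""
    with urllib.request.urlopen(public_url(gcs_path)) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

For example, `load_rows("gs://<YOUR_BUCKET>/healthcare_dataset.csv")` would return one dictionary per CSV row, keyed by the column headers in the file.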

8. Build and Run with Docker:

docker build -t medicalq-a .
docker run -p 8080:8080 medicalq-a

Access the application at http://localhost:8080

9. Cloud Run Deployment:

After building and testing your application locally, follow these steps to deploy it on Google Cloud Run:

#Build your Docker image and tag it for Google Container Registry:
docker build -t gcr.io/[YOUR-PROJECT-ID]/medicalq-a .
#Push the image to Google Container Registry
docker push gcr.io/[YOUR-PROJECT-ID]/medicalq-a
#Deploy the image to Cloud Run
gcloud run deploy medicalq-a --image gcr.io/[YOUR-PROJECT-ID]/medicalq-a --platform managed --region [YOUR-REGION] --allow-unauthenticated

Cloud Run will provide you with a URL once deployment is complete. You can access your application through this URL.

When you open the generated URL, your application should look something like this:

Application image
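
One detail worth knowing when moving from local Docker to Cloud Run: Cloud Run tells the container which port to listen on through the `PORT` environment variable (8080 by default), and the server must bind to all interfaces. A minimal sketch of reading that value follows; `resolve_port` is a hypothetical helper name, not something from our repository.

```python
import os


def resolve_port(env=None, default=8080):
    """Return the port to listen on, preferring Cloud Run's PORT variable."""
    env = os.environ if env is None else env
    return int(env.get("PORT", default))
```

In a Flask app this would typically be used as `app.run(host="0.0.0.0", port=resolve_port())`, so the same code works with `docker run -p 8080:8080` locally and on Cloud Run.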

Deep Dive into Google Cloud Services:

Google Cloud Storage:

We use this to store and access our medical dataset. It provides scalable, secure, and reliable storage for our application data.

Google Cloud Text-to-Speech:

This service converts our generated text scenarios and evaluations into natural-sounding audio, enhancing accessibility and user experience.
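
As a rough illustration, a Text-to-Speech call with the `google-cloud-texttospeech` client library could look like the sketch below. The function names, the voice settings, and the chunk size are assumptions rather than what app.py does. The chunking helper is there because the API limits how much text a single request may contain (5,000 bytes at the time of writing; check the current quotas), and generated evaluations can be long.

```python
import re


def split_for_tts(text, max_bytes=4500):
    """Group sentences into chunks whose UTF-8 size stays under max_bytes.

    A single sentence longer than max_bytes is returned as its own
    oversized chunk; this sketch does not split inside sentences.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_mp3_segments(text, language_code="en-US"):
    """Return one MP3 byte string per chunk of text.

    Needs the google-cloud-texttospeech package and valid credentials.
    """
    # Imported lazily so split_for_tts works without the library installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code,
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    segments = []
    for chunk in split_for_tts(text):
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=chunk),
            voice=voice,
            audio_config=audio_config,
        )
        segments.append(response.audio_content)
    return segments
```

Returning separate segments keeps the sketch simple; the frontend can play them in order.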

Google Cloud Speech-to-Text:

This service enables voice input by converting the user’s spoken words into text for processing.
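
For reference, a synchronous transcription with the `google-cloud-speech` client library could be sketched as follows. The audio format is an assumption (16 kHz, 16-bit linear PCM); browser recordings often use other encodings, so the `RecognitionConfig` would need to match whatever the frontend actually sends. `transcribe` and `join_transcripts` are hypothetical names.

```python
def join_transcripts(response):
    """Concatenate the top alternative of each result in a recognize() response."""
    parts = [
        result.alternatives[0].transcript.strip()
        for result in response.results
        if result.alternatives
    ]
    return " ".join(part for part in parts if part)


def transcribe(audio_bytes, sample_rate_hertz=16000, language_code="en-US"):
    """Transcribe a short clip; needs google-cloud-speech and credentials."""
    # Imported lazily so join_transcripts works without the library installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        # Assumed format: raw 16-bit PCM. Change this to match your audio.
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(content=audio_bytes)
    response = client.recognize(config=config, audio=audio)
    return join_transcripts(response)
```

The synchronous `recognize` call is meant for short clips of roughly a minute or less, which fits spoken answers to a single question.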

Google’s Generative AI (Gemini Pro):

The powerhouse behind our intelligent Q&A system. It generates contextual medical scenarios, formulates relevant questions, and evaluates user responses.
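
To make this concrete, here is a minimal sketch of how scenario generation and answer evaluation could be driven through the `google-generativeai` package. The prompt wording, the function names, and the `gemini-pro` model identifier are illustrative assumptions, not the exact prompts or configuration used in app.py, and model names change over time.

```python
import os


def build_scenario_prompt(condition):
    """Prompt asking for one scenario and one question about a condition."""
    return (
        "You are helping medical students practice.\n"
        f"Write a short, realistic patient scenario involving {condition}, "
        "then ask exactly one question about diagnosis or management."
    )


def build_evaluation_prompt(scenario, answer):
    """Prompt asking the model to grade a learner's answer to a scenario."""
    return (
        "Scenario and question:\n"
        f"{scenario}\n\n"
        "Learner's answer:\n"
        f"{answer}\n\n"
        "Say whether the answer is correct, explain briefly, "
        "and give the key point the learner should remember."
    )


def ask_gemini(prompt, model_name="gemini-pro"):
    """Send a prompt to Gemini and return the text of the reply.

    Needs the google-generativeai package and GEMINI_API_KEY in the
    environment (an assumption of this sketch).
    """
    # Imported lazily so the prompt builders work without the library.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text
```

A round trip would then be `scenario = ask_gemini(build_scenario_prompt("asthma"))` followed by `ask_gemini(build_evaluation_prompt(scenario, user_answer))`.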

Demo

Upon completing this tutorial, you’ll have a fully functional voice-enabled Medical Q&A application. Users can:

1. Select a medical condition via text or voice input.
2. Receive a generated medical scenario and question.
3. Respond to the question using text or voice.
4. Get an evaluation of their answer.
5. Listen to audio versions of scenarios and evaluations.

Attaching a video demo of our application:

https://drive.google.com/file/d/14i_w7Rk6fK_o_euUPpBLPwppCLNvJCwx/view?usp=sharing

The application demonstrates:
- Integration of multiple Google Cloud AI services
- Real-time voice-to-text and text-to-speech capabilities
- Dynamic content generation using AI
- Interactive web-based user interface
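
The user flow listed above (pick a condition, receive a scenario, answer, get an evaluation) can be modelled as a small object that is independent of the web framework and of the AI service. This is a hypothetical sketch for illustration, not the structure of app.py; `QuizRound` and its `generate` and `evaluate` callables are made-up names, and in a real app they would wrap the Gemini calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class QuizRound:
    """Tracks one scenario at a time and records completed rounds."""

    generate: Callable[[str], str]        # condition -> scenario and question
    evaluate: Callable[[str, str], str]   # (scenario, answer) -> feedback
    scenario: Optional[str] = None
    history: list = field(default_factory=list)

    def start(self, condition):
        """Generate and remember a scenario for the chosen condition."""
        self.scenario = self.generate(condition)
        return self.scenario

    def answer(self, user_answer):
        """Evaluate the answer to the current scenario and log the round."""
        if self.scenario is None:
            raise RuntimeError("call start() before answer()")
        feedback = self.evaluate(self.scenario, user_answer)
        self.history.append((self.scenario, user_answer, feedback))
        self.scenario = None  # a new scenario is needed for the next round
        return feedback
```

Keeping the flow separate like this makes it easy to test with stub functions, and the `history` list is a natural starting point for tracking performance later.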

Challenges and Learnings:

Our main challenge was the lack of PyAudio support when coding in Python on GCP. To work around it, we:

1) Created a VM instance

2) Connected to that instance over SSH from Visual Studio Code

3) Completed the entire coding process in VS Code

This workaround let us continue with the project despite the PyAudio issue in the GCP environment, which was the biggest challenge we faced during the process.

Integrating multiple Google Cloud services required careful management of authentication and permissions.

Optimizing the performance of the Generative AI to provide quick responses while maintaining accuracy was crucial.

Designing a user-friendly interface that works seamlessly with both text and voice inputs presented interesting UX challenges.

What’s Next?

To further enhance your Medical Q&A app, consider:

1. Implementing user authentication and session management.
2. Expanding the medical dataset and improving scenario generation.
3. Adding a feature to track user performance over time.
4. Integrating more advanced NLP techniques for better answer evaluation.
5. Developing a mobile version of the application using frameworks like React Native or Flutter.

If you have any questions, please feel free to reach out to us:

@Bhavya Reddy Seerapu

@Vamsi Kummaragunta

Call to Action

To learn more about Google Cloud services and to create an impact for the work you do, get around to these steps right away:
* Register for Code Vipassana sessions
* Explore more Google APIs on the Google Cloud Console
* Sign up to become a Google Cloud Innovator
