An AI-powered medical assistant that accepts patient voice and medical image inputs, processes them through a multimodal RAG pipeline, and returns realistic doctor-like spoken responses.
This project demonstrates a full-stack, multimodal chatbot designed for simulated medical consultations. It leverages state-of-the-art models for speech-to-text (STT), image-based diagnostic reasoning, and text-to-speech (TTS) to create an interactive, human-like AI doctor assistant.
- **Voice Input:** Transcribe patient speech into text using Groq's Whisper API.
- **Image Diagnosis:** Analyze medical images (e.g., X-rays, dermatology photos) with LLaMA-4 Scout via Groq for diagnostic insights.
- **Speech Output:** Convert AI-generated responses into natural-sounding doctor voices with ElevenLabs.
- **Web Interface:** User-friendly Gradio UI for recording audio, uploading images, and receiving text and audio feedback.
- **Prompt Engineering:** Tailored system prompts to ensure concise, human-like, and clinically appropriate responses (a sketch of such a prompt follows this list).
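The README does not reproduce the actual prompt, so the following is a hedged sketch of a system prompt with these properties. The wording is hypothetical, not the project's exact text:

```python
# Hypothetical system prompt illustrating the "concise, human-like,
# clinically appropriate" constraints described above; the project's
# actual wording may differ.
SYSTEM_PROMPT = (
    "You are a doctor speaking directly to a patient. "
    "Look at the image and the patient's description, then answer in "
    "one short paragraph of plain, natural speech. "
    "Do not use bullet points, markdown, or phrases like 'as an AI'. "
    "If something looks concerning, say so calmly and recommend seeing "
    "a clinician in person."
)
```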
- **Backend:** Python 3.10
- **Frontend:** Gradio
- **STT:** Groq Whisper (`whisper-large-v3`); see the pipeline sketch after this list
- **Image Analysis:** `meta-llama/llama-4-scout-17b-16e-instruct` on Groq
- **TTS:** ElevenLabs API (`eleven_turbo_v2`)
- **Containerization:** Docker (optional, for a GPU-based Space)
- **Deployment:** Hugging Face Spaces
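To make the stack concrete, here is a minimal sketch of how the three hosted APIs could chain together, assuming the official `groq` SDK and the v1+ `elevenlabs` SDK. The helper names (`transcribe`, `diagnose`, `speak`) and the placeholder voice ID are illustrative, not the project's actual code:

```python
import base64
import os

from groq import Groq                     # pip install groq
from elevenlabs.client import ElevenLabs  # pip install elevenlabs

groq_client = Groq(api_key=os.environ["GROQ_API_KEY"])
tts_client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])


def transcribe(audio_path: str) -> str:
    """STT: send the recorded patient audio to Groq's Whisper endpoint."""
    with open(audio_path, "rb") as f:
        result = groq_client.audio.transcriptions.create(
            file=(audio_path, f.read()),
            model="whisper-large-v3",
        )
    return result.text


def diagnose(question: str, image_path: str) -> str:
    """Image analysis: ask LLaMA-4 Scout about the image through Groq's
    OpenAI-compatible chat API, passing the image as a base64 data URL."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = groq_client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[
            # A system prompt like the one sketched in the features
            # section would typically be prepended here.
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }
        ],
    )
    return response.choices[0].message.content


def speak(text: str, out_path: str = "reply.mp3") -> str:
    """TTS: render the doctor's reply with ElevenLabs.
    The voice_id below is a placeholder, not the project's actual voice."""
    audio = tts_client.text_to_speech.convert(
        text=text,
        voice_id="YOUR_VOICE_ID",
        model_id="eleven_turbo_v2",
    )
    with open(out_path, "wb") as f:
        for chunk in audio:  # convert() streams the audio as byte chunks
            f.write(chunk)
    return out_path
```

Keeping each stage as a separate function makes it easy to swap a single component, for example a different Whisper size or TTS voice, without touching the rest of the pipeline.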
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/multimodal-medical-chatbot.git
  cd multimodal-medical-chatbot
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # Linux / macOS
  venv\Scripts\activate     # Windows
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables: create a `.env` file in the project root:

  ```
  GROQ_API_KEY=your_groq_api_key
  ELEVENLABS_API_KEY=your_elevenlabs_api_key
  ```
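A common way to load these keys at startup is `python-dotenv`. This is an assumption about how the app reads its configuration, not code shown in this repository:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the project root into the process environment

GROQ_API_KEY = os.environ["GROQ_API_KEY"]
ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
```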
Start the Gradio app locally (a sketch of the underlying interface wiring appears after the steps below):

```bash
python app.py
```

Open your browser at http://localhost:7860, then:
- Upload or record patient audio.
- Upload a medical image.
- View the transcribed text and the AI doctor’s diagnosis.
- Listen to the spoken response.
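For reference, the four steps above map onto a small amount of Gradio wiring. This is a minimal sketch that reuses the hypothetical `transcribe`, `diagnose`, and `speak` helpers from the pipeline sketch earlier; the project's actual `app.py` may be organized differently:

```python
import gradio as gr

# Hypothetical helpers from the pipeline sketch above.
from pipeline import transcribe, diagnose, speak


def consult(audio_path, image_path):
    """One end-to-end turn: audio -> text -> diagnosis -> spoken reply."""
    question = transcribe(audio_path)        # STT via Groq Whisper
    answer = diagnose(question, image_path)  # vision LLM via Groq
    return question, answer, speak(answer)   # TTS via ElevenLabs


demo = gr.Interface(
    fn=consult,
    inputs=[
        gr.Audio(sources=["microphone", "upload"], type="filepath"),
        gr.Image(type="filepath"),
    ],
    outputs=[
        gr.Textbox(label="Transcribed speech"),
        gr.Textbox(label="Doctor's response"),
        gr.Audio(label="Spoken response"),
    ],
)

if __name__ == "__main__":
    demo.launch()  # serves on http://localhost:7860 by default
```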