
Multimodal-Medical-Chatbot

An AI-powered medical assistant that accepts patient voice and medical image inputs, processes them through a multimodal RAG pipeline, and returns realistic doctor-like spoken responses.

Overview

This project demonstrates a full-stack, multimodal chatbot designed for simulated medical consultations. It leverages state-of-the-art models for speech-to-text (STT), image-based diagnostic reasoning, and text-to-speech (TTS) to create an interactive, human-like AI doctor assistant.

Features

  • Voice Input: Transcribe patient speech into text using Groq's Whisper API (see the STT sketch after this list).

  • Image Diagnosis: Analyze medical images (e.g., X-rays, dermatology photos) with LLaMA-4 Scout via Groq for diagnostic insights (sketched below).

  • Speech Output: Convert AI-generated responses into natural-sounding doctor voices with ElevenLabs (sketched below).

  • Web Interface: User-friendly Gradio UI for recording audio and uploading images to receive text and audio feedback (a layout sketch appears under Usage).

  • Prompt Engineering: Tailored system prompts to ensure concise, human-like, and clinically appropriate responses.
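
A minimal sketch of the STT step, assuming the official groq Python SDK; the audio file name is illustrative:

    import os
    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])

    # Send the recorded audio to Groq's hosted Whisper model.
    with open("patient_audio.mp3", "rb") as f:
        transcription = client.audio.transcriptions.create(
            file=("patient_audio.mp3", f.read()),
            model="whisper-large-v3",
        )

    print(transcription.text)  # plain-text transcript of the patient's speech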
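
A sketch of the image-diagnosis call, again via the groq SDK. Groq's vision endpoint is OpenAI-compatible, so the image is passed as a base64 data URL; the file name and prompt here are placeholders, not the project's actual system prompt:

    import base64
    import os
    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])

    # Encode the uploaded image as a data URL for the vision model.
    with open("xray.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    completion = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "You are a doctor. What do you see in this image?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )

    print(completion.choices[0].message.content)  # the model's diagnostic text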
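
And a sketch of the speech-output step with the elevenlabs Python SDK (v1+); the voice_id and text are placeholders, not the voice or response the project actually uses:

    import os
    from elevenlabs import save
    from elevenlabs.client import ElevenLabs

    client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

    # Convert the model's reply into spoken audio; returns a byte stream.
    audio = client.text_to_speech.convert(
        voice_id="JBFqnCBsd6RMkjVDRZzb",  # placeholder voice ID
        text="Based on the image, this looks like a mild skin irritation.",
        model_id="eleven_turbo_v2",
        output_format="mp3_44100_128",
    )

    save(audio, "doctor_reply.mp3")  # write the stream to an mp3 file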

Tech Stack

  • Backend: Python 3.10

  • Frontend: Gradio

  • STT: Groq Whisper (whisper-large-v3)

  • Image Analysis: meta-llama/llama-4-scout-17b-16e-instruct on Groq

  • TTS: ElevenLabs API (eleven_turbo_v2)

  • Containerization: Docker (optional for GPU-based Space)

  • Deployment: Hugging Face Spaces

Installation

  1. Clone the repository
     git clone https://github.com/Yanmi01/Multimodal-Medical-Chatbot.git
     cd Multimodal-Medical-Chatbot
  2. Create and activate a virtual environment
     python -m venv venv
     source venv/bin/activate    # Linux / macOS
     venv\Scripts\activate       # Windows
  3. Install dependencies
     pip install -r requirements.txt
  4. Set up environment variables: create a .env file in the project root (see the loading sketch after this list):
     GROQ_API_KEY=your_groq_api_key
     ELEVENLABS_API_KEY=your_elevenlabs_api_key
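
How the keys get from .env into the process is not spelled out here; a common approach (assumed in this sketch) is python-dotenv:

    # Assumed setup: pip install python-dotenv, then load the file at startup.
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the project root into the environment

    GROQ_API_KEY = os.environ["GROQ_API_KEY"]
    ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]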

Usage

Start the Gradio app locally:

python app.py

Open your browser at http://localhost:7860, then:

  1. Upload or record patient audio.

  2. Upload a medical image.

  3. View the transcribed text and AI doctor’s diagnosis.

  4. Listen to the spoken response.
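
For orientation, here is a minimal Gradio layout matching the flow above. The consult pipeline is stubbed so the sketch runs on its own; in the real app its steps would be the Groq STT, Groq vision, and ElevenLabs calls sketched under Features:

    import gradio as gr

    def consult(audio_path, image_path):
        # Stub pipeline: the real app would (1) transcribe audio_path with
        # Groq Whisper, (2) ask LLaMA-4 Scout about image_path plus the
        # transcript, and (3) synthesize the reply with ElevenLabs.
        transcript = f"(transcript of {audio_path})"
        diagnosis = f"(diagnosis based on {image_path})"
        speech = None  # the real app returns the path of the generated mp3
        return transcript, diagnosis, speech

    demo = gr.Interface(
        fn=consult,
        inputs=[
            gr.Audio(sources=["microphone", "upload"], type="filepath",
                     label="Patient audio"),
            gr.Image(type="filepath", label="Medical image"),
        ],
        outputs=[
            gr.Textbox(label="Transcript"),
            gr.Textbox(label="Doctor's response"),
            gr.Audio(label="Spoken response"),
        ],
        title="Multimodal Medical Chatbot",
    )

    demo.launch()  # serves on http://localhost:7860 by default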
