diff --git a/README.md b/README.md
index 087ff2d..8329066 100644
--- a/README.md
+++ b/README.md
@@ -1,144 +1,96 @@
-<!-- badges: License, Documentation build, Docker build -->
+# MultiMeditron
-MultiMeditron
+
+TODO: MultiMeditron pitch, link to the paper on arXiv, etc.
-
-**MultiMeditron** is a **modular multimodal large language model (LLM)** built by students and researchers from [**LiGHT Lab**](https://www.light-laboratory.org/).
-It is designed to seamlessly integrate multiple modalities such as text, images, or other data types into a single unified model architecture.
+
+## Install Dependencies
+
+Build and run our Docker image.
+All scripts assume execution from the repository root inside the container.
-
-## 🚀 Key Features
-
-* **🔗 Modular Design:**
-  Easily plug in new modalities by following our well-documented interface. Each modality embedder (e.g., CLIP, Whisper, etc.) can be independently developed and added to the model.
-
-* **🧩 Modality Interleaving:**
-  Supports interleaved multimodal inputs (e.g., text-image-text sequences), enabling complex reasoning across different data types.
-
-* **⚡ Scalable Architecture:**
-  Designed for distributed and multi-node environments — ideal for large-scale training or inference.
-
-* **🧠 Flexible Model Backbone:**
-  Combine any modality embedder (like CLIP or SigLIP) with any LLM (like Llama, Qwen, or custom fine-tuned models).
+
+```
+docker build -t project-name -f docker/Dockerfile .
+docker run --gpus all -it \
+    -v $(pwd):/workspace \
+    project-name
+```
+
+If Docker is not used, install the dependencies manually:
+
+```
+pip install -r requirements.txt
+```
-
-## 🏗️ Model Architecture
+
+Note: maybe it would be nice to offer this as a Python package. Do we already have a `pyproject.toml`? Then the instructions here could simply be `pip install -e .`.
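+
+As a quick sanity check that the container sees your GPUs, the following should print `True` and a nonzero device count (assuming PyTorch is installed in the image, as the training code requires):
+
+```bash
+python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
+```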
-<!-- figure: MultiMeditron architecture -->
+## Running our Code
-## ⚙️ Setup
+
+Model checkpoints are published on Hugging Face. To download a checkpoint and generate a reply from it, use the `generate` helper script:
+
+```
+generate.sh examples/sample_input.?
+```
+
+## Reproduce the Paper
-
-### Using Docker (recommended)
+
+All experiments are configured using Hydra. Configuration files are stored in the `cookbooks/` directory.
-
-On AMD64 architecture:
+
+- The **main recipe**, `cookbooks/main.yaml`, represents the final model configuration reported in the paper.
+- **Ablation recipes** live in `cookbooks/ablations/`.
+- Evaluation-specific settings live under `cookbooks/eval/`.
-
-```bash
-docker pull michelducartier24/multimeditron-git:latest-amd64
+
+Before you can run the training, you need to download our dataset:
 ```
-
-On ARM64 architecture:
-
-```bash
-docker pull michelducartier24/multimeditron-git:latest-arm64
+download.sh
 ```
-
-### Using uv
-
-**Prerequisite:** To install the right version of torch with your CUDA driver, please refer to [this documentation](https://pytorch.org/get-started/locally/)
-
- Install [uv](https://docs.astral.sh/uv/):
-
+
+You can use our `main.yaml` configuration to reproduce the training run of the MultiMeditron paper, or provide your own configuration. The main training run writes its checkpoints to `checkpoints/main/`.
 ```bash
-curl -LsSf https://astral.sh/uv/install.sh | sh
+bash scripts/train.sh cookbooks/main.yaml
 ```
-
-Clone the repository:
-
+
+The ablation script runs a well-defined set of ablations and writes checkpoints to a separate subdirectory `checkpoints/ablations/<ablation_name>/` for each one.
 ```bash
-git clone https://github.com/EPFLiGHT/MultiMeditron.git
-cd MultiMeditron
+bash scripts/ablate.sh
 ```
-
-Install dependencies:
-
+
+Evaluation is handled by a single entry point:
 ```bash
-uv pip install -e ".[flash-attn]"
+bash scripts/eval.sh
 ```
+
+By default, this script:
+- Finds the latest checkpoint for the main model and all ablations
+- Runs all available benchmarks
+- Saves raw evaluation outputs to `data/eval/`
-
-## 💬 Inference Example
-
-Here’s an example showing how to use **MultiMeditron** with **Llama 3.1 (8B)** and a single image input.
-
-```python
-import torch
-from transformers import AutoTokenizer
-import os
-from multimeditron.dataset.preprocessor import modality_preprocessor
-from multimeditron.dataset.loader import FileSystemImageLoader
-from multimeditron.model.model import MultiModalModelForCausalLM
-from multimeditron.dataset.preprocessor.modality_preprocessor import ModalityRetriever, SamplePreprocessor
-from multimeditron.model.data_loader import DataCollatorForMultimodal
-
-ATTACHMENT_TOKEN = "<|reserved_special_token_0|>"
-
-# Load tokenizer
-tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct", dtype=torch.bfloat16)
-tokenizer.pad_token = tokenizer.eos_token
-tokenizer.add_special_tokens({'additional_special_tokens': [ATTACHMENT_TOKEN]})
-attachment_token_idx = tokenizer.convert_tokens_to_ids(ATTACHMENT_TOKEN)
-
-# Load model
-model = MultiModalModelForCausalLM.from_pretrained("path/to/trained/model")
-model.to("cuda")
-
-# Define input
-modalities = [{"type": "image", "value": "path/to/image"}]
-conversations = [{
-    "role": "user",
-    "content": f"{ATTACHMENT_TOKEN} Describe the image."
-}]
-sample = {"conversations": conversations, "modalities": modalities}
+
+Evaluation does not produce plots or tables directly; it only generates structured data.
+Analysis and visualization are intentionally separated from evaluation.
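+
+You can also inspect the raw outputs directly before running any analysis. A minimal sketch, assuming the raw outputs are stored as JSON files (the actual layout and schema are whatever `scripts/eval.sh` writes):
+
+```python
+import json
+from pathlib import Path
+
+# Collect every JSON result file under data/eval/ (assumed format).
+results = {}
+for path in Path("data/eval").rglob("*.json"):
+    with path.open() as f:
+        results[str(path)] = json.load(f)
+
+print(f"Loaded {len(results)} raw evaluation files")
+```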
-loader = FileSystemImageLoader(base_path=os.getcwd())
-
-collator = DataCollatorForMultimodal(
-    tokenizer=tokenizer,
-    tokenizer_type="llama",
-    modality_processors=model.processors(),
-    modality_loaders={"image": loader},
-    attachment_token_idx=attachment_token_idx,
-    add_generation_prompt=True,
-)
-
-batch = collator([sample])
-
-with torch.no_grad():
-    outputs = model.generate(batch=batch, temperature=0.1)
-
-print(tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=True)[0])
+Aggregate and post-process results:
+```bash
+python scripts/analyze.py
 ```
+
+Generate plots and figures:
+
+```bash
+python scripts/plot.py
+```
-
-## 🧩 Adding a New Modality
-
-MultiMeditron’s architecture is fully **extensible**.
-To add a new modality, see the [developer documentation](https://epflight.github.io/MultiMeditron/guides/add_modality.html) for a step-by-step guide.
+
+All plots should be generated solely from files in `data/eval/`, ensuring full reproducibility.
+
+## The Data Pipeline
-
-## ⚖️ License
+
+You can download preprocessed data via `download.sh`. This produces ready-to-use data formatted to be compatible with our training code.
-
-This project is licensed under the Apache 2.0 License, see the [LICENSE 🎓](LICENSE) file for details.
+
+Alternatively, you can run `download_raw.sh` followed by `python preprocess.py` to download and preprocess the data yourself.
+The preprocessing relies on third-party LLM tools:
+- It requires an OpenAI API key in an environment variable.
+- It is not fully deterministic: the data produced by `download.sh` and by `download_raw.sh && python preprocess.py` may differ significantly due to changes in third-party APIs.
+
+## Extend the Paper
-
-## 📖 Cite us
+
+TODO: MultiMeditron mini pitch v2.
+
+MultiMeditron is designed to be reproducible, extensible, and modular.
+You can, for example, write your own modality projectors (see the sketch below).
+
-TODO
+TODO: point to interesting code files, API docs, etc.
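+
+For illustration only, a modality projector is essentially a small network that maps features from a modality embedder into the LLM's hidden size. This is a hedged sketch, not the repository's actual interface (see the code and API docs referenced above for that); all dimensions below are made up:
+
+```python
+import torch
+import torch.nn as nn
+
+# Minimal sketch of a modality projector: maps features from a frozen
+# modality embedder (e.g. CLIP) into the LLM's token embedding space.
+# Dimensions are illustrative, not MultiMeditron's actual configuration.
+class MLPProjector(nn.Module):
+    def __init__(self, embedder_dim: int, llm_hidden_dim: int):
+        super().__init__()
+        self.net = nn.Sequential(
+            nn.Linear(embedder_dim, llm_hidden_dim),
+            nn.GELU(),
+            nn.Linear(llm_hidden_dim, llm_hidden_dim),
+        )
+
+    def forward(self, features: torch.Tensor) -> torch.Tensor:
+        # features: (batch, num_tokens, embedder_dim)
+        return self.net(features)
+
+projector = MLPProjector(embedder_dim=1024, llm_hidden_dim=4096)
+out = projector(torch.randn(1, 576, 1024))  # toy CLIP-like input
+print(out.shape)  # torch.Size([1, 576, 4096])
+```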