Revise README with installation and usage details #34
**Open**: lhk wants to merge 1 commit into `master` from `lhk-README-patch` (base: `master`)
@@ -1,144 +1,96 @@
<div align="center">
  <img alt="License" src="https://img.shields.io/badge/license-Apache%202.0-blue?style=for-the-badge">
  <a href="https://epflight.github.io/MultiMeditron/index.html">
    <img alt="Documentation build" src="https://img.shields.io/github/actions/workflow/status/EPFLiGHT/MultiMeditron/docs.yml?style=for-the-badge&label=Documentation">
  </a>
  <img alt="Docker build" src="https://img.shields.io/github/actions/workflow/status/EPFLiGHT/MultiMeditron/docker.yml?style=for-the-badge&label=Docker">
</div>
# MultiMeditron

<img src="assets/multimeditron.png" alt="MultiMeditron">

MultiMeditron pitch, link to paper on arxiv, etc

**MultiMeditron** is a **modular multimodal large language model (LLM)** built by students and researchers from [**LiGHT Lab**](https://www.light-laboratory.org/). It is designed to seamlessly integrate multiple modalities, such as text, images, or other data types, into a single unified model architecture.
## Install Dependencies

Build and run our Docker image. All scripts assume execution from the repository root inside the container.
## 🚀 Key Features

* **🔗 Modular Design:**
  Easily plug in new modalities by following our well-documented interface. Each modality embedder (e.g., CLIP, Whisper, etc.) can be independently developed and added to the model.

* **🧩 Modality Interleaving:**
  Supports interleaved multimodal inputs (e.g., text-image-text sequences), enabling complex reasoning across different data types.

* **⚡ Scalable Architecture:**
  Designed for distributed and multi-node environments, ideal for large-scale training or inference.

* **🧠 Flexible Model Backbone:**
  Combine any modality embedder (like CLIP or SigLIP) with any LLM (like Llama, Qwen, or custom fine-tuned models).
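To make the interleaving concrete, here is a small sketch of what an interleaved two-image sample can look like, following the sample schema used in the inference example later in this README (the file paths and prompt text here are hypothetical):

```python
# Sketch of an interleaved sample: each attachment token in the text is
# matched positionally to an entry in `modalities` (paths are hypothetical).
ATTACHMENT_TOKEN = "<|reserved_special_token_0|>"

modalities = [
    {"type": "image", "value": "scans/xray_front.png"},
    {"type": "image", "value": "scans/xray_side.png"},
]
conversations = [{
    "role": "user",
    "content": (
        f"Frontal view: {ATTACHMENT_TOKEN} "
        f"Lateral view: {ATTACHMENT_TOKEN} "
        "Compare the two views."
    ),
}]
sample = {"conversations": conversations, "modalities": modalities}

# The number of attachment tokens must agree with the number of modality entries
assert conversations[0]["content"].count(ATTACHMENT_TOKEN) == len(modalities)
```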
```bash
docker build -t project-name -f docker/Dockerfile .
docker run --gpus all -it \
  -v $(pwd):/workspace \
  project-name
```

If Docker is not used, install dependencies manually:

```bash
pip install -r requirements.txt
```
## 🏗️ Model Architecture

> Note: Maybe it would be nice to offer this as a Python package? Do we already have `pyproject.toml`? Then the instructions here could be `pip install -e .`

<div align="center">
  <img src="./assets/architecture.png" alt="MultiMeditron architecture">
</div>
## Running our Code

## ⚙️ Setup

Model checkpoints are published on Hugging Face. To download a checkpoint and generate a reply, use the `generate` helper script:

```bash
generate.sh examples/sample_input.?
```
## Reproduce the Paper

### Using Docker (recommended)

All experiments are configured using Hydra. Configuration files are stored in the `cookbooks/` directory.

- The **main recipe**, `cookbooks/main.yaml`, represents the final model configuration reported in the paper.
- **Ablation recipes** live in `cookbooks/ablations/`.
- Evaluation-specific settings live under `cookbooks/eval/`.

On AMD64 architecture:

```bash
docker pull michelducartier24/multimeditron-git:latest-amd64
```

On ARM64 architecture:

```bash
docker pull michelducartier24/multimeditron-git:latest-arm64
```

Before you can run the training, you need to download our dataset via:

```bash
download.sh
```
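The README does not spell out which files `download.sh` produces. As an illustrative sketch only (the directory layout and file names below are assumptions, not the repository's actual output), a pre-flight check before training could look like:

```python
from pathlib import Path


def missing_dataset_files(root: str, required: list[str]) -> list[str]:
    """Return the required files that are absent under `root` (empty list = ready)."""
    base = Path(root)
    return [name for name in required if not (base / name).exists()]


# Hypothetical file names: adjust to whatever download.sh actually produces.
REQUIRED = ["train.jsonl", "val.jsonl"]
print(missing_dataset_files("data", REQUIRED))
```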
### Using uv

**Prerequisite:** To install the right version of torch for your CUDA driver, please refer to [this documentation](https://pytorch.org/get-started/locally/).

Install [uv](https://docs.astral.sh/uv/):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Clone the repository:

```bash
git clone https://github.com/EPFLiGHT/MultiMeditron.git
cd MultiMeditron
```

Install dependencies:

```bash
uv pip install -e ".[flash-attn]"
```

The main training writes checkpoints to `checkpoints/main/`. You can use our `main.yaml` configuration to reproduce the training run of the MultiMeditron paper, or provide your own configuration:

```bash
bash scripts/train.sh cookbooks/main.yaml
```

Ablations execute a well-defined set of ablations and write checkpoints to a separate subdirectory, `checkpoints/ablations/<ablation_name>/`:

```bash
bash scripts/ablate.sh
```

Evaluation is handled by a single entry point:

```bash
bash scripts/eval.sh
```
By default, this script:

- Finds the latest checkpoint for the main model and all ablations
- Runs all available benchmarks
- Saves raw evaluation outputs to `data/eval/`
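How `scripts/eval.sh` locates the latest checkpoint is not specified here. As a minimal sketch, under the assumption that each checkpoint is a subdirectory and recency can be read from modification times, "find the latest checkpoint" might be:

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(root: str) -> Optional[Path]:
    """Return the most recently modified subdirectory of `root`, or None if empty."""
    dirs = [p for p in Path(root).glob("*") if p.is_dir()]
    return max(dirs, key=lambda p: p.stat().st_mtime, default=None)
```

In practice the script may order checkpoints by step number rather than modification time; this function is only an assumption for illustration.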
## 💬 Inference Example

Here’s an example showing how to use **MultiMeditron** with **Llama 3.1 (8B)** and a single image input.
```python
import os

import torch
from transformers import AutoTokenizer

from multimeditron.dataset.loader import FileSystemImageLoader
from multimeditron.dataset.preprocessor.modality_preprocessor import ModalityRetriever, SamplePreprocessor
from multimeditron.model.data_loader import DataCollatorForMultimodal
from multimeditron.model.model import MultiModalModelForCausalLM

ATTACHMENT_TOKEN = "<|reserved_special_token_0|>"

# Load the tokenizer and register the attachment token
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({"additional_special_tokens": [ATTACHMENT_TOKEN]})
attachment_token_idx = tokenizer.convert_tokens_to_ids(ATTACHMENT_TOKEN)

# Load the model
model = MultiModalModelForCausalLM.from_pretrained("path/to/trained/model")
model.to("cuda")

# Define the input: one image plus a prompt that references it via the attachment token
modalities = [{"type": "image", "value": "path/to/image"}]
conversations = [{
    "role": "user",
    "content": f"{ATTACHMENT_TOKEN} Describe the image.",
}]
sample = {"conversations": conversations, "modalities": modalities}

loader = FileSystemImageLoader(base_path=os.getcwd())

collator = DataCollatorForMultimodal(
    tokenizer=tokenizer,
    tokenizer_type="llama",
    modality_processors=model.processors(),
    modality_loaders={"image": loader},
    attachment_token_idx=attachment_token_idx,
    add_generation_prompt=True,
)

batch = collator([sample])

with torch.no_grad():
    outputs = model.generate(batch=batch, temperature=0.1)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=True)[0])
```

Evaluation does not produce plots or tables directly; it only generates structured data. Analysis and visualization are intentionally separated from evaluation.

Aggregate and post-process results:

```bash
python scripts/analyze.py
```

Generate plots and figures:

```bash
python scripts/plot.py
```
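The raw outputs in `data/eval/` are described only as structured data. Assuming one JSON file per benchmark with an `accuracy` field (a hypothetical schema, not the repository's documented format), an aggregation step in the spirit of `scripts/analyze.py` might look like:

```python
import json
from pathlib import Path


def aggregate(eval_dir: str) -> dict[str, float]:
    """Collect one accuracy value per benchmark JSON file found in `eval_dir`."""
    results = {}
    for path in sorted(Path(eval_dir).glob("*.json")):
        with open(path) as f:
            record = json.load(f)  # hypothetical schema: {"accuracy": <float>, ...}
        results[path.stem] = record["accuracy"]
    return results
```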
> **Contributor comment (on lines +68 to +76):** We don't really have a lot of plots and figures at the moment (just tables), but maybe for the future, it would be great.
All plots should be generated solely from files in `data/eval/`, ensuring full reproducibility.

## 🧩 Adding a New Modality

MultiMeditron’s architecture is fully **extensible**. To add a new modality, see the [developer documentation](https://epflight.github.io/MultiMeditron/guides/add_modality.html) for a step-by-step guide.
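The real interface lives in the developer documentation linked above. As an illustrative sketch only (the class and method names below are hypothetical, not MultiMeditron's actual API), a pluggable modality embedder reduces to "raw input in, sequence of embedding vectors out":

```python
from abc import ABC, abstractmethod


class ModalityEmbedder(ABC):
    """Hypothetical interface: maps one raw input to a list of embedding vectors."""

    def __init__(self, embedding_dim: int):
        self.embedding_dim = embedding_dim

    @abstractmethod
    def embed(self, raw: bytes) -> list[list[float]]:
        """Return a sequence of vectors, each of length `embedding_dim`."""


class ConstantEmbedder(ModalityEmbedder):
    """Toy embedder for illustration: one zero vector regardless of input."""

    def embed(self, raw: bytes) -> list[list[float]]:
        return [[0.0] * self.embedding_dim]


tokens = ConstantEmbedder(embedding_dim=4).embed(b"fake image bytes")
```

A real embedder (e.g., wrapping CLIP or Whisper) would produce many vectors per input; the point is only that the model sees a uniform embedding interface per modality.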
## The Data Pipeline

You can download preprocessed data via `download.sh`. This will generate ready-to-use data formatted to be compatible with our training code.

Alternatively, you can run `download_raw.sh` and `python preprocess.py` to download and preprocess the data yourself. The preprocessing uses third-party LLM tools:

- it requires an OpenAI API key in an environment variable
- it is not fully deterministic: the data produced by `download.sh` and `download_raw.sh && python preprocess.py` may differ significantly due to changes in third-party APIs.

## ⚖️ License

This project is licensed under the Apache 2.0 License; see the [LICENSE 🎓](LICENSE) file for details.
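Because the preprocessing is not fully deterministic, it can help to record a fingerprint of a preprocessed dataset and compare runs to detect drift. A minimal sketch using content hashes (the idea is generic; nothing here is prescribed by the repository):

```python
import hashlib
from pathlib import Path


def fingerprint(path: str) -> str:
    """SHA-256 over a directory's file names and contents, visited in sorted order."""
    h = hashlib.sha256()
    for p in sorted(Path(path).rglob("*")):
        if p.is_file():
            h.update(p.name.encode())
            h.update(p.read_bytes())
    return h.hexdigest()
```

Two identical preprocessing runs yield equal fingerprints; any divergence in file names or contents changes the hash.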
## Extend the Paper

MultiMeditron mini pitch v2. It's designed to be reproducible, extensible, and modular. You can, for example, write your own modality projectors.
Point to interesting code files.
Point to API docs.

## 📖 Cite us

TODO
etc.
> **Contributor comment:** Do we want the user to build their own Docker image? We have a CI that builds a new one on every push to master (for both ARM64 and AMD64 architectures); if a user wants to revert to an old version of master, they can do so by updating the tag. It would be better for them to just: