This project implements a Multimodal Retrieval-Augmented Generation (RAG) system that extracts and processes text, tables, and images (including graphs and charts) from PDF documents. It generates captions for images using OpenAI’s GPT-4o, builds a vector index over all extracted content, and answers user queries by retrieving relevant multimodal information.
- Extracts text content from PDFs using LangChain’s `PyPDFLoader`.
- Extracts tables from PDFs using `pdfplumber` and converts them into textual documents.
- Extracts images from PDFs using PyMuPDF (`fitz`), saves them locally, and generates captions describing charts, plots, or images with OpenAI GPT-4o (see the extraction sketch after this list).
- Combines text, tables, and image captions into a unified document corpus.
- Creates a FAISS vector index with OpenAI embeddings for efficient retrieval.
- Uses a retrieval-augmented generation chain with GPT-4o to answer queries based on multimodal PDF content.
- Specifically designed to understand and explain graphical data and visual elements within PDFs.
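The extraction stage might look roughly like the sketch below. It is illustrative only: the import paths assume a recent LangChain (`langchain_community`/`langchain_core`) and the OpenAI Python SDK v1, and the `caption_image` and `extract_documents` names are hypothetical, not taken from the project's `main.py`.

```python
import base64

import fitz  # PyMuPDF
import pdfplumber
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def caption_image(image_bytes: bytes) -> str:
    """Ask GPT-4o to describe a chart, plot, or other image extracted from a PDF."""
    b64 = base64.b64encode(image_bytes).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart, plot, or image in detail."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


def extract_documents(pdf_file: str) -> list[Document]:
    """Collect text, table, and image-caption Documents from a single PDF."""
    docs: list[Document] = []

    # 1. Page text via LangChain's PyPDFLoader (one Document per page).
    docs.extend(PyPDFLoader(pdf_file).load())

    # 2. Tables via pdfplumber, flattened into pipe-separated rows of text.
    with pdfplumber.open(pdf_file) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                rows = "\n".join(" | ".join(cell or "" for cell in row) for row in table)
                docs.append(Document(page_content=rows,
                                     metadata={"type": "table", "page": page_no}))

    # 3. Embedded images via PyMuPDF, each captioned with GPT-4o.
    #    (The actual script also saves each extracted image to disk.)
    pdf_doc = fitz.open(pdf_file)
    for page_no in range(len(pdf_doc)):
        for img in pdf_doc[page_no].get_images(full=True):
            image_bytes = pdf_doc.extract_image(img[0])["image"]
            docs.append(Document(page_content=caption_image(image_bytes),
                                 metadata={"type": "image", "page": page_no + 1}))

    return docs
```

Captioning each image with GPT-4o turns purely visual content such as graphs and charts into text that can be embedded and retrieved alongside the rest of the corpus.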
Install the dependencies:

```bash
pip install langchain openai pdfplumber pymupdf pillow faiss-cpu python-dotenv
```

Create a `.env` file in the project root with your OpenAI API key:

```
OPENAI_API_KEY=your_openai_api_key_here
```
- Put your PDF file in the project directory and update the `pdf_file` variable in the script.
- Run the script:

  ```bash
  python main.py
  ```

- The script will:
  - Extract text, tables, and images with captions from the PDF.
  - Index all extracted documents.
  - Answer queries such as "What do the graphs show in this PDF?" (see the sketch after this list).
  - Print the generated answer.
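For orientation, the indexing and querying stage might look roughly like this. It builds on the hypothetical `extract_documents` helper from the extraction sketch above; the exact LangChain import paths (`langchain_openai`, `langchain_community`) and the use of `RetrievalQA` are assumptions and may differ from the actual `main.py`.

```python
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

load_dotenv()  # load OPENAI_API_KEY from the .env file

pdf_file = "example.pdf"            # replace with your PDF
docs = extract_documents(pdf_file)  # helper from the extraction sketch above

# Build a FAISS vector index over the text, table, and image-caption documents.
vector_store = FAISS.from_documents(docs, OpenAIEmbeddings())

# Retrieval-augmented generation chain backed by GPT-4o.
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vector_store.as_retriever(search_kwargs={"k": 5}),
)

result = qa_chain.invoke({"query": "What do the graphs show in this PDF?"})
print(result["result"])
```

Because image captions are indexed as ordinary documents, a question about the graphs retrieves those captions along with nearby text and tables, and GPT-4o answers from that multimodal context.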
For any questions or clarifications, please contact Raza Mehar at [[email protected]].