RAG project: from Relational Tables to Knowledge Graphs and LLM Query Optimization

Project Overview

This project is part of a two-week internship focused on implementing retrieval-augmented generation (RAG) techniques on relational data.

The process extracts data from a PostgreSQL database and converts it to a Neo4j graph that highlights the structure behind the data. The graph is then used to optimize LLM prompts for better, more coherent results.

Workflow Summary

1. ETL (Extract, Transform, Load)

Tools: psycopg2, neo4j Python driver.

Process: Data is mapped from a PostgreSQL database to a Neo4j graph: records become nodes, and foreign key constraints become edges (relationships).
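The mapping described above can be sketched as two small helpers that turn a row and a foreign key into parameterized Cypher statements. This is a minimal illustration, not the code in load_data.py: the function names, the `Customer`/`Order` labels, the `PLACED_BY` relationship type, and the column names are all hypothetical, and rows are assumed to arrive as dictionaries (for example from a psycopg2 `RealDictCursor`).

```python
from typing import Any


def row_to_node(label: str, row: dict[str, Any], key: str) -> tuple[str, dict[str, Any]]:
    """Build a MERGE statement that turns one relational row into a node.

    `label` is the node label (assumed here to be derived from the table name)
    and `key` is the primary-key column used to identify the node.
    """
    query = f"MERGE (n:{label} {{{key}: $key}}) SET n += $props"
    return query, {"key": row[key], "props": row}


def fk_to_edge(
    src_label: str,
    src_key: str,
    dst_label: str,
    dst_key: str,
    rel_type: str,
    row: dict[str, Any],
    fk_column: str,
) -> tuple[str, dict[str, Any]]:
    """Build a MERGE statement that turns one foreign-key value into an edge."""
    query = (
        f"MATCH (a:{src_label} {{{src_key}: $src}}), "
        f"(b:{dst_label} {{{dst_key}: $dst}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": row[src_key], "dst": row[fk_column]}


# Hypothetical rows from two tables: customers(id, name) and orders(id, customer_id).
customer = {"id": 1, "name": "Ada"}
order = {"id": 10, "customer_id": 1}

node_query, node_params = row_to_node("Customer", customer, "id")
edge_query, edge_params = fk_to_edge(
    "Order", "id", "Customer", "id", "PLACED_BY", order, "customer_id"
)
```

With the neo4j Python driver, each pair could then be passed to `session.run(query, params)`. Labels and relationship types are interpolated into the query text because Cypher does not accept them as parameters, so they should come from trusted schema metadata only.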

2. Summarization

Tools: Hugging Face Hub Inference API.

Process: A document is summarized by giving the model the nodes and edges of the knowledge graph that represents it.
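One way to hand a graph to a summarization model is to serialize its nodes and edges into plain text. The sketch below assumes nodes are given as an id-to-description mapping and edges as (source, relationship, target) triples; the function name, the prompt wording, and the sample data are illustrative assumptions, not what summarize_graph.py does.

```python
def graph_to_prompt(nodes: dict[str, str], edges: list[tuple[str, str, str]]) -> str:
    """Serialize nodes and edges into a text prompt for a summarization model."""
    lines = ["Summarize the document described by this knowledge graph.", "", "Nodes:"]
    for node_id, description in nodes.items():
        lines.append(f"- {node_id}: {description}")
    lines.append("")
    lines.append("Relationships:")
    for source, relation, target in edges:
        lines.append(f"- ({source})-[{relation}]->({target})")
    lines.append("")
    lines.append("Summary:")
    return "\n".join(lines)


# Hypothetical two-node graph.
prompt = graph_to_prompt(
    nodes={"ada": "Person named Ada", "report": "Document titled Q3 Report"},
    edges=[("ada", "WROTE", "report")],
)
```

The resulting string would be sent as the model input. Keeping one fact per line makes it easy to truncate the prompt when the graph is larger than the model's context window.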

3. Embeddings

Tools: Hugging Face API, NumPy, Pandas.

Process: An embedding is generated for a given user input and compared against the stored embeddings to retrieve the k most semantically similar inputs, which improves relevance.
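The comparison step can be illustrated with cosine similarity in NumPy. This sketch assumes the embeddings are already available as numeric vectors and that cosine similarity is the metric; the function name `top_k_similar` and the toy 2-dimensional vectors are hypothetical, and get_most_similar.py may use a different metric.

```python
import numpy as np


def top_k_similar(query_vec, stored_vecs, k=3):
    """Return (index, score) pairs for the k stored vectors closest to query_vec."""
    stored = np.asarray(stored_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)
    # Normalize so that the dot product equals the cosine similarity.
    query = query / np.linalg.norm(query)
    stored = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    scores = stored @ query
    # Sort by descending similarity and keep the first k indices.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for real embeddings.
stored_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matches = top_k_similar([1.0, 0.1], stored_embeddings, k=2)
```

The returned indices can be used to look up the corresponding stored examples.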

4. LLM Prompt Optimization

Tools: Hugging Face API.

Process: Using the graph database schema and the examples most similar to the user input, an optimized prompt is built and given to an LLM, which returns a Cypher query that answers the question.
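A few-shot prompt of this kind can be assembled from three pieces: the schema, the retrieved examples, and the new question. The sketch below is an assumption about the general shape only. The function name, the prompt wording, the sample schema, and the `input`/`query` keys of each example are hypothetical and may not match examples.json or rag_neo4j.py.

```python
def build_cypher_prompt(schema: str, examples: list[dict[str, str]], question: str) -> str:
    """Assemble a few-shot prompt asking an LLM for a Cypher query."""
    parts = [
        "You translate questions into Cypher queries for a Neo4j database.",
        "Graph schema:\n" + schema,
        "Examples:",
    ]
    # Each example pairs a natural-language input with its Cypher query.
    for example in examples:
        parts.append(f"Question: {example['input']}\nCypher: {example['query']}")
    # End with the open question so the model completes the query.
    parts.append(f"Question: {question}\nCypher:")
    return "\n\n".join(parts)


# Hypothetical schema and retrieved example.
cypher_prompt = build_cypher_prompt(
    schema="(:Customer {id, name})<-[:PLACED_BY]-(:Order {id})",
    examples=[
        {
            "input": "How many customers are there?",
            "query": "MATCH (c:Customer) RETURN count(c)",
        }
    ],
    question="How many orders did Ada place?",
)
```

Ending the prompt with an open `Cypher:` line nudges the model to return only the query, which makes the response easier to execute directly.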

5. Use of Vector Database for Document Summarization

Tools: Qdrant, Hugging Face API.

Process: The input text is chunked, an embedding is generated for each chunk and for the entire document, and the chunks most similar to the document are used to generate its summary.
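The chunk-and-rank idea can be shown without any external service. In the sketch below, a word-count vector stands in for the Hugging Face embeddings, and an in-memory sort stands in for the Qdrant similarity search. Both are deliberate simplifications, and all function names, chunk sizes, and the sample text are hypothetical.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector (not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_chunks(text: str, top_n: int = 3, size: int = 40, overlap: int = 10) -> list[str]:
    """Return the top_n chunks most similar to the whole-document vector."""
    document_vec = toy_embed(text)
    chunks = chunk_text(text, size, overlap)
    ranked = sorted(chunks, key=lambda c: cosine(toy_embed(c), document_vec), reverse=True)
    return ranked[:top_n]


sample = "graph graph graph graph cat dog bird fish"
best = select_chunks(sample, top_n=1, size=4, overlap=0)
```

The selected chunks would then be concatenated into the summarization prompt. Overlapping windows reduce the chance that a sentence split across a chunk boundary is lost.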

Usage

Start Neo4j Database

mkdir data
mkdir conf 
mkdir import
mkdir logs
mkdir plugins
sudo chmod 777 data conf import logs plugins
docker compose up

Access web interface at http://localhost:7474/browser/

IMPORTANT: Authentication is disabled.

Qdrant will be available at :

Postgres import

cd /import 
createdb testdb -U postgres -h db # password `postgres` 
pg_restore -e -v -O -x -d testdb --no-owner postgres.dmp -U postgres

Load data into Neo4j graph

cd /src 
python load_data.py

Answer user input relevant to the graph

cd /src
python rag_neo4j.py

Summarize graph

cd /src
python summarize_graph.py

Generate embeddings

Replace/modify the examples.json file to include input-query pairs relevant to the data

cd /data/embeddings

Run the embeddings generator

cd /src
python gen_embeddings.py

The embeddings are now available in CSV format in the /data/embeddings/embeddings.csv file.

Retrieve the most similar examples from examples.json

cd /src
python get_most_similar.py

Summarize an input text document

cd /src
python summarize_document.py
