This project is part of a two-week internship focused on implementing RAG (retrieval-augmented generation) techniques on relational data.
The process involves extracting data from a PostgreSQL database and converting it to a Neo4j graph, which brings out the structure behind the data. The graph is then used to optimize LLM prompts for better, more coherent results.
Tools: psycopg2, neo4j Python driver.
Process: Data was mapped from a PostgreSQL database to a Neo4j graph by mapping records to nodes and foreign key constraints to edges (relationships).
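The mapping described above can be sketched in plain Python, without a live database. This is a minimal illustration, not the code in load_data.py: the function name `rows_to_graph`, the dictionary layout, the sample tables, and the `CHILD_TO_PARENT` relationship naming are all assumptions made for the example.

```python
def rows_to_graph(tables: dict, primary_keys: dict, foreign_keys: list) -> tuple:
    """Turn table rows into node dicts and foreign keys into edge dicts.

    tables:       {table_name: [row_dict, ...]}
    primary_keys: {table_name: primary_key_column}
    foreign_keys: [(child_table, fk_column, parent_table), ...]
    """
    # One node per record, labelled with the name of its table.
    nodes = []
    for table, rows in tables.items():
        pk = primary_keys[table]
        for row in rows:
            nodes.append({"label": table, "key": row[pk], "props": dict(row)})

    # One edge per non-NULL foreign key value, from child row to parent row.
    edges = []
    for child, fk_col, parent in foreign_keys:
        for row in tables[child]:
            if row.get(fk_col) is None:  # NULL foreign key: no relationship
                continue
            edges.append({
                "from": (child, row[primary_keys[child]]),
                "to": (parent, row[fk_col]),
                # Hypothetical naming scheme for the relationship type.
                "type": f"{child.upper()}_TO_{parent.upper()}",
            })
    return nodes, edges


# Hypothetical sample data standing in for rows fetched with psycopg2.
tables = {
    "customer": [{"id": 1, "name": "Ada"}],
    "orders": [
        {"id": 10, "customer_id": 1, "total": 42.0},
        {"id": 11, "customer_id": None, "total": 5.0},
    ],
}
primary_keys = {"customer": "id", "orders": "id"}
foreign_keys = [("orders", "customer_id", "customer")]

nodes, edges = rows_to_graph(tables, primary_keys, foreign_keys)
print(len(nodes), "nodes,", len(edges), "edges")
```

Each resulting node or edge dict could then be written to Neo4j with a parameterized Cypher statement through the Python driver.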
Tools: Hugging Face Hub Inference API.
Process: Summarizing a document by providing the model with the nodes and edges of the knowledge graph that represents it.
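One way to hand a graph to a text model is to serialize its nodes and edges into the prompt. The sketch below only shows that serialization step; the function name `graph_to_prompt`, the dict keys, the prompt wording, and the sample graph are assumptions, and the actual prompt used in summarize_graph.py may differ.

```python
def graph_to_prompt(nodes: list, edges: list) -> str:
    """Serialize nodes and edges into a plain-text summarization prompt."""
    lines = ["Summarize the document described by this knowledge graph.", "Nodes:"]
    # One line per node, written in a Cypher-like notation.
    lines += [f"({n['id']}:{n['label']} {n['props']})" for n in nodes]
    lines.append("Relationships:")
    # One line per edge: (source)-[:TYPE]->(target)
    lines += [f"({e['source']})-[:{e['type']}]->({e['target']})" for e in edges]
    lines.append("Summary:")
    return "\n".join(lines)


# Hypothetical two-node graph used only for illustration.
nodes = [
    {"id": 1, "label": "Author", "props": {"name": "Ada"}},
    {"id": 2, "label": "Paper", "props": {"title": "Notes"}},
]
edges = [{"source": 1, "type": "WROTE", "target": 2}]

prompt = graph_to_prompt(nodes, edges)
print(prompt)
```

The resulting string would then be sent to a text-generation model through the Hugging Face Hub Inference API.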
Tools: Hugging Face API, NumPy, Pandas.
Process: Generating the embedding of a given user input and retrieving the k most semantically similar stored inputs for better relevance.
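The retrieval step can be illustrated with cosine similarity over stored vectors. This sketch is written in pure Python so it runs without dependencies; the same computation is usually vectorized with NumPy. The names `cosine` and `top_k`, and the three-dimensional toy vectors, are assumptions for the example, since real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list, stored: list, k: int = 3) -> list:
    """Return the k (text, score) pairs most similar to query_vec, best first.

    stored is a list of (text, vector) pairs.
    """
    scored = [(text, cosine(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for real model embeddings.
stored = [
    ("list all customers", [1.0, 0.0, 0.0]),
    ("count orders per customer", [0.7, 0.7, 0.0]),
    ("show product prices", [0.0, 0.0, 1.0]),
]

best = top_k([0.9, 0.1, 0.0], stored, k=2)
for text, score in best:
    print(f"{score:.3f}  {text}")
```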
Tools: Hugging Face API.
Process: Using the graph database schema and the examples most similar to the user input, an optimized prompt is built and given to an LLM, which returns the Cypher query corresponding to the question.
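A few-shot prompt of this kind can be assembled from three parts: the schema, the retrieved examples, and the new question. The sketch below assumes a simple layout; the function name `build_cypher_prompt`, the prompt wording, and the sample schema and example pair are hypothetical and not taken from rag_neo4j.py.

```python
def build_cypher_prompt(schema: str, examples: list, question: str) -> str:
    """Build a few-shot prompt asking an LLM for a Cypher query.

    examples is a list of (question, cypher) pairs, most similar first.
    """
    parts = [
        "You translate questions into Cypher queries for the graph below.",
        "Schema:",
        schema,
        "Examples:",
    ]
    for example_question, cypher in examples:
        parts.append(f"Question: {example_question}")
        parts.append(f"Cypher: {cypher}")
    # End on an open "Cypher:" line so the model completes the query.
    parts.append(f"Question: {question}")
    parts.append("Cypher:")
    return "\n".join(parts)


# Hypothetical schema and example pair, for illustration only.
schema = "(:Customer {id, name})-[:PLACED]->(:Order {id, total})"
examples = [("How many customers are there?", "MATCH (c:Customer) RETURN count(c)")]

prompt = build_cypher_prompt(schema, examples, "How many orders are there?")
print(prompt)
```

Ending the prompt on an open `Cypher:` line nudges a completion-style model to answer with the query alone rather than with an explanation.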
Tools: Qdrant, Hugging Face API.
Process: Chunking the input text, generating an embedding for each chunk as well as for the entire document, and using the chunks most similar to the document to generate its summary.
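The chunk selection step can be sketched as follows. Everything here is an assumption made for illustration: the word-based chunking with overlap, the names `chunk_words`, `toy_embed`, `cosine`, and `top_chunks`, and above all the bag-of-words `toy_embed`, which stands in for a real embedding model. The sketch also ranks chunks in memory, whereas the project stores embeddings in Qdrant.

```python
import math
from collections import Counter


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list:
    """Split text into chunks of `size` words sharing `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop early enough that the last chunk is not a pure repeat of the overlap.
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(text: str, embed, k: int = 2, size: int = 8, overlap: int = 2) -> list:
    """Return the k chunks whose embedding is closest to the whole document's."""
    doc_vec = embed(text)
    chunks = chunk_words(text, size, overlap)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), doc_vec), reverse=True)
    return ranked[:k]


text = "graph graph graph graph one two three four"
selected = top_chunks(text, toy_embed, k=1, size=4, overlap=0)
print(selected)
```

The selected chunks would then be concatenated into the summarization prompt in place of the full document, which keeps the prompt short for long inputs.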
mkdir data
mkdir conf
mkdir import
mkdir logs
mkdir plugins
sudo chmod 777 data conf import logs plugins
docker compose up
Access the web interface at http://localhost:7474/browser/
IMPORTANT: Authentication is disabled.
Qdrant will be available at:
- Web UI (http://localhost:6333/dashboard)
- REST API (http://localhost:6333)
- gRPC API (http://localhost:6334)
cd /import
createdb testdb -U postgres -h db # password `postgres`
pg_restore -e -v -O -x -d testdb --no-owner postgres.dmp -U postgres
cd /src
python load_data.py
cd /src
python rag_neo4j.py
cd /src
python summarize_graph.py
- Replace/modify the examples.json file to include input-query pairs relevant to the data
cd /data/embeddings
- Run the embeddings generator
cd /src
python gen_embeddings.py
The embeddings are now available in CSV format in the /data/embeddings/embeddings.csv file.
cd /src
python get_most_similar.py
cd /src
python summarize_document.py