
CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era

🤗 Dataset    📄 Paper    💻 Code    🌐 Demo Graph

This repository contains the code and resources for the paper CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era by Yanlin Feng, Simone Papicchio, and Sajjadur Rahman.

You might find this repository useful if you are interested in:

  • Building Text-to-Cypher models, including:
    • Running baselines on CypherBench [code] [doc]
    • Fetching structured schema from a Neo4j database [code]
    • Metrics for measuring Text-to-Cypher performance [code]
  • Creating domain knowledge graphs from Wikidata [code] [doc]
  • Generating Text-to-Cypher tasks for your own Neo4j graphs [code] [doc]

🔥 Updates

  • [Jun 18, 2025] We have released the Wikidata-to-Property-Graph conversion engine! Check out the instructions below!
  • [Jun 17, 2025] We have released the text-to-cypher task generation pipeline! See the instructions below!
  • [May 15, 2025] Our paper has been accepted to ACL 2025 main conference! See you in Vienna!
  • [Feb 20, 2025] We updated the graph deployment configuration to reduce RAM usage.
  • [Feb 19, 2025] We have released the evaluation scripts and the EX and PSJS implementations!
  • [Feb 14, 2025] We have released the text-to-cypher baseline code! See the instructions below on how to run gpt-4o-mini on CypherBench.
  • [Feb 13, 2025] The 11 property graphs are now available on 🤗HuggingFace! We also make it super easy to deploy them (see the instructions below).
  • [Dec 27, 2024] We have deployed a demo NBA graph (password: cypherbench) at Neo4j AuraDB! Check it out! You can run Cypher queries like MATCH (n:Player {name: 'LeBron James'})-[r]-(m) RETURN *.
  • [Dec 27, 2024] The training and test sets are now available on 🤗HuggingFace!

🚀 Quickstart

1. Installation

conda create -n cypherbench python=3.11
conda activate cypherbench

git clone https://github.com/megagonlabs/cypherbench.git
cd cypherbench
pip install -e .

2. Download the dataset

To download the dataset (including both the graphs and text-to-cypher tasks), simply clone the HuggingFace dataset repository:

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

# Clone the dataset repo from HuggingFace and save it as the `benchmark` directory
git clone https://huggingface.co/datasets/megagonlabs/cypherbench benchmark

3. Deploy the graphs using Docker

⚠️ Deploying the graphs requires significant memory. We recommend using a machine with at least 64GB of RAM when deploying the 7 test graphs and 128GB when deploying all 11 graphs. Additionally, ensure that Docker is installed (Docker installation instructions) before proceeding.

Now, you can deploy the 7 test graphs with a single Docker Compose command using our custom Neo4j Docker image and our Docker Compose configuration:

cd docker/
bash start_neo4j_test.sh  #  This script first checks if required files exist, then runs the docker-compose command
cd .. 

# Check whether the graphs are fully loaded (loading typically takes at least 10 minutes).
python scripts/print_db_status.py

To stop the Neo4j databases, run bash stop_neo4j_test.sh.

4. Run gpt-4o-mini on CypherBench

Running gpt-4o-mini on the CypherBench test set costs around $0.30. First, make sure you have set the OPENAI_API_KEY environment variable to use the OpenAI API.

python -m cypherbench.baseline.zero_shot_nl2cypher --llm gpt-4o-mini --result_dir output/gpt-4o-mini/

There are two ways to fetch the graph schemas when running text-to-cypher:

  • (default) --load_schema_from json loads the schema from the local JSON files stored in the benchmark/graphs/schemas directory. When using this option, the Neo4j databases are not used during text-to-cypher.
  • --load_schema_from neo4j fetches the schema from the Neo4j database by executing special Cypher queries*. This option requires the Neo4j databases to be fully loaded.

*We don't use apoc.meta.data() by default; see Appendix A.4 in the paper for details.
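For example, here is the same baseline run but with schemas fetched live from the deployed Neo4j databases; this is a minimal sketch reusing the flags documented above (the result directory name is arbitrary):

# Requires the Neo4j databases from step 3 to be fully loaded;
# output/gpt-4o-mini-neo4j/ is an arbitrary result directory
python -m cypherbench.baseline.zero_shot_nl2cypher --llm gpt-4o-mini \
  --load_schema_from neo4j --result_dir output/gpt-4o-mini-neo4j/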

5. Evaluate metrics

python -m cypherbench.evaluate --result_dir output/gpt-4o-mini/  --num_threads 8  # Adjust the number of threads as needed

Metric implementations (EX and PSJS): see the metrics code linked above.

Reference performance for gpt-4o-mini:

{
  "overall": {
    "execution_accuracy": 0.3143,
    "psjs": 0.4591,
    "executable": 0.8739
  },
  "by_graph": {
    "flight_accident": 0.4603,
    "fictional_character": 0.3273,
...

🌐 Wikidata-to-Property-Graph Conversion Engine

We open-source our Wikidata-to-Property-Graph conversion engine in the cypherbench.wd2neo4j package. You can create a domain knowledge graph from Wikidata by just defining the graph schema!

Quick Tutorial

The first step is to define the graph schema in a JSON file. The schema should define the entity and relation types, along with their corresponding Wikidata QIDs and PIDs. We provide a sample mini NBA schema with a single relationship partOfDivision between Team and Division. For complete details on the schema format, see the WDNeo4jSchema data structure.
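To sketch the idea, a schema file might look roughly like the following. This is a hypothetical illustration only: the field names and QID/PID values below are placeholders, and the provided nba_mini.json and the WDNeo4jSchema data structure are the authoritative references.

# Hypothetical sketch; the field names and QID/PID values are illustrative
# placeholders, not the actual WDNeo4jSchema format (see nba_mini.json and
# the WDNeo4jSchema data structure for the real thing)
cat > wd2neo4j_schemas/my_schema.json <<'EOF'
{
  "entities": [
    {"label": "Team", "qid": "Q13393265"},
    {"label": "Division", "qid": "Q9204110"}
  ],
  "relations": [
    {"label": "partOfDivision", "pid": "P361", "subject": "Team", "object": "Division"}
  ]
}
EOF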

Next, you can run the conversion engine by:

python -m cypherbench.wd2neo4j --neo4j_schema wd2neo4j_schemas/nba_mini.json --output_dir output/nba_mini/
# output graph at output/nba_mini/nba_mini-graph.json

The engine will automatically issue SPARQL queries to Wikidata and assemble the retrieved data into a property graph.

If your graphs are too large (e.g. > 100k entities), you might get timeout errors because the official Wikidata SPARQL endpoint has a 60-second time limit per query. In this case, you can deploy your own Wikidata SPARQL endpoint (documentation coming soon!) and pass in its URL via the --sparql_url argument.
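For instance (the endpoint URL below is a hypothetical placeholder for a self-hosted service):

# --sparql_url points the engine at a self-hosted SPARQL endpoint;
# the URL below is a hypothetical placeholder
python -m cypherbench.wd2neo4j --neo4j_schema wd2neo4j_schemas/nba_mini.json \
  --output_dir output/nba_mini/ --sparql_url http://localhost:9999/sparql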

At this point, the property graph is saved in the WikidataKG format which contains Wikidata-dependent fields like wikidata_qid. We recommend converting it into the SimpleKG format, the generic property graph format used by the CypherBench graphs:

python -m cypherbench.wd2neo4j.wd2simplekg --input_path output/nba_mini/nba_mini-graph.json --output_path output/nba_mini/nba_mini-graph_simplekg.json

The property graph can now be deployed using our custom Neo4j Docker image:

docker run -d \
  --name cypherbench-nba-mini \
  -p 15095:7687 \
  -p 7474:7474 \
  -v $(pwd)/output/nba_mini/nba_mini-graph_simplekg.json:/init/graph.json \
  -e NEO4J_AUTH="neo4j/cypherbench" \
  -e NEO4J_PLUGINS='["apoc", "graph-data-science"]' \
  megagonlabs/neo4j-with-loader:2.4
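Once the loader finishes, the graph can be queried over bolt on port 15095 or through the Neo4j Browser at http://localhost:7474 (per the port mappings above). As a quick sanity check, assuming cypher-shell is available inside the container as in standard Neo4j images:

# Run a quick Cypher query against the deployed mini graph (assumes the image
# ships cypher-shell, as standard Neo4j images do)
docker exec cypherbench-nba-mini cypher-shell -u neo4j -p cypherbench \
  'MATCH (t:Team)-[:partOfDivision]-(d:Division) RETURN t, d LIMIT 5'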

🏭 Text-to-Cypher Task Generation Pipeline

We also open-source the text-to-cypher task generation pipeline in the cypherbench.taskgen package. You can generate as many text-to-cypher tasks as you want for your own Neo4j graphs! Simply pass in the Neo4j graph endpoint (host + port) to the task generator. You can also create your own templates.

Reproducing CypherBench

For the CypherBench graphs, the task generation pipeline requires a set of sampled subgraphs of the original full-scale graphs for efficient template instantiation. These sampled graphs are already uploaded to the HuggingFace repo (if you previously cloned the repo, run git pull under benchmark/; otherwise, follow the instructions in the Download the dataset section above) and can be deployed using the following commands:

cd docker/
bash start_neo4j_sampled.sh  
cd .. 

# Check whether the graphs are fully loaded
python scripts/print_db_sampled_status.py

After the graphs have been fully loaded, you can run the task generation pipeline by:

bash scripts/run_benchmark_generation.sh output/taskgen/

The task generation pipeline takes the following files as input:

  • nl2cypher_generator_config.json - The file that defines the question and Cypher templates (MATCH/RETURN patterns). You can create your own templates by following the syntax in the file.
  • neo4j_info.json - The file that specifies the host and port of the Neo4j databases, including both the full-scale graphs and the sampled graphs (see the hypothetical sketch after this list).
  • graph_info.json - The file that specifies human-annotated characteristics (e.g. cardinality, participation, etc.) of the relations, which are used to detect semantically unrealistic questions (see Section 4.4.3 in the paper).
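As a rough sketch of the second file, it might map each graph to its full-scale and sampled endpoints along the lines of the following. The field names and values here are hypothetical placeholders; the neo4j_info.json shipped with the repo is the authoritative reference.

# Hypothetical sketch of neo4j_info.json; field names and values are placeholders
{
  "nba": {
    "full": {"host": "localhost", "port": 7687},
    "sampled": {"host": "localhost", "port": 7688}
  }
}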

Under the hood, the pipeline generates the tasks through the following steps:

  1. generate_benchmark.py - Instantiate the templates on the sampled Neo4j graphs.
  2. filter_long_running.py - Filter out tasks that take more than 30 seconds to execute on the full-scale graphs.
  3. rewrite_question_llm.py - Rewrite the questions into more natural language using LLMs.
  4. verify_question_llm.py - Verify the rewritten questions using LLMs.

📅 Future Release Plan

  • ✅ Text-to-Cypher tasks
  • ✅ 11 property graphs and graph deployment docker
  • ✅ Text-to-Cypher baseline code
  • ✅ EX/PSJS implementation and evaluation scripts
  • ✅ Text-to-Cypher task generation pipeline
  • ✅ Wikidata RDF-to-property-graph engine
  • ⬜ Additional resources

Please open a GitHub issue if you have any questions, find any bugs, or need anything else that is not yet open-sourced!

📚 Citation

@article{feng2024cypherbench,
  title={CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era},
  author={Feng, Yanlin and Papicchio, Simone and Rahman, Sajjadur},
  journal={arXiv preprint arXiv:2412.18702},
  year={2024}
}
