Official Repo for the paper: Gaps or Hallucinations? Scrutinizing Machine-Generated Legal Analysis for Fine-grained Text Evaluations
This paper and CLERC will be presented at NLLP 2024 (EMNLP) in Miami, Florida!
- Generate prompts for the GPT-4o-based detector:

  `python prompt_loader.py [train, test, small, large]`

  - `train` and `test` each generate 10 examples for each of the four models.
  - `small` generates 100 examples per model.
  - `large` runs over the entire dataset.
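The mode-to-sample-size logic above can be sketched as follows. This is a minimal illustration of the described behavior only: the `MODES` table and the helper `examples_per_model` are assumptions for exposition, not the repository's actual code.

```python
# Illustrative sketch of the per-mode sample sizes described above.
# `None` means the entire dataset (the `large` mode).
MODES = {
    "train": 10,   # 10 examples per model
    "test": 10,    # 10 examples per model
    "small": 100,  # 100 examples per model
    "large": None, # entire dataset
}

def examples_per_model(mode: str):
    """Return the per-model sample size for a mode (None = all)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return MODES[mode]
```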
- Run the GPT-4o detector via the OpenAI API:

  `python gpt_detect.py [train, test, small, large]`

  This runs the detector over the prompt instances and outputs a detection dataset (in HuggingFace format). Note that you need to supply your own `OPENAI_API_KEY`, e.g. in your `.bashrc`:

  `export OPENAI_API_KEY="YOUR KEY"`
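Since the detector fails mid-run without a key, it can help to check the environment up front. The following guard is a sketch, not part of the repo; `require_api_key` is a hypothetical helper name.

```python
import os

def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail fast with a clear error."""
    key = os.environ.get(var, "")
    if not key:
        raise RuntimeError(f"{var} is not set; export it in your shell first.")
    return key
```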
  If you are replicating the train/test results from the original paper, run:

  `python gpt_detect.py train`

  `python gpt_detect.py test`
- Parse human annotations and add the labels to the detection dataset:

  `python parse_human_annotations.py --detect_dataset DATASET_PATH`

  This outputs a DETECTION_HUMAN dataset with the parsed human labels attached.
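Conceptually, this step joins human labels onto the detection records by instance. A minimal sketch of that join, assuming a record schema with an `"id"` field and a mapping from id to label (both shapes are illustrative assumptions, not the dataset's actual schema):

```python
def attach_human_labels(records, human_labels):
    """Attach a human label to each detection record, matched by instance id.

    `records`: list of dicts, each with an "id" key (hypothetical schema).
    `human_labels`: dict mapping id -> human label; missing ids get None.
    """
    out = []
    for rec in records:
        labeled = dict(rec)  # copy so the input records stay untouched
        labeled["human_label"] = human_labels.get(rec["id"])
        out.append(labeled)
    return out
```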
- Evaluate detection accuracy:

  `python evaluate_detection.py DETECTION_HUMAN_DATASET_PATH`

  where DETECTION_HUMAN_DATASET_PATH is the DETECTION_HUMAN dataset produced by the previous step.
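At its core, detection accuracy is agreement between the detector's labels and the human labels. A minimal sketch of that computation (the paired-list interface is an assumption; the script's actual inputs and any additional metrics may differ):

```python
def detection_accuracy(preds, golds):
    """Fraction of instances where the detector label matches the human label."""
    if len(preds) != len(golds):
        raise ValueError("prediction and label lists must have the same length")
    if not golds:
        return 0.0
    correct = sum(p == g for p, g in zip(preds, golds))
    return correct / len(golds)
```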
  If you want to evaluate new generations, run:

  `python score_analysis.py DATASET_DIR`

  where DATASET_DIR contains the HuggingFace datasets for which to compute GapScore and GapHalu.
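As a rough intuition for the scoring step, if GapScore and GapHalu are read as the fraction of claims flagged as gaps and as hallucinations respectively, they can be sketched as below. This interpretation and the label strings are assumptions for illustration; consult the paper for the metrics' actual definitions.

```python
def gap_metrics(labels):
    """Fraction of per-claim labels that are "gap" vs "hallucination".

    `labels` is a flat list of per-claim label strings (hypothetical format).
    """
    n = len(labels)
    if n == 0:
        return {"GapScore": 0.0, "GapHalu": 0.0}
    gaps = sum(label == "gap" for label in labels)
    halus = sum(label == "hallucination" for label in labels)
    return {"GapScore": gaps / n, "GapHalu": halus / n}
```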