
Artifact: More Than a Score: Probing the Impact of Prompt Specificity on LLM Code Generation

This release contains all artifacts related to the paper, organized into three main categories:

Structure

1. Dataset Generation (1-dataset-generation/)

Code and data for creating prompt variations used in the paper.

  • code/: Code for generating dataset variations (llm_summary, paragraph_sampling, sentence_block_masking); a sketch of the masking idea follows this list
  • datasets/: Generated source datasets (the actual datasets used in experiments)
  • config/: Configuration files (YAML) for dataset generation
  • scripts/: Scripts to run dataset generation
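
The exact implementations live in code/; as a rough, illustrative sketch of the sentence_block_masking idea only (the function name, block size, and mask ratio below are assumptions, not the repository's actual API):

    import random

    def mask_sentence_blocks(prompt: str, block_size: int = 2,
                             mask_ratio: float = 0.3,
                             mask_token: str = "<masked>",
                             seed: int = 0) -> str:
        """Replace randomly chosen blocks of consecutive sentences
        with a mask token, reducing prompt specificity."""
        rng = random.Random(seed)
        # Naive sentence split; the real code presumably uses a proper splitter.
        sentences = [s.strip() for s in prompt.split(".") if s.strip()]
        blocks = [sentences[i:i + block_size]
                  for i in range(0, len(sentences), block_size)]
        kept = []
        for block in blocks:
            if rng.random() < mask_ratio:
                kept.append(mask_token)  # drop the whole block of sentences
            else:
                kept.extend(block)
        return ". ".join(kept) + "."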

2. Evaluation (2-evaluation/)

Code for generating LLM outputs from datasets and evaluating them.

  • code-generation/: Scripts for generating LLM code completions
  • evaluation-code/: Scripts for evaluating generated code (Python evaluation, pass@k); the pass@k estimator is sketched after this list
  • scripts/: Orchestration scripts for generate-and-evaluate pipelines
  • sample-outputs/: Sample generated outputs (the full set is too large to release)
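
pass@k is conventionally computed with the unbiased estimator of Chen et al. (2021); whether evaluation-code/ uses exactly this form is an assumption, but a minimal Python version is:

    import math

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k: probability that at least one of k samples
        drawn from n generations (c of which pass all tests) is correct."""
        if n - c < k:
            return 1.0  # every size-k draw contains a passing sample
        return 1.0 - math.comb(n - c, k) / math.comb(n, k)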

Note: the ParEval framework should be cloned as a git submodule (see Setup instructions below).

3. Results Presentation (3-results-presentation/)

Analysis notebooks, final results, and visualizations.

  • paper.pdf: The paper
  • notebooks/: Jupyter notebooks for analysis and plotting
  • results/: Final results (CSV files, PDF plots, analysis outputs); a loading example follows this list
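
The notebooks aggregate the CSVs under results/; a minimal example of that kind of analysis (the file name and column names here are hypothetical, not the repository's actual schema):

    import pandas as pd

    # Hypothetical path and columns -- inspect results/ for the real layout.
    df = pd.read_csv("3-results-presentation/results/pass_at_k.csv")
    print(df.groupby(["model", "prompt_variant"])["pass@1"].mean())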

Shared Resources (shared/)

  • requirements.txt: Python dependencies
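
Install them with pip install -r shared/requirements.txt.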

Citation

If you use this code or data, please cite:

@misc{zi2025scoreprobingimpactprompt,
      title={More Than a Score: Probing the Impact of Prompt Specificity on LLM Code Generation}, 
      author={Yangtian Zi and Harshitha Menon and Arjun Guha},
      year={2025},
      eprint={2508.03678},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.03678}, 
}
