Commit 329a929

Merge pull request #1 from microsoft/data
Adding commitpacksubset + generalization benchmarks
2 parents 2ba3d91 + 4a670fb commit 329a929

4 files changed: +126894 -4 lines changed

.gitignore

Lines changed: 1 addition & 4 deletions
@@ -1,5 +1,2 @@
-data/nextcoder-synthetic.jsonl
-notebook.ipynb
-git-credential-manager
-models
+*.ipynb
 *.parquet

README.md

Lines changed: 11 additions & 0 deletions
@@ -78,6 +78,17 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 <img src="assets/spider-plot.png" width=400></img>
 
+
+| Model | MMLU | GSM8K | HumanEval+ | MBPP+ |
+|-------|------|-------|------------|-------|
+| Qwen2.5-Coder-7B-Instruct | 53.0 | 83.40 | 85.4 | 72.5 |
+| NextCoder-7B | 54.5 | 81.65 | 84.8 | 72.0 |
+| Qwen2.5-Coder-32B-Instruct | 71.9 | 93.71 | 87.2 | 76.7 |
+| NextCoder-32B | 72.7 | 92.65 | 85.9 | 76.4 |
+
+*Generalization properties kept across different benchmarks among base and nextcoder versions*
+
+
 **A detailed evaluation and ablations can be found in our paper**
 
 ## Contributing
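
The caption on the new README table says generalization is kept between the base and NextCoder models. To make that claim easier to check, here is a minimal sketch that computes the per-benchmark score difference for each base/NextCoder pair. The scores are copied from the table in this diff; the names `SCORES`, `PAIRS`, and `score_deltas` are hypothetical and not part of the repository.

```python
# Scores copied verbatim from the table added to README.md in this commit.
SCORES = {
    "Qwen2.5-Coder-7B-Instruct": {"MMLU": 53.0, "GSM8K": 83.40, "HumanEval+": 85.4, "MBPP+": 72.5},
    "NextCoder-7B": {"MMLU": 54.5, "GSM8K": 81.65, "HumanEval+": 84.8, "MBPP+": 72.0},
    "Qwen2.5-Coder-32B-Instruct": {"MMLU": 71.9, "GSM8K": 93.71, "HumanEval+": 87.2, "MBPP+": 76.7},
    "NextCoder-32B": {"MMLU": 72.7, "GSM8K": 92.65, "HumanEval+": 85.9, "MBPP+": 76.4},
}

# (base model, NextCoder model) pairs compared in the table.
PAIRS = [
    ("Qwen2.5-Coder-7B-Instruct", "NextCoder-7B"),
    ("Qwen2.5-Coder-32B-Instruct", "NextCoder-32B"),
]


def score_deltas(base: str, tuned: str) -> dict:
    """Return tuned-minus-base score per benchmark, rounded to 2 decimals."""
    return {
        bench: round(SCORES[tuned][bench] - SCORES[base][bench], 2)
        for bench in SCORES[base]
    }


if __name__ == "__main__":
    for base, tuned in PAIRS:
        print(f"{tuned} vs {base}: {score_deltas(base, tuned)}")
```

On these numbers, every difference is within 1.75 points of the base model: MMLU is slightly higher for both NextCoder models, and the other three benchmarks are slightly lower.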

data/README.md

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 - `config` contains the yaml file to map prompts to their corresponding location
 - `utils.py` contains the helper code to extract and parse data from LLM responses
 - `data_pipeline.py` contains the main source code for generating synthetic data according to the pipeline explained in our paper.
+- `commitpackft_subset.csv` file contains the `repo` and `commit` fields of the samples used in training, this can be used to map to the original commitpackft for extracting respective samples
 
 # Usage
 - Make sure the proper packages are installed via the `environment.yaml` file provided at root folder
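
The new `commitpackft_subset.csv` entry describes mapping `repo` and `commit` pairs back to the original commitpackft samples. A minimal sketch of that lookup follows, using only the standard library. It assumes the CSV has a header row with columns named `repo` and `commit`, and that each commitpackft record is a dict-like row carrying the same two values. The function names and the `repo_field`/`commit_field` defaults are hypothetical, so adjust them to the field names your copy of commitpackft actually uses.

```python
import csv
from typing import Iterable, Iterator, Mapping, Set, Tuple


def load_subset_keys(csv_path: str) -> Set[Tuple[str, str]]:
    """Read (repo, commit) pairs from the subset CSV.

    Assumes a header row with `repo` and `commit` columns.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {(row["repo"], row["commit"]) for row in csv.DictReader(handle)}


def select_samples(
    records: Iterable[Mapping[str, str]],
    keys: Set[Tuple[str, str]],
    repo_field: str = "repo",      # assumed field name in the source records
    commit_field: str = "commit",  # assumed field name in the source records
) -> Iterator[Mapping[str, str]]:
    """Yield only the records whose (repo, commit) pair appears in `keys`."""
    for record in records:
        if (record[repo_field], record[commit_field]) in keys:
            yield record
```

Here `records` can be any iterable of rows from the original commitpackft, loaded however you normally read it. Using a set of tuples keeps each membership check constant-time, which helps when the source dataset is much larger than the subset.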
