EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery

Source code for the ASE'23 paper EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery.

Folder

The Dstill folder contains dataset.py, which defines the data format used in the distillation step; tiny_bert_config.json, the configuration file for the student model; and bertdistill.py, the distillation script.
The LinkGenerator folder contains the parser_lang folder for parsing abstract syntax trees, along with the preprocessing scripts for the raw data.
The data folder stores the processed datasets (available from the download links in the Datasets section).
The models folder contains the training and testing scripts.

Environment

python 3.9.7
pytorch 1.11.0
pandas 1.3.4
numpy 1.21.6
transformers 4.21.0
cudatoolkit 11.3.1
torchaudio 0.11.0
torchvision 0.12.0
GPU with CUDA 11.3
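
As a quick sanity check before running anything, the pinned versions can be compared with what is actually installed. The following is a minimal sketch, not part of the repository: the helper name `compare_versions` is hypothetical, and it assumes the `pytorch` entry corresponds to the `torch` distribution name.

```python
from importlib import metadata

# Pinned versions copied from the Environment list.
# Assumption: "pytorch" is installed under the distribution name "torch".
PINNED = {
    "torch": "1.11.0",
    "pandas": "1.3.4",
    "numpy": "1.21.6",
    "transformers": "4.21.0",
}


def compare_versions(pinned, lookup=metadata.version):
    """Return {package: (expected, found)} for each mismatch or missing package."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        # Ignore local build suffixes such as "+cu113" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `compare_versions(PINNED)` returns an empty dictionary when every listed package matches; otherwise each entry shows the expected and the installed version (`None` if the package is missing).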

Datasets

We have constructed six large-scale project datasets for evaluating issue-commit link recovery. You can download the final dataset described in the paper from Google Drive or Aliyun Drive. To generate the dataset used for EALink in our experiments yourself, follow the data preprocessing steps.

How to run

1. Data preprocessing

You can follow the steps in the LinkGenerator folder to generate the dataset used for EALink, or directly download the processed dataset from Google Drive or Aliyun Drive.

Get issue-code links for auxiliary task

In the LinkGenerator folder, 0_subdata.py generates the issue-code links. Run the following command:

python 0_subdata.py

Get issue-commit links after word segmentation processing

python 1_splitword.py

Merge

Merge the processed datasets:

python 2_sub_merge.py
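
The three preprocessing scripts are meant to run in order, so they can be chained from a small driver. This is a sketch only: `build_commands` and `run_pipeline` are hypothetical helpers, and running every script with LinkGenerator as the working directory is an assumption about how the scripts resolve their input paths.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the preprocessing steps, in the order they are listed.
STEPS = ["0_subdata.py", "1_splitword.py", "2_sub_merge.py"]


def build_commands(steps=STEPS, python=sys.executable):
    """Build one argument list per preprocessing script."""
    return [[python, step] for step in steps]


def run_pipeline(workdir="LinkGenerator"):
    """Run each step in order; stop at the first one that fails."""
    # Assumption: the scripts expect LinkGenerator as the working directory.
    for cmd in build_commands():
        print("running:", " ".join(cmd))
        subprocess.run(cmd, cwd=Path(workdir), check=True)
```

Calling `run_pipeline()` from the repository root runs the steps one after another; `check=True` raises `subprocess.CalledProcessError` as soon as a script exits with a non-zero status, so a later step never runs on incomplete output.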

2. Distill the pre-trained model

cd Dstill
python bertdistill.py

3. Train and test

In the models folder, train.py trains the model and test.py evaluates the trained model.

Train

cd models
python train.py \
   --tra_batch_size 16 \
   --val_batch_size 16 \
   --end_epoch 400 \
   --output_model <model_save_path>

Test

python test.py \
   --tes_batch_size 16 \
   --model_path <model_path>
