
BITLiutianyang/BugCollection


LLM4SDR

LLM4SDR is a novel approach that leverages Large Language Models (LLMs) to fully automate the construction of open-source software defect repositories. It systematically addresses key challenges in repository construction through three main phases:

Data Preparation: LLM4SDR uses LLMs to generate high-quality commit descriptions by synthesizing information from commit messages, issue reports, pull requests, and related comments. The result is an accurate, informative description for each commit, even when the original commit message is incomplete or ambiguous.
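The exact prompt used by run_llm_message.py is not shown in this README. The following is a minimal Python sketch, assuming a hypothetical `CommitContext` container and a hypothetical `build_description_prompt` helper, of how the four sources could be merged into one prompt before it is sent to an LLM. The prompt wording is illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CommitContext:
    """Hypothetical container for the sources used to describe one commit."""
    sha: str
    message: str
    issue_text: Optional[str] = None
    pr_text: Optional[str] = None
    comments: List[str] = field(default_factory=list)


def build_description_prompt(ctx: CommitContext, max_comments: int = 5) -> str:
    """Merge commit message, issue, PR, and comments into a single LLM prompt."""
    # The commit message is always included; an empty one is marked explicitly.
    sections = [f"Commit {ctx.sha} message:\n{ctx.message.strip() or '(empty)'}"]
    if ctx.issue_text:
        sections.append(f"Linked issue:\n{ctx.issue_text.strip()}")
    if ctx.pr_text:
        sections.append(f"Pull request:\n{ctx.pr_text.strip()}")
    if ctx.comments:
        # Keep only the first few comments to bound the prompt length.
        joined = "\n".join(f"- {c.strip()}" for c in ctx.comments[:max_comments])
        sections.append(f"Discussion comments:\n{joined}")
    # Illustrative instruction text; not the wording used by LLM4SDR.
    instruction = (
        "Using only the information above, write an accurate, self-contained "
        "description of what this commit changes and why."
    )
    return "\n\n".join(sections + [instruction])
```

Missing sources are simply omitted rather than sent as empty sections, which keeps the prompt short for commits that have no linked issue or pull request.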

Defect Patch Identification: To detect defect-related (bug-fixing) patches, LLM4SDR employs a Random Forest (RF) classifier that uses diverse features, including code-diff metrics and analyses produced by LLMs and by the static analysis tool Semgrep. Combining these feature sources improves both precision and recall in patch detection.
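The README does not list the exact features. As a minimal sketch, the following Python assembles one feature row per patch from three assumed sources: simple metrics counted from a unified diff, finding counts from Semgrep's `--json` output (whose `results` entries carry a `check_id`), and a hypothetical binary "LLM says this is a bug fix" vote. The function names and the feature choice are illustrative assumptions, not the ones used in train.py.

```python
import json
from typing import Dict, List


def diff_metrics(unified_diff: str) -> Dict[str, int]:
    """Count files, hunks, and added/removed lines in a unified diff."""
    files = hunks = added = removed = 0
    in_hunk = False
    for line in unified_diff.splitlines():
        if line.startswith("diff --git "):
            files += 1
            in_hunk = False  # file header lines (---/+++) follow until the first @@
        elif line.startswith("@@"):
            hunks += 1
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            removed += 1
    return {"files": files, "hunks": hunks, "added": added, "removed": removed}


def semgrep_features(semgrep_json: str) -> Dict[str, int]:
    """Total findings and distinct rules from `semgrep --json` output."""
    results = json.loads(semgrep_json).get("results", [])
    return {
        "semgrep_findings": len(results),
        "semgrep_rules": len({r.get("check_id") for r in results}),
    }


def feature_vector(diff: str, semgrep_json: str, llm_bugfix_vote: bool) -> List[int]:
    """One numeric row per patch; llm_bugfix_vote is a hypothetical LLM verdict."""
    m = diff_metrics(diff)
    s = semgrep_features(semgrep_json)
    return [m["files"], m["hunks"], m["added"], m["removed"],
            s["semgrep_findings"], s["semgrep_rules"], int(llm_bugfix_vote)]
```

Rows built this way, paired with bug-fix/non-bug-fix labels, could be passed to an off-the-shelf Random Forest such as scikit-learn's `RandomForestClassifier(...).fit(X, y)`. Tracking whether the parser is inside a hunk avoids miscounting an added line such as `++i;`, which appears in the diff as `+++i;` and would otherwise look like a file header.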

Critical Variable Identification: LLM4SDR identifies defect-related variables by combining a patch-based technique with LLM-driven refinement. The patch yields a set of candidate variables; the LLM then filters and augments these candidates to produce a final set of critical variables that directly contribute to defect introduction and repair.
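The patch-based technique is not specified in detail here. A minimal Python sketch under simple assumptions: candidates are identifiers on added or removed diff lines, excluding names immediately followed by `(` (likely function calls) and a small stop-list; the LLM step is modelled as a hypothetical `ask_llm` callable that returns a comma-separated list. A real extractor would more likely use a language-aware parser.

```python
import keyword
import re
from typing import Callable, Set

# Identifier not immediately followed by "(" (i.e. probably not a call).
VAR_RE = re.compile(r"\b([A-Za-z_]\w*)\b(?!\s*\()")
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Illustrative stop-list of keywords, type names, and literals.
STOP = set(keyword.kwlist) | {"int", "char", "void", "NULL", "null", "true", "false"}


def candidate_variables(unified_diff: str) -> Set[str]:
    """Patch-based candidates: identifiers on added/removed lines of a diff."""
    found: Set[str] = set()
    in_hunk = False
    for line in unified_diff.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            # Drop the +/- marker, then keep non-call identifiers.
            found.update(n for n in VAR_RE.findall(line[1:]) if n not in STOP)
    return found


def refine_with_llm(diff: str, candidates: Set[str],
                    ask_llm: Callable[[str], str]) -> Set[str]:
    """Hypothetical LLM refinement: the model may drop or add variable names."""
    prompt = (
        "Patch:\n" + diff
        + "\nCandidate variables: " + ", ".join(sorted(candidates))
        + "\nReturn, comma-separated, the variables that contribute to introducing "
          "or repairing the defect, adding any that are missing."
    )
    reply = ask_llm(prompt)
    # Keep only syntactically valid identifiers from the model's reply.
    return {n.strip() for n in reply.split(",") if IDENT_RE.fullmatch(n.strip())}
```

Passing the LLM in as a callable keeps the sketch independent of any particular model API and makes the refinement step easy to stub in tests.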

Key files

run_llm_message.py: Leverages LLMs to integrate information from multiple sources and generate detailed commit descriptions.

llm_analyzer.py: Analyzes commits using an LLM.

train.py: Trains the classifier model.

keyvar_extractor_llm.py: Uses LLMs to assist in extracting key variables.

About

bug collection from open-source projects
