
The dataset of the paper titled "Context-Aware Code Change Embedding for Better Patch Correctness Assessment".

This is the online repository of the paper "Context-Aware Code Change Embedding for Better Patch Correctness Assessment". We release the source code of Cache, the patches used in our evaluation, as well as the experiment results.

Patches: two patch benchmarks included in our study.
- Small: The 1,183 deduplicated patches from Tian's ASE20 paper and Wang's ASE20 paper.
- Large: The patches we collected ourselves, consisting of 49,694 patches from RepairThemAll (with ground-truth labels by Tian et al.) and ManySStuBs.
Results
- RQ1: The detailed result files for RQ1, named in the format [model]_[classifier].csv. For example, the file BERT_DT.csv in the folder Tian's_dataset holds the results for patches from Tian's study embedded by BERT and classified by a Decision Tree.
  - Tian's_dataset: The detailed result files on Tian's dataset.
  - Cache_dataset: The detailed result files on our own dataset.
  - Cross_dataset: The detailed result files of representation learning techniques when training on our own dataset and testing on Tian's dataset.
- RQ2: The detailed result files in RQ2.
  - Wang_Cache.csv: The detailed result of Cache on the dataset from Wang's ASE20 paper.
  - ODS_Cache.csv: The detailed result of Cache on the dataset from Xiong's ICSE18 paper. We compare directly against the results reported by the authors of ODS on 139 patches from Xiong's paper, since the data and source code of ODS are unavailable.
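The [model]_[classifier].csv naming scheme described for the RQ1 files can be decoded programmatically. The following is a minimal sketch, not part of the released code: the helper name `parse_result_name` is hypothetical, and it assumes the classifier abbreviation never contains an underscore (so the split happens at the last underscore).

```python
from pathlib import Path


def parse_result_name(path):
    """Split a result file name of the form [model]_[classifier].csv.

    Hypothetical helper: assumes the classifier code (e.g. "DT") contains
    no underscore, so the name is split at the LAST underscore.
    """
    stem = Path(path).stem  # "BERT_DT.csv" -> "BERT_DT"
    model, sep, classifier = stem.rpartition("_")
    if not sep or not model or not classifier:
        raise ValueError(f"not in [model]_[classifier].csv format: {path}")
    return model, classifier


if __name__ == "__main__":
    # Example from the README: patches embedded by BERT, classified by Decision Tree.
    print(parse_result_name("Tian's_dataset/BERT_DT.csv"))
```

Splitting at the last underscore keeps the sketch usable if a model name itself ever contains an underscore.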
Source: The source code and lib for running Cache.

Prerequisites

Java 1.7
Python 3.6
Defects4j 1.2
Bugs.jar
Bears
QuixBugs
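Before starting, it can help to confirm that the command-line tools are reachable. The sketch below is an assumption-laden convenience, not part of the repository: the tool names in `REQUIRED_TOOLS` are guesses at the executables these prerequisites typically install, and the check does not verify versions (Java 1.7, Python 3.6, Defects4J 1.2).

```python
import shutil

# Assumed executable names; adjust to match the local installation.
REQUIRED_TOOLS = ("java", "python3", "git", "defects4j")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All assumed tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in tests.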

## Preprocessing

Extract the buggy file and fixed file from patch

git clone https://github.com/bugs-dot-jar/bugs-dot-jar  # Bugs.jar benchmark
git clone https://github.com/bears-bugs/bears-benchmark  # Bears benchmark
git clone https://github.com/jkoppel/QuixBugs # QuixBugs benchmark
# Follow the instructions at https://github.com/rjust/defects4j to install Defects4J 1.2

python3 genOverfittingPatches.py

Generate the AST paths

We reuse the AST path extractor (ASTMiner) implemented by JetBrains Research. To run ASTMiner, execute the following command:

java -jar ./lib/astminer_revised.jar pathContexts --lang java --project path/to/project --output path/to/results --maxL L --maxW W --maxContexts C --maxTokens T --maxPaths P

For example:

java -Xms64g -Xmx128g -jar ./lib/astminer_revised.jar pathContexts --lang java --project ./materials --output ./dataset --maxH 9 --maxW 2 --maxContexts 200 --maxTokens 500 --maxPaths 500

Note that the amount of memory the preprocessor needs depends on the number of files and on the parameters. It usually takes up more than 60 GB of memory; we preprocessed our dataset on a server with 128 GB of memory.
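When the extraction is driven from a script, the command can be assembled as an argument list. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the flag names and default values are copied from the example command (which uses `--maxH`, whereas the usage line writes `--maxL`), so they should be checked against the bundled jar.

```python
import subprocess


def build_astminer_command(project, output, max_h=9, max_w=2,
                           max_contexts=200, max_tokens=500, max_paths=500,
                           jar="./lib/astminer_revised.jar",
                           min_heap="64g", max_heap="128g"):
    """Build the ASTMiner invocation as a list of arguments.

    Defaults mirror the README example; flag names are taken from that
    example and are not verified against the jar itself.
    """
    return [
        "java", f"-Xms{min_heap}", f"-Xmx{max_heap}",
        "-jar", jar, "pathContexts",
        "--lang", "java",
        "--project", str(project),
        "--output", str(output),
        "--maxH", str(max_h),
        "--maxW", str(max_w),
        "--maxContexts", str(max_contexts),
        "--maxTokens", str(max_tokens),
        "--maxPaths", str(max_paths),
    ]


def run_astminer(project, output, **options):
    """Run ASTMiner and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_astminer_command(project, output, **options), check=True)
```

Calling `run_astminer("./materials", "./dataset")` would launch the same command as the example; lowering `min_heap` and `max_heap` is only sensible for much smaller inputs.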

Generate the sub-token level vocabulary

python3 genSubtokenVocab.py
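To illustrate what a sub-token level vocabulary is, the sketch below splits identifiers on camelCase, snake_case, and digit boundaries and assigns an index to each sub-token. It is an illustration only and makes no claim about how genSubtokenVocab.py works: the function names, the `<PAD>` and `<UNK>` entries, and the `min_count` cutoff are all assumptions.

```python
import re
from collections import Counter

# Matches, in order: an acronym followed by a capitalized word ("HTTP" in
# "HTTPResponse"), a (capitalized) lowercase word, a trailing acronym, digits.
_SUBTOKEN_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_subtokens(identifier):
    """Split an identifier into lowercase sub-tokens."""
    return [part.lower() for part in _SUBTOKEN_RE.findall(identifier)]


def build_vocab(identifiers, min_count=1):
    """Map each sub-token seen at least `min_count` times to an integer index.

    Indices 0 and 1 are reserved for hypothetical padding/unknown entries.
    """
    counts = Counter(tok for ident in identifiers for tok in split_subtokens(ident))
    vocab = {"<PAD>": 0, "<UNK>": 1}
    # Most frequent first; ties broken alphabetically for a stable ordering.
    for token, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


if __name__ == "__main__":
    print(split_subtokens("parseHTTPResponse"))
    print(build_vocab(["getFileName", "fileName", "max_value2"]))
```

Working at the sub-token level keeps the vocabulary small, because rare identifiers such as `getFileName` decompose into common pieces.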

Training

python3 main.py
