Java implementation of Flexeme, based on the original Flexeme implementation.
See ORIGINAL_INSTRUCTIONS.md for the documentation of the original Flexeme repository.
- Requires Python 3.8.
- Requires Java 8 on the path.
- Requires Java 11, with its location set in the JAVA11_HOME environment variable.
- Install Graphviz (https://graphviz.org/).
rm -rf .venv && python3 -m venv .venv
source .venv/bin/activate
pip install -e .
If the dependency pygraphviz fails to install, visit https://pygraphviz.github.io/documentation/stable/install.html and follow the instructions for your OS.
- Run
cp .env-template .env
then fill in the environment variables in .env:
  - JAVA11_HOME: Location of the Java 11 executable used to run the PDG extractor (e.g., $HOME/.sdkman/candidates/java/11.0.18-amzn/bin/java).
Run Flexeme on the synthetic benchmark.
input: path to repository
output: untangling accuracy for repository
Steps:
- Create lists of commit ids (e.g., [a, b, c, d]). A list of commit ids represents multiple synthetic commits of varying size (measured in 'concerns'). For example:
  - a to b represents a synthetic commit with 1 concern
  - a to c represents a synthetic commit with 2 concerns
  - a to d represents a synthetic commit with 3 concerns
- Generate ∂PDGs for each synthetic commit:
  - Each file changed in the synthetic commit gets a ∂PDG.
  - Merge the file-based ∂PDGs into a single ∂PDG that represents the synthetic commit.
- Normalization of labels in ∂PDGs.
- Evaluation (runs the untangling on the ∂PDGs).
- Report untangling accuracy.
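The first step can be illustrated with a minimal sketch of how one list of commit ids expands into synthetic commits. The names `SyntheticCommit` and `synthetic_commits` are hypothetical and are not taken from the Flexeme code base; the sketch only mirrors the a/b/c/d example above.

```python
from typing import List, NamedTuple


class SyntheticCommit(NamedTuple):
    """Hypothetical record: a commit range and its number of concerns."""
    start: str
    end: str
    concerns: int


def synthetic_commits(commit_ids: List[str]) -> List[SyntheticCommit]:
    """Expand [a, b, c, d] into the ranges a..b, a..c, a..d.

    Every range starts at the first commit id; the range ending at the
    n-th following commit id carries n concerns.
    """
    if len(commit_ids) < 2:
        # A range needs a start and at least one end.
        return []
    first = commit_ids[0]
    return [
        SyntheticCommit(first, end, concerns)
        for concerns, end in enumerate(commit_ids[1:], start=1)
    ]


for commit in synthetic_commits(["a", "b", "c", "d"]):
    print(f"{commit.start} to {commit.end}: {commit.concerns} concern(s)")
```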
- Check out the Defects4J repository:
git clone $D4J_HOME/project_repos/commons-lang.git /private/tmp/commons-lang
- Create the synthetic commits:
python3 flexeme/tangle_concerns/tangle_by_file.py /private/tmp/commons-lang /private/tmp/ .
- Generate ∂PDGs and evaluate:
python3 flexeme/tangle_concerns/generate_corpus.py ./commons-lan_history_filtered_flat.json /private/tmp/commons-lang /private/tmp/commons-lang-work/
- Results are saved in out/commons-lang/.
The file defects4j/layout_changes.json contains the changes in repository layout (sourcepath) for Defects4J projects. The file is necessary for running the synthetic benchmark. The changes are ordered from newest to oldest.
When untangling a commit, the scripts find the correct layout by checking whether the newest layout change commit is an ancestor of the commit being untangled. If it is not, they check the next older layout change commit, and so on, until they find an ancestor. If no ancestor is found, a warning is logged and the layout returned is None.
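The lookup described above can be sketched as follows. This is an illustration, not the code used by the scripts: the function names, the representation of layout changes as (commit, sourcepath) pairs, and the example paths are all assumptions. The ancestor check is passed in as a function so the lookup can be tried without a repository; `git_is_ancestor` shows one way to implement it with `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second.

```python
import logging
import subprocess
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger(__name__)


def git_is_ancestor(repo: str, ancestor: str, commit: str) -> bool:
    """Return True if `ancestor` is an ancestor of `commit` in `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", ancestor, commit],
        check=False,
    )
    return result.returncode == 0


def find_layout(
    commit: str,
    layout_changes: List[Tuple[str, str]],  # assumed shape, newest first
    is_ancestor: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the sourcepath of the newest layout change that precedes `commit`."""
    for change_commit, sourcepath in layout_changes:
        if is_ancestor(change_commit, commit):
            return sourcepath
    logger.warning("No layout change is an ancestor of commit %s", commit)
    return None


# Toy linear history c1 <- c2 <- c3 <- c4 standing in for a real repository.
history = ["c1", "c2", "c3", "c4"]


def toy_is_ancestor(ancestor: str, commit: str) -> bool:
    return history.index(ancestor) <= history.index(commit)


# Hypothetical layout changes, ordered from newest to oldest.
changes = [("c3", "src/main/java"), ("c1", "src/java")]
print(find_layout("c4", changes, toy_is_ancestor))
print(find_layout("c2", changes, toy_is_ancestor))
```

Against a real repository, the check could be supplied as `lambda a, c: git_is_ancestor(repo_path, a, c)`, where `repo_path` is the path to the local clone.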
The layout changes are added manually from the project-specific dir_layout.csv file stored in the Defects4J repository. The entries in dir_layout.csv are ordered either from new to old or from old to new. Before adding a new project to defects4j/layout_changes.json, verify which order is used in dir_layout.csv.
Run Flexeme to untangle a commit in a local repository.
- Run:
flexeme <repository> <commit> <sourcepath> <classpath> <output_file>
  - repository: Path to the repository.
  - commit: Commit to untangle.
  - sourcepath: Java sourcepath to compile the files of commit.
  - classpath: Java classpath to compile the files of commit.
  - output_file: Where the results are stored.
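As an illustration, the command line can be assembled from its five positional arguments as in the sketch below. The helper `build_flexeme_command` and all example values are hypothetical. The sketch assumes that sourcepath and classpath may each hold several entries joined with the platform path separator, as the Java tools expect; whether flexeme accepts multi-entry values is not confirmed here.

```python
import os
import shlex
from typing import List, Sequence


def build_flexeme_command(
    repository: str,
    commit: str,
    sourcepath: Sequence[str],
    classpath: Sequence[str],
    output_file: str,
) -> List[str]:
    """Build the argument list for the flexeme command documented above."""
    return [
        "flexeme",
        repository,
        commit,
        os.pathsep.join(sourcepath),  # ':' on Unix, ';' on Windows
        os.pathsep.join(classpath),
        output_file,
    ]


# Hypothetical values, for illustration only.
command = build_flexeme_command(
    repository="/tmp/my-project",
    commit="HEAD",
    sourcepath=["src/main/java"],
    classpath=["lib/dep-a.jar", "lib/dep-b.jar"],
    output_file="untangling.csv",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once flexeme is installed in the active virtual environment.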