AutoCodeRover is a fully automated approach for resolving GitHub issues (bug fixing and feature addition) in which LLMs are combined with analysis and debugging capabilities to prioritize patch locations, ultimately leading to a patch.
AutoCodeRover resolves ~16% of the issues in SWE-bench (2294 GitHub issues in total) and ~22% of the issues in SWE-bench lite (300 GitHub issues in total), improving over the current state-of-the-art efficacy of AI software engineers.
AutoCodeRover works in two stages:
- 🔎 Context retrieval: The LLM is provided with code search APIs to navigate the codebase and collect relevant context.
- 💊 Patch generation: The LLM tries to write a patch based on the retrieved context.
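The two stages can be pictured with the toy sketch below. It is illustrative only: the LLM is replaced by plain Python callables, the "codebase" is a dictionary, and the names lookup_code, retrieve_context and generate_patch are hypothetical rather than AutoCodeRover's actual API.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy "codebase": maps a qualified name (e.g. "Foo.bar") to its source.
Codebase = Dict[str, str]

def lookup_code(codebase: Codebase, name: str) -> Optional[str]:
    """Toy search API: return the source of the named class or method, if any."""
    return codebase.get(name)

def retrieve_context(codebase: Codebase,
                     choose_queries: Callable[[str, List[str]], List[str]],
                     issue: str, max_rounds: int = 3) -> List[str]:
    """Stage 1: the (stubbed) LLM picks search queries until it asks for none."""
    context: List[str] = []
    for _ in range(max_rounds):
        queries = choose_queries(issue, context)
        if not queries:
            break  # the model is satisfied with the context it has
        for query in queries:
            hit = lookup_code(codebase, query)
            if hit is not None and hit not in context:
                context.append(hit)
    return context

def generate_patch(write_patch: Callable[[str, List[str]], str],
                   issue: str, context: List[str]) -> str:
    """Stage 2: the (stubbed) LLM writes a patch from the issue plus retrieved context."""
    return write_patch(issue, context)

# Usage with stub "LLM" callables standing in for real model calls:
codebase = {"Foo.bar": "def bar(self):\n    return 1 / 0\n"}
issue = "ZeroDivisionError in Foo.bar"
ctx = retrieve_context(codebase, lambda i, c: [] if c else ["Foo.bar"], issue)
patch = generate_patch(lambda i, c: f"patch using {len(c)} snippet(s)", issue, ctx)
```

The point of the split is that patch generation only sees what the retrieval stage gathered, so the quality of the search APIs directly bounds the quality of the patch.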
AutoCodeRover has two unique features:
- Code search APIs are Program Structure Aware. Instead of searching over files by plain string matching, AutoCodeRover searches for relevant code context (methods/classes) in the abstract syntax tree.
- When a test suite is available, AutoCodeRover can take advantage of test cases to achieve an even higher repair rate by performing statistical fault localization.
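To illustrate what "program structure aware" search means, the sketch below uses Python's standard ast module to look up a class, or a method inside a specific class. Plain string matching on "def parse" would hit both the method and the module-level function of the same name; walking the syntax tree does not. The functions find_class and find_method_in_class are hypothetical and are not AutoCodeRover's actual search APIs.

```python
import ast
from typing import Optional

def find_class(source: str, class_name: str) -> Optional[str]:
    """Return the source of the first class named class_name, or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return ast.get_source_segment(source, node)
    return None

def find_method_in_class(source: str, class_name: str, method_name: str) -> Optional[str]:
    """Return the source of method_name defined directly in class_name, or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:  # only direct members of the class
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)) and item.name == method_name:
                    return ast.get_source_segment(source, item)
    return None

SOURCE = '''
class Parser:
    def parse(self, text):
        return text.split()

def parse(text):
    # a module-level function with the same name
    return text
'''

# Finds only the method inside Parser, not the module-level parse()
print(find_method_in_class(SOURCE, "Parser", "parse"))
```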
AutoCodeRover: Autonomous Program Improvement [arXiv 2404.05427]
If you refer to our work, please cite:
@misc{zhang2024autocoderover,
title={AutoCodeRover: Autonomous Program Improvement},
author={Yuntong Zhang and Haifeng Ruan and Zhiyu Fan and Abhik Roychoudhury},
year={2024},
eprint={2404.05427},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
As an example, AutoCodeRover successfully fixed issue #32347 of Django. See the demo video for the full process:
acr-final.mp4
AutoCodeRover can resolve even more issues if test cases are available. See an example in the video:
acr_enhancement-final.mp4
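The test-based enhancement relies on statistical fault localization (SBFL), which ranks program elements by how strongly their execution correlates with failing tests. This README does not say which SBFL formula AutoCodeRover uses, so the sketch below shows Ochiai, one widely used formula, purely as an illustration: score(e) = ef / sqrt(total_failed * (ef + ep)), where ef and ep count the failing and passing tests that execute e. The function name ochiai_scores and the coverage format are assumptions.

```python
import math
from typing import Dict, List, Set, Tuple

def ochiai_scores(coverage: Dict[str, Set[str]], failing: Set[str]) -> List[Tuple[str, float]]:
    """Rank program elements by Ochiai suspiciousness.

    coverage maps each test name to the set of elements (e.g. methods) it executes;
    failing is the set of test names that fail.
    """
    total_failed = len(failing)
    elements: Set[str] = set().union(*coverage.values())
    scores: Dict[str, float] = {}
    for element in elements:
        # ef / ep: failing / passing tests that execute this element
        ef = sum(1 for test, cov in coverage.items() if element in cov and test in failing)
        ep = sum(1 for test, cov in coverage.items() if element in cov and test not in failing)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[element] = ef / denom if denom else 0.0
    # Most suspicious first; ties broken by name for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

coverage = {"t1": {"A.f", "B.g"}, "t2": {"A.f"}, "t3": {"B.g", "C.h"}}
print(ochiai_scores(coverage, failing={"t1", "t2"}))
# -> [('A.f', 1.0), ('B.g', 0.5), ('C.h', 0.0)]
```

A.f is executed by every failing test and no passing test, so it ranks first; such a ranking can then steer the LLM toward likely patch locations.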
We recommend running AutoCodeRover in a Docker container. First, build the Docker image and start a container from it:
docker build -f Dockerfile -t acr .
docker run -it acr
In the Docker container, set the OPENAI_KEY environment variable to your OpenAI key:
export OPENAI_KEY=xx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
In the Docker container, first set up the tasks to run in SWE-bench (e.g., django__django-11133). The list of all tasks can be found in conf/swe_lite_tasks.txt.
The tasks need to be put in a file, one per line:
cd /opt/SWE-bench
echo django__django-11133 > tasks.txt
Then, set up these tasks by running:
cd /opt/SWE-bench
conda activate swe-bench
python harness/run_setup.py --log_dir logs --testbed testbed --result_dir setup_result --subset_file tasks.txt
Once the setup for this task is completed, the following two lines will be printed:
setup_map is saved to setup_result/setup_map.json
tasks_map is saved to setup_result/tasks_map.json
The testbed directory will now contain the cloned source code of the target project. A conda environment will also be created for this task instance.
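Before starting a run, it can help to confirm that every task you plan to run appears in both generated maps. The sketch below assumes that setup_map.json and tasks_map.json are JSON objects whose top-level keys are task IDs; that structure is an assumption, so open the files to confirm it. The helper names load_map and missing_tasks are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List

def load_map(path: Path) -> Dict[str, object]:
    """Load setup_map.json or tasks_map.json (assumed: a JSON object keyed by task ID)."""
    with path.open() as f:
        return json.load(f)

def missing_tasks(task_ids: Iterable[str], *maps: Dict[str, object]) -> List[str]:
    """Return the task IDs that are absent from at least one of the given maps."""
    return [t for t in task_ids if any(t not in m for m in maps)]

# Example (paths follow the setup command above, run from /opt/SWE-bench):
# setup_dir = Path("/opt/SWE-bench/setup_result")
# print(missing_tasks(["django__django-11133"],
#                     load_map(setup_dir / "setup_map.json"),
#                     load_map(setup_dir / "tasks_map.json")))
```

An empty list means every requested task is present in both maps.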
To set up multiple tasks together, put their IDs in tasks.txt, one per line, and follow the same steps.
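For longer task lists, a small script can write tasks.txt and catch typos in task IDs early by checking them against the full task list. This is a sketch: parse_task_ids and format_task_file are hypothetical helpers, and the location of swe_lite_tasks.txt used in the commented example is an assumption to adjust for your setup.

```python
from pathlib import Path  # used by the commented example below
from typing import Iterable, List, Optional, Set

def parse_task_ids(text: str) -> List[str]:
    """Split a tasks file (one ID per line) into IDs, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def format_task_file(task_ids: Iterable[str], known: Optional[Set[str]] = None) -> str:
    """Render IDs one per line, dropping duplicates; optionally reject unknown IDs."""
    ids = list(dict.fromkeys(t.strip() for t in task_ids if t.strip()))  # dedupe, keep order
    if known is not None:
        unknown = [t for t in ids if t not in known]
        if unknown:
            raise ValueError(f"unknown task ids: {unknown}")
    return "".join(f"{t}\n" for t in ids)

# Example (path of swe_lite_tasks.txt is assumed):
# known = set(parse_task_ids(Path("/opt/auto-code-rover/conf/swe_lite_tasks.txt").read_text()))
# Path("/opt/SWE-bench/tasks.txt").write_text(format_task_file(["django__django-11133"], known))
```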
Before running the task (django__django-11133 here), make sure it has been set up as described above.
cd /opt/auto-code-rover
conda activate auto-code-rover
PYTHONPATH=. python app/main.py --enable-layered --model gpt-4-0125-preview --setup-map ../SWE-bench/setup_result/setup_map.json --tasks-map ../SWE-bench/setup_result/tasks_map.json --output-dir output --task django__django-11133
The output of the run can then be found in output/. For example, the patch generated for django__django-11133 can be found at a location like output/applicable_patch/django__django-11133_yyyy-MM-dd_HH-mm-ss/extracted_patch_1.diff (the date-time field in the directory name depends on when the experiment was run).
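Because the directory name includes a timestamp, a task that has been run several times leaves several patch directories. The sketch below picks the most recent one; it assumes the naming pattern shown above with zero-padded date-time fields, so that sorting the names lexicographically also sorts them chronologically. The helper latest_patch is hypothetical.

```python
from pathlib import Path
from typing import Optional

def latest_patch(output_dir: Path, task_id: str) -> Optional[Path]:
    """Return the newest extracted_patch_1.diff for task_id, or None if there is none.

    Assumes run directories are named <task_id>_yyyy-MM-dd_HH-mm-ss with zero-padded
    fields, so lexicographic order of the names matches chronological order.
    """
    pattern = f"{task_id}_*/extracted_patch_1.diff"
    candidates = sorted((output_dir / "applicable_patch").glob(pattern))
    return candidates[-1] if candidates else None

# Example, run from /opt/auto-code-rover after the command above:
# print(latest_patch(Path("output"), "django__django-11133"))
```

The underscore after task_id in the pattern keeps an ID such as django__django-1113 from matching directories that belong to django__django-11133.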
To run multiple tasks, first put the IDs of all tasks to run in a file, one per line. If this file is tasks.txt, the tasks can be run with:
PYTHONPATH=. python app/main.py --enable-layered --model gpt-4-0125-preview --setup-map ../SWE-bench/setup_result/setup_map.json --tasks-map ../SWE-bench/setup_result/tasks_map.json --output-dir output --task-list-file tasks.txt
NOTE: make sure that all tasks in tasks.txt have been set up in SWE-bench, following the steps above.
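If you prefer to drive these runs from a Python script (for example, to launch several task lists in sequence), the command line above can be assembled programmatically. This is a sketch: build_acr_command and run_acr are hypothetical helpers that use only the flags appearing in the commands above, and requiring exactly one of task or task_list_file is this wrapper's own choice, not a documented constraint of app/main.py.

```python
import os
import subprocess
from typing import List, Optional

SETUP_DIR = "../SWE-bench/setup_result"  # relative to /opt/auto-code-rover, as in the commands above

def build_acr_command(task: Optional[str] = None,
                      task_list_file: Optional[str] = None,
                      model: str = "gpt-4-0125-preview",
                      output_dir: str = "output") -> List[str]:
    """Assemble the app/main.py invocation for one task or for a task-list file."""
    if (task is None) == (task_list_file is None):
        raise ValueError("pass exactly one of task or task_list_file")
    cmd = ["python", "app/main.py", "--enable-layered", "--model", model,
           "--setup-map", f"{SETUP_DIR}/setup_map.json",
           "--tasks-map", f"{SETUP_DIR}/tasks_map.json",
           "--output-dir", output_dir]
    if task is not None:
        cmd += ["--task", task]
    else:
        cmd += ["--task-list-file", task_list_file]
    return cmd

def run_acr(cmd: List[str], repo_dir: str = "/opt/auto-code-rover") -> int:
    """Run the command from the repo root with PYTHONPATH=. and return its exit code."""
    env = dict(os.environ, PYTHONPATH=".")
    return subprocess.run(cmd, cwd=repo_dir, env=env, check=False).returncode

# Example (inside the Docker container, with the auto-code-rover conda env active):
# run_acr(build_acr_command(task_list_file="tasks.txt"))
```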
Alternatively, a config file can be used to specify all parameters and tasks to run. See conf/vanilla-lite.conf for an example, and EXPERIMENT.md for details of the items in a config file.
A config file can be used as follows:
python scripts/run.py conf/vanilla-lite.conf
Please refer to EXPERIMENT.md for information on experiment replication.
For any queries, you are welcome to open an issue.
Alternatively, contact us at: {yuntong,hruan,zhiyufan}@comp.nus.edu.sg.
This work was partially supported by a Singapore Ministry of Education (MoE) Tier 3 grant "Automated Program Repair", MOE-MOET32021-0001.