Arxiv preprint | DSPy Implementation
AvaTaR is a novel and automatic framework that optimizes an LLM agent to effectively use the provided tools and improve its performance on a given task/domain. During optimization, we design a comparator module to iteratively provide insightful and holistic prompts to the LLM agent via reasoning between positive and negative examples sampled from training data.
[July 2024] 🔥 Avatar is integrated into DSPy - Credit to Herumb Shandilya! You can try out the example on jupyter notebook.
conda create -n avatar python=3.11
pip install stark-qa typeguard
- Specify API keys in command line
export ANTHROPIC_API_KEY=YOUR_API_KEY
export OPENAI_API_KEY=YOUR_API_KEY export OPENAI_ORG=YOUR_ORGANIZATION
- Embeddings: Download all embeddings by running the following script:
sh scripts/emb_download_all.sh
- Raw data:
STaRK data will be downloaded automatically when running the code.
For Flickr30k Entities, submit form at Flickr 30k & Denotation Graph data to request access. Then organize the data as follows:
data ├── flickr30k_entities │ ├── raw │ │ ├── Annotations │ │ │ ├── 36979.xml │ │ │ ├── ... │ │ ├── flickr30k-images │ │ ├── 36979.jpg │ │ ├── ... │ ├── split │ │ ├── test.index │ │ ├── train.index │ │ ├── val.index │ ├── qa.csv ├── ...
We already include the VSS results locally under output/eval
and the grouping (for STaRK only) under output/agent
. With these files, you should be able to optimize actor actions directly following the AvaTaR pipeline.
- Optimization: Following the default settings at
config/default_args.json
, run the following command to optimize the actor actions for a group of queries:You can specify the dataset name and group insh scripts/run_avatar_stark.sh
scripts/run_avatar_stark.sh
.sh run_avatar_flickr30k_entities.sh
- Evaluation: Run the following command to evaluate the optimized actor actions:
or
sh scripts/run_eval_avatar_stark.sh
sh scripts/run_eval_avatar_flickr30k_entities.sh
@article{wu24avatar,
title = {AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval},
author = {
Shirley Wu and Shiyu Zhao and
Qian Huang and Kexin Huang and
Michihiro Yasunaga and Kaidi Cao and
Vassilis N. Ioannidis and Karthik Subbian and
Jure Leskove and James Zou
},
eprinttype = {arXiv},
eprint = {2406.11200},
year = {2024}
}