
Commit e17968d: Update readme.md

MingxuanXia committed Mar 13, 2024 (1 parent: 88bff9c). Showing 1 changed file, README.md, with 21 additions and 5 deletions.
<!-- News and Updates -->

## News and Updates
- [13/03/2024] Add support for multi-modal models and datasets.
- [05/01/2024] Add support for BigBench Hard, DROP, ARC datasets.
- [16/12/2023] Add support for Gemini, Mistral, Mixtral, Baichuan, Yi models.
- [15/12/2023] Add detailed instructions for users to add new modules (models, datasets, etc.); see [examples/add_new_modules.md](examples/add_new_modules.md).

We provide tutorials for:

1. **evaluate models on existing benchmarks:** please refer to [examples/basic.ipynb](examples/basic.ipynb) for constructing your evaluation pipeline; for a multi-modal evaluation pipeline, please refer to [examples/multimodal.ipynb](examples/multimodal.ipynb). A minimal sketch of the basic pipeline follows this list.
2. **test the effects of different prompting techniques.**
3. **examine the robustness to prompt attacks:** please refer to [examples/prompt_attack.ipynb](examples/prompt_attack.ipynb) to construct the attacks.
4. **use DyVal for evaluation:** please refer to [examples/dyval.ipynb](examples/dyval.ipynb) to construct DyVal datasets.
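
The sketch below is adapted from the flow in [examples/basic.ipynb](examples/basic.ipynb). The dataset key, model name, and generation settings are illustrative, and exact helper signatures may differ across releases, so treat the notebook as authoritative.

```python
import promptbench as pb

# load a dataset (the "sst2" key is illustrative)
dataset = pb.DatasetLoader.load_dataset("sst2")

# load a model (model name and generation settings are illustrative)
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)

# prompts use a {content} placeholder that is filled per example
prompts = pb.Prompt(["Classify the sentence as positive or negative: {content}"])

for prompt in prompts:
    preds, labels = [], []
    for data in dataset:
        # fill the example into the prompt template
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        # map the raw model output to a class label
        preds.append(pb.OutputProcess.cls(raw_pred, model.model_name))
        labels.append(data["label"])
    # report accuracy for this prompt
    print(pb.Eval.compute_cls_accuracy(preds, labels), repr(prompt))
```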
PromptBench currently supports different datasets, models, prompt engineering methods, and adversarial attacks.

- Numersense
- QASC
- Last Letter Concatenate
- VQAv2
- NoCaps
- MMMU
- MathVista
- AI2D
- ChartQA
- ScienceQA
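
The identifiers accepted by `pb.DatasetLoader` may differ from the display names above. If your installed version exposes a dataset registry attribute, which is an assumption on my part, you can list the exact keys directly:

```python
import promptbench as pb

# print the dataset keys recognized by the installed release;
# `SUPPORTED_DATASETS` is assumed from the docs -- verify in your version
print(pb.SUPPORTED_DATASETS)
```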

### Models

- GPT-4
- Gemini Pro

### Models (Multi-Modal)

- Open-source models:
  - BLIP2
  - LLaVA
  - Qwen-VL, Qwen-VL-Chat
  - InternLM-XComposer2-VL
- Proprietary models:
  - GPT-4V
  - GeminiProVision
  - Qwen-VL-Max, Qwen-VL-Plus
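
A multi-modal run is expected to mirror the text pipeline above. The sketch below assumes a `VLMModel` counterpart to `LLMModel` and a `"vqav2"` dataset key; both names are assumptions on my part, so please follow [examples/multimodal.ipynb](examples/multimodal.ipynb) for the exact API.

```python
import promptbench as pb

# load a multi-modal dataset (the "vqav2" key is an assumption)
dataset = pb.DatasetLoader.load_dataset("vqav2")

# `VLMModel` is assumed to mirror `LLMModel` for vision-language models;
# the class name and model identifier below are assumptions
model = pb.VLMModel(model="llava-hf/llava-1.5-7b-hf", max_new_tokens=64)

# prompts keep the {content} placeholder; images travel with each example
prompts = pb.Prompt(["Answer the question about the image: {content}"])

for prompt in prompts:
    for data in dataset:
        input_data = pb.InputProcess.basic_format(prompt, data)
        print(model(input_data))  # raw answer, for manual inspection
        break  # single example, just to smoke-test the pipeline
```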

### Prompt Engineering

- Chain-of-thought (CoT) [1]

Please refer to our [benchmark website](https://llm-eval.github.io/) for benchmark results on Prompt Attacks, Prompt Engineering, and Dynamic Evaluation (DyVal).

## TODO

- [ ] Add support for multi-modal models such as LLaVA and BLIP2.

## Acknowledgements

- [TextAttack](https://github.com/QData/TextAttack)
