xDial-Eval

Repository for the EMNLP 2023 Findings paper "xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark".

Changelog

[25/10/2023] Add data to the repository.

[27/10/2023] Add code for zero-shot inference with open-source LLMs to the repository.

Prerequisites

Python 3.8+ and PyTorch 1.13.1+
See requirements.txt for the full list of dependencies.

Data Format

The CSV files in the turn-level data include the columns [lang]_ctx, [lang]_res, and ratings, where [lang] identifies the language.
The CSV files in the dialogue-level data include the columns [lang]_dial and ratings, where [lang] identifies the language.
Utterances in [lang]_ctx and [lang]_dial are delimited by \n.
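
As a rough illustration of the turn-level layout described above, the sketch below parses rows into a list of context utterances, a response, and a rating. It is not part of the repository's code. It assumes that the rating column is spelled `ratings`, that the `\n` delimiter is a real newline character inside the CSV field, and that `en` is a valid value for [lang]; the function name `read_turn_level` and the sample row are made up for the example.

```python
import csv
import io
from typing import Dict, List


def read_turn_level(handle, lang: str) -> List[Dict[str, object]]:
    """Parse turn-level rows into context utterances, response, and rating."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            # Assumption: context utterances are separated by a newline character.
            "context": row[f"{lang}_ctx"].split("\n"),
            "response": row[f"{lang}_res"],
            # Kept as the raw string; convert to a number as needed.
            "rating": row["ratings"],
        })
    return rows


# Tiny in-memory sample standing in for a real file (made-up data).
sample = io.StringIO(
    'en_ctx,en_res,ratings\n'
    '"Hi there!\nHow are you?","I am fine, thanks.",4.0\n'
)
examples = read_turn_level(sample, "en")
print(examples[0]["context"])
```

For a real file, pass a file object opened with `newline=""` and `encoding="utf-8"` so that multi-line quoted fields are read intact. The dialogue-level files could be handled the same way by splitting the [lang]_dial column instead.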

Original English Data

Sources

FED-Turn and FED-Dial: https://shikib.com/fed_data.json
Persona-USR and Topical-USR: https://shikib.com/usr
DailyDialog-Zhao and Persona-Zhao: https://github.com/ZHAOTING/dialog-processing/tree/master/src/tasks/response_eval
ConTurE-Dial and ConTurE-Turn: https://github.com/alexa/conture
Empathetic-GRADE, ConvAI2-GRADE, and DailyDialog-GRADE: https://github.com/li3cmz/GRADE/tree/main/evaluation
Persona-DSTC10 and Topical-DSTC10: https://chateval.org/dstc10
DailyDialog-Gupta: https://github.com/prakharguptaz/multirefeval
IEval: https://github.com/Sea94/ieval
Persona-See: https://github.com/facebookresearch/ParlAI/tree/main/projects/controllable_dialogue
Reliable-Eval: https://github.com/TianboJi/Dialogue-Eval
Human-Eval: https://github.com/facebookresearch/ParlAI/tree/main/projects/humaneval

Note: to access the Human-Eval data, please contact the authors of the paper "Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents". Once you have obtained their permission, you may contact me for the multilingual extension of the Human-Eval data.

Zero-shot Inference with Open-source LLMs

We currently include scripts for zero-shot inference with Llama-2, Baichuan-2, Phoenix, and Alpaca. The scripts can be adapted to other open-source LLMs.

The Python scripts are in zeroshot_inference, and the shell scripts are in scripts/zeroshot_inference.

Example execution: bash scripts/zeroshot_inference/turn/infer_alpaca.sh

Please cite us if you find our benchmark useful

@inproceedings{zhang-etal-2023-xdial,
    title = "x{D}ial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark",
    author = "Zhang, Chen  and
      D{'}Haro, Luis  and
      Tang, Chengguang  and
      Shi, Ke  and
      Tang, Guohua  and
      Li, Haizhou",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.371",
    doi = "10.18653/v1/2023.findings-emnlp.371",
    pages = "5579--5601",
}

Acknowledgements

We thank all the authors for kindly making their data publicly available. In the same spirit, we make our multilingual extension publicly available as well. We hope our data can further benefit researchers working on multilingual open-domain dialogue systems and evaluation metrics.

