HAGRID: A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution

HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) is a dataset for generative information-seeking scenarios. It is constructed on top of MIRACL 🌍🙌🌏, an information retrieval dataset that consists of queries along with a set of manually labelled relevant passages (quotes).

We collect attributed explanations for each question by prompting GPT-3.5 with the given relevant passages. The explanations follow an in-context citation style, similar to that of scientific articles, and cite the supporting quotes. Human annotators then judge each explanation on two criteria:

  1. Informativeness: whether they provide a direct answer to the question.
  2. Attributability: whether they are attributable to the source passages.
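The in-context citation style described above can be made concrete with a minimal sketch. It assumes citations appear as bracketed, 1-based numbers such as `[1]` that index into the list of quotes; the helper name `cited_quotes` and the sample strings are hypothetical and are not taken from the dataset.

```python
import re

# Assumption: citations look like "[1]" or "[2][3]" and are 1-based
# indices into the list of quotes shown to the model.
CITATION = re.compile(r"\[(\d+)\]")


def cited_quotes(explanation, quotes):
    """Return the distinct quotes cited in `explanation`, in order of first mention."""
    seen = []
    for match in CITATION.finditer(explanation):
        idx = int(match.group(1))
        # Ignore out-of-range markers and repeated citations.
        if 1 <= idx <= len(quotes) and idx not in seen:
            seen.append(idx)
    return [quotes[i - 1] for i in seen]


# Made-up example for illustration only.
quotes = [
    "Mount Everest is Earth's highest mountain above sea level.",
    "K2 is the second-highest mountain on Earth.",
]
explanation = "Mount Everest is the highest mountain on Earth [1]."
print(cited_quotes(explanation, quotes))
```

A check like this only recovers which quotes an explanation points to; whether the cited quotes actually support the claim is what the attributability judgment is for.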

[Figure: HAGRID workflow]

Data

HAGRID is hosted on Hugging Face 🤗 under the dataset ID miracl/hagrid.

import datasets

# Load the training split of HAGRID from the Hugging Face Hub
hagrid = datasets.load_dataset("miracl/hagrid", split="train")
print(hagrid[0])  # inspect the first example

| Split | #Q | #A | #Informativeness | #Attributability |
|-------|----|----|------------------|------------------|
| Train | 1,922 | 3,214 | 3,214 | 754 |
| Dev | 716 | 1,318 | 1,157 | 826 |
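As a quick reading of the table above, the sketch below derives the average number of answers per question and the share of answers that carry each judgment. It assumes the #Informativeness and #Attributability columns count answers that have that judgment; the dictionary keys and function names are illustrative only.

```python
# Figures copied from the statistics table above.
stats = {
    "train": {"questions": 1922, "answers": 3214,
              "informativeness": 3214, "attributability": 754},
    "dev": {"questions": 716, "answers": 1318,
            "informativeness": 1157, "attributability": 826},
}


def answers_per_question(split):
    """Average number of answers collected per question in a split."""
    return stats[split]["answers"] / stats[split]["questions"]


def judged_share(split, label):
    """Fraction of answers in a split counted under `label`.

    Assumption: each label column counts answers that have that judgment.
    """
    return stats[split][label] / stats[split]["answers"]


for split in stats:
    print(
        split,
        round(answers_per_question(split), 2),
        round(judged_share(split, "informativeness"), 2),
        round(judged_share(split, "attributability"), 2),
    )
```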

Baselines (Coming soon!)

We are planning to release baseline models soon! Stay tuned!

Contact

If you have any questions, feel free to email us (project.miracl [at] gmail.com) or open a GitHub issue in this repository.

License

This work is licensed under the Apache License 2.0. See LICENSE for details.

Citation

If you find this dataset and repository helpful, please cite HAGRID as follows:

@article{hagrid,
      title={{HAGRID}: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution}, 
      author={Ehsan Kamalloo and Aref Jafari and Xinyu Zhang and Nandan Thakur and Jimmy Lin},
      year={2023},
      journal={arXiv:2307.16883},
}
