
ParaAMR

This repository contains the dataset for our ACL 2023 Area Chair Award paper ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation.

After uncompressing ParaAMR_small.zip, you will get ParaAMR_small.json, where each line is a JSON object:

{
    "text": "all of these men can confirm that cheryl was not here when harry was killed.",
    "paraphrases": [
        {
            "para_text": "all these men can confirm that cheryl wasn't here when harry was killed.",
            "perplexity": 125.02737280513969
        },
        {
            "para_text": "cheryl was not here when harry was killed, which all these men could confirm.",
            "perplexity": 96.61560039297649
        },
        ...
        {
            "para_text": "harry was the one killed when cheryl wasn't here, which is confirmed by all of these men.",
            "perplexity": 91.92300008948598
        }
    ]
}

There are 177,410 lines in total. This is the subset of ParaAMR that we use for analysis and experiments; we will release the full set of ParaAMR, which contains 15,543,606 lines, as soon as possible. Note that the source text comes from ParaNMT, and we generate the paraphrases by AMR back-translation. The perplexity is calculated with GPT-2. Please refer to our paper for more details.
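Since each line is an independent JSON object, the file can be processed line by line with the standard library. A minimal sketch of parsing one record and selecting the lowest-perplexity (most fluent under GPT-2) paraphrase; the inline sample line is abbreviated from the example above, and the selection criterion is only illustrative:

```python
import json

# One line of ParaAMR_small.json, abbreviated from the example above.
line = json.dumps({
    "text": "all of these men can confirm that cheryl was not here when harry was killed.",
    "paraphrases": [
        {"para_text": "all these men can confirm that cheryl wasn't here when harry was killed.",
         "perplexity": 125.02737280513969},
        {"para_text": "cheryl was not here when harry was killed, which all these men could confirm.",
         "perplexity": 96.61560039297649},
    ],
})

# Parse the JSON object and pick the paraphrase with the lowest perplexity.
record = json.loads(line)
best = min(record["paraphrases"], key=lambda p: p["perplexity"])
print(best["para_text"])
# → cheryl was not here when harry was killed, which all these men could confirm.
```

To process the whole file, iterate over it with `for line in open("ParaAMR_small.json")` and apply `json.loads` to each line.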

If you find this dataset useful in your research, please consider citing our paper:

@inproceedings{acl2023paraamr,
    author    = {Kuan-Hao Huang and Varun Iyer and I-Hung Hsu and Anoop Kumar and Kai-Wei Chang and Aram Galstyan},
    title     = {ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation},
    booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)},
    year      = {2023},
}

Security

See CONTRIBUTING for more information.

License

This library is licensed under the CC-BY-NC License. See the LICENSE file for more details.
