This repository presents the implementation of the NAACL 2024 paper:
Rationale-based Opinion Summarization,
Haoyuan Li and Snigdha Chaturvedi
Download the data file from this link and unzip it into the data folder. It contains the subsets of the Space and Yelp datasets used in RATION's experiments, along with the intermediate outputs of RATION. It also includes training data for the specificity estimation model and the textual alignment models.
Download all model files from this link and unzip them into the model folder. They include the checkpoints for SemAE and Snippext, as well as the specificity estimation model and the text alignment model used by RATION.
The generated rationale-based opinion summaries are in the ration_summary folder. The rationales extracted by RATION and GPT-3.5 are in the rationales folder. The conventional opinion summaries generated by SemAE are in the conv_summary folder.
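As a quick sanity check after downloading and unzipping, the folder layout described above can be verified with a few lines of Python. This is a minimal sketch: the folder names come from the text above, while the assumption that they all sit directly under the repository root, and the helper name missing_dirs, are illustrative only.

```python
from pathlib import Path

# Folder names taken from the description above; their location directly
# under the repository root is an assumption.
EXPECTED_DIRS = ["data", "model", "ration_summary", "rationales", "conv_summary"]


def missing_dirs(repo_root="."):
    """Return the expected folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```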
RATION depends on SemAE and Snippext. However, these two repositories use an older version of PyTorch that is not compatible with the rest of the RATION code. Therefore, create one environment for these two repositories by following their own instructions (denoted as old_env), and create a second environment based on the following instructions:
- Python version: python3.8
- Dependencies: Use the requirements.txt file and conda/pip to install all necessary dependencies. E.g., for pip:
    pip install -U pip
    pip install -U setuptools
    pip install -r requirements.txt
This environment is denoted as new_env.
To generate rationale-based opinion summaries from scratch, follow the steps below.
RATION uses SemAE to generate conventional opinion summaries and uses Snippext to extract representative opinions from the conventional opinion summaries. Under old_env, please run sh script/extract_opinion_space.sh for the Space dataset or sh script/extract_opinion_yelp.sh for the Yelp dataset. Compared to the original implementation of SemAE, RATION additionally requires that each extracted summary sentence contain at least one opinion identified by Snippext. You may use other summarization models to generate conventional opinion summaries.
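If you plug in a different summarization model, the same restriction (each summary sentence must contain at least one extracted opinion) can be applied as a simple filter. The sketch below only illustrates that constraint; the function name and the dictionary format for extracted opinions are hypothetical and do not reflect the actual output format of Snippext or the code in this repository.

```python
def keep_sentences_with_opinions(sentences, opinions_by_sentence):
    """Keep only the sentences for which at least one opinion was extracted.

    sentences: list of candidate summary sentences (strings).
    opinions_by_sentence: dict mapping a sentence to the list of opinions an
        extractor found in it (hypothetical format, not Snippext's output).
    """
    return [s for s in sentences if len(opinions_by_sentence.get(s, [])) > 0]


# Toy data, made up for illustration.
candidates = ["The staff was friendly.", "We stayed for three nights."]
extracted = {"The staff was friendly.": [("staff", "friendly")]}
print(keep_sentences_with_opinions(candidates, extracted))
```

Sentences missing from the dictionary are treated as having no opinions, so they are dropped as well.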
RATION extracts review sentences as rationale candidates for each representative opinion based on the textual alignment model. Under new_env, please run sh script/gen_ration_cand_space.sh for the Space dataset or sh script/gen_ration_cand_yelp.sh for the Yelp dataset. The scripts also run automatic evaluations for the rationale candidate sets. The values of --self_sim_thres are tuned so that each entity has, on average, 8 rationale candidate sets. The values of --entail_thres are tuned so that all rationale candidate sets of each entity together cover, on average, 30% of the review sentences.
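To adapt the thresholds to a new dataset, one option is a simple grid sweep that picks the value whose average statistic is closest to the target. The sketch below does this for the 30% coverage target, assuming you already have, for each entity, a matrix of alignment scores with one row per representative opinion and one column per review sentence. The functions coverage and tune_threshold, the score format, and the rule "a sentence is covered if its score reaches the threshold for at least one opinion" are assumptions for illustration, not the tuning procedure used in this repository.

```python
def coverage(scores, threshold):
    """Fraction of review sentences whose score reaches the threshold
    for at least one representative opinion.

    scores: list of rows, one row per opinion, one column per review sentence.
    """
    n_sentences = len(scores[0])
    covered = sum(
        1 for j in range(n_sentences) if any(row[j] >= threshold for row in scores)
    )
    return covered / n_sentences


def tune_threshold(scores_per_entity, target=0.30, grid=None):
    """Return the grid value whose mean coverage over entities is closest to target."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]

    def mean_coverage(t):
        return sum(coverage(s, t) for s in scores_per_entity) / len(scores_per_entity)

    return min(grid, key=lambda t: abs(mean_coverage(t) - target))


# Toy example: one entity, one opinion, ten review sentences.
toy_scores = [[[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]]]
best = tune_threshold(toy_scores)
print(best, coverage(toy_scores[0], best))
```

The same sweep could be reused for --self_sim_thres by swapping the coverage statistic for the average number of candidate sets per entity.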
RATION extracts k rationales for each representative opinion from the rationale candidate sets. Under new_env, please run sh script/gen_ration_space.sh for the Space dataset or sh script/gen_ration_yelp.sh for the Yelp dataset. The scripts also run the evaluation for the rationales and the automatic evaluations for the rationale candidate sets. The values of --entail_thres are set to the same values as in the previous step.
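To give a feel for what "picking k rationales from a candidate set" involves, here is a deliberately simplified stand-in. It is not the Gibbs-sampling-based method from the paper: it greedily picks candidates by a given per-sentence score while penalizing word overlap with already selected ones. The scores, the Jaccard overlap measure, and the redundancy_weight parameter are all assumptions made for this illustration.

```python
def jaccard(a, b):
    """Word-level Jaccard overlap between two sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_k(candidates, scores, k, redundancy_weight=0.5):
    """Greedily pick k candidates, trading off score against overlap
    with the candidates that were already picked."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def gain(i):
            overlap = max(
                (jaccard(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return scores[i] - redundancy_weight * overlap

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]


# Toy data, made up for illustration.
cands = [
    "the pool was clean and warm",
    "the pool was clean and warm too",
    "breakfast had fresh fruit",
]
print(select_k(cands, [0.9, 0.85, 0.6], k=2))
```

In this toy run the near-duplicate second sentence is skipped in favor of the lower-scored but more diverse third one.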
RATION generates rationale-based opinion summaries by combining representative opinions and their rationales. Under new_env, please run sh script/gen_summary_space.sh for the Space dataset or sh script/gen_summary_yelp.sh for the Yelp dataset.
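The four steps above can be summarized as an ordered list of commands together with the environment each one is meant to run under. The sketch below only assembles and prints that list; the script paths and the old_env / new_env labels are taken from the steps above, while the helper pipeline_commands is a hypothetical convenience and does not activate environments or run anything.

```python
# Script paths and environment labels are copied from the steps above.
STEPS = [
    ("old_env", "script/extract_opinion_{ds}.sh"),
    ("new_env", "script/gen_ration_cand_{ds}.sh"),
    ("new_env", "script/gen_ration_{ds}.sh"),
    ("new_env", "script/gen_summary_{ds}.sh"),
]


def pipeline_commands(dataset):
    """Return (environment, command) pairs for the given dataset, in order."""
    if dataset not in ("space", "yelp"):
        raise ValueError("dataset must be 'space' or 'yelp'")
    return [(env, "sh " + template.format(ds=dataset)) for env, template in STEPS]


for env, cmd in pipeline_commands("space"):
    print(f"[{env}] {cmd}")
```

Note that only the first step runs under old_env, so the environment has to be switched once before the remaining steps.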
We fine-tune the deberta-base model using the data from this link. Under new_env, please run sh script/train_specificity.sh.
We fine-tune the roberta-large model on masked language modeling and classification, using data from this link for the Space dataset and data from this link for the Yelp dataset. Under new_env, please run sh script/train_align_space.sh for the Space dataset or sh script/train_align_yelp.sh for the Yelp dataset.
@inproceedings{li-chaturvedi-2024-rationale,
title = "Rationale-based Opinion Summarization",
author = "Li, Haoyuan and
Chaturvedi, Snigdha",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.naacl-long.458",
pages = "8267--8285",
abstract = "Opinion summarization aims to generate concise summaries that present popular opinions of a large group of reviews. However, these summaries can be too generic and lack supporting details. To address these issues, we propose a new paradigm for summarizing reviews, rationale-based opinion summarization. Rationale-based opinion summaries output the representative opinions as well as one or more corresponding rationales. To extract good rationales, we define four desirable properties: relatedness, specificity, popularity, and diversity and present a Gibbs-sampling-based method to extract rationales. Overall, we propose RATION, an unsupervised extractive system that has two components: an Opinion Extractor (to extract representative opinions) and Rationales Extractor (to extract corresponding rationales). We conduct automatic and human evaluations to show that rationales extracted by RATION have the proposed properties and its summaries are more useful than conventional summaries. The implementation of our work is available at https://github.com/leehaoyuan/RATION.",
}