Showing 1–50 of 72 results for author: Dagan, I


arXiv:2408.04246 [pdf, other]

cs.CL

Explicating the Implicit: Argument Detection Beyond Sentence Boundaries

Authors: Paul Roit, Aviv Slobodkin, Eran Hirsch, Arie Cattan, Ayal Klein, Valentina Pyatkin, Ido Dagan

Abstract: Detecting semantic arguments of a predicate word has been conventionally modeled as a sentence-level task. The typical reader, however, perfectly interprets predicate-argument relations in a much wider context than just the sentence where the predicate was evoked. In this work, we reformulate the problem of argument detection through textual entailment to capture semantic relations across sentence boundaries. We propose a method that tests whether some semantic relation can be inferred from a full passage by first encoding it into a simple and standalone proposition and then testing for entailment against the passage. Our method does not require direct supervision, which is generally absent due to dataset scarcity, but instead builds on existing NLI and sentence-level SRL resources. Such a method can potentially explicate pragmatically understood relations into a set of explicit sentences. We demonstrate it on a recent document-level benchmark, outperforming some supervised methods and contemporary language models.

Submitted 8 August, 2024; originally announced August 2024.

Comments: 9 pages, ACL 2024
arXiv:2407.00402 [pdf, other]

cs.CL cs.AI

Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP

Authors: Omer Goldman, Alon Jacovi, Aviv Slobodkin, Aviya Maimon, Ido Dagan, Reut Tsarfaty

Abstract: Improvements in language models' capabilities have pushed their applications towards longer contexts, making long-context evaluation and development an active research area. However, many disparate use-cases are grouped together under the umbrella term of "long-context", defined simply by the total length of the model's input, including - for example - Needle-in-a-Haystack tasks, book summarization, and information aggregation. Given their varied difficulty, in this position paper we argue that conflating different tasks by their context length is unproductive. As a community, we require a more precise vocabulary to understand what makes long-context tasks similar or different. We propose to unpack the taxonomy of long-context based on the properties that make them more difficult with longer contexts. We propose two orthogonal axes of difficulty: (I) Diffusion: How hard is it to find the necessary information in the context? (II) Scope: How much necessary information is there to find? We survey the literature on long-context, provide justification for this taxonomy as an informative descriptor, and situate the literature with respect to it. We conclude that the most difficult and interesting settings, whose necessary information is very long and highly diffused within the input, is severely under-explored. By using a descriptive vocabulary and discussing the relevant properties of difficulty in long-context, we can implement more informed research in this area.
We call for a careful design of tasks and benchmarks with distinctly long context, taking into account the characteristics that make it qualitatively different from shorter context.

Submitted 11 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.
arXiv:2406.14314 [pdf, other]

cs.CL cs.AI

Identifying User Goals from UI Trajectories

Authors: Omri Berkovitch, Sapir Caduri, Noam Kahlon, Anatoly Efros, Avi Caciularu, Ido Dagan

Abstract: Autonomous agents that interact with graphical user interfaces (GUIs) hold significant potential for enhancing user experiences. To further improve these experiences, agents need to be personalized and proactive. By effectively comprehending user intentions through their actions and interactions with GUIs, agents will be better positioned to achieve these goals. This paper introduces the task of goal identification from observed UI trajectories, aiming to infer the user's intended task based on their GUI interactions. We propose a novel evaluation metric to assess whether two task descriptions are paraphrases within a specific UI environment. By leveraging the inverse relation with the UI automation task, we utilized the Android-In-The-Wild and Mind2Web datasets for our experiments. Using our metric and these datasets, we conducted several experiments comparing the performance of humans and state-of-the-art models, specifically GPT-4 and Gemini-1.5 Pro. Our results show that Gemini performs better than GPT but still underperforms compared to humans, indicating significant room for improvement.

Submitted 30 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.
arXiv:2406.00842 [pdf, other]

cs.CL

The Power of Summary-Source Alignments

Authors: Ori Ernst, Ori Shapira, Aviv Slobodkin, Sharon Adar, Mohit Bansal, Jacob Goldberger, Ran Levy, Ido Dagan

Abstract: Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection, followed by text generation. In this context, alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data for some of the component tasks. Yet, this enabling alignment step has usually been applied heuristically on the sentence level on a limited number of subtasks. In this paper, we propose extending the summary-source alignment framework by (1) applying it at the more fine-grained proposition span level, (2) annotating alignment manually in a multi-document setup, and (3) revealing the great potential of summary-source alignments to yield several datasets for at least six different tasks. Specifically, for each of the tasks, we release a manually annotated test set that was derived automatically from the alignment annotation. We also release development and train sets in the same way, but from automatically derived alignments. Using the datasets, each task is demonstrated with baseline models and corresponding evaluation metrics to spur future research on this broad challenge.

Submitted 2 June, 2024; originally announced June 2024.

Comments: Accepted to ACL-Findings 2024
arXiv:2405.20967 [pdf, other]

cs.CL

Superlatives in Context: Explicit and Implicit Domain Restrictions for Superlative Frames

Authors: Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty

Abstract: Superlatives are used to single out elements with a maximal/minimal property. Semantically, superlatives perform a set comparison: something (or some things) has the min/max property out of a set. As such, superlatives provide an ideal phenomenon for studying implicit phenomena and discourse restrictions. While this comparison set is often not explicitly defined, its (implicit) restrictions can be inferred from the discourse context the expression appears in. In this work we provide an extensive computational study on the semantics of superlatives. We propose a unified account of superlative semantics which allows us to derive a broad-coverage annotation schema. Using this unified schema we annotated a multi-domain dataset of superlatives and their semantic interpretations. We specifically focus on interpreting implicit or ambiguous superlative expressions, by analyzing how the discourse context restricts the set of interpretations. In a set of experiments we then analyze how well models perform at variations of predicting superlative semantics, with and without context. We show that the fine-grained semantics of superlatives in context can be challenging for contemporary models, including GPT-4.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 11 pages
arXiv:2405.12081 [pdf, other]

cs.CL

Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model

Authors: Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Ido Dagan

Abstract: To obtain high-quality annotations under limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotations to improve the model predictive ability (i.e., triage-to-human data), while the rest of the data is indiscriminately assigned to model annotation (i.e., triage-to-model data). This may lead to inefficiencies in budget allocation for annotations, as easy data that the model could accurately annotate may be unnecessarily assigned to the expert, and hard data may be misclassified by the model. As a result, the overall annotation quality may be compromised. To address this issue, we propose a selective annotation framework called SANT. It effectively takes advantage of both the triage-to-human and triage-to-model data through the proposed error-aware triage and bi-weighting mechanisms. As such, informative or hard data is assigned to the expert for annotation, while easy data is handled by the model. Experimental results show that SANT consistently outperforms other baselines, leading to higher-quality annotation through its proper allocation of data to both expert and model workers. We provide pioneering work on data annotation within budget constraints, establishing a landmark for future triage-based annotation studies.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 18 pages, 4 figures
arXiv:2405.01121 [pdf, other]

cs.CL cs.AI

Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts

Authors: Lotem Golany, Filippo Galgani, Maya Mamo, Nimrod Parasol, Omer Vandsburger, Nadav Bar, Ido Dagan

Abstract: Automating data generation with Large Language Models (LLMs) has become increasingly popular. In this work, we investigate the feasibility and effectiveness of LLM-based data generation in the challenging setting of source-grounded information-seeking dialogs, with response attribution, over long documents. Our source texts consist of long and noisy meeting transcripts, adding to the task complexity. Since automating attribution remains difficult, we propose a semi-automatic approach: dialog queries and responses are generated with LLMs, followed by human verification and identification of attribution spans. Using this approach, we created MISeD -- Meeting Information Seeking Dialogs dataset -- a dataset of information-seeking dialogs focused on meeting transcripts. Models finetuned with MISeD demonstrate superior performance compared to off-the-shelf models, even those of larger size. Finetuning on MISeD gives comparable response generation quality to finetuning on fully manual data, while improving attribution quality and reducing time and effort.

Submitted 21 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.
arXiv:2403.17104 [pdf, other]

cs.CL

Attribute First, then Generate: Locally-attributable Grounded Text Generation

Authors: Aviv Slobodkin, Eran Hirsch, Arie Cattan, Tal Schuster, Ido Dagan

Abstract: Recent efforts to address hallucinations in Large Language Models (LLMs) have focused on attributed text generation, which supplements generated texts with citations of supporting sources for post-generation fact-checking and corrections. Yet, these citations often point to entire documents or paragraphs, burdening users with extensive verification work. In this paper, we introduce a locally-attributable text generation approach, prioritizing concise attributions. Our method, named "Attribute First, then Generate", breaks down the conventional end-to-end generation process into three intuitive steps: content selection, sentence planning, and sequential sentence generation. By initially identifying relevant source segments ("select first") and then conditioning the generation process on them ("then generate"), we ensure these segments also act as the output's fine-grained attributions ("select" becomes "attribute"). Tested on Multi-document Summarization and Long-form Question-answering, our method not only yields more concise citations than the baselines but also maintains - and in some cases enhances - both generation quality and attribution accuracy. Furthermore, it significantly reduces the time required for fact verification by human assessors.

Submitted 4 July, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: ACL 2024
arXiv:2403.15351 [pdf, other]

cs.CL

Multi-Review Fusion-in-Context

Authors: Aviv Slobodkin, Ori Shapira, Ran Levy, Ido Dagan

Abstract: Grounded text generation, encompassing tasks such as long-form question-answering and summarization, necessitates both content selection and content consolidation. Current end-to-end methods are difficult to control and interpret due to their opaqueness. Accordingly, recent works have proposed a modular approach, with separate components for each step. Specifically, we focus on the second subtask, of generating coherent text given pre-selected content in a multi-document setting. Concretely, we formalize Fusion-in-Context (FiC) as a standalone task, whose input consists of source texts with highlighted spans of targeted content. A model then needs to generate a coherent passage that includes all and only the target information. Our work includes the development of a curated dataset of 1000 instances in the reviews domain, alongside a novel evaluation framework for assessing the faithfulness and coverage of highlights, which strongly correlate to human judgment. Several baseline models exhibit promising outcomes and provide insightful analyses. This study lays the groundwork for further exploration of modular text generation in the multi-document setting, offering potential improvements in the quality and reliability of generated content. Our benchmark, FuseReviews, including the dataset, evaluation framework, and designated leaderboard, can be found at https://fusereviews.github.io/.

Submitted 31 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: NAACL 2024, findings
arXiv:2312.04440 [pdf, other]

cs.CL

OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization

Authors: Shmuel Amar, Liat Schiff, Ori Ernst, Asi Shefer, Ori Shapira, Ido Dagan

Abstract: The performance of automatic summarization models has improved dramatically in recent years. Yet, there is still a gap in meeting specific information needs of users in real-world scenarios, particularly when a targeted summary is sought, such as in the useful aspect-based summarization setting targeted in this paper. Previous datasets and studies for this setting have predominantly concentrated on a limited set of pre-defined aspects, focused solely on single document inputs, or relied on synthetic data. To advance research on more realistic scenarios, we introduce OpenAsp, a benchmark for multi-document open aspect-based summarization. This benchmark is created using a novel and cost-effective annotation protocol, by which an open aspect dataset is derived from existing generic multi-document summarization datasets. We analyze the properties of OpenAsp showcasing its high-quality content. Further, we show that the realistic open-aspect setting realized in OpenAsp poses a challenge for current state-of-the-art summarization models, as well as for large language models.

Submitted 7 December, 2023; originally announced December 2023.

Comments: EMNLP 2023
arXiv:2311.11301 [pdf, other]

cs.CL

CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies

Authors: Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan

Abstract: Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, annotating event and subevent relations, etc. To enable efficient annotation of such hierarchical structures, we release CHAMP, an open source tool allowing to incrementally construct both clusters and hierarchy simultaneously over any type of texts. This incremental approach significantly reduces annotation time compared to the common pairwise annotation approach and also guarantees maintaining transitivity at the cluster and hierarchy levels. Furthermore, CHAMP includes a consolidation mode, where an adjudicator can easily compare multiple cluster hierarchy annotations and resolve disagreements.

Submitted 19 November, 2023; originally announced November 2023.

Comments: EMNLP 2023
arXiv:2310.13682 [pdf, other]

cs.CL cs.AI cs.LG

Optimizing Retrieval-augmented Reader Models via Token Elimination

Authors: Moshe Berchansky, Peter Izsak, Avi Caciularu, Ido Dagan, Moshe Wasserblat

Abstract: Fusion-in-Decoder (FiD) is an effective retrieval-augmented language model applied across a variety of open-domain tasks, such as question answering, fact checking, etc. In FiD, supporting passages are first retrieved and then processed using a generative model (Reader), which can cause a significant bottleneck in decoding time, particularly with long outputs. In this work, we analyze the contribution and necessity of all the retrieved passages to the performance of reader models, and propose eliminating some of the retrieved information, at the token level, that might not contribute essential information to the answer generation process. We demonstrate that our method can reduce run-time by up to 62.2%, with only a 2% reduction in performance, and in some cases, even improve the performance results.

Submitted 5 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Main Conference
arXiv:2310.11877 [pdf, other]

cs.CL

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

Authors: Aviv Slobodkin, Omer Goldman, Avi Caciularu, Ido Dagan, Shauli Ravfogel

Abstract: Large language models (LLMs) have been shown to possess impressive capabilities, while also raising crucial concerns about the faithfulness of their responses. A primary issue arising in this context is the management of (un)answerable queries by LLMs, which often results in hallucinatory behavior due to overconfidence. In this paper, we explore the behavior of LLMs when presented with (un)answerable queries. We ask: do models represent the fact that the question is (un)answerable when generating a hallucinatory answer? Our results show strong indications that such models encode the answerability of an input query, with the representation of the first decoded token often being a strong indicator. These findings shed new light on the spatial organization within the latent representations of LLMs, unveiling previously unexplored facets of these models. Moreover, they pave the way for the development of improved decoding techniques with better adherence to factual generation, particularly in scenarios where query (un)answerability is a concern.

Submitted 12 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: EMNLP 2023
arXiv:2310.09017 [pdf, other]

cs.CL

Don't Add, don't Miss: Effective Content Preserving Generation from Pre-Selected Text Spans

Authors: Aviv Slobodkin, Avi Caciularu, Eran Hirsch, Ido Dagan

Abstract: The recently introduced Controlled Text Reduction (CTR) task isolates the text generation step within typical summarization-style tasks. It does so by challenging models to generate coherent text conforming to pre-selected content within the input text ("highlights"). This framing enables increased modularity in summarization-like tasks, allowing to couple a single CTR model with various content-selection setups and modules. However, there are currently no reliable CTR models, while the performance of the existing baseline for the task is mediocre, falling short of practical utility. Here, we address this gap by introducing a high-quality, open-source CTR model that tackles two prior key limitations: inadequate enforcement of the content-preservation constraint, and suboptimal silver training data. Addressing these, we amplify the content-preservation constraint in both training, via RL, and inference, via a controlled decoding strategy. Further, we substantially improve the silver training data quality via GPT-4 distillation. Overall, pairing the distilled dataset with the highlight-adherence strategies yields marked gains over the current baseline, of up to 30 ROUGE-L points, providing a reliable CTR model for downstream use.

Submitted 25 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: EMNLP 2023, findings
arXiv:2308.08363 [pdf, other]

cs.CL

SummHelper: Collaborative Human-Computer Summarization

Authors: Aviv Slobodkin, Niv Nachum, Shmuel Amar, Ori Shapira, Ido Dagan

Abstract: Current approaches for text summarization are predominantly automatic, with rather limited space for human intervention and control over the process. In this paper, we introduce SummHelper, a 2-phase summarization assistant designed to foster human-machine collaboration. The initial phase involves content selection, where the system recommends potential content, allowing users to accept, modify, or introduce additional selections. The subsequent phase, content consolidation, involves SummHelper generating a coherent summary from these selections, which users can then refine using visual mappings between the summary and the source text. Small-scale user studies reveal the effectiveness of our application, with participants being especially appreciative of the balance between automated guidance and opportunities for personal input.

Submitted 16 October, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: EMNLP 2023 System Demonstrations
arXiv:2305.15605 [pdf, other]

cs.CL

Revisiting Sentence Union Generation as a Testbed for Text Consolidation

Authors: Eran Hirsch, Valentina Pyatkin, Ruben Wolhandler, Avi Caciularu, Asi Shefer, Ido Dagan

Abstract: Tasks involving text generation based on multiple input texts, such as multi-document summarization, long-form question answering and contemporary dialogue applications, challenge models for their ability to properly consolidate partly-overlapping multi-text information. However, these tasks entangle the consolidation phase with the often subjective and ill-defined content selection requirement, impeding proper assessment of models' consolidation capabilities. In this paper, we suggest revisiting the sentence union generation task as an effective well-defined testbed for assessing text consolidation capabilities, decoupling the consolidation challenge from subjective content selection. To support research on this task, we present refined annotation methodology and tools for crowdsourcing sentence union, create the largest union dataset to date and provide an analysis of its rich coverage of various consolidation aspects. We then propose a comprehensive evaluation protocol for union generation, including both human and automatic evaluation. Finally, as baselines, we evaluate state-of-the-art language models on the task, along with a detailed analysis of their capacity to address multi-text consolidation challenges and their limitations.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Findings of the Association for Computational Linguistics (ACL 2023)
arXiv:2305.15387 [pdf, other]

cs.CL cs.AI

Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering

Authors: Avi Caciularu, Matthew E. Peters, Jacob Goldberger, Ido Dagan, Arman Cohan

Abstract: The integration of multi-document pre-training objectives into language models has resulted in remarkable improvements in multi-document downstream tasks. In this work, we propose extending this idea by pre-training a generic multi-document model from a novel cross-document question answering pre-training objective. To that end, given a set (or cluster) of topically-related documents, we systematically generate semantically-oriented questions from a salient sentence in one document and challenge the model, during pre-training, to answer these questions while "peeking" into other topically-related documents. In a similar manner, the model is also challenged to recover the sentence from which the question was generated, again while leveraging cross-document information. This novel multi-document QA formulation directs the model to better recover cross-text informational relations, and introduces a natural augmentation that artificially increases the pre-training data. Further, unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation (e.g., QA) and long text generation (e.g., summarization). Following this scheme, we pre-train our model -- termed QAmden -- and evaluate its performance across several multi-document tasks, including multi-document QA, summarization, and query-focused summarization, yielding improvements of up to 7%, and significantly outperforms zero-shot GPT-3.5 and GPT-4.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023; camera-ready version
arXiv:2304.00815 [pdf, other]

cs.CL

Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design

Authors: Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg

Abstract: Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of laymen annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relations senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.

Submitted 3 April, 2023; originally announced April 2023.

Comments: Accepted to TACL, pre-MIT Press publication version
arXiv:2210.13449 [pdf, other]

cs.CL

Controlled Text Reduction

Authors: Aviv Slobodkin, Paul Roit, Eran Hirsch, Ori Ernst, Ido Dagan

Abstract: Producing a reduced version of a source text, as in generic or focused summarization, inherently involves two distinct subtasks: deciding on targeted content and generating a coherent text conveying it. While some popular approaches address summarization as a single end-to-end task, prominent works support decomposed modeling for individual subtasks. Further, semi-automated text reduction is also very appealing, where users may identify targeted content while models would generate a corresponding coherent summary. In this paper, we focus on the second subtask, of generating coherent text given pre-selected content. Concretely, we formalize Controlled Text Reduction as a standalone task, whose input is a source text with marked spans of targeted content ("highlighting"). A model then needs to generate a coherent text that includes all and only the target information. We advocate the potential of such models, both for modular fully-automatic summarization, as well as for semi-automated human-in-the-loop use cases. Facilitating proper research, we crowdsource high-quality dev and test datasets for the task. Further, we automatically generate a larger "silver" training dataset from available summarization benchmarks, leveraging a pretrained summary-source alignment model. Finally, employing these datasets, we present a supervised baseline model, showing promising results and insightful analyses.

Submitted 24 October, 2022; originally announced October 2022.

Comments: Accepted to EMNLP 2022
arXiv:2210.12688 [pdf, other]

cs.CL

How "Multi" is Multi-Document Summarization?

Authors: Ruben Wolhandler, Arie Cattan, Ori Ernst, Ido Dagan

Abstract: The task of multi-document summarization (MDS) aims at models that, given multiple documents as input, are able to generate a summary that combines disperse information, originally spread across these documents. Accordingly, it is expected that both reference summaries in MDS datasets, as well as system summaries, would indeed be based on such dispersed information. In this paper, we argue for quantifying and assessing this expectation. To that end, we propose an automated measure for evaluating the degree to which a summary is "disperse", in the sense of the number of source documents needed to cover its content. We apply our measure to empirically analyze several popular MDS datasets, with respect to their reference summaries, as well as the output of state-of-the-art systems. Our results show that certain MDS datasets barely require combining information from multiple documents, where a single document often covers the full summary content. Overall, we advocate using our metric for assessing and improving the degree to which summarization datasets require combining multi-document information, and similarly how summarization models actually meet this challenge. Our code is available in https://github.com/ariecattan/multi_mds.

Submitted 23 October, 2022; originally announced October 2022.

Comments: EMNLP 2022
arXiv:2210.12654 [pdf, other]

cs.CL

Cross-document Event Coreference Search: Task, Dataset and Modeling

Authors: Alon Eirew, Avi Caciularu, Ido Dagan

Abstract: The task of Cross-document Coreference Resolution has been traditionally formulated as requiring to identify all coreference links across a given set of documents. We propose an appealing, and often more applicable, complementary set up for the task - Cross-document Coreference Search, focusing in this paper on event coreference. Concretely, given a mention in context of an event of interest, considered as a query, the task is to find all coreferring mentions for the query event in a large document collection. To support research on this task, we create a corresponding dataset, which is derived from Wikipedia while leveraging annotations in the available Wikipedia Event Coreference dataset (WEC-Eng). Observing that the coreference search setup is largely analogous to the setting of Open Domain Question Answering, we adapt the prominent Deep Passage Retrieval (DPR) model to our setting, as an appealing baseline. Finally, we present a novel model that integrates a powerful coreference scoring scheme into the DPR architecture, yielding improved performance.

Submitted 23 October, 2022; originally announced October 2022.

Comments: EMNLP 2022
arXiv:2205.11413 [pdf, other]

cs.CL

QASem Parsing: Text-to-text Modeling of QA-based Semantics

Authors: Ayal Klein, Eran Hirsch, Ron Eliav, Valentina Pyatkin, Avi Caciularu, Ido Dagan

Abstract: Several recent works have suggested to represent semantic relations with questions and answers, decomposing textual information into separate interrogative natural language statements. In this paper, we consider three QA-based semantic tasks - namely, QA-SRL, QANom and QADiscourse, each targeting a certain type of predication - and propose to regard them as jointly providing a comprehensive representation of textual information. To promote this goal, we investigate how to best utilize the power of sequence-to-sequence (seq2seq) pre-trained language models, within the unique setup of semi-structured outputs, consisting of an unordered set of question-answer pairs. We examine different input and output linearization strategies, and assess the effect of multitask learning and of simple data augmentation techniques in the setting of imbalanced training data. Consequently, we release the first unified QASem parsing tool, practical for downstream applications who can benefit from an explicit, QA-based account of information units in a text.

Submitted 14 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.
arXiv:2112.08777 [pdf, other]

cs.CL cs.AI

Long Context Question Answering via Supervised Contrastive Learning

Authors: Avi Caciularu, Ido Dagan, Jacob Goldberger, Arman Cohan

Abstract: Long-context question answering (QA) tasks require reasoning over a long document or multiple documents. Addressing these tasks often benefits from identifying a set of evidence spans (e.g., sentences), which provide supporting evidence for answering the question. In this work, we propose a novel method for equipping long-context QA models with an additional sequence-level objective for better identification of the supporting evidence. We achieve this via an additional contrastive supervision signal in finetuning, where the model is encouraged to explicitly discriminate supporting evidence sentences from negative ones by maximizing question-evidence similarity. The proposed additional loss exhibits consistent improvements on three different strong long-context transformer models, across two challenging question answering benchmarks -- HotpotQA and QAsper.

Submitted 5 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: accepted NAACL 2022, main conference
arXiv:2112.08770 [pdf, other]

cs.CL cs.LG

Proposition-Level Clustering for Multi-Document Summarization

Authors: Ori Ernst, Avi Caciularu, Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Jacob Goldberger, Ido Dagan

Abstract: Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition. Particularly, clusters were leveraged to indicate information saliency as well as to avoid redundancy. Such prior methods focused on clustering sentences, even though closely related sentences usually contain also non-aligned parts. In this work, we revisit the clustering approach, grouping together sub-sentential propositions, aiming at more precise information alignment. Specifically, our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster via text fusion. Our summarization method improves over the previous state-of-the-art MDS method in the DUC 2004 and TAC 2011 datasets, both in automatic ROUGE scores and human preference.

Submitted 19 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: NAACL 2022
arXiv:2110.04517 [pdf, other]

cs.CL

Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

Authors: Daniela Brook Weiss, Paul Roit, Ori Ernst, Ido Dagan

Abstract: NLP models that compare or consolidate information across multiple documents often struggle when challenged with recognizing substantial information redundancies across the texts. For example, in multi-document summarization it is crucial to identify salient information across texts and then generate a non-redundant summary, while facing repeated and usually differently-phrased salient content. To facilitate researching such challenges, the sentence-level task of sentence fusion was proposed, yet previous datasets for this task were very limited in their size and scope. In this paper, we revisit and substantially extend previous dataset creation efforts. With careful modifications, relabeling and employing complementing data sources, we were able to triple the size of a notable earlier dataset. Moreover, we show that our extended version uses more representative texts for multi-document tasks and provides a larger and more diverse training set, which substantially improves model training.

Submitted 9 October, 2021; originally announced October 2021.
arXiv:2110.01073 [pdf, other]

cs.CL

Multi-Document Keyphrase Extraction: Dataset, Baselines and Review

Authors: Ori Shapira, Ramakanth Pasunuru, Ido Dagan, Yael Amsterdamer

Abstract: Keyphrase extraction has been extensively researched within the single-document setting, with an abundance of methods, datasets and applications. In contrast, multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents, and its use in summarization. Moreover, no prior dataset exists for multi-document keyphrase extraction, hindering the progress of the task. Recent advances in multi-text processing make the task an even more appealing challenge to pursue. To stimulate this pursuit, we present here the first dataset for the task, MK-DUC-01, which can serve as a new benchmark, and test multiple keyphrase extraction baselines on our data. In addition, we provide a brief, yet comprehensive, literature review of the task.

Submitted 1 July, 2022; v1 submitted 3 October, 2021; originally announced October 2021.
arXiv:2109.12655 [pdf, other]

cs.CL

QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions

Authors: Daniela Brook Weiss, Paul Roit, Ayal Klein, Ori Ernst, Ido Dagan

Abstract: Multi-text applications, such as multi-document summarization, are typically required to model redundancies across related texts. Current methods confronting consolidation struggle to fuse overlapping information. In order to explicitly represent content overlap, we propose to align predicate-argument relations across texts, providing a potential scaffold for information consolidation. We go beyond clustering coreferring mentions, and instead model overlap with respect to redundancy at a propositional level, rather than merely detecting shared referents. Our setting exploits QA-SRL, utilizing question-answer pairs to capture predicate-argument relations, facilitating laymen annotation of cross-text alignments. We employ crowd-workers for constructing a dataset of QA-based alignments, and present a baseline QA alignment model trained over our dataset. Analyses show that our new task is semantically challenging, capturing content overlap beyond lexical similarity and complements cross-document coreference with proposition-level links, offering potential use for downstream tasks.

Submitted 26 September, 2021; originally announced September 2021.

Comments: Accepted to EMNLP 2021, Main Conference
arXiv:2109.11621 [pdf, other]

cs.CL

iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

Authors: Eran Hirsch, Alon Eirew, Ori Shapira, Avi Caciularu, Arie Cattan, Ori Ernst, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Ido Dagan

Abstract: We introduce iFacetSum, a web application for exploring topical document sets. iFacetSum integrates interactive summarization together with faceted search, by providing a novel faceted navigation scheme that yields abstractive summaries for the user's selections. This approach offers both a comprehensive overview as well as concise details regarding subtopics of choice. Fine-grained facets are automatically produced based on cross-document coreference pipelines, rendering generic concepts, entities and statements surfacing in the source texts. We analyze the effectiveness of our application through small-scale user studies, which suggest the usefulness of our approach.

Submitted 23 September, 2021; originally announced September 2021.

Comments: Proceedings of EMNLP 2021, System Demonstrations. 7 pages and an appendix
arXiv:2109.04832 [pdf, other]

cs.CL

Asking It All: Generating Contextualized Questions for any Semantic Role

Authors: Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan

Abstract: Asking questions about a situation is an inherent step towards understanding it. To this end, we introduce the task of role question generation, which, given a predicate mention and a passage, requires producing a set of questions asking about all possible semantic roles of the predicate. We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage. Unlike most existing approaches to question generation, our approach does not require conditioning on existing answers in the text. Instead, we condition on the type of information to inquire about, regardless of whether the answer appears explicitly in the text, could be inferred from it, or should be sought elsewhere. Our evaluation demonstrates that we generate diverse and well-formed questions for a large, broad-coverage ontology of predicates and roles.

Submitted 10 September, 2021; originally announced September 2021.

Comments: Accepted as a long paper to EMNLP 2021, Main Conference
arXiv:2106.04192 [pdf, other]

cs.CL

Realistic Evaluation Principles for Cross-document Coreference Resolution

Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

Abstract: We point out that common evaluation practices for cross-document coreference resolution have been unrealistically permissive in their assumed settings, yielding inflated results. We propose addressing this issue via two evaluation methodology principles. First, as in other tasks, models should be evaluated on predicted mentions rather than on gold mentions. Doing this raises a subtle issue regarding singleton coreference clusters, which we address by decoupling the evaluation of mention detection from that of coreference linking. Second, we argue that models should not exploit the synthetic topic structure of the standard ECB+ dataset, forcing models to confront the lexical ambiguity challenge, as intended by the dataset creators. We demonstrate empirically the drastic impact of our more realistic evaluation principles on a competitive model, yielding a score which is 33 F1 lower compared to evaluating by prior lenient practices.

Submitted 8 June, 2021; originally announced June 2021.

Comments: *SEM 2021
arXiv:2106.02954 [pdf, other]

cs.CL cs.LG

Denoising Word Embeddings by Averaging in a Shared Space

Authors: Avi Caciularu, Ido Dagan, Jacob Goldberger

Abstract: We introduce a new approach for smoothing and improving the quality of word embeddings. We consider a method of fusing word embeddings that were trained on the same corpus but with different initializations. We project all the models to a shared vector space using an efficient implementation of the Generalized Procrustes Analysis (GPA) procedure, previously used in multilingual word translation. Our word representation demonstrates consistent improvements over the raw models as well as their simplistic average, on a range of tasks. As the new representations are more stable and reliable, there is a noticeable improvement in rare word evaluations.

Submitted 5 June, 2021; originally announced June 2021.

Comments: Accepted to *SEM 2021
arXiv:2106.01210 [pdf, other]

cs.CL

Cross-document Coreference Resolution over Predicted Mentions

Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

Abstract: Coreference resolution has been mostly investigated within a single document scope, showing impressive progress in recent years based on end-to-end models. However, the more challenging task of cross-document (CD) coreference resolution remained relatively under-explored, with the few recent models applied only to gold mentions. Here, we introduce the first end-to-end model for CD coreference resolution from raw text, which extends the prominent model for within-document coreference to the CD setting. Our model achieves competitive results for event and entity coreference resolution on gold mentions. More importantly, we set first baseline results, on the standard ECB+ dataset, for CD coreference resolution over predicted mentions. Further, our model is simpler and more efficient than recent CD coreference resolution systems, while not using any external resources.

Submitted 2 June, 2021; originally announced June 2021.

Comments: Findings of ACL 2021
arXiv:2104.08809 [pdf, other]

cs.CL cs.IR cs.LG

SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts

Authors: Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope

Abstract: Determining coreference of concept mentions across multiple documents is a fundamental task in natural language understanding. Previous work on cross-document coreference resolution (CDCR) typically considers mentions of events in the news, which seldom involve abstract technical concepts that are prevalent in science and technology. These complex concepts take diverse or ambiguous forms and have many hierarchical levels of granularity (e.g., tasks and subtasks), posing challenges for CDCR. We present a new task of Hierarchical CDCR (H-CDCR) with the goal of jointly inferring coreference clusters and hierarchy between them. We create SciCo, an expert-annotated dataset for H-CDCR in scientific papers, 3X larger than the prominent ECB+ resource. We study strong baseline models that we customize for H-CDCR, and highlight challenges for future work.

Submitted 1 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

Comments: Accepted to AKBC 2021. Data and code available at https://scico.apps.allenai.org/
arXiv:2104.08481 [pdf, other]

cs.CL

Revisiting Few-shot Relation Classification: Evaluation Data and Classification Schemes

Authors: Ofer Sabo, Yanai Elazar, Yoav Goldberg, Ido Dagan

Abstract: We explore Few-Shot Learning (FSL) for Relation Classification (RC). Focusing on the realistic scenario of FSL, in which a test instance might not belong to any of the target categories (none-of-the-above, aka NOTA), we first revisit the recent popular dataset structure for FSL, pointing out its unrealistic data distribution. To remedy this, we propose a novel methodology for deriving more realistic few-shot test data from available datasets for supervised RC, and apply it to the TACRED dataset. This yields a new challenging benchmark for FSL RC, on which state of the art models show poor performance. Next, we analyze classification schemes within the popular embedding-based nearest-neighbor approach for FSL, with respect to constraints they impose on the embedding space. Triggered by this analysis we propose a novel classification scheme, in which the NOTA category is represented as learned vectors, shown empirically to be an appealing option for FSL.

Submitted 17 April, 2021; originally announced April 2021.

Comments: Accepted to TACL 2021
arXiv:2104.05022 [pdf, other]

cs.CL

WEC: Deriving a Large-scale Cross-document Event Coreference dataset from Wikipedia

Authors: Alon Eirew, Arie Cattan, Ido Dagan

Abstract: Cross-document event coreference resolution is a foundational task for NLP applications involving multi-text processing. However, existing corpora for this task are scarce and relatively small, while annotating only modest-size clusters of documents belonging to the same topic. To complement these resources and enhance future research, we present Wikipedia Event Coreference (WEC), an efficient methodology for gathering a large-scale dataset for cross-document event coreference from Wikipedia, where coreference links are not restricted within predefined topics. We apply this methodology to the English Wikipedia and extract our large-scale WEC-Eng dataset. Notably, our dataset creation method is generic and can be applied with relatively little effort to other Wikipedia languages. To set baseline results, we develop an algorithm that adapts components of state-of-the-art models for within-document coreference resolution to the cross-document setting. Our model is suitably efficient and outperforms previously published state-of-the-art results for the task.

Submitted 30 April, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

Comments: NAACL 2021
arXiv:2101.12637 [pdf, other]

cs.CL

CD2CR: Co-reference Resolution Across Documents and Domains

Authors: James Ravenscroft, Arie Cattan, Amanda Clare, Ido Dagan, Maria Liakata

Abstract: Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents. Current state-of-the-art models for this task assume that all documents are of the same type (e.g. news articles) or fall under the same theme. However, it is also desirable to perform CDCR across different domains (type or theme). A particular use case we focus on in this paper is the resolution of entities mentioned across scientific work and newspaper articles that discuss them. Identifying the same entities and corresponding concepts in both scientific articles and news can help scientists understand how their work is represented in mainstream media. We propose a new task and English language dataset for cross-document cross-domain co-reference resolution (CD$^2$CR). The task aims to identify links between entities across heterogeneous document types. We show that in this cross-domain, cross-document setting, existing CDCR models do not perform well and we provide a baseline model that outperforms current state-of-the-art CDCR models on CD$^2$CR. Our data set, annotation tool and guidelines as well as our model for cross-document cross-domain co-reference are all supplied as open access open source resources.

Submitted 29 January, 2021; originally announced January 2021.

Comments: 9 pages, 5 figures, accepted at EACL 2021

ACM Class: I.2.7
arXiv:2101.00406 [pdf, other]

cs.CL

CDLM: Cross-Document Language Modeling

Authors: Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan

Abstract: We introduce a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective. First, instead of considering documents in isolation, we pretrain over sets of multiple related documents, encouraging the model to learn cross-document relationships. Second, we improve over recent long-range transformers by introducing dynamic global attention that has access to the entire input to predict masked tokens. We release CDLM (Cross-Document Language Model), a new general language model for multi-document setting that can be easily applied to downstream tasks. Our extensive analysis shows that both ideas are essential for the success of CDLM, and work in synergy to set new state-of-the-art results for several multi-text tasks. Code and models are available at https://github.com/aviclu/CDLM.

Submitted 2 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

Comments: EMNLP 2021, findings
arXiv:2010.02815 [pdf, other]

cs.CL

QADiscourse -- Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

Authors: Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan

Abstract: Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding. However, annotating discourse relations typically requires expert annotators. Recently, different semantic aspects of a sentence have been represented and crowd-sourced via question-and-answer (QA) pairs. This paper proposes a novel representation of discourse relations as QA pairs, which in turn allows us to crowd-source wide-coverage data annotated with discourse relations, via an intuitively appealing interface for composing such questions and answers. Based on our proposed representation, we collect a novel and wide-coverage QADiscourse dataset, and present baseline algorithms for predicting QADiscourse relations.

Submitted 6 October, 2020; originally announced October 2020.

Comments: To appear at EMNLP 2020
arXiv:2010.02588 [pdf, other]

cs.CL

CoRefi: A Crowd Sourcing Suite for Coreference Annotation

Authors: Aaron Bornstein, Arie Cattan, Ido Dagan

Abstract: Coreference annotation is an important, yet expensive and time-consuming, task, which often involves expert annotators trained on complex decision guidelines. To enable cheaper and more efficient annotation, we present CoRefi, a web-based coreference annotation suite, oriented for crowdsourcing. Beyond the core coreference annotation tool, CoRefi provides guided onboarding for the task as well as a novel algorithm for a reviewing phase. CoRefi is open source and directly embeds into any website, including popular crowdsourcing platforms. CoRefi Demo: aka.ms/corefi Video Tour: aka.ms/corefivideo Github Repo: https://github.com/aribornstein/corefi

Submitted 6 October, 2020; originally announced October 2020.

Comments: EMNLP 2020 system demonstration paper
arXiv:2009.11032 [pdf, other]

cs.CL

Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling

Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

Abstract: Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient, leading to incomparable results across works and overestimation of performance. To facilitate proper future research on this task, our primary contribution is proposing a pragmatic evaluation methodology which assumes access to only raw text -- rather than assuming gold mentions, disregards singleton prediction, and addresses typical targeted settings in CD coreference resolution. Aiming to set baseline results for future research that would follow our evaluation methodology, we build the first end-to-end model for this task. Our model adapts and extends recent neural models for within-document coreference resolution to address the CD coreference setting, which outperforms state-of-the-art results by a significant margin.

Submitted 23 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.
arXiv:2009.08380 [pdf, other]

cs.CL

Evaluating Interactive Summarization: an Expansion-Based Framework

Authors: Ori Shapira, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Yael Amsterdamer, Ido Dagan

Abstract: Allowing users to interact with multi-document summarizers is a promising direction towards improving and customizing summary results. Different ideas for interactive summarization have been proposed in previous work but these solutions are highly divergent and incomparable. In this paper, we develop an end-to-end evaluation framework for expansion-based interactive summarization, which considers the accumulating information along an interactive session. Our framework includes a procedure of collecting real user sessions and evaluation measures relying on standards, but adapted to reflect interaction. All of our solutions are intended to be released publicly as a benchmark, allowing comparison of future developments in interactive summarization. We demonstrate the use of our framework by evaluating and comparing baseline implementations that we developed for this purpose, which will serve as part of our benchmark. Our extensive experimentation and analysis of these systems motivate our design choices and support the viability of our framework.

Submitted 17 September, 2020; originally announced September 2020.
arXiv:2009.00590 [pdf, other]

cs.CL

Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline

Authors: Ori Ernst, Ori Shapira, Ramakanth Pasunuru, Michael Lepioshkin, Jacob Goldberger, Mohit Bansal, Ido Dagan

Abstract: Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its assessed utility, the alignment step was mostly approached with heuristic unsupervised methods, typically ROUGE-based, and was never independently optimized or evaluated. In this paper, we propose establishing summary-source alignment as an explicit task, while introducing two major novelties: (1) applying it at the more accurate proposition span level, and (2) approaching it as a supervised classification task. To that end, we created a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. In addition, we crowdsourced dev and test datasets, enabling model development and proper evaluation. Utilizing these data, we present a supervised proposition alignment baseline model, showing improved alignment-quality over the unsupervised approach.

Submitted 22 September, 2021; v1 submitted 1 September, 2020; originally announced September 2020.

Comments: CoNLL 2021
arXiv:2004.14979 [pdf, other]

cs.CL

Paraphrasing vs Coreferring: Two Sides of the Same Coin

Authors: Yehudit Meged, Avi Caciularu, Vered Shwartz, Ido Dagan

Abstract: We study the potential synergy between two different NLP tasks, both confronting predicate lexical variability: identifying predicate paraphrases, and event coreference resolution. First, we used annotations from an event coreference dataset as distant supervision to re-score heuristically-extracted predicate paraphrases. The new scoring gained more than 18 points in average precision upon their ranking by the original scoring method. Then, we used the same re-ranking features as additional inputs to a state-of-the-art event coreference resolution model, which yielded modest but consistent improvements to the model's performance. The results suggest a promising direction to leverage data and models for each of the tasks to the benefit of the other.

Submitted 9 October, 2020; v1 submitted 30 April, 2020; originally announced April 2020.
arXiv:1911.03243 [pdf, ps, other]

cs.CL

Controlled Crowdsourcing for High-Quality QA-SRL Annotation

Authors: Paul Roit, Ayal Klein, Daniela Stepanov, Jonathan Mamou, Julian Michael, Gabriel Stanovsky, Luke Zettlemoyer, Ido Dagan

Abstract: Question-answer driven Semantic Role Labeling (QA-SRL) was proposed as an attractive open and natural flavour of SRL, potentially attainable from laymen. Recently, a large-scale crowdsourced QA-SRL corpus and a trained parser were released. Trying to replicate the QA-SRL annotation for new texts, we found that the resulting annotations were lacking in quality, particularly in coverage, making them insufficient for further research and evaluation. In this paper, we present an improved crowdsourcing protocol for complex semantic annotation, involving worker selection and training, and a data consolidation phase. Applying this protocol to QA-SRL yielded high-quality annotation with drastically higher coverage, producing a new gold evaluation dataset. We believe that our annotation protocol and gold standard will facilitate future replicable research of natural semantic annotations.

Submitted 13 May, 2020; v1 submitted 8 November, 2019; originally announced November 2019.
arXiv:1910.09302 [pdf, other]

cs.CL

Diversify Your Datasets: Analyzing Generalization via Controlled Variance in Adversarial Datasets

Authors: Ohad Rozen, Vered Shwartz, Roee Aharoni, Ido Dagan

Abstract: Phenomenon-specific "adversarial" datasets have been recently designed to perform targeted stress-tests for particular inference types. Recent work (Liu et al., 2019a) proposed that such datasets can be utilized for training NLI and other types of models, often allowing to learn the phenomenon in focus and improve on the challenge dataset, indicating a "blind spot" in the original training data. Yet, although a model can improve in such a training process, it might still be vulnerable to other challenge datasets targeting the same phenomenon but drawn from a different distribution, such as having a different syntactic complexity level. In this work, we extend this method to drive conclusions about a model's ability to learn and generalize a target phenomenon rather than to "learn" a dataset, by controlling additional aspects in the adversarial datasets. We demonstrate our approach on two inference phenomena - dative alternation and numerical reasoning, elaborating, and in some cases contradicting, the results of Liu et al. Our methodology enables building better challenge datasets for creating more robust models, and may yield better model understanding and subsequent overarching improvements.

Submitted 21 October, 2019; originally announced October 2019.

Comments: CoNLL 2019
arXiv:1909.09986 [pdf, other]

cs.CL

Improving Quality and Efficiency in Plan-based Neural Data-to-Text Generation

Authors: Amit Moryossef, Ido Dagan, Yoav Goldberg

Abstract: We follow the step-by-step approach to neural data-to-text generation we proposed in Moryossef et al (2019), in which the generation process is divided into a text-planning stage followed by a plan-realization stage. We suggest four extensions to that framework: (1) we introduce a trainable neural planning component that can generate effective plans several orders of magnitude faster than the original planner; (2) we incorporate typing hints that improve the model's ability to deal with unseen relations and entities; (3) we introduce a verification-by-reranking stage that substantially improves the faithfulness of the resulting texts; (4) we incorporate a simple but effective referring expression generation module. These extensions result in a generation process that is faster, more fluent, and more accurate.

Submitted 22 September, 2019; originally announced September 2019.

Comments: 5 pages, INLG-2019
arXiv:1909.05608 [pdf, other]

cs.CL cs.AI

ABSApp: A Portable Weakly-Supervised Aspect-Based Sentiment Extraction System

Authors: Oren Pereg, Daniel Korat, Moshe Wasserblat, Jonathan Mamou, Ido Dagan

Abstract: We present ABSApp, a portable system for weakly-supervised aspect-based sentiment extraction. The system is interpretable and user friendly and does not require labeled training data, hence can be rapidly and cost-effectively used across different domains in applied setups. The system flow includes three stages: First, it generates domain-specific aspect and opinion lexicons based on an unlabeled dataset; second, it enables the user to view and edit those lexicons (weak supervision); and finally, it enables the user to select an unlabeled target dataset from the same domain, classify it, and generate an aspect-based sentiment report. ABSApp has been successfully used in a number of real-life use cases, among them movie review analysis and convention impact analysis.

Submitted 12 September, 2019; originally announced September 2019.

Comments: 6 pages, demo paper at EMNLP 2019
arXiv:1909.01214 [pdf, other]

cs.CL

Better Rewards Yield Better Summaries: Learning to Summarise Without References

Authors: Florian Böhm, Yang Gao, Christian M. Meyer, Ori Shapira, Ido Dagan, Iryna Gurevych

Abstract: Reinforcement Learning (RL) based document summarisation systems yield state-of-the-art performance in terms of ROUGE scores, because they directly use ROUGE as the rewards during training. However, summaries with high ROUGE scores often receive low human judgement. To find a better reward function that can guide RL to generate human-appealing summaries, we learn a reward function from human ratings on 2,500 summaries. Our reward function only takes the document and system summary as input. Hence, once trained, it can be used to train RL-based summarisation systems without using any reference summaries. We show that our learned rewards have significantly higher correlation with human ratings than previous approaches. Human evaluation experiments show that, compared to the state-of-the-art supervised-learning systems and ROUGE-as-rewards RL summarisation systems, the RL systems using our learned rewards during training generate summaries with higher human ratings. The learned reward function and our source code are available at https://github.com/yg211/summary-reward-no-reference.

Submitted 3 September, 2019; originally announced September 2019.

Comments: Accepted to EMNLP2019
arXiv:1906.01753 [pdf, other]

cs.CL

Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution

Authors: Shany Barhom, Vered Shwartz, Alon Eirew, Michael Bugert, Nils Reimers, Ido Dagan

Abstract: Recognizing coreferring events and entities across multiple texts is crucial for many NLP applications. Despite the task's importance, research focus was given mostly to within-document entity coreference, with rather little attention to the other variants. We propose a neural architecture for cross-document coreference resolution. Inspired by Lee et al (2012), we jointly model entity and event coreference. We represent an event (entity) mention using its lexical span, surrounding context, and relation to entity (event) mentions via predicate-arguments structures. Our model outperforms the previous state-of-the-art event coreference model on ECB+, while providing the first entity coreference results on this corpus. Our analysis confirms that all our representation elements, including the mention span itself, its context, and the relation to other mentions contribute to the model's success.

Submitted 4 June, 2019; originally announced June 2019.

Comments: ACL 2019
arXiv:1904.05929 [pdf, ps, other]

cs.CL

Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation

Authors: Ori Shapira, David Gabay, Yang Gao, Hadar Ronen, Ramakanth Pasunuru, Mohit Bansal, Yael Amsterdamer, Ido Dagan

Abstract: Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.

Submitted 11 April, 2019; originally announced April 2019.

Comments: 5 pages, 2 graphs, 1 table. Published in NAACL 2019
