
Showing 1–50 of 72 results for author: Dagan, I

  1. arXiv:2408.04246  [pdf, other]

    cs.CL

    Explicating the Implicit: Argument Detection Beyond Sentence Boundaries

    Authors: Paul Roit, Aviv Slobodkin, Eran Hirsch, Arie Cattan, Ayal Klein, Valentina Pyatkin, Ido Dagan

    Abstract: Detecting semantic arguments of a predicate word has been conventionally modeled as a sentence-level task. The typical reader, however, perfectly interprets predicate-argument relations in a much wider context than just the sentence where the predicate was evoked. In this work, we reformulate the problem of argument detection through textual entailment to capture semantic relations across sentence…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 9 pages, ACL 2024

  2. arXiv:2407.00402  [pdf, other]

    cs.CL cs.AI

    Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP

    Authors: Omer Goldman, Alon Jacovi, Aviv Slobodkin, Aviya Maimon, Ido Dagan, Reut Tsarfaty

    Abstract: Improvements in language models' capabilities have pushed their applications towards longer contexts, making long-context evaluation and development an active research area. However, many disparate use-cases are grouped together under the umbrella term of "long-context", defined simply by the total length of the model's input, including - for example - Needle-in-a-Haystack tasks, book summarizatio…

    Submitted 11 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2406.14314  [pdf, other]

    cs.CL cs.AI

    Identifying User Goals from UI Trajectories

    Authors: Omri Berkovitch, Sapir Caduri, Noam Kahlon, Anatoly Efros, Avi Caciularu, Ido Dagan

    Abstract: Autonomous agents that interact with graphical user interfaces (GUIs) hold significant potential for enhancing user experiences. To further improve these experiences, agents need to be personalized and proactive. By effectively comprehending user intentions through their actions and interactions with GUIs, agents will be better positioned to achieve these goals. This paper introduces the task of g…

    Submitted 30 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.00842  [pdf, other]

    cs.CL

    The Power of Summary-Source Alignments

    Authors: Ori Ernst, Ori Shapira, Aviv Slobodkin, Sharon Adar, Mohit Bansal, Jacob Goldberger, Ran Levy, Ido Dagan

    Abstract: Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection, followed by text generation. In this context, alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data for some of the component tasks. Yet, this enabling alignment step has usually been applied he…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  5. arXiv:2405.20967  [pdf, other]

    cs.CL

    Superlatives in Context: Explicit and Implicit Domain Restrictions for Superlative Frames

    Authors: Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty

    Abstract: Superlatives are used to single out elements with a maximal/minimal property. Semantically, superlatives perform a set comparison: something (or some things) has the min/max property out of a set. As such, superlatives provide an ideal phenomenon for studying implicit phenomena and discourse restrictions. While this comparison set is often not explicitly defined, its (implicit) restrictions can be…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 11 pages

  6. arXiv:2405.12081  [pdf, other]

    cs.CL

    Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model

    Authors: Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Ido Dagan

    Abstract: To obtain high-quality annotations under limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotations to improve the model predictive ability (i.e., triage-to-human data),…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures

  7. arXiv:2405.01121  [pdf, other]

    cs.CL cs.AI

    Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts

    Authors: Lotem Golany, Filippo Galgani, Maya Mamo, Nimrod Parasol, Omer Vandsburger, Nadav Bar, Ido Dagan

    Abstract: Automating data generation with Large Language Models (LLMs) has become increasingly popular. In this work, we investigate the feasibility and effectiveness of LLM-based data generation in the challenging setting of source-grounded information-seeking dialogs, with response attribution, over long documents. Our source texts consist of long and noisy meeting transcripts, adding to the task complexi…

    Submitted 21 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  8. arXiv:2403.17104  [pdf, other]

    cs.CL

    Attribute First, then Generate: Locally-attributable Grounded Text Generation

    Authors: Aviv Slobodkin, Eran Hirsch, Arie Cattan, Tal Schuster, Ido Dagan

    Abstract: Recent efforts to address hallucinations in Large Language Models (LLMs) have focused on attributed text generation, which supplements generated texts with citations of supporting sources for post-generation fact-checking and corrections. Yet, these citations often point to entire documents or paragraphs, burdening users with extensive verification work. In this paper, we introduce a locally-attri…

    Submitted 4 July, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: ACL 2024

  9. arXiv:2403.15351  [pdf, other]

    cs.CL

    Multi-Review Fusion-in-Context

    Authors: Aviv Slobodkin, Ori Shapira, Ran Levy, Ido Dagan

    Abstract: Grounded text generation, encompassing tasks such as long-form question-answering and summarization, necessitates both content selection and content consolidation. Current end-to-end methods are difficult to control and interpret due to their opaqueness. Accordingly, recent works have proposed a modular approach, with separate components for each step. Specifically, we focus on the second subtask,…

    Submitted 31 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: NAACL 2024, findings

  10. arXiv:2312.04440  [pdf, other]

    cs.CL

    OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization

    Authors: Shmuel Amar, Liat Schiff, Ori Ernst, Asi Shefer, Ori Shapira, Ido Dagan

    Abstract: The performance of automatic summarization models has improved dramatically in recent years. Yet, there is still a gap in meeting specific information needs of users in real-world scenarios, particularly when a targeted summary is sought, such as in the useful aspect-based summarization setting targeted in this paper. Previous datasets and studies for this setting have predominantly concentrated o…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: EMNLP 2023

  11. arXiv:2311.11301  [pdf, other]

    cs.CL

    CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies

    Authors: Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan

    Abstract: Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, annotating event and subevent relations, etc. To enable efficient annotation of such hierarchical structures, we release CHAMP, an open source tool allowing to incrementally construct both cl…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  12. arXiv:2310.13682  [pdf, other]

    cs.CL cs.AI cs.LG

    Optimizing Retrieval-augmented Reader Models via Token Elimination

    Authors: Moshe Berchansky, Peter Izsak, Avi Caciularu, Ido Dagan, Moshe Wasserblat

    Abstract: Fusion-in-Decoder (FiD) is an effective retrieval-augmented language model applied across a variety of open-domain tasks, such as question answering, fact checking, etc. In FiD, supporting passages are first retrieved and then processed using a generative model (Reader), which can cause a significant bottleneck in decoding time, particularly with long outputs. In this work, we analyze the contribu…

    Submitted 5 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Main Conference

  13. arXiv:2310.11877  [pdf, other]

    cs.CL

    The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

    Authors: Aviv Slobodkin, Omer Goldman, Avi Caciularu, Ido Dagan, Shauli Ravfogel

    Abstract: Large language models (LLMs) have been shown to possess impressive capabilities, while also raising crucial concerns about the faithfulness of their responses. A primary issue arising in this context is the management of (un)answerable queries by LLMs, which often results in hallucinatory behavior due to overconfidence. In this paper, we explore the behavior of LLMs when presented with (un)answera…

    Submitted 12 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  14. arXiv:2310.09017  [pdf, other]

    cs.CL

    Dont Add, dont Miss: Effective Content Preserving Generation from Pre-Selected Text Spans

    Authors: Aviv Slobodkin, Avi Caciularu, Eran Hirsch, Ido Dagan

    Abstract: The recently introduced Controlled Text Reduction (CTR) task isolates the text generation step within typical summarization-style tasks. It does so by challenging models to generate coherent text conforming to pre-selected content within the input text ("highlights"). This framing enables increased modularity in summarization-like tasks, allowing to couple a single CTR model with various content…

    Submitted 25 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023, findings

  15. arXiv:2308.08363  [pdf, other]

    cs.CL

    SummHelper: Collaborative Human-Computer Summarization

    Authors: Aviv Slobodkin, Niv Nachum, Shmuel Amar, Ori Shapira, Ido Dagan

    Abstract: Current approaches for text summarization are predominantly automatic, with rather limited space for human intervention and control over the process. In this paper, we introduce SummHelper, a 2-phase summarization assistant designed to foster human-machine collaboration. The initial phase involves content selection, where the system recommends potential content, allowing users to accept, modify, o…

    Submitted 16 October, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: EMNLP 2023 System Demonstrations

  16. arXiv:2305.15605  [pdf, other]

    cs.CL

    Revisiting Sentence Union Generation as a Testbed for Text Consolidation

    Authors: Eran Hirsch, Valentina Pyatkin, Ruben Wolhandler, Avi Caciularu, Asi Shefer, Ido Dagan

    Abstract: Tasks involving text generation based on multiple input texts, such as multi-document summarization, long-form question answering and contemporary dialogue applications, challenge models for their ability to properly consolidate partly-overlapping multi-text information. However, these tasks entangle the consolidation phase with the often subjective and ill-defined content selection requirement, i…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Findings of the Association for Computational Linguistics (ACL 2023)

  17. arXiv:2305.15387  [pdf, other]

    cs.CL cs.AI

    Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering

    Authors: Avi Caciularu, Matthew E. Peters, Jacob Goldberger, Ido Dagan, Arman Cohan

    Abstract: The integration of multi-document pre-training objectives into language models has resulted in remarkable improvements in multi-document downstream tasks. In this work, we propose extending this idea by pre-training a generic multi-document model from a novel cross-document question answering pre-training objective. To that end, given a set (or cluster) of topically-related documents, we systemati…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023; camera-ready version

  18. arXiv:2304.00815  [pdf, other]

    cs.CL

    Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design

    Authors: Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg

    Abstract: Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of laymen annotators. For this purp…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted to TACL, pre-MIT Press publication version

  19. arXiv:2210.13449  [pdf, other]

    cs.CL

    Controlled Text Reduction

    Authors: Aviv Slobodkin, Paul Roit, Eran Hirsch, Ori Ernst, Ido Dagan

    Abstract: Producing a reduced version of a source text, as in generic or focused summarization, inherently involves two distinct subtasks: deciding on targeted content and generating a coherent text conveying it. While some popular approaches address summarization as a single end-to-end task, prominent works support decomposed modeling for individual subtasks. Further, semi-automated text reduction is also…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022

  20. arXiv:2210.12688  [pdf, other]

    cs.CL

    How "Multi" is Multi-Document Summarization?

    Authors: Ruben Wolhandler, Arie Cattan, Ori Ernst, Ido Dagan

    Abstract: The task of multi-document summarization (MDS) aims at models that, given multiple documents as input, are able to generate a summary that combines disperse information, originally spread across these documents. Accordingly, it is expected that both reference summaries in MDS datasets, as well as system summaries, would indeed be based on such dispersed information. In this paper, we argue for qua…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  21. arXiv:2210.12654  [pdf, other]

    cs.CL

    Cross-document Event Coreference Search: Task, Dataset and Modeling

    Authors: Alon Eirew, Avi Caciularu, Ido Dagan

    Abstract: The task of Cross-document Coreference Resolution has been traditionally formulated as requiring to identify all coreference links across a given set of documents. We propose an appealing, and often more applicable, complementary set up for the task - Cross-document Coreference Search, focusing in this paper on event coreference. Concretely, given a mention in context of an event of interest, cons…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  22. arXiv:2205.11413  [pdf, other]

    cs.CL

    QASem Parsing: Text-to-text Modeling of QA-based Semantics

    Authors: Ayal Klein, Eran Hirsch, Ron Eliav, Valentina Pyatkin, Avi Caciularu, Ido Dagan

    Abstract: Several recent works have suggested to represent semantic relations with questions and answers, decomposing textual information into separate interrogative natural language statements. In this paper, we consider three QA-based semantic tasks - namely, QA-SRL, QANom and QADiscourse, each targeting a certain type of predication - and propose to regard them as jointly providing a comprehensive repres…

    Submitted 14 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

  23. arXiv:2112.08777  [pdf, other]

    cs.CL cs.AI

    Long Context Question Answering via Supervised Contrastive Learning

    Authors: Avi Caciularu, Ido Dagan, Jacob Goldberger, Arman Cohan

    Abstract: Long-context question answering (QA) tasks require reasoning over a long document or multiple documents. Addressing these tasks often benefits from identifying a set of evidence spans (e.g., sentences), which provide supporting evidence for answering the question. In this work, we propose a novel method for equipping long-context QA models with an additional sequence-level objective for better ide…

    Submitted 5 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: accepted NAACL 2022, main conference

  24. arXiv:2112.08770  [pdf, other]

    cs.CL cs.LG

    Proposition-Level Clustering for Multi-Document Summarization

    Authors: Ori Ernst, Avi Caciularu, Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Jacob Goldberger, Ido Dagan

    Abstract: Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition. Particularly, clusters were leveraged to indicate information saliency as well as to avoid redundancy. Such prior methods focused on clustering sentences, even though closely related sentences usually contain also non-aligned parts. In this…

    Submitted 19 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  25. arXiv:2110.04517  [pdf, other]

    cs.CL

    Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

    Authors: Daniela Brook Weiss, Paul Roit, Ori Ernst, Ido Dagan

    Abstract: NLP models that compare or consolidate information across multiple documents often struggle when challenged with recognizing substantial information redundancies across the texts. For example, in multi-document summarization it is crucial to identify salient information across texts and then generate a non-redundant summary, while facing repeated and usually differently-phrased salient content. To…

    Submitted 9 October, 2021; originally announced October 2021.

  26. arXiv:2110.01073  [pdf, other]

    cs.CL

    Multi-Document Keyphrase Extraction: Dataset, Baselines and Review

    Authors: Ori Shapira, Ramakanth Pasunuru, Ido Dagan, Yael Amsterdamer

    Abstract: Keyphrase extraction has been extensively researched within the single-document setting, with an abundance of methods, datasets and applications. In contrast, multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents, and its use in summarization. Moreover, no prior dataset exists for multi-document keyphrase extraction, hindering the p…

    Submitted 1 July, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

  27. arXiv:2109.12655  [pdf, other]

    cs.CL

    QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions

    Authors: Daniela Brook Weiss, Paul Roit, Ayal Klein, Ori Ernst, Ido Dagan

    Abstract: Multi-text applications, such as multi-document summarization, are typically required to model redundancies across related texts. Current methods confronting consolidation struggle to fuse overlapping information. In order to explicitly represent content overlap, we propose to align predicate-argument relations across texts, providing a potential scaffold for information consolidation. We go beyon…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP 2021, Main Conference

  28. arXiv:2109.11621  [pdf, other]

    cs.CL

    iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration

    Authors: Eran Hirsch, Alon Eirew, Ori Shapira, Avi Caciularu, Arie Cattan, Ori Ernst, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Ido Dagan

    Abstract: We introduce iFacetSum, a web application for exploring topical document sets. iFacetSum integrates interactive summarization together with faceted search, by providing a novel faceted navigation scheme that yields abstractive summaries for the user's selections. This approach offers both a comprehensive overview as well as concise details regarding subtopics of choice. Fine-grained facets are aut…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: Proceedings of EMNLP 2021, System Demonstrations. 7 pages and an appendix

  29. arXiv:2109.04832  [pdf, other]

    cs.CL

    Asking It All: Generating Contextualized Questions for any Semantic Role

    Authors: Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan

    Abstract: Asking questions about a situation is an inherent step towards understanding it. To this end, we introduce the task of role question generation, which, given a predicate mention and a passage, requires producing a set of questions asking about all possible semantic roles of the predicate. We develop a two-stage model for this task, which first produces a context-independent question prototype for…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: Accepted as a long paper to EMNLP 2021, Main Conference

  30. arXiv:2106.04192  [pdf, other]

    cs.CL

    Realistic Evaluation Principles for Cross-document Coreference Resolution

    Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

    Abstract: We point out that common evaluation practices for cross-document coreference resolution have been unrealistically permissive in their assumed settings, yielding inflated results. We propose addressing this issue via two evaluation methodology principles. First, as in other tasks, models should be evaluated on predicted mentions rather than on gold mentions. Doing this raises a subtle issue regardi…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: *SEM 2021

  31. arXiv:2106.02954  [pdf, other]

    cs.CL cs.LG

    Denoising Word Embeddings by Averaging in a Shared Space

    Authors: Avi Caciularu, Ido Dagan, Jacob Goldberger

    Abstract: We introduce a new approach for smoothing and improving the quality of word embeddings. We consider a method of fusing word embeddings that were trained on the same corpus but with different initializations. We project all the models to a shared vector space using an efficient implementation of the Generalized Procrustes Analysis (GPA) procedure, previously used in multilingual word translation. O…

    Submitted 5 June, 2021; originally announced June 2021.

    Comments: Accepted to *SEM 2021

  32. arXiv:2106.01210  [pdf, other]

    cs.CL

    Cross-document Coreference Resolution over Predicted Mentions

    Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

    Abstract: Coreference resolution has been mostly investigated within a single document scope, showing impressive progress in recent years based on end-to-end models. However, the more challenging task of cross-document (CD) coreference resolution remained relatively under-explored, with the few recent models applied only to gold mentions. Here, we introduce the first end-to-end model for CD coreference reso…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: Findings of ACL 2021

  33. arXiv:2104.08809  [pdf, other]

    cs.CL cs.IR cs.LG

    SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts

    Authors: Arie Cattan, Sophie Johnson, Daniel Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope

    Abstract: Determining coreference of concept mentions across multiple documents is a fundamental task in natural language understanding. Previous work on cross-document coreference resolution (CDCR) typically considers mentions of events in the news, which seldom involve abstract technical concepts that are prevalent in science and technology. These complex concepts take diverse or ambiguous forms and have…

    Submitted 1 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: Accepted to AKBC 2021. Data and code available at https://scico.apps.allenai.org/

  34. arXiv:2104.08481  [pdf, other]

    cs.CL

    Revisiting Few-shot Relation Classification: Evaluation Data and Classification Schemes

    Authors: Ofer Sabo, Yanai Elazar, Yoav Goldberg, Ido Dagan

    Abstract: We explore Few-Shot Learning (FSL) for Relation Classification (RC). Focusing on the realistic scenario of FSL, in which a test instance might not belong to any of the target categories (none-of-the-above, aka NOTA), we first revisit the recent popular dataset structure for FSL, pointing out its unrealistic data distribution. To remedy this, we propose a novel methodology for deriving more realist…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: Accepted to TACL 2021

  35. arXiv:2104.05022  [pdf, other]

    cs.CL

    WEC: Deriving a Large-scale Cross-document Event Coreference dataset from Wikipedia

    Authors: Alon Eirew, Arie Cattan, Ido Dagan

    Abstract: Cross-document event coreference resolution is a foundational task for NLP applications involving multi-text processing. However, existing corpora for this task are scarce and relatively small, while annotating only modest-size clusters of documents belonging to the same topic. To complement these resources and enhance future research, we present Wikipedia Event Coreference (WEC), an efficient met…

    Submitted 30 April, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  36. arXiv:2101.12637  [pdf, other]

    cs.CL

    CD2CR: Co-reference Resolution Across Documents and Domains

    Authors: James Ravenscroft, Arie Cattan, Amanda Clare, Ido Dagan, Maria Liakata

    Abstract: Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents. Current state-of-the-art models for this task assume that all documents are of the same type (e.g. news articles) or fall under the same theme. However, it is also desirable to perform CDCR across different domains (type or theme). A particular use case…

    Submitted 29 January, 2021; originally announced January 2021.

    Comments: 9 pages, 5 figures, accepted at EACL 2021

    ACM Class: I.2.7

  37. arXiv:2101.00406  [pdf, other]

    cs.CL

    CDLM: Cross-Document Language Modeling

    Authors: Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan

    Abstract: We introduce a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective. First, instead of considering documents in isolation, we pretrain over sets of multiple related documents, encouraging the model to learn cross-document relationships. Second, we improve over recent long-range transformers by…

    Submitted 2 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: EMNLP 2021, findings

  38. arXiv:2010.02815  [pdf, other]

    cs.CL

    QADiscourse -- Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

    Authors: Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan

    Abstract: Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding. However, annotating discourse relations typically requires expert annotators. Recently, different semantic aspects of a sentence have been represented and crowd-sourced via question-and-answer (QA) pairs. This paper proposes a novel repr…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: To appear at EMNLP 2020

  39. arXiv:2010.02588  [pdf, other]

    cs.CL

    CoRefi: A Crowd Sourcing Suite for Coreference Annotation

    Authors: Aaron Bornstein, Arie Cattan, Ido Dagan

    Abstract: Coreference annotation is an important, yet expensive and time consuming, task, which often involved expert annotators trained on complex decision guidelines. To enable cheaper and more efficient annotation, we present CoRefi, a web-based coreference annotation suite, oriented for crowdsourcing. Beyond the core coreference annotation tool, CoRefi provides guided onboarding for the task as well as…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020 system demonstration paper

  40. arXiv:2009.11032  [pdf, other]

    cs.CL

    Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling

    Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

    Abstract: Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient, leading to incomparable results across works and overestimation of performance. To facilitate proper future research on this task, our primary contribution is proposing a pragmatic evaluation methodology which assumes access to only raw text -- rather than assuming gold mentions, dis…

    Submitted 23 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

  41. arXiv:2009.08380  [pdf, other]

    cs.CL

    Evaluating Interactive Summarization: an Expansion-Based Framework

    Authors: Ori Shapira, Ramakanth Pasunuru, Hadar Ronen, Mohit Bansal, Yael Amsterdamer, Ido Dagan

    Abstract: Allowing users to interact with multi-document summarizers is a promising direction towards improving and customizing summary results. Different ideas for interactive summarization have been proposed in previous work but these solutions are highly divergent and incomparable. In this paper, we develop an end-to-end evaluation framework for expansion-based interactive summarization, which considers…

    Submitted 17 September, 2020; originally announced September 2020.

  42. arXiv:2009.00590  [pdf, other]

    cs.CL

    Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline

    Authors: Ori Ernst, Ori Shapira, Ramakanth Pasunuru, Michael Lepioshkin, Jacob Goldberger, Mohit Bansal, Ido Dagan

    Abstract: Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its assessed utility, the alignment step was mostly approached with heuristic unsupervised methods, typically ROUGE-based, and was never independently optimized or evaluated. In this paper, we…

    Submitted 22 September, 2021; v1 submitted 1 September, 2020; originally announced September 2020.

    Comments: CoNLL 2021

  43. arXiv:2004.14979  [pdf, other]

    cs.CL

    Paraphrasing vs Coreferring: Two Sides of the Same Coin

    Authors: Yehudit Meged, Avi Caciularu, Vered Shwartz, Ido Dagan

    Abstract: We study the potential synergy between two different NLP tasks, both confronting predicate lexical variability: identifying predicate paraphrases, and event coreference resolution. First, we used annotations from an event coreference dataset as distant supervision to re-score heuristically-extracted predicate paraphrases. The new scoring gained more than 18 points in average precision upon their r…

    Submitted 9 October, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

  44. arXiv:1911.03243  [pdf, ps, other]

    cs.CL

    Controlled Crowdsourcing for High-Quality QA-SRL Annotation

    Authors: Paul Roit, Ayal Klein, Daniela Stepanov, Jonathan Mamou, Julian Michael, Gabriel Stanovsky, Luke Zettlemoyer, Ido Dagan

    Abstract: Question-answer driven Semantic Role Labeling (QA-SRL) was proposed as an attractive open and natural flavour of SRL, potentially attainable from laymen. Recently, a large-scale crowdsourced QA-SRL corpus and a trained parser were released. Trying to replicate the QA-SRL annotation for new texts, we found that the resulting annotations were lacking in quality, particularly in coverage, making them…

    Submitted 13 May, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

  45. arXiv:1910.09302  [pdf, other]

    cs.CL

    Diversify Your Datasets: Analyzing Generalization via Controlled Variance in Adversarial Datasets

    Authors: Ohad Rozen, Vered Shwartz, Roee Aharoni, Ido Dagan

    Abstract: Phenomenon-specific "adversarial" datasets have been recently designed to perform targeted stress-tests for particular inference types. Recent work (Liu et al., 2019a) proposed that such datasets can be utilized for training NLI and other types of models, often allowing to learn the phenomenon in focus and improve on the challenge dataset, indicating a "blind spot" in the original training data. Y…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: CoNLL 2019

  46. arXiv:1909.09986  [pdf, other]

    cs.CL

    Improving Quality and Efficiency in Plan-based Neural Data-to-Text Generation

    Authors: Amit Moryossef, Ido Dagan, Yoav Goldberg

    Abstract: We follow the step-by-step approach to neural data-to-text generation we proposed in Moryossef et al (2019), in which the generation process is divided into a text-planning stage followed by a plan-realization stage. We suggest four extensions to that framework: (1) we introduce a trainable neural planning component that can generate effective plans several orders of magnitude faster than the orig…

    Submitted 22 September, 2019; originally announced September 2019.

    Comments: 5 pages, INLG-2019

  47. arXiv:1909.05608  [pdf, other]

    cs.CL cs.AI

    ABSApp: A Portable Weakly-Supervised Aspect-Based Sentiment Extraction System

    Authors: Oren Pereg, Daniel Korat, Moshe Wasserblat, Jonathan Mamou, Ido Dagan

    Abstract: We present ABSApp, a portable system for weakly-supervised aspect-based sentiment extraction. The system is interpretable and user friendly and does not require labeled training data, hence can be rapidly and cost-effectively used across different domains in applied setups. The system flow includes three stages: First, it generates domain-specific aspect and opinion lexicons based on an unlabeled…

    Submitted 12 September, 2019; originally announced September 2019.

    Comments: 6 pages, demo paper at EMNLP 2019

  48. arXiv:1909.01214  [pdf, other]

    cs.CL

    Better Rewards Yield Better Summaries: Learning to Summarise Without References

    Authors: Florian Böhm, Yang Gao, Christian M. Meyer, Ori Shapira, Ido Dagan, Iryna Gurevych

    Abstract: Reinforcement Learning (RL) based document summarisation systems yield state-of-the-art performance in terms of ROUGE scores, because they directly use ROUGE as the rewards during training. However, summaries with high ROUGE scores often receive low human judgement. To find a better reward function that can guide RL to generate human-appealing summaries, we learn a reward function from human ratin…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: Accepted to EMNLP2019

  49. arXiv:1906.01753  [pdf, other]

    cs.CL

    Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution

    Authors: Shany Barhom, Vered Shwartz, Alon Eirew, Michael Bugert, Nils Reimers, Ido Dagan

    Abstract: Recognizing coreferring events and entities across multiple texts is crucial for many NLP applications. Despite the task's importance, research focus was given mostly to within-document entity coreference, with rather little attention to the other variants. We propose a neural architecture for cross-document coreference resolution. Inspired by Lee et al (2012), we jointly model entity and event co…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  50. arXiv:1904.05929  [pdf, ps, other]

    cs.CL

    Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation

    Authors: Ori Shapira, David Gabay, Yang Gao, Hadar Ronen, Ramakanth Pasunuru, Mohit Bansal, Yael Amsterdamer, Ido Dagan

    Abstract: Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluat…

    Submitted 11 April, 2019; originally announced April 2019.

    Comments: 5 pages, 2 graphs, 1 table. Published in NAACL 2019