Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation

Sharma, Shikhar; Asri, Layla El; Schulz, Hannes; Zumer, Jeremie

arXiv:1706.09799 (cs)

[Submitted on 29 Jun 2017]

Abstract: Automated metrics such as BLEU are widely used in the machine translation literature. They have also been used recently in the dialogue community for evaluating dialogue response generation. However, previous work in dialogue response generation has shown that these metrics do not correlate strongly with human judgment in the non-task-oriented dialogue setting. Task-oriented dialogue responses are expressed on narrower domains and exhibit lower diversity. It is thus reasonable to think that these automated metrics would correlate well with human judgment in the task-oriented setting, where the generation task consists of translating dialogue acts into a sentence. We conduct an empirical study to confirm whether this is the case. Our findings indicate that these automated metrics have a stronger correlation with human judgments in the task-oriented setting than what has been observed in the non-task-oriented setting. We also observe that these metrics correlate even better for datasets that provide multiple ground-truth reference sentences. In addition, we show that some of the currently available corpora for task-oriented language generation can be solved with simple models, and we advocate for more challenging datasets.
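To make the idea of "correlation with human judgment" concrete, the following is a minimal sketch and not the authors' evaluation code. It assumes a simplified word-overlap score (the clipped unigram precision that BLEU builds on, without higher-order n-grams or the brevity penalty) and Spearman rank correlation as the agreement measure. The sentences and the human ratings are hypothetical toy values invented purely for illustration; all function names are likewise illustrative.

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Unigram precision with BLEU-style clipping against one or more references."""
    cand_counts = Counter(candidate.lower().split())
    if not cand_counts:
        return 0.0
    # A word's clip limit is its highest count in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    matched = sum(min(c, max_ref_counts[w]) for w, c in cand_counts.items())
    return matched / sum(cand_counts.values())


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical toy data: two references for one dialogue act,
    # three system outputs, and made-up human quality ratings.
    references = [
        "there is a cheap hotel in the north",
        "a cheap hotel is available in the north",
    ]
    candidates = [
        "a cheap hotel is in the north",
        "the hotel is cheap and nice",
        "i do not know",
    ]
    human_ratings = [5, 3, 1]  # invented for illustration only

    single = [clipped_unigram_precision(c, references[:1]) for c in candidates]
    multi = [clipped_unigram_precision(c, references) for c in candidates]
    print("single-reference scores:", single)
    print("multi-reference scores: ", multi)
    print("Spearman vs. human (multi):", spearman(multi, human_ratings))
```

Because the clip limit is a maximum over references, adding a reference can only keep a candidate's score the same or raise it. That property alone does not establish the multi-reference finding reported in the abstract, which is an empirical result on real corpora.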

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1706.09799 [cs.CL]
	(or arXiv:1706.09799v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1706.09799

Submission history

From: Shikhar Sharma
[v1] Thu, 29 Jun 2017 15:14:07 UTC (199 KB)
