How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks

Chen, Xuanting; Ye, Junjie; Zu, Can; Xu, Nuo; Zheng, Rui; Peng, Minlong; Zhou, Jie; Gui, Tao; Zhang, Qi; Huang, Xuanjing

arXiv:2303.00293 (cs)

[Submitted on 1 Mar 2023]

Abstract: The GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks, showcasing their strong understanding and reasoning capabilities. However, their robustness and abilities to handle various complexities of the open world have yet to be explored, which is especially crucial in assessing the stability of models and is a key aspect of trustworthy AI. In this study, we perform a comprehensive experimental analysis of GPT-3.5, exploring its robustness using 21 datasets (about 116K test samples) with 66 text transformations from TextFlint that cover 9 popular Natural Language Understanding (NLU) tasks. Our findings indicate that while GPT-3.5 outperforms existing fine-tuned models on some tasks, it still encounters significant robustness degradation, such as its average performance dropping by up to 35.74% and 43.59% in natural language inference and sentiment analysis tasks, respectively. We also show that GPT-3.5 faces some specific robustness challenges, including robustness instability, prompt sensitivity, and number sensitivity. These insights are valuable for understanding its limitations and guiding future research in addressing these challenges to enhance GPT-3.5's overall performance and generalization abilities.

Subjects:	Computation and Language (cs.CL)
MSC classes:	68-06
ACM classes:	I.2
Cite as:	arXiv:2303.00293 [cs.CL]
	(or arXiv:2303.00293v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2303.00293

Submission history

From: Xuanting Chen
[v1] Wed, 1 Mar 2023 07:39:01 UTC (485 KB)
