A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models

Ye, Junjie; Chen, Xuanting; Xu, Nuo; Zu, Can; Shao, Zekai; Liu, Shichun; Cui, Yuhan; Zhou, Zeyang; Gong, Chao; Shen, Yang; Zhou, Jie; Chen, Siming; Gui, Tao; Zhang, Qi; Huang, Xuanjing

arXiv:2303.10420v1 (cs)

[Submitted on 18 Mar 2023 (this version), latest version 23 Dec 2023 (v2)]

Abstract: GPT series models, such as GPT-3, CodeX, InstructGPT, ChatGPT, and so on, have gained considerable attention due to their exceptional natural language processing capabilities. However, despite the abundance of research on the difference in capabilities between GPT series models and fine-tuned models, there has been limited attention given to the evolution of GPT series models' capabilities over time. To conduct a comprehensive analysis of the capabilities of GPT series models, we select six representative models, comprising two GPT-3 series models (i.e., davinci and text-davinci-001) and four GPT-3.5 series models (i.e., code-davinci-002, text-davinci-002, text-davinci-003, and gpt-3.5-turbo). We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets. In particular, we compare the performance and robustness of different models for each task under zero-shot and few-shot scenarios. Our extensive experiments reveal that the overall ability of GPT series models on NLU tasks does not increase gradually as the models evolve, especially with the introduction of the RLHF training strategy. While this strategy enhances the models' ability to generate human-like responses, it also compromises their ability to solve some tasks. Furthermore, our findings indicate that there is still room for improvement in areas such as model robustness.

Subjects:	Computation and Language (cs.CL)
MSC classes:	68-06
ACM classes:	I.2
Cite as:	arXiv:2303.10420 [cs.CL]
	(or arXiv:2303.10420v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2303.10420

Submission history

From: Junjie Ye
[v1] Sat, 18 Mar 2023 14:02:04 UTC (879 KB)
[v2] Sat, 23 Dec 2023 12:53:02 UTC (785 KB)
