Pre-Training to Learn in Context

Gu, Yuxian; Dong, Li; Wei, Furu; Huang, Minlie

Abstract: In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community. However, the ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context. To this end, we propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability by pre-training the model on a large collection of "intrinsic tasks" in the general plain-text corpus using the simple language modeling objective. PICL encourages the model to infer and perform tasks by conditioning on the contexts while maintaining task generalization of pre-trained models. We evaluate the in-context learning performance of the model trained with PICL on seven widely-used text classification datasets and the Super-NaturalInstructions benchmark, which contains 100+ NLP tasks formulated as text generation. Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters. The code is publicly available at this https URL.
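
As an illustration of what "learning from task examples and instructions in context" means in practice, the following minimal sketch builds an in-context prompt for text classification. It is not taken from the paper or its released code: the function name `build_icl_prompt`, the `Input:`/`Output:` template, and the example sentences are all hypothetical choices made for this sketch.

```python
from typing import List, Tuple


def build_icl_prompt(
    demos: List[Tuple[str, str]],
    query: str,
    instruction: str = "",
) -> str:
    """Join an optional instruction, labeled demonstrations, and an unlabeled query.

    The "Input:/Output:" template is an assumption for illustration only.
    """
    parts = []
    if instruction:
        parts.append(instruction)
    # Each demonstration shows the model one solved instance of the task.
    for text, label in demos:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The query ends with an open "Output:" slot for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


demos = [
    ("A moving, well-acted film.", "positive"),
    ("Dull and far too long.", "negative"),
]
prompt = build_icl_prompt(
    demos,
    "I would watch it again.",
    instruction="Classify the sentiment.",
)
print(prompt)
```

A language model given this prompt is expected to infer the task from the demonstrations and continue the final line with a label, without any parameter updates.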

Comments:	ACL2023 Main Conference
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2305.09137 [cs.CL]
	(or arXiv:2305.09137v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.09137


