AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

Chen, Zhongzhi; Liu, Guang; Zhang, Bo-Wen; Ye, Fulong; Yang, Qinghong; Wu, Ledell


arXiv:2211.06679v2 (cs)

[Submitted on 12 Nov 2022 (v1), last revised 21 Nov 2022 (this version, v2)]



Abstract: In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model. Starting from the pre-trained multimodal representation model CLIP released by OpenAI, we replaced its text encoder with the pre-trained multilingual text encoder XLM-R, and aligned the text representations of both languages with the image representations using a two-stage training scheme consisting of teacher learning and contrastive learning. We validate our method through evaluations on a wide range of tasks. We set new state-of-the-art performance on several tasks, including ImageNet-CN, Flickr30k-CN, COCO-CN and XTD. Further, we obtain performance very close to that of CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding. Our models and code are available at this https URL.
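The two training stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss functions, so the choices here are assumptions: teacher learning is shown as a mean squared error between the embeddings of a frozen teacher text encoder and a student text encoder, and contrastive learning is shown as the symmetric cross-entropy loss commonly used for CLIP-style training. The function names, the temperature value, and the toy vectors are hypothetical and are not taken from the authors' code.

```python
import math


def mse_distill_loss(student, teacher):
    """Stage 1 (assumed form): mean squared error between batches of
    student and teacher text embeddings, given as lists of vectors."""
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student, teacher):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum - row[i]
    return loss / len(logits)


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Stage 2 (assumed form): symmetric image-to-text and text-to-image
    cross-entropy over cosine similarities of matched pairs."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


if __name__ == "__main__":
    # Toy embeddings: row i of the images matches row i of the texts.
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[1.0, 0.0], [0.0, 1.0]]
    print("distillation loss:", mse_distill_loss(texts, images))
    print("contrastive loss (aligned):", contrastive_loss(images, texts))
    print("contrastive loss (swapped):", contrastive_loss(images, texts[::-1]))
```

In this sketch the contrastive loss is close to zero when matching image and text vectors point the same way and grows when the pairs are swapped, which is the behaviour the second stage relies on.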

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2211.06679 [cs.CL]
	(or arXiv:2211.06679v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2211.06679

Submission history

From: Guang Liu
[v1] Sat, 12 Nov 2022 14:48:55 UTC (3,685 KB)
[v2] Mon, 21 Nov 2022 15:39:52 UTC (45,590 KB)
