
Ko-CLIP

This repository contains code to train a Korean CLIP model on MS-COCO with Korean annotations from AI-HUB. To obtain additional Korean annotations, we translate the English captions of the VizWiz dataset into Korean with the Naver Papago translator, as sketched below.
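
The translation step might look roughly like the following; this is a minimal sketch, and the endpoint URL, header names, and response fields are assumptions based on the public Papago NMT REST API rather than this repository's code.

```python
# Rough sketch of translating English VizWiz captions into Korean with the
# Naver Papago NMT REST API (endpoint and headers are assumptions).
import requests

PAPAGO_URL = "https://openapi.naver.com/v1/papago/n2mt"  # assumed endpoint
HEADERS = {
    "X-Naver-Client-Id": "YOUR_CLIENT_ID",          # placeholder credential
    "X-Naver-Client-Secret": "YOUR_CLIENT_SECRET",  # placeholder credential
}

def translate_en_to_ko(caption: str) -> str:
    """Translate a single English caption into Korean via Papago."""
    resp = requests.post(
        PAPAGO_URL,
        headers=HEADERS,
        data={"source": "en", "target": "ko", "text": caption},
    )
    resp.raise_for_status()
    return resp.json()["message"]["result"]["translatedText"]

english_captions = ["A dog is running on the beach."]
korean_captions = [translate_en_to_ko(c) for c in english_captions]
```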

Pretrained Models

The original CLIP was trained on a large-scale dataset, whereas ours is much smaller. Because Korean caption data is scarce, we use pretrained language and visual models to obtain strong representations from the smaller dataset.

Pretrained Language Model

  • We fix the PLM to klue/roberta-large on Hugging Face to obtain stronger Korean text representations, as in the sketch below.
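
A minimal sketch (not the repository's exact code) of loading the fixed PLM and pooling a sentence-level Korean text feature; CLS pooling is one common choice and is assumed here.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("klue/roberta-large")
text_encoder = AutoModel.from_pretrained("klue/roberta-large")

captions = ["해변에서 달리는 강아지"]  # example Korean caption
batch = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = text_encoder(**batch)

# Use the [CLS] token as the caption embedding (mean pooling over tokens
# would be an alternative choice).
text_features = outputs.last_hidden_state[:, 0]  # shape: (batch, 1024)
```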

Pretrained Visual Model

  • We use google/vit-base-patch16-224-in21k on Hugging Face and RN101 from torchvision as PVMs to obtain image representations; see the sketch after this list.
  • The images themselves do not depend on the size of the Korean dataset, but since CLIP is trained on text-image pairs, Ko-CLIP is trained only on the limited set of images that have Korean captions.
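
A minimal sketch, under assumed names and dimensions, of loading the two visual backbones and projecting both modalities into a shared embedding space with a CLIP-style symmetric contrastive loss; the projection size and helper names are illustrative, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from transformers import ViTModel

EMBED_DIM = 512  # assumed shared embedding size

# Option 1: ViT backbone from Hugging Face
vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
vit_proj = nn.Linear(vit.config.hidden_size, EMBED_DIM)

# Option 2: ResNet-101 backbone from torchvision, with its classifier removed
rn101 = torchvision.models.resnet101(weights="IMAGENET1K_V1")
rn101.fc = nn.Identity()
rn101_proj = nn.Linear(2048, EMBED_DIM)

# Text side: project the pooled RoBERTa feature (1024-d for klue/roberta-large)
text_proj = nn.Linear(1024, EMBED_DIM)

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Example forward pass with the ViT backbone and dummy inputs
pixel_values = torch.randn(4, 3, 224, 224)   # dummy image batch
text_features = torch.randn(4, 1024)         # pooled RoBERTa features
image_emb = vit_proj(vit(pixel_values).last_hidden_state[:, 0])
text_emb = text_proj(text_features)
loss = clip_loss(image_emb, text_emb)
```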

See the WandB dashboard to check training records and to compare model performance across the pretrained visual models.
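
A tiny sketch of how such metrics could be logged to Weights & Biases; the project name and metric keys are illustrative, not the repository's actual configuration.

```python
import wandb

wandb.init(project="ko-clip", config={"visual_backbone": "vit-base-patch16"})
for step in range(10):
    loss = 1.0 / (step + 1)  # placeholder for the real contrastive loss value
    wandb.log({"train/loss": loss, "step": step})
wandb.finish()
```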

Zero-shot classification

For zero-shot classification, we evaluate predictions on the CIFAR-10 and CIFAR-100 datasets.
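
A sketch of CLIP-style zero-shot classification on CIFAR-10 with Korean class prompts; the prompt template and the `encode_text` / `encode_image` stubs are assumptions, and the trained Ko-CLIP encoders would replace the stubs.

```python
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

cifar10_classes_ko = ["비행기", "자동차", "새", "고양이", "사슴",
                      "개", "개구리", "말", "배", "트럭"]
prompts = [f"{name}의 사진" for name in cifar10_classes_ko]  # "a photo of a ..."

def encode_text(texts):    # stub: run prompts through the trained text encoder
    return torch.randn(len(texts), 512)

def encode_image(images):  # stub: run images through the trained image encoder
    return torch.randn(images.shape[0], 512)

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
])
testset = datasets.CIFAR10(root="./data", train=False, download=True,
                           transform=preprocess)
images, labels = next(iter(torch.utils.data.DataLoader(testset, batch_size=8)))

text_emb = F.normalize(encode_text(prompts), dim=-1)
image_emb = F.normalize(encode_image(images), dim=-1)
preds = (image_emb @ text_emb.t()).argmax(dim=-1)  # most similar prompt per image
accuracy = (preds == labels).float().mean()
```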

We refer to CLIP, clip-training for the training code, the koclip idea, and other pretrained models.
