Merge pull request #71 from jmisilo/70-docs-update
Delete useless prints and update docs.
jmisilo committed Nov 24, 2022
2 parents 217bd3e + e9c5459 commit 2e8065d
Showing 3 changed files with 7 additions and 12 deletions.
10 changes: 5 additions & 5 deletions readme.md
@@ -4,23 +4,23 @@

**`CLIPxGPT Captioner`** is an Image Captioning Model based on [OpenAI's](https://openai.com/) [CLIP](https://openai.com/blog/clip/) and [GPT-2](https://openai.com/blog/better-language-models/). The Model uses a Mapping module to "translate" CLIP embeddings to GPT-2. The model is trained on the [Flickr30k](https://shannon.cs.illinois.edu/DenotationGraph/) dataset, downloaded from [Kaggle](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset).

**The goal** of the project was to explore the possibility of connecting CLIP and GPT-2 and to check whether, with a relatively short training time and a small dataset, the model would be able to recognize situations in pictures. In the first version, the model achieved satisfactory results.
**The goal** of the project was to explore the possibility of connecting CLIP and GPT-2 and to check whether, with a relatively short training time and a small dataset, the model would be able to recognize situations in pictures. The model achieved satisfactory results.

The Model uses prefixes as in the [ClipCap](https://arxiv.org/abs/2111.09734) paper. In my original idea, the prefix length was 1, but after reading the publication it was changed to 4, which improved performance.
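
As a rough illustration of the prefix idea (a sketch, not code from this repository; the dimensions, layer count, and module name are assumptions), the Mapping Module can project a single CLIP image embedding to a sequence of four GPT-2-sized prefix embeddings and refine them with Transformer encoder layers:

```python
import torch
import torch.nn as nn

class MappingModule(nn.Module):
    """Sketch: expand one CLIP image embedding into a short GPT-2 prefix."""

    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=4, num_layers=6):
        super().__init__()
        self.prefix_len = prefix_len
        self.gpt_dim = gpt_dim
        self.proj = nn.Linear(clip_dim, prefix_len * gpt_dim)
        layer = nn.TransformerEncoderLayer(d_model=gpt_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, clip_emb):                      # clip_emb: (batch, clip_dim)
        prefix = self.proj(clip_emb)                  # (batch, prefix_len * gpt_dim)
        prefix = prefix.view(-1, self.prefix_len, self.gpt_dim)
        return self.encoder(prefix)                   # (batch, prefix_len, gpt_dim)

# The resulting prefix would be concatenated with the GPT-2 token embeddings
# of the caption before the language-model forward pass.
prefix = MappingModule()(torch.randn(2, 512))
print(prefix.shape)  # torch.Size([2, 4, 768])
```

A longer prefix gives GPT-2 more conditioning tokens to attend to, which is consistent with the performance gain noted above when moving from a prefix of length 1 to 4.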

The Model was trained with a frozen CLIP, a fully trained Mapping Module (6x Transformer Encoder Layers) and a partially frozen GPT-2 (the first and last 14 layers were trained).
The Model was trained with a frozen CLIP, a fully trained Mapping Module (5-6x Transformer Encoder Layers) and a partially frozen GPT-2 (the first and last 14 layers were trained).

The training process was carried out using the [Kaggle](https://www.kaggle.com/) P100 GPU.
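
A minimal sketch of that freezing scheme, assuming the Hugging Face `transformers` classes; the checkpoints and the number of unfrozen blocks chosen here are illustrative assumptions, not values taken from the repository:

```python
from transformers import CLIPVisionModel, GPT2LMHeadModel

clip = CLIPVisionModel.from_pretrained('openai/clip-vit-base-patch32')
gpt2 = GPT2LMHeadModel.from_pretrained('gpt2')

# Freeze CLIP entirely.
for p in clip.parameters():
    p.requires_grad = False

# Keep only the outermost GPT-2 blocks trainable; the exact split is illustrative.
n_outer = 2
blocks = gpt2.transformer.h
for i, block in enumerate(blocks):
    trainable = i < n_outer or i >= len(blocks) - n_outer
    for p in block.parameters():
        p.requires_grad = trainable
```

Counting parameters by `requires_grad` (as in `src/model/model.py`) then reflects exactly which parts such a loop left trainable.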

### Model Versions

> **Small** [Download](https://drive.google.com/uc?id=1p91KBj-oUmuMfG2Gc33tEN5Js5HpV8YH)
> **Small** - [Download](https://drive.google.com/uc?id=1p91KBj-oUmuMfG2Gc33tEN5Js5HpV8YH)
> * Text Model - GPT-2 Small - 124M parameters
> * Mapping Module - 6x Transformer Encoder Layers
> * CLIP Base - Patch 32 model
> * 256M Parameters
> **Large** [Download](https://drive.google.com/uc?id=12h-NgryAf6zZdA1KclHdfzU35D1icjEp)
> **Large** - [Download](https://drive.google.com/uc?id=12h-NgryAf6zZdA1KclHdfzU35D1icjEp)
> * Text Model - GPT-2 Medium - 355M parameters
> * Mapping Module - 5x Transformer Encoder Layers
> * CLIP Large - Patch 14 model
@@ -57,7 +57,7 @@ pip install -r requirements.txt
And run prediction:

```bash
python .\src\predict.py -I <image_path> -S <model_size [S/L]>
python .\src\predict.py -I <image_path> -S <model_size [S/L]> -C <checkpoint_name>
```

### References:
3 changes: 2 additions & 1 deletion requirements.txt
@@ -5,4 +5,5 @@ pandas==1.5.0
Pillow==9.3.0
torch==1.13.0+cu117
tqdm==4.64.1
transformers==4.22.1
transformers==4.22.1
wandb==0.13.4
6 changes: 0 additions & 6 deletions src/model/model.py
@@ -263,12 +263,6 @@ def train_forward(self, img_emb, trg_cap, att_mask):
)
print(l)

# total number ot parameters
print(sum(p.numel() for p in m.parameters() if p.requires_grad))

# number of trainable parameters
print(sum(p.numel() for p in m.parameters() if p.requires_grad and p.grad is not None))

# number of parameters
print(f'Total number of parameters: {sum(p.numel() for p in m.parameters())}')
print(f'Number of trainable parameters: {sum(p.numel() for p in m.parameters() if p.requires_grad)}')
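
One note on the removed prints (my reading, not stated in the commit message): `p.grad` is only populated after a backward pass, so filtering on `p.grad is not None` reports zero trainable parameters if called before training and is redundant afterwards. The retained counting idiom relies on `requires_grad` alone, shown here on a toy model with an assumed frozen sub-module:

```python
import torch.nn as nn

# Toy stand-in for the captioner: one frozen part, one trainable part.
model = nn.Sequential(nn.Linear(512, 768), nn.Linear(768, 768))
for p in model[0].parameters():
    p.requires_grad = False

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'Total number of parameters: {total}')
print(f'Number of trainable parameters: {trainable}')
```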
