first

TXH-mercury · May 29, 2023 · commit 5772fdc

Showing 553 changed files with 172,521 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,110 @@
+# ctags
+tags
+apex/build
+cococaption/pycocoevalcap/spice
+output/
+datasets/
+upload/
+__pycache__
+pretrained_weights
+__pycache__/
+*.py[cod]
+*$py.class
+.attach_pid*
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 Sihan Chen
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,104 @@
+# COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
+This is the official repository of COSA, which provides training and testing code as well as pretrained checkpoints.
+
+
+## Building Environment
+COSA is implemented in PyTorch. We developed it with PyTorch 1.9.0 and CUDA 11.1; the command below installs PyTorch 1.10.0 built for CUDA 11.1. Other versions may also be compatible.
+
+```
+pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
+```
+
+- apex is required; install it from the bundled `apex` directory:
+```
+cd apex
+pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
+```
+- Set up the remaining required packages.
+
+## Download Checkpoints
+- [pretrained_weights](https://drive.google.com/file/d/1n6cDOhwEyrba3fz0ftLeFkBXe98fcCr7/view?usp=share_link) (BERT, CLIP, Swin).
+
+Put the `pretrained_weights` directory under the main path (`COSA/pretrained_weights`).
+
+- [COSA-base-swin-5m](https://drive.google.com/file/d/1jaKFGbVE-BW3x5JUjRHbRqhVaXIy8q8s/view?usp=sharing).
+- [COSA-base-swin-17m](https://drive.google.com/file/d/15LACWjLKD_Y7DCdvNRhqdc5MnwUBmcT7/view?usp=sharing).
+- [COSA-large-clip-417m](https://drive.google.com/file/d/114taD5SwhQ5NQdEtIRDh-HDdybsfP0EU/view?usp=sharing).
+
+Put them under the output directory (e.g. `COSA/output/COSA-base-swin-5m`).
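Before launching any script, it can help to confirm that the downloads landed where this README says they should. The snippet below is a hypothetical helper (not part of the repository) that checks for `pretrained_weights/` and `output/<checkpoint>/` under the COSA root. The directory names come from this README; the function name and its arguments are illustrative assumptions.

```python
from pathlib import Path


def missing_paths(root, checkpoint="COSA-base-swin-5m"):
    """List the README's expected directories that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    expected = [
        root / "pretrained_weights",   # BERT, CLIP and Swin weights
        root / "output" / checkpoint,  # one downloaded COSA checkpoint directory
    ]
    # Report paths relative to the root, with forward slashes on every platform.
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]
```

For example, `missing_paths("COSA")` returns an empty list once both directories are in place.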
+
+## Prepare Datasets
+COSA is pretrained and finetuned on multiple vision-language datasets, e.g.
+
+- Pretraining: CC3M, WebVid-2.5M, CC4M, CC12M, LAION...
+- Finetuning: MSRVTT, MSVD, DiDeMo, LSMDC, ActivityNet, VATEX, YouCook2, TVC, TGIF, MSCOCO, Flickr30K, VQAV2...
+
+The processed datasets folder is available [here](https://drive.google.com/file/d/1fnd2rNFzgI7Pi-u3N0e5Z3zZjXP48988/view?usp=sharing); please download it and put it under the main directory (`COSA/datasets`). For the vision part, you need to download the raw images or videos of those datasets and extract frames. Offline frame extraction is the default because it makes training faster. Alternatively, you can skip this step and decode videos online with tools such as decord or av, but you will then need to modify the dataset code.
+
+Use `utils/extract_frame_and_wav_multiprocess.py` to extract frames.
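To show roughly what offline frame extraction involves, the sketch below calls `ffmpeg` (assumed to be installed and on `PATH`) from a process pool. It is not `utils/extract_frame_and_wav_multiprocess.py`. The sampling rate, JPEG quality, `%04d.jpg` naming and one-folder-per-video layout are assumptions, so check them against what the dataset code expects before relying on it.

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path

VIDEO_SUFFIXES = {".mp4", ".avi", ".mkv", ".webm"}


def ffmpeg_cmd(video_path, out_root, fps=1):
    """Build an ffmpeg command writing frames of one video to out_root/<video stem>/."""
    out_dir = Path(out_root) / Path(video_path).stem
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", str(video_path),
        "-vf", f"fps={fps}",  # resample to `fps` frames per second
        "-q:v", "2",          # high JPEG quality
        str(out_dir / "%04d.jpg"),
    ]


def extract_one(args):
    """Run ffmpeg for a single video and return its exit code."""
    video_path, out_root, fps = args
    cmd = ffmpeg_cmd(video_path, out_root, fps)
    Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
    return subprocess.run(cmd).returncode


def extract_all(video_dir, out_root, fps=1, workers=8):
    """Extract frames for every video in video_dir using `workers` processes."""
    videos = sorted(
        p for p in Path(video_dir).iterdir() if p.suffix.lower() in VIDEO_SUFFIXES
    )
    with Pool(workers) as pool:
        return pool.map(extract_one, [(v, out_root, fps) for v in videos])
```

Call `extract_all(...)` under an `if __name__ == "__main__":` guard, because `multiprocessing` re-imports the module in worker processes on platforms that use the spawn start method.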
+
+
+
+- [Download all (pretrained weights, all checkpoints and datasets)](https://drive.google.com/drive/folders/1pNdr1D4S4cQ3-VKzcl3rkk_5232bMvgS?usp=share_link)
+
+## Finetune Model
+- Finetune on retrieval tasks:
+```
+sh scripts/finetune_ret.sh $pretrain_path  # e.g. output/COSA-base-swin-5m
+```
+- Finetune on captioning tasks:
+```
+sh scripts/finetune_cap.sh $pretrain_path  # e.g. output/COSA-base-swin-5m
+```
+- Finetune on QA tasks:
+```
+sh scripts/finetune_qa.sh $pretrain_path  # e.g. output/COSA-base-swin-5m
+```
+Finetuning outputs are saved in a subdirectory of `$pretrain_path`.
+
+## Test Model
+For example, the command for finetuning the retrieval model in `scripts/finetune_ret.sh` is:
+
+```
+python3 -m torch.distributed.launch \
+--nnodes 1 \
+--node_rank 0 \
+--nproc_per_node 8 \
+--master_port 9834 \
+./train.py \
+--train_video_sample_num 8 \
+--test_video_sample_num 16 \
+--learning_rate 2e-5 \
+--config ./config/retrieval-msrvtt.json \
+--pretrain_dir $output_dir \
+--save_best true \
+--checkpointing true \
+--output_dir $output_dir/retrieval-msrvtt \
+```
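The command samples 8 frames per video for training (`--train_video_sample_num 8`) and 16 for testing (`--test_video_sample_num 16`). This README does not describe how `train.py` picks those frames. The sketch below shows one common scheme as an assumption, not COSA's exact loader: split the extracted frames into equal segments and take one frame per segment, a random one during training and the middle one at test time. The function name is hypothetical.

```python
import random


def sample_frame_indices(num_frames, sample_num, training, rng=random):
    """Segment-based uniform frame sampling (hypothetical helper, not COSA's loader)."""
    if num_frames <= 0 or sample_num <= 0:
        raise ValueError("num_frames and sample_num must be positive")
    # Split [0, num_frames) into `sample_num` contiguous segments of near-equal length.
    edges = [i * num_frames // sample_num for i in range(sample_num + 1)]
    indices = []
    for start, end in zip(edges[:-1], edges[1:]):
        end = max(end, start + 1)  # short videos: let empty segments reuse a frame
        if training:
            idx = rng.randrange(start, end)  # random frame inside the segment
        else:
            idx = (start + end - 1) // 2  # deterministic middle frame at test time
        indices.append(min(idx, num_frames - 1))
    return indices
```

With 32 extracted frames and `sample_num=8`, the test-time call picks every fourth frame starting at index 1. Videos shorter than `sample_num` frames simply repeat frames.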
+
+To test a model, add the following two arguments to the command above:
+```
+--zero_shot \
+--checkpoint $checkpoint_save_path  # path to the saved .pt checkpoint
+```
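The finetuning and testing invocations differ only in their trailing arguments, so they can also be assembled programmatically. The helper below is a hypothetical convenience wrapper (not part of the repository). It rebuilds the retrieval command shown in this README and appends `--zero_shot` and `--checkpoint` when a checkpoint path is given. It uses no flags beyond those above. The `./config/{task}.json` pattern is inferred from the single retrieval example and may not hold for other tasks.

```python
import shlex


def build_cmd(output_dir, task="retrieval-msrvtt", checkpoint=None, nproc=8, port=9834):
    """Rebuild the README's torch.distributed.launch command (hypothetical helper)."""
    cmd = [
        "python3", "-m", "torch.distributed.launch",
        "--nnodes", "1",
        "--node_rank", "0",
        "--nproc_per_node", str(nproc),
        "--master_port", str(port),
        "./train.py",
        "--train_video_sample_num", "8",
        "--test_video_sample_num", "16",
        "--learning_rate", "2e-5",
        "--config", f"./config/{task}.json",  # assumed naming pattern
        "--pretrain_dir", output_dir,
        "--save_best", "true",
        "--checkpointing", "true",
        # Finetuning output goes to a subdirectory of the pretrained model directory.
        "--output_dir", f"{output_dir}/{task}",
    ]
    if checkpoint is not None:
        # Evaluation only: the two extra arguments from the "Test Model" section.
        cmd += ["--zero_shot", "--checkpoint", checkpoint]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_cmd("output/COSA-base-swin-5m")))
```

Passing the returned list to `subprocess.run` avoids shell-quoting issues; `shlex.join` (Python 3.8+) is only used here to print it.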
+## Pretrain Model
+```
+sh scripts/pretrain_base_swin_5m.sh
+```
+
+
+<!-- 
+## Citation
+
+If you find this code useful for your research, please consider citing:
+```
+@inproceedings{chen2020opt,
+ title={OPT: Universal image-text representation learning},
+ author={Chen, Yen-Chun and Li, Linjie and Yu, Licheng and Kholy, Ahmed El and Ahmed, Faisal and Gan, Zhe and Cheng, Yu and Liu, Jingjing},
+ booktitle={ECCV},
+ year={2020}
+}
+```
+
+## License
+
+MIT -->
diff --git a/apex/LICENSE b/apex/LICENSE
@@ -0,0 +1,11 @@
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/apex/README.md b/apex/README.md
@@ -0,0 +1,143 @@
+# Introduction
+
+This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in Pytorch.
+Some of the code here will be included in upstream Pytorch eventually.
+The intent of Apex is to make up-to-date utilities available to users as quickly as possible.
+
+## Full API Documentation: [https://nvidia.github.io/apex](https://nvidia.github.io/apex)
+
+## [GTC 2019](https://github.com/mcarilli/mixed_precision_references/tree/master/GTC_2019) and [Pytorch DevCon 2019](https://github.com/mcarilli/mixed_precision_references/tree/master/Pytorch_Devcon_2019) Slides
+
+# Contents
+
+## 1. Amp: Automatic Mixed Precision
+
+`apex.amp` is a tool to enable mixed precision training by changing only 3 lines of your script.
+Users can easily experiment with different pure and mixed precision training modes by supplying
+different flags to `amp.initialize`.
+
+[Webinar introducing Amp](https://info.nvidia.com/webinar-mixed-precision-with-pytorch-reg-page.html)
+(The flag `cast_batchnorm` has been renamed to `keep_batchnorm_fp32`).
+
+[API Documentation](https://nvidia.github.io/apex/amp.html)
+
+[Comprehensive Imagenet example](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)
+
+[DCGAN example coming soon...](https://github.com/NVIDIA/apex/tree/master/examples/dcgan)
+
+[Moving to the new Amp API](https://nvidia.github.io/apex/amp.html#transition-guide-for-old-api-users) (for users of the deprecated "Amp" and "FP16_Optimizer" APIs)
+
+## 2. Distributed Training
+
+`apex.parallel.DistributedDataParallel` is a module wrapper, similar to
+`torch.nn.parallel.DistributedDataParallel`. It enables convenient multiprocess distributed training,
+optimized for NVIDIA's NCCL communication library.
+
+[API Documentation](https://nvidia.github.io/apex/parallel.html)
+
+[Python Source](https://github.com/NVIDIA/apex/tree/master/apex/parallel)
+
+[Example/Walkthrough](https://github.com/NVIDIA/apex/tree/master/examples/simple/distributed)
+
+The [Imagenet example](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)
+shows use of `apex.parallel.DistributedDataParallel` along with `apex.amp`.
+
+### Synchronized Batch Normalization
+
+`apex.parallel.SyncBatchNorm` extends `torch.nn.modules.batchnorm._BatchNorm` to
+support synchronized BN.
+It allreduces stats across processes during multiprocess (DistributedDataParallel) training.
+Synchronous BN has been used in cases where only a small
+local minibatch can fit on each GPU.
+Allreduced stats increase the effective batch size for the BN layer to the
+global batch size across all processes (which, technically, is the correct
+formulation).
+Synchronous BN has been observed to improve converged accuracy in some of our research models.
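To see why allreducing statistics is equivalent to normalizing over the global batch, note that each process only needs to contribute its element count, sum and sum of squares. Adding those across processes yields exactly the global mean and (biased) variance. The pure-Python sketch below illustrates the arithmetic, with lists standing in for per-GPU minibatches. It is not apex's implementation, which fuses this into CUDA kernels and may combine statistics differently.

```python
def local_stats(xs):
    """Per-process statistics a synchronized BN layer contributes to the allreduce."""
    return len(xs), sum(xs), sum(x * x for x in xs)


def allreduce_sum(stats):
    """Stand-in for an allreduce(SUM) of (count, sum, sum of squares) across processes."""
    return tuple(sum(s[i] for s in stats) for i in range(3))


def global_mean_var(stats):
    n, s1, s2 = allreduce_sum(stats)
    mean = s1 / n
    # Biased variance E[x^2] - E[x]^2; numerically less stable than Welford-style merging.
    var = s2 / n - mean * mean
    return mean, var


# Two "GPUs", each holding a small local minibatch of one channel's activations.
gpu_batches = [[1.0, 2.0], [3.0, 4.0, 5.0, 6.0]]
mean, var = global_mean_var([local_stats(b) for b in gpu_batches])
```

The result is mean 3.5 and variance 35/12, the statistics of all six values. A non-synchronized BN on the first GPU would instead normalize with mean 1.5 and variance 0.25.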
+
+### Checkpointing
+
+To properly save and load your `amp` training state, we introduce `amp.state_dict()`, which contains all `loss_scalers` and their corresponding unskipped steps,
+as well as `amp.load_state_dict()` to restore these attributes.
+
+In order to get bitwise accuracy, we recommend the following workflow:
+```python
+import torch
+from apex import amp
+
+# `model`, `optimizer` and `loss` come from your own training script.
+
+# Initialization
+opt_level = 'O1'
+model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
+
+# Train your model
+...
+with amp.scale_loss(loss, optimizer) as scaled_loss:
+    scaled_loss.backward()
+...
+
+# Save checkpoint
+checkpoint = {
+    'model': model.state_dict(),
+    'optimizer': optimizer.state_dict(),
+    'amp': amp.state_dict()
+}
+torch.save(checkpoint, 'amp_checkpoint.pt')
+...
+
+# Restore: rebuild the same model and optimizer first
+model = ...
+optimizer = ...
+checkpoint = torch.load('amp_checkpoint.pt')
+
+model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
+model.load_state_dict(checkpoint['model'])
+optimizer.load_state_dict(checkpoint['optimizer'])
+amp.load_state_dict(checkpoint['amp'])
+
+# Continue training
+...
+```
+
+Note that we recommend restoring the model using the same `opt_level`. Also note that we recommend calling the `load_state_dict` methods after `amp.initialize`.
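Why does the checkpoint need `amp.state_dict()` at all? With dynamic loss scaling, the current scale and the count of steps since the last overflow change during training. Restarting without them resets the scale and can change which early steps get skipped, which breaks bitwise reproducibility. The toy class below mimics that bookkeeping in plain Python. The constants (initial scale 2**16, factor 2, window 2000) are assumptions modeled on common dynamic loss-scaling defaults, and the class is illustrative only, not apex's loss scaler.

```python
class ToyDynamicLossScaler:
    """Illustrative dynamic loss scaler; not apex's implementation."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=2000):
        self.scale = init_scale
        self.factor = factor
        self.window = window
        self.unskipped = 0  # consecutive steps without gradient overflow

    def update(self, overflow):
        """Update the scale after a backward pass; return True if the step is skipped."""
        if overflow:
            self.scale /= self.factor  # back off and skip this optimizer step
            self.unskipped = 0
            return True
        self.unskipped += 1
        if self.unskipped == self.window:
            self.scale *= self.factor  # grow again after a stable window
            self.unskipped = 0
        return False

    def state_dict(self):
        return {"scale": self.scale, "unskipped": self.unskipped}

    def load_state_dict(self, state):
        self.scale = state["scale"]
        self.unskipped = state["unskipped"]
```

A scaler restored from `state_dict()` resumes with the same scale and unskipped count. A freshly constructed one would restart from its initial scale.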
+
+# Installation
+
+## Containers
+NVIDIA PyTorch Containers are available on NGC: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch.
+The containers come with all the custom extensions available at the moment. 
+
+See [the NGC documentation](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html) for details such as:
+- how to pull a container
+- how to run a pulled container
+- release notes
+
+## From Source
+
+To install Apex from source, we recommend using the nightly Pytorch obtainable from https://github.com/pytorch/pytorch.
+
+The latest stable release obtainable from https://pytorch.org should also work.
+
+### Linux
+For performance and full functionality, we recommend installing Apex with
+CUDA and C++ extensions via
+```bash
+git clone https://github.com/NVIDIA/apex
+cd apex
+pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
+```
+
+Apex also supports a Python-only build via
+```bash
+pip install -v --disable-pip-version-check --no-cache-dir ./
+```
+A Python-only build omits:
+- Fused kernels required to use `apex.optimizers.FusedAdam`.
+- Fused kernels required to use `apex.normalization.FusedLayerNorm` and `apex.normalization.FusedRMSNorm`.
+- Fused kernels that improve the performance and numerical stability of `apex.parallel.SyncBatchNorm`.
+- Fused kernels that improve the performance of `apex.parallel.DistributedDataParallel` and `apex.amp`.
+
+`DistributedDataParallel`, `amp`, and `SyncBatchNorm` will still be usable, but they may be slower.
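To tell which kind of build you ended up with, you can check whether apex's compiled extension modules can be found on the import path. The module names below (`apex_C`, `amp_C`, `fused_layer_norm_cuda`, `syncbn`) are the ones we believe apex's `setup.py` builds with `--cpp_ext`/`--cuda_ext`. Treat the list as an assumption and adjust it to your apex version.

```python
import importlib.util

# Compiled extensions assumed to be produced by a full (--cpp_ext --cuda_ext) build.
APEX_EXTENSIONS = ["apex_C", "amp_C", "fused_layer_norm_cuda", "syncbn"]


def apex_build_report(modules=APEX_EXTENSIONS):
    """Map each top-level module name to whether it is findable (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    report = apex_build_report()
    kind = "full (C++/CUDA)" if all(report.values()) else "Python-only or partial"
    print(kind, report)
```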
+
+Pyprof support has been moved to its own [dedicated repository](https://github.com/NVIDIA/PyProf).
+Pyprof is deprecated in Apex and the pyprof directory will be removed by the end of June 2022.
+
+
+### [Experimental] Windows
+`pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .` may work if you were able to build Pytorch from source
+on your system. A Python-only build via `pip install -v --no-cache-dir .` is more likely to work. 
+If you installed Pytorch in a Conda environment, make sure to install Apex in that same environment.
diff --git a/apex/apex/RNN/README.md b/apex/apex/RNN/README.md
@@ -0,0 +1 @@
+Under construction...