Generates parallel captions and audio from a single input image using multimodal models that handle image, text and audio.

Image-to-Captioned-Audio Synthesis

Project for CMSC 691 - Computer Vision (Dr. Tejas Gokhale) at UMBC

Installing it

Works on Python 3.9 with system CUDA version 12.3. Tested on an RTX 4060 (8 GB VRAM) with 16 GB system RAM; treat that as the recommended minimum, though it may work on less.

  • Run setup.sh to create folders and fetch external libraries.
  • If setup.sh has a mistake, just run its commands manually.
  • Install the required packages with pip install -r requirements.txt.
  • Download the DeCap weights from DeCap_CoCo.zip.
  • Unzip them and place them inside custom_pipeline/pretrained weights/ (see gen_caption.py, lines 32 and 58, for the expected paths).
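The steps above can be sketched as the following shell commands. This is a rough outline, not a verified script: the download location of DeCap_CoCo.zip is not specified here, and the exact layout inside the zip should be checked against gen_caption.py before running.

```shell
# Create folders and fetch external libraries
# (see setup.sh itself for the actual commands if it fails)
bash setup.sh

# Install Python dependencies
pip install -r requirements.txt

# Place the DeCap weights where gen_caption.py expects them.
# DeCap_CoCo.zip must already be downloaded into the current directory;
# the destination path comes from the instructions above.
unzip DeCap_CoCo.zip -d "custom_pipeline/pretrained weights/"
```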

Running it

Just run main.py.

The existing pipeline runs inference in about 2-3 minutes; the custom pipeline may take up to 30 minutes.

Credits

Shadab Hafiz Choudhury
