This is an implementation of the VGGishish model proposed in "Taming Visually Guided Audio Generation" by Vladimir Iashin and Esa Rahtu. The code closely follows the official SpecVQGAN implementation. In this repo, the model is used to classify between hit and scratch sounds from the Greatest Hits dataset.
Install the conda environment from the environment.yml file:
conda env create -f environment.yml
To download the data, go to the official website of the Greatest Hits dataset: https://andrewowens.com/vis/
After downloading the data, preprocess it with the provided wav_to_melspec.py script:
python wav_to_melspec.py --data_path=path/to/data --save_path=path/to/save
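The preprocessing step converts each WAV file into a log-mel spectrogram. A minimal numpy-only sketch of what that conversion involves (the parameter values here — sample rate, FFT size, hop length, 80 mel bins — are illustrative assumptions, not necessarily the script's actual settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def wav_to_melspec(wav, sr=22050, n_fft=1024, hop=256, n_mels=80):
    # Frame the signal, window each frame, take the power spectrum,
    # project onto the mel filterbank, and compress with a log.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(np.clip(mel, 1e-5, None))
```

In practice a library such as librosa performs this conversion with well-tested defaults; the sketch only shows the structure of the computation.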
To train the model, run:
python train.py config=configs/vggishish.yaml
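The model trained by the command above is a VGG-style convolutional classifier over log-mel spectrograms. A minimal PyTorch sketch of that architecture shape (layer widths and depths here are illustrative assumptions, not the repo's actual configuration):

```python
import torch
import torch.nn as nn

class VGGishishSketch(nn.Module):
    """Hypothetical VGG-style binary classifier over log-mel spectrograms.

    Input: (batch, 1, n_mels, n_frames). Global average pooling makes the
    head independent of the spectrogram's time length.
    """
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse freq/time to one vector
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)  # (batch, 64)
        return self.classifier(h)        # (batch, n_classes) logits
```

With two classes (hit vs. scratch), the logits would typically be trained with cross-entropy loss.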
To test the model, run:
python test.py config=configs/vggishish.yaml ckpt_path=path/to/ckpt/file.pt
To view training results in TensorBoard, run:
tensorboard --logdir=logs