MultimodalWordDiscovery

This repository contains the code for the paper "A DNN-HMM-DNN Hybrid Model for Discovering Word-like Units from Spoken Captions and Image Regions".

How to run it

Requirement: PyTorch 0.3, used for pretraining the VGG 16/Res 34 networks.

Download the MSCOCO 2k image features from here and the MSCOCO 2k phone sequence from here, and place both under the directory data/mscoco.
Download the pretrained image classifier weights from here.
Example: Run the linear softmax model with Res 34 image features on MSCOCO 2k:

python run_image2phone.py --dataset mscoco2k --feat_type res34 --model_type linear --image_posterior_weights_file classifier_weights.npz --lr 0.01
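
The same invocation can be wrapped in a small Python helper that first checks whether the downloads are in place. This is only a sketch: the function names (build_command, missing_prerequisites, run_example) are hypothetical and not part of this repository, and it assumes the script is launched from the repository root with classifier_weights.npz saved in that same directory. The flags and values are the ones from the command above.

```python
import os
import subprocess


def build_command(weights_file="classifier_weights.npz", lr=0.01):
    """Assemble the example command from this README as an argument list."""
    return [
        "python", "run_image2phone.py",
        "--dataset", "mscoco2k",
        "--feat_type", "res34",
        "--model_type", "linear",
        "--image_posterior_weights_file", weights_file,
        "--lr", str(lr),
    ]


def missing_prerequisites(paths=("data/mscoco", "classifier_weights.npz")):
    """Return the entries of `paths` that do not exist yet.

    The default locations are assumptions: data/mscoco comes from the
    download step above, and the weights file is assumed to sit in the
    current working directory.
    """
    return [p for p in paths if not os.path.exists(p)]


def run_example():
    """Run the example command, failing early if a download is missing."""
    missing = missing_prerequisites()
    if missing:
        raise FileNotFoundError("Download these first: " + ", ".join(missing))
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_command(), check=True)
```

Calling run_example() from the repository root launches the run; build_command(lr=...) can be used on its own to print or log the exact command before starting it.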

Run the following for help on more customized experiments:

python run_image2phone.py --help

Cite

Please consider citing the following paper if you use the code:

@inproceedings{WH-interspeech2020,
    author = {Liming Wang and Mark Hasegawa-Johnson},
    title = {A {DNN-HMM-DNN} Hybrid Model for Discovering Word-like Units from Spoken Captions and Image Regions},
    booktitle = {Interspeech},
    year = {2020}
}
