Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques
This repository is derived from the NMTGMinor project at
https://github.com/quanpn90/NMTGMinor
The SVCCA calculation is derived from https://github.com/nlp-dke/svcca
Powered by Mediaan.com
Speech Translation (ST) is the task of translating speech audio in a source language into text in a target language. This repository implements and experiments on different approaches for ST:
- Cascaded ST, consisting of two steps: Automatic Speech Recognition (ASR) followed by Machine Translation (MT)
- Direct ST: models trained only on ST data
- (Main contribution) End-to-end ST that limits the use of ST data: multi-modal models leveraging ASR and MT training data for the ST task
The Transformer architecture is used as the baseline for the implementation.
High-level instructions to use the repo:
- Run `covost_data_preparation.py` to download and preprocess the data.
- Run the shell script of interest, changing the variables in the script if needed:
  - `run_translation_pipeline.sh` for single-task models (ASR, MT, ST)
  - `cascaded_ST_evaluation.sh` evaluates cascaded ST using pretrained ASR and MT models
  - `run_translation_multi_modalities_pipeline.sh` for multi-task, multi-modality models (including zero-shot)
  - `run_zeroshot_with_artificial_data.sh` for zero-shot models using data augmentation
  - `run_bidirectional_zeroshot.sh` for zero-shot models using additional training data in the opposite direction
  - `run_fine_tunning.sh` and `run_fine_tunning_fromASR.sh` for fine-tuning models with ST data, resulting in few-shot models
  - `modality_similarity_svcca.sh` and `modality_similarity_classifier.sh` measure text-audio similarity in the learned representations

See `notebooks/Repo_Instruction.ipynb` for more details.
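The steps above can be sketched as a single driver script. This is an assumption about the intended run order, not an official pipeline: each script reads its settings from variables defined at the top of the file, so open and edit them before a real run. Shown here as a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical run order inferred from the list above (not part of the repo).
# Drop the final echo loop and invoke the commands directly for a real run.
set -euo pipefail

steps=(
  "python covost_data_preparation.py"        # download + preprocess the data
  "bash run_translation_pipeline.sh"         # single-task baseline (ASR/MT/ST)
  "bash cascaded_ST_evaluation.sh"           # cascaded ST from pretrained ASR + MT
  "bash run_fine_tunning.sh"                 # few-shot: fine-tune with ST data
)

# Dry run: print each step in order instead of executing it.
for step in "${steps[@]}"; do
  echo "$step"
done
```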
If you use this repository, please cite:

```
@INPROCEEDINGS{9746815,
  author={Dinh, Tu Anh and Liu, Danni and Niehues, Jan},
  booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Tackling Data Scarcity in Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques},
  year={2022},
  pages={6222-6226},
  doi={10.1109/ICASSP43922.2022.9746815}
}
```