Commit a1d8ab0: merge the develop

Jackwaterveg committed Dec 31, 2021
2 parents: c907a8d + 6272496

Showing 25 changed files with 452 additions and 331 deletions.
README.md (1 addition & 1 deletion)

@@ -530,7 +530,7 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
 ## Acknowledgement
 
 
-- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling) for years of attention, constructive advice and great help.
+- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
 - Many thanks to [AK391](https://github.com/AK391) for TTS web demo on Huggingface Spaces using Gradio.
 - Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
 - Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.
README_cn.md (1 addition & 2 deletions)

@@ -497,7 +497,6 @@ year={2021}
 <a name="欢迎贡献"></a>
 ## 参与 PaddleSpeech 的开发
 
-
 热烈欢迎您在[Discussions](https://github.com/PaddlePaddle/PaddleSpeech/discussions) 中提交问题,并在[Issues](https://github.com/PaddlePaddle/PaddleSpeech/issues) 中指出发现的 bug。此外,我们非常希望您参与到 PaddleSpeech 的开发中!
 
 ### 贡献者
@@ -539,7 +538,7 @@ year={2021}
 
 ## 致谢
 
-- 非常感谢 [yeyupiaoling](https://github.com/yeyupiaoling) 多年来的关注和建议,以及在诸多问题上的帮助。
+- 非常感谢 [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) 多年来的关注和建议,以及在诸多问题上的帮助。
 - 非常感谢 [AK391](https://github.com/AK391) 在 Huggingface Spaces 上使用 Gradio 对我们的语音合成功能进行网页版演示。
 - 非常感谢 [mymagicpower](https://github.com/mymagicpower) 采用PaddleSpeech 对 ASR 的[短语音](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk)[长语音](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk)进行 Java 实现。
 - 非常感谢 [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) 采用 PaddleSpeech 语音合成功能实现 Virtual Uploader(VUP)/Virtual YouTuber(VTuber) 虚拟主播。
examples/aishell3/voc1/conf/default.yaml (1 addition & 4 deletions)

@@ -72,10 +72,7 @@ lambda_adv: 4.0             # Loss balancing coefficient.
 ###########################################################
 batch_size: 8               # Batch size.
 batch_max_steps: 24000      # Length of each audio in batch. Make sure dividable by n_shift.
-pin_memory: true            # Whether to pin memory in Pytorch DataLoader.
-num_workers: 4              # Number of workers in Pytorch DataLoader.
-remove_short_samples: true  # Whether to remove samples the length of which are less than batch_max_steps.
-allow_cache: true           # Whether to allow cache in dataset. If true, it requires cpu memory.
+num_workers: 2              # Number of workers in DataLoader.
 
 ###########################################################
 #             OPTIMIZER & SCHEDULER SETTING               #
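The vocoder configs above stress that `batch_max_steps` must be divisible by `n_shift` (the hop size). A minimal Python sketch of that sanity check; the function name is ours, and the values 24000 and 300 are taken from the aishell3 config and the csmsc voc4 comment, respectively:

```python
# Sanity-check that batch_max_steps is divisible by the hop size (n_shift),
# so each training slice covers a whole number of spectrogram frames.
def check_batch_max_steps(batch_max_steps: int, n_shift: int) -> int:
    """Return the number of frames per slice, or raise if inconsistent."""
    if batch_max_steps % n_shift != 0:
        raise ValueError(
            f"batch_max_steps={batch_max_steps} is not divisible by n_shift={n_shift}"
        )
    return batch_max_steps // n_shift

print(check_batch_max_steps(24000, 300))  # 24000 / 300 -> 80 frames
```

With a mismatched pair such as (24001, 300) the check raises instead of silently truncating a frame.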
examples/csmsc/voc1/conf/default.yaml (1 addition & 4 deletions)

@@ -79,10 +79,7 @@ lambda_adv: 4.0             # Loss balancing coefficient.
 ###########################################################
 batch_size: 8               # Batch size.
 batch_max_steps: 25500      # Length of each audio in batch. Make sure dividable by n_shift.
-pin_memory: true            # Whether to pin memory in Pytorch DataLoader.
-num_workers: 2              # Number of workers in Pytorch DataLoader.
-remove_short_samples: true  # Whether to remove samples the length of which are less than batch_max_steps.
-allow_cache: true           # Whether to allow cache in dataset. If true, it requires cpu memory.
+num_workers: 2              # Number of workers in DataLoader.
 
 ###########################################################
 #             OPTIMIZER & SCHEDULER SETTING               #
examples/csmsc/voc4/conf/default.yaml (1 addition & 1 deletion)

@@ -88,7 +88,7 @@ discriminator_adv_loss_params:
 batch_size: 32              # Batch size.
 # batch_max_steps(24000) == prod(noise_upsample_scales)(80) * prod(upsample_scales)(300, n_shift)
 batch_max_steps: 24000      # Length of each audio in batch. Make sure dividable by n_shift.
-num_workers: 2              # Number of workers in Pytorch DataLoader.
+num_workers: 2              # Number of workers in DataLoader.
 
 ###########################################################
 #             OPTIMIZER & SCHEDULER SETTING               #
examples/csmsc/voc5/conf/default.yaml (1 addition & 1 deletion)

@@ -119,7 +119,7 @@ lambda_feat_match: 2.0      # Loss balancing coefficient for feat match loss.
 ###########################################################
 batch_size: 16              # Batch size.
 batch_max_steps: 8400       # Length of each audio in batch. Make sure dividable by hop_size.
-num_workers: 2              # Number of workers in Pytorch DataLoader.
+num_workers: 2              # Number of workers in DataLoader.
 
 ###########################################################
 #             OPTIMIZER & SCHEDULER SETTING               #
examples/csmsc/voc5/conf/finetune.yaml (1 addition & 1 deletion)

@@ -119,7 +119,7 @@ lambda_feat_match: 2.0      # Loss balancing coefficient for feat match loss.
 ###########################################################
 batch_size: 16              # Batch size.
 batch_max_steps: 8400       # Length of each audio in batch. Make sure dividable by hop_size.
-num_workers: 2              # Number of workers in Pytorch DataLoader.
+num_workers: 2              # Number of workers in DataLoader.
 
 ###########################################################
 #             OPTIMIZER & SCHEDULER SETTING               #
examples/ljspeech/voc1/conf/default.yaml (1 addition & 4 deletions)

@@ -72,10 +72,7 @@ lambda_adv: 4.0             # Loss balancing coefficient.
 ###########################################################
 batch_size: 8               # Batch size.
 batch_max_steps: 25600      # Length of each audio in batch. Make sure dividable by n_shift.
-pin_memory: true            # Whether to pin memory in Pytorch DataLoader.
-num_workers: 4              # Number of workers in Pytorch DataLoader.
-remove_short_samples: true  # Whether to remove samples the length of which are less than batch_max_steps.
-allow_cache: true           # Whether to allow cache in dataset. If true, it requires cpu memory.
+num_workers: 2              # Number of workers in DataLoader.
 
 ###########################################################
 #             OPTIMIZER & SCHEDULER SETTING               #
examples/ted_en_zh/st0/conf/transformer.yaml (9 additions & 7 deletions)

@@ -2,7 +2,7 @@
 ###########################################
 # Data                                    #
 ###########################################
-train_manifest: data/manifest.train.tiny
+train_manifest: data/manifest.train
 dev_manifest: data/manifest.dev
 test_manifest: data/manifest.test
 min_input_len: 0.05  # second
@@ -19,8 +19,10 @@ vocab_filepath: data/lang_char/vocab.txt
 unit_type: 'spm'
 spm_model_prefix: data/lang_char/bpe_unigram_8000
 mean_std_filepath: ""
-# augmentation_config: conf/augmentation.json
-batch_size: 10
+augmentation_config: conf/preprocess.yaml
+batch_size: 16
+maxlen_in: 5     # if input length > maxlen-in, batchsize is automatically reduced
+maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
 raw_wav: True  # use raw_wav or kaldi feature
 spectrum_type: fbank  # linear, mfcc, fbank
 feat_dim: 80
@@ -84,13 +86,13 @@ accum_grad: 2
 global_grad_clip: 5.0
 optim: adam
 optim_conf:
-  lr: 0.004
-  weight_decay: 1.0e-06
-scheduler: warmuplr
+  lr: 2.5
+  weight_decay: 1e-06
+scheduler: noam
 scheduler_conf:
   warmup_steps: 25000
   lr_decay: 1.0
-log_interval: 5
+log_interval: 50
 checkpoint:
   kbest_n: 50
   latest_n: 5
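The switch from `warmuplr` to the `noam` scheduler explains the jump from `lr: 0.004` to `lr: 2.5`: under the Noam schedule, `lr` acts as a dimensionless scale factor rather than a literal learning rate. A sketch of the standard Noam formula (linear warmup, then inverse-square-root decay); the `d_model=256` default here is an illustrative assumption, not a value read from this config:

```python
def noam_lr(step: int, scale: float = 2.5, d_model: int = 256,
            warmup_steps: int = 25000) -> float:
    """Noam schedule: lr = scale * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # guard against step 0
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The effective rate peaks exactly at step == warmup_steps and decays afterwards.
peak = noam_lr(25000)
assert noam_lr(1000) < peak and noam_lr(100000) < peak
```

With `scale=2.5` and these assumed values the peak rate is about 2.5 / (sqrt(256) * sqrt(25000)), i.e. roughly 1e-3, which is why the raw `lr` value looks so large.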
examples/ted_en_zh/st0/conf/transformer_mtl_noam.yaml (4 additions & 2 deletions)

@@ -19,8 +19,10 @@ vocab_filepath: data/lang_char/vocab.txt
 unit_type: 'spm'
 spm_model_prefix: data/lang_char/bpe_unigram_8000
 mean_std_filepath: ""
-# augmentation_config: conf/augmentation.json
-batch_size: 10
+augmentation_config: conf/preprocess.yaml
+batch_size: 16
+maxlen_in: 5     # if input length > maxlen-in, batchsize is automatically reduced
+maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
 raw_wav: True  # use raw_wav or kaldi feature
 spectrum_type: fbank  # linear, mfcc, fbank
 feat_dim: 80
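The new `maxlen_in` / `maxlen_out` keys note that over-long utterances automatically shrink the effective batch. One illustrative way such a rule can work, as a sketch with simple proportional scaling; the exact batching logic inside PaddleSpeech may differ:

```python
def reduced_batch_size(batch_size: int, ilen: int, olen: int,
                       maxlen_in: int = 512, maxlen_out: int = 150) -> int:
    """Shrink the batch in proportion to how far the longest sample
    exceeds maxlen_in / maxlen_out; never drop below one sample."""
    factor = max(ilen / maxlen_in, olen / maxlen_out, 1.0)
    return max(1, int(batch_size / factor))

print(reduced_batch_size(20, ilen=512, olen=100))   # within limits -> 20
print(reduced_batch_size(20, ilen=1024, olen=100))  # input 2x over -> 10
```

The point of the rule is memory safety: batch cost grows with sequence length, so halving the batch when the longest input doubles keeps the per-batch footprint roughly constant.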
examples/ted_en_zh/st0/local/test.sh (0 additions & 2 deletions)

@@ -14,15 +14,13 @@ ckpt_prefix=$3
 
 for type in fullsentence; do
     echo "decoding ${type}"
-    batch_size=32
     python3 -u ${BIN_DIR}/test.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
     --decode_cfg ${decode_config_path} \
     --result_file ${ckpt_prefix}.${type}.rsl \
     --checkpoint_path ${ckpt_prefix} \
     --opts decode.decoding_method ${type} \
-    --opts decode.decode_batch_size ${batch_size}
 
     if [ $? -ne 0 ]; then
         echo "Failed in evaluation!"
examples/ted_en_zh/st1/RESULTS.md (1 addition & 1 deletion)

@@ -12,5 +12,5 @@
 ## Transformer
 | Model | Params | Config | Val loss | Char-BLEU |
 | --- | --- | --- | --- | --- |
-| FAT + Transformer+ASR MTL | 50.26M | conf/transformer_mtl_noam.yaml | 62.86 | 19.45 |
+| FAT + Transformer+ASR MTL | 50.26M | conf/transformer_mtl_noam.yaml | 69.91 | 20.26 |
 | FAT + Transformer+ASR MTL with word reward | 50.26M | conf/transformer_mtl_noam.yaml | 62.86 | 20.80 |
examples/ted_en_zh/st1/conf/transformer.yaml (22 additions & 29 deletions)

@@ -2,42 +2,35 @@
 ###########################################
 # Data                                    #
 ###########################################
-train_manifest: data/manifest.train.tiny
+train_manifest: data/manifest.train
 dev_manifest: data/manifest.dev
 test_manifest: data/manifest.test
-min_input_len: 5.0      # frame
-max_input_len: 3000.0   # frame
-min_output_len: 0.0     # tokens
-max_output_len: 400.0   # tokens
-min_output_input_ratio: 0.01
-max_output_input_ratio: 20.0
 
 ###########################################
 # Dataloader                              #
 ###########################################
-vocab_filepath: data/lang_char/vocab.txt
+vocab_filepath: data/lang_char/ted_en_zh_bpe8000.txt
 unit_type: 'spm'
-spm_model_prefix: data/lang_char/bpe_unigram_8000
+spm_model_prefix: data/lang_char/ted_en_zh_bpe8000
 mean_std_filepath: ""
-# augmentation_config: conf/augmentation.json
-batch_size: 10
-raw_wav: True  # use raw_wav or kaldi feature
-spectrum_type: fbank  # linear, mfcc, fbank
+batch_size: 20
 feat_dim: 83
-delta_delta: False
-dither: 1.0
-target_sample_rate: 16000
-max_freq: None
-n_fft: None
-stride_ms: 10.0
-window_ms: 25.0
-use_dB_normalization: True
-target_dB: -20
-random_seed: 0
-keep_transcription_text: False
-sortagrad: True
-shuffle_method: batch_shuffle
-num_workers: 2
+sortagrad: 0     # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+maxlen_in: 512   # if input length > maxlen-in, batchsize is automatically reduced
+maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+minibatches: 0   # for debug
+batch_count: auto
+batch_bins: 0
+batch_frames_in: 0
+batch_frames_out: 0
+batch_frames_inout: 0
+augmentation_config:
+num_workers: 0
+subsampling_factor: 1
+num_encs: 1
 
 
 ############################################
@@ -80,18 +73,18 @@ model_conf:
 ###########################################
 # Training                                #
 ###########################################
-n_epoch: 20
+n_epoch: 40
 accum_grad: 2
 global_grad_clip: 5.0
 optim: adam
 optim_conf:
-  lr: 0.004
-  weight_decay: 1.0e-06
-scheduler: warmuplr
+  lr: 2.5
+  weight_decay: 0.
+scheduler: noam
 scheduler_conf:
   warmup_steps: 25000
   lr_decay: 1.0
-log_interval: 5
+log_interval: 50
 checkpoint:
   kbest_n: 50
   latest_n: 5
examples/ted_en_zh/st1/conf/transformer_mtl_noam.yaml (17 additions & 24 deletions)

@@ -5,12 +5,6 @@
 train_manifest: data/manifest.train
 dev_manifest: data/manifest.dev
 test_manifest: data/manifest.test
-min_input_len: 5.0      # frame
-max_input_len: 3000.0   # frame
-min_output_len: 0.0     # tokens
-max_output_len: 400.0   # tokens
-min_output_input_ratio: 0.01
-max_output_input_ratio: 20.0
 
 ###########################################
 # Dataloader                              #
@@ -20,24 +14,23 @@ unit_type: 'spm'
 spm_model_prefix: data/lang_char/ted_en_zh_bpe8000
 mean_std_filepath: ""
 # augmentation_config: conf/augmentation.json
-batch_size: 10
-raw_wav: True  # use raw_wav or kaldi feature
-spectrum_type: fbank  # linear, mfcc, fbank
+batch_size: 20
 feat_dim: 83
-delta_delta: False
-dither: 1.0
-target_sample_rate: 16000
-max_freq: None
-n_fft: None
-stride_ms: 10.0
-window_ms: 25.0
-use_dB_normalization: True
-target_dB: -20
-random_seed: 0
-keep_transcription_text: False
-sortagrad: True
-shuffle_method: batch_shuffle
-num_workers: 2
+sortagrad: 0     # Feed samples from shortest to longest ; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
+maxlen_in: 512   # if input length > maxlen-in, batchsize is automatically reduced
+maxlen_out: 150  # if output length > maxlen-out, batchsize is automatically reduced
+minibatches: 0   # for debug
+batch_count: auto
+batch_bins: 0
+batch_frames_in: 0
+batch_frames_out: 0
+batch_frames_inout: 0
+augmentation_config:
+num_workers: 0
+subsampling_factor: 1
+num_encs: 1
 
 
 ############################################
@@ -80,18 +73,18 @@ model_conf:
 ###########################################
 # Training                                #
 ###########################################
-n_epoch: 20
+n_epoch: 40
 accum_grad: 2
 global_grad_clip: 5.0
 optim: adam
 optim_conf:
   lr: 2.5
-  weight_decay: 1.0e-06
+  weight_decay: 0.
 scheduler: noam
 scheduler_conf:
   warmup_steps: 25000
   lr_decay: 1.0
-log_interval: 5
+log_interval: 50
 checkpoint:
   kbest_n: 50
   latest_n: 5
examples/ted_en_zh/st1/local/test.sh (4 additions & 1 deletion)

@@ -14,15 +14,18 @@ ckpt_prefix=$3
 
 for type in fullsentence; do
     echo "decoding ${type}"
     batch_size=32
     python3 -u ${BIN_DIR}/test.py \
     --ngpu ${ngpu} \
     --config ${config_path} \
     --decode_cfg ${decode_config_path} \
     --result_file ${ckpt_prefix}.${type}.rsl \
     --checkpoint_path ${ckpt_prefix} \
+<<<<<<< HEAD
     --opts decode.decoding_method ${type} \
     --opts decode.decode_batch_size ${batch_size}
+=======
+    --opts decoding.decoding_method ${type}
+>>>>>>> 6272496d9c26736750b577fd832ea9dd4ddc4e6e
 
     if [ $? -ne 0 ]; then
         echo "Failed in evaluation!"
examples/vctk/voc1/conf/default.yaml (1 addition & 4 deletions)

@@ -72,10 +72,7 @@ lambda_adv: 4.0             # Loss balancing coefficient.
 ###########################################################
 batch_size: 8               # Batch size.
 batch_max_steps: 24000      # Length of each audio in batch. Make sure dividable by n_shift.
-pin_memory: true            # Whether to pin memory in Pytorch DataLoader.
-num_workers: 4              # Number of workers in Pytorch DataLoader.
-remove_short_samples: true  # Whether to remove samples the length of which are less than batch_max_steps.
-allow_cache: true           # Whether to allow cache in dataset. If true, it requires cpu memory.
+num_workers: 2              # Number of workers in DataLoader.
 
 ###########################################################
 #             OPTIMIZER & SCHEDULER SETTING               #