Training Error when run tedlium recipe #213

Open
yuexianghubit opened this issue May 22, 2019 · 2 comments
Comments

@yuexianghubit

yuexianghubit commented May 22, 2019

Hi all,

I have already installed Eesen and I am trying to run the tedlium recipe, but I got the following error:

train-ctc-parallel --report-step=1000 --num-sequence=20 --frame-limit=25000 --learn-rate=0.00004 --momentum=0.9 --verbose=1 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/train_tr95/utt2spk scp:data/train_tr95/cmvn.scp scp:exp/train_phn_l5_c320/train.scp ark:- | add-deltas ark:- ark:- |' 'ark:gunzip -c exp/train_phn_l5_c320/labels.tr.gz|' exp/train_phn_l5_c320/nnet/nnet.iter0 exp/train_phn_l5_c320/nnet/nnet.iter1
WARNING (train-ctc-parallel:SelectGpuId():cuda-device.cc:150) Suggestion: use 'nvidia-smi -c 1' to set compute exclusive mode
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:262) Selecting from 4 GPUs
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(0): GeForce GTX 1080 Ti free:189M, used:10985M, total:11175M, free/total:0.0169513
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(1): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(2): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(3): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:310) Selected device: 1 (automatically)
LOG (train-ctc-parallel:FinalizeActiveGpu():cuda-device.cc:194) The active GPU is [1]: GeForce GTX 1080 Ti free:10983M, used:195M, total:11178M, free/total:0.982556 version 6.1
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 0 bytes.
LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.

ERROR (train-ctc-parallel:ExpectToken():io-funcs.cc:197) Expected token "<ForwardDropoutFactor>", got instead "<DropFactor>".
ERROR (train-ctc-parallel:ExpectToken():io-funcs.cc:197) Expected token "<ForwardDropoutFactor>", got instead "<DropFactor>".

[stack trace: ]
eesen::KaldiGetStackTraceabi:cxx11
eesen::KaldiErrorMessage::~KaldiErrorMessage()
eesen::ExpectToken(std::istream&, bool, char const*)
eesen::BiLstm::ReadData(std::istream&, bool)
eesen::Layer::Read(std::istream&, bool, bool)
.
.
.
eesen::Net::Read(std::istream&, bool)
eesen::Net::Read(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
train-ctc-parallel(main+0xbb1) [0x434345]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc1fec83830]
train-ctc-parallel(_start+0x29) [0x4320f9]

How can I fix it?
Thank you!

@fmetze
Contributor

fmetze commented May 23, 2019

The error is probably caused by an inconsistency between your conf/*.proto file and the actual model. It seems that the prototype file was generated from a prototype in the librispeech recipe (or from https://github.com/jb1999/eesen), while the actual code is srvk's Eesen?

Eesen's standard acoustic model does not contain the ForwardDropoutFactor, but jb1999's Eesen does.
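One quick way to confirm which spelling the code you built actually reads and writes is to search the Eesen source for both tokens (a rough sketch; run it from the top of your Eesen checkout, and adjust the path if your layout differs):

# search the source for the two competing token names (the src/ path is an assumption)
grep -rn "ForwardDropoutFactor" src/
grep -rn "<DropFactor>" src/

If only one of the two tokens shows up in the code you compiled, the nnet.iter0 model was most likely written by the other fork's binaries.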

@yuexianghubit
Author

I installed srvk's Eesen, not jb1999's Eesen.
Today I tried the librispeech recipe; the acoustic model training runs normally. The nnet proto is below:
<Nnet>
<BiLstmParallel> <InputDim> 360 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<AffineTransform> <InputDim> 640 <OutputDim> 44 <ParamRange> 0.1
<Softmax> <InputDim> 44 <OutputDim> 44
</Nnet>

But when I ran the tedlium recipe, the acoustic model training hit the error I sent before. The nnet proto in that case is:
<Nnet>
<BiLstmParallel> <InputDim> 120 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<AffineTransform> <InputDim> 640 <OutputDim> 78 <ParamRange> 0.1
<Softmax> <InputDim> 78 <OutputDim> 78
</Nnet>
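For comparison, if the code you built really does expect the dropout tokens (as the error message suggests), a tedlium proto written in the same style as the working librispeech one might look roughly like this. This is an untested sketch that simply copies the dropout fields from the librispeech proto above and keeps the tedlium dimensions:

<Nnet>
<BiLstmParallel> <InputDim> 120 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<AffineTransform> <InputDim> 640 <OutputDim> 78 <ParamRange> 0.1
<Softmax> <InputDim> 78 <OutputDim> 78
</Nnet>

The reverse direction (keeping the tedlium proto as-is and using a checkout without the dropout fields) is equally plausible; which one applies depends on what the grep check above turns up in your source tree.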
