CTCLoss gets NaN loss when training with a custom Chinese dataset. #66
Comments
I met the same problem, but I still finished training the model. This situation only occurs when using CTC loss.
I also use CTC loss. Does your trained model work well?
The final result is OK, and train_loss decreases normally during training. But I don't know why; maybe something is wrong when using CTCLoss().
@AnddyWang @MengLcool did you use your own dataset to train?
Thanks for your code. Can I use my own dataset to train? If yes, what do I need to pay attention to?
I will also train the model. Can you help us solve the NaN loss? @ku21fan
Hello. So, I have 2 questions.
I tried the latest code but still got NaN.
Yes, I prepared my own dataset just following the README ^_^
I use the latest code but the loss is NaN.
Thanks.
Using the released datasets, there is no NaN, whether TPS is used or not.
I have the same problem when I use CTC loss.
@AnddyWang @MengLcool @13438960761 In general, CTCLoss has some limitations, and one of them is "input length >= target length". Thus, set `batch_max_length = 63`, and the data whose label length is longer than 63 will be filtered out by the dataset code (see the sketch below). Best
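As an illustration of that filtering, a minimal sketch (not the repository's exact code; `keep_sample` and the sample list are made up for the example):

```python
# Minimal sketch of label-length filtering for CTC training. CTCLoss needs
# input length >= target length, so samples whose label is longer than
# batch_max_length are dropped before training.
batch_max_length = 63  # labels longer than this cannot be aligned by CTC here

def keep_sample(label, max_length=batch_max_length):
    """Return True if the label is short enough for a valid CTC alignment."""
    return len(label) <= max_length

# Example: filter (image_path, label) pairs before building the dataset.
samples = [("img_0001.jpg", "你好世界"), ("img_0002.jpg", "一" * 100)]
filtered = [(path, label) for path, label in samples if keep_sample(label)]
print(f"kept {len(filtered)} of {len(samples)} samples")
```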
Thanks for your reply. |
I wonder how long it took you to train a model?
I think it's a bug in PyTorch's CTCLoss @AnddyWang; you can try PyTorch 1.2+.
@WenmuZhou did you try to run the CTC model with PyTorch 1.2+? Is its loss NaN?
PyTorch 1.3 works fine.
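For reference, here is a minimal, self-contained check of PyTorch's CTCLoss (illustrative shapes only, not this repository's training loop); the `zero_infinity` flag, available since PyTorch 1.1, is one way to keep an impossible alignment from producing NaN gradients:

```python
import torch

# Illustrative shapes: T = input sequence length, N = batch size, C = classes.
T, N, C = 63, 2, 6885
log_probs = torch.randn(T, N, C).log_softmax(2)            # (T, N, C) log-probs
targets = torch.randint(1, C, (N, 30), dtype=torch.long)   # padded targets
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.tensor([30, 30], dtype=torch.long)  # must be <= T

# zero_infinity=True zeroes out infinite losses (e.g. target longer than input)
# instead of letting them propagate NaN gradients.
criterion = torch.nn.CTCLoss(blank=0, zero_infinity=True)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```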
Can it be used for double-line text recognition?
------------ Options -------------
experiment_name: TPS-VGG-BiLSTM-CTC-Seed2222
manualSeed: 2222
workers: 16
batch_size: 192
num_iter: 300000
valInterval: 300000
continue_model:
adam: False
lr: 0.1
lr_decay_steps: 100000
lr_decay_rate: 0.8
beta1: 0.9
rho: 0.95
eps: 1e-08
grad_clip: 5
select_data: ['train']
batch_ratio: ['1']
total_data_usage_ratio: 1.0
batch_max_length: 64
imgH: 32
imgW: 256
rgb: True
sensitive: True
PAD: True
data_filtering_off: False
Transformation: TPS
FeatureExtraction: VGG
SequenceModeling: BiLSTM
Prediction: CTC
num_fiducial: 20
input_channel: 3
output_channel: 512
hidden_size: 256
num_gpu: 1
num_class: 6885
Loss
[38/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.22594 train_loss: nan
[39/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.23453 train_loss: nan
[40/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.25824 train_loss: nan
[41/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.25702 train_loss: nan
[42/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.28295 train_loss: nan
[43/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.28247 train_loss: nan
[44/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.27586 train_loss: nan
[45/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.25553 train_loss: 8.42399
[46/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.22859 train_loss: nan
[47/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.25175 train_loss: 8.32840
[48/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.24148 train_loss: nan
[49/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.22841 train_loss: nan
[50/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.27223 train_loss: nan
[51/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.23665 train_loss: 8.47187
[52/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.22846 train_loss: nan
[53/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.24092 train_loss: nan
[54/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.25575 train_loss: 8.26231
[55/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.26092 train_loss: 8.02194
[56/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.25898 train_loss: nan
[57/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.22861 train_loss: nan
[58/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.27106 train_loss: nan
[59/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.24483 train_loss: nan
[60/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.25403 train_loss: nan
[61/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.24929 train_loss: nan
[62/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.27895 train_loss: nan
[63/300000] lr: 0.1 0.1 single_train_elapsed_time: 0.24706 train_loss: nan