loss is nan #4

Closed
YvoGao opened this issue Jun 7, 2024 · 8 comments

@YvoGao

YvoGao commented Jun 7, 2024

Thanks for your work. I have a problem when running with the default parameters: the loss is NaN, like this:
Session 0, epo 1, lrc=0.0002,total loss=nan, loss_CE=nan, loss_ED=nan, loss_SKD=nan, acc=0.0160: 86%|██████████████████████████████████████████████▍ | 202/235 [04:08<00:40, 1.22s/it

@rongtongxueya

Same question. Can you give me some advice?

@zichengpan

Same question and the accuracy is quite low.

@pgh2874
Collaborator

pgh2874 commented Jun 13, 2024

As other FSCIL methods do, we used the FSCIL benchmark settings proposed in FSCIL. If you have an issue, it would be helpful to use the data split index file from this [link](https://github.com/xyutao/fscil), which is widely used for FSCIL benchmark scenarios.

@pgh2874 pgh2874 closed this as completed Jun 13, 2024
@YvoGao
Author

YvoGao commented Jul 11, 2024

> As other FSCIL methods do, we used FSCIL Benchmark Settings proposed in FSCIL. If you have an issue, It would be helpful to use the data split index file from the [link](https://github.com/xyutao/fscil), which is widely used in FSCIL to benchmark scenarios.

What we mean is that something seems to be wrong in your code. The loss is NaN and the accuracy is very low, only 1-2%.

@pgh2874
Collaborator

pgh2874 commented Jul 11, 2024

I've found that the NaN loss is caused by the Torch version. Try using Torch 1.12 or 1.11. I conducted all the experiments with Torch 1.12.
Please let me know if the loss still contains NaN values after changing the Torch version.
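
For anyone hitting the same problem, here is a minimal sketch of how the installed Torch version could be compared against the versions mentioned above before starting a run. It is not part of the repository; the helper names (`major_minor`, `torch_version_ok`, `installed_torch_version`) are made up for illustration, and it only reads package metadata, so it does not import Torch itself.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# (major, minor) versions reported to work in this thread.
WORKING_TORCH = {(1, 11), (1, 12)}


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '1.12.1+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_version_ok(version: str) -> bool:
    """True if the version is one of those reported to work."""
    return major_minor(version) in WORKING_TORCH


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed in this environment")
    elif torch_version_ok(found):
        print(f"torch {found}: matches a version reported to work")
    else:
        print(f"torch {found}: differs from 1.11/1.12, NaN loss was reported")
```

Running the file as a script prints one line describing the current environment. A mismatch is only a hint, since other versions were not discussed in this thread.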

@pgh2874 pgh2874 reopened this Jul 11, 2024
@YvoGao
Author

YvoGao commented Jul 15, 2024

That solved the NaN loss problem, but the accuracy is still low. I am using the default parameters.
Namespace(Dataset=<module 'dataloader.cifar100.cifar' from '/data/gaoyunlong/FSCIL/PriViLege/dataloader/cifar100/cifar.py'>, ED=True, ED_hp=0.1, LT=True, MP=False, PKT_tune_way=1, SKD=True, WC=False, base_class=60, base_mode='ft_dot', baseline=False, batch_size_base=128, batch_size_new=0, clip=False, comp_out=1, dataroot='/data/gaoyunlong/dataset/', dataset='cifar100', debug=False, decay=0.0005, dp=False, episode_query=15, episode_shot=1, episode_way=15, epochs_base=5, epochs_new=3, fraction_to_keep=0.1, ft=False, gamma=0.1, gpu='0', l2p=False, low_shot=1, low_way=15, lp=False, lr_base=0.0002, lr_new=0.0002, lrg=0.1, milestones=[20, 30, 45], model_dir=None, momentum=0.9, new_mode='avg_cos', not_data_init=False, num_classes=100, num_gpu=1, num_workers=4, out='PriViLege', prefix=False, pret_clip=False, project='base', rotation=False, save_path='checkpoint/PriViLege/cifar100/base_ViT_Ours/ft_dot-avg_cos-data_init-start_0/Epo_5-Lr_0.0002-COS_80-Gam_0.10-Bs_128-Mom_0.90-Wd_0.00050-seed_1-T_16.00', schedule='Cosine', scratch=False, seed=1, sessions=9, set_no_val=False, shot=5, start_session=0, step=80, taskblock=2, temperature=16, test_batch_size=128, train_episode=50, vit=True, way=5)
epoch:000,lr:0.0002,training_loss:3.63284,training_acc:0.13897,test_loss:3.12547,test_acc:0.19867
epoch:001,lr:0.0002,training_loss:3.06128,training_acc:0.23677,test_loss:2.86275,test_acc:0.25467
epoch:002,lr:0.0001,training_loss:2.77224,training_acc:0.30250,test_loss:2.65879,test_acc:0.29983
epoch:003,lr:0.0001,training_loss:2.52510,training_acc:0.35493,test_loss:2.53084,test_acc:0.33217
epoch:004,lr:0.0000,training_loss:2.32806,training_acc:0.40283,test_loss:2.40860,test_acc:0.36167
Session 0, Test Best Epoch 4,
best test Acc 36.1670

Session 1, test Acc 23.831

Session 2, test Acc 22.143

Session 3, test Acc 19.693

Session 4, test Acc 18.600

Session 5, test Acc 16.918

Session 6, test Acc 15.978

Session 7, test Acc 15.095

Session 8, test Acc 13.840

Base Session Best Epoch 4

[25.567, 23.831, 22.143, 19.693, 18.6, 16.918, 15.978, 15.095, 13.84]

@YvoGao
Author

YvoGao commented Jul 15, 2024

FileNotFoundError: [Errno 2] No such file or directory: '/data/pgh2874/FSCIL/Ours/dataloader/miniimagenet/map_clsloc.txt'

@pgh2874
Collaborator

pgh2874 commented Jul 15, 2024

Low accuracy can result from using a different timm version. I recommend using timm==0.6.7.

For the FileNotFoundError, I've uploaded the map_clsloc.txt file to this repository. Make sure to change the directory path in the code to match your own directory layout.
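
As a rough illustration of the path change, the sketch below builds the location of map_clsloc.txt from a repository root instead of a hard-coded absolute path. The helper `find_map_file` is hypothetical, and the `dataloader/miniimagenet` subdirectory is an assumption taken from the path in the error message; adjust it to wherever the file actually sits in your checkout.

```python
from pathlib import Path

# Assumed location relative to the repository root, taken from the
# path shown in the FileNotFoundError above.
MAP_FILE_RELATIVE = Path("dataloader") / "miniimagenet" / "map_clsloc.txt"


def find_map_file(repo_root: Path) -> Path:
    """Return the path to map_clsloc.txt under repo_root, or raise."""
    candidate = Path(repo_root) / MAP_FILE_RELATIVE
    if not candidate.is_file():
        raise FileNotFoundError(
            f"{candidate} not found; check where map_clsloc.txt is stored"
        )
    return candidate


if __name__ == "__main__":
    # Uses the current working directory as the repository root.
    try:
        print(find_map_file(Path.cwd()))
    except FileNotFoundError as err:
        print(err)
```

Passing the root explicitly keeps the lookup independent of the machine the code was originally written on.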

@pgh2874 pgh2874 closed this as completed Jul 15, 2024