
learning-rate and pretrained model of SAITS #35

Closed
lyy0095 opened this issue Feb 26, 2023 · 6 comments

@lyy0095

lyy0095 commented Feb 26, 2023

Hello, Wenjie,

I tried PyPOTS, and it's awesome! But I have the following questions:
(1) When training the SAITS model, I found the recommended learning rate in the 'PhysioNet2012_SAITS_best.ini' file is lr = 0.00068277455043675505. I am wondering whether there is a good method to obtain such a learning rate? (I only know how to set round numbers like 0.001 or 0.0001.)
(2) Is it possible to release the pretrained state_dict .pth files of SAITS (base) and SAITS? During training on my custom dataset, I ran into early stopping within 100 epochs, so I'd like to check whether the same problem occurs on PhysioNet2012 with epochs = 10000.
Alternatively, the training log files of SAITS (base) and SAITS would be helpful!

Thank you very much for your reply !

@WenjieDu
Owner

Hi there 👋,

Thank you so much for your attention to PyPOTS! If you find PyPOTS helpful to your work, please star⭐️ this repository. Your star is your recognition, which helps more people notice PyPOTS and grows the PyPOTS community. It matters and is definitely a kind of contribution to the community.

I have received your message and will respond ASAP. Thank you for your patience! 😃

Best,
Wenjie

@WenjieDu
Owner

WenjieDu commented Feb 27, 2023

Hi,

Many thanks for your kind words about PyPOTS! Here are my answers:

1). Such a learning rate was tuned by NNI. For all models in the SAITS paper, we used NNI to tune their hyperparameters to make fair comparisons. I recommend reading the full SAITS paper if you'd like to know more details;
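For intuition on why a tuned learning rate looks "odd" rather than round: hyperparameter tuners such as NNI typically draw the learning rate from a log-uniform distribution over a search range instead of picking from a fixed grid. A minimal sketch of that sampling (the bounds and function name are illustrative, not NNI's actual API):

```python
import math
import random

def sample_lr(low=1e-4, high=1e-2):
    """Draw a learning rate log-uniformly from [low, high],
    the way a 'loguniform' search-space entry behaves in tuners
    like NNI. Bounds here are illustrative, not from the paper."""
    return math.exp(random.uniform(math.log(low), math.log(high)))

# A few trials yield values like 0.00068..., not round numbers:
candidates = [sample_lr() for _ in range(5)]
```

The tuner then keeps whichever sampled value gives the best validation score, which is how a value like 0.00068277455043675505 ends up in the config file.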
2). For the pre-trained models, I'm sorry I cannot provide them right now because I don't have copies. The work on SAITS was finished about 18 months ago during my internship at Ciena, and all experiments were run on Ciena's servers. I didn't copy the trained models because people can reproduce our results with the open-source code in the repo https://github.com/WenjieDu/SAITS. If you run into the early-stopping problem during training, you can increase the parameter early_stop_patience in the config file.
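The logic behind an early_stop_patience parameter is generic: stop when the best validation loss has not improved for a given number of consecutive epochs. A rough sketch of that idea (generic early-stopping logic, not the exact SAITS/PyPOTS implementation):

```python
def should_stop(val_losses, patience=30):
    """Return True when the best validation loss has not improved
    within the last `patience` epochs. `val_losses` is one value
    per completed epoch. Generic sketch; the real early-stopping
    code in SAITS/PyPOTS may differ in details."""
    if len(val_losses) <= patience:
        return False  # not enough history yet
    best_before_window = min(val_losses[:-patience])
    # No loss in the recent window beat the earlier best -> stop.
    return min(val_losses[-patience:]) >= best_before_window
```

Raising the patience value simply widens that window, giving a slowly improving model more epochs before training is cut off.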

And if your questions are specifically about SAITS, please raise them as issues in the SAITS repo.

Thank you again for your support!

@lyy0095
Author

lyy0095 commented Feb 27, 2023

Thank you for the answer, I will read the full paper carefully.
By the way, do you remember how long the 10000-epoch training took? A few hours, or several days?
The SAITS paper mentions using "Nvidia Quadro RTX 5000 GPUs", so could you tell me how many GPUs were used?

Sometimes when I see no big val-loss improvement between 100 epochs and 1000 epochs, I lose confidence in the model and am keen to find ways to change it.

Thank you again for your reply!

@WenjieDu
Owner

In all the experiments I ran, I remember the models usually converged within hundreds of training epochs, and training shouldn't take more than 2 hours. There were four GPU cards, but we didn't use them to train a single model in parallel, i.e., a model's training time is not affected by the number of GPUs.

Right, sometimes the loss descends slowly but steadily. You can adjust the learning rate to speed up convergence. And again, if you want to reproduce the results in the SAITS paper, you'd better use the code in https://github.com/WenjieDu/SAITS, because there are minor differences in code logic between the SAITS and PyPOTS repos.

@lyy0095
Author

lyy0095 commented Feb 28, 2023

Thanks !
I have an Nvidia RTX 3080 (each epoch costs nearly 1 minute), which seems much slower than the RTX 5000 during training. If an RTX 5000 takes less than 2 hours for 10000 epochs of training, I must buy one.
I'm not very familiar with models in the imputation area. Are there common backbone structures for imputation models (like the common ResNet50/101 backbones for CNN models, which can be used in Detectron2/YOLO/...)? These imputation models all seem totally different from each other.

In Table 2 of the SAITS paper, are the MAE/RMSE metrics (like 0.186/0.431) computed on the original data or on the scaled/normalized data?
I downloaded the PhysioNet-2012 dataset, and some of the original values are rather large, e.g.:
nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,15.000000,nan,nan,nan,75.000000,nan,nan,nan,nan,nan,61.500000,91.665000,152.000000,nan,nan,nan,nan,19.000000,nan,nan,35.350000,nan,nan,480.000000,nan,-1.000000,nan

But the scaled/normalized data is small enough, e.g.:
nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,0.898174,nan,nan,nan,-0.656144,nan,nan,nan,nan,nan,0.206670,0.954933,1.447409,nan,nan,nan,nan,-0.109436,nan,nan,-1.316676,nan,nan,2.312329,nan,-3.345466,nan

Thank you very much for the reply!

@WenjieDu
Owner

I didn't mean the RTX 5000 can finish all 10K epochs in two hours. With the early-stopping strategy applied, training doesn't have to run all 10K epochs.

For your 1st question, you can refer to the Related Work section of the SAITS paper.

For the 2nd one, you can refer to the code in the SAITS repo, and you'll find that all error metrics are computed on the normalized data.
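To make that concrete, here is a small sketch of per-feature z-score normalization followed by MAE/RMSE computed only at masked (held-out) positions, which is how imputation errors on normalized data are typically measured. The numbers are toy values; this is not the SAITS evaluation code itself:

```python
import numpy as np

def normalize(x, mean, std):
    """Z-score normalization: zero mean, unit variance per feature."""
    return (x - mean) / std

def masked_mae(pred, target, mask):
    """Mean absolute error, counted only where mask is True
    (i.e., the artificially removed values being imputed)."""
    return np.abs(pred - target)[mask].mean()

def masked_rmse(pred, target, mask):
    """Root mean squared error on masked positions only."""
    return np.sqrt(((pred - target) ** 2)[mask].mean())

# Toy values in the same spirit as the PhysioNet rows quoted above.
raw = np.array([15.0, 75.0, 152.0, 480.0])
mean, std = raw.mean(), raw.std()
norm = normalize(raw, mean, std)  # large raw values shrink to ~[-2, 3]
```

Because errors are computed on values in this normalized range, an MAE like 0.186 is plausible even when the raw measurements are in the hundreds.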

lyy0095 closed this as completed Feb 28, 2023