
Error about meta_optimizer and new data #15

Open

leyi-123 opened this issue Dec 20, 2020 · 3 comments

@leyi-123

Hello! I used my own data to train your model. After line 170 meta_optimizer.step() is executed, line 150 val_loss, v_ppl = do_learning_fix_step(meta_net, train_iter, val_iter, iterations=config.meta_iteration) returns val_loss as tensor(nan, device='cuda:0', grad_fn=<AddBackward>), which causes training to fail. I didn't change your code except for persona_map, and I want to know what went wrong. Thanks!

@andreamad8
Member

Maybe you need to provide a bit more detail.

This could happen for many reasons:

* the train_iter or val_iter is empty
* the lr is too high
* or others.

A couple of quick sanity checks are sketched below.
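This is only a rough sketch, not code from the repo: it assumes train_iter, val_iter, and meta_optimizer are the objects used in MAML.py and that the iterators support len(), so adjust the names to your setup.

```python
# 1) make sure the iterators actually yield batches
#    (torchtext iterators support len(); otherwise count one pass over them)
print("train batches:", len(train_iter), "| val batches:", len(val_iter))
assert len(train_iter) > 0 and len(val_iter) > 0, "an empty iterator gives an undefined/NaN loss"

# 2) print the learning rate(s) actually used by the meta optimizer;
#    if the loss explodes to NaN, try lowering them (e.g. by 10x)
for group in meta_optimizer.param_groups:
    print("meta lr:", group["lr"])
```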

@leyi-123
Author

> Maybe you need to provide a bit more detail.
>
> This could happen for many reasons:
>
> * the train_iter or val_iter is empty
> * the lr is too high
> * or others.

I'm sorry that my question may have been too vague. Together with my last question, let me explain in detail. I trained your model with my own data, which is in the same format as the example you gave, as follows:
[screenshot: a sample of my training data]

The persona part is not a description, but a person's ID. I changed the cluster_persona function in data_reader.py as follows:
[screenshot: my modified cluster_persona function]

and persona_map.txt looks like this:
[screenshot: a sample of persona_map.txt]

When I ran MAML.py, I found that I couldn't train the model I wanted. After investigating, I found that both do_evaluation (defined on line 96 of MAML.py) and do_learning_fix_step (defined on line 74 of MAML.py) returned tensor(nan, device='cuda:0', grad_fn=<AddBackward>) after meta_optimizer.step() (line 170), which led to the training failure. I would like to know how to solve this. Thank you very much!

@andreamad8
Member

mmm I see. I really don't know at this point.

I suggest stepping through the do_evaluation function to check where the loss becomes NaN.
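For example, something along these lines, placed right before meta_optimizer.step() around line 170. This is only a rough sketch, not code from this repo; loss here stands for whatever tensor is backpropagated at that point, and the clipping value of 1.0 is an arbitrary choice.

```python
import torch

# check the loss itself before stepping
if not torch.isfinite(loss):
    print("loss is not finite:", loss)

# check whether any gradient has blown up or become NaN/Inf
for name, p in meta_net.named_parameters():
    if p.grad is not None and not torch.isfinite(p.grad).all():
        print("bad gradient in:", name)

# optionally clip gradients so one bad batch cannot wreck the weights
torch.nn.utils.clip_grad_norm_(meta_net.parameters(), max_norm=1.0)
meta_optimizer.step()
```

If the loss is already NaN before the very first meta_optimizer.step(), the problem is more likely in the data (e.g. empty batches or bad targets) than in the optimizer.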

Sorry I cannot help much here.
