Issues running your model #1
Comments
I would check that the gradient is calculated by printing the gradients. It's easy to use variables that don't ask for the gradient, and then the loss oscillates but never gets optimized.
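A minimal sketch of that check, using stand-in tensors rather than the repo's actual model: after `backward()`, any parameter created with `requires_grad=False` keeps `.grad = None`, and the optimizer silently skips it.

```python
import torch

# Stand-ins for model parameters; the default for plain data tensors is
# requires_grad=False, which is the failure mode to look for.
x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(3, 1, requires_grad=True)

loss = (x @ w).pow(2).mean()
loss.backward()

# A parameter that is being optimized should report True / a real tensor here.
print(w.requires_grad)        # True
print(w.grad is not None)     # True after backward()
```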
Hello,
First of all, let me thank you for putting this together. I was very curious about the paper, but their TF implementation is rather poor and very hard to understand. Yours is very clean and makes a lot more sense!
I ran your model with default parameters in the reconstruction mode on the Hotel dataset on a single Tesla K80 machine. It took 20+ hours to train for 10 epochs, and the model didn't converge (see below). The loss has never moved below 22,000.
I have a few questions:
Is there something that I am doing wrong? Are there any parameters that need to be specified to make the model work? I checked the defaults for the parameters and they looked in line with the paper.
You use log softmax as the loss function for the deconvolutional model, and I assume that's why the model is taking so long to train. I know that's what the paper recommends, but have you tried using adaptive softmax instead?
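For reference, PyTorch ships this as `nn.AdaptiveLogSoftmaxWithLoss`: the vocabulary is split into frequency bands so frequent words get a full-size projection and rare words a smaller one. The vocabulary size and cutoffs below are illustrative, not taken from the repo.

```python
import torch
import torch.nn as nn

vocab_size, hidden = 10000, 500
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden,
    n_classes=vocab_size,
    cutoffs=[100, 1000],  # frequency-band boundaries, must be < n_classes
)

hidden_states = torch.randn(64, hidden)           # one step of decoder output
targets = torch.randint(0, vocab_size, (64,))
out = adaptive(hidden_states, targets)
print(out.loss)  # scalar mean negative log-likelihood over the batch
```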
What are your thoughts on seeding the embedding matrix with pre-learned embeddings? I am curious whether using L2-normalized GloVe embeddings would speed up the training.
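A sketch of how that seeding could look, assuming the GloVe matrix has already been loaded (the random `pretrained` tensor below is just a placeholder for it):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim = 1000, 300
pretrained = torch.randn(vocab_size, dim)          # placeholder for loaded GloVe weights
pretrained = F.normalize(pretrained, p=2, dim=1)   # L2-normalize each row

# freeze=False lets the embeddings keep training from the GloVe starting point
emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
print(emb.weight.norm(dim=1)[:3])  # every row has unit norm
```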
I also tried to train jointly with a classifier using the AG News dataset, but the MLP classifier is unhappy about the dimensions it gets:
```python
h = encoder(feature)
print(h.shape)  # torch.Size([64, 500, 5, 1])
prob = decoder(h)
log_prob = mlp(h.squeeze())
```
The last dimension gets squeezed, but the resulting [64, 500, 5] tensor is not compatible with the 500x300 FC layer:
```
RuntimeError: size mismatch, m1: [32000 x 5], m2: [500 x 300] at /Users/soumith/minicondabuild3/conda-bld/pytorch_1518385717421/work/torch/lib/TH/generic/THTensorMath.c:1434
```
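One workaround I considered, assuming the MLP should see a single 500-dim vector per example: squeezing only removes the trailing 1, so the spatial dimension of 5 survives and gets folded into the batch (64 × 500 = 32000 rows of width 5). Pooling over that dimension first produces the shape the FC layer expects. The variable names here are illustrative, not from the repo.

```python
import torch

h = torch.randn(64, 500, 5, 1)   # encoder output: (batch, channels, length, 1)
h2 = h.squeeze(-1)               # -> (64, 500, 5); still 3-D
pooled = h2.mean(dim=2)          # -> (64, 500); average over the length axis

fc = torch.nn.Linear(500, 300)
log_prob = torch.log_softmax(fc(pooled), dim=1)
print(log_prob.shape)  # torch.Size([64, 300])
```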
I would greatly appreciate any guidance you could give me on these!