
Issues running your model #1

Open
opletayev opened this issue Feb 23, 2018 · 1 comment

Comments

@opletayev

Hello,

First of all, let me thank you for putting this together. I was very curious about the paper, but their TF implementation is rather poor and very hard to understand. Yours is very clean and makes a lot more sense!

I ran your model with default parameters in the reconstruction mode on the Hotel dataset on a single Tesla K80 machine. It took 20+ hours to train for 10 epochs, and the model didn't converge (see below). The loss has never moved below 22,000.

I have a few questions:

  1. Is there something that I am doing wrong? Are there any parameters that need to be specified to make the model work? I checked the parameter defaults and they looked to be in line with the paper.

  2. You use a full log softmax over the vocabulary as the loss function for the deconvolutional model, and I assume that's why the model is taking so long to train. I know that's what the paper recommends, but have you tried using adaptive softmax instead?

  3. What are your thoughts on seeding the embedding matrix with pre-learned embeddings? I am curious if using L2-normalized Glove embeddings would speed up the training.

  4. I also tried to train jointly with a classifier on the AG News dataset, but the MLP classifier is unhappy about the dimensions it gets:

```python
h = encoder(feature)
print(h.shape)
prob = decoder(h)
log_prob = mlp(h.squeeze())
```

`h.shape` is `torch.Size([64, 500, 5, 1])`. The last dimension gets squeezed, but the resulting `[64, 500, 5]` tensor is not compatible with the 500×300 FC layer:

```
RuntimeError: size mismatch, m1: [32000 x 5], m2: [500 x 300] at /Users/soumith/minicondabuild3/conda-bld/pytorch_1518385717421/work/torch/lib/TH/generic/THTensorMath.c:1434
```
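For what it's worth, the `m1: [32000 x 5]` in the traceback is the squeezed tensor flattened to 2-D: 64 × 500 = 32000 rows of 5 features each, which a 500-in Linear layer can't consume. A minimal sketch of one possible workaround (the pooling step is my assumption about the intended shapes, not necessarily the repo's actual fix):

```python
import torch
import torch.nn as nn

# Shapes taken from the traceback: the encoder emits [64, 500, 5, 1],
# but the MLP's first Linear expects one 500-dim feature vector per example.
h = torch.randn(64, 500, 5, 1)

# One workaround (an assumption, not the repo's intended fix): collapse the
# leftover spatial dimensions, e.g. by max-pooling over them, so each example
# reduces to a single 500-dim vector.
h_pooled = h.squeeze(-1).max(dim=2)[0]   # [64, 500, 5] -> [64, 500]

mlp = nn.Linear(500, 300)                # the 500x300 FC layer from the error
log_prob = mlp(h_pooled)                 # shapes now line up: [64, 300]
print(log_prob.shape)                    # torch.Size([64, 300])
```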

I would greatly appreciate any guidance you could give me on these!
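Regarding question 2, here is a minimal sketch of what I had in mind with adaptive softmax, using PyTorch's built-in `nn.AdaptiveLogSoftmaxWithLoss` (the vocabulary size and cutoffs below are illustrative, not the repo's actual settings):

```python
import torch
import torch.nn as nn

vocab_size, hidden = 20000, 500  # illustrative numbers, not the repo's config
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden,
    n_classes=vocab_size,
    cutoffs=[1000, 5000],   # frequent words in the head, rare words in tails
)

decoder_states = torch.randn(64, hidden)           # one state per target token
targets = torch.randint(0, vocab_size, (64,))
out = adaptive(decoder_states, targets)
loss = out.loss                                    # mean negative log-likelihood
```

The speedup comes from only computing full scores for the head cluster; rare words cost an extra, smaller projection.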
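And for question 3, seeding the embedding matrix would look roughly like this (`pretrained` stands in for a GloVe matrix loaded elsewhere; here it is random data, purely for illustration):

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10000, 300
pretrained = torch.randn(vocab_size, emb_dim)  # placeholder for real GloVe rows

# L2-normalize each row so every word vector has unit length.
pretrained = pretrained / pretrained.norm(dim=1, keepdim=True)

embedding = nn.Embedding(vocab_size, emb_dim)
embedding.weight.data.copy_(pretrained)

# Optionally freeze the embeddings, at least for the first few epochs.
embedding.weight.requires_grad = False
```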

======= RESULTS ==========

Input Sentence:
stayed two nights in this hotel for our 20th anniversary . the location is fantastic , near great shopping , restaraunts and entertainment . the staff was great . the bed was the most comfortable i have ever slept in . i wanted to take it home with me ! the rooms and halls were quiet and peaceful . the bathroom was incredible , sparkling marble , huge space , impecably clean . the only down side was how expensive it was to park our car . yikes ! over all we could not have asked for a better hotel and we will definately stay here again . it was worth every penny . END_TOKEN

Output Sentence:
ricca raggiungibile duur raggiungibile uhr tasse toujours nuestro nogal tren dava frequentato bagno altre uhr krijg salir krap toujours krijg uhr deve l'albergo misma uhr frequentato quand atencion standaard frequentato uhr avere uhr cambiare arredamento precios preso gevraagd bekommt dotate interessante parken l'albergo z'n uhr accanto uhr raggiungibile uhr stanze uhr krijg uhr spazi aeropuerto kwamen uhr mocht ruido frequentato uhr avere bekommt all'arrivo salir totalmente uhr zentral bekommt spettacolare l'albergo llegamos dava frequentato servizio pesar bekommt metropolitana serviable stanze salir relativamente jahre relativamente bekommt arrivati passa z'n uhr trova naechte necesario suis raam l'albergo necesario l'albergo z'n servizio hemos l'albergo enkel aeropuerto citta foi zoek nostro estar salir avere l'albergo heerlijk verkennen andando salir particolarmente trova pagamento trova trovate trova acondicionado trova frigorifero trova trovate trova trovate trova acondicionado trova frigorifero trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova acondicionado trova trovate trova trovate trova trovate trova strasse trova trovate trova trovate trova trovate trova strasse trova trovate trova trovate trova trovate trova trova
Epoch: 10
Steps: 108920
Loss: 22058.16015625
Eval
Evaluation - loss: 683.1286144549368 Rouge1: 1.5889867148342671e-06 Rouge2: 0.0
Finish!!!

@fcampagne

I would check that the gradient is actually being computed, e.g. by confirming `loss.requires_grad` is `True` and inspecting the model parameters' `.grad` after `backward()`. It's easy to accidentally use tensors that don't require gradients, in which case the loss oscillates but never gets optimized.
