Fix TODO in sample.sample_sequences - Avoid 'leaving last token calculation to while loop' #119

Merged: 6 commits into openai:master on May 31, 2019

Conversation

@albertwujj (Contributor) commented Apr 12, 2019

Hi,

This change runs the initial model step on the full context by calling the body() function. I added a 'first' parameter, defaulting to False, to allow this.

@bladedsupernova

Are you saying the AI was not considering all my context? Whatttt... huh? When I tested it, it was great. What is the new change, in plain English?

@albertwujj (Contributor Author)

Run the initial prompt through the model all at once, instead of feeding the final token to it separately. It's just a tiny performance improvement.

@bladedsupernova

How did you find it? Are you a collaborator, or did you simply analyze the GPT-2 code in depth? I'm interested in paying someone a lot to help me understand GPT-2... let me know...

@WuTheFWasThat (Contributor) left a comment


thanks for this! had some suggestions for making it nicer

src/sample.py (Outdated)
-        def body(past, prev, output):
-            next_outputs = step(hparams, prev[:, tf.newaxis], past=past)
+        def body(past, prev, output, first=False):
+            next_outputs = step(hparams, prev if first else prev[:, tf.newaxis], past=past)
@WuTheFWasThat (Contributor)

how about we always just make it prev, and don't do the later squeeze?

src/sample.py (Outdated)
             logits = next_outputs['logits'][:, -1, :] / tf.to_float(temperature)
             logits = top_k_logits(logits, k=top_k)
             samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)
             return [
-                tf.concat([past, next_outputs['presents']], axis=-2),
+                next_outputs['presents'] if first else tf.concat([past, next_outputs['presents']], axis=-2),
@WuTheFWasThat (Contributor)

i think this should probably be if past is not None instead of if first

src/sample.py (Outdated)
             ]

+        past, prev, output = body(None, context, None, first=True)
@WuTheFWasThat (Contributor)

i would remove the first=True flag, and change output to just be tf.zeros([batch_size, 0]) or something like that

@albertwujj (Contributor Author)

Got it, I was considering that. Mine is simpler to read, but yours is just better.

@albertwujj (Contributor Author)

OK, so this was actually the source of the bug: we need to pass the context as the output.
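For context, the calling scripts (e.g. interactive_conditional_samples.py) slice the prompt length back off the returned tokens, so the loop's output variable has to start as the context. A minimal toy sketch of what goes wrong otherwise (toy code, not the real sample.py; the fake token 99 stands in for whatever the model would sample):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

batch_size, prompt_len = 1, 4
context = tf.constant([[10, 11, 12, 13]], dtype=tf.int32)  # dummy prompt tokens

def toy_step(output):
    # Pretend the model deterministically samples token 99 and append it.
    samples = tf.fill([batch_size, 1], 99)
    return tf.concat([output, samples], axis=1)

tokens_ok = toy_step(context)                                     # output starts as the context
tokens_bad = toy_step(tf.zeros([batch_size, 0], dtype=tf.int32))  # output starts empty (the bug)

with tf.Session() as sess:
    ok, bad = sess.run([tokens_ok, tokens_bad])
    print(ok[:, prompt_len:])   # [[99]] -- the generated token survives the prompt slice
    print(bad[:, prompt_len:])  # []     -- the generated token is sliced away with the "prompt"
```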

@albertwujj (Contributor Author) commented May 24, 2019

In order to set the seed, I made this single change (not sure how Fire works), then just ran 'python interactive_conditional_samples.py' on the 117M model.
However, it seems that this does not produce the same output for the same prompt. This is from the original GPT-2 repo that I just forked:
image

And this is from my current fork (which has pulled from upstream):
image

Are there more random seeds to be fixed?

@WuTheFWasThat (Contributor)

tensorflow seeds are dependent on the graph :( silly design
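In case it helps, a small sketch of what graph-dependent seeding means in TF 1.x (my own toy example, not from this repo): the graph-level seed combines with each op's position in the graph, so building a slightly different graph changes the per-op seeds, and therefore the samples, even with the same set_random_seed call.

```python
import tensorflow as tf  # assumes TensorFlow 1.x

def sample_with(extra_op_first):
    g = tf.Graph()
    with g.as_default():
        tf.random.set_random_seed(1957)
        if extra_op_first:
            _ = tf.random.uniform([1])  # an unrelated random op created earlier in the graph
        x = tf.random.uniform([3])      # the op whose samples we compare
    with tf.Session(graph=g) as sess:
        return sess.run(x)

print(sample_with(False))
print(sample_with(True))   # different values, despite the identical graph-level seed
```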

@albertwujj (Contributor Author)

hmm, how do i fix that?

@WuTheFWasThat (Contributor)

I would just test with k=1 or something. also, your outputs definitely look buggy

@albertwujj (Contributor Author)

Maybe, though not 'buggy', as I think I did exactly what the TODO requested. I'll take a look at the output and re-check whether the TODO change really should not change the output, though.

@albertwujj (Contributor Author) commented May 24, 2019

Remember, this is 117M. The second output looks fine.

@albertwujj (Contributor Author)

Actually, is there anyone you can ask about the TODO request? I do not think it would change the output.

@albertwujj (Contributor Author) commented May 24, 2019

Oh yeah you do mean top_k, nvm. Thanks for the tip!

@albertwujj (Contributor Author)

To put it very briefly, the TODO fix does not change the output because all decoder states from past iterations are passed along to the current iteration either way.

@albertwujj (Contributor Author) commented May 24, 2019

Close but not quite.
Original:
image

My fork:
image

Same prompt as above. Seems like just a bug, or at least something fixable: the output is missing at the beginning, but the rest repeats the original exactly. I'll run more tests.

@albertwujj (Contributor Author) commented May 24, 2019

Wait, about your 'depends on the graph' comment: I have not made your suggested change yet. I am testing with this; see my fork for the exact repo I am using. Aside from the top of the file, my version is here:
image

@albertwujj (Contributor Author) commented May 24, 2019

OK yes, it seems like my fork is completely correct except that it is missing the non-repeating part of the output.
Original:
image
My fork:
image

The prompt:
image

@albertwujj (Contributor Author) commented May 24, 2019

Damn, the model took the next few words right out of the article: https://www.theverge.com/2019/5/2/18525323/detective-pikachu-review-pokemon (jk, obviously the article was released afterwards).

I'll get to looking into the bug with my fork.

@albertwujj (Contributor Author) commented May 24, 2019

Testing one more time with the prompt 'hi':

orig
image
my fork
image
Missing the comma.

@albertwujj (Contributor Author) commented May 24, 2019

Juuuuust in case, tested on Unconditional, with the same seed fixes at the top of the file:
orig
image
my fork
image
Missing a newline!
Really mysterious why this happens. I feel like it's gonna be obvious.

@albertwujj (Contributor Author) commented May 24, 2019

Certainly let me know if you or someone at OpenAI finds out why! I'm working on it too.

@albertwujj (Contributor Author) commented May 24, 2019

To summarize the above: I am testing my fork (with the TODO fixed) against the actual repo, with the seeds fixed and top_k=1. It seems to match exactly, except for missing some of the output at the beginning.

@albertwujj (Contributor Author)

Ok, comparing my code with the original, I think I see where I had a bug. Will fix and test.

@albertwujj (Contributor Author)

I will work on removing the 'first' parameter.

@albertwujj (Contributor Author) commented May 24, 2019

OK, tested after removing the first param; same result:
image
Bug fixed! Let me know if you want me to test on other inputs. I will also test on the Detective Pikachu article quote, 'hi', and Unconditional.

P.S. After testing on the Detective Pikachu quote, I noticed that I got a different result because I had added a space to the end of the prompt. Without the space it actually gives a better result.

@albertwujj (Contributor Author) commented May 24, 2019

Tested on the 3 prompts, plus Unconditional. The results of this fork are exactly the same as the actual repo's.

top_k set to 1 and seeded with 1957, in addition to adding

tf.random.set_random_seed(1957)
np.random.seed(1957)

to the top of the script just in case.
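For reference, a minimal sketch of the seeding described above (the placement at the top of interactive_conditional_samples.py is my own setup, not part of this PR); combined with top_k=1 the sampling is effectively greedy, which is what makes the runs comparable:

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

np.random.seed(1957)             # NumPy RNG, in case anything samples on the Python side
tf.random.set_random_seed(1957)  # graph-level TF seed; ops created afterwards derive their seeds from it
```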

@albertwujj (Contributor Author)

The fork is complete, barring any further tests you may have. The output matches exactly.

@albertwujj (Contributor Author)

Just curious, might some of the changes (like the shape invariants) cause complications for library users? Or have you just not gotten around to checking yet?

@albertwujj (Contributor Author)

I can write a test script with the two versions of sample.py. Where should I put it? Do you have an exact way you want me to go about it?

@WuTheFWasThat (Contributor)

ah sorry, didn't get around to looking for a while. looks good!

@WuTheFWasThat merged commit c0859d7 into openai:master on May 31, 2019
nshepperd pushed a commit to nshepperd/gpt-2 that referenced this pull request on Oct 31, 2022: Fix TODO in sample.sample_sequences - Avoid 'leaving last token calculation to while loop' (openai#119)

* do initial run on full context

* decrement while loop iterations

* add context to output

* remove first param

* removing first param: change shape invariant
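Putting the commits above together, here is a hedged sketch of the sampling control flow after this PR: body() runs once on the whole prompt, output starts as the context, and the while loop only generates the remaining length - 1 tokens. The toy step() below stands in for the real transformer call, and the shapes, vocabulary size, and shape invariants are simplified assumptions rather than the exact merged sample.py (which uses model.past_shape, temperature, and top_k_logits):

```python
import tensorflow as tf  # assumes TensorFlow 1.x

batch_size, n_vocab, length = 1, 50, 8
context = tf.constant([[3, 1, 4, 1, 5]], dtype=tf.int32)  # dummy prompt tokens

def step(tokens, past=None):
    # Toy stand-in for the transformer: random logits plus a fake per-token
    # "presents" cache (past is ignored here; the real model attends over it).
    seq = tf.shape(tokens)[1]
    logits = tf.random.uniform(tf.stack([batch_size, seq, n_vocab]))
    logits.set_shape([batch_size, None, n_vocab])
    presents = tf.random.uniform(tf.stack([batch_size, 2, seq, 4]))
    presents.set_shape([batch_size, 2, None, 4])  # mirrors the set_shape call in the real code
    return {'logits': logits, 'presents': presents}

def body(past, prev, output):
    next_outputs = step(prev, past=past)
    logits = next_outputs['logits'][:, -1, :]
    samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)
    return [
        # On the first call there is no past yet, so the presents become the initial cache.
        next_outputs['presents'] if past is None
        else tf.concat([past, next_outputs['presents']], axis=-2),
        samples,
        tf.concat([output, samples], axis=1),
    ]

# Initial run on the full context; output starts as the context itself.
past, prev, output = body(None, context, context)

_, _, tokens = tf.while_loop(
    cond=lambda *args: True,
    body=body,
    maximum_iterations=length - 1,  # the initial call already produced one token
    loop_vars=[past, prev, output],
    shape_invariants=[
        tf.TensorShape([batch_size, 2, None, 4]),  # the cache grows along the sequence axis
        tf.TensorShape([batch_size, None]),
        tf.TensorShape([batch_size, None]),
    ],
    back_prop=False,
)

with tf.Session() as sess:
    print(sess.run(tokens))  # prompt tokens followed by `length` sampled toy tokens
```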