Fix TODO in sample.sample_sequences - Avoid 'leaving last token calculation to while loop' #119
Conversation
Are you saying the AI was not considering all my context? whatttt...huh? When I tested it, it was great. What is the new change, in English?
Run the initial prompt through the model all at once instead of the final token separately. It's just a tiny performance improvement.
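As a rough illustration of the difference (all names below are stand-ins for illustration, not the repo's actual code): the old flow primed the model on all but the last prompt token and spent the first while-loop iteration on that final token, while the new flow consumes the whole prompt in one call.

```python
# Toy sketch of the change; step_stub stands in for the transformer step.
def step_stub(tokens, past=None):
    past = past or []
    return {'presents': past + list(tokens)}  # new cached state

context = [10, 11, 12]  # prompt token ids

# Old flow: prime on context[:-1], then burn the first while-loop
# iteration on the last prompt token before sampling anything new.
state = step_stub(context[:-1])
state = step_stub([context[-1]], past=state['presents'])

# New flow: one call consumes the full prompt up front, so every loop
# iteration produces a genuinely new token.
state2 = step_stub(context)
assert state['presents'] == state2['presents']  # same cache, one less turn
```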
How did you find it? Are you a collaborator, or did you simply analyze the GPT-2 code deeply? I'm interested in paying someone a lot to understand GPT-2...let me know...
thanks for this! had some suggestions for making it nicer
src/sample.py (Outdated)
```diff
-def body(past, prev, output):
-    next_outputs = step(hparams, prev[:, tf.newaxis], past=past)
+def body(past, prev, output, first=False):
+    next_outputs = step(hparams, prev if first else prev[:, tf.newaxis], past=past)
```
how about we just always make it just `prev`, and don't do the later squeeze?
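The point of the suggestion, as I read it (numpy stand-ins for the TF tensors): tf.multinomial already returns a [batch, num_samples] tensor, so if prev stays 2-D, the tf.newaxis expansion on the way in and the squeeze on the way out cancel and can both be dropped.

```python
import numpy as np

batch = 2
samples = np.zeros((batch, 1), dtype=np.int32)  # like tf.multinomial output
prev = samples                                  # keep as [batch, 1]
# old flow: squeeze down to [batch], then prev[:, np.newaxis] back to [batch, 1]
old = samples.squeeze(axis=1)[:, np.newaxis]
assert np.array_equal(prev, old)                # same values, two ops saved
```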
src/sample.py (Outdated)
```diff
     logits = next_outputs['logits'][:, -1, :] / tf.to_float(temperature)
     logits = top_k_logits(logits, k=top_k)
     samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)
     return [
-        tf.concat([past, next_outputs['presents']], axis=-2),
+        next_outputs['presents'] if first else tf.concat([past, next_outputs['presents']], axis=-2),
```
i think this should probably be `if past is not None` instead of `if first`
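A minimal sketch of the suggested condition (numpy stand-in for the TF ops): branching on whether a cache exists yet makes the priming call a natural special case, with no extra flag to thread through the loop variables.

```python
import numpy as np

def merge_past(past, presents):
    # priming call: no cache yet, so presents becomes the cache
    return presents if past is None else np.concatenate([past, presents], axis=-2)

presents = np.ones((1, 2, 3))
cache = merge_past(None, presents)   # first (priming) call
cache = merge_past(cache, presents)  # loop iterations keep appending
assert cache.shape == (1, 4, 3)
```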
src/sample.py (Outdated)
```diff
     ]

+past, prev, output = body(None, context, None, first=True)
```
i would remove the `first=True` flag, and change output to just be `tf.zeros([batch_size, 0])` or something like that
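Sketch of the initializer idea (numpy stand-in): starting output as an empty [batch_size, 0] tensor lets the loop concatenate sampled tokens uniformly with no first-iteration special case. (As the next comments show, the empty initializer turned out to be wrong for this repo; the shape trick itself is still the point here.)

```python
import numpy as np

batch_size = 2
output = np.zeros((batch_size, 0), dtype=np.int32)  # like tf.zeros([batch_size, 0])
for token in (5, 7):
    samples = np.full((batch_size, 1), token, dtype=np.int32)
    output = np.concatenate([output, samples], axis=1)  # uniform append
assert output.tolist() == [[5, 7], [5, 7]]
```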
got it, I was considering that. Mine is simpler to read, but yours is just better.
Ok, so this was actually the source of the bug. We need to pass the context as the output.
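To spell out the bug (numpy stand-in; the slicing matches what the repo's run scripts do, as best I recall — they strip the prompt with `out[:, len(context):]`): if output starts empty instead of as the context, that slice throws away real generated tokens.

```python
import numpy as np

context = np.array([[1, 2, 3]])  # prompt ids
generated = np.array([[9, 8]])   # tokens produced by the loop

buggy = np.concatenate([np.zeros((1, 0), dtype=int), generated], axis=1)
fixed = np.concatenate([context, generated], axis=1)

n = context.shape[1]
assert fixed[:, n:].tolist() == [[9, 8]]  # prompt stripped, tokens intact
assert buggy[:, n:].tolist() == [[]]      # generated tokens sliced away
```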
merge fix sample.py TODO
In order to set the seed, I made this single change (not sure how Fire works). Then I just called 'python interactive_conditional_samples.py' on the 117M model. And this is from my current fork (which has pulled from upstream): Are there more random seeds to be fixed?
tensorflow seeds are dependent on the graph :( silly design
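For context on this complaint: in TF1, tf.set_random_seed sets a graph-level seed, and each op's effective seed also depends on the order ops were created in that graph, so reproducibility requires setting the seed inside the graph you build and building it identically each time. A minimal sketch:

```python
import tensorflow as tf  # TF1-style API, matching the repo's era

def build_and_sample(seed):
    with tf.Graph().as_default():
        tf.set_random_seed(seed)  # graph-level seed, set before op creation
        sample = tf.multinomial(tf.log([[0.3, 0.7]]), num_samples=1)
        with tf.Session() as sess:
            return sess.run(sample)

# identically built graphs with the same seed reproduce each other
assert (build_and_sample(1957) == build_and_sample(1957)).all()
```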
hmm, how do i fix that?
I would just test with k=1 or something. Your outputs definitely look buggy too, though.
Maybe, though not 'buggy', as I think I did exactly what the TODO requested. I'll take a look at the output, and re-check that the TODO request should not change the output though.
Remember this is 117M. The second output looks fine.
Actually, is there anyone you can ask about the TODO request? I do not think it would change the output.
Oh yeah, you do mean top_k, nvm. Thanks for the tip!
To put it very briefly, the TODO does not change the output because all decoder states from past iterations are passed to the current iteration.
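A self-contained way to see this (a toy numpy model of causal attention, not the repo's code): with a causal mask, the output at position t depends only on keys/values from positions ≤ t, so priming the cache on the whole prompt at once yields exactly the states that one-token-at-a-time decoding would have produced.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 5
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
x = rng.standard_normal((T, d))  # token embeddings

def attend(q, K, V):
    # single-position attention over the keys/values seen so far
    w = np.exp(q @ K.T / np.sqrt(d))
    return (w / w.sum()) @ V

# full pass with a causal mask: position t sees only positions 0..t
full = np.stack([attend(x[t] @ Wq, x[:t+1] @ Wk, x[:t+1] @ Wv)
                 for t in range(T)])

# incremental pass: append each new key/value to a cache ("presents")
K_cache, V_cache, inc = np.zeros((0, d)), np.zeros((0, d)), []
for t in range(T):
    K_cache = np.vstack([K_cache, (x[t] @ Wk)[None]])
    V_cache = np.vstack([V_cache, (x[t] @ Wv)[None]])
    inc.append(attend(x[t] @ Wq, K_cache, V_cache))

assert np.allclose(full, np.stack(inc))  # identical outputs either way
```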
Damn, the model took the next few words right out of the article: https://www.theverge.com/2019/5/2/18525323/detective-pikachu-review-pokemon (jk obvs, the article was released after). I'll get to looking into the bug with my fork.
Certainly, let me know if you or someone at OpenAI finds out why! I'm working on it too.
To summarize the above: I am testing my fork with the TODO fixed against the actual repo, with the seeds fixed and top_k=1. It seems to match exactly, except for missing some of the output at the beginning.
Ok, comparing my code with the original, I think I see where I had a bug. Will fix and test.
I will work on removing the 'first' parameter.
Tested on the 3 prompts, plus unconditional. Results of this fork are exactly the same as the actual repo, with top_k set to 1 and seeded with 1957, in addition to adding the seed to the top of the script just in case.
The fork is complete, barring any further tests you may have. Output matches exactly.
Just curious, might some of the changes (like the shape invariant) cause complications for library users? Or have you just not gotten around to checking yet?
I can write a test script with the two versions of sample.py. Where should I put it? Do you have an exact way you want me to go about it?
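For what it's worth, here is roughly how such a script could look. Everything below is an assumption for illustration: the module renames sample_old/sample_new, the model path, prompt, and seed are made up, and sample_sequence, model, and encoder are the repo's TF1-era APIs as I remember them. Since the conversation above mentions one version missing tokens at the beginning, the script aligns on the common suffix to be safe.

```python
import json

import numpy as np
import tensorflow as tf

import encoder
import model
import sample_old  # original sample.py, renamed for this test (assumed)
import sample_new  # patched sample.py, renamed for this test (assumed)

def generate(sample_module, seed=1957, model_name='117M', prompt='Hello'):
    enc = encoder.get_encoder(model_name)
    hparams = model.default_hparams()
    with open('models/%s/hparams.json' % model_name) as f:
        hparams.override_from_dict(json.load(f))
    with tf.Graph().as_default(), tf.Session() as sess:
        tf.set_random_seed(seed)  # per-graph, so set it in each build
        context = tf.placeholder(tf.int32, [1, None])
        output = sample_module.sample_sequence(
            hparams=hparams, length=50, context=context,
            batch_size=1, temperature=1, top_k=1)  # top_k=1: deterministic
        tf.train.Saver().restore(
            sess, tf.train.latest_checkpoint('models/' + model_name))
        return sess.run(output, {context: [enc.encode(prompt)]})[0]

old = generate(sample_old)
new = generate(sample_new)
# align on the common suffix in case one version omits the prompt tokens
k = min(len(old), len(new))
print('match' if np.array_equal(old[-k:], new[-k:]) else 'mismatch')
```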
ah sorry, didn't get around to looking for a while. looks good!
Fix TODO in sample.sample_sequences - Avoid 'leaving last token calculation to while loop' (openai#119)
* do initial run on full context
* decrement while loop iterations
* add context to output
* remove first param
* removing first param: change shape invariant
Hi,
This change runs the initial model step on the full context by calling the body() function. I added a 'first' parameter defaulting to False to allow this.
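For reference, a condensed sketch of the approach this conversation converged on, reconstructed from the diffs above, so treat the details as approximate rather than the repo's exact final code (temperature and top_k handling omitted for brevity): one body() call primes on the full context, output starts as the context itself, and the loop runs length - 1 more iterations.

```python
import tensorflow as tf  # TF1-style API, matching the repo's era

def sample_sequence_sketch(step, hparams, context, length, batch_size,
                           past_shape):
    def body(past, prev, output):
        # prev is the full context on the priming call, [batch, 1] afterwards
        next_outputs = step(hparams, prev, past=past)
        logits = next_outputs['logits'][:, -1, :]
        samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)
        return [
            # 'past is not None' replaces the earlier 'first' flag
            next_outputs['presents'] if past is None
            else tf.concat([past, next_outputs['presents']], axis=-2),
            samples,
            tf.concat([output, samples], axis=1),
        ]

    # prime on the whole prompt; output starts as the context (the bug fix)
    past, prev, output = body(None, context, context)

    _, _, tokens = tf.while_loop(
        cond=lambda *args: True, body=body,
        maximum_iterations=length - 1,  # one token already sampled above
        loop_vars=[past, prev, output],
        shape_invariants=[
            tf.TensorShape(past_shape),          # grows along the time axis
            tf.TensorShape([batch_size, None]),  # changed invariant: prev is 2-D
            tf.TensorShape([batch_size, None]),
        ],
        back_prop=False,
    )
    return tokens
```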