Fit the exponential decay curve to accuracy distribution #23
Comments
We were recently discussing this theory in Discord, and it occurred to @norabelrose and me that an exponential decay pattern does not actually agree with the tail probability theory, as the tails of an exponential decay are too thin to produce noticeable bumps. However, power laws do have fat tails, and a "rich get richer" dynamic makes sense in the context of memorization: the more detail one specifies about the generated sequence, the more locked-in the model should be to the correct distribution. Last night Nora decided to run some basic analysis, and lo and behold:
Does this issue still need help?
@CalmDownKarm Thanks for reaching out! We took care of this, and are currently preparing a paper for release detailing the results.
We hypothesize that the Scatter SDE summary plot of the accuracy distribution is an exponential decay with a bump at acc = 1 corresponding to the sum of the tail probabilities (since the memorization score can't go above 1). Specifically, let p(x) = [the number of sequences in the training data that have accuracy x]. We want to do the following:
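A minimal sketch of how the hypothesis could be checked; this is not the procedure from the issue, only an illustration. It assumes memorization scores are available as a NumPy array of values in [0, 1], and uses synthetic data (an exponential variable clipped at 1) in place of real scores. The function names `fit_exponential_decay` and `predicted_bump_fraction`, the bin count, and the rate `true_lam` are all hypothetical choices made for the example.

```python
import numpy as np

def fit_exponential_decay(acc, n_bins=20):
    """Fit p(x) ~ A * exp(-lam * x) to the histogram of scores strictly below 1.

    Uses an unweighted least-squares line through log(counts) vs. bin centers.
    Returns (lam, A).
    """
    below = acc[acc < 1.0]
    counts, edges = np.histogram(below, bins=n_bins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = counts > 0  # log is undefined for empty bins
    slope, intercept = np.polyfit(centers[mask], np.log(counts[mask]), 1)
    return -slope, np.exp(intercept)

def predicted_bump_fraction(lam):
    """Tail mass P(X >= 1) of an Exponential(lam) variable on [0, inf)."""
    return np.exp(-lam)

# Synthetic stand-in for real scores (assumption): an exponential variable
# clipped at 1, so all tail mass beyond 1 piles up at acc = 1.
rng = np.random.default_rng(0)
true_lam = 3.0
raw = rng.exponential(scale=1.0 / true_lam, size=200_000)
acc = np.minimum(raw, 1.0)

lam_hat, _ = fit_exponential_decay(acc)
observed_bump = np.mean(acc == 1.0)
print("fitted rate:", lam_hat)
print("predicted bump fraction:", predicted_bump_fraction(lam_hat))
print("observed bump fraction:", observed_bump)
```

If the fraction of sequences observed at acc = 1 in real data is much larger than the tail mass predicted by the fitted decay, that would count against the pure exponential form and is the kind of mismatch a fatter-tailed alternative would need to explain.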