Fit the exponential decay curve to accuracy distribution #23

Closed
StellaAthena opened this issue Nov 27, 2022 · 3 comments

@StellaAthena
Member

StellaAthena commented Nov 27, 2022

We hypothesize that the Scatter SDE summary plot of the accuracy distribution is an exponential decay with a bump at acc = 1 corresponding to the sum of the tail probabilities (since the memorization score can't go above 1). Specifically, let p(x) = [the number of sequences in the training data that have accuracy x]. We want to do the following:

  1. Fit an exponential decay curve to p(x) looking only at x in [0, k] for k in [0.25, 0.5, 0.75, 0.9, 0.99]
  2. Check how well the curves agree on [k, infinity)
  3. Check whether the sum from i = 1 to infinity of p(i) according to the fit model equals the observed p(1) value.
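
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the analysis used in this project. It assumes accuracy is measured on an evenly spaced grid (the spacing `step = 1/32` is hypothetical), uses an ordinary least-squares line fit to `log p(x)` in place of a proper nonlinear or maximum-likelihood fit, and runs on synthetic data. The helper names `fit_exponential`, `max_log_error`, and `tail_sum` are made up for this example.

```python
import numpy as np

def fit_exponential(x, p, k):
    """Fit p(x) ~ a * exp(-b * x) using only points with x <= k.

    Assumption: a straight-line fit to log p(x), not a nonlinear fit.
    """
    mask = (x <= k) & (p > 0)
    slope, intercept = np.polyfit(x[mask], np.log(p[mask]), 1)
    return float(np.exp(intercept)), float(-slope)

def max_log_error(x, p, a, b, k):
    """Largest |log observed - log fitted| on k < x < 1 (the bump at 1 is excluded)."""
    mask = (x > k) & (x < 1.0) & (p > 0)
    if not mask.any():
        return float("nan")  # no held-out points between k and 1
    fitted_log = np.log(a) - b * x[mask]
    return float(np.max(np.abs(np.log(p[mask]) - fitted_log)))

def tail_sum(a, b, step):
    """Sum of a * exp(-b * x) over the grid points x = 1, 1 + step, 1 + 2*step, ..."""
    return a * np.exp(-b) / (1.0 - np.exp(-b * step))

# Synthetic stand-in for the real counts (all values here are hypothetical).
step = 1.0 / 32
x = np.arange(33) * step            # accuracy grid 0, 1/32, ..., 1
true_a, true_b = 1e6, 8.0
p = true_a * np.exp(-true_b * x)
p[-1] = tail_sum(true_a, true_b, step)  # bump at acc = 1 holds the folded tail

fits = {}
for k in [0.25, 0.5, 0.75, 0.9, 0.99]:
    a, b = fit_exponential(x, p, k)          # step 1
    err = max_log_error(x, p, a, b, k)       # step 2
    tail = tail_sum(a, b, step)              # step 3
    fits[k] = (a, b, err, tail)
    print(f"k={k}: a={a:.4g} b={b:.4g} max log error={err:.3g} "
          f"predicted p(1)={tail:.4g} observed p(1)={p[-1]:.4g}")
```

On real counts, `x` and `p` would come from the measured accuracy histogram, and a large gap between the predicted and observed `p(1)` would count against the exponential-decay hypothesis.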

StellaAthena changed the title from "Fit the exponential decay curve to accuracy distribution, confirm the hypothesis that the cut-off tail sums to the spike." to "Fit the exponential decay curve to accuracy distribution" on Dec 1, 2022

StellaAthena added the "good first issue" and "help wanted" labels on Dec 1, 2022

@StellaAthena
Member Author

We were recently discussing this theory in Discord, and it occurred to @norabelrose and me that an exponential decay pattern does not actually agree with the tail probability theory, as the tails of an exponential decay are too thin to produce noticeable bumps. However, power laws do have fat tails, and a "rich get richer" dynamic makes sense in the context of memorization: the more detail one specifies about the generated sequence, the more locked-in the model should be to the correct distribution.

Last night Nora decided to run some basic analysis, and lo and behold:

Image
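
For readers who want to try a comparison of this kind themselves, here is a minimal sketch. It is not the analysis referred to above. It compares how well a straight line fits the counts in semi-log space (exponential decay) versus log-log space (power law), using R² in log space as a crude diagnostic. The data is synthetic and the helper names `linear_r2` and `compare_forms` are made up for this example.

```python
import numpy as np

def linear_r2(u, v):
    """R^2 of the least-squares straight line v ~ slope * u + intercept."""
    slope, intercept = np.polyfit(u, v, 1)
    resid = v - (slope * u + intercept)
    return float(1.0 - resid.var() / v.var())

def compare_forms(x, p):
    """Score exponential vs. power-law shapes on points with x > 0 and p > 0."""
    mask = (x > 0) & (p > 0)
    xs, log_p = x[mask], np.log(p[mask])
    return {
        "exponential": linear_r2(xs, log_p),        # log p linear in x
        "power_law": linear_r2(np.log(xs), log_p),  # log p linear in log x
    }

# Synthetic power-law counts on a hypothetical accuracy grid (bump at 1 left out).
x = np.arange(1, 32) / 32
p = 1e4 * x ** -1.5
scores = compare_forms(x, p)
print(scores)
```

A higher score for one form only suggests that it describes the bulk of the distribution better; a likelihood-based comparison would be a more careful test.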

@CalmDownKarm

Does this issue still need help?

@StellaAthena
Member Author

@CalmDownKarm Thanks for reaching out! We took care of this, and are currently preparing a paper for release detailing the results.
