
Does early memorization predict late memorization? #29

Closed
StellaAthena opened this issue Dec 1, 2022 · 11 comments
Labels: good first issue (Good for newcomers), help wanted (This issue needs assistance)

Comments

@StellaAthena
Member

StellaAthena commented Dec 1, 2022

We currently have the following correlation heat-map, which indicates that the answer is "yes." We should probably also make confusion matrices for the classifier that predicts memorization by the fully trained model by assuming it is the same as at the 23M-sequence checkpoint.

[image: correlation heat-map]
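For illustration, the confusion matrix for that checkpoint-predicts-final classifier could be computed like this (a minimal sketch, assuming memorization is recorded as a boolean flag per sequence; the array names are illustrative, not the repo's actual data format):

```python
import numpy as np

def confusion_matrix(early_memorized, final_memorized):
    """Confusion matrix for the trivial classifier that predicts the fully
    trained model's memorization to equal an early checkpoint's.

    Both inputs are boolean arrays over the same sequences; "positive"
    means the sequence is memorized.
    """
    early = np.asarray(early_memorized, dtype=bool)
    final = np.asarray(final_memorized, dtype=bool)
    tp = int(np.sum(early & final))    # memorized early and at the end
    fp = int(np.sum(early & ~final))   # memorized early, not at the end
    fn = int(np.sum(~early & final))   # memorized only later in training
    tn = int(np.sum(~early & ~final))  # never memorized
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# toy example with made-up memorization flags
early = [True, True, False, False, True]
final = [True, False, False, True, True]
print(confusion_matrix(early, final))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
```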

@StellaAthena StellaAthena added good first issue Good for newcomers help wanted This issue needs assistance labels Dec 1, 2022
@lintangsutawika
Contributor

The correlation looks strong. This is for the 13B model.

[7 images: confusion matrices across checkpoints]

@lintangsutawika
Contributor

The trend is consistent for small models (19M).
[7 images: confusion matrices across checkpoints]

@StellaAthena
Member Author

@lintangsutawika very interesting! It looks like the change between checkpoints is mostly confined to the FP cell; the TN and FN numbers are rather consistent across the course of training (until the end, where incorrect judgements are impossible).

One potential next step is to plot the TP, FP, and FN rates as a function of the number of steps the checkpoint has been trained for. My intuition is that we should plot this with a logarithmic x-axis (I know this will make it look funky); maybe do it both with and without the log and see what they look like?
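Sketched out, that rate-per-checkpoint computation might look like the following (the names, data layout, and `steps` values are assumptions for illustration, not the actual experiment data; the commented lines show the log-x view):

```python
import numpy as np

def rates_over_checkpoints(checkpoint_flags, final_flags):
    """TP/FP/FN rates of the 'early checkpoint predicts final' classifier,
    one value per checkpoint, normalized by the number of sequences.

    checkpoint_flags: list of boolean arrays (one per checkpoint).
    final_flags: boolean array for the fully trained model.
    """
    final = np.asarray(final_flags, dtype=bool)
    n = final.size
    rates = {"TP": [], "FP": [], "FN": []}
    for flags in checkpoint_flags:
        early = np.asarray(flags, dtype=bool)
        rates["TP"].append(float(np.sum(early & final)) / n)
        rates["FP"].append(float(np.sum(early & ~final)) / n)
        rates["FN"].append(float(np.sum(~early & final)) / n)
    return rates

# toy data: three checkpoints, three sequences, all eventually memorized
steps = [1000, 10000, 100000]
ckpts = [[False, False, True], [True, False, True], [True, True, True]]
final = [True, True, True]
r = rates_over_checkpoints(ckpts, final)

# The log-x view could then be drawn with matplotlib:
#   import matplotlib.pyplot as plt
#   for name, vals in r.items():
#       plt.plot(steps, vals, label=name)
#   plt.xscale("log"); plt.xlabel("training steps"); plt.legend(); plt.show()
```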

Another analysis worth doing is looking at the consistency across checkpoints. Obviously the final checkpoint gets all predictions correct, but once points get categorized correctly, do they remain correctly categorized throughout the remaining checkpoints? I guess what I have in mind here is a Markov model for how individual predictions move around the four states over the course of training. Does that make sense?
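One way to sketch that Markov-model view: count how each sequence's confusion state (TP/FP/FN/TN) transitions from one checkpoint to the next, yielding a 4x4 transition-count matrix (the input format here is an illustrative assumption):

```python
import numpy as np

def state(early, final):
    """Map (memorized at this checkpoint, memorized finally) to a confusion state."""
    return {(True, True): "TP", (True, False): "FP",
            (False, True): "FN", (False, False): "TN"}[(early, final)]

def transition_counts(checkpoint_flags, final_flags):
    """Count how individual sequences move between the four confusion states
    from each checkpoint to the next.

    checkpoint_flags: list of per-checkpoint boolean lists over sequences.
    final_flags: boolean list for the fully trained model.
    Returns the state labels and a 4x4 matrix of transition counts.
    """
    states = ["TP", "FP", "FN", "TN"]
    idx = {s: i for i, s in enumerate(states)}
    counts = np.zeros((4, 4), dtype=int)
    for prev, curr in zip(checkpoint_flags, checkpoint_flags[1:]):
        for p, c, f in zip(prev, curr, final_flags):
            counts[idx[state(p, f)], idx[state(c, f)]] += 1
    return states, counts

# toy example: sequence 0 flips FN -> TP, sequence 1 stays TP
labels, counts = transition_counts([[False, True], [True, True]], [True, True])
```

Normalizing each row of `counts` by its sum would give the empirical transition probabilities of the Markov chain.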

This is a minor thing, but I'm used to these matrices being laid out slightly differently, as shown in the image below. Would you mind reorienting them? It's messing with my head.
[image: conventional confusion-matrix layout]

@lintangsutawika
Contributor

No problem.
[7 images: reoriented confusion matrices]

@lintangsutawika
Contributor

lintangsutawika commented Dec 13, 2022

Preliminary graphs across training steps for all model sizes up to (but not including) 146M.

TPR: [image]

FPR: [image]

FNR: [image]

@lintangsutawika
Contributor

Also, I tried graphing the number of lines memorized across checkpoints for each model.

It looks like there is linear growth for all models, and that the rate of growth increases as the model size gets larger.

[image: memorized-line counts vs. checkpoint, per model size]
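A quick way to quantify the "linear growth, steeper for bigger models" observation would be a least-squares slope per model (a sketch on made-up toy numbers, not the actual counts):

```python
import numpy as np

def memorization_growth_rate(steps, memorized_counts):
    """Least-squares slope of memorized-line count vs. training step.

    A larger slope means the model memorizes new lines faster; comparing
    slopes across model sizes tests the 'rate grows with size' claim.
    """
    slope, _intercept = np.polyfit(steps, memorized_counts, deg=1)
    return float(slope)

# toy data: a small model memorizing slower than a large one
steps = [0, 1000, 2000, 3000]
small_counts = [0, 10, 20, 30]   # ~0.01 lines per step
large_counts = [0, 30, 60, 90]   # ~0.03 lines per step
assert memorization_growth_rate(steps, large_counts) > memorization_growth_rate(steps, small_counts)
```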

@StellaAthena
Member Author

@lintangsutawika This is great work, and I think with a little additional design iteration it will be an important component of the memorization paper. Unfortunately, the raw underlying memorization data was erroneous, but @uSaiPrashanth is working on fixing that and should have new data by Jan 1. Have you had a chance to read the WIP paper draft?

I’m hoping we can package these results and deliver a paper to ICML (deadline Jan 26). I think this is a realistic but non-trivial deadline to make, and that we have really compelling results so far.

Where is the code you’ve been working with? Is it on a branch / fork somewhere, in a Jupyter Notebook, or what?

@lintangsutawika
Contributor

I have the code in a Jupyter Notebook that I copied from Sai's. I can rerun it and adjust the graphs for the paper.

I haven't read the paper draft. I've reached out to Sai for help.

@StellaAthena
Member Author

Great. Can you upload that notebook to analysis/memorization?

@lintangsutawika
Contributor

What happens to points that are forgotten? Tracking all the points that were memorized at the first checkpoints, all model sizes tend to forget some of what they memorized earlier. Connecting this to the graph above, where the number of memorized points increases, this probably means that the points memorized later are new, and that the ones memorized earlier have a slight tendency to be forgotten.
[image: retention of initially memorized points across checkpoints]
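The forgetting measurement here could be sketched as a retention curve: the fraction of the first checkpoint's memorized set that is still memorized at each later checkpoint (the sets-of-sequence-ids input format is an assumption for illustration):

```python
def retention_of_first_checkpoint(checkpoint_memorized_sets):
    """Fraction of sequences memorized at the first checkpoint that are
    still memorized at each later checkpoint.

    checkpoint_memorized_sets: list of sets of sequence ids, one per
    checkpoint, in training order. Returns one retention value per
    checkpoint (the first is 1.0 by construction).
    """
    first = checkpoint_memorized_sets[0]
    if not first:
        return [1.0 for _ in checkpoint_memorized_sets]
    return [len(first & later) / len(first) for later in checkpoint_memorized_sets]

# toy example: total memorization grows, but early points are partly forgotten
sets = [{1, 2, 3, 4}, {1, 2, 3, 9}, {1, 2, 8, 9, 10}]
print(retention_of_first_checkpoint(sets))  # [1.0, 0.75, 0.5]
```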

@lintangsutawika
Contributor

Added notebook in a PR for now.
#50
