
Does early memorization predict late memorization? #29

Closed
StellaAthena opened this issue Dec 1, 2022 · 11 comments
Labels: good first issue (Good for newcomers), help wanted (This issue needs assistance)

Comments

@StellaAthena
Member

StellaAthena commented Dec 1, 2022

We currently have the following correlation heat-map, which indicates that the answer is "yes." We should probably also make confusion matrices for the classifier that predicts memorization by the fully trained model by assuming it is the same as at the 23M-sequence checkpoint.

[image: correlation heat-map]
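For illustration, the confusion matrix for that checkpoint-predicts-final classifier could be computed like this (a minimal sketch, assuming memorization is recorded as a boolean flag per sequence; the array names are illustrative, not the repo's actual data format):

```python
import numpy as np

def confusion_matrix(early_memorized, final_memorized):
    """Confusion matrix for the trivial classifier that predicts the fully
    trained model's memorization to equal an early checkpoint's.

    Both inputs are boolean arrays over the same sequences; "positive"
    means the sequence is memorized.
    """
    early = np.asarray(early_memorized, dtype=bool)
    final = np.asarray(final_memorized, dtype=bool)
    tp = int(np.sum(early & final))    # memorized early and at the end
    fp = int(np.sum(early & ~final))   # memorized early, not at the end
    fn = int(np.sum(~early & final))   # memorized only later in training
    tn = int(np.sum(~early & ~final))  # never memorized
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# toy example with made-up memorization flags
early = [True, True, False, False, True]
final = [True, False, False, True, True]
print(confusion_matrix(early, final))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
```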

@StellaAthena StellaAthena added good first issue Good for newcomers help wanted This issue needs assistance labels Dec 1, 2022
@lintangsutawika
Contributor

The correlation looks strong. This is for the 13B model.

[7 images: confusion matrices across checkpoints]

@lintangsutawika
Contributor

The trend is consistent for small models (19M).
[7 images: confusion matrices across checkpoints]

@StellaAthena
Member Author

@lintangsutawika very interesting! It looks like the change between checkpoints is mostly confined to the FP cell; the TN and FN numbers are rather consistent across the course of training (until the end, where incorrect judgements are impossible).

One potential next step is to plot the TP, FP, and FN rates as a function of the number of steps the checkpoint has been trained for. My intuition is that we should plot this with a logarithmic x-axis (I know this will make it look funky); maybe do it both with and without the log and see what they look like?
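Sketched out, that rate-per-checkpoint computation might look like the following (the names, data layout, and `steps` values are assumptions for illustration, not the actual experiment data; the commented lines show the log-x view):

```python
import numpy as np

def rates_over_checkpoints(checkpoint_flags, final_flags):
    """TP/FP/FN rates of the 'early checkpoint predicts final' classifier,
    one value per checkpoint, normalized by the number of sequences.

    checkpoint_flags: list of boolean arrays (one per checkpoint).
    final_flags: boolean array for the fully trained model.
    """
    final = np.asarray(final_flags, dtype=bool)
    n = final.size
    rates = {"TP": [], "FP": [], "FN": []}
    for flags in checkpoint_flags:
        early = np.asarray(flags, dtype=bool)
        rates["TP"].append(float(np.sum(early & final)) / n)
        rates["FP"].append(float(np.sum(early & ~final)) / n)
        rates["FN"].append(float(np.sum(~early & final)) / n)
    return rates

# toy data: three checkpoints, three sequences, all eventually memorized
steps = [1000, 10000, 100000]
ckpts = [[False, False, True], [True, False, True], [True, True, True]]
final = [True, True, True]
r = rates_over_checkpoints(ckpts, final)

# The log-x view could then be drawn with matplotlib:
#   import matplotlib.pyplot as plt
#   for name, vals in r.items():
#       plt.plot(steps, vals, label=name)
#   plt.xscale("log"); plt.xlabel("training steps"); plt.legend(); plt.show()
```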

Another analysis worth doing is looking at the consistency across checkpoints. Obviously the final checkpoint gets all predictions correct, but once points get categorized correctly, do they remain correctly categorized throughout the remaining checkpoints? I guess what I have in mind here is a Markov model for how individual predictions move around the four states over the course of training. Does that make sense?
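One way to sketch that Markov-model view: count how each sequence's confusion state (TP/FP/FN/TN) transitions from one checkpoint to the next, yielding a 4x4 transition-count matrix (the input format here is an illustrative assumption):

```python
import numpy as np

def state(early, final):
    """Map (memorized at this checkpoint, memorized finally) to a confusion state."""
    return {(True, True): "TP", (True, False): "FP",
            (False, True): "FN", (False, False): "TN"}[(early, final)]

def transition_counts(checkpoint_flags, final_flags):
    """Count how individual sequences move between the four confusion states
    from each checkpoint to the next.

    checkpoint_flags: list of per-checkpoint boolean lists over sequences.
    final_flags: boolean list for the fully trained model.
    Returns the state labels and a 4x4 matrix of transition counts.
    """
    states = ["TP", "FP", "FN", "TN"]
    idx = {s: i for i, s in enumerate(states)}
    counts = np.zeros((4, 4), dtype=int)
    for prev, curr in zip(checkpoint_flags, checkpoint_flags[1:]):
        for p, c, f in zip(prev, curr, final_flags):
            counts[idx[state(p, f)], idx[state(c, f)]] += 1
    return states, counts

# toy example: sequence 0 flips FN -> TP, sequence 1 stays TP
labels, counts = transition_counts([[False, True], [True, True]], [True, True])
```

Normalizing each row of `counts` by its sum would give the empirical transition probabilities of the Markov chain.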

This is a minor thing, but I'm used to these matrices being laid out slightly differently, as shown in the image below. Would you mind reorienting them? It's messing with my head.
[image: conventional confusion-matrix layout]

@lintangsutawika
Contributor

No problem.
[7 images: reoriented confusion matrices]

@lintangsutawika
Contributor

lintangsutawika commented Dec 13, 2022

Preliminary graphs across training steps for all model sizes up to (but not including) 146M.

TPR: [image]

FPR: [image]

FNR: [image]

@lintangsutawika
Contributor

Also, I tried graphing the number of lines memorized across checkpoints for each model.

It looks like there is linear growth for all models, and that the rate of growth increases as the model size gets larger.

[image: memorized-line counts vs. checkpoint, per model size]
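A quick way to quantify the "linear growth, steeper for bigger models" observation would be a least-squares slope per model (a sketch on made-up toy numbers, not the actual counts):

```python
import numpy as np

def memorization_growth_rate(steps, memorized_counts):
    """Least-squares slope of memorized-line count vs. training step.

    A larger slope means the model memorizes new lines faster; comparing
    slopes across model sizes tests the 'rate grows with size' claim.
    """
    slope, _intercept = np.polyfit(steps, memorized_counts, deg=1)
    return float(slope)

# toy data: a small model memorizing slower than a large one
steps = [0, 1000, 2000, 3000]
small_counts = [0, 10, 20, 30]   # ~0.01 lines per step
large_counts = [0, 30, 60, 90]   # ~0.03 lines per step
assert memorization_growth_rate(steps, large_counts) > memorization_growth_rate(steps, small_counts)
```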

@StellaAthena
Member Author

@lintangsutawika This is great work, and I think with a little additional design iteration it will be an important component of the memorization paper. Unfortunately, the raw underlying memorization data was erroneous, but @uSaiPrashanth is working on fixing that and should have new data by Jan 1. Have you had a chance to read the WIP paper draft?

I’m hoping we can package these results and deliver a paper to ICML (deadline Jan 26). I think this is a realistic but non-trivial deadline to make, and that we have really compelling results so far.

Where is the code you’ve been working with? Is it on a branch / fork somewhere, in a Jupyter Notebook, or what?

@lintangsutawika
Contributor

I have the code in a Jupyter Notebook that I copied from Sai's. I can rerun it and adjust the graphs for the paper.

I haven't read the paper draft. I've reached out to Sai for help.

@StellaAthena
Member Author

Great. Can you upload that notebook to analysis/memorization?

@lintangsutawika
Contributor

What happens to points that are forgotten? Tracking all the points that were memorized at the first checkpoints, all model sizes tend to forget some of what they memorized earlier. Connecting this to the graph above, where the number of memorized points increases, this probably means that the points memorized later are new, and that the ones memorized earlier have a slight tendency to be forgotten.
[image: retention of initially memorized points across checkpoints]
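The forgetting measurement here could be sketched as a retention curve: the fraction of the first checkpoint's memorized set that is still memorized at each later checkpoint (the sets-of-sequence-ids input format is an assumption for illustration):

```python
def retention_of_first_checkpoint(checkpoint_memorized_sets):
    """Fraction of sequences memorized at the first checkpoint that are
    still memorized at each later checkpoint.

    checkpoint_memorized_sets: list of sets of sequence ids, one per
    checkpoint, in training order. Returns one retention value per
    checkpoint (the first is 1.0 by construction).
    """
    first = checkpoint_memorized_sets[0]
    if not first:
        return [1.0 for _ in checkpoint_memorized_sets]
    return [len(first & later) / len(first) for later in checkpoint_memorized_sets]

# toy example: total memorization grows, but early points are partly forgotten
sets = [{1, 2, 3, 4}, {1, 2, 3, 9}, {1, 2, 8, 9, 10}]
print(retention_of_first_checkpoint(sets))  # [1.0, 0.75, 0.5]
```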

@lintangsutawika
Contributor

Added notebook in a PR for now.
#50
