Does early memorization predict late memorization? #29
@lintangsutawika very interesting! It looks like the change between checkpoints is mostly confined to the FP cell; the TN and FN numbers are rather consistent across the course of training (until the end, where incorrect judgements are impossible).

One potential next step is to plot the TP, FP, and FN rates as a function of the number of steps the checkpoint has been trained for. My intuition is that we should plot this with a logarithm on the x-axis (I know this will make it look funky); maybe do it both with and without a log and see what they look like?

Another analysis worth doing is looking at the consistency across checkpoints. Obviously the final checkpoint gets all predictions correct, but once points get categorized correctly, do they remain correctly categorized throughout the remaining checkpoints? What I have in mind here is a Markov model for how individual predictions move around the four states over the course of training. Does that make sense?

This is a minor thing, but I'm used to these matrices being laid out slightly differently, as shown in the below image. Would you mind reorienting them? It's messing with my head.
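A minimal sketch of the two analyses suggested above, assuming the per-checkpoint memorization judgements are available as boolean NumPy arrays (the array names and the toy data here are hypothetical, not the project's actual data):

```python
import numpy as np

def confusion_counts(pred, truth):
    """Return (TP, FP, FN, TN) counts for boolean arrays pred vs. truth."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    return tp, fp, fn, tn

def confusion_states(pred, truth):
    """Map each sequence to a confusion-state index: 0=TP, 1=FP, 2=FN, 3=TN."""
    return np.where(pred & truth, 0,
           np.where(pred & ~truth, 1,
           np.where(~pred & truth, 2, 3)))

def transition_matrix(states_a, states_b):
    """4x4 count matrix of how sequences move between confusion states
    at two consecutive checkpoints (rows = earlier, cols = later)."""
    m = np.zeros((4, 4), dtype=int)
    for a, b in zip(states_a, states_b):
        m[a, b] += 1
    return m

# Toy example: two intermediate checkpoints, final checkpoint as ground truth.
truth = np.array([True, True, False, False])
ckpt_early = np.array([True, False, True, False])
ckpt_later = np.array([True, True, True, False])

s0 = confusion_states(ckpt_early, truth)
s1 = confusion_states(ckpt_later, truth)
print(transition_matrix(s0, s1))
```

For the rate-over-steps plot, the same `confusion_counts` output per checkpoint can be plotted against training steps, with `plt.xscale('log')` giving the log-x variant.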
@lintangsutawika This is great work, and I think with a little additional design iteration it will be an important component of the memorization paper. Unfortunately the raw underlying memorization data was erroneous, but @uSaiPrashanth is working on fixing that and should have new data by Jan 1. Have you had a chance to read the WIP paper draft? I'm hoping we can package these results and deliver a paper to ICML (deadline Jan 26). I think this is a realistic but non-trivial deadline to make, and that we have really compelling results so far. Where is the code you've been working with? Is it on a branch / fork somewhere, in a Jupyter Notebook, or what?
I have the code in a Jupyter Notebook that I copied from Sai's. I can rerun it and adjust the graphs for the paper. I haven't read the paper draft yet. I've reached out to Sai for help.
Great. Can you upload that notebook to |
Added notebook in a PR for now. |
We currently have the following correlation heat map, which indicates that the answer is "yes." We should probably also make confusion matrices for the classifier that predicts memorization by the fully trained model by assuming it is the same as at the 23M-sequence checkpoint.
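A minimal sketch of that confusion matrix, treating the 23M-sequence checkpoint's memorization flags as the "prediction" and the fully trained model's flags as ground truth. The array names and toy values are hypothetical stand-ins for the real memorization data:

```python
import numpy as np

def early_vs_final_confusion(early_mem, final_mem):
    """2x2 confusion matrix for the 'early checkpoint predicts final' classifier.
    Rows = final model (actual), cols = early checkpoint (predicted):
    [[TP, FN],
     [FP, TN]]
    """
    tp = np.sum(early_mem & final_mem)    # memorized early and at the end
    fn = np.sum(~early_mem & final_mem)   # memorized only at the end
    fp = np.sum(early_mem & ~final_mem)   # memorized early but forgotten
    tn = np.sum(~early_mem & ~final_mem)  # never memorized
    return np.array([[tp, fn], [fp, tn]])

# Toy example with 5 sequences.
early = np.array([True, True, False, False, True])
final = np.array([True, False, False, True, True])
print(early_vs_final_confusion(early, final))
# → [[2 1]
#    [1 1]]
```

Normalizing the rows of this matrix would give the recall of the early-checkpoint classifier directly, which may be the more interpretable number for the paper.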