Social bias and knowledge of social bias #5
@StellaAthena Can I take this up?
Also, the CrowS-Pairs dataset has been deprecated by its authors after subsequent work found issues, described in the README. It can still be used, but just not trusted too heavily, I guess.
@aflah02 Thanks for letting us know! @haileyschoelkopf has already launched intervention runs on model training, where we change the pronoun distribution in the text for the tail portion of training. These should be done pretty soon, and we'll need help analyzing them. Hailey: can you explain precisely the intervention you've run, and then we can plan exactly how we are going to analyze the results given this new info?
Yes! I have rerun the last 5k steps (~15% of training) of our Pythia-1.3b-deduped model with specific tokens in the GPT-NeoX-20b tokenizer for male pronouns swapped out to female pronouns (see the full mapping here: EleutherAI/gpt-neox@df1bdca). I was running into issues because of a bug affecting evals, which has now been fixed. I intend to run evaluations on this intervened model on:
Would love to get input on any other evaluations that might be useful, or any similar papers worth reading! I'm particularly interested in anything to run with this model that is not just a 0-shot benchmark.
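The pronoun-swap intervention described above can be sketched roughly as follows. This is a minimal illustration only: the real intervention operates on GPT-NeoX-20b tokenizer token IDs, and the actual mapping is the one in the linked gpt-neox commit; the strings and swap set below are assumptions for illustration.

```python
# Illustrative pronoun-swap map (NOT the actual mapping from the linked
# commit, which works on tokenizer token IDs rather than strings).
PRONOUN_SWAP = {
    "he": "she",
    "He": "She",
    "him": "her",
    "his": "her",
    "His": "Her",
    "himself": "herself",
}

def swap_pronouns(tokens):
    """Replace male-pronoun tokens with their female counterparts,
    leaving all other tokens unchanged."""
    return [PRONOUN_SWAP.get(tok, tok) for tok in tokens]

print(swap_pronouns(["He", "said", "his", "dog", "bit", "him"]))
# → ['She', 'said', 'her', 'dog', 'bit', 'her']
```

Applied to the last ~15% of training data, a swap like this shifts the pronoun distribution the model sees in the tail of training, which is what the intervened Pythia-1.3b-deduped run then measures the effect of.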
Thanks @haileyschoelkopf, and sorry for the late reply. I'm not particularly aware of non-zero-shot benchmarks, but I can help out with the zero-shot ones if there's any need! Also, Happy New Year and Happy Holidays @StellaAthena @haileyschoelkopf 🎉
I did some poking around, and it seems that some of the leading (aka only) work on this topic is The Birth of Bias: A case study on the evolution of gender bias in an English language model. A couple of notable papers about bias evaluation and amplification include Trustworthy Social Bias Measurement and Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints. See also the critical paper Undesirable biases in NLP: Averting a crisis of measurement.

This is a complicated and nuanced issue, and our goal is not to meaningfully advance the body of knowledge. It's to motivate why this question is interesting and pitch Pythia as a platform for studying it. This changes how we frame things: we can embrace critiques of particular measurement techniques, and calls for better analytic methods, that would undermine papers seeking to solve the problem directly. We also don't need to get an answer, let alone the answer. We just need to show that someone with more time and subject-matter expertise should seriously consider this as a platform for doing the work.

The first order of business is a lit review: we want to make a list of papers talking about bias amplification in NLP, or about bias and training dynamics in deep learning. We especially want to note any papers that argue for studying how bias evolves over time, note it as an interesting or worthwhile problem, or flag it as an avenue for future work. Extra bonus points for papers that mention this but say they can't study it due to insufficient model / data access (since that's the actual contribution of the paper).

@aflah02 I know this is a bit different from what you probably envisioned, but would you be interested in taking the lead on this lit review?
Hi @StellaAthena |
@aflah02 I don’t care about the format at all. It can be bullets, it can be paragraphs, it can be delivered orally. What matters is that you’re able to take the research you read and explain it to @haileyschoelkopf so she can decide what exact experiment we are going to run, and then to me so I can help write about why people interested in this question should use our model suite to answer it. This would need to have a pretty quick turnaround. We are targeting submission to ICML, which has a deadline of Jan 26th, and I think we’ll need this info by the 15th to be able to incorporate it properly. Do you think you can do that?
@StellaAthena That seems reasonable! I think I can do it. I'll get to work on this and share periodic updates here after I cover each paper. I also wrote a survey paper on debiasing methods some months ago for a writing class; it's not written in the best way, but it should give some ideas, so I'll share that as well.
Awesome! Again, I want to stress that the goal is not to write a survey paper: it’s to survey the field so we can pitch our model suite to people in the field effectively. Let’s plan on syncing up on Friday about your progress? Can you post in #interp-across-time on the Discord about setting that up?
Hey @aflah02, sorry, I've been partially AFK the past 2 days due to the holiday, so I haven't caught up with the literature Stella found so far! Would be great to follow up with you on our Discord so we can discuss this further.
@StellaAthena Sure! I'll make a post there. @haileyschoelkopf No worries, hope you had fun! I'll post updates there.
Pretty straightforward… we need to implement the following in the eval harness:
For a discussion of “recognizing bias” vs “reproducing bias” check out here. The primary goal is to look at how the development of understanding of bias correlates with the development of tendency to produce biased content.
It would also be interesting to look at correlations across categories of bias, e.g., does the model learn to reproduce and/or identify all types of bias at an equal rate? And if not, can we identify specific subsets of the Pile that are “biased in how they are biased” so to speak.
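The "reproducing bias" side of this could be measured with a CrowS-Pairs-style pairwise metric: score each stereotyping sentence against its minimally edited anti-stereotype counterpart and report how often the model prefers the stereotype (50% = no systematic preference). This is a hedged sketch under assumptions: the function names, scoring interface, and toy numbers below are illustrative stand-ins, not the eval harness's actual API.

```python
def bias_score(pairs, loglikelihood):
    """Fraction of (stereotype, anti-stereotype) sentence pairs for which
    the model assigns higher log-likelihood to the stereotyping sentence.
    A score of 0.5 means no systematic preference."""
    preferred = sum(
        1 for stereo, anti in pairs
        if loglikelihood(stereo) > loglikelihood(anti)
    )
    return preferred / len(pairs)

# Toy stand-in for a real LM scoring function (hypothetical numbers).
toy_scores = {
    "stereo_1": -10.0, "anti_1": -12.0,  # model prefers the stereotype
    "stereo_2": -8.0,  "anti_2": -7.5,   # model prefers the anti-stereotype
}
pairs = [("stereo_1", "anti_1"), ("stereo_2", "anti_2")]
print(bias_score(pairs, toy_scores.get))  # → 0.5
```

Computing this score per bias category at each training checkpoint would give the "reproducing bias" curve to correlate against a "recognizing bias" curve, as proposed above.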