-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing temperature sampling and extending description to enable majority voting for tasks #376
Conversation
Updated the PR to enable majority voting without adding extra tasks. Now majority voting can be enabled through the task description, e.g., saving {"math_algebra": "majority_voting=32,sampling_temperature=0.3"} as config.json and invoking the evaluation by GPT2 performance with 3-shot on task math_algebra:
|
I will endeavor to look at this during the coming week. |
Obviously I didn’t have time to look at this as I had hoped. I’m currently discussing a re-factoring of this library with @haileyschoelkopf and @jon-tow which will likely take this feature into account. Apologies for the delays, things have been quite hectic. |
No worries! It's great to see the re-factoring effort. |
lm_eval/base.py
Outdated
greedy = False | ||
_model_generate_kwargs = {"k": k, "temperature": temperature} | ||
elif len(request) == 5: | ||
context, until, k, temperature, k_batch = request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps name k
and k_batch
to line up with what is specified in the config? E.g. right now the config has majority_voting
(or rename to majority_voting_k
or majority_voting_num_votes
).
This could help with keeping track of (a) which hyperparameters are supported, and (b) which part of the code corresponds to what is specified in the config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is a very good suggestion. Improving consistency of internal names for things is a huge add for hackability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed- after thinking about it some more, the k
here isn't specific to majority voting though, so perhaps just num_samples
lm_eval/evaluator.py
Outdated
@@ -215,7 +218,10 @@ def evaluate( | |||
ctx = task.fewshot_context( | |||
doc=doc, num_fewshot=num_fewshot, rnd=rnd, description=description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will the key-value pair string description
be prepended to the few-shot context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to support both cases, one option is to break apart the two cases in the config, e.g. {"math_algebra": {"description": "You will solve mathematical problems. Here are some examples: ", "params": {"majority_voting": 4,"sampling_temperature":0.3,"eval_batch_size":4}}}
this would also remove the need to implement the logic in parse_description
(see below)
lm_eval/tasks/hendrycks_math.py
Outdated
def process_results(self, doc, results): | ||
retval = 0 | ||
indices = [pos for pos, char in enumerate(results[0]) if char == "$"] | ||
def parse_description(self, description): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to reuse lm_eval.utils.simple_parse_args_string
?
(the logic is slightly different in that function, so maybe you already tried it and ran into an issue :))
@albertqjiang also if it's helpful to integrate or expand upon, I wrote some documentation while looking through the code: |
Albert Jiang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Currently, the temperature sampling is only implemented for HFLM and maj@k is only implemented for MATH/algebra through the hendrycks_math.MathAlgebraMaj task.
The problem to solve before upstreaming are stylistic, not technical:
Is it ok if we just add a new task for each task we want to do majority voting on? This is clumsy, but somehow necessary. Because for a different task, we might need a different post-processing method to figure out the vote.