Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing temperature sampling and extending description to enable majority voting for tasks #376

Closed
wants to merge 0 commits into from

Conversation

albertqjiang
Copy link

Currently, the temperature sampling is only implemented for HFLM and maj@k is only implemented for MATH/algebra through the hendrycks_math.MathAlgebraMaj task.

The problem to solve before upstreaming are stylistic, not technical:

Is it ok if we just add a new task for each task we want to do majority voting on? This is clumsy, but somehow necessary. Because for a different task, we might need a different post-processing method to figure out the vote.

@albertqjiang albertqjiang changed the title [WIP don't merge] implementing temperature sampling and maj@k for HFLM Implementing temperature sampling and extending description to enable majority voting for tasks Jan 13, 2023
@albertqjiang
Copy link
Author

Updated the PR to enable majority voting without adding extra tasks.

Now majority voting can be enabled through the task description, e.g., saving {"math_algebra": "majority_voting=32,sampling_temperature=0.3"} as config.json and invoking the evaluation by
python main.py --model gpt2 --tasks math_algebra --device cuda --num_fewshot 3 --description_dict_path config.json

GPT2 performance with 3-shot on task math_algebra:

  • Without majority voting: 0.08±0.08%
  • With majority voting @ 32 and sampling temperature 0.3: 0.51 ± 0.21 %

@StellaAthena
Copy link
Member

I will endeavor to look at this during the coming week.

@StellaAthena
Copy link
Member

Obviously I didn’t have time to look at this as I had hoped. I’m currently discussing a re-factoring of this library with @haileyschoelkopf and @jon-tow which will likely take this feature into account.

Apologies for the delays, things have been quite hectic.

@albertqjiang
Copy link
Author

Obviously I didn’t have time to look at this as I had hoped. I’m currently discussing a re-factoring of this library with @haileyschoelkopf and @jon-tow which will likely take this feature into account.

Apologies for the delays, things have been quite hectic.

No worries! It's great to see the re-factoring effort.

lm_eval/base.py Outdated
greedy = False
_model_generate_kwargs = {"k": k, "temperature": temperature}
elif len(request) == 5:
context, until, k, temperature, k_batch = request

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps name k and k_batch to line up with what is specified in the config? E.g. right now the config has majority_voting (or rename to majority_voting_k or majority_voting_num_votes).

This could help with keeping track of (a) which hyperparameters are supported, and (b) which part of the code corresponds to what is specified in the config

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is a very good suggestion. Improving consistency of internal names for things is a huge add for hackability.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed- after thinking about it some more, the k here isn't specific to majority voting though, so perhaps just num_samples

@@ -215,7 +218,10 @@ def evaluate(
ctx = task.fewshot_context(
doc=doc, num_fewshot=num_fewshot, rnd=rnd, description=description

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will the key-value pair string description be prepended to the few-shot context?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirmed that it is prepended:

Screen Shot 2023-03-14 at 11 01 20 AM

Copy link

@wellecks wellecks Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to support both cases, one option is to break apart the two cases in the config, e.g. {"math_algebra": {"description": "You will solve mathematical problems. Here are some examples: ", "params": {"majority_voting": 4,"sampling_temperature":0.3,"eval_batch_size":4}}}

this would also remove the need to implement the logic in parse_description (see below)

def process_results(self, doc, results):
retval = 0
indices = [pos for pos, char in enumerate(results[0]) if char == "$"]
def parse_description(self, description):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to reuse lm_eval.utils.simple_parse_args_string?

(the logic is slightly different in that function, so maybe you already tried it and ran into an issue :))

@wellecks
Copy link

@albertqjiang also if it's helpful to integrate or expand upon, I wrote some documentation while looking through the code:

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ StellaAthena
❌ Albert Jiang


Albert Jiang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants