
Remove LM dependency from build_all_requests #2011

Merged (10 commits) on Jun 25, 2024

Conversation

baberabb
Contributor

@baberabb baberabb commented Jun 22, 2024

Removed the `LM` object as an argument and added a `chat_template: Callable` argument, improving modularity and reducing coupling between components.
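The refactor can be sketched roughly as follows. This is an illustrative sketch, not the harness's actual code — the `docs` structure and the `simple_template` helper are hypothetical — but it shows the point of the change: `build_all_requests` receives only the callable it needs rather than the whole `LM` object.

```python
from typing import Callable, List, Optional

def build_all_requests(
    docs: List[dict],
    chat_template: Optional[Callable[[List[dict]], str]] = None,
) -> List[str]:
    """Build request contexts, applying a chat template when one is given."""
    requests = []
    for doc in docs:
        messages = [{"role": "user", "content": doc["question"]}]
        # Before the refactor this step would have gone through the LM object
        # (lm.apply_chat_template); now only the callable is passed in.
        context = chat_template(messages) if chat_template else doc["question"]
        requests.append(context)
    return requests

# Hypothetical stand-in for a model's bound apply_chat_template method:
def simple_template(messages: List[dict]) -> str:
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

docs = [{"question": "What is 2 + 2?"}]
print(build_all_requests(docs, chat_template=simple_template))
```

Any `LM` subclass can then expose its own template as a bound method and hand it to the builder, so the request-building code never needs the model itself.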

Contributor

@haileyschoelkopf haileyschoelkopf left a comment


Thank you! LGTM!

@haileyschoelkopf haileyschoelkopf merged commit 9b6179b into EleutherAI:main Jun 25, 2024
8 checks passed
@baberabb baberabb deleted the chat_ branch June 25, 2024 15:50
xksteven pushed a commit to xksteven/lm-evaluation-harness that referenced this pull request Jun 26, 2024
* refactored `lm.apply_chat_template`

* nit

* fix weird type error

* fixed!

* skip failing test

* pre-commit run all

* add type hints

* nit

* nit

* fixup
mariagrandury pushed a commit to somosnlp/lm-evaluation-harness that referenced this pull request Jul 25, 2024
mansicer added a commit to mansicer/lm-evaluation-harness that referenced this pull request Aug 1, 2024
* Fix: support PEFT/LoRA with added tokens (EleutherAI#1828)

* resize model embeddings

* resize only

* tokenizer help

* load tokenizer before model

* add comment and run precommit lint

* Add log message

Co-authored-by: Hailey Schoelkopf <[email protected]>

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* fixed incorrect check for task type (replace `~` with `not`) (EleutherAI#1865)

* fixed docs typos (EleutherAI#1863)

* Update polemo2_out.yaml (EleutherAI#1871)

* Unpin vllm in dependencies (EleutherAI#1874)

* Fix outdated links to the latest links in `docs` (EleutherAI#1876)

* [HFLM]Use Accelerate's API to reduce hard-coded CUDA code (EleutherAI#1880)

* Fix `batch_size=auto` for HF Seq2Seq models (EleutherAI#1765) (EleutherAI#1790)

* fix auto-batch size bug for seq2seq models

* run linter

* Fix Brier Score (EleutherAI#1847)

`gold_one_hot` needs to follow the dimension of the predictions so that the metric still works when `--limit` is used and the sampled gold labels do not cover all classes.
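The dimension issue described above can be illustrated with a minimal sketch (not the harness's actual Brier implementation): with `--limit`, the sampled gold labels may not cover every class, so the one-hot width must follow the prediction vectors rather than `max(gold) + 1`.

```python
def brier_score(preds, gold):
    n_classes = len(preds[0])  # follow the prediction dimension
    # Wrong alternative: n_classes = max(gold) + 1 -> too narrow whenever some
    # classes never appear among the (sub)sampled gold labels.
    total = 0.0
    for p, g in zip(preds, gold):
        one_hot = [1.0 if i == g else 0.0 for i in range(n_classes)]
        total += sum((pi - oi) ** 2 for pi, oi in zip(p, one_hot))
    return total / len(preds)

preds = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]  # 3 classes
gold = [0, 1]  # class 2 never appears in this limited subset
print(brier_score(preds, gold))  # ≈ 0.2
```

Here `max(gold) + 1` would give 2 classes and mismatch the 3-wide prediction vectors, which is exactly the `--limit` failure mode the fix addresses.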

* Fix for bootstrap_iters = 0 case (EleutherAI#1715) (EleutherAI#1789)

* add handling for bootstrap_iters=0 case

* add more detail to docstring

* run precommit

* add mmlu tasks from pile-t5 (EleutherAI#1710)

* add mmlu tasks from pile-t5

* Update _mmlu_flan_cot_fewshot_template_yaml

* Update _mmlu_flan_cot_zeroshot_template_yaml

* Update _mmlu_flan_generative_template_yaml

* Update _mmlu_flan_loglikelihood_template_yaml

* Update _default_template_yaml

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Bigbench fix (EleutherAI#1686)

* edit process multiple-choice

* split template yaml

* remove

* modified multiple_choice tasks

* update

* Update multiple_choice_template_b_yaml

* Update multiple_choice_template_a_yaml

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Rename `lm_eval.logging -> lm_eval.loggers` (EleutherAI#1858)

* rename lm_eval.logging module

* fix evaluation tracker args

* Updated vllm imports in vllm_causallms.py (EleutherAI#1890)

* Reorder vllm imports in vllm_causallms.py

* Update vllm_causallms.py

* [HFLM]Add support for Ascend NPU (EleutherAI#1886)

* [HFLM]Add support for Ascend NPU

Co-authored-by: jiaqiw09 <[email protected]>
Co-authored-by: zhabuye <[email protected]>

* bump accelerate dependency version to 0.26.0 for NPU compat.

---------

Co-authored-by: jiaqiw09 <[email protected]>
Co-authored-by: zhabuye <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>

* `higher_is_better` tickers in output table (EleutherAI#1893)

* Higher is better tickers in output table

* add extra check for `higher_is_better` not being None already

* Update lm_eval/evaluator.py

* fixup format I messed up

* add comment (and retrigger tests)

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>

* Add dataset card when pushing to HF hub (EleutherAI#1898)

* dataset card initial

* few fixes

* adds groups for math, mmlu, gpqa

* added summary args

* moved sanitize_list to utils

* readme update

* recreate metadata moved

* multiple model support

* results latest split fix

* readme update and small refactor

* fix grouping

* add comments

* added pathlib

* corrected pathlib approach

* check whether to create a metadata card

* convert posix paths to str

* default hf org from token

* hf token value error

* Add logs after successful upload

* logging updates

* dataset card example in the readme

---------

Co-authored-by: Nathan Habib <[email protected]>
Co-authored-by: Alina Lozovskaia <[email protected]>

* Making hardcoded few shots compatible with the chat template mechanism (EleutherAI#1895)

* init test 1

* fix

* this format seems to be working - need to update all other tasks with the new format

* bbh with few shot format

* fix fewshot bbh

* add mmlu flan cot

* samples of cot

* kmmlu

* fix gsm8k

* update keys for mmlu

* minerva math

* bbh

* fix

* fix samples

* small fixes to templates

* last prompt format change

* fixing prompt

* fixed minerva math format

* rm accidentally committed file

* added doc for few shot samples

* Update lm_eval/loggers/evaluation_tracker.py

* Update lm_eval/loggers/evaluation_tracker.py

* Update docs/new_task_guide.md

Co-authored-by: Hailey Schoelkopf <[email protected]>

* added check in sampler per code review

* added the system from a function, plus an example in minerva math

* style

* Apply suggestions from code review

Co-authored-by: Hailey Schoelkopf <[email protected]>

* fix unit tests 1

* forcing use of test split

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Try to make existing tests run little bit faster (EleutherAI#1905)

* Fix fewshot seed only set when overriding num_fewshot (EleutherAI#1914)

Fix EleutherAI#1906

* Complete task list from pr 1727 (EleutherAI#1901)

* added tasks and task family descriptors

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* Apply suggestions from code review

Co-authored-by: Hailey Schoelkopf <[email protected]>

* apply format

---------

Co-authored-by: Harish Vadaparty <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>

* Add chat template (EleutherAI#1873)

* initial chat template

* tokenizer attribute check

* variable rename

* interface update

* system instruction

* system inst default update

* fewshot as multiturn

* typing update

* indent update

* added comments

* Adding a fewshot in a more readable way

* linting

* Moved apply chat template to LM

* multiturn alternation fix

* cache key update

* apply chat template method fix

* add system prompt hash to cache_key

* tokenizer name property for cache_key

* property name fix

* linting backward compatibility fix

* docs and errors update

* add documentation on adding chat template compatibility to model_guide

* fewshot as multiturn check fix

* saving system inst and chat template in results

* eval tracker update

* docs update

* Apply suggestions from code review

Co-authored-by: Hailey Schoelkopf <[email protected]>

---------

Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: Clémentine Fourrier <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>

* Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data (EleutherAI#1867)

* glianorex tasks

* Create README.md

* Update README.md

* Update README.md

* fix formatting

* fix internal formatting

* Modify pre-commit hook to check merge conflicts accidentally committed not at current merge commit (EleutherAI#1927)

* [add] fld logical formula task (EleutherAI#1931)

* Add new Lambada translations (EleutherAI#1897)

* added tasks and task family descriptors

* configs for the new lambada translations

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* update `lm_eval/tasks/README.md` with task description

---------

Co-authored-by: Harish Vadaparty <[email protected]>
Co-authored-by: anthony <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>

* Implement NoticIA (EleutherAI#1912)

* Noticia

* test

* Final tests implementation

* Fixes

* Fix linters

* Add The Arabic version of the PICA benchmark (EleutherAI#1917)

* Update siqa.yaml (EleutherAI#1909)

* Update basque-glue (EleutherAI#1913)

* Update README.md

* Update bec.yaml

* Update bhtc.yaml

* Update coref.yaml

* Update qnli.yaml

* Update vaxx.yaml

* Update wic.yaml

* Test output table layout consistency (EleutherAI#1916)

* sort metrics in output table

* update docstring in `consolidate_results`

* add tests for verifying consistency of table output

* update tests to account for floating point inconsistencies

* updated tests based on `pythia-14m`

* Update __main__.py (EleutherAI#1939)

* Add the Arabic version with refactor to Arabic pica to be in alghafa folder (EleutherAI#1940)

* Results filenames handling fix (EleutherAI#1926)

* results filenames handling moved to utils

* zeno results handling fix

* tasks_for_model backward compatibility

* results files logic moved to tasks_for_model

* moved sanitize_model_name to utils

* Remove AMMLU Due to Translation (EleutherAI#1948)

* Update README.md

* Delete lm_eval/tasks/ammlu directory

* add include_defaults kwarg to taskmanager, add tests for include_path (EleutherAI#1856)

* add hacky add_bos_token forcing for Gemma to VLLM too (EleutherAI#1857)

* Update interface.md (EleutherAI#1955)

* Fix self.max_tokens in anthropic_llms.py (EleutherAI#1848)

Fix bug where `self.max_tokens` was not set

* `samples` is newline delimited (EleutherAI#1930)

* `samples` is newline delimited

* updated git and pre-commit

* appease pre-commit

* nit

* Revert back for now

* Revert for now

---------

Co-authored-by: Lintang Sutawika <[email protected]>

* Fix `--gen_kwargs` and VLLM (`temperature` not respected) (EleutherAI#1800)

* Update vllm_causallms.py

* adjust

---------

Co-authored-by: lintangsutawika <[email protected]>

* make write_out.py explicitly error if no splits match (EleutherAI#1796)

Co-authored-by: lintangsutawika <[email protected]>

* fix: add directory filter to os.walk to ignore 'ipynb_checkpoints' (EleutherAI#1956)

* fix: add filter to os.walk to ignore 'ipynb_checkpoints

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Lintang Sutawika <[email protected]>

* add trust_remote_code  for piqa (EleutherAI#1983)

Signed-off-by: changwangss <[email protected]>

* Fix self assignment in neuron_optimum.py (EleutherAI#1990)

* [New Task] Add Paloma benchmark (EleutherAI#1928)

* init paloma benchmark

* pre-process in utils function

* add `task_alias`

* updated task aliases

* Update paloma_dolma-v1_5.yaml

* Update paloma_twitterAAE_HELM_fixed.yaml

* Update paloma_dolma_100_programing_languages.yaml

---------

Co-authored-by: Lintang Sutawika <[email protected]>

* Fix Paloma Template yaml (EleutherAI#1993)

* init paloma benchmark

* pre-process in utils function

* add `task_alias`

* updated task aliases

* Update paloma_dolma-v1_5.yaml

* Update paloma_twitterAAE_HELM_fixed.yaml

* Update paloma_dolma_100_programing_languages.yaml

* update on names

* fix paloma template issue

---------

Co-authored-by: Zafir Stojanovski <[email protected]>
Co-authored-by: Zafir Stojanovski <[email protected]>
Co-authored-by: Lintang Sutawika <[email protected]>

* Log `fewshot_as_multiturn` in results files (EleutherAI#1995)

* log fewshot_as_multiturn in general tracker args

* Update evaluator.py

---------

Co-authored-by: Lintang Sutawika <[email protected]>

* Added ArabicMMLU (EleutherAI#1987)

* Added ArabicMMLU

* Rename `ammlu` to `arabicmmlu`

* Fix Datasets `--trust_remote_code` (EleutherAI#1998)

* Add BertaQA dataset tasks (EleutherAI#1964)

* add bertaqa tasks

* rename basquetrivia-->bertaqa ; make template stub not .yaml

* add bertaqa entry to lm_eval/tasks/README.md

---------

Co-authored-by: haileyschoelkopf <[email protected]>

* add tokenizer logs info (EleutherAI#1731)

* add tokenizer logs info

* add no tokenizer case

* Update lm_eval/logging_utils.py

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Update lm_eval/logging_utils.py

Co-authored-by: Hailey Schoelkopf <[email protected]>

* add updates

* fix conflict

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Hotfix breaking import (EleutherAI#2015)

* add arc_challenge_mt (EleutherAI#1900)

* add arc_challenge_mt

* add README

* add icelandic

* Remove `LM` dependency from `build_all_requests` (EleutherAI#2011)

* refactored `lm.apply_chat_template`

* nit

* fix weird type error

* fixed!

* skip failing test

* pre-commit run all

* add type hints

* nit

* nit

* fixup

* Added CommonsenseQA task (EleutherAI#1721)

* Initial configuration

* Using the validation set for the test set, because the test set on HF doesn't have labels

* Probably just makes more sense to have validation be validation

* fix format ; add docs to tasks/README.md

* fix format

---------

Co-authored-by: haileyschoelkopf <[email protected]>

* Factor out LM-specific tests (EleutherAI#1859)

* separate out optimum/neuralmagic tests to separate job

* fix vllm tests

* fix bug in --trust_remote_code

* use datasets.config instead intentionally

* fix remote code issue?

* Update interface.md (EleutherAI#1982)

* Update interface.md

update interface to remove link to really outdated commit of evaluator.py

* switch to relative referencing?

* Update interface.md

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Fix `trust_remote_code`-related test failures (EleutherAI#2024)

* make MMLU trust remote code to fix tests

* remove trust remote code

* Fixes scrolls task bug with few_shot examples (EleutherAI#2003)

Bug:

```
python -m scripts.write_out --task scrolls_quality --output_base_path ~/workspace/
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/lm-evaluation-harness/scripts/write_out.py", line 92, in <module>
    main()
  File "/lm-evaluation-harness/scripts/write_out.py", line 51, in main
    task_dict = tasks.get_task_dict(task_names, task_manager)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 423, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 271, in load_task_or_group
    collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
  File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 162, in _load_individual_task_or_group
    return load_task(task_config, task=name_or_config, group=parent_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 148, in load_task
    task_object = config["class"]()
                  ^^^^^^^^^^^^^^^^^
  File "/lm-evaluation-harness/lm_eval/tasks/scrolls/task.py", line 120, in __init__
    super().__init__()
  File "/lm-evaluation-harness/lm_eval/api/task.py", line 703, in __init__
    self._config = TaskConfig(**config)
                   ^^^^^^^^^^^^^^^^^^^^
TypeError: lm_eval.api.task.TaskConfig() argument after ** must be a mapping, not NoneType
```
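A minimal reproduction of the crash, plus a typical guard (the guard here is an assumed illustration, not necessarily the exact fix that was merged):

```python
# Unpacking None with ** raises the exact TypeError shown in the traceback:
# "argument after ** must be a mapping, not NoneType".
class TaskConfig:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

config = None  # what the scrolls task ended up passing up to super().__init__()

try:
    TaskConfig(**config)
except TypeError as e:
    print(e)

# Defaulting to an empty mapping before unpacking avoids the crash:
task_config = TaskConfig(**(config or {}))
print(task_config.kwargs)  # {}
```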

* fix cache (EleutherAI#2037)

* Add chat template to `vllm` (EleutherAI#2034)

* add chat template

* refactor token padding

* nit

* nit

* check on failing test

* check transformers version

* remove transformers pin

* add ids to test

* nit

* fixup

* fix bos bug

* nit

* fixup! fix bos bug

* increase tolerance for table test

* don't detokenize vllm logprobs

* Update lm_eval/models/utils.py

Co-authored-by: Hailey Schoelkopf <[email protected]>

* pre-commit run --all-files

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* fail gracefully upon tokenizer logging failure (EleutherAI#2038)

* ship with exact_match function already used ; don't call evaluate.load() on import (EleutherAI#2045)

* update to v0.4.3 (EleutherAI#2046)

* fix wandb logger module import in example (EleutherAI#2041)

* Fix strip whitespace filter (EleutherAI#2048)

* batch commit

* Revert "batch commit"

This reverts commit d859d1c.

* batch commit

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* update gemma-2 default BOS behavior (EleutherAI#2049)

* Update hellaswag.yaml (EleutherAI#2029)

* Adds Open LLM Leaderboard Taks (EleutherAI#2047)

* adds leaderboard tasks

* Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml

* add readme

* Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml

* modify readme

* fix bbh task

* fix bbh salient task

* modify the readme

* Delete lm_eval/tasks/leaderboard/ifeval/README.md

* Delete lm_eval/tasks/leaderboard/math/README.md

* add leaderboard to the tasks repertory

* add announcement about new leaderboard tasks

* linting

* Update README.md

Co-authored-by: Hailey Schoelkopf <[email protected]>

* installs ifeval dependency in new_task github workflow

---------

Co-authored-by: Nathan Habib <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>

* EleutherAI#1442 inverse scaling tasks implementation (EleutherAI#1589)

* initial implementation (testing still to be done)

* minor fix

* revised task name and implemented new task

* minor fixes

* new tasks implement

* minor fix

* added 'prompt injection' task

* delete prompt injection task (will be implemented at next PR)

* trust remote code

* Update lm_eval/tasks/inverse_scaling/README.md

Co-authored-by: Hailey Schoelkopf <[email protected]>

* added readme

* Update lm_eval/tasks/README.md

* Update lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml

* Update lm_eval/tasks/inverse_scaling/README.md

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Update lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Update README.md

* precommit?

* run precommit on readme

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>

* Fix TypeError in samplers.py by converting int to str (EleutherAI#2074)

Co-authored-by: yhjo <[email protected]>

* Group agg rework (EleutherAI#1741)

* add greoup_config arg

* add a group config that allows disabling table for group score and group aggregate in general

* fixed size configuration

* adjust config

* add group config

* adjust mmlu to use group_config

* fixed args input in aggregate_subtask_metrics

* fixed issues related to printing alias of group and updated yaml

* update all mmlu variants to include group_config

* edit format

* modify mmlu tasks

* adjust group to also be a configurable group

* add configurable group

* simplify get_task_list

* adjust group scoring with using ConfigurableGroup

* adjust args

* update mmlu

* update mmlu

* update to work with new group and task configuration

* readd group_agg

* readd files

* move prepare_print_tasks to evaluator_utils

* sort set to False by default, fix predict_only arg

* add version for groups

* reversed task list

* update additional condition when loading a group in a group yaml

* update truthfulqa

* add description regarding tags replacing group

* replace group to tag

* fixed conditional statement

* remove warning

* update loading of task group and newly added tags

* reformat with pre-commit

* fixed info log

* update

* fix bug

* fix bug

* use task id to differentiate tasks

* convert all groups to configurable groups

* use task_id

* reformat

* add task_id for python tasks as well

* add task_id for python tasks as well

* add task_id for python tasks as well

* revert truthfulqa

* revert mmlu tasks

* new mmlu config

* new group config parameter `tag_to_task`

* Update truthfulqa_mc2.yaml

* reformate

* add _process_group_config

* adjust task_id

* add get_subtask_list function to get proper subtask list

* group config to_dict update

* remove tag check

* update mmlu

* fix config passing issues

* add test yaml

* format fix

* add documentation

* corner case for single tag being called

* fix indentation

* formatting

* update all mmlu variants

* Update docs/task_guide.md

Co-authored-by: Hailey Schoelkopf <[email protected]>

* remove group_alias

* Update docs/task_guide.md

Co-authored-by: Hailey Schoelkopf <[email protected]>

* remove version for metadata

* Update docs/task_guide.md

Co-authored-by: Hailey Schoelkopf <[email protected]>

* update mmlu/

* removed " " in make_table

* change how aggregate_metric is loaded

* change how aggregate_metric is loaded

* update aggregate_metric arg

* update format

* update format

* some docs fixes

* add groups for agieval, aexams, aclue

* add more explicit aggregation groups

* add more groupings / tags distinctions

* add more groupings

* more groupings

* add many explicit group configs

* add many explicit group configs

* add more explicit group configs

* add more explicit group configs

* add more error msgs, agg_metric -> agg_metric_list

* some docs updates

* update task_id to be updateable and uses group:task format

* make KMMLU a tag for now

* update docs

* don't duplicate task names

* fix merge conflicts?

* giving this a try

* clean up diff

* switch mmlu variants over to using

* don't use to-be-deprecated group: config field in overview notebook

* Python tasks which subclass ConfigurableTask now run

* update mmlu

* pre-commit format

* fixed sorting for multi-level printing

* move group api to separate file

* fix bbh aggregation filter usage

* track api/group.py

* adjust group and tags loading

* make explicit group configs for leaderboard and other newer tasks

* fix arabicmmlu

* update

* change arabicmmlu template name???

* update group alias

* fix printing bugs

* check table printing is correct ; update tests

* use mmlu_stem to have a group included in print tests

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>

* we run with bootstrap_iters=0 for printing tests (EleutherAI#2080)

* Easier unitxt tasks loading and removal of unitxt library dependency (EleutherAI#1933)

* Updated unitxt loading

Signed-off-by: Elron Bandel <[email protected]>

* Revert change to general Readme

Signed-off-by: Elron Bandel <[email protected]>

* Adjust fda,squadv2,squad_completion and swde to work accept config in the constructor

Signed-off-by: Elron Bandel <[email protected]>

* Fix scrolls

Signed-off-by: elronbandel <[email protected]>

* Update documentation

Signed-off-by: elronbandel <[email protected]>

* Enforce backward compatibility

Signed-off-by: elronbandel <[email protected]>

* Format unitxt class

Signed-off-by: elronbandel <[email protected]>

---------

Signed-off-by: Elron Bandel <[email protected]>
Signed-off-by: elronbandel <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>

* Allow gating EvaluationTracker HF Hub results; customizability (EleutherAI#2051)

* batch commit

* Revert "batch commit"

This reverts commit d859d1c.

* batch commit

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup eval results

* cleanup

* add check for gated repo

* fix jsonline issue

* fix

* add try catch when gating the details repo

* add doc

* adds back hub_repo_name

* readds hub repo name

* Minor doc fix: leaderboard README.md missing mmlu-pro group and task (EleutherAI#2075)

leaderboard README.md missing mmlu-pro group and task

* fix: utf-8 encoding for logged sample files was missing (EleutherAI#2082)

* Update utils.py (EleutherAI#2085)

Group configs with no aggregation will print an empty space as the score in the result table.
Example:
```
|    Tasks     |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------------|-------|------|-----:|--------|---|-----:|---|-----:|
|group         |    N/A|      |      |        |   |      |   |      |
| - task 0     |Yaml   |none  |     0|acc     |↑  |0.4000|±  |0.0910|
| - task 1     |Yaml   |none  |     0|acc     |↑  |0.3333|±  |0.0875|
| - task 2     |Yaml   |none  |     0|acc     |↑  |0.2667|±  |0.0821|
| - task 3     |Yaml   |none  |     0|acc     |↑  |0.3333|±  |0.0875|
```

So the `v` variable in `make_table` needs to check whether the value is a float or a string.
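The guard amounts to something like the following minimal sketch (illustrative, not the harness's exact `make_table` code): group rows without an aggregate score carry a string placeholder instead of a float, so formatting must branch on the type.

```python
def format_value(v) -> str:
    if isinstance(v, float):
        return f"{v:.4f}"   # numeric scores: fixed precision
    return str(v)           # placeholders such as "" or "N/A": pass through

print(format_value(0.4))    # 0.4000
print(format_value("N/A"))  # N/A
```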

* batch_size may be str if 'auto' is specified (EleutherAI#2084)

* Prettify lm_eval --tasks list (EleutherAI#1929)

* add  and ; move task list newline logic to new TaskManager.list_all_tasks() method

* format table list into markdown table; add config location column

* add Output Type column

* add logic for printing table of tags separately

* merge with main and fix conflicts ; update docstrings

---------

Co-authored-by: haileyschoelkopf <[email protected]>

* make RougeScorer only initialized once (EleutherAI#2090)

* Update default.yaml (EleutherAI#2092)

* Add new dataset MMLU-SR tasks (EleutherAI#2032)

* add mmlusr tasks

* renamed all tasks names in mmlusr

* edit format and readme

* added mmlu_sr

* mmlu_sr -> mmlusr

* update

---------

Co-authored-by: lintangsutawika <[email protected]>

* Irokobench: Benchmark Dataset for African languages (EleutherAI#2042)

* add afrixnli to task

* add chat completion

* remove chat completion -untested

* afrimmlu added

* afrimmlu folder update

* afrimmlu folder update

* updated prompt

* remove print

* add afrimgsm -direct

* add squad metric

* fix bash script

* remove direct util, update common yaml

* remove print

* add few show. metric fixes

* fix direct path, add bash script for gpt models

* added translate test

* update afrixnli tasks

* update afrixnli tasks

* update metrics for afrixnli

* prompt translations fix

* prompt translations fix

* filter and metric fix -mgsm

* remove squad metric

* remove squad metric

* add f1 score to mgsm

* add f1 score to mgsm

* update native-direct with lin

* change f1 function

* add lin to utils

* add utils

* remove test limit

* remove test configs

* add swahili to mmlu

* change eng to ewe in ewe yaml mmlu

* add squad metric to mgsm, remove whitespace filter

* added translate test

* added afrixnli_translate

* fix exact match valueError

* fix exact match valueError

* restructure mmlu folder

* spacing

* remove afrimmlu_translate folder

* add utility

* format task name, clean ups

* modified mgsm

* update on afrimgsm

* update on afrimgsm

* removed utils

* other mgsm varieties

* other mgsm varieties

* adding translate direct

* Update translate_direct_yaml

* add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model

* edit for open models

* Update translate_direct_yaml

* add verbalizer for xnli

* change xnli from multiple choice to generate

* add manual accuracy scores

* revert xnli to multiple choice

* change afrimgsm utils

* revert xnli to multiple_choice

* cleanups and readmes

* remove openai fixes and unused regex

* pr review changes

* revert metrics.py, task.py and extraction.py to main version

---------

Co-authored-by: Israel Abebe Azime <[email protected]>
Co-authored-by: Israel Abebe Azime <[email protected]>

* docs: remove trailing sentence from contribution doc (EleutherAI#2098)

Signed-off-by: Nathan Weinberg <[email protected]>

* Added MedConceptsQA Benchmark (EleutherAI#2010)

* Added MedConceptsQA Benchmark

* pre-commit factor

* update group name

* update in naming

* changed name

* Changed mcqa to med_concepts_qa prefix

* Added med_concepts_qa to README.md

* Changed config files according the new format

* Updated README

---------

Co-authored-by: lintangsutawika <[email protected]>

* make recurrent_gemma model types included in the force-BOS case (EleutherAI#2105)

* formatting (EleutherAI#2104)

* docs: align local test command to match CI (EleutherAI#2100)

Also add 'test_logs/' to .gitignore

Signed-off-by: Nathan Weinberg <[email protected]>

* Fixed colon in Belebele _default_template_yaml (EleutherAI#2111)

* [python] fix haerae tasks (EleutherAI#2112)

* fix: broken discord link in CONTRIBUTING.md (EleutherAI#2114)

Signed-off-by: Nathan Weinberg <[email protected]>

* docs: update truthfulqa tasks (EleutherAI#2119)

* fix caching module (hotfix for now) (EleutherAI#2124)

* Refactor API models (EleutherAI#2008)

* refactor pad_token handling to fn

* fix docs

* add pad_token_handling to vllm

* start on API superclass

* don't detokenize the returned logits

* streamline vllm tokenizer

* add type hint

* pre-commit

* seems to be in working order

* add model to init

* refactor api models

* nit

* cleanup

* add pbar

* fix type hints

* change optional dependencies

* json encode chat template

* add type hints

* deal with different prompt input requirements

* nits

* fix

* cache inside async

* fix

* fix

* nits

* nits

* nits

* nit

* fixup

* fixup

* nit

* add dummy retry

* add dummy retry

* handle imports; skip failing test

* add type hint

* add tests

* add dependency to tests

* add package names to exception

* nit

* docs; type hints

* handle api key

* nit

* tokenizer bug

* fix tokenizer

* nit

* nit

* add better error messages

* nit

* remove decorator

* CI: install api dep

* revert evaluator.py

* consolidate

* consolidate

* nits

* nit

* fix typealias

* nit

* nit

* nit

* Update lm_eval/models/api_models.py

typo

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Update lm_eval/models/openai_completions.py

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Update lm_eval/models/anthropic_llms.py

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Update lm_eval/models/api_models.py

Co-authored-by: Hailey Schoelkopf <[email protected]>

* fix typo

* add news section

* add info for API

* pre-commit

* typo

* fix bug: unpack loglikelihood requests

* fix bug: shared gen_kwargs mutated

* nit: handle copy properly

* Update README.md

* Update README.md

* Update README.md

* Update api_models.py

* Update README.md

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* bugfix and docs for API (EleutherAI#2139)

* encoding bugfix

* encoding bugfix

* overload loglikelihood rather than loglikelihood_tokens

* add custom tokenizer

* add docs

* Update API_guide.md

fix link; add note

* Update API_guide.md

typo

* pre-commit

* add link in readme

* nit

* nit

* nit

* Update API_guide.md

nits

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update README.md

* Update docs/API_guide.md

* Update docs/API_guide.md

* Update API_guide.md

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* [Bugfix] add temperature=0 to logprobs and seed args to API models (EleutherAI#2149)

* add temperature for log probs

* add seed

* nit

* add new args to test

* added warning for api chat models

* refactor: limit usage of `scipy` and `sklearn` dependencies (EleutherAI#2097)

* refactor: move scipy and sklearn module imports to func imports

Signed-off-by: Nathan Weinberg <[email protected]>

* refactor: consolidate weighted_f1_score func into lm_eval utils

Signed-off-by: Nathan Weinberg <[email protected]>

* lint: allow for utils file to have unused imports

this allows for shared functions to be defined only
once while allowing for the YAML function importing
to continue working

Signed-off-by: Nathan Weinberg <[email protected]>
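The refactor above moves heavy optional imports from module scope into the functions that need them, so the package imports cleanly even where `scipy`/`sklearn` are absent. The sketch below shows the same function-level-import pattern with a stdlib module (a stand-in, so the example stays runnable anywhere):

```python
def mean_stderr(values):
    # deferred import: the enclosing module loads even in environments
    # where the dependency is missing, and the import cost is paid
    # only when (and if) this metric is actually called
    import statistics
    return statistics.stdev(values) / len(values) ** 0.5
```

Python caches modules in `sys.modules`, so repeated calls pay only a dict lookup rather than a re-import.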

---------

Signed-off-by: Nathan Weinberg <[email protected]>

---------

Signed-off-by: changwangss <[email protected]>
Signed-off-by: Elron Bandel <[email protected]>
Signed-off-by: elronbandel <[email protected]>
Signed-off-by: Nathan Weinberg <[email protected]>
Co-authored-by: Nick Doiron <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>
Co-authored-by: Zafir Stojanovski <[email protected]>
Co-authored-by: zhabuye <[email protected]>
Co-authored-by: Edward Gan <[email protected]>
Co-authored-by: DongGeon Lee <[email protected]>
Co-authored-by: Huazhong Ji <[email protected]>
Co-authored-by: Lintang Sutawika <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: jiaqiw09 <[email protected]>
Co-authored-by: zhabuye <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: KonradSzafer <[email protected]>
Co-authored-by: Nathan Habib <[email protected]>
Co-authored-by: Alina Lozovskaia <[email protected]>
Co-authored-by: Clémentine Fourrier <[email protected]>
Co-authored-by: LSinev <[email protected]>
Co-authored-by: anthony-dipofi <[email protected]>
Co-authored-by: Harish Vadaparty <[email protected]>
Co-authored-by: Maxime <[email protected]>
Co-authored-by: MorishT <[email protected]>
Co-authored-by: Iker García-Ferrero <[email protected]>
Co-authored-by: khalil <[email protected]>
Co-authored-by: Zafir Stojanovski <[email protected]>
Co-authored-by: Sadra Barikbin <[email protected]>
Co-authored-by: Nikita Lozhnikov <[email protected]>
Co-authored-by: Baber Abbasi <[email protected]>
Co-authored-by: johnwee1 <[email protected]>
Co-authored-by: Wang, Chang <[email protected]>
Co-authored-by: Yazeed Alnumay <[email protected]>
Co-authored-by: Julen Etxaniz <[email protected]>
Co-authored-by: achervyakov <[email protected]>
Co-authored-by: Stella Biderman <[email protected]>
Co-authored-by: jonabur <[email protected]>
Co-authored-by: Brendan Murphy <[email protected]>
Co-authored-by: Steven Basart <[email protected]>
Co-authored-by: Ogundepo Odunayo <[email protected]>
Co-authored-by: Nathan Habib <[email protected]>
Co-authored-by: Hanwool Albert Lee <[email protected]>
Co-authored-by: Choyunhui <[email protected]>
Co-authored-by: yhjo <[email protected]>
Co-authored-by: Elron Bandel <[email protected]>
Co-authored-by: Pankaj Mathur <[email protected]>
Co-authored-by: meg <[email protected]>
Co-authored-by: Wonung Kim <[email protected]>
Co-authored-by: SuperCat <[email protected]>
Co-authored-by: Jess <[email protected]>
Co-authored-by: Israel Abebe Azime <[email protected]>
Co-authored-by: Israel Abebe Azime <[email protected]>
Co-authored-by: Nathan Weinberg <[email protected]>
Co-authored-by: Ben Shoham Ofir <[email protected]>
Co-authored-by: jab13x <[email protected]>
Co-authored-by: Jungwhan Kim <[email protected]>
Co-authored-by: Jennifer Cwagenberg <[email protected]>