Thanks for a great tool. I want to run the evaluation harness on just 1 example, but the progress bar runs to 4, which I am confused by. Can you help me understand why, or point me to a relevant place in the documentation?
Example command and its output are below:
paperspace@psgwzz6bpkub:~$ lm_eval --model hf --model_args pretrained=gpt2 --tasks sciq --device cuda:0 --batch_size 1 --limit 1
2024-01-13:14:13:59,717 INFO [utils.py:148] Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-01-13:14:13:59,717 INFO [utils.py:160] NumExpr defaulting to 8 threads.
2024-01-13:14:13:59,953 INFO [config.py:58] PyTorch version 2.0.1+cu117 available.
2024-01-13:14:13:59,954 INFO [config.py:95] TensorFlow version 2.9.2 available.
2024-01-13:14:13:59,956 INFO [config.py:108] JAX version 0.4.8 available.
2024-01-13:14:14:08,172 INFO [__main__.py:156] Verbosity set to INFO
2024-01-13:14:14:12,073 WARNING [__init__.py:178] Some tasks could not be loaded due to missing dependencies. Run with `--verbosity DEBUG` for full details.
2024-01-13:14:14:15,790 WARNING [__init__.py:178] Some tasks could not be loaded due to missing dependencies. Run with `--verbosity DEBUG` for full details.
2024-01-13:14:14:15,790 WARNING [__main__.py:162] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.
2024-01-13:14:14:15,791 INFO [__main__.py:229] Selected Tasks: ['sciq']
2024-01-13:14:14:15,804 WARNING [logging.py:61] Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
2024-01-13:14:14:15,804 INFO [huggingface.py:146] Using device 'cuda:0'
2024-01-13:14:14:21,039 INFO [task.py:337] Building contexts for task on rank 0...
2024-01-13:14:14:21,041 INFO [evaluator.py:314] Running loglikelihood requests
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.21it/s]
I think I understand now: sciq is a multiple-choice task with four answer options per question. The harness pairs the question context with each option and scores every (question, option) pair as a separate loglikelihood request, so a single example produces 4 requests, which is what the progress bar counts.
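To illustrate, here is a minimal sketch of that expansion. This is not the harness's actual code; the `build_loglikelihood_requests` function, the prompt format, and the example data are all hypothetical, but they show why `--limit 1` on a 4-option task drives the progress bar to 4.

```python
# Hypothetical sketch (not lm-eval-harness internals) of how one
# multiple-choice example expands into one loglikelihood request
# per answer option.

def build_loglikelihood_requests(example):
    """Pair the question context with each answer option."""
    context = f"Question: {example['question']}\nAnswer:"
    # One (context, continuation) request per option.
    return [(context, f" {option}") for option in example["options"]]

# Made-up example in the spirit of a sciq question.
example = {
    "question": "What force opposes motion between two surfaces in contact?",
    "options": ["gravity", "friction", "magnetism", "inertia"],
}

requests = build_loglikelihood_requests(example)
print(len(requests))  # 4 requests for a single example
```

The model then scores each continuation, and the option with the highest loglikelihood is taken as the prediction.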