
output_path may break postprocessing #1918

Open
artemorloff opened this issue Jun 3, 2024 · 3 comments

Comments

@artemorloff
Contributor

After the recent updates (~a few weeks ago), the Evaluation Tracker was introduced and the way outputs are stored also changed. Now output_path is no longer the FINAL destination where the outputs are stored: one more (sub)directory is created inside it. This causes some problems:

  • Postprocessing becomes harder. There is now an extra directory inside output_path whose name you do not know until the run finishes, so you either have to inspect the output manually or replicate the logic for deriving model_name_sanitized inside your processing script (see the sketch after this list). Once that logic changes, all such scripts will break.
  • It invites more errors for users: when someone uses the same output_path to run two different models, two different subfolders are created.
  • If I accidentally rerun the same lm_eval command (same model, tasks, and output_path), the files are not overwritten. The names now contain the exact timestamp, so there end up being two copies of the same file for no reason.
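
Here is a minimal sketch of the discovery logic a postprocessing script now needs; the layout assumption (output_path/<model_name_sanitized>/results_<timestamp>.json) follows the behaviour described above, and the helper name is mine, not part of lm-eval:

```python
import json
from pathlib import Path

OUTPUT_PATH = Path("outputs")  # whatever was passed as --output_path


def latest_results_per_model(output_path: Path) -> dict[str, Path]:
    """Return the newest results_*.json for each model subdirectory."""
    latest = {}
    for model_dir in output_path.iterdir():
        if not model_dir.is_dir():
            continue
        candidates = sorted(model_dir.glob("results_*.json"))
        if candidates:
            # Assumes the timestamped names sort lexicographically,
            # so the last entry is the most recent run.
            latest[model_dir.name] = candidates[-1]
    return latest


if __name__ == "__main__":
    for model_name, results_file in latest_results_per_model(OUTPUT_PATH).items():
        with results_file.open() as f:
            results = json.load(f)
        print(model_name, list(results.get("results", {}).keys()))
```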

What are the future plans for the model_source param of the Evaluation Tracker?

@thehir0

thehir0 commented Jun 3, 2024

#1842

Same issue

@KonradSzafer
Contributor

Hi @artemorloff!

  • I have addressed the problems raised here and in How to use Zeno #1842 in the new PR Results filenames handling fix #1926, where I moved the functions for handling the new results files to utils so that they can be conveniently reused in other parts of the codebase, for example the Zeno script or the EvaluationTracker class.
  • In general, it's now better to just provide the results path without the model name; a separate directory will be created inside it for each model, which makes it harder to accidentally mix results from different models.
  • Yes, now results won't be overwritten, and this is done intentionally, so it's harder to lose results. Some users prefer this option, e.g. here
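
For reference, the resulting layout looks roughly like this (directory and file names are illustrative):

```
output_path/
├── model_a_sanitized/
│   ├── results_2024-06-03T10-00-00.json
│   └── results_2024-06-03T11-30-00.json
└── model_b_sanitized/
    └── results_2024-06-03T10-05-00.json
```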

@StellaAthena
Member

If I accidentally rerun the same lm_eval command (same model, tasks, and output_path), the files are not overwritten. The names now contain the exact timestamp, so there end up being two copies of the same file for no reason.

If you don't want this behavior, you can use the --use_cache flag to avoid rerunning tasks that have already been run and whose results are stored.
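
A run along these lines would reuse cached per-request results instead of recomputing them (the model, task, and paths are illustrative; --use_cache takes a path that is used for a sqlite database caching model responses):

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m \
    --tasks lambada_openai \
    --output_path outputs \
    --use_cache outputs/cache/pythia-160m
```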
