
[QUESTION] How can I convert checkpoint tuned by EE-Tuning to Huggingface format? #15

Open · Mr-lonely0 opened this issue Jun 11, 2024 · 12 comments

@Mr-lonely0

I have fine-tuned the llama-7b model using EE-Tuning, and I now need to convert the checkpoint to the Hugging Face format to proceed with the evaluation process. How should I do this?

@pan-x-c (Owner) commented Jun 12, 2024

Same as #10 and #7. Currently there is no way to convert the checkpoint to the Hugging Face format.

@Mr-lonely0 (Author)

Thanks for the information!

I am also curious about how I can reproduce the results demonstrated in your paper and perform the downstream evaluation on the HELM benchmark. Could you please provide more details on this?

@pan-x-c (Owner) commented Jun 12, 2024

We modified the MegatronClient, adding parameters related to EE-LLM. All other parts are directly inherited from HELM.
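
For reference, the core idea might look roughly like the sketch below. This is not the actual EE-LLM MegatronClient code; the server URL and the fields use_early_exit / early_exit_thres are illustrative assumptions. The point is simply that the client forwards extra early-exit generation parameters to a Megatron text-generation server, while the rest of the evaluation pipeline is inherited from HELM.

    # Minimal sketch only -- not the actual EE-LLM MegatronClient.
    # The server URL and the "use_early_exit" / "early_exit_thres" fields are
    # hypothetical; everything else follows the usual pattern of posting a
    # generation request to a local Megatron server.
    import requests

    def ee_generate(prompt: str, max_tokens: int = 128,
                    server_url: str = "http://localhost:5000/api") -> dict:
        payload = {
            "prompts": [prompt],
            "tokens_to_generate": max_tokens,
            # EE-LLM-specific knobs added on top of the stock Megatron request:
            "use_early_exit": True,       # hypothetical flag
            "early_exit_thres": 0.8,      # hypothetical confidence threshold
        }
        resp = requests.post(server_url, json=payload)
        resp.raise_for_status()
        return resp.json()  # later wrapped into a HELM RequestResult by the client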

@Mr-lonely0 (Author)

Actually, I'm not familiar with HELM. Could you provide a demo or some guidance on how to use the MegatronClient script?

@pan-x-c (Owner) commented Jun 12, 2024

You can refer to the demo in data-juicer.
Note that HELM itself is a heavy evaluation framework, and installing and using it can be difficult. You may need to turn to the official HELM repository for help.

@Mr-lonely0 (Author)

I really appreciate your help!
I'll check the demo you mentioned and give it a try.
Thanks again for your time!

@Mr-lonely0 (Author)

Hello again!

I have tried the evaluation framework proposed in data-juicer and obtained some benchmark results, such as ROUGE-2 on CNN/DM, F1 on NarrativeQA, and EM on MMLU. However, I'm confused about how I can get efficiency results, such as inference time, for the generation process.

What should I modify in mymodel_example.yaml to parse the corresponding metric from HELM output?

I would greatly appreciate your help and look forward to your prompt response.

@pan-x-c (Owner) commented Jun 17, 2024

If you use the HELM provided by Data-Juicer, you can modify src/helm/benchmark/static/schema.yaml to adjust the metrics. For example, we modified the efficiency item to:

  - name: efficiency
    display_name: Efficiency
    metrics:
    - name: inference_runtime
      split: ${main_split}

The inference_runtime is the metric used in our paper.

In addition, you also need to modify your megatron_client.py to return the new metric in your response. For example,

        # In megatron_client.py, the client's request handler wraps the server
        # response into HELM's RequestResult:
        return RequestResult(
            success=True,
            cached=cached,
            request_time=response['request_time'],          # wall-clock generation time in seconds
            request_datetime=response['request_datetime'],
            completions=completions,
            embedding=[]
        )

HELM will use the request_time field to calculate the inference_runtime metric.
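
A minimal sketch of how request_time could be populated, assuming the hypothetical ee_generate helper from the earlier sketch; the point is simply to record wall-clock seconds around the generation call and attach them to the response dict that the client later wraps into a RequestResult:

    # Illustrative only: measure wall-clock generation time in megatron_client.py.
    import time
    from datetime import datetime

    start = time.time()
    response = ee_generate("Summarize the following article: ...")
    response['request_time'] = time.time() - start                   # seconds
    response['request_datetime'] = int(datetime.now().timestamp())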

Note that the demo script provided by Data-Juicer is not for EE models; it only records some metrics for pretraining.
To view the full evaluation results, you should follow the standard HELM usage process, e.g. helm-server after helm-summarize.
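
Roughly, the standard sequence looks like the sketch below. Exact flag names differ between HELM versions, and run_entries.conf and the suite name are placeholders for your own run spec:

    # Rough outline of the standard HELM workflow; flags may vary by version.
    helm-run --conf-paths run_entries.conf --suite ee-eval --max-eval-instances 100
    helm-summarize --suite ee-eval
    helm-server   # then open the printed local URL to browse the full results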

@Mr-lonely0 (Author) commented Jun 18, 2024

Thank you!

I have tested the standard HELM usage process with the original Llama-2. On the website generated by helm-server, I noticed that no efficiency metrics are recorded in the leaderboard presented by HELM.
[screenshot: HELM leaderboard without efficiency metrics]

However, I found the Observed inference runtime (s) in the Predictions section for the corresponding dataset (cnn_dailymail, as shown below).
image

If the former is correct, could you please clarify how I can obtain the efficiency metrics? Alternatively, if the latter is correct, why is there only one count in the Predictions section when I set --max-eval-instances=100?

@pan-x-c (Owner) commented Jun 19, 2024

Your client must return those metrics in its response before HELM can summarize them, so you need to modify your client first, as shown in my previous comment.
For example, HELM will use the request_time field in the response to calculate the inference_runtime metric.

In our paper's experiments, we set --max-eval-instances to 500.

@Mr-lonely0 (Author)

Thanks!
I have figured it out. Really appreciate your time!

@github-actions (bot)

Marking as stale. No activity in 60 days.

github-actions bot added the stale label on Aug 18, 2024