
Add Selfcheckgpt evaluation to tasks #1080

Open · wants to merge 10 commits into base: main

Conversation

@erenup commented Dec 7, 2023

No description provided.

@accesslint (bot) left a comment

There are accessibility issues in these changes.

Review thread: lm_eval/tasks/selfcheckgpt/README.md (outdated, resolved)
@CLAassistant commented Dec 7, 2023

CLA assistant check
All committers have signed the CLA.

@accesslint (bot) left a comment

👏 You fixed the issue(s)! Great work.

@erenup (Author) commented Dec 7, 2023

I am trying to add a new task for LLM hallucination. This is very early code, and there is something I'd like to discuss with everyone:

  • How can we generate multiple generations for one prompt elegantly? This could be very useful for many LLM self-consistency evaluations.

I have implemented one possible solution, but I am not sure it is the best one (a rough sketch of the general idea follows below).

Thank you very much.
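For context, "multiple generations for one prompt" here means drawing N independent samples from the model for the same input, which self-consistency methods such as SelfCheckGPT then compare against each other. Below is a minimal standalone sketch using plain Hugging Face transformers, not this PR's harness code; the model name and decoding parameters are placeholders.

# Standalone illustration, not part of the harness: draw N sampled
# generations for a single prompt, the raw material for consistency checks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6b"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

prompt = "Tell me about the Eiffel Tower."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# do_sample=True with num_return_sequences=N yields N independent samples.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,
    max_new_tokens=128,
    num_return_sequences=5,
)
# Strip the prompt tokens so only the generated continuations remain.
samples = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)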

@erenup changed the title from "Selfcheckgpt" to "Add Selfcheckgpt evaluation to tasks" on Dec 7, 2023
@haileyschoelkopf (Contributor) left a comment

I've left a few comments as a review! I believe the functionality you want (N generations created per document) is already supported without any novel YAML config options. Please let me know if I am misunderstanding your intended functionality.

Review threads (outdated, resolved):
  • lm_eval/tasks/selfcheckgpt/selfcheckgpt.yaml (three threads)
  • lm_eval/tasks/selfcheckgpt/utils.py
  • lm_eval/api/task.py
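On the review point above about N generations per document: independent of how the harness wires this up internally, the bookkeeping amounts to expanding each document into N identical requests and regrouping the outputs by document before scoring. The sketch below is hypothetical; the helper names are made up for illustration and are not lm-eval APIs.

# Hypothetical sketch: fan each document out into N generation requests,
# then regroup the sampled outputs by document for consistency scoring.
from collections import defaultdict

def expand_requests(docs, n_samples=5):
    """Yield (doc_id, prompt) pairs, repeating each prompt n_samples times."""
    for doc_id, doc in enumerate(docs):
        for _ in range(n_samples):
            yield doc_id, doc["prompt"]

def regroup_by_doc(doc_ids, generations):
    """Collect sampled generations back under their originating document."""
    grouped = defaultdict(list)
    for doc_id, text in zip(doc_ids, generations):
        grouped[doc_id].append(text)
    return grouped

# grouped[doc_id] is then the list of sampled passages handed to a
# consistency scorer such as SelfCheckGPT's NLI variant.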
@erenup (Author) commented Dec 18, 2023

Hi @haileyschoelkopf @lintangsutawika, I have refactored the code so that it has its own task.py to handle its unique multiple-generation kwargs. To make it recognized by lm-eval, I imported selfcheckgpt in tasks/__init__.py, similar to squadv2. Thank you very much.

@StellaAthena (Member) commented Dec 18, 2023

@erenup Thank you for your contribution to our library. We will not be able to merge this PR until you do the following:

  1. Sign the CLA
  2. Change the readme to accord with our template. It currently looks like you copied the readme from the official code, which will confuse users as it refers to code not found in this repo and uses the term "we" to refer to people other than EleutherAI.
  3. Run some models through the code and compare results with the official implementation. Results on LLaMA, RWKV, Pythia, and T0 would cover all the major bases.

@erenup (Author) commented Dec 23, 2023

Hi @StellaAthena, thank you.

  • Sign the CLA.

    • done
  • Change the readme to accord with our template. It currently looks like you copied the readme from the official code, which will confuse users as it refers to code not found in this repo and uses the term "we" to refer to people other than EleutherAI.

    • done
  • Run some models through the code and compare results with the official implementation. Results on LLaMA, RWKV, Pythia, and T0 would cover all the major bases.

    • I have run some experiments, and the results should match the official repo, since the evaluation code in task.py follows the SelfCheckGPT README API and does not change anything else. The numbers below are the results (a sketch of the upstream scoring call follows them).
  • Some results:
    -- gpt-j-6b on SelfCheckNLI

export SELFCHECKGPTDEVICE=cuda
export SELFCHECKGPTTYPE=SelfCheckNLI
lm_eval --model hf \
    --model_args pretrained=${model_path_gpt-j-6b} \
    --tasks selfcheckgpt \
    --device cuda:0 \
    --batch_size 32

-- [results screenshot]

  • Llama-2-7b-chat-hf on SelfCheckNLI
export SELFCHECKGPTDEVICE=cuda
export SELFCHECKGPTTYPE=SelfCheckNLI
lm_eval --model hf \
    --model_args pretrained=${model_path_Llama-2-7b-chat-hf} \
    --tasks selfcheckgpt \
    --device cuda:0 \
    --batch_size 32

-- [results screenshot]
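For reference, the per-sentence scoring that the task delegates to the upstream selfcheckgpt package looks roughly like the following, based on that package's README; the sentences and sampled passages are toy placeholders, and the exact wiring in this PR's task.py may differ.

# Sketch of SelfCheckNLI scoring per the upstream selfcheckgpt README:
# each sentence of the main answer is checked against the N sampled passages,
# and higher scores indicate a likely hallucination (contradiction).
import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckNLI

device = "cuda" if torch.cuda.is_available() else "cpu"
selfcheck_nli = SelfCheckNLI(device=device)

sentences = [
    "The Eiffel Tower is in Paris.",
    "It was completed in 1999.",
]
sampled_passages = [  # toy stand-ins for the extra generations of the same prompt
    "The Eiffel Tower stands in Paris and opened in 1889.",
    "Paris's Eiffel Tower was finished in 1889.",
    "The Eiffel Tower, located in Paris, opened to the public in 1889.",
]

sent_scores_nli = selfcheck_nli.predict(
    sentences=sentences,
    sampled_passages=sampled_passages,
)
print(sent_scores_nli)  # one score per sentence; the second should score high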

@StellaAthena requested reviews from lintangsutawika and haileyschoelkopf and removed the review request for StellaAthena on January 2, 2024 at 19:39.
@StellaAthena dismissed stale reviews from lintangsutawika and haileyschoelkopf on January 10, 2024 at 21:22:

It has been addressed and a new review is needed

@StellaAthena (Member) commented
@lintangsutawika @haileyschoelkopf this is now ready for your review again.
