
The prompt engineering leaderboard is cut off in half... #51

Closed
zhimin-z opened this issue Feb 21, 2024 · 5 comments


zhimin-z (Contributor) commented Feb 21, 2024

[Screenshot: the second leaderboard table, truncated]
The second leaderboard seems to be part of the first one (since it has no name at the top like the first one does), but it is cut off mysteriously...
Check https://llm-eval.github.io/pages/leaderboard/pe.html

zhimin-z (Contributor, Author) commented Feb 21, 2024

It seems the unified leaderboard would look like the one below:
[Screenshot: a single merged leaderboard table]
However, many evaluation results are missing from the table in this case...
@madhavMathur @jindongwang @msftgits @dnfclas

zhimin-z (Contributor, Author) commented Feb 21, 2024

Additionally, would you consider merging the initial cells in the first row into a single cell? Currently, the segmented display detracts from the overall readability and aesthetic appeal.
[Screenshot: leaderboard header row with repeated, unmerged cells]
I created a PR accordingly and hope you can take a look :)
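
For illustration, the merge amounts to a colspan in the rendered HTML header. Here is a minimal sketch of the idea (the dataset names below are hypothetical placeholders, and the site's pages may well be generated differently): pandas renders a MultiIndex column header with exactly this kind of merged cell.

```python
import pandas as pd

# Hypothetical leaderboard fragment; column/dataset names are placeholders,
# not the actual leaderboard's. Scores are left empty on purpose.
columns = pd.MultiIndex.from_tuples([
    ("Method", ""),
    ("Commonsense Reasoning", "Dataset A"),
    ("Commonsense Reasoning", "Dataset B"),
])
df = pd.DataFrame(
    [["CoT", None, None], ["Least to Most", None, None]],
    columns=columns,
)

# to_html() emits the top header row with <th colspan="2">Commonsense
# Reasoning</th> -- i.e. the merged cell suggested above.
print(df.to_html())
```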

icecream-and-tea (Contributor) commented

Thank you for your attention and advice!
The first and second issues stem from a common cause: different prompt engineering methods are applicable to different tasks.
The 'Least to Most' method aims to help the LLM solve complex problems through decomposition and subproblem solving.
[Screenshot: diagram of the Least-to-Most decomposition pipeline]
So compared with the methods in the first table, 'Least to Most' is more applicable to math and symbolic tasks than to commonsense reasoning, which led this method to use different datasets from those in the first table. That is why we presented the results in two tables.
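
For readers unfamiliar with the method, the two-stage loop roughly looks like the sketch below. This is only an illustration: `complete` is a hypothetical wrapper around any LLM completion API, not code from this repository.

```python
from typing import Callable

def least_to_most(question: str, complete: Callable[[str], str]) -> str:
    """Sketch of two-stage Least-to-Most prompting; `complete` wraps any LLM API."""
    # Stage 1: decomposition -- ask the model to break the problem into
    # simpler subquestions, one per line, easiest first.
    decomposition = complete(
        "Break the following problem into a numbered list of simpler "
        f"subquestions, easiest first:\n{question}"
    )
    subquestions = [s for s in decomposition.splitlines() if s.strip()]

    # Stage 2: sequential subproblem solving -- answer each subquestion,
    # feeding earlier answers back into the context so later (harder)
    # subproblems can build on them.
    context = f"Problem: {question}\n"
    for sub in subquestions:
        answer = complete(f"{context}\nQuestion: {sub}\nAnswer:")
        context += f"\nQ: {sub}\nA: {answer}\n"

    # Final answer, conditioned on all of the solved subproblems.
    return complete(
        f"{context}\nTherefore, the final answer to the original problem is:"
    )
```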

zhimin-z (Contributor, Author) commented Feb 21, 2024


OK, that makes more sense. As I said, the current table looks split, and I hope there could be a better display for demonstration purposes. Would you mind taking a look at my PR llm-eval/llm-eval.github.io#4?

jindongwang (Collaborator) commented

@zhimin-z Merged your PR
