Request for Repo: "Open Source, Daily Auto-Generated SOTA LLM Model Benchmarks" #505

Open
Marviel opened this issue Mar 29, 2023 · 1 comment


Marviel commented Mar 29, 2023

✨ Open Source Daily Auto-Generated SOTA LLM Model-Comparisons Repository

(Sorry for posting here; I wasn't sure where else to ask.)

Does a repository like this already exist?

If so, where?

It's hard to keep up.

There are now so many open source LLM/AI repositories that it is nearly impossible for any one person to keep pace with them.

Many impressive "snippets" get posted in public channels, but not every model holds up or generalizes well in practice.

💭 Feature Requests

Here's what I'd like to see: (Please add your own in comments)

License

  • AGPL-3.0 (GNU Affero General Public License v3.0)

Update Cadence

  • Daily evaluation runs that automatically update the GitHub repo with the latest evaluation results, as described below.

Eval Result DB

Description

Each time the daily cron job runs, the evaluation results should be written to a database.

Properties

The output of the cron job should be a set of entries in the EvalDB, each recording:
(1) The prompt / Input
(2) The model, including its known current parameters and limitations at the time of the eval run
(3) The output
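
To make the three properties above concrete, here is a minimal sketch of what one EvalDB row might look like, using Python's built-in `sqlite3` module. The table name `eval_results`, the column names, and the helpers `open_eval_db` and `record_eval` are all hypothetical; they are assumptions for illustration, not part of any existing project.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per evaluation, mirroring properties (1)-(3).
SCHEMA = """
CREATE TABLE IF NOT EXISTS eval_results (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    run_at      TEXT NOT NULL,  -- UTC timestamp of the daily run
    prompt      TEXT NOT NULL,  -- (1) the prompt / input
    model_name  TEXT NOT NULL,  -- (2) the model
    model_info  TEXT NOT NULL,  -- (2) known parameters and limitations, stored as JSON
    output      TEXT NOT NULL   -- (3) the output
)
"""


def open_eval_db(path=":memory:"):
    """Open (or create) the EvalDB and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_eval(conn, prompt, model_name, model_info, output):
    """Insert one evaluation result and return its row id."""
    run_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO eval_results (run_at, prompt, model_name, model_info, output) "
        "VALUES (?, ?, ?, ?, ?)",
        (run_at, prompt, model_name, json.dumps(model_info, sort_keys=True), output),
    )
    conn.commit()
    return cur.lastrowid
```

Storing the model's parameters as a JSON snapshot alongside each result means old rows stay interpretable even after the model's entry changes later.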

Up-To-Date Model DB

Model DB Description

There should be an up-to-date database of models, displayed in the README.md.

Model DB Properties

This should include the following:

API or Self-Hosted

Whether a model is reached through an API or is self-hosted matters for both speed and price.

Modes & Mode Parameters

  • Text
    • Chat vs. Standard Completion
    • Cost-Per-Token
    • Average Speed-Per-Token
    • Context Length
    • Unicode-Support (most models?)
  • Image
    • Max Context Size
    • Cost-Per-Pixel(?)
  • Video
    • Max Context Length
    • Max Input

Included Training Datasets

Tags

A free-form field for anything that doesn't fit into the above schema.
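
As a rough sketch of how the Model DB properties above could be represented and rendered into the README, here is a small Python example. The `ModelEntry` class, its field names, and `render_readme_table` are hypothetical and cover only a subset of the text-mode properties listed; a real schema would likely need per-mode fields for image and video.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelEntry:
    """Hypothetical Model DB record; unknown values are left as None."""
    name: str
    hosting: str                           # "api" or "self-hosted"
    mode: str                              # "text", "image", or "video"
    chat: Optional[bool] = None            # chat vs. standard completion
    cost_per_token: Optional[float] = None
    context_length: Optional[int] = None
    training_datasets: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)  # free-form


def _cell(value):
    """Format one value for a Markdown table cell; '?' marks unknowns."""
    if value is None:
        return "?"
    if isinstance(value, bool):
        return "yes" if value else "no"
    if isinstance(value, list):
        return ", ".join(value) or "?"
    return str(value)


def render_readme_table(entries):
    """Render the entries as a Markdown table suitable for README.md."""
    headers = ["Model", "Hosting", "Mode", "Chat", "Cost/Token",
               "Context", "Datasets", "Tags"]
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join(" --- " for _ in headers) + "|",
    ]
    for e in entries:
        cells = [e.name, e.hosting, e.mode, e.chat, e.cost_per_token,
                 e.context_length, e.training_datasets, e.tags]
        lines.append("| " + " | ".join(_cell(c) for c in cells) + " |")
    return "\n".join(lines)
```

A daily job could call `render_readme_table` on the current entries and write the result between two marker comments in README.md, so the rest of the file stays hand-edited.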

Does Something Like This Exist?

?????

What else do we need?

?????

I'm trying to make something to fill this niche myself and will link here shortly.

@Marviel Marviel changed the title RFC: "Open Source Daily Auto-Generated SOTA LLM Model-Comparisons Repository" Request for Repo: "Open Source Daily Auto-Generated SOTA LLM Model-Comparisons Repository" Mar 29, 2023
@Marviel Marviel changed the title Request for Repo: "Open Source Daily Auto-Generated SOTA LLM Model-Comparisons Repository" Request for Repo: "Open Source Daily Auto-Generated SOTA LLM Model-Comparisons" Mar 29, 2023
@Marviel Marviel changed the title Request for Repo: "Open Source Daily Auto-Generated SOTA LLM Model-Comparisons" Request for Repo: "Open Source, Daily Auto-Generated SOTA LLM Model Benchmarks" Mar 29, 2023

SierotkaM commented Mar 29, 2023 via email
