TGI support - API evaluation of HF models #869

Open
ManuelFay opened this issue Sep 19, 2023 · 10 comments
Labels
feature request: A feature that isn't implemented yet.
help wanted: Contributors and extra help welcome.

Comments

@ManuelFay
Contributor

Since HF TGI's PR was merged, it should be possible to integrate TGI endpoints into the lm-evaluation-harness supported APIs.

Any plans to do so? This would decouple the evaluation machine from the served model and make both evaluation and hosting much easier!

Thanks a lot for the great work!

@haileyschoelkopf added the "help wanted" and "feature request" labels Sep 19, 2023
@haileyschoelkopf
Contributor

Hi! We'd love to move toward hosting models as endpoints to make evaluation faster and more lightweight than using HF models locally.

Adding vLLM, TGI, and support for inference on a separate machine / in a subprocess is on the roadmap long-term, but we don't have an ETA on it--if you are interested in helping contribute such a feature, let us know!

@ManuelFay
Contributor Author

I am, but I won't have time over the next couple of weeks, so I'll probably resort to using the lm-eval-harness as is (or add a few tasks)! Thanks again for the great work!

@sfriedowitz

sfriedowitz commented Oct 23, 2023

> Adding vLLM, TGI, and support for inference on a separate machine / in a subprocess is on the roadmap long-term

@haileyschoelkopf I've been looking into this idea a bit, as it's something that would be incredibly useful for my organization. One thing I'm curious about is whether it is clear what API protocol an external model would need to satisfy to be compatible with lm-eval-harness. For instance, HELM recently introduced support for externally hosted models for the NeurIPS challenge, where encoding/decoding of tokens is handled externally by the service. That protocol involves three POST endpoints: /encode, /decode, and /process.

Is there a single protocol that a vLLM or TGI powered service would have to satisfy to be queryable by lm-eval-harness?

Cheers,
Sean
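
For context, a TGI server exposes its own small HTTP generation API, so any harness integration would need to translate its requests into that format (or go through a proxy that does). A minimal sketch of querying TGI's documented /generate route, with the host and port as placeholders:

```python
# Minimal sketch: query a running TGI server's /generate endpoint.
# The host/port are placeholders; point them at your own deployment.
import requests

payload = {
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 20, "temperature": 0.7},
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])  # the completion returned by TGI
```

Loglikelihood-style tasks would additionally need per-token logprobs, which TGI can return when details is enabled in the request parameters.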

@ishaan-jaff

ishaan-jaff commented Nov 1, 2023

I believe LiteLLM can help with this - we allow you to call TGI LLMs using the OpenAI Completion input/output format.
Thanks @Vinno97 cc @ManuelFay @haileyschoelkopf
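
For illustration, a minimal sketch of what that looks like through LiteLLM's Python SDK; the model name and api_base below are placeholders for an actual TGI deployment, not values from this thread:

```python
# Minimal sketch: call a TGI-hosted model through LiteLLM's
# OpenAI-style completion interface. Model name and api_base
# are placeholders for your own deployment.
from litellm import completion

response = completion(
    model="huggingface/bigcode/starcoder",
    messages=[{"role": "user", "content": "def fibonacci(n):"}],
    api_base="https://my-tgi-endpoint.example.com",  # hypothetical endpoint
)
print(response.choices[0].message.content)
```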

@ishaan-jaff

Here's a tutorial on using our OpenAI proxy server to call HF TGI models with the lm-evaluation-harness.
docs: https://docs.litellm.ai/docs/tutorials/lm_evaluation_harness

Usage

Step 1: Start the local proxy

litellm --model huggingface/bigcode/starcoder

OpenAI-compatible endpoint at http://0.0.0.0:8000/

Step 2: Set OpenAI API Base

$ export OPENAI_API_BASE="http://0.0.0.0:8000"

Step 3: Run LM-Eval-Harness

$ python3 main.py \
  --model gpt3 \
  --model_args engine=huggingface/bigcode/starcoder \
  --tasks hellaswag
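
As a quick sanity check before running the harness, you can send one OpenAI-style completion request straight to the proxy. A sketch, assuming the proxy serves the standard /v1/completions route on port 8000:

```python
# Sanity-check sketch: send one OpenAI-style completion request to the
# local LiteLLM proxy before pointing lm-eval-harness at it.
# Assumes the proxy is up on 0.0.0.0:8000 with a /v1/completions route.
import requests

resp = requests.post(
    "http://0.0.0.0:8000/v1/completions",
    json={
        "model": "huggingface/bigcode/starcoder",
        "prompt": "def hello_world():",
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```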

@ManuelFay
Contributor Author

That's very cool, thanks!

@ManuelFay
Contributor Author

I have a problem with your code snippet @ishaan-jaff:

KeyError: 'Could not automatically map huggingface/my_model to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'
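
For anyone hitting the same thing: that message comes from tiktoken, which suggests the OpenAI-style backend is trying to map the Hugging Face model id to an OpenAI tokenizer. A minimal reproduction, with an illustrative (not harness-provided) workaround of picking an encoding explicitly:

```python
# Sketch reproducing the failure: tiktoken.encoding_for_model only knows
# OpenAI model names, so a Hugging Face model id raises a KeyError.
import tiktoken

try:
    enc = tiktoken.encoding_for_model("huggingface/my_model")
except KeyError as err:
    print(err)
    # Illustrative workaround: choose an encoding explicitly by name.
    enc = tiktoken.get_encoding("gpt2")

print(enc.encode("hello world"))
```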

@ishaan-jaff

@ManuelFay are you on the big-refactor branch?

Can I see your code?

  • how you start the litellm proxy
  • the command you're using to call the lm harness

@ManuelFay
Contributor Author

Yup big-refactor branch:

  • Start proxy: litellm --model "huggingface/manu/llama-oscar-fr"
  • Command to start: python main.py --model openai-completions --model_args engine=huggingface/manu/llama-oscar-fr --tasks hellaswag
    (not sure we should continue this discussion here though, as it does not relate to the issue)

@ishaan-jaff

Agreed - I sent you a LinkedIn request @ManuelFay - you can also DM me on Discord about this: https://discord.com/invite/wuPM9dRgDw
