TGI support - API evaluation of HF models #869
Since HF TGI's PR was merged, it should be possible to integrate TGI endpoints into the lm-evaluation-harness's supported APIs.
Any plans to do so? This would decouple the evaluation machine from the served model and greatly facilitate both evaluation and hosting!
Thanks a lot for the great work!

Comments
Hi! We'd love to move toward hosting models as endpoints to make evaluation faster and more lightweight than using HF models locally. Adding vLLM, TGI, and support for inference on a separate machine / in a subprocess is on the roadmap long-term, but we don't have an ETA on it--if you are interested in helping contribute such a feature, let us know!
I am, but won't have time over the next couple of weeks and will probably resort to using the lm-eval-harness as is (or add a few tasks)! Thanks again for the great work!
@haileyschoelkopf I've been looking into this idea a bit, as it's something that would be incredibly useful for my organization. One thing I'm curious about is whether it is clear what API protocol an external model would need to satisfy to be compatible with lm-eval-harness. For instance, HELM recently introduced support for externally hosted models for the NeurIPS challenge, where encoding/decoding of tokens is handled externally by the service. That protocol involves three POST endpoints. Is there a single protocol that a vLLM or TGI powered service would have to satisfy to be queryable by lm-eval-harness? Cheers,
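(For context: TGI itself exposes a single text-in/text-out HTTP endpoint, with tokenization handled server-side, much like the HELM protocol described above. Below is a minimal sketch of a query against it; the URL and prompt are illustrative placeholders, not from this thread.)

```python
# Minimal sketch of TGI's generate protocol. TGI serves POST /generate
# with a JSON body of the form {"inputs": ..., "parameters": {...}};
# the client only ever deals in text, never token IDs.
import requests

TGI_URL = "http://localhost:8080"  # placeholder: wherever TGI is served

resp = requests.post(
    f"{TGI_URL}/generate",
    json={
        "inputs": "def fibonacci(n):",
        "parameters": {"max_new_tokens": 64},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```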
I believe LiteLLM can help with this - we allow you to call TGI LLMs in the Completion Input/Output format
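(A hedged sketch of that interface; the model name and api_base below are illustrative placeholders, not from this thread.)

```python
# LiteLLM wraps a HF TGI deployment behind the same completion() call
# used for OpenAI models, returning an OpenAI-format response.
from litellm import completion

response = completion(
    model="huggingface/bigcode/starcoder",
    messages=[{"role": "user", "content": "Say hello"}],
    api_base="http://localhost:8080",  # your TGI endpoint
)
print(response["choices"][0]["message"]["content"])
```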
here's a tutorial on using our openai proxy server to call HF TGI models with lm-evaluation-harness

Usage

Step 1: Start the local proxy

```shell
litellm --model huggingface/bigcode/starcoder
```

OpenAI-compatible endpoint at http://0.0.0.0:8000/

Step 2: Set OpenAI API Base

```shell
export OPENAI_API_BASE="http://0.0.0.0:8000"
```

Step 3: Run LM-Eval-Harness

```shell
python3 main.py \
    --model gpt3 \
    --model_args engine=huggingface/bigcode/starcoder \
    --tasks hellaswag
```
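(To sanity-check the proxy before running the harness, you can hit it directly with the pre-1.0 `openai` Python client, which speaks the same completion protocol the harness's gpt3 model type uses; a sketch, assuming a local proxy that doesn't validate the key:)

```python
# Smoke test of the LiteLLM proxy via the pre-1.0 openai client.
# If this works, the lm-eval run above should reach the model too.
import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "sk-anything"  # arbitrary; the local proxy ignores it

resp = openai.Completion.create(
    engine="huggingface/bigcode/starcoder",
    prompt="Hello, my name is",
    max_tokens=20,
)
print(resp["choices"][0]["text"])
```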
That's very cool, thanks!
I have a problem with your code snippet @ishaan-jaff: `KeyError: 'Could not automatically map huggingface/my_model to a tokeniser. Please use ...'`
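(That error text comes from tiktoken, not the harness itself: the gpt3 code path looks up a tokenizer for the engine name, and tiktoken only knows OpenAI model names. A minimal reproduction, using the placeholder model name from the comment above:)

```python
# Reproduces the KeyError: tiktoken can only map known OpenAI model
# names to an encoding, so a HuggingFace-style name fails the lookup.
import tiktoken

try:
    tiktoken.encoding_for_model("huggingface/my_model")
except KeyError as err:
    print(err)  # the "Could not automatically map ..." error above

# Explicitly selecting an encoding works, as the message suggests;
# whether the harness exposes a hook for this is a separate question.
enc = tiktoken.get_encoding("gpt2")
print(enc.encode("hello world"))
```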
@ManuelFay are you on the big-refactor branch? Can I see your code?
Yup, big-refactor branch:
Agreed - I sent you a LinkedIn request @ManuelFay - you can also DM me on Discord about this: https://discord.com/invite/wuPM9dRgDw