Add vLLM Invocation Layer #52

LLukas22 · 2023-09-11T14:35:32Z

This contribution introduces a vLLM invocation layer for the prompt model. A primary benefit of vLLM is its ability to cache a large number of tokens, thanks to its implementation of PagedAttention. This yields high throughput, making vLLM a viable alternative to Hugging Face's Text Generation Inference, especially in light of the recent licensing changes to Text Generation Inference.

The invocation layer wrapper primarily manages tokenization and ensures that the prompt length remains within the defined limits. Beyond these functions, it is fundamentally built upon the OpenAIInvocationLayer.
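The prompt-length check described above can be sketched roughly as follows. This is a minimal illustration, not the layer's actual code: the function name `ensure_token_limit`, its parameters, and the whitespace tokenizer used as a default are all assumptions made for the example; a real layer would use the tokenizer of the hosted model.

```python
from typing import Callable, List


def ensure_token_limit(
    prompt: str,
    max_context_length: int,
    max_new_tokens: int,
    tokenize: Callable[[str], List[str]] = str.split,
    detokenize: Callable[[List[str]], str] = " ".join,
) -> str:
    """Truncate `prompt` so prompt tokens plus generated tokens fit the context window.

    The whitespace tokenizer is a stand-in (assumption) for a real model tokenizer.
    """
    # Tokens left for the prompt once room for the answer is reserved.
    budget = max_context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")

    tokens = tokenize(prompt)
    if len(tokens) <= budget:
        return prompt  # already within limits, leave untouched

    # Keep only the first `budget` tokens and turn them back into text.
    return detokenize(tokens[:budget])
```

With a context window of 4 tokens and 2 tokens reserved for generation, a five-word prompt would be cut down to its first two words, while a prompt that already fits is returned unchanged.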

TuanaCelik

More of a question rather than a comment. So what would we add to the model_name_or_path? Would it be one of the model names listed here? Under 'vLLM seamlessly supports many Huggingface models, including the following architectures'
https://github.com/vllm-project/vllm

And then I guess one nitpick but the vLLM requirement below could maybe be mentioned in Installation if they have to install separately to your package 🙏

Thanks for the contribution! This looks great!

LLukas22 · 2023-09-11T18:40:04Z

More of a question rather than a comment. So what would we add to the model_name_or_path? Would it be one of the model names listed here? Under 'vLLM seamlessly supports many Huggingface models, including the following architectures'
https://github.com/vllm-project/vllm

It depends on which invocation layer you are using. If you are using the vLLMInvocationLayer, you don't have to provide anything, as the model will be inferred from the vLLM server hosting it, meaning we don't have to know in advance which model is hosted on the server. If you use the vLLMLocalInvocationLayer, you have to provide a supported Hugging Face model, as the model will be downloaded and the inference performed locally.
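As an illustration of how a client could discover the hosted model without being told its name, here is a small sketch. It assumes the vLLM server is running in its OpenAI-compatible mode and exposes the standard model listing under `<api_base>/models`; the helper names `first_model_id` and `infer_model_name` are made up for this example, and this is not necessarily how the vLLMInvocationLayer does it internally.

```python
import json
from typing import Any, Dict
from urllib.request import urlopen


def first_model_id(models_response: Dict[str, Any]) -> str:
    """Return the id of the first model in an OpenAI-style model list response."""
    data = models_response.get("data") or []
    if not data:
        raise ValueError("server reported no models")
    return data[0]["id"]


def infer_model_name(api_base: str, timeout: float = 10.0) -> str:
    """Ask the server which model it hosts (performs a network request).

    `api_base` is assumed to look like "http://localhost:8000/v1".
    """
    with urlopen(f"{api_base.rstrip('/')}/models", timeout=timeout) as response:
        return first_model_id(json.load(response))
```

Because the name comes from the server, the client configuration stays the same when the model behind the endpoint is swapped out.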

And then I guess one nitpick but the vLLM requirement below could maybe be mentioned in Installation if they have to install separately to your package 🙏

The vLLM dependency is only required if you want to use the vLLMLocalInvocationLayer. Since the main use case for vLLM is to host a server on a GPU node somewhere on your network, where it receives many requests and can take advantage of PagedAttention, I decided not to include it as a requirement. This has the advantage of not pulling in transformers and PyTorch as dependencies if I'm only using the vLLMInvocationLayer, which saves about 2-3 GB in dependencies.
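A common way to keep such a heavy dependency optional is to import it only when the code path that needs it is actually used. The sketch below shows that general pattern; the helper `require_optional` is hypothetical and only meant to illustrate the idea, not taken from the package.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, needed_for: str) -> ModuleType:
    """Import an optional dependency on first use, with a helpful error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that tells the user what to install and why.
        raise ImportError(
            f"'{module_name}' is required for {needed_for}. "
            f"Install it separately, e.g. `pip install {module_name}`."
        ) from err
```

A local layer could call `require_optional("vllm", "the vLLMLocalInvocationLayer")` in its constructor, so users of the server-based layer never need the package installed.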

@LLukas22

Do these updates look ok to you @LLukas22 ?

TuanaCelik · 2023-09-11T19:28:14Z

Thanks for the context @LLukas22 - I tried to create a PR on your fork with some edit suggestions but it didn't work for some reason. Does the commit I made here look good to you? If yes, I will merge it 🙌

integrations/vllm.md

TuanaCelik · 2023-09-11T20:12:01Z

Good catch, comments fixed @LLukas22

LLukas22 · 2023-09-12T06:10:17Z

Should be good to go 👍

Create vllm entry

d1d5bff

TuanaCelik reviewed Sep 11, 2023

Update vllm.md

88c086a

LLukas22 commented Sep 11, 2023

integrations/vllm.md (outdated, resolved)

integrations/vllm.md (outdated, resolved)

Update vllm.md

37e3764

TuanaCelik approved these changes Sep 12, 2023

TuanaCelik merged commit 4f6a9da into deepset-ai:main Sep 12, 2023
