Refactor Inference Engine extensions to Backend #2783

Closed
Tracked by #2782 ...
louis-jan opened this issue Apr 22, 2024 · 1 comment

Description:

All current inference engines primarily run within the browser host. We now need to refactor them to operate on the backend. This change will let UI components make requests with only minimal model knowledge, such as the model's ID and the messages.

The client can simply send an OpenAI-compatible request, with no extra parameters and no detailed knowledge of how the model is run. Models will be loaded on the server side using the default settings read from model.json.
See: #2758
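
For illustration, a minimal sketch of what the client-side call could look like after the refactor. The base URL, route, and model ID below are placeholders assumed for the example, not values specified in this issue.

```typescript
// Minimal client-side sketch, assuming the backend exposes an OpenAI-compatible
// /chat/completions route; the base URL and model ID are placeholders.
async function sendChat(): Promise<void> {
  const response = await fetch("http://localhost:1337/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      // Only the model ID and messages come from the UI; loading parameters
      // are resolved from model.json on the server side.
      model: "some-model-id",
      messages: [{ role: "user", content: "Hello!" }],
    }),
  });
  const completion = await response.json();
  console.log(completion.choices[0].message.content);
}
```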

This approach would also help scale the model hub, since clients can retrieve the latest supported model list from the backend, which can be updated dynamically.
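
As a hedged sketch of that idea, the UI could refresh its model list from the backend rather than bundling model metadata. The /v1/models route below mirrors the OpenAI API shape and is an assumption, not an endpoint confirmed by this issue.

```typescript
// Hedged sketch: fetch the dynamically updatable model list from the backend.
async function listSupportedModels(baseUrl = "http://localhost:1337/v1"): Promise<string[]> {
  const res = await fetch(`${baseUrl}/models`);
  const { data } = await res.json();
  // Each entry is expected to carry at least an id the UI can display
  // and pass back in chat/completion requests.
  return data.map((m: { id: string }) => m.id);
}
```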

```mermaid
graph LR
  UI[UI Components] -->|chat/completion| Backend
  Backend -->|retrieve| Model[model.json]
  Model -->|settings| Load[Model Loader]
  Load -->|inference| Inference[Inference Engines]
  Inference -->|Response| UI
```
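
A rough server-side sketch of the flow in the diagram above: retrieve model.json, load the model with its default settings, then run inference. `loadModel` and `runInference` are stand-ins for the inference-engine extension surface, and the model.json layout shown is illustrative rather than the actual schema.

```typescript
import { promises as fs } from "node:fs";
import path from "node:path";

// Placeholders for the inference-engine extension surface; not actual Jan APIs.
declare function loadModel(id: string, settings: Record<string, unknown>): Promise<unknown>;
declare function runInference(session: unknown, messages: unknown[]): Promise<unknown>;

// Backend handler sketch: the client supplies only the model ID and messages.
async function chatCompletion(modelId: string, messages: unknown[]) {
  const modelJsonPath = path.join("models", modelId, "model.json");
  const modelJson = JSON.parse(await fs.readFile(modelJsonPath, "utf8"));

  // Default load settings come from model.json, not from the client request.
  const settings: Record<string, unknown> = modelJson.settings ?? {};

  const session = await loadModel(modelId, settings);
  return runInference(session, messages);
}
```
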
@imtuyethan
Contributor

deprecated

@imtuyethan closed this as not planned on Jul 2, 2024