Rerank models are very useful for RAG: they noticeably improve search quality, but they are resource-intensive. It would be very nice to accelerate reranking with llama.cpp and make it as accessible as embedding already is.
ColBERT models are a more complex tool that sits between reranking and embedding, but in the end they are just an optimized alternative to reranking. Support for them in llama.cpp would be very welcome too.
Current implementations are strictly transformers-based.
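A rough illustration of why ColBERT sits "between" the two: an embedding model compresses each text into a single vector, a cross-encoder reranker scores each (query, document) pair jointly, and ColBERT keeps one vector per token and scores with late interaction (MaxSim). The sketch below assumes the token embeddings have already been produced by some model; the function names, the mean-pooling choice for the single-vector case, and the toy 2-dimensional vectors are all hypothetical and only show the scoring step, not any llama.cpp or mixedbread API.

```python
import math


def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style late interaction: for each query token, take the best
    matching document token, then sum those maxima."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


def single_vector_score(query_tokens, doc_tokens):
    """Embedding-style scoring (assumed mean pooling): collapse each text
    into one vector, then take the cosine similarity."""
    def pool(tokens):
        dim = len(tokens[0])
        return normalize([sum(t[i] for t in tokens) / len(tokens) for i in range(dim)])
    return dot(pool(query_tokens), pool(doc_tokens))


def rank(query_tokens, docs, score_fn):
    # Return document indices sorted from best to worst score.
    scores = [score_fn(query_tokens, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
    doc_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first query token
    print(rank(query, [doc_b, doc_a], maxsim_score))
    print(rank(query, [doc_b, doc_a], single_vector_score))
```

The document vectors can be precomputed and indexed like embeddings, while the token-level matching recovers some of the fine-grained comparison a reranker does, which is why it is often described as a cheaper alternative to full reranking.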
https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v1
https://huggingface.co/mixedbread-ai/mxbai-colbert-large-v1
This would allow Open-webui to offload reranking to Ollama (Open-webui + Ollama are perhaps the most accessible tools for local RAG).