- Ottawa, ON
- https://www.cedricblondeau.com/
Starred repositories
AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kubernetes Engine
A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering
Open source codebase powering the HuggingChat app
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
The Triton TensorRT-LLM Backend
Fast inference engine for Transformer models
LLMPerf is a library for validating and benchmarking LLMs
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
Declarative Continuous Deployment for Kubernetes
Netflix's Hystrix latency and fault tolerance library, for Go
✨ Textbase is a simple framework for building AI chatbots. ✨
Rich is a Python library for rich text and beautiful formatting in the terminal.
The lean application framework for Python. Build sophisticated user interfaces with a simple Python API. Run your apps in the terminal and a web browser.
🤖 The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformer…
Chat with your favourite LLaMA models in a native macOS app
A Gradio web UI for Large Language Models.
French instruction-following and chat models