Starred repositories
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
OCR, layout analysis, reading order, table recognition in 90+ languages
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Build real-time multimodal AI applications 🤖🎙️📹
rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.
React app for inspecting, building and debugging with the Realtime API
This node uses Easy-OCR to implement OCR text recognition.
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
A tool for Python developers to easily debug the HTTP(S) client requests in a Python program.
Things you can do with the token embeddings of an LLM
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
A JAX research toolkit for building, editing, and visualizing neural networks.
A JavaScript library that brings vector search and RAG to your browser!
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
A curated list of awesome open-source libraries for production LLM
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
An AI personal tutor built with Llama 3.1
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.
The all-in-one solution for RAG. Build, scale, and deploy state-of-the-art Retrieval-Augmented Generation applications
Automatic Korean word spacing with Python
This repository implements inference with the PaliGemma Vision Language Model on Android using Hugging Face-Gradio Client API for tasks such as zero-shot object detection, image captioning …
A third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.