Stars
A toolkit for creating an optimal, production-ready RAG setup for your data
A memory-efficient implementation of DenseNets
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
An enterprise friendly way of detecting and preventing secrets in code.
The Security Toolkit for LLM Interactions
A Python library to perform NER on structured data and generate PII with Faker
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Version 1.1 of the Alexander Koch low-cost robot arm, with some small changes.
The Universe of Data. All about data, data science, and data engineering
Locality Sensitive Hashing using MinHash in Python/Cython to detect near-duplicate text documents
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Sample Python code for comparing documents using MinHash
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
MinHash implementation in Python
A vector search SQLite extension that runs anywhere!
SGLang is a fast serving framework for large language models and vision language models.
Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
Utilities intended for use with Llama models.
newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
A command-line tool for using the CommonCrawl Index API at https://index.commoncrawl.org/
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence