Stars
Extract frames and motion vectors from H.264 and MPEG-4 encoded video.
A utility tool to create a tarball of existing objects in Amazon S3
Any Install makes installer shell scripts easier to maintain through its own manifest DSL.
Verification of the effect of speculative decoding in Japanese.
Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?
Integrate GraphQL with your Pydantic models
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A library for squeakily cleaning and filtering language datasets.
Checkpointable dataset utilities for foundation model training
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Few Shot Text Classification with Large Language Models
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
This repository is built in association with our position paper on "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As a part of this release we share the informati…
Supports continual pre-training and instruction tuning; forked from llama-recipes
DagStream is a Python package for managing relationships between functions, especially data-preprocessing functions for machine learning applications.
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., LoRA, P-tuning) for easy use. We welcome open-source enthusiasts…
Libraries for efficient and scalable group-structured dataset pipelines.
A collection of useful audio datasets and transforms for PyTorch.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.
Easily create large video datasets from video URLs