Stars
Open-source libraries and APIs for building custom preprocessing pipelines for labeling, training, or production machine learning.
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
State-of-the-art deep learning library for time series and sequences in PyTorch / fastai
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
Retrieval and retrieval-augmented LLMs
Generalist and Lightweight Model for Named Entity Recognition (extract any entity type from text) @ NAACL 2024
Source code and data for "Like a Good Nearest Neighbor"
🐝 GPTSwarm: LLM agents as (Optimizable) Graphs
Easily embed, cluster and semantically label text datasets
🗺️ Data Cleaning and Textual Data Visualization 🗺️
A benchmark to evaluate language models on questions I've previously asked them to solve.
Freeing data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks.
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Python packaging and dependency management made easy
An MIT-licensed, deployable starter kit for building and customizing your own version of AI Town, a virtual town where AI characters live, chat, and socialize.
DSPy: The framework for programming—not prompting—foundation models
Generative Agents: Interactive Simulacra of Human Behavior
Rift: an AI-native language server for your personal AI software engineer
Track and predict the energy consumption and carbon footprint of training deep learning models.
LlamaIndex is a data framework for your LLM applications
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
An autoregressive character-level language model for making more things
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
A tiny scalar-valued autograd engine and a neural net library on top of it with a PyTorch-like API
A curated, but incomplete, list of data-centric AI resources.