Starred repositories
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Incorporating Agile methodology into agents to create complex real-world software
Adaptive Swarm Retrieval: A Hierarchical Agent-Based Approach for Blind and Stateful Information Retrieval
The repository for all Azure OpenAI Samples complementing the OpenAI cookbook.
Code for the paper "Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement", NeurIPS 2023
A collection of benchmarks and datasets for evaluating LLMs.
From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)
In Generative AI with Large Language Models (LLMs), you’ll learn the fundamentals of how generative AI works, and how to deploy it in real-world applications.
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Self-Supervised Euphemism Detection and Identification for Content Moderation, IEEE S&P (Oakland) 2021
Deep learning based content moderation from text, audio, video & image input modalities.
This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.
Fine-tune an LLM to convert an invoice or receipt image to a receipt XML or JSON object.
Enforce the output format (JSON Schema, regex, etc.) of a language model
Use commands in English to control Blender with OpenAI's GPT-4
Refine high-quality datasets and visual AI models
Patient flow analysis using a Plotly Sankey diagram
MedAlign is a clinician-generated dataset for instruction following with electronic medical records.
Machine learning software for extracting information from scholarly documents
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377