- Waseda University
- Tokyo
- https://o-suke12.github.io/
Stars
Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Reference implementation for DPO (Direct Preference Optimization)
A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
This is the repository for our paper "Learning Latent Representations to Co-Adapt to Humans" [https://arxiv.org/abs/2212.09586]
A minimum example of aligning language models with RLHF similar to ChatGPT
A guide to contributing to open source
A collection of MARL benchmarks based on TorchRL
This is a repository for Hidden-utility Self-Play.
Official codebase for Generating Diverse Cooperative Agents by Learning Incompatible Policies (notable-top-25% @ ICLR 2023)
A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates!
Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method
A benchmark environment for fully cooperative human-AI performance.
An index of algorithms for offline reinforcement learning (offline-rl)
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"