- NAVER Corp
- Seongnam, South Korea
- https://zuoxingdong.github.io/
Starred repositories
Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152, I…
🤗 LeRobot: End-to-end Learning for Real-World Robotics in Pytorch
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
Robust recipes to align language models with human and AI preferences
Code for "Learning to Model the World with Language."
A curated list of reinforcement learning with human feedback resources (continually updated)
RLHF implementation details of OAI's 2019 codebase
Generative Agents: Interactive Simulacra of Human Behavior
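Curated learning materials for AI fields such as machine learning, NLP, image recognition, and deep learning; a compilation of architecture and algorithm resources for search, recommendation, and advertising systems. Collected notes from leading algorithm experts.
A collection of industry practice articles on search, recommendation, advertising, user growth, and more (sources: Zhihu, Datafuntalk, technical WeChat official accounts)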
MTM Masked Trajectory Models for Prediction, Representation, and Control.
Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
Mastering Diverse Domains through World Models
Accompanies and reproduces results from the paper "Control Variates for Slate Off-Policy Evaluation"
Victor-YG / PILCO_victor
Forked from nrontsis/PILCO
Bayesian Reinforcement Learning in Tensorflow
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
Train transformer language models with reinforcement learning.
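[浅梦 Study Notes] Article index: covers ranking & CXR prediction, recall and matching, user profiling & feature engineering, general recommendation and search, computational advertising, big data, graph algorithms, NLP & CV, job hunting and interviews, and more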
Python implementations of contextual bandits algorithms
The Fuzzy Labs guide to the universe of open source MLOps
Recommendations at "Reasonable Scale": joining dataOps with recSys through dbt, Merlin and Metaflow
Behavioral "black-box" testing for recommender systems
This is the official implementation for the paper: "CIRS: Bursting Filter Bubbles by Counterfactual Interactive Recommender System"
An up-to-date, comprehensive and flexible recommendation library