- The Hong Kong University of Science and Technology
- jxhe.github.io
- @junxian_he
Stars
ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
OpenPI dataset for tracking entities in open-domain procedural text
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Entropy-Based Sampling and Parallel CoT Decoding
O1 Replication Journey: A Strategic Progress Report – Part I
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
[NeurIPS 2024] Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
Source code for Self-Evaluation Guided MCTS for online DPO.
🙌 OpenHands: Code Less, Make More
The model, data and code for the visual GUI Agent SeeClick
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
Recipes to train reward models for RLHF.
Math-specific large language models from the Qwen2 series.
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].
✨✨Latest Advances in Multimodal Large Language Models
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
A benchmark that challenges language models to code solutions for scientific problems
A curated collection of resources for LLM mathematical reasoning, most screened by @tongyx361 to ensure high quality and accompanied by concise, carefully written descriptions to help readers g…
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
DSPy: The framework for programming—not prompting—foundation models
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)