Repositories starred by junwucs

Summarize existing representative LLMs text datasets.

741 66 Updated Jun 15, 2024

Official implementation for the paper *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*

Jupyter Notebook 40 1 Updated Jul 24, 2024

A collection of guides and examples for the Gemma open models from Google.

Jupyter Notebook 298 44 Updated Jul 23, 2024

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Python 169 2 Updated Jul 15, 2024

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

Python 248 32 Updated May 19, 2024

Temporal question answering dataset for Wikidata

12 4 Updated Dec 7, 2023

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner

Python 4 Updated Jun 27, 2024

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

Jupyter Notebook 35 1 Updated Jun 10, 2024
Python 74 1 Updated Nov 11, 2022

Official implementation of DPFM @ ICLR 2024 paper "Augmenting Math Word Problems via Iterative Question Composing" (https://arxiv.org/abs/2401.09003)

Python 9 Updated Mar 4, 2024

APPS: Automated Programming Progress Standard (NeurIPS 2021)

Python 377 50 Updated Jun 19, 2024

A (deprecated) framework for building exercises to work with Khan Academy.

HTML 1,610 864 Updated Oct 21, 2020

Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"

Python 17 Updated Mar 1, 2024

[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

27 Updated Jul 24, 2024
Python 8 1 Updated Jun 27, 2024

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!

Python 2,313 157 Updated Jul 20, 2024

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Python 5,840 724 Updated Jul 22, 2024
Python 10 Updated Jun 4, 2024

The open-source repository of FuzzLLM

Python 12 2 Updated May 4, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

909 17 Updated Jul 10, 2024

Arena-Hard-Auto: An automatic LLM benchmark.

Jupyter Notebook 347 37 Updated Jul 23, 2024

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?

Python 1,523 252 Updated Jul 22, 2024

Scalable toolkit for efficient model alignment

Python 452 48 Updated Jul 25, 2024

LiveBench: A Challenging, Contamination-Free LLM Benchmark

Python 148 12 Updated Jul 24, 2024

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 11,267 762 Updated Jul 10, 2024

A platform for developing AI systems as described in A Roadmap towards Machine Intelligence - https://arxiv.org/abs/1511.08130

1,327 210 Updated Sep 16, 2020

Simple language-driven navigation tasks for studying compositional learning

177 27 Updated Nov 5, 2020

Specify what you want it to build, the AI asks for clarification, and then builds it.

Python 51,473 6,695 Updated Jul 23, 2024

🐚 OpenDevin: Code Less, Make More

Python 28,866 3,340 Updated Jul 25, 2024