Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 34,024 4,190 Updated Nov 10, 2024

A record of some datasets I have compiled

1,003 131 Updated Jun 16, 2022

All-in-one text de-duplication

Python 619 71 Updated May 21, 2024

TencentLLMEval is a comprehensive benchmark for human evaluation of large models that includes task trees, standards, data verification methods, and more.

38 1 Updated Aug 20, 2024

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,569 350 Updated Oct 17, 2024

Convert WIKI dumped XML (Chinese) to human readable documents in markdown and txt.

Python 6 2 Updated Mar 25, 2020

A tool for extracting plain text from Wikipedia dumps

Python 3,748 967 Updated May 23, 2024

This is a repository using the Wiki Extractor to build and prepare WIKIPEDIA for use in tensorflow.

Python 1 Updated Jul 21, 2018

We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scale 1 to 100) generated through human evaluations that represen…

81 14 Updated Aug 31, 2021

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Jupyter Notebook 1,514 242 Updated Oct 23, 2024

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 1,889 344 Updated Oct 18, 2024

Huozi (活字): a general-purpose large language model

Python 352 21 Updated Sep 12, 2024

Immersive Dual Web Page Translation Extension: bilingual web page translation with support for input-box translation, mouse-hover translation, and translation of PDF, Epub, subtitle, and TXT files

14,208 786 Updated Nov 5, 2024

TigerBot: A multi-language multi-task LLM

Python 2,240 194 Updated Jun 7, 2024

12306 ticket-booking program with automatic login and automatic order placement

Python 23 10 Updated Jan 17, 2023

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook 94,632 15,313 Updated Nov 9, 2024

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese dialogue large language model)

HTML 7,908 759 Updated Oct 16, 2024

Aligning pretrained language models with instruction data generated by themselves.

Python 1 Updated Mar 10, 2023

Personal short implementations of Machine Learning papers

Jupyter Notebook 232 53 Updated Jan 6, 2024

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Python 4,499 473 Updated Jan 8, 2024

The celebrated Tianya forum thread by kkndme discussing housing prices

18,777 3,828 Updated Aug 27, 2023

Code for "Learning to summarize from human feedback"

Python 990 143 Updated Sep 5, 2023

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1,614 127 Updated Sep 19, 2023

ONNX Model Exporter for PaddlePaddle

Python 728 172 Updated Nov 8, 2024

Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning: the "螺旋桨" (literally "propeller") bio-computing toolkit

Python 1,013 223 Updated Nov 8, 2024

Windows Calculator: A simple yet powerful calculator that ships with Windows

C++ 29,734 5,389 Updated Nov 7, 2024

Learn Classical Statistical Machine Translation Systems.

Python 18 8 Updated May 27, 2020

GIZA++ is a statistical machine translation toolkit that is used to train IBM Models 1-5 and an HMM word alignment model. This package also contains the source for the mkcls tool which generates th…

C++ 264 83 Updated Mar 31, 2023

TensorFlow code and pre-trained models for BERT

Python 38,167 9,600 Updated Jul 23, 2024