WenhaoZhang-Git

2 followers · 21 following

Stars

125 results for source starred repositories

modelscope / data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 1,653 108 Updated Jun 21, 2024

baichuan-inc / Baichuan2

A series of large language models developed by Baichuan Intelligent Technology

Python 4,018 285 Updated Jun 22, 2024

ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment

Python 17,808 1,829 Updated Apr 30, 2024

shibing624 / pycorrector

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction scenarios and works out of the box.

Python 5,305 1,075 Updated May 17, 2024

jiangnanboy / jcorrector

jcorrector: a Chinese text error correction tool with spelling check

Java 46 14 Updated Jan 18, 2023

clash-verge-rev / clash-verge-rev

Continuation of Clash Verge - A Clash Meta GUI based on Tauri (Windows, macOS, Linux)

TypeScript 23,975 1,800 Updated Jun 22, 2024

2dust / clashN

A Clash client for Windows that supports Mihomo

C# 4,572 581 Updated May 5, 2024

2dust / v2rayN

A GUI client for Windows that supports Xray core, v2fly core, and others

C# 63,522 10,814 Updated Jun 22, 2024

wangyuxinwhy / uniem

unified embedding model

Python 778 58 Updated Sep 1, 2023

transpect / docx2tex

Converts Microsoft Word docx to LaTeX

XSLT 503 48 Updated Jun 18, 2024

huggingface / nanotron

Minimalistic large language model 3D-parallelism training

Python 924 81 Updated Jun 22, 2024

01-ai / Yi

A series of large language models trained from scratch by developers @01-ai

Python 7,400 453 Updated Jun 19, 2024

huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 1,701 101 Updated Jun 22, 2024

clinicalml / TabLLM

Python 251 41 Updated Nov 2, 2023

Zjh-819 / LLMDataHub

A quick guide (especially) for trending instruction finetuning datasets

2,167 141 Updated Nov 28, 2023

Shenzhi-Wang / Llama3-Chinese-Chat

This is the first Chinese chat model specifically fine-tuned for Chinese through ORPO based on the Meta-Llama-3-8B-Instruct model.

277 14 Updated May 6, 2024

VikParuchuri / surya

OCR, layout analysis, reading order, line detection in 90+ languages

Python 8,830 549 Updated Jun 21, 2024

VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy

Python 13,132 650 Updated Jun 17, 2024

QwenLM / Qwen2

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 5,406 293 Updated Jun 21, 2024

AI4Finance-Foundation / FinGPT

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

Jupyter Notebook 12,465 1,751 Updated Jun 16, 2024

DSXiangLi / DecryptPrompt

A summary of Prompt & LLM papers, open-source data and models, and AIGC applications

2,316 220 Updated Jun 20, 2024

pany8125 / ShareGPTQAExtractor-mnbvc

MNBVC project: ShareGPT corpus cleaning

Python 12 Updated Oct 4, 2023

esbatmop / MNBVC

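MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing stories, chat logs, and more.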

3,151 219 Updated Jun 18, 2024

LianjiaTech / BELLE

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)

HTML 7,676 743 Updated Mar 15, 2024

facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 8,371 536 Updated Apr 16, 2024

princeton-nlp / SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

Python 455 28 Updated Jun 2, 2024

xianshang33 / llm-paper-daily

Daily updated LLM papers. Subscriptions welcome 👏 If you like it, give it a 🌟

706 23 Updated Jun 18, 2024

yangjianxin1 / Firefly

Firefly: a large-model training tool that supports training Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Python 5,123 469 Updated Jun 7, 2024

yangjianxin1 / Firefly-LLaMA2-Chinese

Firefly Chinese LLaMA-2 large model, supporting incremental pre-training of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models

Python 381 27 Updated Oct 21, 2023

SqueezeAILab / LLM2LLM

[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

Python 125 8 Updated Mar 25, 2024