
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 1,643 108 Updated Jun 19, 2024

A series of large language models developed by Baichuan Intelligent Technology

Python 4,015 284 Updated May 22, 2024

Chinese LLaMA & Alpaca large language models, plus local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

Python 17,793 1,829 Updated Apr 30, 2024

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction scenarios and works out of the box.

Python 5,302 1,075 Updated May 17, 2024

jcorrector: Chinese text error correction tool, spelling check

Java 45 14 Updated Jan 18, 2023

A Clash GUI based on Tauri. Supports Windows, macOS and Linux.

TypeScript 20,579 3,086 Updated Nov 3, 2023

Continuation of Clash Verge - A Clash Meta GUI based on Tauri (Windows, MacOS, Linux)

TypeScript 23,645 1,780 Updated Jun 19, 2024

A Clash client for Windows with Mihomo support

C# 4,562 580 Updated May 5, 2024

A GUI client for Windows that supports Xray core, v2fly core, and others

C# 63,423 10,806 Updated Jun 18, 2024

unified embedding model

Python 778 58 Updated Sep 1, 2023

Converts Microsoft Word docx to LaTeX

XSLT 504 48 Updated Jun 18, 2024

Minimalistic large language model 3D-parallelism training

Python 917 80 Updated Jun 18, 2024

A series of large language models trained from scratch by developers @01-ai

Python 7,394 453 Updated Jun 19, 2024

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 1,686 100 Updated Jun 19, 2024
Python 250 41 Updated Nov 2, 2023

A quick guide (especially) for trending instruction finetuning datasets

2,163 140 Updated Nov 28, 2023

This is the first Chinese chat model specifically fine-tuned for Chinese through ORPO based on the Meta-Llama-3-8B-Instruct model.

275 14 Updated May 6, 2024

OCR, layout analysis, reading order, line detection in 90+ languages

Python 8,782 548 Updated Jun 15, 2024

Convert PDF to markdown quickly with high accuracy

Python 12,997 639 Updated Jun 17, 2024

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 5,315 290 Updated Jun 17, 2024

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

Jupyter Notebook 12,442 1,741 Updated Jun 16, 2024

A summary of prompt and LLM papers, open-source data and models, and AIGC applications

2,303 220 Updated Jun 20, 2024

MNBVC project: ShareGPT corpus cleaning

Python 12 Updated Oct 4, 2023

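MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.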

3,145 219 Updated Jun 18, 2024

BELLE: Be Everyone's Large Language model Engine (open-source Chinese dialogue LLM)

HTML 7,670 743 Updated Mar 15, 2024

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 8,353 538 Updated Apr 16, 2024

SimPO: Simple Preference Optimization with a Reference-Free Reward

Python 447 27 Updated Jun 2, 2024

Daily updated LLM papers. Subscriptions welcome 👏 If you like it, please give it a 🌟

699 23 Updated Jun 18, 2024

Firefly: a large-model training tool that supports training Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Python 5,107 468 Updated Jun 7, 2024

Firefly Chinese LLaMA-2 large model, supporting incremental pre-training of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models

Python 380 27 Updated Oct 21, 2023