Stars (fh-Zh)
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
#1 Locally hosted web application that allows you to perform various operations on PDF files
Distributed fetching of Quora questions and answers
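This project organizes the open-source MOSS SFT data and converts it into the MNBVC multi-turn dialogue format. MOSS-002 covers three dimensions (helpfulness, faithfulness, and harmlessness) with 3.53 million samples in total; MOSS-003 adds finer-grained helpfulness category labels, broader harmlessness data, and longer conversations, with 6.3 million samples in total.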
MIracleyin / surya
Forked from VikParuchuri/surya
OCR, layout analysis, reading order, and line detection in 90+ languages
Extract keywords from sentences or replace keywords in sentences.
Your next SaaS template or boilerplate! A magic trip starts with `bun create saasfly`. The more stars, the more surprises.
A script engine for "Yu-Gi-Oh!" and a sample GUI
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
Tools for managing datasets for governance and training.
Content Farm Terminator browser extension/「終結內容農場」瀏覽器套件
Code used for sourcing and cleaning the BigScience ROOTS corpus
Capturing SSL/TLS plaintext without a CA certificate using eBPF. Supported on Linux/Android kernels for amd64/arm64.
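This project performs fast encoding detection and conversion on large numbers of text files to support data cleaning for the MNBVC corpus project.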
MNBVC text quality classification using fastText
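MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40 TB of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, even 火星文 ("Martian script," an unconventional internet writing style). It includes news, essays, novels, books, magazines, papers, scripted dialogue, forum posts, wikis, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and every other form of plain-text Chinese data.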
A simple, fast and user-friendly alternative to 'find'
Just 1 minute of voice data can be enough to train a good TTS model! (few-shot voice cloning)
A modern download manager that supports all platforms. Built with Golang and Flutter.
📚 Jupyter notebook tutorials for OpenVINO™
LangGPT: Empowering everyone to become a prompt expert! 🚀 Structured Prompt, the Language of GPT (structured prompts)
A library to visualize algorithms by tracing your code.
LAV Filters - Open-Source DirectShow Media Splitter and Decoders
Code for the ACL 2020 paper "FLAT: Chinese NER Using Flat-Lattice Transformer"