
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 23,217 2,390 Updated Jul 22, 2024

#1 Locally hosted web application that allows you to perform various operations on PDF files

Java 32,169 2,389 Updated Jul 22, 2024

Pipeline for converting DOCX to JSON

Jupyter Notebook 2 1 Updated Feb 26, 2024

Distributed fetching of Quora questions and answers

Python 3 Updated Jun 15, 2024

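This project organizes the open-source MOSS SFT data and converts it into the mnbvc multi-turn dialogue format. MOSS-003 covers three aspects (helpfulness, faithfulness, and harmlessness), with 3.53 million samples in total; MOSS-003 includes finer-grained helpfulness category labels, broader harmlessness data, and longer multi-turn dialogues, with 6.30 million samples in total.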

Python 2 Updated Dec 3, 2023

OCR, layout analysis, reading order, line detection in 90+ languages

Python 1 Updated Apr 25, 2024

Extract Keywords from sentence or Replace keywords in sentences.

Python 5,571 598 Updated Jul 3, 2024

Your Next SaaS Template or Boilerplate! A magic trip starts with `bun create saasfly`. The more stars, the more surprises

TypeScript 1,457 136 Updated Jul 21, 2024

A script engine for "yu-gi-oh!" and sample gui

C++ 1,805 583 Updated Jul 20, 2024

Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.

Python 173 9 Updated Mar 8, 2024

Tools for managing datasets for governance and training.

HTML 76 49 Updated Jun 10, 2024

Java 16 Updated Jun 1, 2024

Content Farm Terminator (終結內容農場) browser extension

JavaScript 1,285 48 Updated Jul 21, 2024

Code used for sourcing and cleaning the BigScience ROOTS corpus

Jupyter Notebook 292 40 Updated Mar 20, 2023

Capturing SSL/TLS plaintext without a CA certificate using eBPF. Supported on Linux/Android kernels for amd64/arm64.

C 9,065 867 Updated Jul 21, 2024

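This project performs fast encoding detection and conversion on large numbers of text files, to support data cleaning for the mnbvc corpus project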

Python 47 11 Updated Jun 22, 2024

Text deduplication

Python 62 6 Updated May 23, 2024

MNBVC text quality classification using fastText

Python 13 1 Updated Oct 2, 2023

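MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.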

3,243 223 Updated Jul 21, 2024

A simple, fast and user-friendly alternative to 'find'

Rust 32,768 786 Updated Jul 22, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 29,562 3,412 Updated Jul 22, 2024

A modern download manager that supports all platforms. Built with Golang and Flutter.

Dart 14,891 1,082 Updated Jul 17, 2024

📚 Jupyter notebook tutorials for OpenVINO™

Jupyter Notebook 2,196 766 Updated Jul 22, 2024

A simple PDF to LaTeX converter

Rust 13 1 Updated Dec 13, 2023

C++ library for loading XDF files

C++ 13 10 Updated Jul 10, 2024

SysY2022: lexical analysis based on antlr4 (implemented in C++)

ANTLR 4 Updated Apr 20, 2023

LangGPT: Empowering everyone to become a prompt expert! 🚀 Structured Prompt, Language of GPT

Jupyter Notebook 4,897 430 Updated Jul 4, 2024

A library to visualize algorithms by tracing your code.

C++ 11 1 Updated May 26, 2024

LAV Filters - Open-Source DirectShow Media Splitter and Decoders

C++ 7,224 786 Updated Jul 1, 2024

code for ACL 2020 paper: FLAT: Chinese NER Using Flat-Lattice Transformer

Python 994 176 Updated May 10, 2022