Stars
Demos, examples and utilities using PyMuPDF
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
The hub for EleutherAI's work on interpretability and learning dynamics
This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.
Convert PDF to markdown quickly with high accuracy
A tool for extracting plain text from Wikipedia dumps
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
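Book_6_《数据有道》 | Iris Book series (鸢尾花书): From Arithmetic to Machine Learning. Criticism and corrections are welcome! Readers who submit many corrections will receive a free book as thanks!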
Truly universal encoding detector in pure Python
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
[ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".
[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world GitHub Issues?
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
Doing simple retrieval from LLMs at various context lengths to measure accuracy
TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)
Web Content Extraction Benchmark
Python & command-line tool to gather text on the Web: Crawling & scraping, content extraction, metadata. TXT, Markdown, CSV & XML output.
Modin: Scale your Pandas workflows by changing a single line of code
GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese)
Chat Templates for 🤗 HuggingFace Large Language Models