
Demos, examples and utilities using PyMuPDF

Jupyter Notebook 500 139 Updated Jun 13, 2024

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 6,671 970 Updated Jun 19, 2024

The hub for EleutherAI's work on interpretability and learning dynamics

Jupyter Notebook 2,103 154 Updated Jun 18, 2024

This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

125 9 Updated Jan 3, 2022

Convert PDF to markdown quickly with high accuracy

Python 12,994 638 Updated Jun 17, 2024

A tool for extracting plain text from Wikipedia dumps

Python 3,669 955 Updated May 23, 2024

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

Python 2,083 205 Updated Jun 17, 2024

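Book_6_《数据有道》 | Iris Book series (鸢尾花书): From Arithmetic to Machine Learning. Criticism and corrections are welcome! Readers who submit many corrections will receive a free book as thanks!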

Jupyter Notebook 1,610 305 Updated Apr 6, 2024

Truly universal encoding detector in pure Python

Python 536 49 Updated Jun 19, 2024

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 1,686 100 Updated Jun 19, 2024

[ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".

Python 194 15 Updated May 31, 2024

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world GitHub Issues?

Python 1,388 230 Updated Jun 18, 2024

Public Inflection Benchmarks

67 2 Updated Mar 6, 2024

List of Dirty, Naughty, Obscene, and Otherwise Bad Words

2,809 654 Updated Jun 19, 2024

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 1,237 124 Updated Jun 20, 2024

An Awesome Collection for LLM Survey

245 22 Updated May 2, 2024

TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)

Python 163 10 Updated Nov 17, 2023

A very comprehensive parallel corpus of Classical Chinese (ancient prose) and modern Chinese

Python 941 204 Updated Apr 21, 2024

KenLM: Faster and Smaller Language Model Queries

C++ 2,429 508 Updated Feb 25, 2024

Thin wrapper for "pandoc" (MIT)

Python 832 108 Updated Jun 4, 2024

🕷️ The pipeline for the OSCAR corpus

Rust 153 14 Updated Dec 18, 2023

Web Content Extraction Benchmark

Python 14 1 Updated May 24, 2024

Python & command-line tool to gather text on the Web: Crawling & scraping, content extraction, metadata. TXT, Markdown, CSV & XML output.

Python 3,113 234 Updated Jun 19, 2024

Modin: Scale your Pandas workflows by changing a single line of code

Python 9,557 647 Updated Jun 19, 2024

GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese)

477 41 Updated Aug 31, 2021

Chat Templates for 🤗 HuggingFace Large Language Models

Jinja 307 28 Updated Jun 7, 2024

By 兜哥 (Dou Ge): an open-source introductory book on NLP

Python 2,264 556 Updated Feb 11, 2020

Universal markup converter

Haskell 33,022 3,290 Updated Jun 19, 2024