225 results for source starred repositories

Weaponizing WaybackUrls for Recon, BugBounties, OSINT, Sensitive Endpoints and what not

Python 263 32 Updated Sep 12, 2024

A framework for the evaluation of autoregressive code generation language models.

Python 782 208 Updated Sep 26, 2024

Work-in-progress transfer from Google Code

Java 1,107 289 Updated Jan 3, 2018

Easily convert Common Crawl to a dataset of captions and documents. Image/text, audio/text, video/text, ...

Python 304 23 Updated Dec 9, 2023

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code inclu…

Python 389 26 Updated Jul 15, 2024

Demos, examples and utilities using PyMuPDF

Jupyter Notebook 555 152 Updated Jul 1, 2024

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 6,858 996 Updated Oct 3, 2024

The hub for EleutherAI's work on interpretability and learning dynamics

Jupyter Notebook 2,228 163 Updated Aug 21, 2024

This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

140 10 Updated Jan 3, 2022

Convert PDF to markdown quickly with high accuracy

Python 16,821 955 Updated Sep 7, 2024

A tool for extracting plain text from Wikipedia dumps

Python 3,739 965 Updated May 23, 2024

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

Python 2,135 205 Updated Oct 6, 2024

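Book_6_《数据有道》 | Iris Book series: from basic arithmetic to machine learning. Criticism and corrections are welcome! Readers who submit many corrections will receive a free book as thanks.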

Jupyter Notebook 1,937 352 Updated Sep 11, 2024

Truly universal encoding detector in pure Python

Python 573 51 Updated Oct 2, 2024

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 1,971 138 Updated Oct 3, 2024

[ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".

Python 208 18 Updated Aug 28, 2024

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?

Python 1,814 311 Updated Sep 3, 2024

Public Inflection Benchmarks

69 2 Updated Mar 6, 2024

List of Dirty, Naughty, Obscene, and Otherwise Bad Words

2,900 663 Updated Aug 5, 2024

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 1,482 152 Updated Aug 17, 2024

An Awesome Collection for LLM Survey

298 30 Updated Sep 12, 2024

TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pretraining dataset)

Python 172 11 Updated Nov 17, 2023

A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese

Python 1,137 262 Updated Apr 21, 2024

KenLM: Faster and Smaller Language Model Queries

C++ 2,498 512 Updated Jul 30, 2024

Thin wrapper for "pandoc" (MIT)

Python 875 111 Updated Sep 17, 2024

🕷️ The pipeline for the OSCAR corpus

Rust 162 14 Updated Dec 18, 2023

Web Content Extraction Benchmark

Python 14 4 Updated May 24, 2024

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Python 3,502 254 Updated Oct 4, 2024