Stars
Weaponizing WaybackUrls for Recon, BugBounties, OSINT, Sensitive Endpoints and what not
A framework for the evaluation of autoregressive code generation language models.
Work in progress, migrated from Google Code
Easily convert Common Crawl to a dataset of captions and documents: image/text, audio/text, video/text, ...
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code inclu…
Demos, examples and utilities using PyMuPDF
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
The hub for EleutherAI's work on interpretability and learning dynamics
This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.
Convert PDF to markdown quickly with high accuracy
A tool for extracting plain text from Wikipedia dumps
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
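Book_6_《数据有道》 | Iris Book series (鸢尾花书): from basic arithmetic to machine learning. Criticism and corrections are welcome! Readers who report many errors will receive a free book as thanks!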
Truly universal encoding detector in pure Python
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
[ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".
[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world GitHub Issues?
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
Doing simple retrieval from LLMs at various context lengths to measure accuracy
TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pretraining dataset)
Web Content Extraction Benchmark
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML