peterwilliams97

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

C · 4,063 stars · 418 forks · Updated Aug 14, 2024

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Wo…

Python · 4,197 stars · 293 forks · Updated Nov 1, 2024

A high-performance topological machine learning toolbox in Python

Python · 853 stars · 174 forks · Updated Jun 18, 2024

Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.

Python · 121 stars · 12 forks · Updated Nov 2, 2024

UniTable: Towards a Unified Table Foundation Model

Jupyter Notebook · 369 stars · 27 forks · Updated Jun 4, 2024

Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files

Jupyter Notebook · 126 stars · 11 forks · Updated Nov 17, 2023

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

Python · 12,482 stars · 2,993 forks · Updated Nov 2, 2024

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

Python · 5,316 stars · 448 forks · Updated Sep 23, 2024

Mass document analytics platform based on LlamaIndex, Pgvector, React and Django.

Python · 702 stars · 56 forks · Updated Nov 1, 2024

A realtime serving engine for Data-Intensive Generative AI Applications

Rust · 903 stars · 112 forks · Updated Nov 2, 2024

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python · 5,801 stars · 472 forks · Updated Jul 11, 2024

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python · 8,928 stars · 565 forks · Updated Apr 16, 2024

Python bindings to PDFium

Python · 413 stars · 17 forks · Updated Oct 30, 2024

Code repo for the ICML 2024 paper "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation"

Python · 61 stars · 9 forks · Updated Jun 13, 2024

IPP sample implementations.

C · 225 stars · 83 forks · Updated Oct 15, 2024

Data processing with ML and LLM

Python · 3,595 stars · 373 forks · Updated Oct 24, 2024

Python implementation of binary and multi-class Venn-ABERS calibration

Python · 131 stars · 12 forks · Updated Sep 10, 2024

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…

Python · 2,284 stars · 253 forks · Updated Jun 24, 2024

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Python · 630 stars · 97 forks · Updated Nov 1, 2024

DocILE: Document Information Localization and Extraction Benchmark

Python · 117 stars · 9 forks · Updated May 15, 2024

10x faster matrix and vector operations

C++ · 2,473 stars · 171 forks · Updated Oct 12, 2022

You were probably looking for our website... this is it. We moved our website here, so you can see the insides of how we work.

1,554 stars · 323 forks · Updated Oct 30, 2024

A POSIX-compliant AWK interpreter written in Go, with CSV support

Go · 1,939 stars · 84 forks · Updated Sep 18, 2024

Python · 11 stars · 1 fork · Updated Sep 21, 2021

PDF table extractor

JavaScript · 174 stars · 63 forks · Updated Apr 3, 2024

Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown

JavaScript · 4,364 stars · 206 forks · Updated Nov 1, 2024

Go module for communicating with the Veryfi OCR API.

Go · 22 stars · 4 forks · Updated Nov 9, 2023

A PDF processor written in Go.

Go · 6,952 stars · 479 forks · Updated Nov 1, 2024

Make PDFs easily

Python · 314 stars · 21 forks · Updated May 4, 2022

Interactive prompt for command-line applications

Go · 6,067 stars · 336 forks · Updated Aug 6, 2024