This is my home repo
- Melbourne, Australia.
- peterwilliams97.blogspot.com
Stars
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Wo…
A high-performance topological machine learning toolbox in Python
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
UniTable: Towards a Unified Table Foundation Model
Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Mass document analytics platform based on LlamaIndex, Pgvector, React and Django.
A realtime serving engine for Data-Intensive Generative AI Applications
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Implementation of Nougat Neural Optical Understanding for Academic Documents
Code repo for the ICML 2024 paper "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation"
Python implementation of binary and multi-class Venn-ABERS calibration
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
DocILE: Document Information Localization and Extraction Benchmark
You were probably looking for our website... this is it. We moved our website here, so you can see the insides of how we work.
A POSIX-compliant AWK interpreter written in Go, with CSV support
Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
Interactive prompt for command-line applications