MaxHalford

Stars

🔬 Document processing

20 repositories

Transforms PDF, Documents and Images into Enriched Structured Data

JavaScript 5,823 310 Updated Dec 3, 2023

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python 3,842 442 Updated Nov 13, 2024

An OCR evaluation tool

Python 64 14 Updated Oct 11, 2024

🪼 A Python library for approximate and phonetic matching of strings.

Jupyter Notebook 2,067 160 Updated Oct 29, 2024

A tool for handwritten text (straight and skewed) line segmentation based on a statistical approach.

C++ 39 21 Updated Jun 29, 2018

A Unified Toolkit for Deep Learning Based Document Image Analysis

Python 4,905 470 Updated Aug 15, 2024

A command-line tool and Rust library with Python bindings for generating regular expressions from user-provided test cases

Rust 7,298 173 Updated Nov 8, 2024

Community maintained fork of pdfminer - we fathom PDF

Python 5,947 930 Updated Aug 2, 2024

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Haskell 4,083 726 Updated Oct 3, 2024

Receipt Scanner Prototype using AngularJS, (PYTHON) Flask & OpenCV. University full-stack SPA web app course project 2014.

JavaScript 118 47 Updated Dec 7, 2022

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Python 4,901 415 Updated Oct 26, 2024

Line based ATR Engine based on OCRopy

Python 1,048 209 Updated Nov 12, 2024

A synthetic data generator for text recognition

Python 3,283 977 Updated Jul 18, 2024

A Python module to convert natural language numerics into ints and floats.

Python 224 23 Updated Sep 26, 2024

Custom recipe and utilities for document processing

Python 198 20 Updated Jun 19, 2022

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 24,477 3,158 Updated Sep 24, 2024

竜 TatSu generates Python parsers from grammars in a variation of EBNF

Python 408 48 Updated Nov 6, 2024

🔖 A toolkit for making domain-specific probabilistic parsers

Python 797 82 Updated Sep 26, 2024

A Python library for reading and writing PDF, powered by QPDF

Python 2,182 191 Updated Nov 13, 2024