MaxHalford

Stars

🔬 Document processing

20 repositories

Transforms PDF, Documents and Images into Enriched Structured Data

JavaScript 5,716 303 Updated Dec 3, 2023

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python 3,331 398 Updated Jun 30, 2024

An OCR evaluation tool

Python 56 12 Updated May 26, 2024

🪼 A Python library for approximate and phonetic matching of strings.

Jupyter Notebook 2,017 156 Updated Jun 3, 2024

A tool for handwritten text (straight and skewed) line segmentation based on a statistical approach.

C++ 39 21 Updated Jun 29, 2018

A Unified Toolkit for Deep Learning Based Document Image Analysis

Python 4,640 449 Updated Mar 7, 2024

A command-line tool and Rust library with Python bindings for generating regular expressions from user-provided test cases

Rust 7,017 169 Updated Jul 5, 2024

Community maintained fork of pdfminer - we fathom PDF

Python 5,633 905 Updated Jul 8, 2024

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Haskell 4,040 719 Updated Jun 8, 2024

Receipt Scanner Prototype using AngularJS, Flask & OpenCV. University full-stack SPA web app course project 2014.

JavaScript 120 47 Updated Dec 7, 2022

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Python 4,613 395 Updated Jun 30, 2024

Line based ATR Engine based on OCRopy

Python 1,025 210 Updated Jul 5, 2024

A synthetic data generator for text recognition

Python 3,142 940 Updated May 22, 2024

A Python module to convert natural language numerics into ints and floats.

Python 213 23 Updated May 1, 2023

Custom recipe and utilities for document processing

Python 199 19 Updated Jun 19, 2022

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

Python 22,815 3,006 Updated Jul 3, 2024

竜 TatSu generates Python parsers from grammars in a variation of EBNF

Python 401 47 Updated Jun 5, 2024

🔖 A toolkit for making domain-specific probabilistic parsers

Python 789 85 Updated Apr 27, 2023

A Python library for reading and writing PDF, powered by QPDF

Python 2,084 186 Updated Jul 4, 2024