🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Updated Oct 13, 2023 - HTML
Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)
Simple app to extract text from pictures using Tesseract
Online summary documentation of a technology stack, covering programming languages, data structures and algorithms, machine learning, databases, and more.
A simple web application built with React that lets users upload images containing text, select the language of the text for recognition, and extract the text from the image. As quick as a finger snap - SnapText.
Article title, authors, date and body extraction dataset.
Go package that cleans an HTML page for better readability.
Quick Tesseract OCR implementation, linked to a Stack Overflow question.
A project assignment in which I learned to classify different texts using clustering techniques. It combines natural language processing and clustering, with K-means clustering used to implement the solution.
HR Assistant: Web application for efficient HR recruitment and resume management. Utilizes OCR for text extraction and similarity analysis to rearrange resumes based on job descriptions. Simplifies the hiring process for HR recruiters and enhances candidate selection.
MediLink is a web application that revolutionizes health record management by seamlessly integrating NLP techniques for handwritten text extraction on prescriptions and blockchain technology for secure data storage.
Extracts multiple URLs from text, and if downloadable, downloads them into a ZIP
Collection of NLP projects from classwork.
Version 0.1 of Planned Dashboard for Dashboards