html-parser

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

html pdf ocr table-of-contents excel html-parser docx documents doc scanned-documents txt document-analysis odt pdf-parser table-recognition docx-parser document-content-extraction logical-structure-extraction

Updated Aug 5, 2024
Python

Scorpi-ON / citatyinfo_bot

An asynchronous parser bot for the Russian quotes portal citaty.info

python bot quotes parsing telegram-bot asynchronous html-parser uvloop pyrogram lexbor selectolax

Updated Aug 4, 2024
Python

skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.

kotlin testing crawler scraper parse dom integration-testing test-automation jsoup html-parser kotlin-dsl hacktoberfest system-testing skrape

Updated Aug 3, 2024
Kotlin

Morni-Cloud-3 / 30-seconds-of-code

Short code snippets for all your development needs

github kotlin config python git blog java go html bot html5 algorithms html-parser youtube-downloader algorithms-python algorithms-java github-actions github-config algorithms-and-data-structures-interview-questions

Updated Aug 2, 2024
JavaScript

craigbarnes / lua-gumbo

Moved to https://gitlab.com/craigbarnes/lua-gumbo

html parser html5 lua dom html-parser sanitize-html

Updated Aug 2, 2024
C

pagescrape / toks.rs

html-parser

Updated Aug 2, 2024
HTML

jgarber623 / micromicro

A Ruby gem for extracting microformats2-encoded data from HTML documents.

ruby rubygems microformats html-parser microformats2

Updated Aug 2, 2024
Ruby

stfsy / node-html-light

HTML parser for NodeJS providing a lightweight, object-oriented interface

nodejs server-side-rendering html-parser node-html-light

Updated Aug 1, 2024
JavaScript

d-alejandro / training-level2

Go, OOP, SOLID, Design Patterns, Golang 1.22, Unit tests, API tests, ServeMux, Socket, WB Tech, Wildberries

go golang socket patterns solid oop design-patterns html-parser goroutines unit-tests api-tests servemux net-http wildberries goroutines-channels wb-tech wbtech

Updated Aug 1, 2024
Go

brucificus / html-antlr4-typescript

HTML lexer & parser written in TypeScript using ANTLR 4 & ANTLR4TS

typescript npm-package html-parser antlr4

Updated Aug 6, 2024
TypeScript

zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# that reads and writes the DOM and supports plain XPath or XSLT. It is a .NET code library that lets you parse "out of the web" HTML files.

parse html-parser xpath hap htmlagilitypack