
html-parser

Here are 422 public repositories matching this topic...

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

  • Updated Aug 5, 2024
  • Python
skrape.it

A Kotlin-based testing/scraping/parsing library for analyzing and extracting data from HTML (server- and client-side rendered). It emphasizes ease of use and readability by providing an intuitive DSL. It aims to be a testing library, but can also be used to scrape websites conveniently.

  • Updated Aug 3, 2024
  • Kotlin
