Insights: DS4SD/docling
Overview
14 Pull requests merged by 8 people
-
fix: force pydantic < 2.10.0
#407 merged
Nov 22, 2024 -
chore: update the README
#409 merged
Nov 21, 2024 -
docs: add DocETL, Kotaemon, spaCy integrations; minor docs improvements
#408 merged
Nov 21, 2024 -
chore: add downloads in README, security policy and update ci actions
#401 merged
Nov 21, 2024 -
fix: python3.9 support
#396 merged
Nov 20, 2024 -
feat: add support for `ocrmac` OCR engine on macOS
#276 merged
Nov 20, 2024 -
fix: propagate document limits to converter
#388 merged
Nov 20, 2024 -
Sample chunking notebook that includes merging, etc.
#193 merged
Nov 19, 2024 -
feat: added support for exporting DocItem to an image when page image is available
#379 merged
Nov 19, 2024 -
docs: fixed typo in v2 example v2
#378 merged
Nov 19, 2024 -
feat: expose ocr-lang in CLI
#375 merged
Nov 19, 2024 -
feat: added excel backend
#334 merged
Nov 19, 2024 -
chore: update lock of deps
#371 merged
Nov 19, 2024 -
feat: Extracting picture data for raster images found in PPTX
#349 merged
Nov 18, 2024
4 Pull requests opened by 3 people
-
Advanced chunking example
#384 opened
Nov 19, 2024 -
feat(ocr): added support for PaddleOCR engine
#393 opened
Nov 20, 2024 -
feat(ocr): added support for RapidOCR engine
#415 opened
Nov 22, 2024 -
fix(layout_utils): correct conditional logic in adapt_bbox function
#416 opened
Nov 22, 2024
25 Issues closed by 7 people
-
do we have a function to generate a folder which contains images folder and markdown file
#387 closed
Nov 22, 2024 -
Graphical user interface for parsed JSON?
#403 closed
Nov 22, 2024 -
Is it possible to fine tune with our own datasets?
#411 closed
Nov 22, 2024 -
Title differenciation
#412 closed
Nov 22, 2024 -
I get an error trying to export figures
#406 closed
Nov 21, 2024 -
Can support for widgets in Dify be considered?
#394 closed
Nov 21, 2024 -
analyzing the pdf is too slow
#398 closed
Nov 21, 2024 -
Python 3.9 Support?
#385 closed
Nov 20, 2024 -
Convert pdf to md simplified Chinese character issue
#225 closed
Nov 19, 2024 -
Allow extraction of formula images similar to tables and pages
#299 closed
Nov 19, 2024 -
Specific language for easyOCR
#255 closed
Nov 19, 2024 -
Newcomers who want to start source code, how should I do it?
#372 closed
Nov 19, 2024 -
Support Excel files
#258 closed
Nov 19, 2024 -
Support for HOCR?
#366 closed
Nov 19, 2024 -
Bug
#370 closed
Nov 19, 2024 -
How to give HTML code as a string
#368 closed
Nov 19, 2024 -
export_to_markdown page separator
#359 closed
Nov 18, 2024 -
LXML versions greater or equal than 5.0.0 are not allowed
#363 closed
Nov 18, 2024 -
Docling <page_assemble_model> reading order algorithm
#358 closed
Nov 18, 2024 -
cannot import name 'TextPipelineOptions' from 'docling.datamodel.pipeline_options'
#360 closed
Nov 18, 2024 -
Using Docling with costume layout and table recognition models
#250 closed
Nov 18, 2024 -
Streamline the dependence, the dependence is too heavy now
#252 closed
Nov 18, 2024 -
Analyzing PDf files is too slow
#346 closed
Nov 18, 2024 -
Syntax error while parsing object key (pdf with Chinese characters)
#351 closed
Nov 18, 2024 -
OCR Extracted Information
#244 closed
Nov 18, 2024
11 Issues opened by 11 people
-
parse docx file error :
#417 opened
Nov 23, 2024 -
Using .DOCX format in cloud - suggestion on the below error?
#410 opened
Nov 22, 2024 -
Support Image path/url
#405 opened
Nov 21, 2024 -
Which type of Markdown is supported?
#404 opened
Nov 21, 2024 -
Document normalization: warning on `checkbox-unselected`
#399 opened
Nov 21, 2024 -
Docx cannot get pic info
#391 opened
Nov 20, 2024 -
Loading a pdf results in a StopIteration error
#383 opened
Nov 19, 2024 -
Table representation misaligned between PDF and DOCX
#382 opened
Nov 19, 2024 -
Add Parallelization Support to `convert_all()` Function with `num_worker` Parameter
#369 opened
Nov 19, 2024 -
Should the second "if" keyword in adapt_bbox from layout_utils.py rather be an "elif" keyword ?
#362 opened
Nov 18, 2024 -
docling identified my entire page as a picture
#357 opened
Nov 18, 2024
14 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add LaTex and mathpix-markdown-it as outputs
#343 commented on
Nov 18, 2024 • 0 new comments -
Standardized Access to Common Email and Calendar Formats
#327 commented on
Nov 18, 2024 • 0 new comments -
Docling crashes when using EasyOCR on Windows 11
#318 commented on
Nov 18, 2024 • 0 new comments -
Deployment of docling using Docker
#303 commented on
Nov 18, 2024 • 0 new comments -
Leverage word bbox from pdf-parser-v2 in the layout- and table-model
#285 commented on
Nov 18, 2024 • 0 new comments -
For long tables, fields are being truncated
#278 commented on
Nov 18, 2024 • 0 new comments -
cli and PDF: wrong table output
#268 commented on
Nov 18, 2024 • 0 new comments -
Support export of DoclingDocument to HTML
#300 commented on
Nov 18, 2024 • 0 new comments -
Add option to export_to_markdown to mark page breaks
#309 commented on
Nov 19, 2024 • 0 new comments -
Python 3.13 support
#136 commented on
Nov 19, 2024 • 0 new comments -
EasyOCR does not extract text properly
#295 commented on
Nov 21, 2024 • 0 new comments -
Result viewer application
#277 commented on
Nov 22, 2024 • 0 new comments -
Dev/update html parser with h1
#240 commented on
Nov 19, 2024 • 0 new comments -
enhancement: Add timeout limit to document parsing job. #270
#320 commented on
Nov 23, 2024 • 0 new comments