
Paper reproduction: DocTr++ document rectification network #10379

Closed
shiyutang opened this issue Jul 13, 2023 · 12 comments
Labels: Code PR is needed, stale, status/close


@shiyutang
Collaborator

shiyutang commented Jul 13, 2023

Background

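Following the feature-request collection in #10334 and the discussion at the weekly technical seminar #10223, we selected the DocTr++ document rectification task. Rectification matters in key scenarios such as document comparison, keyword extraction, and contract tampering verification. Completing this task can significantly improve the granularity of OCR results and enables many applications.
Through quantitative experiments and qualitative comparisons, the authors validated the performance and generalization of DocTr++, setting multiple new state-of-the-art results on existing benchmarks and on the benchmark they propose. According to their results, it is currently the best document rectification method.
No pretrained weights or training code are available yet, so the model has to be retrained following the description in the paper.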

Steps

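  1. Convert the network architecture and evaluation metrics from the open-source code. Code: https://github.com/fh2019ustc/DocTr-Plus
  2. Following the paper reproduction guide, perform forward and backward alignment (checking that the ported model's outputs, losses, and gradients match the reference implementation), and reach the metrics reported in Table 1 of the paper.
  3. Submit a code PR to ppocr following the PR submission guidelines.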
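The alignment in step 2 usually boils down to feeding the same input to the reference (PyTorch) model and the ported (Paddle) model, saving the named outputs as NumPy arrays, and comparing them. The sketch below assumes that workflow; the function name `compare_outputs` and the `atol=1e-5` tolerance are hypothetical choices, not values taken from the reproduction guide.

```python
import numpy as np


def compare_outputs(ref, ported, atol=1e-5):
    """Compare named arrays saved from the reference and ported runs.

    Hypothetical helper: `ref` and `ported` are dicts mapping a name
    (e.g. a layer output, a loss value, or a gradient) to an array-like.
    Returns {name: (max_abs_diff, passed)}.
    """
    report = {}
    for name, ref_arr in ref.items():
        # A missing tensor counts as a failed check.
        if name not in ported:
            report[name] = (float("inf"), False)
            continue
        a = np.asarray(ref_arr, dtype=np.float64)
        b = np.asarray(ported[name], dtype=np.float64)
        # Shapes must match before values can be compared.
        if a.shape != b.shape:
            report[name] = (float("inf"), False)
            continue
        diff = float(np.max(np.abs(a - b))) if a.size else 0.0
        report[name] = (diff, diff <= atol)
    return report
```

The same check can be applied to per-step losses and gradients for backward alignment; a looser tolerance is usually needed there because small numerical differences accumulate over training steps.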

Datasets:

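  1. Training dataset: obtain the Doc3D dataset and crop the images so that they fall into the three categories described in the paper (complete document boundaries, partial boundaries, no boundaries).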
  2. Validation dataset: the DocUNet dataset.
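The cropping in item 1 could be sketched as rejection sampling: draw a random window, classify it by how much of the document mask it contains, and keep it only if it matches the target category. This is a minimal sketch, assuming a binary mask of the distorted document is available for each Doc3D image; `classify_crop`, `random_crop`, and the `min_frac` parameter are hypothetical names, not part of the DocTr-Plus code.

```python
import numpy as np


def classify_crop(mask, top, left, h, w):
    """Classify a crop window against a bool document mask (hypothetical helper).

    Returns 'complete' (whole document inside the window), 'none' (window is
    entirely document, so no boundary is visible), 'partial', or 'empty'.
    """
    window = mask[top:top + h, left:left + w]
    doc_in_window = int(window.sum())
    if doc_in_window == 0:
        return "empty"        # pure background: reject
    if doc_in_window == window.size:
        return "none"         # every pixel belongs to the document
    if doc_in_window == int(mask.sum()):
        return "complete"     # all document pixels fall inside the window
    return "partial"          # some boundary visible, document cut off


def random_crop(image, mask, condition, rng, min_frac=0.5, max_tries=1000):
    """Sample a crop whose category equals `condition` ('complete'/'partial'/'none').

    Returns (cropped_image, (top, left, h, w)) or None if no window was found.
    """
    H, W = mask.shape
    for _ in range(max_tries):
        h = int(rng.integers(int(min_frac * H), H + 1))
        w = int(rng.integers(int(min_frac * W), W + 1))
        top = int(rng.integers(0, H - h + 1))
        left = int(rng.integers(0, W - w + 1))
        if classify_crop(mask, top, left, h, w) == condition:
            return image[top:top + h, left:left + w], (top, left, h, w)
    return None
```

The ground-truth labels paired with each image would have to be cropped and rescaled consistently with the returned window; that step is not shown here.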
@GreatV
Collaborator

GreatV commented Jul 13, 2023

The training set is extended from the classic Doc3D dataset

This training set was built by the authors themselves, so we will have to construct it on our own.

To construct the training set for unrestricted document image rectification, we randomly crop such distorted document images to meet one of the following three conditions, including (a) with complete document boundaries, (b) with partial document boundaries, and (c) without any document boundaries

@GreatV
Collaborator

GreatV commented Jul 13, 2023

Claimed. Expected to take about one month.

@shiyutang
Collaborator Author

The dataset construction is now explained in more detail in the issue description. Feel free to keep asking if anything is unclear~

@shiyutang
Collaborator Author

I wrote a walkthrough of the paper for reference:
DocTr++文档矫正.pdf

@GreatV
Collaborator

GreatV commented Jul 14, 2023

I'll write the training part when I have time.

Contributor

github-actions bot commented Jan 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 3, 2024
@zhuxiaobin

Hello, how is the progress?

@GreatV
Collaborator

GreatV commented Jan 30, 2024

@zhuxiaobin You can take a look at this PR.

@Li-Yidong

Hi, how is the progress?

@GreatV
Collaborator

GreatV commented Mar 19, 2024

@Li-Yidong You can take a look at this PR.

@GreatV
Collaborator

GreatV commented Apr 8, 2024

@Li-Yidong Take a look at this repository: https://github.com/GreatV/DocTrPP

@Li-Yidong

> @Li-Yidong Take a look at this repository: https://github.com/GreatV/DocTrPP

Thanks for sharing!


5 participants