ZZR8066/SEM

Split, Embed and Merge: An accurate table structure recognizer

This repository contains the source code of our Pattern Recognition 2022 paper: Split, Embed and Merge: An accurate table structure recognizer.

Introduction

[Figure: SEM pipeline]

Split, Embed and Merge (SEM) is a framework for parsing tabular data into a structured format. It consists of three main parts: a splitter, an embedder and a merger. SEM won first place on complex tables and third place on all tables in Task-B of the ICDAR 2021 Competition on Scientific Literature Parsing.

Dataset

We provide scripts for processing the SciTSR dataset, which contains 15,000 tables in PDF format as well as their corresponding structure labels.

Note that the text information must be aligned with the table cells in order to generate the labels for the splitter.
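As a rough illustration of what such an alignment step could look like, the sketch below matches positioned text chunks to the cell texts from a structure label by comparing normalized strings, and gives each matched cell the bounding box of its chunk. This is not the repository's preprocessing code: the function names (`normalize`, `align_cells`), the `(text, box)` chunk layout and the exact-match rule are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: a box is (x0, y0, x1, y1); a chunk is (text, box).
Box = Tuple[float, float, float, float]
Chunk = Tuple[str, Box]


def normalize(text: str) -> str:
    """Drop all whitespace and lowercase, so 'Acc .' and 'acc.' compare equal."""
    return "".join(text.split()).lower()


def align_cells(chunks: List[Chunk], cell_texts: List[str]) -> List[Optional[Box]]:
    """Return one box per cell, or None when no chunk matches the cell text.

    Each chunk is used at most once, so repeated cell texts are matched
    to distinct chunks in reading order.
    """
    unused = list(range(len(chunks)))
    boxes: List[Optional[Box]] = []
    for cell_text in cell_texts:
        target = normalize(cell_text)
        if not target:
            # Empty cells carry no text to align against.
            boxes.append(None)
            continue
        match = next((i for i in unused if normalize(chunks[i][0]) == target), None)
        if match is None:
            boxes.append(None)
        else:
            unused.remove(match)
            boxes.append(chunks[match][1])
    return boxes


chunks = [
    ("Method", (0.0, 0.0, 30.0, 10.0)),
    ("Acc .", (40.0, 0.0, 60.0, 10.0)),
    ("SEM", (0.0, 12.0, 20.0, 22.0)),
]
cell_boxes = align_cells(chunks, ["Method", "Acc.", "SEM", "", "97.1"])
```

In this toy input the first three cells receive boxes, while the empty cell and the unmatched "97.1" cell stay `None`. A real pipeline would likely need fuzzier matching, for example for cells whose text is split across several chunks.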

Requirements

  • torch==1.7.1

Training and Testing

python runner/train.py --cfg default

Citation

If you find SEM useful in your research, please consider citing:

@article{zhang2022split,
  title={Split, embed and merge: An accurate table structure recognizer},
  author={Zhang, Zhenrong and Zhang, Jianshu and Du, Jun and Wang, Fengren},
  journal={Pattern Recognition},
  volume={126},
  pages={108565},
  year={2022},
  publisher={Elsevier}
}
