forked from ai8hyf/TF-ID

TF-ID: Table/Figure IDentifier for academic papers


TF-ID

This repository contains the full training code to reproduce all TF-ID models. The model weights and the human-annotated dataset are also open-sourced, all under the MIT License.

Model Summary

TF-ID (Table/Figure IDentifier) is a family of object detection models, created by Yifei Hu, that are finetuned to extract tables and figures from academic papers. They come in four versions:

| Model | Model size | Model Description |
| --- | --- | --- |
| TF-ID-base[HF] | 0.23B | Extract tables/figures and their caption text |
| TF-ID-large[HF] (Recommended) | 0.77B | Extract tables/figures and their caption text |
| TF-ID-base-no-caption[HF] | 0.23B | Extract tables/figures without caption text |
| TF-ID-large-no-caption[HF] (Recommended) | 0.77B | Extract tables/figures without caption text |

All TF-ID models are finetuned from microsoft/Florence-2 checkpoints.
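
Since the models are object detectors built on Florence-2, their predictions are typically consumed as labeled bounding boxes. The sketch below shows one way to flatten such a result into a list of detections. It assumes the prediction has already been parsed into the dictionary shape that the Florence-2 processor returns for the `<OD>` task (`{"<OD>": {"bboxes": [...], "labels": [...]}}`); running the model itself is not shown, and the helper name `collect_detections` is hypothetical rather than part of this repository.

```python
def collect_detections(result, task="<OD>"):
    """Flatten a parsed object-detection result into a list of dicts.

    Assumption: `result` looks like
    {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": ["table", ...]}}
    with pixel coordinates. This helper is illustrative only.
    """
    parsed = result.get(task, {})
    detections = []
    for label, box in zip(parsed.get("labels", []), parsed.get("bboxes", [])):
        # Round to whole pixels so the box can be used directly for cropping.
        x1, y1, x2, y2 = (int(round(v)) for v in box)
        detections.append({"label": label, "box": (x1, y1, x2, y2)})
    return detections


# Example with a hand-written result (not real model output).
example = {"<OD>": {"bboxes": [[10.4, 20.6, 110.2, 220.9]], "labels": ["table"]}}
detections = collect_detections(example)
```

Each `box` tuple is ordered as (left, upper, right, lower), which is the order Pillow's `Image.crop` expects, so a detection could be cropped out of the page image in a follow-up step.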

Train TF-ID models from scratch

  1. Clone the repo: git clone https://github.com/ai8hyf/TF-ID
  2. cd TF-ID
  3. Download the dataset huggingface.co/datasets/yifeihu/TF-ID-arxiv-papers from Hugging Face
  4. Move annotations_with_caption.json to ./annotations (Use annotations_no_caption.json if you don't want the bounding boxes to include text captions)
  5. Unzip the arxiv_paper_images.zip and move the .png images to ./images
  6. Convert the COCO-format dataset to Florence-2 format: python coco_to_florence.py
  7. You should see train.jsonl and test.jsonl under ./annotations
  8. Train the model with Accelerate: accelerate launch train.py
  9. The checkpoints will be saved under ./model_checkpoints
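
To make the conversion in step 6 more concrete, here is a minimal sketch of how COCO annotations can be turned into Florence-2 style object-detection targets. It is not the repository's `coco_to_florence.py`. It assumes the common convention of quantizing box corners into 1000 location bins (`<loc_0>` to `<loc_999>`) and of storing one record per image with `image`, `prefix`, and `suffix` fields; those field names, the `BINS` constant, and both function names are assumptions made for illustration.

```python
BINS = 1000  # assumed number of location bins: <loc_0> .. <loc_999>


def coco_box_to_loc(bbox, img_w, img_h):
    """Convert a COCO [x, y, width, height] box into four <loc_*> tokens."""
    x, y, w, h = bbox
    # COCO stores top-left corner plus size; Florence-2 style targets use two corners.
    corners = [(x, img_w), (y, img_h), (x + w, img_w), (y + h, img_h)]
    tokens = []
    for value, size in corners:
        bin_idx = int(value * BINS / size)
        bin_idx = max(0, min(BINS - 1, bin_idx))  # clamp to the valid bin range
        tokens.append(f"<loc_{bin_idx}>")
    return "".join(tokens)


def coco_to_records(coco):
    """Build one record per image; field names are assumed, not confirmed."""
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    images = {img["id"]: img for img in coco["images"]}
    suffixes = {img_id: "" for img_id in images}
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        loc = coco_box_to_loc(ann["bbox"], img["width"], img["height"])
        suffixes[ann["image_id"]] += categories[ann["category_id"]] + loc
    return [
        {"image": images[img_id]["file_name"], "prefix": "<OD>", "suffix": suffix}
        for img_id, suffix in suffixes.items()
    ]


# Tiny hand-written COCO sample (not taken from the real dataset).
sample = {
    "images": [{"id": 1, "file_name": "page_1.png", "width": 1000, "height": 2000}],
    "categories": [{"id": 0, "name": "table"}, {"id": 1, "name": "figure"}],
    "annotations": [
        {"image_id": 1, "category_id": 0, "bbox": [100, 200, 300, 400]},
        {"image_id": 1, "category_id": 1, "bbox": [0, 1000, 1000, 1000]},
    ],
}
records = coco_to_records(sample)
```

Writing each record as one JSON object per line would yield files of the same kind as the `train.jsonl` and `test.jsonl` mentioned in step 7, although the exact fields the repository script emits may differ.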

Hardware Requirement

With microsoft/Florence-2-large-ft, BATCH_SIZE=4 requires at least 40GB of VRAM on a single GPU. The microsoft/Florence-2-base-ft model takes much less VRAM. Please modify the BATCH_SIZE and CHECKPOINT parameters in train.py before you start training.
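
As an illustration, the two parameters might be set as follows on a GPU with less than 40GB of VRAM. The batch size here is a hypothetical starting point, not a tested recommendation, and the exact layout of `train.py` may differ.

```python
# Hypothetical settings for a smaller GPU; adjust to your hardware.
CHECKPOINT = "microsoft/Florence-2-base-ft"  # base checkpoint needs much less VRAM than large
BATCH_SIZE = 2  # illustrative value; lower it further if you run out of memory
```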

Acknowledgement

  • I learned how to work with Florence-2 models from Roboflow's awesome tutorial.
  • My friend Yi Zhang helped annotate some of the data used to train our proof-of-concept models, including a YOLO-based TF-ID model.

Citation

If you find TF-ID useful, please cite this project as:

@misc{TF-ID,
  author = {Yifei Hu},
  title = {TF-ID: Table/Figure IDentifier for academic papers},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ai8hyf/TF-ID}},
}
