Fault-tolerant Python 3 package for searching, navigating, and modifying LaTeX documents

Python 270 42 Updated Apr 10, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models

75 3 Updated Jul 23, 2024

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 8,507 545 Updated Apr 16, 2024

Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)

Python 28 2 Updated Jun 6, 2024

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python 1,167 70 Updated Jul 16, 2024

EfficientViT is a new family of vision models for efficient high-resolution vision.

Python 1,654 147 Updated Jul 11, 2024

[arXiv preprint] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

Python 165 9 Updated Jun 29, 2024

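A digital media provenance-tracing application based on visible watermark detection and recognition. It is my course project and includes both the system and an open-source, large-scale dataset of common watermark images (Large-scale Common Watermark Dataset, LCWD). Given an image or video carrying a visible watermark, the system detects and localizes the watermark region, extracts it, and then uses the OCR and logo recognition services of the Baidu AI Open Platform, together with the Bing search engine, to trace the image or video back to its source.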

Python 80 14 Updated Nov 7, 2022

A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".

Python 2,054 472 Updated Mar 11, 2024

(CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting.

Python 40 Updated Jun 11, 2024

The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.

Python 24 3 Updated Jun 20, 2024

The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Python 35 4 Updated Jun 14, 2024
Python 7,029 540 Updated Jul 13, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 20,843 1,973 Updated Jul 16, 2024

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Python 1,585 110 Updated Jul 14, 2024

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

Python 55,002 6,705 Updated Jul 22, 2024

An index of PDF-centric corpora

92 8 Updated Jul 13, 2024

✨✨Latest Advances on Multimodal Large Language Models

10,789 715 Updated Jul 23, 2024

🎨 Python Echarts Plotting Library

Python 14,677 2,841 Updated Jul 10, 2024

A comprehensive list of awesome document image rectification papers.

2 Updated Dec 26, 2023

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python 11,454 953 Updated Jul 5, 2024

(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

Python 71 6 Updated Apr 9, 2024

Deep Learning for Camera Calibration and Beyond: A Survey

553 55 Updated Jun 4, 2024

A comprehensive list of awesome document image rectification papers.

341 30 Updated Apr 2, 2024
Python 1 Updated May 30, 2023

The official implementation of SPTS v2: Single-Point Text Spotting

Python 119 16 Updated Jun 29, 2023
20 Updated May 30, 2023

Official implementation of SPTS: Single-Point Text Spotting (ACM MM 2022 Oral)

Python 135 12 Updated Jul 26, 2023

Google Research

Jupyter Notebook 33,484 7,791 Updated Jul 22, 2024