Overview - ICDAR2019 Robust Reading Challenge on Large-scale Street View Text with Partial Labeling

This challenge focuses on scene text reading in natural images, which breaks down into scene text detection and spotting problems, based on the proposed Large-scale Street View Text with Partial Labeling (LSVT) dataset. LSVT consists of 450,000 images, at least 14 times as many as existing robust reading benchmarks [1], and is the first scene text dataset labeled with partial annotations for the text detection and recognition challenges. The fully annotated portion alone is also larger than previous robust reading benchmarks. The competition comprises two main tasks, which are detailed on the Tasks page.


Overview

LSVT consists of 20,000 test images, 30,000 training images with full annotations, and 400,000 training images with weak annotations, which are referred to as partial labels. We intend to challenge the community to investigate novel solutions that can further boost performance by exploiting partial labels. For most of the weakly annotated training data, only one transcription per image is provided, which we refer to as the 'text-of-interest'. All the images were captured in the streets and cover a large variety of complicated real-world scenarios, e.g., storefronts and landmarks, making the challenge highly demanding while narrowing the gap between research and real applications. Examples of images with full and weak annotations are shown in Fig. 1 and Fig. 2, respectively.

[Image: LSVT_Overview_Figure1.jpg]

Figure 1. Examples of images with full annotations. The ground-truth locations and corresponding text are shown in these images. The labeled text in yellow includes Chinese characters, digits, and Latin characters. The labeled text regions demonstrate the diversity of text in our dataset, including horizontal text, vertical text, curved text, and text with perspective distortion. Horizontal and vertical text instances are annotated with quadrilaterals, and curved text instances are annotated with polygon-shaped bounding regions.


[Image: LSVT_Overview_Figure21.jpg]

Figure 2. Examples of images with weak annotations. Note that only the transcription of the text-of-interest in these images is given as ground truth, without location annotations, which is much cheaper to collect.

The text-of-interest regions usually contain the names of storefronts or descriptions of landmarks, providing meaningful information for localization and navigation.

Text instances in the LSVT dataset are annotated with (a) locations, given as quadrilateral bounding boxes or as polygons with 8 or 12 vertices (more details on the Tasks page), and (b) transcriptions. Together, these annotations cater for the (a) text detection and (b) text spotting tasks proposed by this challenge; a sketch of what such annotation records might look like follows below.
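To make the annotation structure concrete, here is a minimal parsing sketch assuming a JSON label file in which each image maps to a list of text instances. The file layout and the field names ("points", "transcription") are illustrative assumptions for demonstration only; the authoritative file formats are specified on the Tasks page.

```python
import json

def load_annotations(path):
    """Parse an LSVT-style JSON label file (hypothetical layout).

    Assumed structure:
      full labels: {"<image_id>": [{"points": [[x, y], ...], "transcription": "..."}, ...]}
      weak labels: {"<image_id>": [{"transcription": "text-of-interest"}]}
    """
    with open(path, encoding="utf-8") as f:
        labels = json.load(f)
    parsed = {}
    for image_id, instances in labels.items():
        regions = []
        for inst in instances:
            # Full annotations carry a polygon with 4, 8, or 12 vertices;
            # weak annotations carry only the transcription, so the
            # polygon is None for those records.
            polygon = inst.get("points")
            text = inst["transcription"]
            regions.append((polygon, text))
        parsed[image_id] = regions
    return parsed
```

Keeping the two label types in one record shape, with the polygon simply absent for weak labels, lets a single data pipeline feed both the fully and the partially supervised portions of the training set.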


References

[1] T.-L. Yuan et al., "Chinese Text in the Wild," arXiv preprint arXiv:1803.00085, 2018.
[2] Y. Sun, J. Liu, W. Liu, J. Han, E. Ding, "Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning," in Proc. of ICCV, 2019.

Important Dates

1st January to 1st March

i) Q&A period for the competition,

ii) Launch of the initial website.

15th February to 1st March

i) Formal announcement of the competition,

ii) Publicity,

iii) Sample training images available,

iv) Evaluation protocol, file formats, etc. available.

25th February

i) Evaluation tools ready,

ii) Full website ready.

1st March

i) Competition kicks off officially,

ii) Release of training set images and ground truth.

9th April

Release of the first part of the test set images (10,000 images).

20th April

i) Release of the second part of the test set images (10,000 images),

ii) Website opens for results submission.

30th April

i) Competition deadline; results submission closes (23:59 PDT),

ii) Release of the evaluation results.

5th May

i) Submission deadline for the one-page competition report; the final ranking will be released after results are checked.

20th to 25th September

i) Announcement of competition results at ICDAR2019.