CN111611933B - Information extraction method and system for document image - Google Patents
Information extraction method and system for document image
- Publication number: CN111611933B
- Application number: CN202010441086.7A
- Authority: CN (China)
- Prior art keywords: character, identified, characters, image, text
- Prior art date: 2020-05-22
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V30/40—Document-oriented image-based pattern recognition
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Combinations of networks
- G06N3/08—Neural network learning methods
- G06V30/153—Segmentation of character regions using recognition of characters or words
- G06V30/10—Character recognition
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to an information extraction method and system for document images. The extraction method comprises the following steps: obtaining a character perception response map from the document image to be recognized based on a fully convolutional neural network; segmenting the character perception response map using a watershed algorithm to obtain a plurality of segmented images; extracting the characters in each segmented image by a connected-domain extraction method; recognizing each character and determining the position information of each character based on a character recognition model of a deep neural network; and merging the characters according to the position information to obtain the recognition information of the image to be recognized. The characters in each segmented image of the document image to be recognized are determined through the fully convolutional neural network, the watershed algorithm and the connected-domain extraction method, and the position information of each character is determined by the deep-neural-network character recognition model; the characters are then merged according to the position information, so that the recognition information of the image to be recognized is obtained accurately.
Description
Technical Field
The invention relates to the technical fields of image processing and text detection and recognition, and in particular to an information extraction method and system for document images.
Background
Character detection and recognition is an important technology with great application value and broad application prospects, particularly for the automatic recognition and entry of document images. For example, automatic recognition and entry of document images can be applied directly to bill recognition, report recognition, identity card recognition and entry, automatic bank card number recognition, business card recognition, and the like.
Most existing deep-learning-based text detection and recognition methods target character strings in scene images. Compared with scene text, a document image has the text region as its main body, a high text density, and structural information within the text region. Directly applying scene string detection methods leads to wrongly split detection boxes, uneven box lengths, and boxes crossing text fields, which complicates the processing and structured output of document recognition results.
Disclosure of Invention
To solve the above problems in the prior art, namely to improve the accuracy of information extraction from document images, the present invention aims to provide an information extraction method and system for document images.
To solve this technical problem, the invention provides the following scheme:
an information extraction method for a document image, the extraction method comprising:
obtaining a character perception response map from the document image to be recognized based on a fully convolutional neural network;
segmenting the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extracting the characters in each segmented image by a connected-domain extraction method;
recognizing each character and determining the position information of each character based on a character recognition model of a deep neural network;
and merging the characters according to the position information to obtain the recognition information of the image to be recognized.
Optionally, the character perception response map is an intensity map whose intensity decays gradually from the center of a character's rectangular bounding box to the character's edges.
Optionally, the fully convolutional neural network comprises a first feature extraction layer, a cross-layer feature fusion layer, and an intensity perception prediction layer connected in sequence;
obtaining the character perception response map from the document image to be recognized based on the fully convolutional neural network specifically comprises:
extracting features of the document image to be recognized through the first feature extraction layer to obtain image features at different levels;
through the cross-layer feature fusion layer, automatically assigning weights to the features at different levels, then weighting, concatenating, and fusing them in sequence to obtain a multi-channel fused feature map;
and mapping the multi-channel fused feature map into a single-channel intensity map through the intensity perception prediction layer, the single-channel intensity map being the character perception response map.
Optionally, segmenting the character perception response map using the watershed algorithm to obtain the segmented images specifically comprises:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character areas in the character perception response map;
and segmenting each character area by seed-point diffusion from the seed points to obtain the segmented images.
Optionally, segmenting the character perception response map using the watershed algorithm further comprises:
dividing the document image to be recognized into a plurality of text fields to be recognized according to the character areas, and determining the type attribute of each text field to be recognized.
Optionally, merging the characters according to the position information to obtain the recognition information of the image to be recognized specifically comprises:
for each text field to be recognized,
sorting the characters in the text field from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between characters;
sorting the characters within each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain a recognition result for each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the image to be recognized from the character strings of the text fields to be recognized.
Optionally, the information extraction method further comprises:
storing the character string of each text field to be recognized under the corresponding type entry in a database according to the field's type attribute.
To solve the technical problem, the invention also provides the following scheme:
an information extraction system for a document image, the information extraction system comprising:
a determining unit, configured to obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
a segmentation unit, configured to segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
an extraction unit, configured to extract the characters in each segmented image by a connected-domain extraction method;
a recognition unit, configured to recognize each character and determine the position information of each character based on a character recognition model of a deep neural network;
and a merging unit, configured to merge the characters according to the position information to obtain the recognition information of the image to be recognized.
To solve the technical problem, the invention also provides the following scheme:
an information extraction system for a document image, comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extract the characters in each segmented image by a connected-domain extraction method;
recognize each character and determine the position information of each character based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the image to be recognized.
To solve the technical problem, the invention also provides the following scheme:
a computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to:
obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extract the characters in each segmented image by a connected-domain extraction method;
recognize each character and determine the position information of each character based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the image to be recognized.
The embodiments of the invention provide the following technical effects:
the characters in each segmented image of the document image to be recognized are determined through the fully convolutional neural network, the watershed algorithm and the connected-domain extraction method, and the position information of each character is determined by the character recognition model based on a deep neural network; the characters are then merged according to the position information, so that the recognition information of the image to be recognized is obtained accurately.
Drawings
FIG. 1 is a flow chart of the information extraction method for document images of the present invention;
FIG. 2 is a schematic diagram of determining the character perception response map based on the fully convolutional neural network;
FIG. 3 is a schematic diagram of the channel-sensitive attention mechanism for cross-layer feature fusion;
FIG. 4 is a schematic block diagram of the information extraction system for document images of the present invention.
Symbol description:
determining unit: 1; segmentation unit: 2; extraction unit: 3; recognition unit: 4; merging unit: 5.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
The invention aims to provide an information extraction method for document images, in which the characters in each segmented image of the document image to be recognized are determined through a fully convolutional neural network, a watershed algorithm and a connected-domain extraction method, and the position information of each character is determined by a character recognition model based on a deep neural network; the characters are then merged according to the position information, so that the recognition information of the image to be recognized is obtained accurately.
To make the above objects, features and advantages of the present invention more readily apparent, the invention is described in more detail below with reference to the accompanying drawings and the following detailed description.
As shown in FIG. 1, the information extraction method for document images of the present invention comprises:
Step 100: obtaining a character perception response map from the document image to be recognized based on a fully convolutional neural network;
Step 200: segmenting the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
Step 300: extracting the characters in each segmented image by a connected-domain extraction method;
Step 400: recognizing each character and determining the position information of each character based on a character recognition model of a deep neural network;
Step 500: merging the characters according to the position information to obtain the recognition information of the image to be recognized.
The character perception response map is an intensity map whose intensity decays gradually from the center of a character's rectangular bounding box to the character's edges.
As shown in FIG. 2, the fully convolutional neural network includes a first feature extraction layer, a cross-layer feature fusion layer, and an intensity perception prediction layer connected in sequence.
In step 100, obtaining the character perception response map from the document image to be recognized based on the fully convolutional neural network specifically includes:
Step 110: extracting features of the document image to be recognized through the first feature extraction layer to obtain image features at different levels.
The first feature extraction layer is a multi-layer convolutional neural network used to extract image features at different levels.
Step 120: through the cross-layer feature fusion layer, automatically assigning weights to the features at different levels, then weighting, concatenating, and fusing them in sequence to obtain a multi-channel fused feature map.
The purpose of cross-layer feature fusion is to fuse features at different levels effectively. As shown in FIG. 3, the cross-layer feature fusion layer uses a channel-sensitive attention mechanism to automatically assign weights to the feature layers and combine them by weighting. For two feature layers to be fused, X_1 and X_2, where the spatial size of X_2 is 1/2 that of X_1: X_2 is first upsampled by a factor of 2 to obtain unpool_2(X_2), which is then concatenated with X_1. A 1×1 convolution then reduces the number of channels to 2, yielding the weight matrix [A_1, A_2]; the first channel A_1 weights the feature layer X_1, and the second channel A_2 weights the feature layer unpool_2(X_2). The weighted feature layers are concatenated and fused by another 1×1 convolution to obtain the fused feature:

[A_1, A_2] = Conv_{1×1}([X_1; unpool_2(X_2)])

G = Conv_{1×1}([A_1 * X_1; A_2 * unpool_2(X_2)])

where G is the feature map after fusion, the operator [ · ; · ] denotes concatenation along the feature channel, and the operator * denotes the element-wise product of a feature map and a weight matrix.
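For illustration only, this fusion step can be sketched as the following PyTorch module. The module and parameter names, the bilinear upsampling, and the equal channel counts of the two inputs are assumptions of the sketch; the patent itself fixes only the 1×1 convolutions, the concatenation, and the per-channel weighting.

```python
# Minimal sketch of the channel-sensitive attention fusion (assumptions:
# X1 and X2 have the same channel count, and unpool_2 is bilinear upsampling).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv reducing the concatenated features to a 2-channel weight map [A1, A2]
        self.weight_conv = nn.Conv2d(2 * channels, 2, kernel_size=1)
        # 1x1 conv fusing the weighted, concatenated features into G
        self.fuse_conv = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        # x2 has half the spatial size of x1, so upsample by 2: unpool_2(X2)
        x2_up = F.interpolate(x2, scale_factor=2, mode="bilinear",
                              align_corners=False)
        cat = torch.cat([x1, x2_up], dim=1)          # [X1; unpool_2(X2)]
        a = self.weight_conv(cat)                    # weight matrix [A1, A2]
        a1, a2 = a[:, 0:1], a[:, 1:2]                # one weight channel per layer
        weighted = torch.cat([a1 * x1, a2 * x2_up], dim=1)
        return self.fuse_conv(weighted)              # fused feature map G
```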
Step 130: mapping the multi-channel fused feature map into a single-channel intensity map through the intensity perception prediction layer, the single-channel intensity map being the character perception response map.
The intensity perception prediction layer maps the multi-channel feature map into a single-channel intensity map. In this embodiment, the character perception response map predicted by the deep neural network is an intensity map whose intensity decays gradually from the center of a character's rectangular bounding box to the character's edges; in the training stage, the training labels are generated by filling each text character's bounding box with a two-dimensional Gaussian intensity distribution.
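For illustration, the label for one image can be generated with the following NumPy sketch; the Gaussian width of a quarter of the box side is an assumption of the sketch, as the patent does not fix the standard deviation.

```python
# Minimal sketch: fill each character bounding box with a 2-D Gaussian
# peaking at the box centre (sigma = side / 4 is an assumed choice).
import numpy as np

def gaussian_label(height, width, char_boxes):
    """char_boxes: list of (x0, y0, x1, y1) character bounding boxes."""
    label = np.zeros((height, width), dtype=np.float32)
    for x0, y0, x1, y1 in char_boxes:
        w, h = x1 - x0, y1 - y0
        ys, xs = np.mgrid[0:h, 0:w]
        cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
        g = np.exp(-0.5 * (((xs - cx) / (w / 4.0)) ** 2
                           + ((ys - cy) / (h / 4.0)) ** 2))
        # keep the maximum where boxes overlap
        label[y0:y1, x0:x1] = np.maximum(label[y0:y1, x0:x1], g)
    return label
```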
Optionally, in step 200, segmenting the character perception response map using the watershed algorithm to obtain the segmented images specifically includes:
Step 210: binarizing the character perception response map with a preset first threshold to determine seed points;
Step 220: binarizing the character perception response map with a preset second threshold to determine all character areas in the character perception response map;
Step 230: segmenting each character area by seed-point diffusion from the seed points to obtain the segmented images.
The first threshold is higher than the second threshold.
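For illustration, steps 210 to 230 can be sketched with OpenCV as follows; the threshold values 0.7 and 0.3 and all names are assumptions of the sketch, the patent requiring only that the first (seed) threshold exceed the second.

```python
# Minimal sketch of the two-threshold watershed segmentation (steps 210-230).
import cv2
import numpy as np

def watershed_segment(response_map, t_seed=0.7, t_region=0.3):
    """response_map: float32 array in [0, 1] predicted by the FCN."""
    seeds = (response_map > t_seed).astype(np.uint8)     # step 210: one blob per character
    region = (response_map > t_region).astype(np.uint8)  # step 220: full character areas
    n, markers = cv2.connectedComponents(seeds)          # label each seed blob
    markers = markers + 1                                # background becomes label 1
    markers[(region == 1) & (seeds == 0)] = 0            # unknown band, to be flooded
    bgr = cv2.cvtColor((response_map * 255).astype(np.uint8), cv2.COLOR_GRAY2BGR)
    markers = cv2.watershed(bgr, markers.astype(np.int32))  # step 230: seed diffusion
    # labels 2..n are the segmented character regions, one boolean mask each
    return [(markers == lbl) for lbl in range(2, n + 1)]
```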
Preferably, segmenting the character perception response map using the watershed algorithm further includes:
dividing the document image to be recognized into a plurality of text fields to be recognized according to the character areas, and determining the type attribute of each text field to be recognized.
In step 400, the character recognition model of the deep neural network includes a second feature extraction layer consisting of a multi-layer convolutional neural network, a fully connected layer, and a Softmax classification output layer.
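For illustration, such a character recognition model can be sketched in PyTorch as follows; the layer sizes, the 32×32 grayscale input, and the class count are assumptions of the sketch, the patent fixing only the convolutional feature extractor, the fully connected layer, and the Softmax output.

```python
# Minimal sketch of the character recognition model: CNN feature extractor
# + fully connected layer + softmax output (all sizes are assumed).
import torch
import torch.nn as nn

class CharRecognizer(nn.Module):
    def __init__(self, num_classes: int = 3755):     # e.g. GB2312 level-1 Hanzi
        super().__init__()
        self.features = nn.Sequential(                # second feature extraction layer
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(128 * 4 * 4, num_classes)  # fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)              # (N, 128, 4, 4) for a 32x32 input
        x = torch.flatten(x, 1)
        return torch.softmax(self.fc(x), dim=1)  # Softmax classification output
```

In practice the softmax would typically be applied only at inference, training on the logits with a cross-entropy loss; the patent specifies only the Softmax classification output layer.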
In step 500, merging the characters according to the position information to obtain the recognition information of the image to be recognized specifically includes:
Step 510: for each text field to be recognized, sorting the characters in the text field from top to bottom according to the position information;
Step 520: dividing the top-to-bottom sorted characters into different text lines according to the gaps between characters;
Step 530: sorting the characters within each text line from left to right according to the position information;
Step 540: merging the characters in each text line from left to right to obtain a recognition result for each text line;
Step 550: merging the text line recognition results from top to bottom to obtain a character string;
Step 560: determining the recognition information of the image to be recognized from the character strings of the text fields to be recognized.
Further, the information extraction method of the present invention also includes:
storing the character string of each text field to be recognized under the corresponding type entry in a database according to the field's type attribute.
The invention is suitable for the recognition and entry of various document images, and can effectively improve the speed and accuracy of automatic recognition of document-type images.
In addition, the invention also provides an information extraction system for document images, which can improve the accuracy of information extraction from document images.
As shown in FIG. 4, the information extraction system for document images of the present invention includes a determining unit 1, a segmentation unit 2, an extraction unit 3, a recognition unit 4, and a merging unit 5.
Specifically, the determining unit 1 is configured to obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
the segmentation unit 2 is configured to segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
the extraction unit 3 is configured to extract the characters in each segmented image by a connected-domain extraction method;
the recognition unit 4 is configured to recognize each character and determine the position information of each character based on a character recognition model of a deep neural network;
the merging unit 5 is configured to merge the characters according to the position information to obtain the recognition information of the image to be recognized.
In addition, the invention also provides an information extraction system for a document image, comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extract the characters in each segmented image by a connected-domain extraction method;
recognize each character and determine the position information of each character based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the image to be recognized.
Further, the present invention provides a computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to:
obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extract the characters in each segmented image by a connected-domain extraction method;
recognize each character and determine the position information of each character based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the image to be recognized.
Compared with the prior art, the information extraction system and the computer-readable storage medium for document images have the same beneficial effects as the information extraction method for document images, which are not repeated here.
The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings, but it is readily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions of the related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions fall within the scope of the present invention.
Claims (7)
1. An information extraction method for a document image, characterized in that the extraction method comprises:
obtaining a character perception response map from the document image to be recognized based on a fully convolutional neural network;
segmenting the character perception response map using a watershed algorithm to obtain a plurality of segmented images, by:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character areas in the character perception response map;
and segmenting each character area by seed-point diffusion from the seed points to obtain the segmented images;
and further comprising: dividing the document image to be recognized into a plurality of text fields to be recognized according to the character areas, and determining the type attribute of each text field to be recognized;
extracting the characters in each segmented image by a connected-domain extraction method;
recognizing each character and determining the position information of each character based on a character recognition model of a deep neural network; and
merging the characters according to the position information to obtain the recognition information of the document image to be recognized, by:
for each text field to be recognized,
sorting the characters in the text field from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between characters;
sorting the characters within each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain a recognition result for each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the document image to be recognized from the character strings of the text fields to be recognized.
2. The information extraction method for a document image according to claim 1, wherein the character perception response map is an intensity map whose intensity decays gradually from the center of a character's rectangular bounding box to the character's edges.
3. The information extraction method for a document image according to claim 1, wherein the fully convolutional neural network comprises a first feature extraction layer, a cross-layer feature fusion layer, and an intensity perception prediction layer connected in sequence;
obtaining the character perception response map from the document image to be recognized based on the fully convolutional neural network specifically comprises:
extracting features of the document image to be recognized through the first feature extraction layer to obtain image features at different levels;
through the cross-layer feature fusion layer, automatically assigning weights to the features at different levels, then weighting, concatenating, and fusing them in sequence to obtain a multi-channel fused feature map;
and mapping the multi-channel fused feature map into a single-channel intensity map through the intensity perception prediction layer, the single-channel intensity map being the character perception response map.
4. The information extraction method for a document image according to claim 1, characterized in that the information extraction method further comprises:
storing the character string of each text field to be recognized under the corresponding type entry in a database according to the field's type attribute.
5. An information extraction system for a document image, characterized by comprising:
a determining unit, configured to obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
a segmentation unit, configured to segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images, by:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character areas in the character perception response map;
and segmenting each character area by seed-point diffusion from the seed points to obtain the segmented images;
and further: dividing the document image to be recognized into a plurality of text fields to be recognized according to the character areas, and determining the type attribute of each text field to be recognized;
an extraction unit, configured to extract the characters in each segmented image by a connected-domain extraction method;
a recognition unit, configured to recognize each character and determine the position information of each character based on a character recognition model of a deep neural network; and
a merging unit, configured to merge the characters according to the position information to obtain the recognition information of the document image to be recognized, by:
for each text field to be recognized,
sorting the characters in the text field from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between characters;
sorting the characters within each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain a recognition result for each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the document image to be recognized from the character strings of the text fields to be recognized.
6. An information extraction system for a document image, comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images, by:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character areas in the character perception response map;
and segmenting each character area by seed-point diffusion from the seed points to obtain the segmented images;
and further: dividing the document image to be recognized into a plurality of text fields to be recognized according to the character areas, and determining the type attribute of each text field to be recognized;
extract the characters in each segmented image by a connected-domain extraction method;
recognize each character and determine the position information of each character based on a character recognition model of a deep neural network; and
merge the characters according to the position information to obtain the recognition information of the document image to be recognized, by:
for each text field to be recognized,
sorting the characters in the text field from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between characters;
sorting the characters within each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain a recognition result for each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the document image to be recognized from the character strings of the text fields to be recognized.
7. A computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to:
obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images, by:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character areas in the character perception response map;
and segmenting each character area by seed-point diffusion from the seed points to obtain the segmented images;
and further: dividing the document image to be recognized into a plurality of text fields to be recognized according to the character areas, and determining the type attribute of each text field to be recognized;
extract the characters in each segmented image by a connected-domain extraction method;
recognize each character and determine the position information of each character based on a character recognition model of a deep neural network; and
merge the characters according to the position information to obtain the recognition information of the document image to be recognized, by:
for each text field to be recognized,
sorting the characters in the text field from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between characters;
sorting the characters within each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain a recognition result for each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the document image to be recognized from the character strings of the text fields to be recognized.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010441086.7A (CN111611933B) | 2020-05-22 | 2020-05-22 | Information extraction method and system for document image |

Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010441086.7A (CN111611933B) | 2020-05-22 | 2020-05-22 | Information extraction method and system for document image |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111611933A | 2020-09-01 |
| CN111611933B | 2023-07-14 |
Family
- ID=72199241

Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010441086.7A (CN111611933B, Active) | Information extraction method and system for document image | 2020-05-22 | 2020-05-22 |

Country Status (1)
| Country | Link |
|---|---|
| CN | CN111611933B (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112132150B * | 2020-09-15 | 2024-05-28 | 上海高德威智能交通系统有限公司 | Text string recognition method and device and electronic equipment |
| CN112580655B * | 2020-12-25 | 2021-10-08 | 特赞(上海)信息科技有限公司 | Text detection method and device based on improved CRAFT |
| CN113554549B * | 2021-07-27 | 2024-03-29 | 深圳思谋信息科技有限公司 | Text image generation method, device, computer equipment and storage medium |
| CN113554027B * | 2021-08-09 | 2024-10-15 | 深圳市迪博企业风险管理技术有限公司 | Method for calibrating and extracting reimbursement bill image text information |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106845475A * | 2016-12-15 | 2017-06-13 | 西安电子科技大学 | Natural scene character detecting method based on connected domain |
| CN109117713A * | 2018-06-27 | 2019-01-01 | 淮阴工学院 | Drawing layout analysis and character recognition method based on a fully convolutional neural network |
| WO2020010547A1 * | 2018-07-11 | 2020-01-16 | 深圳前海达闼云端智能科技有限公司 | Character identification method and apparatus, and storage medium and electronic device |
Also Published As
| Publication Number | Publication Date |
|---|---|
| CN111611933A | 2020-09-01 |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |