CN111611933B - Information extraction method and system for document image - Google Patents


Info

Publication number
CN111611933B
Authority
CN
China
Prior art keywords
character
identified
characters
image
text
Prior art date
Legal status
Active
Application number
CN202010441086.7A
Other languages
Chinese (zh)
Other versions
CN111611933A (en)
Inventor
王春恒
杜臣
史存召
肖柏华
王燕娜
Current Assignee
Institute of Automation, Chinese Academy of Sciences
Original Assignee
Institute of Automation, Chinese Academy of Sciences
Priority date
Filing date
Publication date
Application filed by Institute of Automation, Chinese Academy of Sciences
Priority to CN202010441086.7A
Publication of CN111611933A
Application granted
Publication of CN111611933B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an information extraction method and system for document images. The extraction method comprises: obtaining a character perception response map from the document image to be recognized, based on a fully convolutional neural network; segmenting the character perception response map using a watershed algorithm to obtain a plurality of segmented images; extracting the characters in each segmented image by a connected-component extraction method; recognizing each character and determining the position information of each character, based on a character recognition model of a deep neural network; and merging the characters according to the position information to obtain the recognition information of the image to be recognized. By locating the characters in each segmented image with the fully convolutional neural network, the watershed algorithm, and connected-component extraction, recognizing and positioning each character with the deep-neural-network character recognition model, and merging the characters by position, the method obtains the recognition information of the image to be recognized accurately.

Description

Information extraction method and system for document image
Technical Field
The invention relates to the technical field of image processing and text detection and recognition, and in particular to an information extraction method and system for document images.
Background
Text detection and recognition is an important technology with great application value and broad application prospects, particularly for the automatic recognition and entry of document images. For example, automatic recognition and entry of document images can be applied directly to bill recognition, report recognition, ID card recognition and entry, automatic bank card number recognition, business card recognition, and the like.
Most existing deep-learning-based text detection and recognition methods are aimed at detecting and recognizing character strings in natural scene images. Compared with scene text detection and recognition, a document image takes text regions as its main body, its text density is high, and its text regions carry structural information. Directly adopting a scene string detection method leads to problems such as wrongly split detection boxes, uneven box lengths, and boxes crossing text fields, which makes it difficult to process document recognition results and produce structured output.
Disclosure of Invention
In order to solve the above problems in the prior art, that is, to improve the accuracy of information extraction from document images, the present invention aims to provide an information extraction method and system for document images.
In order to solve the technical problems, the invention provides the following scheme:
an information extraction method for a document image, the extraction method comprising:
obtaining a character perception response map from the document image to be recognized, based on a fully convolutional neural network;
segmenting the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extracting the characters in each segmented image by a connected-component extraction method;
recognizing each character and determining the position information of each character, based on a character recognition model of a deep neural network;
and merging the characters according to the position information to obtain the recognition information of the image to be recognized.
Optionally, the character perception response map is an intensity map whose intensity decays gradually from the center of a character's rectangular bounding box to the character's edges.
Optionally, the fully convolutional neural network comprises a first feature extraction layer, a cross-layer feature fusion layer, and an intensity perception prediction layer connected in sequence;
obtaining the character perception response map from the document image to be recognized based on the fully convolutional neural network comprises the following steps:
extracting features from the document image to be recognized through the first feature extraction layer to obtain image features of different levels;
automatically assigning weights to the features of different levels through the cross-layer feature fusion layer, then weighting, concatenating, and fusing them to obtain a multi-channel fused feature map;
and mapping the multi-channel fused feature map into a single-channel intensity map through the intensity perception prediction layer, the single-channel intensity map being the character perception response map.
Optionally, segmenting the character perception response map using the watershed algorithm to obtain the segmented images specifically includes:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character regions in the character perception response map;
and segmenting each character region by seed-point diffusion from the seed points to obtain the segmented images.
Optionally, segmenting the character perception response map using the watershed algorithm further includes:
dividing the document image to be recognized into a plurality of text fields to be recognized according to the character regions, and determining the type attribute of each text field to be recognized.
Optionally, merging the characters according to the position information to obtain the recognition information of the image to be recognized specifically includes:
for each text field to be recognized,
sorting the characters in the text field to be recognized from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between the characters;
sorting the characters in each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain the recognition result of each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the image to be recognized from the character strings of the text fields to be recognized.
Optionally, the information extraction method further includes:
storing the character string of each text field to be recognized under the corresponding type entry in a database, according to the type attribute of the text field to be recognized.
In order to solve the technical problems, the invention also provides the following scheme:
an information extraction system for a document image, the information extraction system comprising:
a determining unit, configured to obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
a segmentation unit, configured to segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
an extraction unit, configured to extract the characters in each segmented image by a connected-component extraction method;
a recognition unit, configured to recognize each character and determine the position information of each character based on a character recognition model of a deep neural network;
and a merging unit, configured to merge the characters according to the position information to obtain the recognition information of the image to be recognized.
In order to solve the technical problems, the invention also provides the following scheme:
an information extraction system of a document image, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
obtain a character perception response map from the document image to be recognized, based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extract the characters in each segmented image by a connected-component extraction method;
recognize each character and determine the position information of each character, based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the image to be recognized.
In order to solve the technical problems, the invention also provides the following scheme:
a computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to:
obtain a character perception response map from the document image to be recognized, based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extract the characters in each segmented image by a connected-component extraction method;
recognize each character and determine the position information of each character, based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the image to be recognized.
The embodiments of the invention provide the following technical effects:
the characters in each segmented image of the document image to be recognized are located through the fully convolutional neural network, the watershed algorithm, and the connected-component extraction method, and the position information of each character is determined by the character recognition model based on a deep neural network; the characters are then merged according to the position information, so that the recognition information of the image to be recognized can be obtained accurately.
Drawings
FIG. 1 is a flow chart of the information extraction method for a document image according to the present invention;
FIG. 2 is a schematic diagram of determining the character perception response map based on the fully convolutional neural network;
FIG. 3 is a schematic diagram of the channel-sensitive attention mechanism used for cross-layer feature fusion;
FIG. 4 is a schematic block diagram of the information extraction system for a document image according to the present invention.
Reference numerals:
determining unit 1, segmentation unit 2, extraction unit 3, recognition unit 4, merging unit 5.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
The invention aims to provide an information extraction method for document images, in which the characters in each segmented image of the document image to be recognized are located through a fully convolutional neural network, a watershed algorithm, and a connected-component extraction method, and the position information of each character is determined by a character recognition model based on a deep neural network; the characters are then merged according to the position information, so that the recognition information of the image to be recognized can be obtained accurately.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
As shown in fig. 1, the information extraction method for a document image according to the present invention includes:
Step 100: obtaining a character perception response map from the document image to be recognized, based on a fully convolutional neural network;
Step 200: segmenting the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
Step 300: extracting the characters in each segmented image by a connected-component extraction method;
Step 400: recognizing each character and determining the position information of each character, based on a character recognition model of a deep neural network;
Step 500: merging the characters according to the position information to obtain the recognition information of the image to be recognized.
The character perception response map is an intensity map whose intensity decays gradually from the center of a character's rectangular bounding box to the character's edges.
As shown in fig. 2, the fully convolutional neural network includes a first feature extraction layer, a cross-layer feature fusion layer, and an intensity perception prediction layer connected in sequence.
In step 100, obtaining the character perception response map from the document image to be recognized based on the fully convolutional neural network specifically includes:
Step 110: extracting features from the document image to be recognized through the first feature extraction layer to obtain image features of different levels.
The first feature extraction layer is a multi-layer convolutional neural network used to extract image features of different levels.
Step 120: automatically assigning weights to the features of different levels through the cross-layer feature fusion layer, then weighting, concatenating, and fusing them to obtain a multi-channel fused feature map.
The purpose of cross-layer feature fusion is to fuse the features of different levels effectively. As shown in fig. 3, the cross-layer feature fusion layer adopts a channel-sensitive attention mechanism to automatically assign weights to the features of different levels and combine them by weighting. Consider two feature layers to be fused, X1 and X2, where the spatial size of X2 is 1/2 that of X1. X2 is upsampled by a factor of 2 to obtain unpool2(X2), which is concatenated with X1. A 1x1 convolution then reduces the channel count to obtain a two-channel weight matrix [A1, A2]: the first channel A1 weights the feature layer X1, and the second channel A2 weights the feature layer unpool2(X2). The weighted feature layers are concatenated and fused by another 1x1 convolution to obtain the fused features:
[A1, A2] = Conv1x1([X1; unpool2(X2)])
G = Conv1x1([A1 * X1; A2 * unpool2(X2)])
where G is the feature map after fusion, the operator [ · ; · ] denotes concatenation along the feature channels, and * denotes the element-wise product of a feature map and its weight channel.
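The following is a minimal PyTorch sketch of the channel-sensitive attention fusion described above. The module name, the assumption that X1 and unpool2(X2) have the same channel count, and the use of bilinear upsampling are illustrative choices, not details fixed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerFusion(nn.Module):
    """Channel-sensitive attention fusion of two feature layers X1 and X2."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution that produces the 2-channel weight matrix [A1, A2]
        self.weight_conv = nn.Conv2d(2 * channels, 2, kernel_size=1)
        # 1x1 convolution that fuses the re-weighted, concatenated features
        self.fuse_conv = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        # unpool2(X2): upsample the smaller feature map by a factor of 2
        x2_up = F.interpolate(x2, scale_factor=2, mode="bilinear",
                              align_corners=False)
        # [A1, A2] = Conv1x1([X1; unpool2(X2)])
        weights = self.weight_conv(torch.cat([x1, x2_up], dim=1))
        a1, a2 = weights[:, 0:1], weights[:, 1:2]
        # G = Conv1x1([A1 * X1; A2 * unpool2(X2)])
        return self.fuse_conv(torch.cat([a1 * x1, a2 * x2_up], dim=1))
```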
Step 130: mapping the multi-channel fused feature map into a single-channel intensity map through the intensity perception prediction layer; the single-channel intensity map is the character perception response map.
The intensity perception prediction layer maps the multi-channel feature map into a single-channel intensity map. In this embodiment, the character perception response map predicted by the deep neural network is an intensity map whose intensity decays gradually from the center of each character's rectangular bounding box to the character's edges; in the training stage, the training labels are generated by filling each text character bounding box with a two-dimensional Gaussian intensity distribution.
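As a concrete illustration of this label-generation step, here is a minimal numpy sketch that fills each character bounding box with a two-dimensional Gaussian; the sigma_ratio parameter is an assumed choice, not a value given in the patent:

```python
import numpy as np

def gaussian_label(height, width, boxes, sigma_ratio=0.25):
    """Build a response-map label; boxes are (x1, y1, x2, y2) per character."""
    label = np.zeros((height, width), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        bw, bh = x2 - x1, y2 - y1
        ys, xs = np.mgrid[0:bh, 0:bw]
        cx, cy = (bw - 1) / 2.0, (bh - 1) / 2.0
        # intensity peaks at the box centre and decays toward the edges
        g = np.exp(-((xs - cx) ** 2 / (2 * (sigma_ratio * bw) ** 2)
                     + (ys - cy) ** 2 / (2 * (sigma_ratio * bh) ** 2)))
        # keep the maximum where neighbouring boxes overlap
        label[y1:y2, x1:x2] = np.maximum(label[y1:y2, x1:x2], g)
    return label
```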
Optionally, in step 200, segmenting the character perception response map using the watershed algorithm to obtain the segmented images specifically includes:
Step 210: binarizing the character perception response map with a preset first threshold to determine seed points;
Step 220: binarizing the character perception response map with a preset second threshold to determine all character regions in the character perception response map;
Step 230: segmenting each character region by seed-point diffusion from the seed points to obtain the segmented images.
The first threshold is higher than the second threshold.
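A minimal OpenCV sketch of this two-threshold seeded watershed follows. The threshold values 0.7 and 0.3 are placeholders (the patent only requires that the first threshold be higher than the second), and running cv2.watershed over a pseudo-color rendering of the response map is one possible realization of seed-point diffusion, not the patent's prescribed implementation:

```python
import cv2
import numpy as np

def segment_response_map(response, seed_thresh=0.7, region_thresh=0.3):
    """response: float32 character perception response map scaled to [0, 1]."""
    # Step 210: high threshold -> seed points (confident character centres)
    seeds = (response > seed_thresh).astype(np.uint8)
    # Step 220: low threshold -> full character regions
    regions = (response > region_thresh).astype(np.uint8)
    # Label each connected seed; shift labels so the sure background becomes 1
    _, markers = cv2.connectedComponents(seeds)
    markers = markers.astype(np.int32) + 1
    # Pixels inside a character region but outside any seed are "unknown";
    # the watershed floods them outward from the seeds (Step 230)
    markers[(regions == 1) & (seeds == 0)] = 0
    canvas = cv2.cvtColor((response * 255).astype(np.uint8),
                          cv2.COLOR_GRAY2BGR)
    markers = cv2.watershed(canvas, markers)
    return markers  # labels >= 2 are individual character regions
```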
Preferably, segmenting the character perception response map using the watershed algorithm further includes:
dividing the document image to be recognized into a plurality of text fields to be recognized according to the character regions, and determining the type attribute of each text field to be recognized.
In step 400, the character recognition model based on the deep neural network includes a second feature extraction layer composed of a multi-layer convolutional neural network, a fully connected layer, and a Softmax classification output layer.
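A minimal PyTorch sketch of such a classifier is given below; the depths, channel counts, and grayscale input are assumptions, since the patent specifies only the overall structure (convolutional feature extraction, a fully connected layer, and a Softmax output):

```python
import torch.nn as nn

class CharRecognizer(nn.Module):
    """Multi-layer CNN feature extractor + fully connected + Softmax output."""

    def __init__(self, num_classes: int):
        super().__init__()
        # second feature extraction layer: stacked convolutional blocks
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # fully connected layer with a Softmax classification output
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(128, num_classes), nn.Softmax(dim=1),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```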
In step 500, merging the characters according to the position information to obtain the recognition information of the image to be recognized specifically includes:
Step 510: for each text field to be recognized, sorting the characters in the text field from top to bottom according to the position information;
Step 520: dividing the top-to-bottom sorted characters into different text lines according to the gaps between the characters;
Step 530: sorting the characters in each text line from left to right according to the position information;
Step 540: merging the characters in each text line from left to right to obtain the recognition result of each text line;
Step 550: merging the text line recognition results from top to bottom to obtain a character string;
Step 560: determining the recognition information of the image to be recognized from the character strings of the text fields to be recognized.
Further, the information extraction method of the present invention also includes:
storing the character string of each text field to be recognized under the corresponding type entry in a database, according to the type attribute of the text field to be recognized.
The invention is suitable for recognizing and entering various kinds of document images, and can effectively improve the speed and accuracy of automatic document image recognition.
In addition, the invention provides an information extraction system for document images, which likewise improves the accuracy of information extraction from document images.
As shown in fig. 4, the information extraction system for a document image according to the present invention includes a determining unit 1, a segmentation unit 2, an extraction unit 3, a recognition unit 4, and a merging unit 5.
Specifically, the determining unit 1 is configured to obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
the segmentation unit 2 is configured to segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
the extraction unit 3 is configured to extract the characters in each segmented image by a connected-component extraction method;
the recognition unit 4 is configured to recognize each character and determine the position information of each character based on a character recognition model of a deep neural network;
the merging unit 5 is configured to merge the characters according to the position information to obtain the recognition information of the image to be recognized.
In addition, the invention also provides an information extraction system for a document image, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
obtain a character perception response map from the document image to be recognized, based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extract the characters in each segmented image by a connected-component extraction method;
recognize each character and determine the position information of each character, based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the image to be recognized.
Further, the present invention provides a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
obtain a character perception response map from the document image to be recognized, based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images;
extract the characters in each segmented image by a connected-component extraction method;
recognize each character and determine the position information of each character, based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the image to be recognized.
Compared with the prior art, the information extraction system and the computer-readable storage medium for document images provide the same beneficial effects as the information extraction method described above, which are not repeated here.
The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings. Those skilled in the art will readily understand, however, that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions of the related technical features may be made without departing from the principles of the present invention, and such modifications and substitutions fall within the scope of the present invention.

Claims (7)

1. An information extraction method for a document image, characterized in that the extraction method comprises:
obtaining a character perception response map from the document image to be recognized, based on a fully convolutional neural network;
segmenting the character perception response map using a watershed algorithm to obtain a plurality of segmented images, by:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character regions in the character perception response map;
and segmenting each character region by seed-point diffusion from the seed points to obtain the segmented images;
further comprising: dividing the document image to be recognized into a plurality of text fields to be recognized according to the character regions, and determining the type attribute of each text field to be recognized;
extracting the characters in each segmented image by a connected-component extraction method;
recognizing each character and determining the position information of each character, based on a character recognition model of a deep neural network;
and merging the characters according to the position information to obtain the recognition information of the document image to be recognized, by:
for each text field to be recognized,
sorting the characters in the text field to be recognized from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between the characters;
sorting the characters in each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain the recognition result of each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the document image to be recognized from the character strings of the text fields to be recognized.
2. The information extraction method for a document image according to claim 1, wherein the character perception response map is an intensity map whose intensity decays gradually from the center of a character's rectangular bounding box to the character's edges.
3. The information extraction method for a document image according to claim 1, wherein the fully convolutional neural network comprises a first feature extraction layer, a cross-layer feature fusion layer, and an intensity perception prediction layer connected in sequence;
obtaining the character perception response map from the document image to be recognized based on the fully convolutional neural network comprises the following steps:
extracting features from the document image to be recognized through the first feature extraction layer to obtain image features of different levels;
automatically assigning weights to the features of different levels through the cross-layer feature fusion layer, then weighting, concatenating, and fusing them to obtain a multi-channel fused feature map;
and mapping the multi-channel fused feature map into a single-channel intensity map through the intensity perception prediction layer, the single-channel intensity map being the character perception response map.
4. The information extraction method for a document image according to claim 1, further comprising:
storing the character string of each text field to be recognized under the corresponding type entry in a database, according to the type attribute of the text field to be recognized.
5. An information extraction system for a document image, characterized by comprising:
a determining unit, configured to obtain a character perception response map from the document image to be recognized based on a fully convolutional neural network;
a segmentation unit, configured to segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images, by:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character regions in the character perception response map;
and segmenting each character region by seed-point diffusion from the seed points to obtain the segmented images;
the segmentation unit being further configured to divide the document image to be recognized into a plurality of text fields to be recognized according to the character regions, and to determine the type attribute of each text field to be recognized;
an extraction unit, configured to extract the characters in each segmented image by a connected-component extraction method;
a recognition unit, configured to recognize each character and determine the position information of each character based on a character recognition model of a deep neural network;
and a merging unit, configured to merge the characters according to the position information to obtain the recognition information of the document image to be recognized, by:
for each text field to be recognized,
sorting the characters in the text field to be recognized from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between the characters;
sorting the characters in each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain the recognition result of each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the document image to be recognized from the character strings of the text fields to be recognized.
6. An information extraction system for a document image, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
obtain a character perception response map from the document image to be recognized, based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images, by:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character regions in the character perception response map;
and segmenting each character region by seed-point diffusion from the seed points to obtain the segmented images;
and further: divide the document image to be recognized into a plurality of text fields to be recognized according to the character regions, and determine the type attribute of each text field to be recognized;
extract the characters in each segmented image by a connected-component extraction method;
recognize each character and determine the position information of each character, based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the document image to be recognized, by:
for each text field to be recognized,
sorting the characters in the text field to be recognized from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between the characters;
sorting the characters in each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain the recognition result of each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the document image to be recognized from the character strings of the text fields to be recognized.
7. A computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to:
obtain a character perception response map from the document image to be recognized, based on a fully convolutional neural network;
segment the character perception response map using a watershed algorithm to obtain a plurality of segmented images, by:
binarizing the character perception response map with a preset first threshold to determine seed points;
binarizing the character perception response map with a preset second threshold to determine all character regions in the character perception response map;
and segmenting each character region by seed-point diffusion from the seed points to obtain the segmented images;
and further: divide the document image to be recognized into a plurality of text fields to be recognized according to the character regions, and determine the type attribute of each text field to be recognized;
extract the characters in each segmented image by a connected-component extraction method;
recognize each character and determine the position information of each character, based on a character recognition model of a deep neural network;
and merge the characters according to the position information to obtain the recognition information of the document image to be recognized, by:
for each text field to be recognized,
sorting the characters in the text field to be recognized from top to bottom according to the position information;
dividing the top-to-bottom sorted characters into different text lines according to the gaps between the characters;
sorting the characters in each text line from left to right according to the position information;
merging the characters in each text line from left to right to obtain the recognition result of each text line;
merging the text line recognition results from top to bottom to obtain a character string;
and determining the recognition information of the document image to be recognized from the character strings of the text fields to be recognized.
Application CN202010441086.7A, filed 2020-05-22 (priority date 2020-05-22): Information extraction method and system for document image. Granted as CN111611933B. Status: Active.

Priority Applications (1)

Application Number: CN202010441086.7A (CN111611933B) | Priority Date: 2020-05-22 | Filing Date: 2020-05-22 | Title: Information extraction method and system for document image

Applications Claiming Priority (1)

Application Number: CN202010441086.7A (CN111611933B) | Priority Date: 2020-05-22 | Filing Date: 2020-05-22 | Title: Information extraction method and system for document image

Publications (2)

CN111611933A, published 2020-09-01
CN111611933B, granted 2023-07-14

Family

ID: 72199241

Family Applications (1)

Application Number: CN202010441086.7A (granted as CN111611933B, Active) | Priority Date: 2020-05-22 | Filing Date: 2020-05-22 | Title: Information extraction method and system for document image

Country Status (1)

CN: CN111611933B

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132150B (en) * 2020-09-15 2024-05-28 上海高德威智能交通系统有限公司 Text string recognition method and device and electronic equipment
CN112580655B (en) * 2020-12-25 2021-10-08 特赞(上海)信息科技有限公司 Text detection method and device based on improved CRAFT
CN113554549B (en) * 2021-07-27 2024-03-29 深圳思谋信息科技有限公司 Text image generation method, device, computer equipment and storage medium
CN113554027B (en) * 2021-08-09 2024-10-15 深圳市迪博企业风险管理技术有限公司 Method for calibrating and extracting reimbursement bill image text information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845475A (en) * 2016-12-15 2017-06-13 西安电子科技大学 Natural scene character detecting method based on connected domain
CN109117713A (en) * 2018-06-27 2019-01-01 淮阴工学院 A kind of drawing printed page analysis of full convolutional neural networks and character recognition method
WO2020010547A1 (en) * 2018-07-11 2020-01-16 深圳前海达闼云端智能科技有限公司 Character identification method and apparatus, and storage medium and electronic device


Also Published As

CN111611933A, published 2020-09-01

Similar Documents

Publication Publication Date Title
CN111611933B (en) Information extraction method and system for document image
CN112200161B (en) Face recognition detection method based on mixed attention mechanism
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN108830188B (en) Vehicle detection method based on deep learning
CN110210475B (en) License plate character image segmentation method based on non-binarization and edge detection
CN103049763B (en) Context-constraint-based target identification method
CN111460927B (en) Method for extracting structured information of house property evidence image
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN110516676A (en) A kind of bank's card number identifying system based on image procossing
CN111242144A (en) Method and device for detecting abnormality of power grid equipment
CN113435319B (en) Classification method combining multi-target tracking and pedestrian angle recognition
CN113808123B (en) Dynamic detection method for liquid medicine bag based on machine vision
Li et al. Pixel‐Level Recognition of Pavement Distresses Based on U‐Net
CN110991447A (en) Train number accurate positioning and identification method based on deep learning
CN113609895A (en) Road traffic information acquisition method based on improved Yolov3
Satti et al. R‐ICTS: Recognize the Indian cautionary traffic signs in real‐time using an optimized adaptive boosting cascade classifier and a convolutional neural network
Mei et al. A conditional wasserstein generative adversarial network for pixel-level crack detection using video extracted images
CN111680691B (en) Text detection method, text detection device, electronic equipment and computer readable storage medium
CN114078106B (en) Defect detection method based on improved Faster R-CNN
CN113989624A (en) Infrared low-slow small target detection method and device, computing equipment and storage medium
CN117437647B (en) Oracle character detection method based on deep learning and computer vision
CN114998689B (en) Track data set generation method, track identification method and system
CN104573663B (en) A kind of English scene character recognition method based on distinctive stroke storehouse
Wu et al. Research on asphalt pavement disease detection based on improved YOLOv5s
KR102026280B1 (en) Method and system for scene text detection using deep learning

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant