Detailed Description
First, application scenarios of the present disclosure are described. The present disclosure can be applied to character recognition scenarios. In such scenarios, a character recognition algorithm mainly includes two steps: character detection and character recognition. Currently, character detection can be divided into two approaches: single-character detection and text line extraction. Single-character detection directly detects individual characters in a target image, while text line extraction mainly extracts character regions that are distributed in rows. Of these two approaches, single-character detection is prone to missed detections, that is, one or more characters in the target image are not detected, which reduces the accuracy of character recognition. Text line extraction treats a row of characters as a whole and is therefore less prone to missed detections, but each character in a text line must be segmented after the text line is detected, which places high demands on segmentation accuracy. The recognition step also differs between the two detection approaches. With single-character detection, each extracted character can be recognized directly, and all the single characters are then arranged and combined according to their location information to generate the final recognition result. With text line extraction, the characters in each text line must first be segmented, the segmented characters are then recognized, and the recognition results of the text lines are arranged and combined according to the location information of each text line to generate the final recognition result.
Current text images can be divided into document images and scene images. A document image generally contains a large number of characters, the characters are regularly distributed, and the image background is simple. In contrast, a scene image generally contains fewer characters, the character types are varied, the characters are arbitrarily distributed, and the image background is complex. Because document images and scene images have these different image characteristics, current character recognition algorithms cannot perform character recognition on both document images and scene images at the same time; instead, different character recognition algorithms are required for each, which results in poor versatility of character recognition algorithms.
To solve the above problems, the present disclosure proposes a character recognition method, apparatus, storage medium, and electronic device. First, the image category of a target image is determined; a correction processing mode corresponding to the target image is then determined according to the image category, and correction processing is performed on the target image according to that correction processing mode. Next, at least one text line image is extracted from the corrected target image. Finally, the characters to be recognized in the at least one text line image are recognized according to a character recognition model. Since different image categories correspond to different correction processing modes, images of different categories can be corrected according to their corresponding correction processing modes, and character recognition is then performed on the corrected images. The present disclosure can thus perform character recognition on both document images and scene images, avoiding the poor versatility of character recognition algorithms in the prior art.
The present disclosure is described in detail below with reference to specific embodiments.

Fig. 1 is a flow diagram of a character recognition method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps.
S101: determine the image category corresponding to a target image that includes characters to be recognized.
In this step, the image category may include document image and scene image. A document image generally contains a large number of regularly distributed characters against a simple background, whereas a scene image generally contains fewer, arbitrarily distributed characters of varied types against a complex background. Given these different image characteristics, different image categories correspond to different correction processing modes. The above image categories are merely illustrative; the present disclosure is not limited thereto.
In one possible implementation, image samples whose image categories have been determined may be obtained, and the image category of the target image may be determined according to these image samples. Further, the image samples may include document image samples and scene image samples, where the difference between the number of document image samples and the number of scene image samples is less than or equal to a preset threshold. In this way, a preset classifier can be trained on the document image samples and scene image samples using a deep learning method to obtain a target classifier, so that when the target image is input to the target classifier, the target classifier outputs the image category of the target image.
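The classifier training described above can be sketched as follows. This is a deliberately minimal stand-in, not the method prescribed by the disclosure: it replaces the deep-learning classifier with logistic regression over two hand-crafted image statistics (dark-pixel density and background variance), and the feature choices, thresholds, and function names are illustrative assumptions.

```python
import numpy as np

# Minimal stand-in for the "target classifier": logistic regression on two
# hand-crafted statistics instead of a deep network. All feature choices
# and thresholds here are illustrative assumptions.

def features(img):
    """img: 2-D grayscale array scaled to [0, 1]. Returns a feature vector."""
    ink = (img < 0.5).mean()  # fraction of dark (character) pixels
    bg = img[img >= 0.5]      # pixels treated as background
    bg_var = bg.var() if bg.size else 0.0
    return np.array([ink, bg_var, 1.0])  # trailing 1.0 acts as the bias term

def train(samples, labels, lr=1.0, epochs=500):
    """labels: 1 = document image, 0 = scene image."""
    X = np.stack([features(s) for s in samples])
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid
        w += lr * X.T @ (y - p) / len(y)   # gradient ascent on log-likelihood
    return w

def classify(img, w):
    return "document" if features(img) @ w > 0 else "scene"
```

In practice the disclosure's classifier would be a deep network trained on roughly balanced document/scene samples, as the text states; the logistic-regression head above only illustrates the train-then-classify flow.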
S102: perform correction processing on the target image according to the correction processing mode corresponding to the image category.
When the image category is document image, the characters to be recognized in a document image are generally densely distributed, so any tilt and/or distortion of those characters may affect the accuracy of character recognition. To avoid this problem, the present disclosure may perform correction processing on the document image, where the correction processing includes orientation correction and/or distortion correction. In this case, performing correction processing on the target image according to the correction processing mode corresponding to the image category may include the following steps.
S11: obtain a first tilt angle between the characters to be recognized in the document image and the horizontal axis.
In one possible implementation, the first tilt angle may be obtained by a projection analysis method or a Hough transform method, among others. Alternatively, threshold segmentation may be performed on the document image to obtain a binary document image, and the first tilt angle may be obtained from the pixel information of the characters to be recognized in the binary document image. The detailed process is known in the prior art and is not repeated here.
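The projection analysis method mentioned above can be sketched as follows. This is a simplified shear-based variant, assuming a binary input and a fixed candidate-angle grid: each candidate tilt is undone by a vertical shear, and the angle whose horizontal projection has the highest variance (sharpest row peaks) is taken as the estimate.

```python
import numpy as np

def estimate_tilt(binary, candidates=np.arange(-10.0, 10.5, 0.5)):
    """Estimate the tilt (degrees) of text in a 0/1 binary image via
    projection profiles: for each candidate angle, undo the tilt with a
    vertical shear, score the variance of the row-sum histogram, and return
    the best-scoring angle. A positive angle means the text baseline drops
    by tan(angle) pixels per column (image y grows downward)."""
    h, w = binary.shape
    best_angle, best_score = 0.0, -1.0
    for a in candidates:
        shift = np.round(np.arange(w) * np.tan(np.radians(a))).astype(int)
        proj = np.zeros(h)
        for x in range(w):
            rows = np.nonzero(binary[:, x])[0] - shift[x]  # de-shear column x
            rows = rows[(rows >= 0) & (rows < h)]
            np.add.at(proj, rows, 1.0)  # accumulate the horizontal projection
        score = proj.var()
        if score > best_score:
            best_angle, best_score = float(a), score
    return best_angle
```

A shear only approximates a small rotation, which is why this sketch restricts candidates to ±10°; a Hough-transform estimator would be the usual alternative for larger tilts.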
S12: determine whether the first tilt angle is greater than or equal to a preset angle.

When the first tilt angle is greater than or equal to the preset angle, steps S13 and S14 are executed.

When the first tilt angle is less than the preset angle, step S14 is executed.
S13: perform orientation correction on the document image.

The orientation correction may consist of repeatedly rotating the target image until the first tilt angle between the characters to be recognized in the document image and the horizontal axis is less than the preset angle.
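The orientation correction described above can be sketched as a rotation by the negated measured tilt. The nearest-neighbor resampling, the one-shot rotation (rather than an iterative rotate-and-remeasure loop), and the 1.0° preset angle below are illustrative assumptions.

```python
import numpy as np

def rotate(img, angle_deg):
    """Rotate a 2-D array about its center by angle_deg using
    nearest-neighbor sampling; pixels mapping outside the source are 0."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = np.radians(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, locate its source coordinate.
    sx = np.cos(a) * (xs - cx) + np.sin(a) * (ys - cy) + cx
    sy = -np.sin(a) * (xs - cx) + np.cos(a) * (ys - cy) + cy
    sxi, syi = np.round(sx).astype(int), np.round(sy).astype(int)
    valid = (sxi >= 0) & (sxi < w) & (syi >= 0) & (syi < h)
    out = np.zeros_like(img)
    out[valid] = img[syi[valid], sxi[valid]]
    return out

def correct_orientation(img, tilt_deg, preset_angle=1.0):
    """Rotate by the negated measured tilt only when it reaches the preset
    angle, mirroring the S12/S13 branch; otherwise leave the image as-is."""
    return rotate(img, -tilt_deg) if abs(tilt_deg) >= preset_angle else img
```

A production implementation would typically use bilinear interpolation and expand the canvas so no content is clipped; both refinements are omitted here for brevity.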
S14: determine whether the characters to be recognized in the document image are distorted.

When a text image is captured by a scanner or camera, a tilted or curved original, a tilted shooting angle, or the like can distort the text image, so that text lines that were originally horizontal or vertical become curved. The text lines in the image then interfere with one another, which affects the final recognition result of the characters to be recognized.
When the characters to be recognized in the document image are distorted, step S15 is executed.

When the characters to be recognized in the document image are not distorted, the correction processing is determined to be complete.
S15: perform distortion correction on the document image.

The distortion correction may use the blank regions between text lines to restore the text lines to a horizontal or vertical distribution. The detailed process is known in the prior art and is not repeated here.
It should be noted that, for simplicity of description, the above method embodiment is expressed as a series of actions. However, those skilled in the art should understand that the present disclosure is not limited by the described order of actions, because according to the present disclosure, some steps may be performed in other orders or simultaneously. For example, steps S14 and S15 may be performed before step S11, in which case distortion correction is performed first and orientation correction afterwards. Moreover, those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present disclosure.
To sum up, based on the image characteristics of document images, steps S11 to S15 can correct the first tilt angle and the distortion of the characters to be recognized in a document image, thereby improving the accuracy of character recognition in subsequent steps.
When the image category is scene image, the characters to be recognized in a scene image are generally sparsely distributed, often in a small number of arbitrarily placed text lines. The interference between text lines in a scene image is therefore small, and no distortion correction is needed. For a scene image, the corresponding correction processing mode is thus orientation correction. Specifically, performing correction processing on the target image according to the correction processing mode corresponding to the image category includes the following steps.
S21: perform text region detection on the scene image to obtain at least one text region.
The text region detection may include any one of edge-based detection, region-based detection, texture-based detection, or learning-based detection. Of course, it may also be a combination of two, three, or four of these detection methods. The above examples are merely illustrative; the present disclosure is not limited thereto.
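As one concrete illustration of region-based detection among the options listed above, the following sketch labels 4-connected components in a binary mask and returns their bounding boxes as candidate text regions. The connectivity choice and the minimum-size filter are assumptions; real detectors would add geometric and texture filtering.

```python
import numpy as np
from collections import deque

def text_regions(binary, min_pixels=5):
    """Toy region-based text detector: 4-connected component labeling on a
    0/1 binary mask, returning bounding boxes (top, left, bottom, right) of
    components with at least min_pixels pixels."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0, x0] and not seen[y0, x0]:
                q, comp = deque([(y0, x0)]), []
                seen[y0, x0] = True
                while q:  # breadth-first flood fill of one component
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) >= min_pixels:  # drop speckle noise
                    ys, xs = zip(*comp)
                    boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes
```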
S22: successively obtain a second tilt angle between the characters to be recognized in the at least one text region and the horizontal axis.

Likewise, the second tilt angle may be obtained by a projection analysis method or a Hough transform method, among others. Alternatively, threshold segmentation may be performed on the scene image to obtain a binary scene image, and the second tilt angle may be obtained from the pixel information of the characters to be recognized in the binary scene image. The detailed process is known in the prior art and is not repeated here.
When the second tilt angle is greater than or equal to the preset angle, step S23 is executed.

When the second tilt angle is less than the preset angle, the correction processing is determined to be complete.

S23: perform orientation correction on the at least one text region.

The orientation correction may consist of repeatedly rotating the text region until the second tilt angle between the characters to be recognized in the text region and the horizontal axis is less than the preset angle.
To sum up, based on the image characteristics of scene images, steps S21 to S23 can correct the second tilt angle of the characters to be recognized in a scene image, thereby improving the accuracy of character recognition in subsequent steps.
S103: extract at least one text line image from the corrected target image.

In this step, the at least one text line image can be extracted using a deep learning method, which may specifically include the following steps.
S31: extract spatial features of the target image through multiple convolutional layers in a text line detection model.

The spatial features may be the correlations between pixels in the target image.
S32: input the spatial features into a recurrent neural network layer in the text line detection model to obtain sequence features of the target image.

In this step, the recurrent neural network layer may be an LSTM (Long Short-Term Memory network), a BLSTM (Bi-directional Long Short-Term Memory network), a GRU (Gated Recurrent Unit, a variant of the LSTM), or the like. The above examples are merely illustrative; the present disclosure is not limited thereto.
S33: obtain candidate text boxes in the target image according to preset rules, and classify the candidate text boxes based on the sequence features.

In one possible implementation, a sliding window of preset size and aspect ratio may be slid over the target image to extract the candidate text boxes. The detailed process is known in the prior art and is not repeated here.

The classification may be performed by a classification layer in the text line detection model. Illustratively, the classification layer may be a softmax layer, in which case the dimension of the softmax layer's input must match that of its output; when the input and output dimensions differ, a fully connected layer needs to be added before the softmax layer so that the input and output dimensions of the softmax layer match.
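The classification head described above can be sketched as follows: a fully connected layer projects each sequence feature down to the number of classes, and a softmax turns the scores into class probabilities. The feature dimension and the two-class (text / non-text) setup are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize against overflow
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_boxes(seq_features, W, b):
    """seq_features: (num_boxes, feat_dim); W: (feat_dim, num_classes);
    b: (num_classes,). The fully connected layer (W, b) matches the feature
    dimension to the number of classes before the softmax, as the text
    requires when input and output dimensions differ."""
    return softmax(seq_features @ W + b)
```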
S34: obtain text box location information of the candidate text boxes using a regression convolutional layer in the text line detection model.
S35: screen the candidate text boxes according to the text box location information and the classification results using the NMS (Non-Maximum Suppression) method to obtain the text line images.
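The screening in step S35 can be sketched with the standard NMS procedure: keep the highest-scoring box, discard boxes that overlap it too heavily, and repeat. The (x1, y1, x2, y2) box format and the 0.5 IoU threshold below are conventional choices, not values prescribed by the disclosure.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Standard non-maximum suppression. boxes: (N, 4) as (x1, y1, x2, y2);
    scores: (N,) classification confidences. Returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # drop heavy overlaps
    return keep
```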
S104: recognize the characters to be recognized in the at least one text line image through a preset character recognition model.

A typical character recognition step processes the image character by character and then predicts each character with a character classifier. However, when text line images are complex, character segmentation is difficult and may destroy character structure. Since the precision of character segmentation directly affects the final recognition result, in order to avoid the low recognition accuracy caused by character segmentation, the present disclosure may treat the text line image as a whole: instead of cutting the characters to be recognized out of the text line image, all the characters to be recognized in the text line image are recognized directly, so that the context between the characters to be recognized can be fully exploited.
It should be noted that, before this step, the method further includes obtaining location information of the at least one text line image. After the text line images are determined in step S103, the location information of each text line image can be determined from the text box location information. The characters to be recognized in the at least one text line image are then recognized through the preset character recognition model and the location information. The preset character recognition model includes a deep learning layer, a recurrent network layer, and a coding layer. Specifically, the character recognition process may include the following steps.
S41: perform character feature extraction on the at least one text line image according to the deep learning layer.

The deep learning layer may be a CNN (Convolutional Neural Network). The CNN can divide the at least one text line image into multiple slices along the horizontal direction, each slice corresponding to a character feature. Since adjacent slices may overlap, the character features carry a certain amount of context.
S42: input the extracted character features into the recurrent network layer to obtain feature vectors corresponding to the at least one text line image.

The recurrent network layer may be an LSTM, a BLSTM, a GRU, or the like. Through this network layer, the character features can be learned further to obtain the feature vector corresponding to each slice. The above examples are merely illustrative; the present disclosure is not limited thereto.
S43: input the feature vectors into the coding layer to obtain a coding result of the at least one text line image, and obtain text information of the at least one text line image according to the coding result.

In this step, the coding layer may be a CTC (Connectionist Temporal Classification) layer, so that the coding result can be obtained through the CTC layer. Since a text line image may contain multiple characters to be recognized, the coding result may contain multiple codes. Each code in the coding result is matched against a preset code correspondence to obtain the character corresponding to that code, and the characters corresponding to the codes are arranged in the order of the codes to obtain the text information of the text line image. The preset code correspondence is a correspondence between code samples and character samples. The above examples are merely illustrative; the present disclosure is not limited thereto.
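The decoding described in step S43 can be sketched as greedy CTC decoding: take the best class per slice, collapse consecutive repeats, drop the blank symbol, then map the surviving codes to characters. The three-character code table below is a hypothetical stand-in for the preset code correspondence.

```python
import numpy as np

# Hypothetical code-to-character table standing in for the "preset code
# correspondence"; index 0 is the CTC blank symbol.
CHARSET = {1: "c", 2: "a", 3: "t"}
BLANK = 0

def ctc_greedy_decode(logits):
    """Greedy CTC decoding of per-slice class scores, shape
    (time_steps, num_classes): argmax per slice, collapse repeats,
    remove blanks, then map codes to characters in order."""
    best = logits.argmax(axis=1)
    codes, prev = [], BLANK
    for c in best:
        if c != prev and c != BLANK:  # collapse repeats, skip blanks
            codes.append(int(c))
        prev = c
    return "".join(CHARSET[c] for c in codes)
```

Greedy decoding is the simplest CTC decoder; beam search over the per-slice distributions is the usual refinement when higher accuracy is needed.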
S44: arrange the text information of the at least one text line image in order according to the location information to obtain a target recognition result of the target image.

In this step, the order of the text line images can be obtained from the location information of the at least one text line image, and the text information of the at least one text line image is sorted in that order to obtain the target recognition result.
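Step S44 can be sketched as a sort over the text line location information. The top-to-bottom, left-to-right ordering and the (x1, y1, x2, y2) box format are assumptions for horizontally arranged text.

```python
def assemble_result(lines):
    """Order recognized text lines by location (top edge, then left edge)
    and join them into the final recognition result. Each entry pairs a box
    (x1, y1, x2, y2) with the text decoded from that line image."""
    ordered = sorted(lines, key=lambda item: (item[0][1], item[0][0]))
    return "\n".join(text for _, text in ordered)
```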
It should be noted that the present disclosure is described with the characters to be recognized arranged horizontally in the target image. When the characters to be recognized are arranged vertically, at least one text column image can be extracted from the target image, and the characters to be recognized in the at least one text column image are recognized through the preset character recognition model. The detailed process is analogous to the above description for text line images and is not repeated here.
With the above method, the image category of the target image is first determined; the correction processing mode corresponding to the target image is then determined according to the image category, and correction processing is performed on the target image according to that correction processing mode. Next, at least one text line image is extracted from the corrected target image. Finally, the characters to be recognized in the at least one text line image are recognized according to the character recognition model. Since different image categories correspond to different correction processing modes, images of different categories can be corrected according to their corresponding correction processing modes, and character recognition is then performed on the corrected images. The present disclosure can thus perform character recognition on both document images and scene images, avoiding the poor versatility of character recognition algorithms in the prior art.
Fig. 2 is a block diagram of a character recognition apparatus 20 according to an exemplary embodiment. As shown in Fig. 2, the apparatus includes:

a determining module 201, configured to determine the image category corresponding to a target image that includes characters to be recognized, where different image categories correspond to different correction processing modes;

a correction module 202, configured to perform correction processing on the target image according to the correction processing mode corresponding to the image category;

an extraction module 203, configured to extract at least one text line image from the corrected target image; and

a recognition module 204, configured to recognize the characters to be recognized in the at least one text line image through a preset character recognition model.
Optionally, the image category includes document image and scene image.
Fig. 3 is a block diagram of the determining module 201 according to an exemplary embodiment. As shown in Fig. 3, the determining module 201 includes:

a first obtaining submodule 2011, configured to obtain image samples whose image categories have been determined; and

a first determining submodule 2012, configured to determine the image category of the target image according to the image samples.
Fig. 4 is a block diagram of the correction module 202 according to an exemplary embodiment. As shown in Fig. 4, when the image category is document image, the correction processing mode includes orientation correction and/or distortion correction. When the correction processing mode includes both orientation correction and distortion correction, the correction module 202 includes:

a second obtaining submodule 2021, configured to obtain the first tilt angle between the characters to be recognized in the document image and the horizontal axis;

a first correction submodule 2022, configured to perform orientation correction on the document image when the first tilt angle is greater than or equal to a preset angle;

a second determining submodule 2023, configured to determine whether the characters to be recognized in the document image are distorted; and

a second correction submodule 2024, configured to perform distortion correction on the document image when the characters to be recognized in the document image are distorted.
Fig. 5 is a block diagram of the correction module 202 according to an exemplary embodiment. As shown in Fig. 5, when the image category is scene image, the correction processing mode includes orientation correction, and the correction module 202 includes:

a detection submodule 2025, configured to perform text region detection on the scene image to obtain at least one text region;

a third obtaining submodule 2026, configured to successively obtain the second tilt angle between the characters to be recognized in the at least one text region and the horizontal axis; and

a third correction submodule 2027, configured to perform orientation correction on the at least one text region when the second tilt angle of the at least one text region is greater than or equal to the preset angle.
Fig. 6 is a block diagram of a character recognition apparatus 20 according to an exemplary embodiment. As shown in Fig. 6, the apparatus further includes:

an obtaining module 305, configured to obtain the location information of the at least one text line image before the characters to be recognized in the at least one text line image are recognized through the preset character recognition model; and

the recognition module 304, configured to recognize the characters to be recognized in the at least one text line image through the preset character recognition model and the location information.
Fig. 7 is a block diagram of the recognition module 304 according to an exemplary embodiment. As shown in Fig. 7, the preset character recognition model includes a deep learning layer, a recurrent network layer, and a coding layer, and the recognition module 304 includes:

an extraction submodule 3041, configured to perform character feature extraction on the at least one text line image according to the deep learning layer;

a fourth obtaining submodule 3042, configured to input the extracted character features into the recurrent network layer to obtain the feature vectors corresponding to the at least one text line image;

a fifth obtaining submodule 3043, configured to input the feature vectors into the coding layer to obtain the coding result of the at least one text line image, and obtain the text information of the at least one text line image according to the coding result; and

a sixth obtaining submodule 3044, configured to arrange the text information of the at least one text line image in order according to the location information to obtain the target recognition result of the target image.
With the above apparatus, the image category of the target image is first determined; the correction processing mode corresponding to the target image is then determined according to the image category, and correction processing is performed on the target image according to that correction processing mode. Next, at least one text line image is extracted from the corrected target image. Finally, the characters to be recognized in the at least one text line image are recognized according to the character recognition model. Since different image categories correspond to different correction processing modes, images of different categories can be corrected according to their corresponding correction processing modes, and character recognition is then performed on the corrected images. The present disclosure can thus perform character recognition on both document images and scene images, avoiding the poor versatility of character recognition algorithms in the prior art.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiment of the related method, and is not elaborated here.
Fig. 8 is a block diagram of an electronic device 800 according to an exemplary embodiment. As shown in Fig. 8, the electronic device 800 may include a processor 801 and a memory 802. The electronic device 800 may further include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communication component 805.

The processor 801 is configured to control the overall operation of the electronic device 800 to complete all or part of the steps of the above character recognition method. The memory 802 is configured to store various types of data to support operations on the electronic device 800; such data may include, for example, instructions of any application or method operated on the electronic device 800, as well as application-related data such as contact data, transmitted and received messages, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is configured to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may be further stored in the memory 802 or sent through the communication component 805. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, a mouse, or buttons, where the buttons may be virtual buttons or physical buttons. The communication component 805 is configured to perform wired or wireless communication between the electronic device 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 805 may include a Wi-Fi module, a Bluetooth module, or an NFC module.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above character recognition method.
In a further exemplary embodiment, a computer-readable storage medium including program instructions is also provided, where the program instructions, when executed by a processor, implement the steps of the above character recognition method. For example, the computer-readable storage medium may be the above memory 802 including program instructions, and the program instructions may be executed by the processor 801 of the electronic device 800 to complete the above character recognition method.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details in the above embodiments; within the scope of the technical concept of the present disclosure, various simple variants can be made to the technical solution of the present disclosure, and these simple variants all belong to the protection scope of the present disclosure.

It should be further noted that the specific technical features described in the above specific embodiments can, where not contradictory, be combined in any suitable manner. In order to avoid unnecessary repetition, the present disclosure does not further describe the various possible combinations.

In addition, the various different embodiments of the present disclosure may also be combined in any manner; as long as a combination does not depart from the idea of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.