CN110110715A - Text detection model training method, text region, and content determination method and apparatus - Google Patents
Text detection model training method, text region, and content determination method and apparatus
- Publication number
- CN110110715A (application CN201910367675.2A)
- Authority
- CN
- China
- Prior art keywords
- text
- candidate region
- image
- updated
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a text detection model training method, a text region determination method, a content determination method, and corresponding apparatus. The text detection model training method includes: extracting multiple initial feature maps of a target training image through a first feature extraction network; fusing the multiple initial feature maps through a feature fusion network to obtain a fused feature map; inputting the fused feature map into a first output network, which outputs candidate regions for the text regions in the target training image and a probability value for each candidate region; determining a first loss value through a preset detection loss function; and training the first initial model according to the first loss value until the parameters of the first initial model converge, yielding the text detection model. The present invention can quickly, comprehensively, and accurately detect all kinds of text in an image across scenes with multiple font sizes, multiple fonts, various shapes, and multiple text directions, which in turn benefits the accuracy of subsequent text recognition and improves its effectiveness.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to a text detection model training method, a text region determination method, a content determination method, and corresponding apparatus.
Background technique
In the related art, text detection and recognition can be realized by character segmentation or by deep learning. However, these approaches are generally suitable only for simple scenarios, such as those with a single font size, a plain background, and a single text alignment direction. In complex scenarios, such as those with multiple font sizes, multiple fonts, various shapes, multiple text directions, and variable backgrounds, the above text detection and recognition approaches perform poorly.
Summary of the invention
In view of this, the purpose of the present invention is to provide a text detection model training method, a text region determination method, a content determination method, and corresponding apparatus, so as to quickly, comprehensively, and accurately detect all kinds of text in an image across scenes with multiple font sizes, multiple fonts, various shapes, and multiple text directions, thereby benefiting the accuracy of subsequent text recognition and improving its effectiveness.
In a first aspect, an embodiment of the present invention provides a text detection model training method. The method includes: determining a target training image based on a preset training set; inputting the target training image into a first initial model, where the first initial model includes a first feature extraction network, a feature fusion network, and a first output network; extracting multiple initial feature maps of the target training image through the first feature extraction network, where the multiple initial feature maps differ in scale; fusing the multiple initial feature maps through the feature fusion network to obtain a fused feature map; inputting the fused feature map into the first output network, which outputs candidate regions for the text regions in the target training image and a probability value for each candidate region; determining a first loss value for the candidate regions and their probability values through a preset detection loss function; and training the first initial model according to the first loss value until the parameters of the first initial model converge, yielding the text detection model.
In some embodiments, the first feature extraction network includes multiple groups of first convolutional networks connected in sequence; each group of first convolutional networks includes a convolutional layer, a batch normalization layer, and an activation function layer connected in sequence.
In some embodiments, the step of fusing the multiple initial feature maps through the feature fusion network to obtain the fused feature map includes: arranging the multiple initial feature maps in order of scale, with the initial feature map of the top level having the smallest scale and the initial feature map of the bottom level the largest; taking the initial feature map of the top level as the fused feature map of the top level; for every level other than the top level, merging the initial feature map of the current level with the fused feature map of the level above it to obtain the fused feature map of the current level; and taking the fused feature map of the lowest level as the final fused feature map.
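The top-down fusion above can be sketched minimally in NumPy. The patent does not specify how a smaller map is brought to the next level's scale or how the merge is performed; nearest-neighbour upsampling and element-wise addition below are assumptions for illustration.

```python
import numpy as np

def fuse_feature_maps(initial_maps):
    """Top-down fusion: maps are ordered from the top level (smallest
    scale) to the bottom level (largest scale); each level merges with
    the upsampled fused map of the level above it."""
    fused = initial_maps[0]  # top level: fused map = initial map
    for feat in initial_maps[1:]:
        # Upsample the fused map of the level above to the current
        # scale (nearest-neighbour; the upsampling method is assumed).
        ry = feat.shape[0] // fused.shape[0]
        rx = feat.shape[1] // fused.shape[1]
        upsampled = fused.repeat(ry, axis=0).repeat(rx, axis=1)
        fused = feat + upsampled  # element-wise merge (assumption)
    return fused  # fused map of the lowest (largest-scale) level

maps = [np.ones((4, 4)), np.ones((8, 8)), np.ones((16, 16))]
out = fuse_feature_maps(maps)
print(out.shape)  # (16, 16)
```

The final fused map keeps the resolution of the lowest level while carrying information accumulated from every coarser level above it.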
In some embodiments, the first output network includes a first convolutional layer and a second convolutional layer. The step of inputting the fused feature map into the first output network and outputting the candidate regions for the text regions in the target training image and the probability value of each candidate region includes: inputting the fused feature map into the first convolutional layer and the second convolutional layer separately; performing a first convolution operation on the fused feature map through the first convolutional layer to output a coordinate matrix containing the vertex coordinates of the candidate regions for the text regions in the target training image; and performing a second convolution operation on the fused feature map through the second convolutional layer to output a probability matrix containing the probability value of each candidate region.
In some embodiments, the detection loss function includes a first function and a second function. The first function is L1 = |G* − G|, where G* is the coordinate matrix of the pre-annotated text regions in the target training image, and G is the coordinate matrix of the candidate regions for the text regions output by the first output network. The second function is L2 = −Y*·log(Y) − (1 − Y*)·log(1 − Y), where Y* is the probability matrix of the pre-annotated text regions in the target training image, Y is the probability matrix of the candidate regions output by the first output network, and log denotes the logarithm operation. The first loss value for the candidate regions and their probability values is L = L1 + L2.
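A minimal NumPy sketch of the loss above. The patent does not state whether each term is summed or averaged over matrix elements, so the reductions chosen below (sum for L1, mean for L2) and the clipping guard are assumptions.

```python
import numpy as np

def detection_loss(G_star, G, Y_star, Y, eps=1e-7):
    """L = L1 + L2 as defined above.
    L1 = |G* - G|                          (absolute coordinate error)
    L2 = -Y*·log(Y) - (1 - Y*)·log(1 - Y)  (binary cross-entropy)"""
    L1 = np.abs(G_star - G).sum()
    Y = np.clip(Y, eps, 1 - eps)  # guard against log(0) (assumption)
    L2 = (-Y_star * np.log(Y) - (1 - Y_star) * np.log(1 - Y)).mean()
    return L1 + L2

G_star = np.array([[0.0, 0.0], [1.0, 1.0]])  # annotated vertex coordinates
Y_star = np.ones((2, 2))                     # annotated text-region labels
loss = detection_loss(G_star, G_star, Y_star, np.full((2, 2), 0.999))
print(loss)  # near zero: coordinates match and confidence is high
```

The coordinate term pulls the predicted vertices toward the annotation, while the cross-entropy term pushes the predicted probabilities toward the annotated text/non-text labels.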
In some embodiments, the step of training the first initial model according to the first loss value until the parameters of the first initial model converge, yielding the text detection model, includes: updating the parameters of the first initial model according to the first loss value; judging whether the updated parameters have converged; if so, taking the first initial model with the updated parameters as the detection model; and if not, continuing to execute the step of determining a target training image based on the preset training set until the updated parameters converge.
In some embodiments, the step of updating the parameters of the first initial model according to the first loss value includes: determining a parameter W to be updated from the first initial model according to a preset rule; computing the derivative ∂L/∂W of the first loss value L with respect to W; and updating the parameter as W ← W − α·∂L/∂W, where α is a preset coefficient.
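The update rule above is plain gradient descent and can be sketched in a few lines. The quadratic example loss is illustrative only, not the patent's detection loss.

```python
def update_parameter(W, dL_dW, alpha=0.01):
    """One gradient-descent step: W <- W - alpha * dL/dW."""
    return W - alpha * dL_dW

# Illustrative loss L(W) = (W - 3)^2, so dL/dW = 2*(W - 3).
W = 0.0
for _ in range(200):
    W = update_parameter(W, 2 * (W - 3), alpha=0.1)
print(round(W, 3))  # converges toward 3.0, the minimizer of L
```

Repeating the step drives the parameter toward a minimum of the loss; convergence is detected when further updates no longer change W appreciably.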
In a second aspect, an embodiment of the present invention provides a text region determination method. The method includes: obtaining an image to be detected; inputting the image to be detected into a pre-trained text detection model, which outputs multiple candidate regions for the text regions in the image and a probability value for each candidate region, where the text detection model is trained by the text detection model training method described above; and determining the text regions in the image from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between them.
In some embodiments, the step of determining the text regions in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between them includes: arranging the multiple candidate regions by probability value, with the first candidate region having the largest probability value and the last the smallest; taking the first candidate region as the current candidate region and computing, one by one, the degree of overlap between the current candidate region and every other candidate region; rejecting, among the candidate regions other than the current one, those whose degree of overlap exceeds a preset overlap threshold; taking the next remaining candidate region as the new current candidate region and repeating the overlap-computation step until the last candidate region is reached; and taking the candidate regions remaining after rejection as the text regions in the image to be detected.
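The procedure above is the familiar non-maximum suppression loop, sketched here for axis-aligned boxes. The patent leaves the overlap measure open; intersection-over-union below is an assumption, as is the (x1, y1, x2, y2) box format.

```python
def iou(a, b):
    """Overlap degree as intersection-over-union of two boxes
    given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def select_text_regions(boxes, scores, overlap_thr=0.5):
    """Sort by probability value, then repeatedly keep the best box
    and reject remaining boxes overlapping it beyond the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        cur = order.pop(0)
        kept.append(cur)
        order = [i for i in order if iou(boxes[cur], boxes[i]) <= overlap_thr]
    return [boxes[i] for i in kept]

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(select_text_regions(boxes, scores))
```

The second box overlaps the first heavily and is rejected, while the distant third box survives as a separate text region.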
In some embodiments, before the step of arranging the multiple candidate regions by probability value, the method further includes: rejecting, among the multiple candidate regions, those whose probability value is below a preset probability threshold, to obtain the final multiple candidate regions.
In a third aspect, an embodiment of the present invention provides a text content determination method. The method includes: obtaining the text regions in an image through the text region determination method described above; inputting the text regions into a pre-trained text recognition model, which outputs a recognition result for each text region; and determining the text content in each text region according to the recognition result.
In some embodiments, before the step of inputting the text regions into the pre-trained recognition model, the method further includes: normalizing the text regions to a preset size.
In some embodiments, the text recognition model is trained as follows: determining a target training text image based on a preset training set; inputting the target training text image into a second initial model, where the second initial model includes a second feature extraction network, a feature splitting network, a second output network, and a classification function; extracting a feature map of the target training text image through the second feature extraction network; splitting the feature map into at least one sub-feature map through the feature splitting network; inputting the sub-feature maps separately into the second output network, which outputs an output matrix for each sub-feature map; inputting each output matrix separately into the classification function, which outputs a probability matrix for each sub-feature map; determining a second loss value for the probability matrices through a preset recognition loss function; and training the second initial model according to the second loss value until the parameters of the second initial model converge, yielding the text recognition model.
In some embodiments, the second feature extraction network includes multiple groups of second convolutional networks connected in sequence; each group of second convolutional networks includes a convolutional layer, a pooling layer, and an activation function layer connected in sequence.
In some embodiments, the step of splitting the feature map into at least one sub-feature map through the feature splitting network includes: splitting the feature map into at least one sub-feature map along the column direction of the feature map, where the column direction is perpendicular to the text line direction.
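The column-wise split can be sketched with NumPy on a 2-D (height x width) feature map; equal-width slices are an assumption, since the patent does not fix how many sub-feature maps are produced.

```python
import numpy as np

def split_feature_map(feat, num_slices):
    """Split a (H, W) feature map along the column (width) direction,
    i.e. perpendicular to the text line, into vertical slices."""
    return np.split(feat, num_slices, axis=1)

feat = np.arange(24).reshape(4, 6)   # toy 4x6 feature map
slices = split_feature_map(feat, 3)
print(len(slices), slices[0].shape)  # 3 (4, 2)
```

Each vertical slice roughly covers one character position along the text line, which is why each sub-feature map later gets its own output matrix and probability matrix.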
In some embodiments, the second output network includes multiple fully connected layers, the number of which corresponds to the number of sub-feature maps. The step of inputting the sub-feature maps separately into the second output network and outputting an output matrix for each sub-feature map includes: inputting each sub-feature map into its corresponding fully connected layer, so that each fully connected layer outputs the output matrix corresponding to its sub-feature map.
In some embodiments, the classification function includes a Softmax function: p_t^i = e^{x_i} / Σ_{m=1}^{K+1} e^{x_m}, where e denotes the natural constant, t denotes the t-th probability matrix, K denotes the number of distinct characters contained in the target training text images of the training set, m ranges from 1 to K+1, Σ denotes the summation operation, x_i is the i-th element of the output matrix, and p_t^i is the i-th element of the probability matrix p_t.
In some embodiments, the recognition loss function includes L′ = −log p(y | {p_t}, t = 1…T), where y is the probability matrix of the pre-annotated target training text image, t denotes the t-th probability matrix, p_t is the probability matrix output by the classification function for each sub-feature map, T is the total number of probability matrices, p denotes computing the probability, and log denotes the logarithm operation.
In some embodiments, the step of training the second initial model according to the second loss value until the parameters of the second initial model converge, yielding the text recognition model, includes: updating the parameters of the second initial model according to the second loss value; judging whether the updated parameters have converged; if so, taking the second initial model with the updated parameters as the text recognition model; and if not, continuing to execute the step of determining a target training text image based on the preset training set until the updated parameters converge.
In some embodiments, the step of updating the parameters of the second initial model according to the second loss value includes: determining a parameter W′ to be updated from the second initial model according to a preset rule; computing the derivative ∂L′/∂W′ of the second loss value L′ with respect to W′; and updating the parameter as W′ ← W′ − α′·∂L′/∂W′, where α′ is a preset coefficient.
In some embodiments, the recognition result of a text region includes multiple probability matrices corresponding to the text region. The step of determining the text content in the text region according to the recognition result includes: determining the position of the maximum probability value in each probability matrix; obtaining, from a preset correspondence between positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value; arranging the obtained characters according to the order of the probability matrices; and determining the text content in the text region according to the arranged characters.
In some embodiments, the step of determining the text content in the text region according to the arranged characters includes: deleting, according to a preset rule, the repeated characters and blank characters in the arranged characters to obtain the text content in the text region.
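The per-matrix argmax plus repeat-and-blank deletion described above resembles greedy CTC-style decoding; the exact deletion rule used below (collapse consecutive repeats first, then drop blanks) is an assumption, as are the charset and blank symbol.

```python
def decode(probability_matrices, charset, blank="<blank>"):
    """Greedy decoding: pick the highest-probability character per
    probability matrix, collapse consecutive repeats, drop blanks."""
    chars = [charset[max(range(len(p)), key=p.__getitem__)]
             for p in probability_matrices]
    out, prev = [], None
    for c in chars:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return "".join(out)

charset = ["a", "b", "<blank>"]
mats = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]
print(decode(mats, charset))  # "ab"
```

The two consecutive "a" predictions collapse into one, and the blank between "a" and "b" is removed, leaving the text content "ab".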
In some embodiments, after the step of determining the text content in the text region according to the recognition result, the method further includes: if the image contains multiple text regions, obtaining the text content in each text region; and determining, through a pre-established sensitive word dictionary, whether the text content corresponding to the image contains sensitive information.
In some embodiments, the step of determining, through the pre-established sensitive word dictionary, whether the text content corresponding to the image contains sensitive information includes: performing a word segmentation operation on the obtained text content; matching the segmented words one by one against the pre-established sensitive word dictionary; and if at least one segmented word matches successfully, determining that the text content corresponding to the image contains sensitive information.
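The matching step can be sketched as below. The tokenizer is a stand-in: real Chinese text would need a proper word-segmentation tool, and whitespace splitting here is purely illustrative.

```python
def contains_sensitive(text_contents, sensitive_words, tokenizer):
    """Tokenize each text content and match tokens against a
    pre-built sensitive dictionary; any match flags the image."""
    sensitive = set(sensitive_words)  # set lookup is O(1) per token
    for content in text_contents:
        for token in tokenizer(content):
            if token in sensitive:
                return True, token
    return False, None

flag, word = contains_sensitive(
    ["this image mentions a forbidden term"],
    ["forbidden"],
    tokenizer=str.split)
print(flag, word)  # True forbidden
```

Returning the matched word as well makes it possible to locate and mark the text region it came from, as the following embodiment describes.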
In some embodiments, after determining that the text content corresponding to the image contains sensitive information, the method further includes: obtaining the text region to which the successfully matched word belongs, and marking in the image the obtained text region or the successfully matched word.
In a fourth aspect, an embodiment of the present invention provides a text detection model training apparatus. The apparatus includes: a training image determination module, configured to determine a target training image based on a preset training set; a training image input module, configured to input the target training image into a first initial model, where the first initial model includes a first feature extraction network, a feature fusion network, and a first output network; a feature extraction module, configured to extract multiple initial feature maps of the target training image through the first feature extraction network, where the multiple initial feature maps differ in scale; a feature fusion module, configured to fuse the multiple initial feature maps through the feature fusion network to obtain a fused feature map; an output module, configured to input the fused feature map into the first output network and output candidate regions for the text regions in the target training image and a probability value for each candidate region; and a loss determination and training module, configured to determine a first loss value for the candidate regions and their probability values through a preset detection loss function, and to train the first initial model according to the first loss value until the parameters of the first initial model converge, yielding the text detection model.
In some embodiments, the first feature extraction network includes multiple groups of first convolutional networks connected in sequence; each group of first convolutional networks includes a convolutional layer, a batch normalization layer, and an activation function layer connected in sequence.
In some embodiments, the feature fusion module is further configured to: arrange the multiple initial feature maps in order of scale, with the initial feature map of the top level having the smallest scale and the initial feature map of the bottom level the largest; take the initial feature map of the top level as the fused feature map of the top level; for every level other than the top level, merge the initial feature map of the current level with the fused feature map of the level above it to obtain the fused feature map of the current level; and take the fused feature map of the lowest level as the final fused feature map.
In some embodiments, the first output network includes a first convolutional layer and a second convolutional layer, and the output module is further configured to: input the fused feature map into the first convolutional layer and the second convolutional layer separately; perform a first convolution operation on the fused feature map through the first convolutional layer to output a coordinate matrix containing the vertex coordinates of the candidate regions for the text regions in the target training image; and perform a second convolution operation on the fused feature map through the second convolutional layer to output a probability matrix containing the probability value of each candidate region.
In some embodiments, the detection loss function includes a first function and a second function. The first function is L1 = |G* − G|, where G* is the coordinate matrix of the pre-annotated text regions in the target training image, and G is the coordinate matrix of the candidate regions for the text regions output by the first output network. The second function is L2 = −Y*·log(Y) − (1 − Y*)·log(1 − Y), where Y* is the probability matrix of the pre-annotated text regions in the target training image, Y is the probability matrix of the candidate regions output by the first output network, and log denotes the logarithm operation. The first loss value for the candidate regions and their probability values is L = L1 + L2.
In some embodiments, the loss determination and training module is further configured to: update the parameters of the first initial model according to the first loss value; judge whether the updated parameters have converged; if so, take the first initial model with the updated parameters as the detection model; and if not, continue to execute the step of determining a target training image based on the preset training set until the updated parameters converge.
In some embodiments, the loss determination and training module is further configured to: determine a parameter W to be updated from the first initial model according to a preset rule; compute the derivative ∂L/∂W of the first loss value L with respect to W; and update the parameter as W ← W − α·∂L/∂W, where α is a preset coefficient.
In a fifth aspect, an embodiment of the present invention provides a text region determination apparatus. The apparatus includes: an image acquisition module, configured to obtain an image to be detected; a detection module, configured to input the image to be detected into a pre-trained text detection model and output multiple candidate regions for the text regions in the image and a probability value for each candidate region, where the text detection model is trained by the text detection model training method described above; and a text region determination module, configured to determine the text regions in the image from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between them.
In some embodiments, the text region determination module is further configured to: arrange the multiple candidate regions by probability value, with the first candidate region having the largest probability value and the last the smallest; take the first candidate region as the current candidate region and compute, one by one, the degree of overlap between the current candidate region and every other candidate region; reject, among the candidate regions other than the current one, those whose degree of overlap exceeds a preset overlap threshold; take the next remaining candidate region as the new current candidate region and repeat the overlap-computation step until the last candidate region is reached; and take the candidate regions remaining after rejection as the text regions in the image to be detected.
In some embodiments, the apparatus further includes a region rejection module, configured to reject, among the multiple candidate regions, those whose probability value is below a preset probability threshold, to obtain the final multiple candidate regions.
In a sixth aspect, an embodiment of the present invention provides a text content determination apparatus. The apparatus includes: a region acquisition module, configured to obtain the text regions in an image through the text region determination method described above; a recognition module, configured to input the text regions into a pre-trained text recognition model and output a recognition result for each text region; and a text content determination module, configured to determine the text content in each text region according to the recognition result.
In some embodiments, the apparatus further includes a normalization module, configured to normalize the text regions to a preset size.
In some embodiments, the apparatus further includes a text recognition model training module, configured to train the text recognition model as follows: determining a target training text image based on a preset training set; inputting the target training text image into a second initial model, where the second initial model includes a second feature extraction network, a second output network, and a classification function; extracting a feature map of the target training text image through the second feature extraction network; splitting the feature map into at least one sub-feature map through the second initial model; inputting the sub-feature maps separately into the second output network, which outputs an output matrix for each sub-feature map; inputting each output matrix separately into the classification function, which outputs a probability matrix for each sub-feature map; determining a second loss value for the probability matrices through a preset recognition loss function; and training the second initial model according to the second loss value until the parameters of the second initial model converge, yielding the text recognition model.
In some embodiments, the second feature extraction network includes multiple groups of second convolutional networks connected in sequence; each group of second convolutional networks includes a convolutional layer, a pooling layer, and an activation function layer connected in sequence.
In some embodiments, the recognition model training module is further configured to: split the feature map into at least one sub-feature map along the column direction of the feature map, where the column direction is perpendicular to the text line direction.
In some embodiments, the second output network includes multiple fully connected layers, the number of which corresponds to the number of sub-feature maps, and the recognition model training module is further configured to: input each sub-feature map into its corresponding fully connected layer, so that each fully connected layer outputs the output matrix corresponding to its sub-feature map.
In some embodiments, the classification function includes a Softmax function: p_t^i = e^{x_i} / Σ_{m=1}^{K+1} e^{x_m}, where e denotes the natural constant, t denotes the t-th probability matrix, K denotes the number of distinct characters contained in the target training text images of the training set, m ranges from 1 to K+1, Σ denotes the summation operation, x_i is the i-th element of the output matrix, and p_t^i is the i-th element of the probability matrix p_t.
In some embodiments, the recognition loss function includes L′ = −log p(y | {p_t}, t = 1…T), where y is the probability matrix of the pre-annotated target training text image, t denotes the t-th probability matrix, p_t is the probability matrix output by the classification function for each sub-feature map, T is the total number of probability matrices, p denotes computing the probability, and log denotes the logarithm operation.
In some embodiments, the recognition model training module is further configured to: update the parameters of the second initial model according to the second loss value; judge whether the updated parameters have converged; if so, take the second initial model with the updated parameters as the text recognition model; and if not, continue to execute the step of determining a target training text image based on the preset training set until the updated parameters converge.
In some embodiments, the above recognition model training module is further configured to: determine a parameter to be updated from the second initial model according to a preset rule; calculate the derivative ∂L′/∂w′ of the second loss value with respect to the parameter to be updated, where L′ is the loss value of the probability matrix and w′ is the parameter to be updated; and update the parameter to be updated to obtain the updated parameter w′ = w′ − α′·∂L′/∂w′, where α′ is a preset coefficient.
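The update rule w′ = w′ − α′·∂L′/∂w′ is ordinary gradient descent; a toy illustration with a hypothetical scalar loss (not the model's actual loss) shows the parameter converging:

```python
def sgd_step(w, dL_dw, alpha=0.01):
    """One update of a parameter to be updated: w <- w - alpha * dL/dw,
    where alpha is the preset coefficient (learning rate)."""
    return w - alpha * dL_dw

# toy loss L(w) = (w - 3)^2, so dL/dw = 2*(w - 3);
# repeated steps move w toward the minimizer w = 3
w = 0.0
for _ in range(200):
    w = sgd_step(w, 2 * (w - 3), alpha=0.1)
```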
In some embodiments, the above text region recognition result includes multiple probability matrices corresponding to the text region; the above text content determining module is further configured to: determine the position of the maximum probability value in each probability matrix; obtain the character corresponding to the position of the maximum probability value from a preset correspondence between the positions in the probability matrix and characters; arrange the obtained characters according to the order of the multiple probability matrices; and determine the text content in the text region according to the arranged characters.
In some embodiments, the above text content determining module is further configured to: delete the repeated characters and blank characters from the arranged characters according to a preset rule, to obtain the text content in the text region.
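The per-position argmax, the position-to-character lookup, and the deletion of repeats and blanks can be sketched as a greedy CTC-style decoder; the character set and blank symbol below are illustrative assumptions:

```python
import numpy as np

def decode(prob_matrices, charset, blank="-"):
    """Greedy decoding sketch: take the most probable position in each
    probability matrix, map it to a character via the position->character
    correspondence (here simply an index into `charset`), then collapse
    consecutive repeats and drop blank characters."""
    chars = [charset[int(np.argmax(p))] for p in prob_matrices]
    out, prev = [], None
    for c in chars:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return "".join(out)

charset = "ab-"  # two characters plus a blank, hypothetical
p = [np.array([0.9, 0.05, 0.05]),   # 'a'
     np.array([0.8, 0.1, 0.1]),     # 'a' (repeat, collapsed)
     np.array([0.1, 0.1, 0.8]),     # blank (dropped)
     np.array([0.1, 0.8, 0.1])]     # 'b'
print(decode(p, charset))  # -> "ab"
```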
In some embodiments, the above apparatus further includes: an information obtaining module, configured to obtain the text content in each text region if the image contains multiple text regions; and a sensitive information determining module, configured to determine, through a pre-established sensitive word library, whether the text content corresponding to the image contains sensitive information.
In some embodiments, the above sensitive information determining module is further configured to: perform a word segmentation operation on the obtained text content; match each segment obtained by the segmentation operation against the pre-established sensitive word library one by one; and if at least one segment is successfully matched, determine that the text content corresponding to the image contains sensitive information.
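The segment-and-match flow can be sketched as follows; the whitespace tokenizer and the example lexicon are placeholders, since the patent leaves the segmentation routine and word library unspecified:

```python
def contains_sensitive(text_contents, sensitive_words, segment):
    """Segment each text region's content, then match each segment
    against the pre-built sensitive word library one by one.
    `segment` stands in for a real word-segmentation routine."""
    for text in text_contents:
        for token in segment(text):
            if token in sensitive_words:
                return True, token  # first successful match decides
    return False, None

lexicon = {"forbidden"}  # hypothetical sensitive word library
hit, word = contains_sensitive(["this is forbidden text"], lexicon, str.split)
```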
In some embodiments, the above apparatus further includes: a region identification module, configured to obtain the text region to which the successfully matched segment belongs, and mark the obtained text region in the image.
In a seventh aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory; the memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to realize the steps of the above text detection model training method, the above text region determining method, or the above text content determining method.
In an eighth aspect, an embodiment of the present invention provides a machine-readable storage medium storing machine-executable instructions; when called and executed by a processor, the machine-executable instructions cause the processor to realize the steps of the above text detection model training method, the above text region determining method, or the above text content determining method.
The embodiments of the present invention bring the following beneficial effects:
The text detection model training method provided by the embodiments of the present invention first extracts multiple initial feature maps of mutually different scales from the target training image; then performs fusion processing on the multiple initial feature maps to obtain a fused feature map; the fused feature map is then input into the first output network, which outputs the candidate regions of the text regions in the target training image and the probability value of each candidate region; after a first loss value is determined through a preset detection loss function, the first initial model is trained according to the first loss value to obtain the detection model. In this manner, the feature extraction network can automatically extract features of different scales, so the text detection model only needs a single input image to obtain the candidate regions of the text regions of various scales in that image, without manually transforming the image scale; the operation is convenient, and especially in scenes with multiple font sizes, multiple fonts, various shapes and multiple orientations, all kinds of text in the image can be detected quickly, comprehensively and accurately, which also benefits the accuracy of subsequent text recognition and improves the effect of text recognition.
Other features and advantages of the present invention will be described in the following specification; alternatively, some features and advantages can be deduced from the specification or determined without ambiguity, or can be learnt by implementing the above techniques of the present invention. To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below in conjunction with the appended drawings.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from these drawings without creative labor.
Fig. 1 is a flowchart of a text detection model training method provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of a first feature extraction network provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of performing fusion processing on multiple initial feature maps provided by an embodiment of the present invention;
Fig. 4 is a flowchart of a text region determining method provided by an embodiment of the present invention;
Fig. 5 is a flowchart of another text region determining method provided by an embodiment of the present invention;
Fig. 6 is a flowchart of a text content determining method provided by an embodiment of the present invention;
Fig. 7 is a flowchart of a training method of a text recognition model provided by an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of a second feature extraction network provided by an embodiment of the present invention;
Fig. 9 is a flowchart of another text content determining method provided by an embodiment of the present invention;
Fig. 10 is a flowchart of yet another text content determining method provided by an embodiment of the present invention;
Fig. 11 is a structural schematic diagram of a text detection model training apparatus provided by an embodiment of the present invention;
Fig. 12 is a structural schematic diagram of a text region determining apparatus provided by an embodiment of the present invention;
Fig. 13 is a structural schematic diagram of a text content determining apparatus provided by an embodiment of the present invention;
Fig. 14 is a structural schematic diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative labor shall fall within the protection scope of the present invention.
In traditional text recognition techniques, text regions that may contain text are detected from a picture through manually set rules; character cutting is then performed on the detected text regions to obtain the image block corresponding to each character; and each image block is recognized by a pre-trained classifier to obtain the final text recognition result. In this manner, since the number of manually set rules is limited, the detected text regions are mostly regions of regular shape, so the scope of application is limited, and it is difficult to apply to text detection and recognition in complex scenes, such as scenes with multiple font sizes, multiple fonts, various shapes, multiple orientations and changeable backgrounds. Moreover, this manner recognizes single characters and does not consider the correlation among characters, which leads to poor detection and recognition effect in complex scenes.
Furthermore, text recognition can also be realized by means of deep learning. A recognition model first needs to be trained through a recurrent neural network; the picture to be detected is then transformed into multiple scales, which are input into the recognition model one by one to detect the text regions and recognize the text. In this manner, the image scale needs to be transformed manually, and the images of multiple scales are separately input into the recognition model so that the model can recognize text of different sizes; the operation is relatively cumbersome, and it is difficult to meet the needs of real-time recognition. In addition, since a recurrent neural network needs to follow a time series to perform recursive operations, it is difficult to process in parallel and the operation speed is slow. Moreover, this recognition model usually detects text regions using horizontal rectangular frames, so it can only detect and recognize text in the horizontal direction; the recognition effect for text at arbitrary angles is poor, so it is difficult to apply to text detection and recognition in complex scenes.
In summary, the text detection and recognition methods in the related art perform poorly in complex scenes. On this basis, embodiments of the present invention provide a text detection model training method, and a text region and text content determining method and apparatus. The technique can be widely applied to text detection and text recognition in various scenes, and in particular can be applied to text detection and text recognition in complex scenes such as network live streaming, cable television live streaming, games and videos.
To facilitate understanding of the present embodiments, a text detection model training method disclosed in the embodiments of the present invention is first described in detail. The text detection model can be used for text detection, which can be understood as locating, from an image, the image regions that contain text. As shown in Fig. 1, the method includes the following steps:
Step S102, determine a target training image based on a preset training set.
The training set may contain multiple images. To improve the general applicability of the detection model, the images in the training set may include images under various scenes, for example, live streaming scene images, game scene images, outdoor scene images, indoor scene images, etc.; the images in the training set may also include text lines of multiple font sizes, shapes, fonts and languages, so that the trained detection model can detect all kinds of text lines.
Each image includes the text regions of manually labeled text lines; a text region can be labeled with a quadrilateral frame such as a rectangle, or with another polygonal frame; the labeled text region can usually completely cover the entire text line and fit closely to it. Furthermore, the multiple images in the above training set can be divided into a training subset and a test subset according to a preset ratio. During training, the target training image can be obtained from the training subset; after training is completed, a target detection image can be obtained from the test subset to test the performance of the detection model.
Step S104, input the target training image into a first initial model; the first initial model includes a first feature extraction network, a feature fusion network and a first output network. Before being input into the first initial model, the target training image can be adjusted to a preset size, such as 512*512.
Step S106, extract multiple initial feature maps of the target training image through the first feature extraction network; the scales of the multiple initial feature maps differ from one another.
The first feature extraction network can be realized by multiple convolutional layers. In general, the multiple convolutional layers are sequentially connected, and each convolutional layer is provided with a different convolution kernel so as to extract feature maps of different scales. Among the multiple initial feature maps of the target training image, each initial feature map can be obtained by the convolution calculation of its corresponding convolutional layer. Taking four convolutional layers as an example, each convolutional layer can output one initial feature map; each convolutional layer can be provided with a convolution kernel of a different size, so that the scale of the initial feature map output by each convolutional layer is different. In actual implementation, the convolutional layer receiving the target training image can be set to output the initial feature map of the largest scale, and the scale of the initial feature map output by each subsequent convolutional layer is gradually reduced.
Step S108, perform fusion processing on the multiple initial feature maps through the feature fusion network to obtain a fused feature map.
In general, a smaller convolution kernel can perceive the high-frequency features in an image, and the initial feature map output by a convolutional layer with a smaller convolution kernel carries the small-scale text line features; a larger convolution kernel can perceive the low-frequency features in an image, and the initial feature map output by a convolutional layer with a larger convolution kernel carries the large-scale text line features. On this basis, the initial feature maps of multiple different scales carry text line features of various scales, and the fused feature map obtained after performing fusion processing on the multiple initial feature maps also carries text line features of various scales. In this way, the detection model can detect text lines of various scales without artificially performing image scale transformation before detection.
In actual implementation, since the scales of the multiple initial feature maps are different, before fusion, an interpolation operation can be performed on an initial feature map of smaller scale to expand it so that it matches an initial feature map of larger scale. During fusion, between different initial feature maps, the feature points at the same position can be multiplied or added to obtain the final fused feature map.
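The interpolate-then-combine step can be sketched as follows; nearest-neighbour interpolation and point-by-point addition are one possible choice among those described (interpolation method and the add-versus-multiply choice are assumptions of this sketch):

```python
import numpy as np

def upsample_nearest(fm, out_h, out_w):
    """Nearest-neighbour interpolation to enlarge a smaller feature map."""
    h, w = fm.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fm[np.ix_(rows, cols)]

def fuse(small, large):
    """Expand the smaller-scale map to the larger map's size, then add
    the features at the same positions point by point."""
    up = upsample_nearest(small, *large.shape)
    return up + large

small = np.ones((2, 2))
large = np.zeros((4, 4))
fused = fuse(small, large)  # shape (4, 4), all ones
```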
Step S110, input the fused feature map into the first output network, and output the candidate regions of the text regions in the target training image and the probability value of each candidate region.
The first output network is used to extract the needed features from the fused feature map to obtain the output result. If the output result of the detection model is a single kind of result, the first output network generally comprises one group of networks; if the output result of the detection model is multiple kinds of results, the first output network generally comprises multiple groups of networks arranged in parallel, each group of networks correspondingly outputting one kind of result. The first output network can be composed of convolutional layers or fully connected layers. In the above step, the first output network needs to output two kinds of results, the candidate regions and the probability values of the candidate regions, so the first output network may include two groups of networks, each of which can be a convolutional network or a fully connected network.
Step S112, determine a first loss value of the above candidate regions and the probability value of each candidate region through a preset detection loss function; train the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain the text detection model.
The standard text regions are labeled in the target training image in advance; based on the positions of the labeled text regions, a coordinate matrix and a probability matrix of the text regions can be generated. The coordinate matrix of the text regions includes the vertex coordinates of the standard text regions; the probability matrix of the text regions includes the probability values of the text regions, and the probability value is usually 1.
The detection loss function can compare the difference between the coordinate matrix of the candidate regions and the coordinate matrix of the standard text regions, and the difference between the probability values of the candidate regions and the probability values of the standard text regions; usually, the larger the difference, the larger the above first loss value. The parameters of each part of the above first initial model can be adjusted based on the first loss value, so as to achieve the purpose of training. When each parameter in the model converges, the training ends, and the detection model is obtained.
The text detection model training method provided by the embodiments of the present invention first extracts multiple initial feature maps of mutually different scales from the target training image; then performs fusion processing on the multiple initial feature maps to obtain a fused feature map; the fused feature map is then input into the first output network, which outputs the candidate regions of the text regions in the target training image and the probability value of each candidate region; after a first loss value is determined through a preset detection loss function, the first initial model is trained according to the first loss value to obtain the detection model. In this manner, the feature extraction network can automatically extract features of different scales, so the text detection model only needs a single input image to obtain the candidate regions of the text regions of various scales in that image, without manually transforming the image scale; the operation is convenient, and especially in scenes with multiple font sizes, multiple fonts, various shapes and multiple orientations, all kinds of text in the image can be detected quickly, comprehensively and accurately, which also benefits the accuracy of subsequent text recognition and improves the effect of text recognition.
An embodiment of the present invention also provides another text detection model training method, which is realized on the basis of the method described in the above embodiment; this method focuses on describing the specific implementation process of each step of the above training method. The method includes the following steps:
Step 202, determine a target training image based on a preset training set.
Step 204, input the target training image into a first initial model; the first initial model includes a first feature extraction network, a feature fusion network and a first output network.
Step 206, extract multiple initial feature maps of the target training image through the first feature extraction network; the scales of the multiple initial feature maps differ from one another.
In actual implementation, in order to improve the performance of the first feature extraction network, the first feature extraction network may include multiple groups of sequentially connected first convolutional networks; each group of first convolutional networks includes a sequentially connected convolutional layer, batch normalization layer and activation function layer. Fig. 2 shows a structural schematic diagram of a first feature extraction network; Fig. 2 takes four groups of first convolutional networks as an example for illustration, where the convolutional layer of a later group of first convolutional networks is connected to the activation function layer of the previous group. In addition, the first feature extraction network may also include more or fewer groups of first convolutional networks.
The batch normalization layer in the first convolutional network is used to normalize the feature map output by the convolutional layer; this process can accelerate the convergence speed of the first feature extraction network and the detection model, and can alleviate the problem of gradient dispersion in a multilayer convolutional network, so that the first feature extraction network is more stable. The activation function layer in the first convolutional network can perform a functional transformation on the normalized feature map; this transformation breaks the linear combination of the convolutional layer input and can improve the feature expression ability of the first convolutional network. The activation function layer can specifically be a Sigmoid function, a tanh function, a ReLU function, etc.
Step 208, perform fusion processing on the multiple initial feature maps through the above feature fusion network to obtain a fused feature map.
The following steps 02-08 provide a concrete implementation of step 208, illustrated taking pyramid features as an example, that is, the scales of the initial feature maps output by the convolutional layers are sequentially reduced:
Step 02, arrange the multiple initial feature maps in sequence according to their scales; the scale of the initial feature map of the top level is the smallest, and the scale of the initial feature map of the bottom level is the largest;
Step 04, determine the initial feature map of the top level as the fused feature map of the top level;
Step 06, for each level except the top level, fuse the initial feature map of the current level with the fused feature map of the level above the current level, to obtain the fused feature map of the current level;
Since the scale of the fused feature map of the level above the current level is smaller than that of the initial feature map of the current level, before the two are fused, the scale of the fused feature map of the level above the current level can be expanded by an interpolation operation to be the same as the scale of the initial feature map of the current level, and then fusion processing of point-by-point addition or point-by-point multiplication is performed to obtain the fused feature map of the current level.
Step 08, determine the fused feature map of the lowest level as the final fused feature map.
Fig. 3 shows a schematic diagram of performing fusion processing on multiple initial feature maps; the target training image passes through the convolution processing of the first feature extraction network to obtain four levels of initial feature maps; the initial feature map of the top level serves as the fused feature map of the top level; the fused feature map of the top level is fused with the initial feature map of the second level to obtain the fused feature map of the second level; the fused feature map of the second level is fused with the initial feature map of the third level to obtain the fused feature map of the third level; the fused feature map of the third level is fused with the initial feature map of the fourth level to obtain the fused feature map of the fourth level; the fused feature map of the fourth level is the final fused feature map.
Step 210, input the fused feature map into the first output network, and output the candidate regions of the text regions in the target training image and the probability value of each candidate region.
Taking a convolutional network as an example, the above first output network includes a first convolutional layer and a second convolutional layer; the first convolutional layer and the second convolutional layer are arranged in parallel and are respectively used to output the vertex coordinates of the candidate regions and the probability values of the candidate regions. The above step 210 can also be realized by the following steps 12-16:
Step 12, input the fused feature map into the first convolutional layer and the second convolutional layer separately;
Step 14, perform a first convolution operation on the fused feature map through the first convolutional layer, and output a coordinate matrix; the coordinate matrix includes the vertex coordinates of the candidate regions of the text regions in the target training image;
For example, the coordinate matrix can be expressed as n*H*W, where H and W are respectively the height and width of the coordinate matrix, and n is the dimension of the coordinate matrix. For example, when a candidate region is a quadrilateral, one candidate region needs to be determined by four vertex coordinates, so n is 8; when a candidate region is another polygon, the value of n is usually twice the number of edges of the candidate region.
Step 16, perform a second convolution operation on the fused feature map through the second convolutional layer, and output a probability matrix; the probability matrix includes the probability value of each candidate region.
The probability value of a candidate region can also be called the score of the candidate region; the probability value can be used to characterize the probability that the candidate region completely contains a text line.
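The two parallel branches and the n*H*W shapes can be illustrated with 1x1 convolutions expressed as channel-mixing matrix multiplies; the channel sizes, random weights and sigmoid squashing of the scores are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# fused feature map: C channels over an H x W grid (toy sizes)
C, H, W = 16, 8, 8
fused = rng.standard_normal((C, H, W))

def conv1x1(x, out_channels, rng):
    """A 1x1 convolution as a channel-mixing matrix multiply, standing in
    for the first and second output convolutional layers."""
    wgt = rng.standard_normal((out_channels, x.shape[0]))
    return np.einsum("oc,chw->ohw", wgt, x)

n = 8                             # quadrilateral: 4 vertices x 2 coordinates
coords = conv1x1(fused, n, rng)   # coordinate matrix, shape n*H*W
scores = 1 / (1 + np.exp(-conv1x1(fused, 1, rng)))  # per-region probability
```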
Step 212, determine a first loss value of the above candidate regions and the probability value of each candidate region through a preset detection loss function; train the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain the text detection model.
In actual implementation, the above detection loss function includes a first function and a second function, which are respectively used to calculate the loss values of the vertex coordinates of the candidate regions and of the probability value of each candidate region. The first function is L1 = |G* − G|, where G* is the coordinate matrix of the text regions pre-labeled in the target training image, and G is the coordinate matrix of the candidate regions of the text regions in the target training image output by the first output network. The second function is L2 = −Y*·log Y − (1−Y*)·log(1−Y), where Y* is the probability matrix of the text regions pre-labeled in the target training image, Y is the probability matrix of the candidate regions of the text regions in the target training image output by the first output network, and log denotes the logarithm operation. The first loss value of the vertex coordinates of the candidate regions and the probability value of each candidate region is the sum of the above first function and second function, that is, L = L1 + L2.
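The combined loss L = L1 + L2 can be sketched as follows; averaging over matrix elements and the epsilon clipping are assumptions of this sketch, since the patent leaves the reduction over the matrices unspecified:

```python
import numpy as np

def detection_loss(G_star, G, Y_star, Y, eps=1e-7):
    """L = L1 + L2 with L1 = |G* - G| (coordinate regression term) and
    L2 = -Y*·log Y - (1-Y*)·log(1-Y) (cross-entropy on region scores)."""
    L1 = np.abs(G_star - G).mean()
    Y = np.clip(Y, eps, 1 - eps)  # avoid log(0)
    L2 = (-Y_star * np.log(Y) - (1 - Y_star) * np.log(1 - Y)).mean()
    return L1 + L2

# a prediction close to the labels yields a smaller loss than a distant one
good = detection_loss(np.ones((2, 2)), np.ones((2, 2)), np.ones(4), np.full(4, 0.99))
bad = detection_loss(np.ones((2, 2)), np.zeros((2, 2)), np.ones(4), np.full(4, 0.01))
```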
Based on the above description of the first loss value, in the above step, the process of training the first initial model according to the first loss value can also be realized by the following steps 22-28:
Step 22, update the parameters in the first initial model according to the first loss value;
In actual implementation, a function mapping relationship can be preset; the initial parameter and the first loss value are input into the function mapping relationship, and the updated parameter can be calculated. The function mapping relationships of different parameters can be identical or different.
Specifically, a parameter to be updated can first be determined according to a preset rule; the parameter to be updated can be all the parameters in the first initial model, or partial parameters determined at random from the first initial model. The derivative ∂L/∂w of the first loss value with respect to the parameter to be updated in the first initial model is then calculated, where L is the first loss value and w is the parameter to be updated; the parameter to be updated can also be called the weight of each neuron. This process can be called a back-propagation algorithm: if the first loss value is large, it indicates that the output of the current first initial model does not match the expected output result, so the derivative of the above first loss value with respect to the parameter to be updated in the first initial model is found, and this derivative can serve as the basis for adjusting the parameter to be updated.
After the derivative of each parameter to be updated is obtained, each parameter to be updated is updated, obtaining the updated parameter w = w − α·∂L/∂w, where α is a preset coefficient. This process can be called a stochastic gradient descent algorithm; the derivative of each parameter to be updated can also be understood as the direction in which the first loss value declines fastest relative to the current parameter, and adjusting the parameter along this direction can make the first loss value decrease quickly and make the parameter converge. In addition, after the first initial model has undergone one round of training and one first loss value is obtained, one or more parameters can be randomly selected from the parameters in the first initial model to carry out the above update process; the model training time of this manner is shorter and the algorithm is faster. Certainly, the above update process can also be carried out on all the parameters in the first initial model; the model training of this manner is more accurate.
Step 24, judge whether each updated parameter converges; if the updated parameters converge, execute step 26; if the updated parameters do not converge, execute step 28;
Step 26, determine the first initial model with the updated parameters as the detection model; end.
Step 28, continue to perform the step of determining a target training image based on the preset training set, until the updated parameters converge.
Specifically, a new image can be reacquired from the training set as the target training image, or the current target training image can continue to be used as the target training image for training.
In the above manner, the feature extraction network can automatically extract feature maps of different scales, the feature maps of different scales are then fused, and the candidate regions of the text regions of various scales in the image are obtained based on the resulting fused feature map. The detection model only needs a single input image to obtain the candidate regions of the text regions of various scales in that image, without manually transforming the image scale; the operation is convenient, and especially in scenes with multiple font sizes, multiple fonts, various shapes and multiple orientations, all kinds of text in the image can be detected quickly, comprehensively and accurately, which also benefits the accuracy of subsequent text recognition and improves the effect of text recognition.
Based on the text detection model training method provided by the above embodiment, an embodiment of the present invention further provides a text region determining method, which is implemented on the basis of the text detection model training method described in the above embodiment; as shown in Fig. 4, the method includes the following steps:
Step S402: obtain an image to be detected; the image to be detected may be a picture, or a video frame captured from a video file or a live-streaming video, etc.
Step S404: input the image to be detected into a pre-trained text detection model, and output multiple candidate regions for the text regions in the image to be detected and a probability value for each candidate region; the text detection model is obtained through the training method of the text detection model described above.
Step S406: determine the text regions in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap among the multiple candidate regions.
Among the candidate regions output by the above text detection model, multiple candidate regions may correspond to the same text line; in order to find, among them, the region that best matches the text line, the multiple candidate regions need to be screened. In most cases, multiple candidate regions that overlap each other to a high degree correspond to the same text line, so the text region corresponding to that text line can be determined from the probability values of those highly overlapping candidate regions; for example, among multiple candidate regions that overlap each other to a high degree, the candidate region with the largest probability value is determined as the text region. If the image contains multiple text lines, multiple text regions are usually determined in the end.
In the text region determining method provided by this embodiment of the present invention, the acquired image to be detected is input into the text detection model, which outputs multiple candidate regions for the text regions in the image to be detected and the probability value of each candidate region; the text regions in the image to be detected are then determined from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap among them. In this manner, the text detection model automatically extracts features of different scales, so inputting a single image into the model suffices to obtain candidate text regions of various scales in that image, and the image scale no longer needs to be converted manually, which makes the operation convenient. In particular, in scenes with multiple font sizes, fonts, shapes, and orientations, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which also benefits the accuracy of subsequent text recognition and improves the recognition effect.
An embodiment of the present invention further provides another text region determining method, implemented on the basis of the method described in the above embodiment; this method focuses on the detailed process of determining the text regions in the image to be detected according to the vertex coordinates of the candidate regions output by the detection network and the probability values of the candidate regions; as shown in Fig. 5, the method includes the following steps:
Step S502: obtain an image to be detected.
Step S504: input the image to be detected into a pre-trained text detection model, and output multiple candidate regions for the text regions in the image to be detected and a probability value for each candidate region.
Step S506: reject, from the multiple candidate regions, the candidate regions whose probability value is lower than a preset probability threshold, to obtain the final multiple candidate regions.
Step S506 is optional: in the following step S508, each candidate region output by the detection model may be sorted directly, or the candidate regions whose probability value is lower than the preset probability threshold may first be rejected and only the remaining candidate regions sorted. The preset probability threshold can be set in advance, for example 0.2 or 0.1; rejecting candidate regions whose probability value is below the threshold helps reduce the amount of computation in the subsequent determination of the text regions in the image to be detected and improves the computation speed.
Step S508: sort the multiple candidate regions in sequence according to their probability values, so that the first candidate region has the largest probability value and the last candidate region has the smallest.
Step S510: take the first candidate region as the current candidate region, and compute one by one the degree of overlap between the current candidate region and each candidate region other than the current candidate region.
The candidate regions other than the current candidate region may also be called the other candidate regions. When computing the degree of overlap between the current candidate region and each of the other candidate regions, the intersection over union (IoU) of the two candidate regions can be computed: the IoU equals the area of the intersection of the two candidate regions divided by the area of their union. It can be understood that the larger the IoU, the greater the degree of overlap of the two candidate regions. The other candidate regions that overlap the current candidate region to a high degree usually characterize the same text line as the current candidate region; since their probability values are smaller than that of the current candidate region, those other candidate regions can be rejected, so that the text line is characterized by the current candidate region.
Step S512: reject, from the candidate regions other than the current candidate region, the candidate regions whose degree of overlap is greater than a preset overlap threshold; the overlap threshold can be set in advance, for example 0.5 or 0.6.
Step S514: take the next candidate region after the current candidate region as the new current candidate region, and continue executing the step of computing one by one the degree of overlap between the current candidate region and the candidate regions other than the current candidate region, until the last candidate region is reached.
Steps S510–S514 form a loop in which some candidate regions may be rejected in each round; when the traversal reaches the last candidate region, the loop ends, and the finally remaining candidate regions are determined as the text regions in the image to be detected. If multiple candidate regions remain in the end, it can be determined that the image to be detected contains multiple text regions.
Step S516: determine the candidate regions remaining after the rejection as the text regions in the image to be detected.
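The loop in steps S508–S514 is the standard non-maximum suppression procedure. A minimal sketch with axis-aligned boxes follows; the patent's candidate regions are given by vertex coordinates, so plain rectangles and the thresholds 0.2 and 0.5 are simplifying assumptions for illustration:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def nms(boxes, probs, prob_thresh=0.2, iou_thresh=0.5):
    """Drop low-probability boxes, then keep only the highest-probability
    box of every group of mutually overlapping candidates."""
    order = sorted((i for i, p in enumerate(probs) if p >= prob_thresh),
                   key=lambda i: probs[i], reverse=True)
    kept = []
    while order:
        cur = order.pop(0)              # current candidate region
        kept.append(cur)
        # reject the other candidates that overlap it too much
        order = [i for i in order if iou(boxes[cur], boxes[i]) <= iou_thresh]
    return kept

kept = nms([(0, 0, 10, 4), (1, 0, 11, 4), (20, 0, 30, 4)], [0.9, 0.8, 0.7])
# → [0, 2]: the second box overlaps the first too heavily and is rejected.
```

The two nearly identical boxes characterize one text line, so only the higher-probability one survives, exactly as in steps S510–S512.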
In the above manner, multiple candidate regions and the probability value of each candidate region are obtained through the text detection model, and the text regions are then determined from the multiple candidate regions by means of non-maximum suppression. In this manner, the text detection model automatically extracts features of different scales, so inputting a single image into the model suffices to obtain candidate text regions of various scales in that image, and the image scale no longer needs to be changed manually, which makes the operation convenient. In particular, in scenes with multiple font sizes, fonts, shapes, and orientations, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which also benefits the accuracy of subsequent text recognition and improves the recognition effect.
Based on the text region determining method provided by the above embodiment, an embodiment of the present invention further provides a text content determining method, implemented on the basis of the text region determining method described in the above embodiment; as shown in Fig. 6, the method includes the following steps:
Step S602: obtain the text regions in the image through the text region determining method described above.
Step S604: input the text regions into a pre-trained text recognition model, and output the recognition results of the text regions.
Step S606: determine the text content in the text regions according to the recognition results.
The above text recognition model can be trained and obtained in several ways, for example with recurrent neural networks or convolutional neural networks; the recognition results of the text regions can of course also be obtained by means of optical character recognition. The recognition result output by the text recognition model can be determined directly as the text content in the text region, or the recognition result output by the text recognition model can first be optimized, for example by deleting repeated characters and null characters, and the processed recognition result is then determined as the text content in the text region.
In the text content determining method provided by this embodiment of the present invention, the text regions in the image are first obtained through the text region determining method described above; the text regions are then input into a pre-trained text recognition model, which outputs the recognition results of the text regions; finally, the text content in the text regions is determined according to the recognition results. In this manner, because the above text region determining method can obtain text regions of various scales through the text detection model, all kinds of text in the image can be detected quickly, comprehensively, and accurately in scenes with multiple font sizes, fonts, shapes, and orientations, which also benefits the accuracy of text recognition and improves the recognition effect.
An embodiment of the present invention further provides another text content determining method, implemented on the basis of the method described in the above embodiment; this method focuses on the training method of the text recognition model. The text recognition model is used for text recognition, which can be understood as follows: the text in an image is detected so as to locate the image region containing the text, and the specific content of the text in that image region is then recognized. As shown in Fig. 7, the recognition model is trained in the following manner:
Step S702: determine a target training text image based on a preset training set.
The target training text image may be an individual image, or an image region marked on an image. The training set may contain multiple images; in order to improve the general applicability of the text recognition model, the images in the training set may include images under various scenes, for example live-streaming scene images, game scene images, outdoor scene images, and indoor scene images; the images in the training set may also include text lines of a variety of font sizes, shapes, fonts, and languages, so that the trained text recognition model can recognize all kinds of text lines. Every target training text image is associated with the manually labeled text content of its text line, for example "hello" or "excellent"; each target training text image corresponds to one labeled text content.
After the labeling is completed, a character repertoire can also be established from the text content of all the text lines corresponding to all the images in the training set; specifically, the text content of all the text lines corresponding to all the images in the training set is obtained, the distinct characters are extracted from it, and the characters that differ from one another form the character repertoire. Furthermore, the multiple images in the above training set may be divided into a training subset and a test subset according to a preset ratio. During training, the target training image can be obtained from the training subset; after training is completed, a target detection image can be obtained from the test subset to test the performance of the text recognition model.
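Building the character repertoire described above amounts to collecting the distinct characters of all labeled text lines. A minimal sketch, where the label strings are invented examples (the patent's "hello" and "excellent" labels are reused):

```python
def build_char_repertoire(labels):
    """Collect the distinct characters of all labeled text lines,
    in first-seen order, to serve as the recognition alphabet."""
    seen = []
    for text in labels:
        for ch in text:
            if ch not in seen:
                seen.append(ch)
    return seen

repertoire = build_char_repertoire(["hello", "excellent"])
# → ['h', 'e', 'l', 'o', 'x', 'c', 'n', 't']
```

Each distinct character then corresponds to one position of the output and probability matrices described in the following steps.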
Step S704: input the target training text image into a second initial model; the second initial model includes a second feature extraction network, a feature splitting network, a second output network, and a classification function.
Step S706: extract the feature map of the target training text image through the second feature extraction network.
The second feature extraction network can be implemented with multiple convolutional layers; in general, the convolutional layers are connected in sequence, each convolutional layer performs convolutional computation on its input data through its configured convolution kernels, and the data output by the last convolutional layer can serve as the feature map of the target training text image.
Step S708: split the feature map into at least one sub-feature map through the feature splitting network.
For the purpose of recognizing text content, the text recognition model needs to split the feature map corresponding to a text line so that each sub-feature map contains one character or symbol, or a small number of them, which facilitates the recognition of the text content. In the splitting process, the scale of the sub-feature maps can be preset and the feature map split based on that scale; alternatively, the number of sub-feature maps can be preset and the feature map split based on that number. Of course, if the text line is inherently very short, for example only one character, the feature map may split into only one sub-feature map.
Step S710: input the sub-feature maps separately into the second output network, and output the output matrix corresponding to each sub-feature map.
The second output network is used for further computation on the sub-feature maps. In the output matrix corresponding to each sub-feature map, each position corresponds to a preset character, and the numerical value at a position characterizes the degree of match between the sub-feature map and the character corresponding to that position. The second output network may be a convolutional network or a fully connected network.
Step S712: input the output matrix corresponding to each sub-feature map separately into the classification function, and output the probability matrix corresponding to each sub-feature map.
The classification function maps each numerical value in the output matrix to a probability value, yielding the probability matrix. The probability value at each position in the probability matrix characterizes the probability that the sub-feature map matches the character corresponding to that position.
Step S714: determine the second loss value of the probability matrices through a preset recognition loss function, and train the above second initial model according to the second loss value until the parameters in the second initial model converge, obtaining the text recognition model.
The target training text image is labeled in advance with standard text content, which is composed of one or more standard characters; probability matrices can be generated based on this text content: in such a probability matrix, the probability value at the position corresponding to the standard character of the sub-feature map is 1, and the probability values at the other positions are 0. The recognition loss function can compare the probability matrices output by the classification function with the probability matrices of the standard text content; usually, the larger the difference, the larger the second loss value. Based on this second loss value, the parameters of the various parts of the above second initial model can be adjusted, achieving the purpose of training. When the parameters in the model converge, training ends and the text recognition model is obtained.
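The labeled probability matrices described above are one-hot encodings of each standard character over the character repertoire. A minimal sketch, where the alphabet and label are invented examples:

```python
def one_hot_targets(text, alphabet):
    """For each standard character, build a probability row with 1 at the
    character's position in the alphabet and 0 elsewhere."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [[1.0 if i == index[ch] else 0.0 for i in range(len(alphabet))]
            for ch in text]

targets = one_hot_targets("ab", ["a", "b", "c"])
# → [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

The recognition loss function then compares the model's probability matrices against rows like these; the farther the predicted probabilities are from the 1/0 targets, the larger the second loss value.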
In the above training method of the text recognition model, the feature map of the target training text image is first extracted; the feature map is then split into at least one sub-feature map; the sub-feature maps are then input separately into the second output network, which outputs the output matrix corresponding to each sub-feature map; the probability matrix corresponding to each sub-feature map is then obtained through the classification function; after the second loss value of the probability matrices is determined through the preset recognition loss function, the second initial model is trained according to this second loss value to obtain the text recognition model. In this manner, the model automatically cuts the feature map of the image, so for this text recognition model, inputting an image containing a text line suffices to obtain the text content in that image; the text line no longer needs to be cut in advance, the text content of the text line is obtained directly, the operation is convenient and the computation is fast, and the recognition accuracy of the text is relatively high.
An embodiment of the present invention further provides another training method of the text recognition model, implemented on the basis of the method described in the above embodiment; this method focuses on the specific implementation of each step of the above training method, and includes the following steps:
Step 802: determine a target training text image based on a preset training set.
Step 804: input the target training text image into the second initial model; the second initial model includes a second feature extraction network, a feature splitting network, a second output network, and a classification function.
Step 806: extract the feature map of the target training text image through the second feature extraction network.
In order to improve the performance of the second feature extraction network, it may include multiple groups of second convolutional networks connected in sequence; every group of second convolutional networks includes a convolutional layer, a pooling layer, and an activation function layer connected in sequence. Fig. 8 shows a structural schematic diagram of a second feature extraction network, illustrated with four groups of second convolutional networks: the convolutional layer of each latter group connects to the activation function layer of the previous group. The second feature extraction network may also include more or fewer groups of second convolutional networks.
It can be understood that the convolutional layer in the second convolutional network extracts features and generates a feature map. The pooling layer may be an average pooling layer (average pooling or mean pooling), a global average pooling layer (global average pooling), a maximum pooling layer (max pooling), etc.; the pooling layer compresses the feature map output by the convolutional layer, keeping the main features of the feature map and deleting the non-principal features, thereby reducing the dimension of the feature map. Taking the average pooling layer as an example, it averages the feature values in a neighborhood of preset size around the current feature point and uses the average value as the new value of that feature point. In addition, the pooling layer also helps keep the feature map invariant in certain respects, such as rotation invariance, translation invariance, and scaling invariance. The activation function layer applies a function transformation to the pooled feature map; this transformation breaks the linear combination of the convolutional layer's inputs and improves the feature representation ability of the second convolutional network. The activation function layer may specifically be a Sigmoid function, a tanh function, a ReLU function, etc.
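As a concrete illustration of the average pooling just described, here is a 2×2 average pooling over a small feature map, implemented with plain Python lists; this is a sketch of the operation only, not of the patent's network:

```python
def avg_pool_2x2(fmap):
    """Replace each 2x2 neighborhood with its mean, halving both dimensions."""
    h, w = len(fmap), len(fmap[0])
    return [[(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

pooled = avg_pool_2x2([[1, 3, 5, 7],
                       [1, 3, 5, 7],
                       [2, 4, 6, 8],
                       [2, 4, 6, 8]])
# → [[2.0, 6.0], [3.0, 7.0]]
```

The 4×4 map is compressed to 2×2 while each output value summarizes its neighborhood, which is the dimension reduction the pooling layer provides.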
Step 808: split the feature map into at least one sub-feature map through the feature splitting network.
Considering that most text lines are arranged horizontally, in order that each sub-feature map after splitting contains the features corresponding to one character or a small number of characters, the feature map can be split into at least one sub-feature map along the column direction of the feature map; the column direction of the feature map can be understood as the direction perpendicular to the text line. In actual implementation, the width of the sub-feature maps is set according to the width of most characters, and the feature map is split according to that width. For example, if the feature map is H*W*C and the preset width of the sub-feature maps is k, the feature map is split into W/k sub-feature maps of size H*k*C. The number of sub-feature maps can also be preset, for example T, in which case each sub-feature map is H*(W/T)*C.
Step 810: input the above sub-feature maps separately into the second output network, and output the output matrix corresponding to each sub-feature map.
Taking a fully connected implementation as an example, the second output network includes multiple fully connected layers arranged in parallel; the number of fully connected layers corresponds to the number of sub-feature maps, and each sub-feature map is input into its corresponding fully connected layer, so that each fully connected layer outputs the output matrix corresponding to its sub-feature map.
Step 812: input the output matrix corresponding to each sub-feature map separately into the classification function, and output the probability matrix corresponding to each sub-feature map.
The classification function may be a Softmax function, which can be expressed as p_t^i = e^{x_t^i} / Σ_{m=1}^{K+1} e^{x_t^m}, where e denotes the natural constant; t denotes the t-th probability matrix; K denotes the number of distinct characters contained in the target training text images of the training set; m ranges from 1 to K+1; Σ denotes summation; x_t^i is the i-th element in the output matrix; and p_t^i is the i-th element in the probability matrix p_t.
Relative to an element x_t^i of the output matrix itself, its exponential function value e^{x_t^i} enlarges the differences between the elements. For example, if the output matrix is [3, 1, -3], then after computing the exponential function value of each element, the corresponding matrix of exponential function values is approximately [20, 2.7, 0.05]. Computing the probability of each element from its exponential function value widens the probability gaps between elements, making the probability of the correct recognition result higher, which benefits the accuracy of the recognition result.
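The Softmax mapping above can be sketched directly; the scores [3, 1, -3] are the example from the text:

```python
import math

def softmax(xs):
    """Map raw scores to probabilities: p_i = e^{x_i} / sum_m e^{x_m}."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]

probs = softmax([3, 1, -3])
# The exponential widens the gaps: the first score dominates,
# receiving roughly 0.88 of the probability mass.
```

Even though 3 and 1 differ by only a factor of three as raw scores, the exponentiation concentrates most of the probability on the largest score, which is the gap-widening effect the text describes.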
Step 814: determine the second loss value of the probability matrices through the preset recognition loss function, and train the second initial model according to the second loss value until the parameters in the second initial model converge, obtaining the text recognition model.
The recognition loss function includes L = −log p(y | {p_t}, t = 1…T), where y is the probability matrix of the target training text image labeled in advance; t denotes the t-th probability matrix; p_t is the probability matrix corresponding to each sub-feature map output by the classification function; T is the total number of probability matrices; p denotes computing a probability; and log denotes the logarithm operation. Based on this recognition loss function, the process of training the second initial model according to the second loss value in the above step can also be realized through the following steps 32–38:
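Under the common simplifying assumption that the probability of the label factorizes over the T per-step probability matrices, the loss L = −log p(y | {p_t}) becomes a sum of negative log-probabilities. The patent does not spell out this factorization, so the sketch below is an illustrative assumption:

```python
import math

def recognition_loss(prob_matrices, target_indices):
    """L = -log prod_t p_t[y_t] = -sum_t log p_t[y_t], assuming the label
    probability factorizes over the T probability matrices."""
    return -sum(math.log(p[y]) for p, y in zip(prob_matrices, target_indices))

# A perfectly confident, correct prediction gives zero loss:
loss = recognition_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1])
# → 0.0
```

The less probability the model assigns to the labeled characters, the larger L becomes, which matches the statement that a larger difference from the standard probability matrices yields a larger second loss value.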
Step 32: update the parameters in the second initial model according to the second loss value.
In actual implementation, a function mapping relationship can be preset; the initial parameters and the second loss value are input into this function mapping relationship, and the updated parameters are computed. The function mapping relationships of different parameters may be identical or different.
Specifically, the parameters to be updated can be determined from the second initial model according to a preset rule; the parameters to be updated may be all the parameters in the second initial model, or some parameters determined at random from the second initial model. The derivative ∂L′/∂w′ of the second loss value with respect to each parameter to be updated is then computed, where L′ is the loss value of the probability matrices and w′ is the parameter to be updated; the parameters to be updated may also be called the weights of the neurons. This process is commonly referred to as the back-propagation algorithm: if the second loss value is large, the output of the current second initial model does not match the desired output, so the derivative of the second loss value with respect to each parameter to be updated is computed, and this derivative serves as the basis for adjusting the parameters to be updated in the second initial model.
After the derivative of each parameter to be updated is obtained, each parameter is updated to obtain the updated parameter w′_new = w′ − α′·(∂L′/∂w′), where α′ is a predetermined coefficient. This process is commonly referred to as the stochastic gradient descent algorithm. The derivative of each parameter to be updated can also be understood as the direction, relative to the current parameter to be updated, in which the second loss value decreases fastest; adjusting the parameter along this direction makes the second loss value decrease quickly and the parameter converge. In addition, after one round of training of the second initial model yields a second loss value, one or more parameters may be randomly selected from the parameters of the second initial model for the above update process, which shortens the model training time and speeds up the algorithm; alternatively, the update process may be applied to all parameters of the second initial model, which makes the model training more accurate.
Step 34: judge whether the updated parameters have converged; if the updated parameters have converged, execute Step 36; if not, execute Step 38.
Step 36: determine the second initial model with the updated parameters as the recognition model.
Step 38: continue executing the step of determining a target training text image based on the preset training set, until each updated parameter converges.
Specifically, a new image may be reacquired from the training set as the target training text image, or training may continue with the current target training text image.
In the above manner, the model automatically cuts the feature map of the image, so for this text recognition model, inputting an image containing a text line suffices to obtain the text content in that image; the text line no longer needs to be cut, the text content of the text line is obtained directly, the operation is convenient and the computation is fast, and the recognition accuracy of the text is relatively high.
Based on the text content determining method provided by the above embodiment, an embodiment of the present invention further provides another text content determining method, implemented on the basis of the text content determining method or the training method of the text recognition model described in the above embodiments; this method focuses on the process of obtaining the text content of a text region based on the recognition result after the text recognition model outputs that result; as shown in Fig. 9, the method includes the following steps:
Step S902: obtain the text regions in the image through the text region determining method described above.
Step S904: normalize the text region according to a preset size.
The preset size may include a preset length and width; if the text region does not satisfy the preset size, the text region can be scaled, or cropped, or padded to fill the gap, so that the processed text region satisfies the above preset size.
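The crop-or-pad option of step S904 can be sketched as follows; this is a simplistic stand-in for the normalization (scaling is omitted), and the row-of-pixels representation and pad value are assumptions:

```python
def normalize_width(rows, target_w, pad_value=0):
    """Pad each pixel row with pad_value, or crop it, so the region
    has exactly width target_w."""
    return [(row + [pad_value] * (target_w - len(row)))[:target_w]
            for row in rows]

region = [[5, 6, 7]]
padded = normalize_width(region, 5)   # → [[5, 6, 7, 0, 0]]
cropped = normalize_width(region, 2)  # → [[5, 6]]
```

Bringing every text region to the same preset size lets the recognition model consume regions of arbitrary original width with a fixed input shape.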
Step S906: input the processed text region into a pre-trained text recognition model, and output the recognition result of the text region; the recognition result of the text region includes multiple probability matrices corresponding to the text region.
In the recognition process, the text recognition model cuts the feature map corresponding to the text region; the sub-feature maps after cutting pass separately through the output network to produce the corresponding output matrices, and the probability matrix corresponding to each output matrix is then obtained through the classification function. The recognition result of a text region therefore includes multiple probability matrices, and each probability matrix usually corresponds to one character or a small number of characters.
Step S908: determine the position of the largest probability value in each probability matrix.
Step S910: obtain, from the preset correspondence between each position in the probability matrix and a character, the character corresponding to the position of the largest probability value.
As described in the above embodiment, the probability value at each position in the probability matrix characterizes the probability that the sub-feature map matches the character corresponding to that position; the character corresponding to the position of the largest probability value can therefore be determined as the recognition result of the corresponding sub-feature map. In most cases, the character corresponding to the position of the largest probability value is a single character, but it may also be multiple characters. The above correspondence between positions and characters can be established in the following manner: characters are collected first, and may include text in multiple languages, punctuation marks, mathematical symbols, network emoticons, etc.; specifically, the characters can be collected in the process of establishing the training set, or collected from dictionaries, character repertoires, symbol libraries, etc.
Step S912: arrange the obtained characters according to the arrangement order of the multiple probability matrices.
The arrangement order of the multiple probability matrices output by the text recognition model is usually determined by the position, in the feature map, of the sub-feature map corresponding to each probability matrix; the arrangement order of the multiple probability matrices is therefore usually consistent with the arrangement order of the characters contained in the corresponding sub-feature maps. Based on this, arranging the obtained characters according to the arrangement order of the multiple probability matrices makes the arranged characters consistent with the character arrangement of the original text line, so the text content in the text region can be determined from the arranged characters.
Step S914, determining the text content in the text region according to the arranged characters.
In actual implementation, the arranged characters could be taken directly as the text content of the text region. However, since the character fonts in a text vary in size, the text identification model may not split the feature map exactly into one sub-feature map per character, so the final arranged characters may contain duplicates. To further optimize the recognition result, the repeated characters and blank characters among the arranged characters can be deleted according to preset rules, yielding the text content of the text region.
Specifically, a reduplication dictionary (a dictionary of words that legitimately contain doubled characters) can be established in advance. If the arranged characters contain a repeated character, the reduplication dictionary can be searched for it; if it is absent, the repetition is deleted so that only one copy is retained. The semantics of the surrounding characters can also be used to judge whether the current context should contain a repeated character. For blank characters, the current context can likewise decide whether to delete: if a blank character lies between two English words, it need not be deleted and can be retained. For example, if the arranged characters are "--hh-e-l-ll-oo-", where "-" represents a blank character, the text content obtained after deleting the repeated and blank characters is "hello".
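The decode-then-collapse procedure above (take the maximum of each probability matrix, map the position to a character, then drop blanks and adjacent duplicates) can be sketched as follows; the function names and the toy charset are illustrative assumptions, not part of the patent:

```python
def greedy_decode(prob_matrices, charset):
    # for each probability matrix, pick the character at the position
    # of the maximum probability value (the preset position->character map)
    chars = []
    for probs in prob_matrices:
        best = max(range(len(probs)), key=lambda i: probs[i])
        chars.append(charset[best])
    return chars

def collapse(chars, blank="-"):
    # delete blank characters and adjacent repeated characters; a
    # legitimate double separated by a blank (e.g. "l-l") survives
    out, prev = [], None
    for c in chars:
        if c != blank and c != prev:
            out.append(c)
        prev = c
    return "".join(out)
```

On the patent's own example, `collapse(list("--hh-e-l-ll-oo-"))` yields `"hello"`.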
In the above approach, the obtained text region is first normalized, and its recognition result is then obtained through the text identification model; the recognized characters are determined from each probability matrix in the recognition result, and the text content of the text region is obtained in turn. Since the text identification model splits the feature map of the image automatically, it is only necessary to input an image containing a text line to obtain the recognition result of that image and hence its text content, without splitting the text line manually. The operation is simple and fast, while the text recognition accuracy remains high.
Based on the text content determination method provided by the above embodiments, the embodiment of the present invention also provides another text content determination method, realized on the basis of the above method. This method mainly describes the process of judging, after the text content of the text region has been obtained, whether the image contains sensitive words based on that text content.
Usually, a sensitive dictionary is established in advance, and whether the text content corresponding to the image contains sensitive information is determined through this sensitive dictionary. The sensitive dictionary contains sensitive words, such as words related to pornography, subversion or terrorism. The words in the text content can be matched against the sensitive dictionary one by one; if a match succeeds, the current word is a sensitive word. Based on this, the text content determination method of this embodiment includes the following steps, as shown in Figure 10:
Step S1002, obtaining the text regions in the image through the above text region determining method;
Step S1004, normalizing the text region according to a preset size.
The preset size may include a preset length and width. If the text region does not satisfy the preset size, the text region can be scaled, cropped, or padded, so that the processed text region meets the preset size.
Step S1006, inputting the processed text region into the text identification model trained in advance, and outputting the recognition result of the text region; the recognition result of the text region includes multiple probability matrices corresponding to the text region;
Step S1008 determines the position of the most probable value in each probability matrix;
Step S1010, obtaining, from the preset correspondence between the positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value;
Step S1012, arranging the obtained characters according to the arrangement order of the multiple probability matrices;
Step S1014, determining the text content in the text region according to the arranged characters.
Step S1016, if the image includes multiple text regions, obtaining the text content of each text region;
Step S1018, performing a word segmentation operation on the obtained text content;
The word segmentation operation may also be called a word-cutting operation. In actual implementation, a dictionary can be established and the segmentation performed based on it. Specifically, starting from the first character of the text content, the first and second characters are looked up in the dictionary as a combination; if no word containing the combination is found, the first character is split off as an individual word. If a word containing the combination is found, the third character is appended to the combination and the dictionary lookup continues, until no word containing the combination is found, at which point the characters in the combination except the last one are split off as a word. This proceeds in the same manner until the segmentation of the text content is complete.
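The dictionary-lookup segmentation described above can be sketched roughly as follows; the grow-then-shrink strategy and the toy lexicon are illustrative assumptions, since the patent only outlines the lookup procedure:

```python
def segment(text, lexicon):
    # grow a window from the current character while the window is still
    # a prefix of some lexicon word; then split off the longest slice
    # that is itself a word (or a single character if none is)
    words, i = [], 0
    while i < len(text):
        j = i + 1
        while j < len(text) and any(w.startswith(text[i:j + 1]) for w in lexicon):
            j += 1
        while j > i + 1 and text[i:j] not in lexicon:
            j -= 1
        words.append(text[i:j])
        i = j
    return words
```

For example, with the lexicon `{"ab", "abc"}`, `segment("abcd", ...)` yields `["abc", "d"]`; characters not covered by the lexicon fall out as single-character words.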
Step S1020, matching the segments obtained by the segmentation operation one by one against the sensitive dictionary established in advance;
Step S1022, if at least one segment matches successfully, determining that the text content corresponding to the image contains sensitive information.
Step S1024, obtaining the text region to which the successfully matched segment belongs, and marking, in the image, the obtained text region or the successfully matched segment.
In actual implementation, the obtained text region or the successfully matched segment can be marked with a bounding box. For real-time detection in video playback or live-streaming scenes, the obtained text region or the matched segment can instead be covered with a mosaic or a blur, so as to achieve the purpose of filtering sensitive words.
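Steps S1020 and S1022 amount to a membership test of each segment against the sensitive dictionary; a minimal sketch (the function name and return shape are assumptions):

```python
def detect_sensitive(segments, sensitive_lexicon):
    # match each segment one by one against the pre-built sensitive
    # dictionary; at least one hit means the text content corresponding
    # to the image contains sensitive information
    hits = [w for w in segments if w in sensitive_lexicon]
    return (len(hits) > 0, hits)
```

The returned hit list is what step S1024 would then trace back to its text region for marking or blurring.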
In the above approach, after the text content of the text region is obtained, sensitive words are identified from the text content through the sensitive dictionary, realizing the purpose of speech supervision. This approach can obtain content in real time and identify sensitive words, which is beneficial to speech supervision in scenes such as network live streaming and video streaming, and to limiting the spread of sensitive words.
It should be noted that the above method embodiments are all described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts between the embodiments, reference can be made to each other.
Corresponding to the above method embodiments, referring to Figure 11, a schematic structural diagram of a text detection model training apparatus is shown; the apparatus includes:
a training image determining module 110, configured to determine a target training image based on a preset training set;
a training image input module 111, configured to input the target training image into a first initial model; the first initial model includes a first feature extraction network, a feature fusion network and a first output network;
a feature extraction module 112, configured to extract, through the first feature extraction network, multiple initial feature maps of the target training image, the multiple initial feature maps differing in scale from one another;
a feature fusion module 113, configured to perform fusion processing on the multiple initial feature maps through the feature fusion network to obtain a fused feature map;
an output module 114, configured to input the fused feature map into the first output network, and to output the candidate regions of the text regions in the target training image and the probability value of each candidate region;
a loss value determining and training module 115, configured to determine, through a preset detection loss function, the first loss value of the candidate regions and the probability value of each candidate region, and to train the first initial model according to the first loss value until the parameters in the first initial model converge, obtaining the text detection model.
The text detection model training apparatus provided by the embodiment of the present invention first extracts multiple initial feature maps of the target training image, differing in scale from one another; it then performs fusion processing on the multiple initial feature maps to obtain a fused feature map; the fused feature map is input into the first output network, which outputs the candidate regions of the text regions in the target training image and the probability value of each candidate region; after the first loss value is determined through the preset detection loss function, the first initial model is trained according to the first loss value to obtain the detection model. In this approach, the feature extraction network can automatically extract features of different scales, so the text detection model only needs a single input image to obtain candidate regions of text regions of various scales in that image, without manually changing the image scale. The operation is convenient, and especially in scenes with multiple font sizes, multiple fonts, various shapes and multiple orientations, all kinds of text in the image can be detected quickly, comprehensively and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the effect of text recognition.
In some embodiments, the above first feature extraction network includes multiple groups of sequentially connected first convolutional networks; each group of first convolutional networks includes a sequentially connected convolutional layer, batch normalization layer and activation function layer.
In some embodiments, the above feature fusion module is also configured to: arrange the multiple initial feature maps in order according to their scales, with the initial feature map of the top level having the smallest scale and the initial feature map of the bottom level the largest; determine the initial feature map of the top level as the fused feature map of the top level; for each level other than the top level, fuse the initial feature map of the current level with the fused feature map of the level above the current level to obtain the fused feature map of the current level; and determine the fused feature map of the lowest level as the final fused feature map.
In some embodiments, the above first output network includes a first convolutional layer and a second convolutional layer, and the above output module is also configured to: input the fused feature map into the first convolutional layer and the second convolutional layer separately; perform a first convolution operation on the fused feature map through the first convolutional layer, outputting a coordinate matrix that contains the vertex coordinates of the candidate regions of the text regions in the target training image; and perform a second convolution operation on the fused feature map through the second convolutional layer, outputting a probability matrix that contains the probability value of each candidate region.
In some embodiments, the above detection loss function includes a first function and a second function. The first function is L1 = |G* − G|, where G* is the coordinate matrix of the text regions marked in advance in the target training image, and G is the coordinate matrix of the candidate regions of the text regions in the target training image output by the first output network. The second function is L2 = −Y*·log(Y) − (1 − Y*)·log(1 − Y), where Y* is the probability matrix of the text regions marked in advance in the target training image, and Y is the probability matrix of the candidate regions output by the first output network. The first loss value of the candidate regions and the probability value of each candidate region is L = L1 + L2.
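Treating the coordinate and probability matrices as flat sequences for simplicity, the combined loss L = L1 + L2 can be read as follows (an illustrative, unvectorized sketch, not the patent's actual implementation):

```python
import math

def detection_loss(G_star, G, Y_star, Y):
    # L1 = |G* - G|: absolute regression error on the vertex coordinates
    l1 = sum(abs(a - b) for a, b in zip(G_star, G))
    # L2 = -Y*.log(Y) - (1 - Y*).log(1 - Y): cross-entropy between the
    # marked probabilities Y* and the predicted probabilities Y
    l2 = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
              for y, p in zip(Y_star, Y))
    return l1 + l2
```

With perfectly predicted coordinates and a predicted probability of 0.5 for one positive region, the loss reduces to the cross-entropy term log 2.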
In some embodiments, the above loss value determining and training module is also configured to: update the parameters in the first initial model according to the first loss value; judge whether the updated parameters converge; if the updated parameters converge, determine the first initial model with updated parameters as the detection model; if the updated parameters do not converge, continue to execute the step of determining a target training image based on the preset training set, until the updated parameters converge.
In some embodiments, the above loss value determining and training module is also configured to: determine a parameter to be updated from the first initial model according to a preset rule; calculate the derivative ∂L/∂w of the first loss value with respect to the parameter to be updated in the first initial model, where L is the first loss value and w is the parameter to be updated; and update the parameter to be updated, obtaining the updated parameter w − α·(∂L/∂w), where α is a preset coefficient.
Referring to Figure 12, a schematic structural diagram of a text region determining device is shown; the device includes:
an image acquisition module 120, configured to obtain an image to be detected;
a detection module 122, configured to input the image to be detected into a text detection model trained in advance, and to output multiple candidate regions of the text regions in the image to be detected and the probability value of each candidate region; the text detection model is obtained through the training method of the above text detection model;
a text region determining module 124, configured to determine the text regions in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap among the multiple candidate regions.
In the above text region determining device provided by the embodiment of the present invention, the obtained image to be detected is input into the text detection model, which outputs multiple candidate regions of the text regions in the image and the probability value of each candidate region; the text regions in the image to be detected are then determined from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap among them. In this approach, the text detection model can automatically extract features of different scales, so only one image needs to be input into the model to obtain candidate regions of text regions of various scales in that image, without manually transforming the image scale. The operation is convenient, and especially in scenes with multiple font sizes, multiple fonts, various shapes and multiple orientations, all kinds of text in the image can be detected quickly, comprehensively and accurately, which benefits the accuracy of subsequent text recognition and improves the effect of text recognition.
In some embodiments, the above text region determining module is also configured to: arrange the multiple candidate regions in order according to their probability values, with the first candidate region having the largest probability value and the last candidate region the smallest; take the first candidate region as the current candidate region, and calculate one by one the degree of overlap between the current candidate region and each candidate region other than the current one; reject, among the other candidate regions, those whose degree of overlap exceeds a preset overlap threshold; take the candidate region following the current candidate region as the new current candidate region, and continue to execute the step of calculating the degree of overlap between the current candidate region and the other candidate regions one by one, until the last candidate region is reached; and determine the candidate regions remaining after rejection as the text regions in the image to be detected.
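The pruning procedure described above is essentially non-maximum suppression. A minimal sketch with axis-aligned boxes follows; the patent does not fix the overlap metric, so intersection-over-union is an assumption here:

```python
def iou(a, b):
    # intersection-over-union of two boxes given as [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # sort candidate regions by probability, keep the current best,
    # and reject the remaining regions whose overlap exceeds the threshold
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep
```

The indices returned by `nms` correspond to the candidate regions remaining after rejection, i.e. the text regions in the image to be detected.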
In some embodiments, the above device further includes a region rejecting module, configured to reject, from the multiple candidate regions, the candidate regions whose probability value is lower than a preset probability threshold, obtaining the final multiple candidate regions.
Referring to Figure 13, a schematic structural diagram of a text content determining device is shown; the device includes:
a region acquisition module 130, configured to obtain the text regions in an image through the text region determining method of any one of claims 8-10;
an identification module 132, configured to input the text region into a text identification model trained in advance, and to output the recognition result of the text region;
a text content determining module 134, configured to determine the text content in the text region according to the recognition result.
In the text content determining device provided by the embodiment of the present invention, the text region in the image is first obtained through the above text region determining method; the text region is then input into the text identification model trained in advance, which outputs the recognition result of the text region; finally, the text information in the text region is determined according to the recognition result. In this approach, since the above text region determining method can obtain text regions of various scales through the text detection model, all kinds of text in the image can be detected quickly, comprehensively and accurately in scenes with multiple font sizes, multiple fonts, various shapes and multiple orientations, which benefits the accuracy of text recognition and improves the effect of text recognition.
In some embodiments, the above device further includes a normalization module, configured to normalize the text region according to a preset size.
In some embodiments, the above device further includes a text identification model training module, the text identification model being trained as follows: a target training text image is determined based on a preset training set; the target training text image is input into a second initial model, which includes a second feature extraction network, a second output network and a classification function; the feature map of the target training text image is extracted through the second feature extraction network; the feature map is split into at least one sub-feature map by the second initial model; each sub-feature map is input into the second output network separately, outputting the output matrix corresponding to each sub-feature map; the output matrix corresponding to each sub-feature map is input into the classification function separately, outputting the probability matrix corresponding to each sub-feature map; the second loss value of the probability matrices is determined through a preset identification loss function; and the second initial model is trained according to the second loss value until the parameters in the second initial model converge, obtaining the text identification model.
In some embodiments, the above second feature extraction network includes multiple groups of sequentially connected second convolutional networks; each group of second convolutional networks includes a sequentially connected convolutional layer, pooling layer and activation function layer.
In some embodiments, the above text identification model training module is also configured to split the feature map into at least one sub-feature map along the column direction of the feature map; the column direction of the feature map is perpendicular to the direction of the text line.
In some embodiments, the above second output network includes multiple fully connected layers, the number of fully connected layers corresponding to the number of sub-feature maps; the identification model training module is also configured to input each sub-feature map into its corresponding fully connected layer separately, so that each fully connected layer outputs the output matrix corresponding to its sub-feature map.
In some embodiments, the above classification function includes a Softmax function: p_t^i = e^{x_i} / Σ_{m=1}^{K+1} e^{x_m}, where e denotes the natural constant; t denotes the t-th probability matrix; K denotes the number of distinct characters contained in the target training text images of the training set; m ranges from 1 to K+1; Σ denotes the summation operation; x_i is the i-th element of the output matrix; and p_t^i is the i-th element of the probability matrix p_t.
In some embodiments, the above identification loss function includes L = −log p(y | {p_t}_{t=1…T}), where y is the probability matrix of the target training text image marked in advance; t denotes the t-th probability matrix; p_t is the probability matrix, corresponding to each sub-feature map, output by the classification function; T is the total number of probability matrices; p denotes the computed probability; and log denotes the logarithm operation.
In some embodiments, the above identification model training module is also configured to: update the parameters in the second initial model according to the second loss value; judge whether the updated parameters converge; if the updated parameters converge, determine the second initial model with updated parameters as the text identification model; if the updated parameters do not converge, continue to execute the step of determining a target training text image based on the preset training set, until the updated parameters converge.
In some embodiments, the above identification model training module is also configured to: determine a parameter to be updated from the second initial model according to a preset rule; calculate the derivative ∂L′/∂w′ of the second loss value with respect to the parameter to be updated, where L′ is the loss value of the probability matrices and w′ is the parameter to be updated; and update the parameter to be updated, obtaining the updated parameter w′ − α′·(∂L′/∂w′), where α′ is a preset coefficient.
In some embodiments, the above recognition result of the text region includes multiple probability matrices corresponding to the text region; the text content determining module is also configured to: determine the position of the maximum probability value in each probability matrix; obtain, from the preset correspondence between the positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value; arrange the obtained characters according to the arrangement order of the multiple probability matrices; and determine the text content in the text region according to the arranged characters.
In some embodiments, the above text content determining module is also configured to delete, according to preset rules, the repeated characters and blank characters among the arranged characters, obtaining the text content in the text region.
In some embodiments, the above device further includes: an information acquisition module, configured to obtain, if the image contains multiple text regions, the text content in each text region; and a sensitive information determining module, configured to determine, through a sensitive dictionary established in advance, whether the text content corresponding to the image contains sensitive information.
In some embodiments, the above sensitive information determining module is also configured to: perform a word segmentation operation on the obtained text content; match the segments obtained by the segmentation operation one by one against the sensitive dictionary established in advance; and if at least one segment matches successfully, determine that the text content corresponding to the image contains sensitive information.
In some embodiments, the above device further includes an area marking module, configured to obtain the text region to which the successfully matched segment belongs and to mark the obtained text region in the image.
The device provided by the embodiment of the present invention has the same realization principle and technical effect as the foregoing method embodiments; for brevity, where the device embodiment does not mention a point, reference can be made to the corresponding content in the foregoing method embodiments.
The embodiment of the present invention also provides an electronic device; referring to Figure 14, the electronic device includes a memory 100 and a processor 101, where the memory 100 stores one or more computer instructions that are executed by the processor 101 to realize the steps of the above text detection model training method, text region determining method or text content determining method.
Further, the electronic device shown in Figure 14 also includes a bus 102 and a communication interface 103; the processor 101, the communication interface 103 and the memory 100 are connected through the bus 102.
The memory 100 may include a high-speed random access memory (RAM, Random Access Memory), and may also include a non-volatile memory (non-volatile memory), for example at least one disk memory. The communication connection between the system network element and at least one other network element is realized through at least one communication interface 103 (which can be wired or wireless), and the Internet, a wide area network, a local network, a metropolitan area network (MAN), etc. can be used. The bus 102 can be an ISA bus, a PCI bus, an EISA bus, etc., and can be divided into an address bus, a data bus, a control bus, etc. For convenience of representation, only one double-headed arrow is used in Figure 14, but this does not mean there is only one bus or one type of bus.
The processor 101 may be an integrated circuit chip with signal processing capability. In the realization process, each step of the above methods can be completed by an integrated logic circuit of hardware in the processor 101 or by instructions in the form of software. The above processor 101 can be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it can also be a digital signal processor (Digital Signal Processing, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, and it can realize or execute each method, step and logic diagram disclosed in the embodiments of the present invention. The general-purpose processor can be a microprocessor, or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, a register, etc. The storage medium is located in the memory 100, and the processor 101 reads the information in the memory 100 and completes the steps of the methods of the previous embodiments in combination with its hardware.
The embodiment of the present invention also provides a machine-readable storage medium storing machine-executable instructions; when called and executed by a processor, the machine-executable instructions cause the processor to realize the steps of the above text detection model training method, text region determining method or text content determining method; for the specific realization, reference can be made to the method embodiments, and details are not repeated here.
The computer program product of the text detection model training method, text region determining method, text content determining method, device and electronic device provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments; for the specific realization, reference can be made to the method embodiments, and details are not repeated here.
If the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes a number of instructions for making a computer device (which can be a personal computer, a server, a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, used to illustrate the technical solution of the present invention rather than to limit it, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can still, within the technical scope disclosed by the present invention, modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or replace some of the technical features with equivalents; and these modifications, variations or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (52)
1. A text detection model training method, characterized in that the method includes:
determining a target training image based on a preset training set;
inputting the target training image into a first initial model, the first initial model including a first feature extraction network, a feature fusion network and a first output network;
extracting, through the first feature extraction network, multiple initial feature maps of the target training image, the multiple initial feature maps differing in scale from one another;
performing fusion processing on the multiple initial feature maps through the feature fusion network to obtain a fused feature map;
inputting the fused feature map into the first output network, and outputting the candidate regions of the text regions in the target training image and the probability value of each candidate region;
determining, through a preset detection loss function, the first loss value of the candidate regions and the probability value of each candidate region; and training the first initial model according to the first loss value until the parameters in the first initial model converge, obtaining a text detection model.
2. The method according to claim 1, characterized in that the first feature extraction network comprises a plurality of sequentially connected groups of first convolutional networks; each group of the first convolutional networks comprises a sequentially connected convolutional layer, batch normalization layer, and activation function layer.
3. The method according to claim 1, characterized in that the step of performing fusion processing on the plurality of initial feature maps through the feature fusion network to obtain a fused feature map comprises:
arranging the plurality of initial feature maps in sequence according to their scales, wherein the initial feature map at the highest level has the smallest scale and the initial feature map at the lowest level has the largest scale;
determining the initial feature map at the highest level as the fused feature map of the highest level;
for each level other than the highest level, fusing the initial feature map of the current level with the fused feature map of the level above the current level to obtain the fused feature map of the current level; and
determining the fused feature map of the lowest level as the final fused feature map.
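For illustration only (not part of the claims): the top-down fusion of claim 3 can be sketched in a few lines, assuming each feature map is a 2-D NumPy array and using nearest-neighbour upsampling as a stand-in for whatever upsampling the model actually uses:

```python
import numpy as np

def fuse_feature_maps(initial_maps):
    """Top-down fusion: maps ordered from smallest scale (highest level)
    to largest scale (lowest level); each level fuses its own initial map
    with the upsampled fused map of the level above it."""
    # Highest level: its fused map is its initial map.
    fused = initial_maps[0]
    for cur in initial_maps[1:]:
        # Upsample the previous fused map to the current scale
        # (nearest-neighbour repeat, assumed here for simplicity).
        fh, fw = fused.shape
        ch, cw = cur.shape
        up = np.repeat(np.repeat(fused, ch // fh, axis=0), cw // fw, axis=1)
        fused = cur + up  # element-wise fusion (one possible choice)
    return fused  # fused map of the lowest level is the final one

maps = [np.ones((2, 2)), np.ones((4, 4)), np.ones((8, 8))]
final = fuse_feature_maps(maps)
print(final.shape)  # (8, 8)
```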
4. The method according to claim 1, characterized in that the first output network comprises a first convolutional layer and a second convolutional layer;
the step of inputting the fused feature map into the first output network and outputting candidate regions of the text region in the target training image and the probability value of each candidate region comprises:
inputting the fused feature map into the first convolutional layer and the second convolutional layer separately;
performing a first convolution operation on the fused feature map through the first convolutional layer to output a coordinate matrix, the coordinate matrix comprising vertex coordinates of the candidate regions of the text region in the target training image; and
performing a second convolution operation on the fused feature map through the second convolutional layer to output a probability matrix, the probability matrix comprising the probability value of each candidate region.
5. The method according to claim 1, characterized in that the detection loss function comprises a first function and a second function;
the first function is L1=|G*-G|, where G* is the pre-labeled coordinate matrix of the text region in the target training image, and G is the coordinate matrix of the candidate regions of the text region in the target training image output by the first output network;
the second function is L2=-Y*logY-(1-Y*)log(1-Y), where Y* is the pre-labeled probability matrix of the text region in the target training image, Y is the probability matrix of the candidate regions of the text region in the target training image output by the first output network, and log denotes a logarithm operation; and
the first loss value of the candidate regions and the probability value of each candidate region is L=L1+L2.
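For illustration only: the detection loss of claim 5 combines an absolute coordinate-regression term with a binary cross-entropy term. A minimal sketch, assuming the matrices are NumPy arrays and reducing each term by its mean (the claim leaves the reduction unspecified):

```python
import numpy as np

def detection_loss(G_star, G, Y_star, Y, eps=1e-7):
    # L1: absolute coordinate regression loss |G* - G|
    l1 = np.abs(G_star - G).mean()
    # L2: binary cross-entropy -Y*·logY - (1-Y*)·log(1-Y),
    # with clipping to avoid log(0)
    Y = np.clip(Y, eps, 1 - eps)
    l2 = (-Y_star * np.log(Y) - (1 - Y_star) * np.log(1 - Y)).mean()
    return l1 + l2  # L = L1 + L2

# Toy example: one coordinate entry and one probability entry.
loss = detection_loss(np.array([1.0]), np.array([0.5]),
                      np.array([1.0]), np.array([0.5]))
print(round(loss, 4))  # 0.5 + ln(2) ≈ 1.1931
```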
6. The method according to claim 1, characterized in that the step of training the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain a text detection model, comprises:
updating the parameters in the first initial model according to the first loss value;
judging whether the updated parameters converge;
if the updated parameters converge, determining the first initial model with the updated parameters as the detection model; and
if the updated parameters do not converge, continuing to execute the step of determining a target training image based on the preset training set, until the updated parameters converge.
7. The method according to claim 6, characterized in that the step of updating the parameters in the first initial model according to the first loss value comprises:
determining a parameter to be updated from the first initial model according to a preset rule;
calculating the derivative ∂L/∂W of the first loss value with respect to the parameter to be updated in the first initial model, where L is the first loss value and W is the parameter to be updated; and
updating the parameter to be updated to obtain the updated parameter W' = W - α·(∂L/∂W), where α is a preset coefficient.
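For illustration only: claim 7 describes a single gradient-descent step, W ← W − α·∂L/∂W. A toy sketch with a numerically estimated derivative and a one-parameter quadratic loss (all values hypothetical):

```python
def numeric_grad(loss, w, h=1e-6):
    # Central-difference approximation of the derivative dL/dW.
    return (loss(w + h) - loss(w - h)) / (2 * h)

loss = lambda w: (w - 3.0) ** 2  # toy loss with its minimum at w = 3
w = 0.0
for _ in range(200):
    w = w - 0.05 * numeric_grad(loss, w)  # W <- W - alpha * dL/dW
print(round(w, 3))  # converges to 3.0
```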
8. A text region determination method, characterized in that the method comprises:
obtaining an image to be detected;
inputting the image to be detected into a pre-trained text detection model, and outputting a plurality of candidate regions of the text region in the image to be detected and a probability value of each candidate region, the text detection model being obtained through the text detection model training method according to any one of claims 1-7; and
determining the text region in the image to be detected from the plurality of candidate regions according to the probability values of the candidate regions and the degree of overlap between the plurality of candidate regions.
9. The method according to claim 8, characterized in that the step of determining the text region in the image to be detected from the plurality of candidate regions according to the probability values of the candidate regions and the degree of overlap between the plurality of candidate regions comprises:
arranging the plurality of candidate regions in sequence according to their probability values, wherein the first candidate region has the largest probability value and the last candidate region has the smallest probability value;
taking the first candidate region as the current candidate region, and calculating one by one the degree of overlap between the current candidate region and each candidate region other than the current candidate region;
rejecting, from the candidate regions other than the current candidate region, those whose degree of overlap is greater than a preset overlap threshold;
taking the candidate region next to the current candidate region as the new current candidate region, and continuing to execute the step of calculating one by one the degree of overlap between the current candidate region and the candidate regions other than the current candidate region, until the last candidate region is reached; and
determining the candidate regions remaining after the rejection as the text region in the image to be detected.
10. The method according to claim 9, characterized in that before the step of arranging the plurality of candidate regions in sequence according to their probability values, the method further comprises:
rejecting, from the plurality of candidate regions, those whose probability value is lower than a preset probability threshold, to obtain the final plurality of candidate regions.
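For illustration only: claims 9 and 10 together describe the standard non-maximum suppression procedure. A sketch assuming axis-aligned boxes and intersection-over-union as the overlap measure (the claims fix neither choice):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection-over-union as overlap.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, prob_thresh=0.5, overlap_thresh=0.5):
    # Claim 10: discard candidates below the probability threshold.
    keep = [(b, s) for b, s in zip(boxes, scores) if s >= prob_thresh]
    # Claim 9: order by probability value, highest first.
    keep.sort(key=lambda bs: bs[1], reverse=True)
    result = []
    while keep:
        cur, _ = keep.pop(0)
        result.append(cur)
        # Reject remaining candidates overlapping the current one too much.
        keep = [(b, s) for b, s in keep if iou(cur, b) <= overlap_thresh]
    return result

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(len(nms(boxes, scores)))  # 2: the second box overlaps the first
```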
11. A text content determination method, characterized in that the method comprises:
obtaining a text region in an image through the text region determination method according to any one of claims 8-10;
inputting the text region into a pre-trained text recognition model, and outputting a recognition result of the text region; and
determining the text content in the text region according to the recognition result.
12. The method according to claim 11, characterized in that before the step of inputting the text region into the pre-trained recognition model, the method further comprises: normalizing the text region according to a preset size.
13. The method according to claim 11, characterized in that the text recognition model is trained in the following manner:
determining a target training text image based on a preset training set;
inputting the target training text image into a second initial model, the second initial model comprising a second feature extraction network, a feature splitting network, a second output network, and a classification function;
extracting a feature map of the target training text image through the second feature extraction network;
splitting the feature map into at least one sub-feature map through the feature splitting network;
inputting the sub-feature maps into the second output network separately, and outputting an output matrix corresponding to each sub-feature map;
inputting the output matrix corresponding to each sub-feature map into the classification function separately, and outputting a probability matrix corresponding to each sub-feature map; and
determining a second loss value of the probability matrices through a preset recognition loss function, and training the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain a text recognition model.
14. The method according to claim 13, characterized in that the second feature extraction network comprises a plurality of sequentially connected groups of second convolutional networks; each group of the second convolutional networks comprises a sequentially connected convolutional layer, pooling layer, and activation function layer.
15. The method according to claim 13, characterized in that the step of splitting the feature map into at least one sub-feature map through the feature splitting network comprises:
splitting the feature map into at least one sub-feature map along the column direction of the feature map, the column direction of the feature map being perpendicular to the text line direction.
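For illustration only: the column-wise splitting of claim 15 can be sketched with NumPy, treating axis 1 of the feature map as the column (width) direction, perpendicular to the text line:

```python
import numpy as np

def split_columns(feature_map, num_slices):
    # Split along the column (width) axis into equal sub-feature maps.
    return np.split(feature_map, num_slices, axis=1)

fm = np.arange(32).reshape(4, 8)   # height 4, width 8
subs = split_columns(fm, 4)
print(len(subs), subs[0].shape)  # 4 (4, 2)
```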
16. The method according to claim 13, characterized in that the second output network comprises a plurality of fully connected layers, the number of the fully connected layers corresponding to the number of the sub-feature maps;
the step of inputting the sub-feature maps into the second output network separately and outputting the output matrix corresponding to each sub-feature map comprises: inputting each sub-feature map into the corresponding fully connected layer separately, so that each fully connected layer outputs the output matrix corresponding to its sub-feature map.
17. The method according to claim 13, characterized in that the classification function comprises a Softmax function;
the Softmax function is pt^i = e^(xt^i) / Σ e^(xt^m), where e denotes the natural constant; t denotes the t-th probability matrix; K denotes the number of distinct characters contained in the target training text images of the training set; m ranges from 1 to K+1; Σ denotes a summation operation over m; xt^i is the i-th element in the output matrix; and pt^i is the i-th element in the probability matrix pt.
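For illustration only: the Softmax function of claim 17 can be sketched as follows; the max-shift is a common numerical-stability trick not mentioned in the claim:

```python
import numpy as np

def softmax(x):
    # p_t^i = e^{x_t^i} / sum over m of e^{x_t^m}
    e = np.exp(x - x.max())  # shift by the max for numerical stability
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
print(p.sum())  # probabilities sum to 1.0
```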
18. The method according to claim 13, characterized in that the recognition loss function comprises L = -log p(y | {pt}, t=1...T), where y is the pre-labeled probability matrix of the target training text image; t denotes the t-th probability matrix; pt is the probability matrix corresponding to each sub-feature map output by the classification function; T is the total number of the probability matrices; p denotes calculating a probability; and log denotes a logarithm operation.
19. The method according to claim 13, characterized in that the step of training the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain a text recognition model, comprises:
updating the parameters in the second initial model according to the second loss value;
judging whether the updated parameters converge;
if the updated parameters converge, determining the second initial model with the updated parameters as the text recognition model; and
if the updated parameters do not converge, continuing to execute the step of determining a target training text image based on the preset training set, until each updated parameter converges.
20. The method according to claim 19, characterized in that the step of updating the parameters in the second initial model according to the second loss value comprises:
determining a parameter to be updated from the second initial model according to a preset rule;
calculating the derivative ∂L'/∂W' of the second loss value with respect to the parameter to be updated, where L' is the loss value of the probability matrices and W' is the parameter to be updated; and
updating the parameter to be updated to obtain the updated parameter W'' = W' - α'·(∂L'/∂W'), where α' is a preset coefficient.
21. The method according to claim 11, characterized in that the recognition result of the text region comprises a plurality of probability matrices corresponding to the text region;
the step of determining the text content in the text region according to the recognition result comprises:
determining the position of the maximum probability value in each probability matrix;
obtaining, from a preset correspondence between positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value;
arranging the obtained characters according to the arrangement order of the plurality of probability matrices; and
determining the text content in the text region according to the arranged characters.
22. The method according to claim 21, characterized in that the step of determining the text content in the text region according to the arranged characters comprises:
deleting repeated characters and blank characters from the arranged characters according to a preset rule, to obtain the text content in the text region.
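For illustration only: claims 21 and 22 describe a greedy, CTC-style decoding: take the argmax position of each probability matrix, map positions to characters, then collapse repeats and drop blanks. A sketch with a hypothetical character table:

```python
def decode(matrices, charset, blank=0):
    """Greedy decoding per claims 21-22: argmax position of each
    probability matrix -> character, then delete repeats and blanks."""
    positions = [max(range(len(m)), key=m.__getitem__) for m in matrices]
    chars, prev = [], None
    for pos in positions:
        if pos != prev and pos != blank:  # delete repeats and blanks
            chars.append(charset[pos])
        prev = pos
    return "".join(chars)

charset = {1: "c", 2: "a", 3: "t"}        # hypothetical position->character map
probs = [[0.1, 0.8, 0.05, 0.05],          # -> position 1 ("c")
         [0.1, 0.7, 0.1, 0.1],            # repeated "c", collapsed
         [0.1, 0.1, 0.7, 0.1],            # -> "a"
         [0.8, 0.1, 0.05, 0.05],          # blank, dropped
         [0.1, 0.1, 0.1, 0.7]]            # -> "t"
print(decode(probs, charset))  # cat
```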
23. The method according to claim 11, characterized in that after the step of determining the text content in the text region according to the recognition result, the method further comprises:
if the image contains a plurality of text regions, obtaining the text content in each text region; and
determining, through a pre-established sensitive dictionary, whether the text content corresponding to the image contains sensitive information.
24. The method according to claim 23, characterized in that the step of determining, through the pre-established sensitive dictionary, whether the text content corresponding to the image contains sensitive information comprises:
performing a word segmentation operation on the obtained text content;
matching the segmented words obtained from the word segmentation operation against the pre-established sensitive dictionary one by one; and
if at least one segmented word is successfully matched, determining that the text content corresponding to the image contains sensitive information.
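For illustration only: the matching of claim 24 can be sketched as below; a whitespace split stands in for a real word segmenter, and the dictionary contents are hypothetical:

```python
def contains_sensitive(text, sensitive_dict, tokenize):
    # Segment the text, then match each token against the dictionary.
    tokens = tokenize(text)
    hits = [t for t in tokens if t in sensitive_dict]
    return bool(hits), hits

found, hits = contains_sensitive(
    "transfer the account password now",
    {"password", "pin"},   # hypothetical sensitive dictionary
    str.split,             # whitespace split stands in for a segmenter
)
print(found, hits)  # True ['password']
```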
25. The method according to claim 24, characterized in that after determining that the text content corresponding to the image contains sensitive information, the method further comprises:
obtaining the text region to which the successfully matched segmented word belongs, and marking, in the image, the obtained text region or the successfully matched segmented word.
26. A text detection model training apparatus, characterized in that the apparatus comprises:
a training image determination module, configured to determine a target training image based on a preset training set;
a training image input module, configured to input the target training image into a first initial model, the first initial model comprising a first feature extraction network, a feature fusion network, and a first output network;
a feature extraction module, configured to extract a plurality of initial feature maps of the target training image through the first feature extraction network, the plurality of initial feature maps differing from one another in scale;
a feature fusion module, configured to perform fusion processing on the plurality of initial feature maps through the feature fusion network to obtain a fused feature map;
an output module, configured to input the fused feature map into the first output network, and output candidate regions of the text region in the target training image and a probability value of each candidate region; and
a loss value determination and training module, configured to determine a first loss value of the candidate regions and the probability value of each candidate region through a preset detection loss function, and train the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain a text detection model.
27. The apparatus according to claim 26, characterized in that the first feature extraction network comprises a plurality of sequentially connected groups of first convolutional networks; each group of the first convolutional networks comprises a sequentially connected convolutional layer, batch normalization layer, and activation function layer.
28. The apparatus according to claim 26, characterized in that the feature fusion module is further configured to:
arrange the plurality of initial feature maps in sequence according to their scales, wherein the initial feature map at the highest level has the smallest scale and the initial feature map at the lowest level has the largest scale;
determine the initial feature map at the highest level as the fused feature map of the highest level;
for each level other than the highest level, fuse the initial feature map of the current level with the fused feature map of the level above the current level to obtain the fused feature map of the current level; and
determine the fused feature map of the lowest level as the final fused feature map.
29. The apparatus according to claim 26, characterized in that the first output network comprises a first convolutional layer and a second convolutional layer;
the output module is further configured to:
input the fused feature map into the first convolutional layer and the second convolutional layer separately;
perform a first convolution operation on the fused feature map through the first convolutional layer to output a coordinate matrix, the coordinate matrix comprising vertex coordinates of the candidate regions of the text region in the target training image; and
perform a second convolution operation on the fused feature map through the second convolutional layer to output a probability matrix, the probability matrix comprising the probability value of each candidate region.
30. The apparatus according to claim 26, characterized in that the detection loss function comprises a first function and a second function;
the first function is L1=|G*-G|, where G* is the pre-labeled coordinate matrix of the text region in the target training image, and G is the coordinate matrix of the candidate regions of the text region in the target training image output by the first output network;
the second function is L2=-Y*logY-(1-Y*)log(1-Y), where Y* is the pre-labeled probability matrix of the text region in the target training image, Y is the probability matrix of the candidate regions of the text region in the target training image output by the first output network, and log denotes a logarithm operation; and
the first loss value of the candidate regions and the probability value of each candidate region is L=L1+L2.
31. The apparatus according to claim 26, characterized in that the loss value determination and training module is further configured to:
update the parameters in the first initial model according to the first loss value;
judge whether the updated parameters converge;
if the updated parameters converge, determine the first initial model with the updated parameters as the detection model; and
if the updated parameters do not converge, continue to execute the step of determining a target training image based on the preset training set, until the updated parameters converge.
32. The apparatus according to claim 31, characterized in that the loss value determination and training module is further configured to:
determine a parameter to be updated from the first initial model according to a preset rule;
calculate the derivative ∂L/∂W of the first loss value with respect to the parameter to be updated in the first initial model, where L is the first loss value and W is the parameter to be updated; and
update the parameter to be updated to obtain the updated parameter W' = W - α·(∂L/∂W), where α is a preset coefficient.
33. A text region determination apparatus, characterized in that the apparatus comprises:
an image obtaining module, configured to obtain an image to be detected;
a detection module, configured to input the image to be detected into a pre-trained text detection model, and output a plurality of candidate regions of the text region in the image to be detected and a probability value of each candidate region, the text detection model being obtained through the text detection model training method according to any one of claims 1-7; and
a text region determination module, configured to determine the text region in the image to be detected from the plurality of candidate regions according to the probability values of the candidate regions and the degree of overlap between the plurality of candidate regions.
34. The apparatus according to claim 33, characterized in that the text region determination module is further configured to:
arrange the plurality of candidate regions in sequence according to their probability values, wherein the first candidate region has the largest probability value and the last candidate region has the smallest probability value;
take the first candidate region as the current candidate region, and calculate one by one the degree of overlap between the current candidate region and each candidate region other than the current candidate region;
reject, from the candidate regions other than the current candidate region, those whose degree of overlap is greater than a preset overlap threshold;
take the candidate region next to the current candidate region as the new current candidate region, and continue to execute the step of calculating one by one the degree of overlap between the current candidate region and the candidate regions other than the current candidate region, until the last candidate region is reached; and
determine the candidate regions remaining after the rejection as the text region in the image to be detected.
35. The apparatus according to claim 34, characterized in that the apparatus further comprises: a region rejection module, configured to reject, from the plurality of candidate regions, those whose probability value is lower than a preset probability threshold, to obtain the final plurality of candidate regions.
36. A text content determination apparatus, characterized in that the apparatus comprises:
a region obtaining module, configured to obtain a text region in an image through the text region determination method according to any one of claims 8-10;
a recognition module, configured to input the text region into a pre-trained text recognition model and output a recognition result of the text region; and
a text content determination module, configured to determine the text content in the text region according to the recognition result.
37. The apparatus according to claim 36, characterized in that the apparatus further comprises: a normalization module, configured to normalize the text region according to a preset size.
38. The apparatus according to claim 36, characterized in that the apparatus further comprises a text recognition model training module, configured to train the text recognition model in the following manner:
determining a target training text image based on a preset training set;
inputting the target training text image into a second initial model, the second initial model comprising a second feature extraction network, a second output network, and a classification function;
extracting a feature map of the target training text image through the second feature extraction network;
splitting the feature map into at least one sub-feature map through the second initial model;
inputting the sub-feature maps into the second output network separately, and outputting an output matrix corresponding to each sub-feature map;
inputting the output matrix corresponding to each sub-feature map into the classification function separately, and outputting a probability matrix corresponding to each sub-feature map; and
determining a second loss value of the probability matrices through a preset recognition loss function, and training the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain the text recognition model.
39. The apparatus according to claim 38, characterized in that the second feature extraction network comprises a plurality of sequentially connected groups of second convolutional networks; each group of the second convolutional networks comprises a sequentially connected convolutional layer, pooling layer, and activation function layer.
40. The apparatus according to claim 38, characterized in that the recognition model training module is further configured to:
split the feature map into at least one sub-feature map along the column direction of the feature map, the column direction of the feature map being perpendicular to the text line direction.
41. The apparatus according to claim 38, characterized in that the second output network comprises a plurality of fully connected layers, the number of the fully connected layers corresponding to the number of the sub-feature maps;
the recognition model training module is further configured to: input each sub-feature map into the corresponding fully connected layer separately, so that each fully connected layer outputs the output matrix corresponding to its sub-feature map.
42. The apparatus according to claim 38, characterized in that the classification function comprises a Softmax function;
the Softmax function is pt^i = e^(xt^i) / Σ e^(xt^m), where e denotes the natural constant; t denotes the t-th probability matrix; K denotes the number of distinct characters contained in the target training text images of the training set; m ranges from 1 to K+1; Σ denotes a summation operation over m; xt^i is the i-th element in the output matrix; and pt^i is the i-th element in the probability matrix pt.
43. The apparatus according to claim 38, characterized in that the recognition loss function comprises L = -log p(y | {pt}, t=1...T), where y is the pre-labeled probability matrix of the target training text image; t denotes the t-th probability matrix; pt is the probability matrix corresponding to each sub-feature map output by the classification function; T is the total number of the probability matrices; p denotes calculating a probability; and log denotes a logarithm operation.
44. The apparatus according to claim 38, characterized in that the recognition model training module is further configured to:
update the parameters in the second initial model according to the second loss value;
judge whether each updated parameter converges;
if each updated parameter converges, determine the second initial model with the updated parameters as the text recognition model; and
if the updated parameters do not converge, continue to execute the step of determining a target training text image based on the preset training set, until each updated parameter converges.
45. The apparatus according to claim 44, characterized in that the recognition model training module is further configured to:
determine a parameter to be updated from the second initial model according to a preset rule;
calculate the derivative ∂L'/∂W' of the second loss value with respect to the parameter to be updated, where L' is the loss value of the probability matrices and W' is the parameter to be updated; and
update the parameter to be updated to obtain the updated parameter W'' = W' - α'·(∂L'/∂W'), where α' is a preset coefficient.
46. The apparatus according to claim 36, characterized in that the recognition result of the text region comprises a plurality of probability matrices corresponding to the text region;
the text content determination module is further configured to:
determine the position of the maximum probability value in each probability matrix;
obtain, from a preset correspondence between positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value;
arrange the obtained characters according to the arrangement order of the plurality of probability matrices; and
determine the text content in the text region according to the arranged characters.
47. The apparatus according to claim 46, characterized in that the text content determination module is further configured to:
delete repeated characters and blank characters from the arranged characters according to a preset rule, to obtain the text content in the text region.
48. The apparatus according to claim 36, characterized in that the apparatus further comprises:
an information obtaining module, configured to, if the image contains a plurality of text regions, obtain the text content in each text region; and
a sensitive information determination module, configured to determine, through a pre-established sensitive dictionary, whether the text content corresponding to the image contains sensitive information.
49. The apparatus according to claim 48, characterized in that the sensitive information determination module is further configured to:
perform a word segmentation operation on the obtained text content;
match the segmented words obtained from the word segmentation operation against the pre-established sensitive dictionary one by one; and
if at least one segmented word is successfully matched, determine that the text content corresponding to the image contains sensitive information.
50. The apparatus according to claim 49, characterized in that the apparatus further comprises:
a region marking module, configured to obtain the text region to which the successfully matched segmented word belongs, and mark the obtained text region in the image.
51. An electronic device, characterized by comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, wherein the processor executes the machine-executable instructions to implement the steps of the text detection model training method according to any one of claims 1 to 7, the text region determination method according to any one of claims 8 to 10, or the text content determination method according to any one of claims 11 to 25.
52. A machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the steps of the text detection model training method according to any one of claims 1 to 7, the text region determining method according to any one of claims 8 to 10, or the text content determining method according to any one of claims 11 to 25.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910367675.2A CN110110715A (en) | 2019-04-30 | 2019-04-30 | Text detection model training method, text filed, content determine method and apparatus |
PCT/CN2020/087809 WO2020221298A1 (en) | 2019-04-30 | 2020-04-29 | Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910367675.2A CN110110715A (en) | 2019-04-30 | 2019-04-30 | Text detection model training method, text filed, content determine method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110110715A true CN110110715A (en) | 2019-08-09 |
Family
ID=67488106
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910367675.2A Pending CN110110715A (en) | 2019-04-30 | 2019-04-30 | Text detection model training method, text filed, content determine method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110110715A (en) |
WO (1) | WO2020221298A1 (en) |
Families Citing this family (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112417847A (en) * | 2020-11-19 | 2021-02-26 | 湖南红网新媒体集团有限公司 | News content safety monitoring method, system, device and storage medium |
CN112434510B (en) * | 2020-11-24 | 2024-03-29 | 北京字节跳动网络技术有限公司 | Information processing method, device, electronic equipment and storage medium |
CN112328710B (en) * | 2020-11-26 | 2024-06-11 | 北京百度网讯科技有限公司 | Entity information processing method, device, electronic equipment and storage medium |
CN112560476B (en) * | 2020-12-09 | 2024-10-15 | 科大讯飞(北京)有限公司 | Text completion method, electronic equipment and storage device |
CN112686812B (en) * | 2020-12-10 | 2023-08-29 | 广州广电运通金融电子股份有限公司 | Bank card inclination correction detection method and device, readable storage medium and terminal |
CN112418209B (en) * | 2020-12-15 | 2022-09-13 | 润联软件系统(深圳)有限公司 | Character recognition method and device, computer equipment and storage medium |
CN112580495A (en) * | 2020-12-16 | 2021-03-30 | 上海眼控科技股份有限公司 | Text recognition method and device, computer equipment and storage medium |
CN112613376B (en) * | 2020-12-17 | 2024-04-02 | 深圳集智数字科技有限公司 | Re-identification method and device and electronic equipment |
CN112541496B (en) * | 2020-12-24 | 2023-08-22 | 北京百度网讯科技有限公司 | Method, device, equipment and computer storage medium for extracting POI (point of interest) names |
CN112734699B (en) * | 2020-12-24 | 2024-06-14 | 浙江大华技术股份有限公司 | Article state alarm method and device, storage medium and electronic device |
CN112597918B (en) * | 2020-12-25 | 2024-09-10 | 创新奇智(西安)科技有限公司 | Text detection method and device, electronic equipment and storage medium |
CN112784692B (en) * | 2020-12-31 | 2024-07-09 | 科大讯飞股份有限公司 | Method, device, equipment and storage medium for identifying text content of image |
CN112651373B (en) * | 2021-01-04 | 2024-02-09 | 广联达科技股份有限公司 | Method and device for identifying text information of building drawing |
CN113591893B (en) * | 2021-01-26 | 2024-06-28 | 腾讯医疗健康(深圳)有限公司 | Image processing method and device based on artificial intelligence and computer equipment |
CN113763503A (en) * | 2021-01-29 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Graph generation method, device and computer readable storage medium |
CN112802139A (en) * | 2021-02-05 | 2021-05-14 | 歌尔股份有限公司 | Image processing method and device, electronic equipment and readable storage medium |
CN112861739B (en) * | 2021-02-10 | 2022-09-09 | 中国科学技术大学 | End-to-end text recognition method, model training method and device |
CN112949653B (en) * | 2021-02-23 | 2024-04-16 | 科大讯飞股份有限公司 | Text recognition method, electronic equipment and storage device |
CN112966690B (en) * | 2021-03-03 | 2023-01-13 | 中国科学院自动化研究所 | Scene character detection method based on anchor-free frame and suggestion frame |
CN112966609B (en) * | 2021-03-05 | 2023-08-11 | 北京百度网讯科技有限公司 | Target detection method and device |
CN112989844A (en) * | 2021-03-10 | 2021-06-18 | 北京奇艺世纪科技有限公司 | Model training and text recognition method, device, equipment and storage medium |
CN113011312A (en) * | 2021-03-15 | 2021-06-22 | 中国科学技术大学 | Training method of motion positioning model based on weak supervision text guidance |
CN113076823B (en) * | 2021-03-18 | 2023-12-12 | 深圳数联天下智能科技有限公司 | Training method of age prediction model, age prediction method and related device |
CN112927173B (en) * | 2021-04-12 | 2023-04-18 | 平安科技(深圳)有限公司 | Model compression method and device, computing equipment and storage medium |
CN113139463B (en) * | 2021-04-23 | 2022-05-13 | 北京百度网讯科技有限公司 | Method, apparatus, device, medium and program product for training a model |
CN113160196A (en) * | 2021-04-28 | 2021-07-23 | 东南大学 | DBNet-based wiring detection method for secondary circuit terminal block in intelligent substation |
CN113205041B (en) * | 2021-04-29 | 2023-07-28 | 百度在线网络技术(北京)有限公司 | Structured information extraction method, device, equipment and storage medium |
CN113205047B (en) * | 2021-04-30 | 2024-05-10 | 平安科技(深圳)有限公司 | Medicine name identification method, device, computer equipment and storage medium |
CN113221718B (en) * | 2021-05-06 | 2024-01-16 | 新东方教育科技集团有限公司 | Formula identification method, device, storage medium and electronic equipment |
CN113344027B (en) * | 2021-05-10 | 2024-04-23 | 北京迈格威科技有限公司 | Method, device, equipment and storage medium for retrieving objects in image |
CN113139625B (en) * | 2021-05-18 | 2023-12-15 | 北京世纪好未来教育科技有限公司 | Model training method, electronic equipment and storage medium thereof |
CN113326887B (en) * | 2021-06-16 | 2024-03-29 | 深圳思谋信息科技有限公司 | Text detection method, device and computer equipment |
CN113379500B (en) * | 2021-06-21 | 2024-09-24 | 北京沃东天骏信息技术有限公司 | Sequencing model training method and device, and article sequencing method and device |
CN113379592B (en) * | 2021-06-23 | 2023-09-01 | 北京百度网讯科技有限公司 | Processing method and device for sensitive area in picture and electronic equipment |
CN113343970B (en) * | 2021-06-24 | 2024-03-08 | 中国平安人寿保险股份有限公司 | Text image detection method, device, equipment and storage medium |
CN113378832B (en) * | 2021-06-25 | 2024-05-28 | 北京百度网讯科技有限公司 | Text detection model training method, text prediction box method and device |
CN113298079B (en) * | 2021-06-28 | 2023-10-27 | 北京奇艺世纪科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN113361524B (en) * | 2021-06-29 | 2024-05-03 | 北京百度网讯科技有限公司 | Image processing method and device |
CN113343987B (en) * | 2021-06-30 | 2023-08-22 | 北京奇艺世纪科技有限公司 | Text detection processing method and device, electronic equipment and storage medium |
CN113516126A (en) * | 2021-07-02 | 2021-10-19 | 成都信息工程大学 | Adaptive threshold scene text detection method based on attention feature fusion |
CN113780087B (en) * | 2021-08-11 | 2024-04-26 | 同济大学 | Postal package text detection method and equipment based on deep learning |
CN113762109B (en) * | 2021-08-23 | 2023-11-07 | 北京百度网讯科技有限公司 | Training method of character positioning model and character positioning method |
CN113780131B (en) * | 2021-08-31 | 2024-04-12 | 众安在线财产保险股份有限公司 | Text image orientation recognition method, text content recognition method, device and equipment |
CN113469878B (en) * | 2021-09-02 | 2021-11-12 | 北京世纪好未来教育科技有限公司 | Text erasing method and training method and device of model thereof, and storage medium |
CN113806589B (en) * | 2021-09-29 | 2024-03-08 | 云从科技集团股份有限公司 | Video clip positioning method, device and computer readable storage medium |
CN114022695A (en) * | 2021-10-29 | 2022-02-08 | 北京百度网讯科技有限公司 | Training method and device for detection model, electronic equipment and storage medium |
CN114419199B (en) * | 2021-12-20 | 2023-11-07 | 北京百度网讯科技有限公司 | Picture marking method and device, electronic equipment and storage medium |
CN114022882B (en) * | 2022-01-04 | 2022-04-12 | 北京世纪好未来教育科技有限公司 | Text recognition model training method, text recognition device, text recognition equipment and medium |
CN114821622B (en) * | 2022-03-10 | 2023-07-21 | 北京百度网讯科技有限公司 | Text extraction method, text extraction model training method, device and equipment |
CN114743019A (en) * | 2022-03-24 | 2022-07-12 | 国网山东省电力公司莱芜供电公司 | Cross-modal target detection method and system based on multi-scale features |
CN114937267B (en) * | 2022-04-20 | 2024-04-02 | 北京世纪好未来教育科技有限公司 | Training method and device for text recognition model and electronic equipment |
CN114758332B (en) * | 2022-06-13 | 2022-09-02 | 北京万里红科技有限公司 | Text detection method and device, computing equipment and storage medium |
CN114827132B (en) * | 2022-06-27 | 2022-09-09 | 河北东来工程技术服务有限公司 | Ship traffic file transmission control method, system, device and storage medium |
CN114842483B (en) * | 2022-06-27 | 2023-11-28 | 齐鲁工业大学 | Standard file information extraction method and system based on neural network and template matching |
CN115171110B (en) * | 2022-06-30 | 2023-08-22 | 北京百度网讯科技有限公司 | Text recognition method and device, equipment, medium and product |
CN115601553B (en) * | 2022-08-15 | 2023-08-18 | 杭州联汇科技股份有限公司 | Visual model pre-training method based on multi-level picture description data |
CN116226319B (en) * | 2023-05-10 | 2023-08-04 | 浪潮电子信息产业股份有限公司 | Hybrid heterogeneous model training method, device, equipment and readable storage medium |
CN116503517B (en) * | 2023-06-27 | 2023-09-05 | 江西农业大学 | Method and system for generating image by long text |
CN117315702B (en) * | 2023-11-28 | 2024-02-23 | 山东正云信息科技有限公司 | Text detection method, system and medium based on set prediction |
CN117593752B (en) * | 2024-01-18 | 2024-04-09 | 星云海数字科技股份有限公司 | PDF document input method, PDF document input system, storage medium and electronic equipment |
CN117611580B (en) * | 2024-01-18 | 2024-05-24 | 深圳市宗匠科技有限公司 | Flaw detection method, flaw detection device, computer equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108519970A (en) * | 2018-02-06 | 2018-09-11 | 平安科技(深圳)有限公司 | The identification method of sensitive information, electronic device and readable storage medium storing program for executing in text |
CN108764226A (en) * | 2018-04-13 | 2018-11-06 | 顺丰科技有限公司 | Image text recognition methods, device, equipment and its storage medium |
CN109086756A (en) * | 2018-06-15 | 2018-12-25 | 众安信息技术服务有限公司 | A kind of text detection analysis method, device and equipment based on deep neural network |
CN109447469A (en) * | 2018-10-30 | 2019-03-08 | 阿里巴巴集团控股有限公司 | A kind of Method for text detection, device and equipment |
CN109492638A (en) * | 2018-11-07 | 2019-03-19 | 北京旷视科技有限公司 | Method for text detection, device and electronic equipment |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9460357B2 (en) * | 2014-01-08 | 2016-10-04 | Qualcomm Incorporated | Processing text images with shadows |
CN108288078B (en) * | 2017-12-07 | 2020-09-29 | 腾讯科技(深圳)有限公司 | Method, device and medium for recognizing characters in image |
CN108764228A (en) * | 2018-05-28 | 2018-11-06 | 嘉兴善索智能科技有限公司 | Word object detection method in a kind of image |
CN109299274B (en) * | 2018-11-07 | 2021-12-17 | 南京大学 | Natural scene text detection method based on full convolution neural network |
CN110097049A (en) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of natural scene Method for text detection and system |
CN110135248A (en) * | 2019-04-03 | 2019-08-16 | 华南理工大学 | A kind of natural scene Method for text detection based on deep learning |
CN110110715A (en) * | 2019-04-30 | 2019-08-09 | 北京金山云网络技术有限公司 | Text detection model training method, text filed, content determine method and apparatus |
2019
- 2019-04-30 CN CN201910367675.2A patent/CN110110715A/en active Pending
2020
- 2020-04-29 WO PCT/CN2020/087809 patent/WO2020221298A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
MR_HEALTH: "Building the RPN in an FPN Network and the Corresponding Loss Functions" (in Chinese), CSDN * |
PENGYUAN LYU ET AL.: "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes", arXiv * |
TSUNG-YI LIN ET AL.: "Feature Pyramid Networks for Object Detection", arXiv * |
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020221298A1 (en) * | 2019-04-30 | 2020-11-05 | 北京金山云网络技术有限公司 | Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus |
CN110610166A (en) * | 2019-09-18 | 2019-12-24 | 北京猎户星空科技有限公司 | Text region detection model training method and device, electronic equipment and storage medium |
CN110610166B (en) * | 2019-09-18 | 2022-06-07 | 北京猎户星空科技有限公司 | Text region detection model training method and device, electronic equipment and storage medium |
CN110674804A (en) * | 2019-09-24 | 2020-01-10 | 上海眼控科技股份有限公司 | Text image detection method and device, computer equipment and storage medium |
CN110705460A (en) * | 2019-09-29 | 2020-01-17 | 北京百度网讯科技有限公司 | Image category identification method and device |
CN110705460B (en) * | 2019-09-29 | 2023-06-20 | 北京百度网讯科技有限公司 | Image category identification method and device |
CN110751146A (en) * | 2019-10-23 | 2020-02-04 | 北京印刷学院 | Text region detection method, text region detection device, electronic terminal and computer-readable storage medium |
CN112749704A (en) * | 2019-10-31 | 2021-05-04 | 北京金山云网络技术有限公司 | Text region detection method and device and server |
CN111062385A (en) * | 2019-11-18 | 2020-04-24 | 上海眼控科技股份有限公司 | Network model construction method and system for image text information detection |
CN110929647A (en) * | 2019-11-22 | 2020-03-27 | 科大讯飞股份有限公司 | Text detection method, device, equipment and storage medium |
CN110942067A (en) * | 2019-11-29 | 2020-03-31 | 上海眼控科技股份有限公司 | Text recognition method and device, computer equipment and storage medium |
CN111062389A (en) * | 2019-12-10 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Character recognition method and device, computer readable medium and electronic equipment |
CN111104934A (en) * | 2019-12-22 | 2020-05-05 | 上海眼控科技股份有限公司 | Engine label detection method, electronic device and computer readable storage medium |
CN113033593B (en) * | 2019-12-25 | 2023-09-01 | 上海智臻智能网络科技股份有限公司 | Text detection training method and device based on deep learning |
CN113033593A (en) * | 2019-12-25 | 2021-06-25 | 上海智臻智能网络科技股份有限公司 | Text detection training method and device based on deep learning |
CN111353442A (en) * | 2020-03-03 | 2020-06-30 | Oppo广东移动通信有限公司 | Image processing method, device, equipment and storage medium |
CN111382740A (en) * | 2020-03-13 | 2020-07-07 | 深圳前海环融联易信息科技服务有限公司 | Text picture analysis method and device, computer equipment and storage medium |
CN111382740B (en) * | 2020-03-13 | 2023-11-21 | 深圳前海环融联易信息科技服务有限公司 | Text picture analysis method, text picture analysis device, computer equipment and storage medium |
CN111784623A (en) * | 2020-09-07 | 2020-10-16 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN112287763A (en) * | 2020-09-27 | 2021-01-29 | 北京旷视科技有限公司 | Image processing method, apparatus, device and medium |
CN112541491A (en) * | 2020-12-07 | 2021-03-23 | 沈阳雅译网络技术有限公司 | End-to-end text detection and identification method based on image character region perception |
CN112541491B (en) * | 2020-12-07 | 2024-02-02 | 沈阳雅译网络技术有限公司 | End-to-end text detection and recognition method based on image character region perception |
CN112686317A (en) * | 2020-12-30 | 2021-04-20 | 北京迈格威科技有限公司 | Neural network training method and device, electronic equipment and storage medium |
CN112767431A (en) * | 2021-01-12 | 2021-05-07 | 云南电网有限责任公司电力科学研究院 | Power grid target detection method and device for power system |
CN112767431B (en) * | 2021-01-12 | 2024-04-23 | 云南电网有限责任公司电力科学研究院 | Power grid target detection method and device for power system |
CN112818975B (en) * | 2021-01-27 | 2024-09-24 | 北京金山数字娱乐科技有限公司 | Text detection model training method and device, text detection method and device |
CN112818975A (en) * | 2021-01-27 | 2021-05-18 | 北京金山数字娱乐科技有限公司 | Text detection model training method and device and text detection method and device |
CN112580656A (en) * | 2021-02-23 | 2021-03-30 | 上海旻浦科技有限公司 | End-to-end text detection method, system, terminal and storage medium |
CN113076944A (en) * | 2021-03-11 | 2021-07-06 | 国家电网有限公司 | Document detection and identification method based on artificial intelligence |
CN113807096A (en) * | 2021-04-09 | 2021-12-17 | 京东科技控股股份有限公司 | Text data processing method and device, computer equipment and storage medium |
CN112801097A (en) * | 2021-04-14 | 2021-05-14 | 北京世纪好未来教育科技有限公司 | Training method and device of text detection model and readable storage medium |
CN113112511A (en) * | 2021-04-19 | 2021-07-13 | 新东方教育科技集团有限公司 | Method and device for correcting test paper, storage medium and electronic equipment |
CN113112511B (en) * | 2021-04-19 | 2024-01-05 | 新东方教育科技集团有限公司 | Method and device for correcting test paper, storage medium and electronic equipment |
CN113221711A (en) * | 2021-04-30 | 2021-08-06 | 北京金山数字娱乐科技有限公司 | Information extraction method and device |
CN112990181A (en) * | 2021-04-30 | 2021-06-18 | 北京世纪好未来教育科技有限公司 | Text recognition method, device, equipment and storage medium |
CN114239499A (en) * | 2021-04-30 | 2022-03-25 | 北京金山数字娱乐科技有限公司 | Recruitment information management method, system and device |
CN113313022A (en) * | 2021-05-27 | 2021-08-27 | 北京百度网讯科技有限公司 | Training method of character recognition model and method for recognizing characters in image |
CN113313022B (en) * | 2021-05-27 | 2023-11-10 | 北京百度网讯科技有限公司 | Training method of character recognition model and method for recognizing characters in image |
CN113205426A (en) * | 2021-05-27 | 2021-08-03 | 中库(北京)数据系统有限公司 | Method and device for predicting popularity level of social media content |
CN113298156A (en) * | 2021-05-28 | 2021-08-24 | 有米科技股份有限公司 | Neural network training method and device for image gender classification |
CN113409776A (en) * | 2021-06-30 | 2021-09-17 | 南京领行科技股份有限公司 | Voice recognition method and device, electronic equipment and storage medium |
CN113409776B (en) * | 2021-06-30 | 2024-06-07 | 南京领行科技股份有限公司 | Voice recognition method and device, electronic equipment and storage medium |
CN113205160A (en) * | 2021-07-05 | 2021-08-03 | 北京世纪好未来教育科技有限公司 | Model training method, text recognition method, model training device, text recognition device, electronic equipment and medium |
CN114005019B (en) * | 2021-10-29 | 2023-09-22 | 北京有竹居网络技术有限公司 | Method for identifying flip image and related equipment thereof |
CN114005019A (en) * | 2021-10-29 | 2022-02-01 | 北京有竹居网络技术有限公司 | Method for identifying copied image and related equipment thereof |
CN114120287B (en) * | 2021-12-03 | 2024-08-09 | 腾讯科技(深圳)有限公司 | Data processing method, device, computer equipment and storage medium |
CN114120287A (en) * | 2021-12-03 | 2022-03-01 | 腾讯科技(深圳)有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN114065768A (en) * | 2021-12-08 | 2022-02-18 | 马上消费金融股份有限公司 | Feature fusion model training and text processing method and device |
CN114663594A (en) * | 2022-03-25 | 2022-06-24 | 中国电信股份有限公司 | Image feature point detection method, device, medium, and apparatus |
CN114724144B (en) * | 2022-05-16 | 2024-02-09 | 北京百度网讯科技有限公司 | Text recognition method, training device, training equipment and training medium for model |
CN114724144A (en) * | 2022-05-16 | 2022-07-08 | 北京百度网讯科技有限公司 | Text recognition method, model training method, device, equipment and medium |
CN115205562A (en) * | 2022-07-22 | 2022-10-18 | 四川云数赋智教育科技有限公司 | Random test paper registration method based on feature points |
CN116630755A (en) * | 2023-04-10 | 2023-08-22 | 雄安创新研究院 | Method, system and storage medium for detecting text position in scene image |
CN116630755B (en) * | 2023-04-10 | 2024-04-02 | 雄安创新研究院 | Method, system and storage medium for detecting text position in scene image |
CN116311320B (en) * | 2023-05-22 | 2023-08-22 | 建信金融科技有限责任公司 | Training method of text image fusion layer, text image recognition method and device |
CN116311320A (en) * | 2023-05-22 | 2023-06-23 | 建信金融科技有限责任公司 | Training method of text image fusion layer, text image recognition method and device |
CN117077814A (en) * | 2023-09-25 | 2023-11-17 | 北京百度网讯科技有限公司 | Training method of picture retrieval model, picture retrieval method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2020221298A1 (en) | 2020-11-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110110715A (en) | Text detection model training method, text filed, content determine method and apparatus | |
CN107808143B (en) | Dynamic gesture recognition method based on computer vision | |
CN112734775B (en) | Image labeling, image semantic segmentation and model training methods and devices | |
CN105144239A (en) | Image processing device, program, and image processing method | |
CN110738207A (en) | character detection method for fusing character area edge information in character image | |
Kadam et al. | Detection and localization of multiple image splicing using MobileNet V1 | |
JP7559063B2 (en) | FACE PERSHING METHOD AND RELATED DEVICE | |
CN110263819A (en) | A kind of object detection method and device for shellfish image | |
CN112446302B (en) | Human body posture detection method, system, electronic equipment and storage medium | |
CN108470354A (en) | Video target tracking method, device and realization device | |
CN107742107A (en) | Facial image sorting technique, device and server | |
CN111445459A (en) | Image defect detection method and system based on depth twin network | |
CN108492294B (en) | Method and device for evaluating harmony degree of image colors | |
CN112966691A (en) | Multi-scale text detection method and device based on semantic segmentation and electronic equipment | |
Wang et al. | Learning deep conditional neural network for image segmentation | |
CN106778852A (en) | A kind of picture material recognition methods for correcting erroneous judgement | |
CN107291825A (en) | With the search method and system of money commodity in a kind of video | |
CN113487610B (en) | Herpes image recognition method and device, computer equipment and storage medium | |
CN112836625A (en) | Face living body detection method and device and electronic equipment | |
CN112183672A (en) | Image classification method, and training method and device of feature extraction network | |
CN113420763B (en) | Text image processing method and device, electronic equipment and readable storage medium | |
CN108596098A (en) | Analytic method, system, equipment and the storage medium of human part | |
CN108492301A (en) | A kind of Scene Segmentation, terminal and storage medium | |
CN112949408A (en) | Real-time identification method and system for target fish passing through fish channel | |
CN113570540A (en) | Image tampering blind evidence obtaining method based on detection-segmentation architecture |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190809 |