CN108734089A - Method, apparatus, device and storage medium for identifying table content in a picture file
- Publication number: CN108734089A
- Application number: CN201810285135.5A
- Authority: CN (China)
- Prior art keywords: character, information, table header, coordinate, target
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/418—Document matching, e.g. of document images
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Character Input (AREA)
- Character Discrimination (AREA)
Abstract
The present invention relates to a method, apparatus, device and storage medium for identifying table content in a picture file, and belongs to the technical field of image recognition. The method includes: obtaining a target picture file to be identified; performing character recognition on the target picture file to obtain the character information in the target picture file; matching the recognized character information against a preset dictionary to obtain the header characters whose matching degree with the preset dictionary is greater than a first threshold; and determining the table content included in the target picture file according to the character information corresponding to the header characters. The table in a picture can thus be identified quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, and improves the user experience.
Description
Technical field
The present invention relates to the technical field of image recognition, and in particular to a method, apparatus, device and storage medium for identifying table content in a picture file.
Background
Optical character recognition (OCR) is the process of determining the shapes of characters in a picture by detecting dark and light patterns, and then translating the character images into computer text using character recognition methods. In other words, for printed characters, the text on a page is optically converted into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format that a word processor can further edit.
With the continuous development of computer technology, entering pictures into a computer system in a user-friendly way has become a strong demand, especially for pictures that contain tables. In the related art, table recognition on such a picture typically first divides the document into multiple units, then identifies the table lines in each unit, and, once the table structure is obtained, extracts and recognizes the characters in the picture.
However, recognizing the table in a picture in this way involves a complex algorithm, the recognition result is strongly affected by picture quality, and the detection error rate is high.
Summary of the invention
The present invention aims to solve, at least to some extent, the technical problems in the related art.
To this end, an embodiment of one aspect of the present invention provides a method for identifying table content in a picture file. The method includes: obtaining a target picture file to be identified; performing character recognition on the target picture file to obtain the character information in the target picture file; matching the recognized character information against a preset dictionary to obtain the header characters whose matching degree with the preset dictionary is greater than a first threshold; and determining the table content included in the target picture file according to the character information corresponding to the header characters.
An embodiment of another aspect of the present invention provides an apparatus for identifying table content in a picture file. The apparatus includes: a first acquisition module for obtaining a target picture file to be identified; a processing module for performing character recognition on the target picture file to obtain the character information in the target picture file; a matching module for matching the recognized character information against a preset dictionary to obtain the header characters whose matching degree with the preset dictionary is greater than a first threshold; and a determining module for determining the table content included in the target picture file according to the character information corresponding to the header characters.
An embodiment of a further aspect of the present invention provides a computer device. The computer device includes a memory and a processor; the memory stores a computer program, and when the processor executes the program, the method for identifying table content in a picture file is implemented.
An embodiment of yet another aspect of the present invention provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method for identifying table content in a picture file is implemented.
The method, apparatus, device and storage medium for identifying table content in a picture file provided by the embodiments of the present invention obtain a target picture file to be identified, perform character recognition on it to obtain the character information in the target picture file, match the recognized character information against a preset dictionary to obtain the header characters whose matching degree with the preset dictionary is greater than a first threshold, and then determine the table content included in the target picture file according to the character information corresponding to the header characters. The table in a picture can thus be identified quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, and improves the user experience.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present invention.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification. They show embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1 is a schematic flowchart of a method for identifying table content in a picture file according to an exemplary embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for identifying table content in a picture file according to an exemplary embodiment of the present invention;
Fig. 3(a) is a schematic diagram of a table style according to an exemplary embodiment of the present invention;
Fig. 3(b) is a schematic diagram of adding a target picture according to an exemplary embodiment of the present invention;
Fig. 3(c) is a schematic diagram of determining the format of the target picture and the corresponding character content according to an exemplary embodiment of the present invention;
Fig. 3(d) is a schematic diagram of screening the recognized character content and determining the results that are numeric, according to an exemplary embodiment of the present invention;
Fig. 3(e) is a schematic diagram of drawing the corresponding trend line chart from the numeric results according to an exemplary embodiment of the present invention;
Fig. 4 is a schematic flowchart of choosing the position information and semantics of the content characters corresponding to a header character according to an exemplary embodiment of the present invention;
Fig. 5 is a schematic flowchart of a method for identifying table content in a picture file according to an exemplary embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an apparatus for identifying table content in a picture file according to an exemplary embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present invention.
The above drawings show specific embodiments of the present invention, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the inventive concept in any way, but to explain the concept of the invention to those skilled in the art by reference to specific embodiments.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention as detailed in the appended claims.
The embodiments of the present invention propose a table recognition method that addresses the problems of existing methods for identifying table content in a picture file: the algorithm is complex, the recognition result is strongly affected by picture quality, and the detection error rate is high.
The method for identifying table content in a picture file provided by the embodiments of the present invention first obtains a target picture file to be identified and performs character recognition on it to obtain the character information in the target picture file. It then matches the recognized character information against a preset dictionary to obtain the header characters whose matching degree with the preset dictionary is greater than a first threshold, and determines the table content included in the target picture file according to the character information corresponding to those header characters. The table in a picture can thus be identified quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, and improves the user experience.
The method, apparatus, device and storage medium for identifying table content in a picture file provided by the present invention are described in detail below with reference to the drawings.
The method for identifying table content in a picture file provided by the embodiments of the present invention is first described in detail with reference to Fig. 1.
Fig. 1 is a schematic flowchart of a method for identifying table content in a picture file according to an exemplary embodiment of the present invention.
As shown in Fig. 1, the method for identifying table content in a picture file may include the following steps:
Step 101: obtain a target picture file to be identified.
Optionally, the method for identifying table content in a picture file provided by the embodiments of the present invention may be executed by the computer device provided by the embodiments of the present invention. The computer device is provided with an apparatus for identifying table content in a picture file, which manages or controls the process of identifying the table content in the target picture file. The computer device in this embodiment may be any hardware device with a data processing function, such as a computer or a personal digital assistant.
In this embodiment, the target picture file to be identified may be any picture file that has table content; this embodiment does not specifically limit this.
In an optional implementation of the present application, any picture file with table content may be obtained from the local picture library of the device as the target picture file to be identified. Alternatively, a request for a picture file with table content may be sent to a server, so that the target picture file to be identified is obtained from the server in real time. No specific limitation is made here.
Step 102: perform character recognition on the target picture file to obtain the character information in the target picture file.
In this embodiment, the character information may include the character shape, the character semantics, the character position information and so on; no specific limitation is made here.
The "character shape" indicates how a character is written and presented, the "character semantics" indicates the meaning of the character, and the "character position information" indicates the position of the character in the target picture file.
Optionally, after obtaining the target picture file, the apparatus for identifying table content in a picture file may use an existing character recognition technique, such as OCR, to perform character recognition on the target picture file and obtain the character information in it.
Step 103: match the recognized character information against a preset dictionary to obtain the header characters whose matching degree with the preset dictionary is greater than a first threshold.
The preset dictionary includes various header characters. It may be obtained by collecting a large number of words and analyzing them; it may be defined manually; or the large numbers of words involved in different fields may be processed to obtain a dictionary for each field. This embodiment does not specifically limit this.
Take a physical examination report in the medical field as an example. Such a report usually has headers of types such as item, result, reference value and unit, and the headers used in reports from different hospitals may differ. For instance, item-type headers usually include "item", "item name", "full item name", "examination item", "Chinese" and "Chinese name", while result-type headers usually include "result", "inspection result", "test result", "measurement result", "actual value", "detected value" and "quantitative result". Analyzing these terms yields the preset dictionary for the medical field.
In this embodiment, the first threshold may be set as needed, for example 0.90 or 0.92; no specific limitation is made here.
In an optional implementation of the present application, after the character information in the target picture file is obtained, the apparatus for identifying table content in a picture file performs a matching operation between the preset dictionary and the recognized character information, and thereby obtains the header characters whose matching degree is greater than the first threshold.
For example, suppose that recognition of the target picture file yields the character information "inspection item", "albumin", "weight" and "reference value", and that the first threshold is 0.90. If matching this character information against the preset dictionary shows that the matching degrees of "inspection item" and "reference value" with the preset dictionary are greater than 0.90, then "inspection item" and "reference value" can be determined to be header characters.
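The matching in step 103 can be sketched as follows. The source does not say how the matching degree is computed, so this sketch assumes a string-similarity ratio from Python's standard `difflib`; the dictionary entries and the names `PRESET_DICTIONARY`, `matching_degree` and `find_header_characters` are illustrative, not part of the patent.

```python
from difflib import SequenceMatcher

# Hypothetical preset dictionary of header terms for the medical field.
PRESET_DICTIONARY = [
    "inspection item", "item name", "result",
    "inspection result", "reference value", "unit",
]
FIRST_THRESHOLD = 0.90  # example value mentioned in the text


def matching_degree(text, dictionary):
    """Best similarity (0.0 to 1.0) between text and any dictionary entry.

    Assumption: similarity is difflib's ratio; the patent leaves this open.
    """
    return max(SequenceMatcher(None, text, entry).ratio() for entry in dictionary)


def find_header_characters(recognized, dictionary=PRESET_DICTIONARY,
                           threshold=FIRST_THRESHOLD):
    """Keep the recognized strings whose matching degree exceeds the threshold."""
    return [t for t in recognized if matching_degree(t, dictionary) > threshold]


recognized = ["inspection item", "albumin", "weight", "reference value"]
headers = find_header_characters(recognized)
```

A similarity ratio rather than exact lookup tolerates small OCR errors in a header, which is one plausible reason for using a threshold below 1.0.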
Step 104: determine the table content included in the target picture file according to the character information corresponding to the header characters.
Optionally, after the header characters are determined, the apparatus for identifying table content in a picture file may determine the table content included in the target picture file according to the character information corresponding to the header characters.
In an optional implementation, the character information corresponding to the header characters may first be analyzed to determine the field to which the header characters belong. The table format and table content in the target picture file can then be obtained from the determined field and the character information corresponding to the header characters.
The method for identifying table content in a picture file provided by this embodiment obtains a target picture file to be identified, performs character recognition on it to obtain the character information in the target picture file, matches the recognized character information against a preset dictionary to obtain the header characters whose matching degree with the preset dictionary is greater than a first threshold, and then determines the table content included in the target picture file according to the character information corresponding to the header characters. The table in a picture can thus be identified quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, and improves the user experience.
As the above analysis shows, the embodiments of the present invention obtain the character information in the target picture file, derive the header characters from it, and then determine the table content included in the target picture file according to the character information corresponding to the header characters. In an optional implementation, the obtained character information may include character semantics and character position information. To determine the header characters more accurately, this embodiment may therefore first determine a target character set according to the preset dictionary, then determine the table style corresponding to the character information according to the character semantics, determine target position information according to the table style, and finally obtain the header characters according to the target position information and the position information of the target character set. This process is described in detail below with reference to Fig. 2.
As shown in Fig. 2, the method for identifying table content in a picture file may include the following steps:
Step 201: obtain a target picture file to be identified.
Step 202: perform character recognition on the target picture file to obtain the character information in the target picture file.
The character information includes character semantics and character position information. The character position information may include the first-direction coordinate and the second-direction coordinate of the character in the target picture file.
In actual use, a coordinate system may first be defined for the target picture file, for example with the upper-left corner of the target picture file as the origin, the positive X axis pointing right from the origin and the positive Y axis pointing down. Correspondingly, the first-direction coordinate may be the X coordinate and the second-direction coordinate the Y coordinate; alternatively, the first-direction coordinate may be the Y coordinate and the second-direction coordinate the X coordinate. This embodiment does not specifically limit this.
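One way to hold this character information in code is sketched below. The `CharInfo` structure and the choice of the bounding-box centre as the character position are assumptions made for illustration; the source only requires a first-direction and a second-direction coordinate measured from the upper-left origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharInfo:
    """One recognized string with its position in the picture (hypothetical structure)."""
    text: str   # recognized text, from which the semantics are derived
    x: float    # first-direction coordinate: distance to the right of the origin
    y: float    # second-direction coordinate: distance below the origin


def from_bounding_box(text, left, top, width, height):
    """Use the centre of an OCR bounding box as the position (an assumption)."""
    return CharInfo(text, left + width / 2, top + height / 2)


item = from_bounding_box("inspection item", left=100, top=40, width=80, height=20)
```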
In this embodiment, the target picture file may be a picture in any format, such as BMP, TIF, JPG or PDF; no specific limitation is made here.
In an optional implementation, an existing character recognition technique, such as OCR, is used to perform character recognition on the target picture file and obtain the character semantics and character position information in the target picture file.
Step 203: match the recognized character information against the preset dictionary to obtain the target character set whose matching degree with the preset dictionary is greater than the first threshold.
Optionally, in this embodiment, after the character semantics and character position information in the target picture file are obtained, the apparatus for identifying table content in a picture file may perform a matching operation between the preset dictionary and the character information to obtain the target character set whose matching degree is greater than the first threshold.
In practice, the target picture file may relate to any field. To improve the accuracy of recognizing the target picture file, this embodiment may therefore analyze the recognized character semantics before matching the character information against the preset dictionary, and determine the corresponding target dictionary according to those semantics. That is, the character semantics are analyzed to find the field they belong to, and the preset dictionary for that field is selected, which effectively improves the recognition accuracy for the target picture file.
For example, if the analysis shows that the character semantics mainly relate to the medical field, the preset dictionary for the medical field can be determined as the target dictionary; if the analysis shows that the character semantics mainly relate to the financial field, the preset dictionary for the financial field can be determined as the target dictionary.
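A minimal sketch of choosing the target dictionary by field follows. The source does not describe how the field is inferred from the character semantics, so this sketch assumes a simple count of field-specific indicator words; `FIELD_INDICATORS`, `FIELD_DICTIONARIES` and their contents are hypothetical.

```python
# Hypothetical indicator words used to detect the field of the recognized text.
FIELD_INDICATORS = {
    "medical": {"albumin", "blood pressure", "hospital", "inspection item"},
    "financial": {"balance", "interest", "account", "transaction"},
}
# Hypothetical preset dictionaries of header terms, one per field.
FIELD_DICTIONARIES = {
    "medical": ["inspection item", "inspection result", "reference value", "unit"],
    "financial": ["date", "amount", "balance", "summary"],
}


def select_target_dictionary(recognized):
    """Pick the dictionary of the field whose indicator words occur most often."""
    counts = {
        field: sum(1 for text in recognized if text in words)
        for field, words in FIELD_INDICATORS.items()
    }
    best_field = max(counts, key=counts.get)
    return best_field, FIELD_DICTIONARIES[best_field]


field, target_dictionary = select_target_dictionary(
    ["inspection item", "albumin", "weight", "reference value"]
)
```

Restricting the match to one field's dictionary keeps terms from unrelated fields from being mistaken for headers.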
Further, after the target dictionary corresponding to the character information is determined, the apparatus for identifying table content in a picture file may match the characters in the character information against the target dictionary to obtain the matching degree between each character and the target dictionary, compare each matching degree with the first threshold, and take the characters whose matching degree is greater than the first threshold as the target character set.
For example, suppose that recognition of the target picture file yields the characters "inspection item", "albumin", "weight" and "reference value", and that the first threshold is 0.90. Analyzing the semantics of these characters shows that the characters in the target picture file relate to the medical field, so the dictionary for the medical field is obtained. The matching degree of each character with the medical dictionary is then evaluated; if the matching degrees of "inspection item" and "reference value" are greater than 0.90, "inspection item" and "reference value" are taken as the target character set.
Step 204: determine the table style corresponding to the character information according to the character semantics in the character information.
Step 205: determine the target position information according to the table style.
In an optional implementation, because table styles differ between fields, this embodiment may, in order to obtain the header characters accurately and reliably, first determine the table style corresponding to the character information according to the character semantics in the character information of the target picture file, and then determine the target position information according to the determined table style.
That is, when the fields involved in the character semantics differ, the table styles included in the target picture file also differ. For example, if the character semantics in the target picture file relate to the medical field, the table style may be as shown in Fig. 3(a), from which it can be determined that the header characters are positioned so that they share the same row coordinate.
Step 206: obtain the header characters from the target character set according to the target position information and the position information of the target character set.
Optionally, after the target position information is determined, the apparatus for identifying table content in a picture file may obtain the header characters from the target character set according to the target position information and the position information of the target character set.
In an optional implementation, when it is determined that the header characters share the same second-direction coordinate and the second direction is the Y-axis direction, the apparatus may select the header characters from the target character set according to the rule that their Y coordinates are identical. Alternatively, when it is determined that the header characters share the same first-direction coordinate and the first direction is the X-axis direction, the apparatus may select the header characters from the target character set according to the rule that their X coordinates are identical.
For example, suppose the characters in the target character set and their position information are "serial number, (X1, Y1)", "inspection item, (X2, Y1)", "blood pressure, (X2, Y2)", "inspection result, (X3, Y1)", "45, (X3, Y3)" and "reference value, (X4, Y1)", and the determined target position information is that the Y coordinates are identical. Applying that rule, the apparatus can determine that the header characters are "serial number, (X1, Y1)", "inspection item, (X2, Y1)", "inspection result, (X3, Y1)" and "reference value, (X4, Y1)".
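The selection in step 206 can be sketched with the example above. The numeric coordinates are made-up stand-ins for (X1, Y1), (X2, Y2) and so on, and taking the largest group that shares a coordinate is an assumption of this sketch; the source only states the rule that the header coordinates are identical.

```python
from collections import defaultdict

# Target character set from the example, with made-up numeric coordinates.
target_set = [
    ("serial number", (10, 20)),
    ("inspection item", (60, 20)),
    ("blood pressure", (60, 50)),
    ("inspection result", (140, 20)),
    ("45", (140, 80)),
    ("reference value", (220, 20)),
]


def select_headers(candidates, axis=1):
    """Return the largest group of candidates sharing a coordinate on one axis.

    axis=1 groups by Y (headers laid out in one row);
    axis=0 groups by X (headers laid out in one column).
    """
    groups = defaultdict(list)
    for text, position in candidates:
        groups[position[axis]].append((text, position))
    return max(groups.values(), key=len)


headers = select_headers(target_set, axis=1)
```

With real OCR output the coordinates of one row rarely match exactly, so a practical version would likely group by coordinate within a tolerance rather than by equality.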
Step 207: according to the position information and semantics of the header characters, choose from the character information the position information and semantics of the content characters corresponding to the header characters.
Specifically, after the header characters are determined, in order to fill in the content of the table, the apparatus for identifying table content in a picture file may, according to the position information and semantics of the header characters, choose from the character information the characters that correspond to the position and semantics of the header characters as content characters, and obtain the position information and semantics of those content characters.
To explain this more clearly, the process of choosing, from the character information, the position information and semantics of the content characters corresponding to the header characters is described in detail below with reference to Fig. 4.
It should be noted that, in this embodiment, the character position information includes the first-direction coordinate and the second-direction coordinate of the character.
As shown in Fig. 4, choosing the position information and semantics of the content characters corresponding to a header character may include the following steps:
Step 401, according to the first direction coordinate or second direction coordinate of any gauge outfit character, the of content character is determined
One direction coordinate range or second direction coordinate range.
It optionally, can be according to gauge outfit word after identifying that the device of table content in picture file determines gauge outfit character
The location information of symbol determines the first direction coordinate range or second direction coordinate range of content character.
In practical applications, due to the device of table content in identification picture file, the position letter of determining gauge outfit character
Breath is therefore corresponding with gauge outfit character in order to accurately obtain there may be error or content character length are irregular
Content character, the present embodiment can determine the first direction coordinate range of content character according to the location information of gauge outfit character
Or second direction coordinate range.For example, coordinate in a first direction can be distinguished on the basis of the location information of gauge outfit character
With on second direction coordinate add an additional range, i.e., when the location information of gauge outfit character be (X, Y) when, determine content word
The coordinate range for according with first direction can be (X- Δs, X+ Δs);Alternatively, being in second direction coordinate range:(Y- Δs, Y+ Δs)
Etc., the present embodiment is not especially limited this.
It is understood that when determining the coordinate range of content character, it can be first according to the position of each gauge outfit character
Relationship determines the position relationship of content character and gauge outfit character, and then determines the corresponding coordinate range of content character again.
For example, if according to the position of gauge outfit character, determine that each gauge outfit character is located at same row, then can determine with
The corresponding content character of gauge outfit character is close with the row coordinate of gauge outfit character, to be sat according to the X-direction of each gauge outfit character
X1 is marked, determines that each content character row coordinate range is (x1- Δs, x1+ Δs).
If correspondingly, according to the position of gauge outfit character, determine that each gauge outfit character is located at same row, then can then determine with
The corresponding content character of gauge outfit character is close with the ordinate of gauge outfit character, to can both be sat according to the Y-direction of each gauge outfit character
Y1 is marked, determines that each content character row coordinate range is (y1- Δs, y1+ Δs).
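The range computation of step 401 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `content_range`, the flag `headers_in_same_row`, and the numeric margin are assumptions introduced here.

```python
from typing import Tuple


def content_range(header_xy: Tuple[float, float],
                  delta: float,
                  headers_in_same_row: bool) -> Tuple[float, float]:
    """Return the coordinate range in which content characters are searched.

    header_xy: (X, Y) position of one header character.
    delta: the additional margin (hypothetical value chosen by the caller).
    headers_in_same_row: True when all headers share a row, so content is
        matched on the X coordinate; False when headers share a column, so
        content is matched on the Y coordinate.
    """
    x, y = header_xy
    center = x if headers_in_same_row else y
    return (center - delta, center + delta)


if __name__ == "__main__":
    print(content_range((100.0, 20.0), 5.0, headers_in_same_row=True))
    print(content_range((100.0, 20.0), 5.0, headers_in_same_row=False))
```

A larger margin tolerates more position error but risks pulling in characters that belong to a neighbouring header, so the value would need tuning per document type.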
Step 402: from the character information, choose a preliminary character set whose position information falls within the first-direction coordinate range or the second-direction coordinate range.
Step 403: according to the semantics of any header character, choose from the preliminary character set the characters that match the semantics of that header character as the content characters corresponding to that header character.
Specifically, to improve the precision with which the content characters corresponding to a header character are obtained, the device of this embodiment can, after obtaining the preliminary character set, analyze that set based on the semantics of the header character and select from it the characters that match the header character's semantics as the content characters corresponding to the header character.
It can be understood that this embodiment first determines the coordinate range of the content characters from the position information of the header character, and selects from the character information a preliminary character set whose position information lies within the first-direction or second-direction coordinate range. It then selects from the preliminary character set, according to the semantics of the header character, the characters that match those semantics as the content characters corresponding to the header character. This effectively improves the accuracy with which the content characters corresponding to a header character are obtained.
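Steps 402 and 403 can be sketched as a position filter followed by a semantic filter. The text does not specify how semantic matching is performed, so the predicate below (a numeric check for a header whose values are numbers) is only a hypothetical stand-in; the `Char` type and function names are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Char:
    text: str
    x: float  # first-direction coordinate
    y: float  # second-direction coordinate


def preliminary_set(chars: List[Char], x_range: Tuple[float, float]) -> List[Char]:
    """Step 402: keep characters whose X coordinate lies inside the open range."""
    low, high = x_range
    return [c for c in chars if low < c.x < high]


def match_content(candidates: List[Char],
                  is_semantic_match: Callable[[str], bool]) -> List[Char]:
    """Step 403: keep candidates accepted by the semantic predicate."""
    return [c for c in candidates if is_semantic_match(c.text)]


def is_number(text: str) -> bool:
    """Hypothetical semantic check for a header whose values are numeric."""
    try:
        float(text)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    chars = [Char("12.5", 101, 40), Char("umol/L", 180, 40), Char("note", 99, 60)]
    candidates = preliminary_set(chars, (95, 105))
    print([c.text for c in match_content(candidates, is_number)])
```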
Step 208: according to the position information and semantics of the header characters and the position information and semantics of the content characters, generate the table contained in the target picture file.
Specifically, after obtaining the position information and semantics of the header characters and of the content characters in the target picture file, the device for recognizing table content in a picture file can generate the table contained in the target picture file from this information.
In practice, because of alignment and similar layout choices in a table, a single word may be split into two characters that are far apart on the page, which makes it easy for the word to be recognized as two independent characters. For example, "project" may be recognized as "item" and "mesh"; as another example, "unit" may be recognized as the two independent characters "list" and "position".
In addition, a table field can be expressed by many words that have the same meaning. To unify these differently worded terms into relatively uniform structured data that is convenient for subsequent storage and use, the device of this embodiment can normalize words with the same meaning by means such as semantic analysis of the header characters or synonym merging. For example, if a physical examination report template includes the four items "project", "unit", "reference value", "unit", and so on, then header characters such as "project", "project name", "project full name", "examine project", and "Chinese name" can all be classified as "project", and likewise for the others.
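Synonym merging of header words can be sketched with a lookup table. The table contents reuse the example words from the text; the table structure, the function name `normalize_header`, and the case-insensitive comparison are assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical synonym table: canonical header -> accepted variants.
HEADER_SYNONYMS: Dict[str, Set[str]] = {
    "project": {"project", "project name", "project full name",
                "examine project", "chinese name"},
}


def normalize_header(text: str, synonyms: Dict[str, Set[str]] = HEADER_SYNONYMS) -> str:
    """Map a recognized header word to its canonical form, if one is known."""
    key = text.strip().lower()
    for canonical, variants in synonyms.items():
        if key in variants:
            return canonical
    return text  # unknown words are left unchanged


if __name__ == "__main__":
    print(normalize_header("Project Name"))
    print(normalize_header("unit"))
```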
Further, the preset lexicon, or the semantics of the merged words, can also be used to correct errors in the header characters and content characters. For example, if "project" is recognized as "item and", semantic analysis shows that "and" is a wrongly recognized character, and the characters "item and" can then be corrected to "project".
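One possible way to correct a misrecognized word against a preset lexicon is fuzzy string matching. The sketch below swaps in Python's standard `difflib.get_close_matches` for the semantic analysis the text describes, so it is a different and simpler technique; the lexicon contents, the cutoff, and the misspelled input are all hypothetical.

```python
from difflib import get_close_matches
from typing import List


def correct_word(word: str, lexicon: List[str], cutoff: float = 0.6) -> str:
    """Return the closest lexicon entry, or the word itself if nothing is close."""
    matches = get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    lexicon = ["project", "unit", "reference value", "result"]
    print(correct_word("projcet", lexicon))  # a close misspelling
    print(correct_word("zzz", lexicon))      # nothing similar in the lexicon
```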
That is, before generating the table contained in the target picture file from the position information and semantics of the header characters and content characters, the device of this embodiment uses the preset lexicon to normalize the header characters and content characters and to merge words, which makes subsequent management and processing more convenient.
The above embodiment is further explained below with reference to Fig. 3(b) to Fig. 3(e):
Suppose that in this embodiment the target picture files are paper physical examination reports from different time periods that the user has captured with a device. To build a corresponding electronic physical examination archive from the paper reports, the user can add the photographed paper reports to an application for building electronic physical examination archives, as shown in Fig. 3(b). After the application detects the physical examination report picture added by the user, it uses its character recognition function to determine the table style of the report and the corresponding character content from the picture, as shown in Fig. 3(c). Further, to help the user understand the levels and trends of the indicators in the report, the recognized character content can be filtered to select the content whose result is numeric, as shown in Fig. 3(d). Then, according to the extracted numeric results, for example those for total bilirubin, a trend line chart for total bilirubin is drawn, as shown in Fig. 3(e), so that the user can clearly see from the generated chart whether each of their indicators is at a normal level.
In the method for recognizing table content in a picture file provided by this embodiment of the present invention, the target picture file to be recognized is first obtained, and character recognition is performed on it to obtain the character information in the target picture file. The recognized character information is matched against the preset lexicon to obtain a target character set whose degree of match with the preset lexicon exceeds the first threshold. The table style corresponding to the character information is then determined from the semantics of the characters in the character information, and target position information is determined from the table style. According to the target position information and the position information of the target character set, the header characters are obtained from the target character set. Then, according to the position information and semantics of the header characters, the position information and semantics of the corresponding content characters are chosen from the character information, and the table contained in the target picture file is generated from the position information and semantics of the header characters and of the content characters. The table contained in the picture is thus recognized quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, and effectively improves the user experience.
As the above analysis shows, this embodiment of the present invention obtains the character position information of the target picture file, determines the header characters from that position information, then chooses from the character information the position information and semantics of the content characters corresponding to the header characters, and generates the table contained in the target picture file from the position information and semantics of the header characters and of the content characters. In a specific implementation, the character position information recognized by the device, including the first-direction coordinate or second-direction coordinate of a character, may contain errors. When header characters or content characters are determined from these coordinates, a position error can therefore cause the character type to be identified incorrectly. For this reason, in this embodiment of the present invention the character position information can first be corrected before the header characters or content characters are determined from it.
In actual use, the character position information recognized by the device also includes the width of the character in the first direction and its width in the second direction. Therefore, when determining whether a character is a header character, this embodiment can first correct the character position information according to the character's first-direction and second-direction coordinates and its widths in the first and second directions, and then determine the header characters or content characters from the corrected position information. This process of the method for recognizing table content in a picture file is described below with reference to Fig. 5.
Fig. 5 is a schematic flowchart of a method for recognizing table content in a picture file according to an exemplary embodiment of the present invention.
It should be noted that, to explain the embodiment more clearly, a coordinate system can first be defined for the target picture file. For example, the upper-left corner of the target picture file is taken as the coordinate origin, with the positive X axis pointing right from the origin and the positive Y axis pointing down. Correspondingly, the first-direction coordinate can be defined as the X-axis coordinate (i.e. abscissa) and the second-direction coordinate as the Y-axis coordinate (i.e. ordinate), and the embodiment is described in detail on the basis of these definitions.
As shown in Fig. 5, the method for recognizing table content in a picture file may include the following steps:
Step 501: obtain the target picture file to be recognized.
Step 502: perform character recognition on the target picture file to obtain the character information in the target picture file.
Here, the character information includes character semantics and character position information. The character position information includes the first-direction coordinate (i.e. X-axis coordinate) and second-direction coordinate (i.e. Y-axis coordinate) of the character in the target picture file, as well as the width of the character in the first direction and its width in the second direction.
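The character information described above can be represented, for illustration, as a small record type. The field names below are assumptions; the text only states which quantities are included, and the recognized text is used here as a stand-in for the character semantics.

```python
from dataclasses import dataclass


@dataclass
class CharInfo:
    text: str      # recognized text, standing in for the character semantics
    x: float       # first-direction (X-axis) coordinate, origin at top-left
    y: float       # second-direction (Y-axis) coordinate, increasing downward
    width: float   # width in the first direction
    height: float  # width in the second direction


if __name__ == "__main__":
    c = CharInfo(text="12.5", x=101.0, y=40.0, width=30.0, height=12.0)
    # Right and bottom edges, assuming (x, y) marks the top-left corner.
    print(c.x + c.width, c.y + c.height)
```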
Since this step is implemented in a similar way to the examples above, it is not described again here; see step 102 or step 202 for details.
Step 503: traverse the character information in ascending order of first-direction coordinate and judge whether the j-th character and the i-th character have the same coordinate in the second direction. If they differ, execute step 504; otherwise, execute step 508.
Here, the difference between the first-direction coordinates of adjacent characters lying between the i-th and j-th characters is within a preset range; i and j are positive integers, and j is greater than i.
The preset range in this embodiment can be set adaptively according to the actual position information of the characters. For example, the preset range can be determined from the character width, or from a typical character spacing, and so on; this embodiment does not specifically limit this.
Optionally, after obtaining the character information in the target picture file, the device can traverse the character information in ascending order of first-direction coordinate to judge whether the j-th character and the i-th character have the same coordinate in the second direction. If they are the same, the characters are in the same row; if they differ, the characters are in different rows.
Here, the j-th character and the i-th character can be any two different characters among the recognized characters; this embodiment does not specifically limit this.
That is, the X-axis coordinates of the characters in the character information are treated as the independent variable, and in ascending order of X the Y-axis coordinates of the characters are checked for equality. When the Y-axis coordinates of two characters are the same, the characters are determined to be in the same row; when they differ, the characters are determined to be in different rows. In this way it can be judged accurately whether the position information of each character is correct.
For example, if the first-direction coordinate of character A is X1 and its second-direction coordinate is Y1, and the first-direction coordinate of character B is X2 and its second-direction coordinate is Y2, then when Y1 = Y2, characters A and B are in the same row; when Y1 ≠ Y2, characters A and B are in different rows.
Step 504: according to the width of the j-th character in the second direction and the width of the i-th character in the second direction, determine the degree of overlap of the j-th character and the i-th character in the second direction.
In a specific implementation, the degree of overlap of the j-th character and the i-th character in the second direction can be determined by formula (1) (the formula itself is not reproduced in this text):
where r_i denotes the i-th character, y_i denotes the second-direction coordinate of the i-th character, h_i denotes the width of the i-th character in the second direction, r_j denotes the j-th character, y_j denotes the second-direction coordinate of the j-th character, and h_j denotes the width of the j-th character in the second direction.
Step 505: judge whether the degree of overlap of the j-th character and the i-th character in the second direction is greater than the third threshold. If it is, execute step 506; otherwise, execute step 507.
Here, the third threshold can be chosen empirically; this embodiment does not specifically limit this. For example, the third threshold can be set to half of the smaller of the widths of the i-th character and the j-th character in the second direction.
For example, suppose the third threshold is half of the smaller of the second-direction widths of the j-th and i-th characters, the second-direction coordinate of the j-th character is 2 and its second-direction width is 1, and the second-direction coordinate of the i-th character is 2.1 and its second-direction width is 0.9. Then, based on formula (1), the degree of overlap of the j-th character and the i-th character can be determined to be greater than the third threshold.
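Because formula (1) is missing from this text, the sketch below uses a common one-dimensional overlap measure built from the symbols defined for it (y_i, h_i, y_j, h_j). It is an assumed stand-in, not the patent's formula, and it further assumes that the second-direction coordinate marks the top edge of a character. The threshold follows the example in the text: half of the smaller second-direction width.

```python
def overlap_second_direction(y_i: float, h_i: float, y_j: float, h_j: float) -> float:
    """Length of the vertical interval shared by two characters.

    Assumed definition: min(bottom edges) - max(top edges). The value is
    negative when the two characters do not overlap at all.
    """
    return min(y_i + h_i, y_j + h_j) - max(y_i, y_j)


def exceeds_third_threshold(y_i: float, h_i: float, y_j: float, h_j: float) -> bool:
    """Compare the overlap with half of the smaller second-direction width."""
    threshold = 0.5 * min(h_i, h_j)
    return overlap_second_direction(y_i, h_i, y_j, h_j) > threshold


if __name__ == "__main__":
    # Values from the worked example: j-th character (2, 1), i-th character (2.1, 0.9).
    print(overlap_second_direction(2.1, 0.9, 2.0, 1.0))
    print(exceeds_third_threshold(2.1, 0.9, 2.0, 1.0))
```

Under this assumed definition the worked example gives an overlap of about 0.9 against a threshold of 0.45, which agrees with the conclusion stated in the text.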
Step 506: correct the coordinate of the j-th character and/or the i-th character in the second direction.
Specifically, when the degree of overlap of the j-th character and the i-th character in the second direction is determined to be greater than the third threshold, the two characters are actually in the same row. Therefore, to reduce the chance of later misjudging the character type, for a j-th character and i-th character whose degree of overlap exceeds the third threshold, the second-direction coordinate of the j-th character and/or the i-th character is corrected by an alignment method so that the two characters have the same coordinate in the second direction.
When correcting the second-direction coordinate of the j-th character and/or the i-th character, this embodiment can improve the correction by first determining a target second-direction coordinate range from the second-direction coordinates of the i-th and j-th characters, then selecting from the recognized characters multiple characters that lie within that range, and finally correcting the second-direction coordinate of the j-th character and/or the i-th character according to the second-direction coordinates of the selected characters.
That is: determine a target second-direction coordinate range according to the second-direction coordinate of the i-th character and the second-direction coordinate of the j-th character;
choose k characters whose second-direction coordinates fall within the target second-direction coordinate range;
correct the second-direction coordinate of the j-th character and/or the i-th character according to the second-direction coordinates of the k characters.
It can be understood that, in this embodiment, correcting the second-direction coordinate of the j-th character and/or the i-th character can mean correcting the second-direction coordinate of the j-th character; or correcting the second-direction coordinate of the i-th character; or correcting the second-direction coordinates of both the j-th character and the i-th character. This embodiment does not specifically limit this.
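The three sub-steps above can be sketched as follows. The text does not say how the target range is built or how the k coordinates are combined, so two assumptions are made here: the range spans the two characters' coordinates, and the corrected value is the median of the coordinates found in that range. The function name is hypothetical.

```python
from statistics import median
from typing import List


def corrected_second_coordinate(y_i: float, y_j: float, all_ys: List[float]) -> float:
    """Return one shared second-direction coordinate for characters i and j.

    Assumptions: the target range is [min(y_i, y_j), max(y_i, y_j)], and the
    k characters inside it vote through their median coordinate.
    """
    low, high = min(y_i, y_j), max(y_i, y_j)
    in_range = [y for y in all_ys if low <= y <= high]
    if not in_range:  # defensive fallback when no recognized character qualifies
        return (low + high) / 2
    return median(in_range)


if __name__ == "__main__":
    ys = [2.0, 2.1, 2.05, 5.0, 2.05]
    print(corrected_second_coordinate(2.0, 2.1, ys))
```

Using several characters rather than only the pair reduces the influence of a single badly located character on the aligned row coordinate.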
Step 507: the j-th character and the i-th character are in different rows.
Step 508: the j-th character and the i-th character are in the same row.
In actual applications, the recognition processing may distort the target picture file, or errors may occur in recognizing the character position information, so that the judgment of whether characters have the same second-direction coordinate is itself in error.
To reduce inaccurate judgments caused by this problem, in a possible implementation scenario of the present invention the device can select from the character information the i-th character whose second-direction coordinate is closest to that of the j-th character and compare the two, to determine whether the j-th character and the i-th character have the same coordinate in the second direction. If they are the same, the j-th character and the i-th character are determined to be in the same row. If they differ, the degree of overlap of the two characters in the second direction is determined from the second-direction width of the j-th character and the second-direction width of the i-th character, and it is judged whether the degree of overlap is greater than the third threshold. If it is, the j-th character and the i-th character are actually in the same row; to avoid later misjudging the character type, the second-direction coordinate of the j-th character and/or the i-th character can be corrected so that the two characters have the same coordinate in the second direction. Otherwise, the two characters are in different rows.
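The flow of steps 503 to 508 in this scenario can be sketched end to end. Several choices below are assumptions rather than statements of the patent: the overlap measure stands in for the missing formula (1), the threshold is half of the smaller height, only earlier characters in the traversal are considered as candidates for the i-th character, and the j-th character is the one whose coordinate is corrected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    text: str
    x: float  # first-direction coordinate
    y: float  # second-direction coordinate (assumed top edge)
    h: float  # width in the second direction


def align_rows(boxes: List[Box]) -> List[Box]:
    """Traverse by ascending x and snap characters that overlap enough to one row."""
    ordered = sorted(boxes, key=lambda b: b.x)
    for j in range(1, len(ordered)):
        bj = ordered[j]
        # i-th character: the earlier character closest to bj in the second direction.
        bi = min(ordered[:j], key=lambda b: abs(b.y - bj.y))
        if bi.y == bj.y:
            continue  # step 508: already in the same row
        overlap = min(bi.y + bi.h, bj.y + bj.h) - max(bi.y, bj.y)
        if overlap > 0.5 * min(bi.h, bj.h):
            bj.y = bi.y  # step 506: correct the j-th character's coordinate
        # otherwise step 507: the characters stay in different rows
    return ordered


if __name__ == "__main__":
    result = align_rows([Box("A", 0, 2.0, 1.0), Box("B", 10, 2.1, 0.9), Box("C", 20, 5.0, 1.0)])
    print([(b.text, b.y) for b in result])
```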
Step 509: match the character information against the preset lexicon to obtain a target character set whose degree of match with the preset lexicon exceeds the first threshold.
Step 510: determine the table style corresponding to the character information according to the semantics of the characters in the character information.
Step 511: determine the target position information according to the table style.
Step 512: obtain the header characters from the target character set according to the target position information and the position information of the target character set.
Step 513: according to the position information and semantics of the header characters, choose from the character information the position information and semantics of the content characters corresponding to the header characters.
Step 514: according to the position information and semantics of the header characters and the position information and semantics of the content characters, generate the table contained in the target picture file.
It should be noted that the specific implementation and principles of steps 509 to 514 are described in detail in the embodiments above and are not repeated here.
Likewise, when the device corrects the position information of a character according to the character's width in the first direction and its first-direction and second-direction coordinates, the process is similar to the correction described above, which uses the character's width in the second direction and its first-direction and second-direction coordinates. The only differences are as follows:
When traversing the character information obtained by recognizing the target picture file, the characters are traversed in ascending order of second-direction coordinate (i.e. Y-axis coordinate), and it is judged whether the j-th character and the i-th character have the same coordinate in the first direction (i.e. X-axis coordinate). If they are the same, the j-th character and the i-th character are in the same column. If they differ, the degree of overlap of the j-th character and the i-th character in the first direction is determined from the first-direction width of the j-th character and the first-direction width of the i-th character, and it is determined whether the degree of overlap is greater than the third threshold.
If the degree of overlap is determined to be less than the third threshold, the j-th character and the i-th character are in different columns. If the degree of overlap is determined to be greater than the third threshold, the j-th character and the i-th character actually have the same coordinate in the first direction. Therefore, to reduce the probability of later misjudging the character type, this embodiment can correct the first-direction coordinate of the j-th character or the i-th character so that the two characters have the same coordinate in the first direction. Specifically, correcting the first-direction coordinate of the j-th character or the i-th character may include: determining a target first-direction coordinate range according to the first-direction coordinate of the j-th character and the first-direction coordinate of the i-th character; choosing m characters whose first-direction coordinates fall within the target first-direction coordinate range; and correcting the first-direction coordinate of the j-th character and/or the i-th character according to the first-direction coordinates of the m characters.
Here, the third threshold can be, for example, half of the smaller of the widths of the j-th character and the i-th character in the first direction; this embodiment does not specifically limit this.
It is understood that in the present embodiment, to j-th of character and/or i-th of character second direction coordinate into
Row is corrected, and can be that the coordinate to j-th of character in second direction is modified;Alternatively, to i-th of character in second direction
Coordinate be modified;Alternatively, being modified in the coordinate of second direction to j-th of character and i-th of character, the present embodiment
This is not especially limited.
In the method for recognizing table content in a picture file provided by this embodiment of the present invention, character recognition is performed on the target picture file to obtain the first-direction and second-direction coordinates of each character in the target picture file, together with the character's width in the first direction and its width in the second direction. The characters are then traversed in ascending order of first-direction coordinate to judge whether the j-th character and the i-th character have the same coordinate in the second direction. If they differ, the degree of overlap of the j-th character and the i-th character in the second direction is determined and compared with the threshold; if it exceeds the threshold, the second-direction coordinate of the j-th character or the i-th character is corrected. The character information is then matched against the preset lexicon to obtain the target character set; the table style is determined from the semantics of the characters in the character information; the target position information is determined from the table style; and the header characters are obtained according to the target position information and the position information of the target character set. Then, according to the position information and semantics of the header characters, the position information and semantics of the corresponding content characters are chosen from the character information, and the table contained in the target picture file is generated from the position information and semantics of the header characters and of the content characters. The table contained in the picture is thus recognized quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, effectively improves the user experience, and makes the user's subsequent use of the data easier.
In an exemplary embodiment, a device for recognizing table content in a picture file is also provided.
Fig. 6 is a schematic structural diagram of a device for recognizing table content in a picture file according to an exemplary embodiment of the present invention.
Referring to Fig. 6, the device for recognizing table content in a picture file of the present invention includes: a first acquisition module 110, a processing module 120, a matching module 130, and a determining module 140.
The first acquisition module 110 is used to obtain the target picture file to be recognized;
the processing module 120 is used to perform character recognition on the target picture file to obtain the character information in the target picture file;
the matching module 130 is used to match the recognized character information against the preset lexicon to obtain header characters whose degree of match with the preset lexicon exceeds the first threshold;
the determining module 140 is used to determine the table content contained in the target picture file according to the character information corresponding to the header characters.
It should be noted that the foregoing explanation of the method embodiments for recognizing table content in a picture file also applies to the device of this embodiment; the implementation principle is similar and is not repeated here.
The device for recognizing table content in a picture file provided by this embodiment of the present invention obtains the target picture file to be recognized, performs character recognition on it to obtain the character information in the target picture file, and matches the recognized character information against the preset lexicon to obtain header characters whose degree of match with the preset lexicon exceeds the first threshold. It then determines the table content contained in the target picture file according to the character information corresponding to the header characters. The table contained in the picture is thus recognized quickly and accurately, which improves recognition accuracy, reduces the time spent on recognition, and effectively improves the user experience.
In an exemplary embodiment, a computer device is also provided.
Fig. 7 is a schematic structural diagram of a computer device according to an exemplary embodiment. The computer device shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
Referring to Fig. 7, the computer device includes a memory 210 and a processor 220. The memory 210 stores a computer program which, when executed by the processor 220, causes the processor 220 to perform the following steps: obtaining the target picture file to be recognized; performing character recognition on the target picture file to obtain the character information in the target picture file, where the character information includes character shape, semantics, and character position information; matching the recognized character information against the preset lexicon to obtain header characters whose degree of match with the preset lexicon exceeds the first threshold; and determining the table content contained in the target picture file according to the character information corresponding to the header characters.
In one embodiment, the character information includes character semantics and character position information, and obtaining the header characters whose degree of match with the preset lexicon exceeds the first threshold includes: matching the recognized character information against the preset lexicon to obtain a target character set whose degree of match with the preset lexicon exceeds the first threshold; determining the table style corresponding to the character information according to the semantics of the characters in the character information; determining the target position information according to the table style; and obtaining the header characters from the target character set according to the target position information and the position information of the target character set.
In one embodiment, before determining the table content contained in the target picture file, the steps further include: using the preset lexicon to normalize the header characters and the content characters and to merge words.
In one embodiment, determining the table content included in the target picture file according to the character information corresponding to the header characters includes: selecting from the character information, according to the position information and semantics of the header characters, the position information and semantics of the content characters corresponding to the header characters; and generating the table included in the target picture file according to the position information and semantics of the header characters and the position information and semantics of the content characters.
In one embodiment, the character position information includes a first-direction coordinate and a second-direction coordinate of a character. Selecting from the character information the position information and semantics of the content characters corresponding to the header characters includes: determining, according to the first-direction coordinate or the second-direction coordinate of any header character, the first-direction coordinate range or the second-direction coordinate range of the target content characters corresponding to that header character; selecting from the character information a preliminary character set whose position information falls within the first-direction coordinate range or the second-direction coordinate range; and selecting from the preliminary character set, according to the semantics of that header character, the characters that semantically match the header character as the content characters corresponding to it.
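The two-stage selection described above (coordinate range first, semantic match second) can be sketched as follows. The range half-width and the semantic test are hypothetical: the sketch assumes that a "Result" header expects numeric content, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class Char:
    text: str
    x: float  # first-direction coordinate
    y: float  # second-direction coordinate


def looks_numeric(text: str) -> bool:
    """Assumed semantic test for a numeric column."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def select_content(header, chars, semantic_ok, half_range=30.0):
    """Preliminary selection by coordinate range, then filtering by semantics."""
    lo, hi = header.x - half_range, header.x + half_range
    preliminary = [c for c in chars if c is not header and lo <= c.x <= hi]
    return [c for c in preliminary if semantic_ok(c.text)]


header = Char("Result", 120, 20)
chars = [
    header,
    Char("5.2", 118, 60),
    Char("Glucose", 12, 60),   # outside the coordinate range
    Char("7.1", 122, 100),
    Char("page", 125, 400),    # inside the range but semantically unrelated
]
values = select_content(header, chars, looks_numeric)
print([c.text for c in values])
```

Filtering by position before semantics keeps the semantic check cheap, since it only runs on the preliminary set.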
In one embodiment, the character information includes character position information, and the character position information includes a first-direction coordinate of a character, a second-direction coordinate of the character, and the width of the character in the second direction. After obtaining the character information in the target picture file, the steps further include: traversing the character information in ascending order of first-direction coordinate, and judging whether the j-th character and the i-th character have the same coordinate in the second direction, wherein the difference between the first-direction coordinates of each pair of adjacent characters between the j-th character and the i-th character is within a preset range, i and j are positive integers, and j is greater than i; if the j-th character and the i-th character have different coordinates in the second direction, determining the degree of overlap of the j-th character and the i-th character in the second direction according to the second-direction width of the j-th character and the second-direction width of the i-th character; judging whether the degree of overlap of the j-th character and the i-th character in the second direction is greater than a third threshold; and if so, correcting the second-direction coordinate of the j-th character and/or the i-th character.
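The overlap test for a single pair of characters can be sketched as below. The text does not give a formula for the degree of overlap, so the sketch assumes the length of the intersection of the two second-direction extents divided by the smaller width, and it assumes that correction means moving both characters to the mean coordinate. The threshold value is hypothetical.

```python
def overlap_degree(y_i, h_i, y_j, h_j):
    """Overlap of [y_i, y_i + h_i] and [y_j, y_j + h_j] relative to the smaller width."""
    intersection = min(y_i + h_i, y_j + h_j) - max(y_i, y_j)
    if intersection <= 0:
        return 0.0
    return intersection / min(h_i, h_j)


def align_pair(y_i, h_i, y_j, h_j, third_threshold=0.6):
    """Give both characters a common coordinate when they overlap strongly enough."""
    if y_i != y_j and overlap_degree(y_i, h_i, y_j, h_j) > third_threshold:
        y = (y_i + y_j) / 2  # assumed correction: the mean of the two coordinates
        return y, y
    return y_i, y_j


print(align_pair(100, 20, 104, 20))  # slightly skewed characters on the same line
print(align_pair(100, 20, 140, 20))  # characters on different lines
```

In a full traversal this check would be applied to each pair (i, j) reached while walking the characters in ascending first-direction order.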
In one embodiment, before correcting the second-direction coordinate of the j-th character and/or the i-th character, the steps further include: determining a target second-direction coordinate range according to the second-direction coordinate of the i-th character and the second-direction coordinate of the j-th character; selecting k characters whose second-direction coordinates fall within the target second-direction coordinate range; and correcting the second-direction coordinate of the j-th character and/or the i-th character according to the second-direction coordinates of the k characters.
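One possible reading of this correction, offered only as a sketch: the target range is taken to span the two characters' second-direction coordinates, and the corrected value is taken to be the median coordinate of the k characters found in that range. The text does not say how the k coordinates are combined, so the use of the median and the `margin` parameter are assumptions.

```python
from statistics import median


def corrected_coordinate(y_i, y_j, all_ys, margin=0.0):
    """Median second-direction coordinate of the characters lying in the target range."""
    lo, hi = min(y_i, y_j) - margin, max(y_i, y_j) + margin
    in_range = [y for y in all_ys if lo <= y <= hi]  # the k characters
    if not in_range:
        return (y_i + y_j) / 2
    return median(in_range)


# Hypothetical second-direction coordinates of all recognized characters.
all_ys = [100, 101, 101, 104, 150]
print(corrected_coordinate(100, 104, all_ys))
```

A median is less sensitive than a mean to one badly skewed character on the line, which is the reason it is used in this sketch.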
In one embodiment, the character position information further includes the width in the first direction. After traversing the character information in ascending order of first-direction coordinate, the steps further include: determining a target first-direction coordinate range according to the first-direction coordinate of the j-th character and the first-direction coordinate of the i-th character; selecting m characters whose first-direction coordinates fall within the target first-direction coordinate range; and correcting the first-direction coordinate of the j-th character and/or the i-th character according to the first-direction coordinates of the m characters.
In one embodiment, the character information includes character semantics. Before matching the recognized character information against the preset dictionary, the steps further include: determining a target dictionary according to the character semantics. Matching the recognized character information against the preset dictionary then includes: matching the recognized character information against the target dictionary.
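A minimal sketch of target-dictionary selection, assuming that several domain dictionaries exist and that the one sharing the most entries with the recognized text is chosen. The domain names, their entries and the hit-count criterion are hypothetical; the text only states that the target dictionary is determined from the character semantics.

```python
def choose_target_dictionary(recognized, dictionaries):
    """Return the name of the dictionary with the most entries found in the text."""
    def hits(entries):
        return sum(1 for text in recognized if text in entries)

    return max(dictionaries, key=lambda name: hits(dictionaries[name]))


dictionaries = {
    "medical": {"Item", "Result", "Unit", "Reference range"},
    "invoice": {"Item", "Quantity", "Price", "Amount"},
}
recognized = ["Item", "Result", "Unit", "5.2"]
print(choose_target_dictionary(recognized, dictionaries))
```

Narrowing the match to one domain dictionary reduces the chance that a word from an unrelated domain is mistaken for a header.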
In an optional implementation, as shown in Fig. 8, the computer equipment 200 may further include a memory 210, a processor 220, and a bus 230 connecting the different components (including the memory 210 and the processor 220). The memory 210 stores a computer program, and when the processor 220 executes the program, the method for identifying table content in a picture file described in the embodiments of the present invention is implemented.
Bus 230 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer equipment 200 typically includes a variety of computer-readable media. These media may be any available media that can be accessed by computer equipment 200, including volatile and non-volatile media and removable and non-removable media.
Memory 210 may also include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 240 and/or cache memory 250. Computer equipment 200 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 260 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 8, commonly called a "hard disk drive"). Although not shown in Fig. 8, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive may be connected to the bus 230 through one or more data media interfaces. Memory 210 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 280 having a set of (at least one) program modules 270 may be stored, for example, in memory 210. Such program modules 270 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 270 generally perform the functions and/or methods of the embodiments described in the present invention.
Computer equipment 200 may also communicate with one or more external devices 290 (such as a keyboard, a pointing device, a display 291, etc.), with one or more devices that enable a user to interact with the computer equipment 200, and/or with any device (such as a network card, a modem, etc.) that enables the computer equipment 200 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 292. In addition, computer equipment 200 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 293. As shown, the network adapter 293 communicates with the other modules of computer equipment 200 through the bus 230. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer equipment 200, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
It should be noted that the foregoing explanation of the method embodiments for identifying table content in a picture file also applies to the computer equipment of this embodiment; the implementation principle is similar and is not repeated here.
The computer equipment provided in the embodiments of the present invention obtains a target picture file to be identified, performs character recognition on the target picture file to obtain the character information in it, matches the recognized character information against a preset dictionary to obtain header characters whose degree of matching with the preset dictionary exceeds a first threshold, and then determines the table content included in the target picture file according to the character information corresponding to the header characters. The table included in the picture is thereby identified quickly and accurately, which not only improves recognition accuracy but also reduces the time spent on the recognition operation, effectively improving the user experience.
In an exemplary embodiment, the present invention also provides a computer-readable storage medium.
The computer-readable storage medium stores a computer program which, when executed by a processor, implements the method for identifying table content in a picture file.
In the description of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality of" means two or more, unless otherwise specifically defined.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms are not necessarily directed to the same embodiment or example. Moreover, the specific features or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of the different embodiments or examples.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or, if necessary, otherwise suitably processing it, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be understood as limiting the present invention; those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the present invention.
Claims (12)
1. A method for identifying table content in a picture file, characterized by comprising:
obtaining a target picture file to be identified;
performing character recognition on the target picture file to obtain the character information in the target picture file;
matching the recognized character information against a preset dictionary to obtain header characters whose degree of matching with the preset dictionary exceeds a first threshold;
determining, according to the character information corresponding to the header characters, the table content included in the target picture file.
2. The method according to claim 1, characterized in that the character information includes character semantics and character position information;
obtaining the header characters whose degree of matching with the preset dictionary exceeds the first threshold comprises:
matching the recognized character information against the preset dictionary to obtain a target character set whose degree of matching with the preset dictionary exceeds the first threshold;
determining, according to the character semantics in the character information, the table style corresponding to the character information;
determining target position information according to the table style;
obtaining the header characters from the target character set according to the target position information and the position information corresponding to the target character set.
3. The method according to claim 2, characterized in that, before determining the table content included in the target picture file, the method further comprises:
normalizing and word-merging the header characters and the content characters using the preset dictionary.
4. The method according to claim 2, characterized in that determining the table content included in the target picture file according to the character information corresponding to the header characters comprises:
selecting from the character information, according to the position information and semantics of the header characters, the position information and semantics of the content characters corresponding to the header characters;
generating the table included in the target picture file according to the position information and semantics of the header characters and the position information and semantics of the content characters.
5. The method according to claim 4, characterized in that the character position information includes a first-direction coordinate and a second-direction coordinate of a character;
selecting from the character information the position information and semantics of the content characters corresponding to the header characters comprises:
determining, according to the first-direction coordinate or the second-direction coordinate of any header character, the first-direction coordinate range or the second-direction coordinate range of the target content characters corresponding to that header character;
selecting from the character information a preliminary character set whose position information falls within the first-direction coordinate range or the second-direction coordinate range;
selecting from the preliminary character set, according to the semantics of that header character, the characters that semantically match the header character as the content characters corresponding to it.
6. The method according to claim 1, characterized in that the character information includes character position information, wherein the character position information includes a first-direction coordinate of a character, a second-direction coordinate of the character, and the width of the character in the second direction;
after obtaining the character information in the target picture file, the method further comprises:
traversing the character information in ascending order of first-direction coordinate, and judging whether the j-th character and the i-th character have the same coordinate in the second direction, wherein the difference between the first-direction coordinates of each pair of adjacent characters between the j-th character and the i-th character is within a preset range, i and j are positive integers, and j is greater than i;
if the j-th character and the i-th character have different coordinates in the second direction, determining the degree of overlap of the j-th character and the i-th character in the second direction according to the second-direction width of the j-th character and the second-direction width of the i-th character;
judging whether the degree of overlap of the j-th character and the i-th character in the second direction is greater than a third threshold;
if so, correcting the second-direction coordinate of the j-th character and/or the i-th character.
7. The method according to claim 6, characterized in that, before correcting the second-direction coordinate of the j-th character and/or the i-th character, the method further comprises:
determining a target second-direction coordinate range according to the second-direction coordinate of the i-th character and the second-direction coordinate of the j-th character;
selecting k characters whose second-direction coordinates fall within the target second-direction coordinate range;
correcting the second-direction coordinate of the j-th character and/or the i-th character according to the second-direction coordinates of the k characters.
8. The method according to claim 6, characterized in that the character position information further includes the width in the first direction;
after traversing the character information in ascending order of first-direction coordinate, the method further comprises:
determining a target first-direction coordinate range according to the first-direction coordinate of the j-th character and the first-direction coordinate of the i-th character;
selecting m characters whose first-direction coordinates fall within the target first-direction coordinate range;
correcting the first-direction coordinate of the j-th character and/or the i-th character according to the first-direction coordinates of the m characters.
9. The method according to any one of claims 1-8, characterized in that the character information includes character semantics;
before matching the recognized character information against the preset dictionary, the method further comprises:
determining a target dictionary according to the character semantics;
matching the recognized character information against the preset dictionary comprises:
matching the recognized character information against the target dictionary.
10. A device for identifying table content in a picture file, characterized by comprising:
a first acquisition module, configured to obtain a target picture file to be identified;
a processing module, configured to perform character recognition on the target picture file to obtain the character information in the target picture file;
a matching module, configured to match the recognized character information against a preset dictionary to obtain header characters whose degree of matching with the preset dictionary exceeds a first threshold;
a determining module, configured to determine, according to the character information corresponding to the header characters, the table content included in the target picture file.
11. Computer equipment, characterized by comprising a memory and a processor, wherein the memory stores a computer program and, when the processor executes the program, the method for identifying table content in a picture file according to any one of claims 1-9 is implemented.
12. A computer-readable storage medium storing a computer program, characterized in that, when the program is executed by a processor, the method for identifying table content in a picture file according to any one of claims 1-9 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810285135.5A CN108734089B (en) | 2018-04-02 | 2018-04-02 | Method, device, equipment and storage medium for identifying table content in picture file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108734089A (en) | 2018-11-02 |
CN108734089B (en) | 2023-04-18 |
Family
ID=63940603
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810285135.5A Active CN108734089B (en) | 2018-04-02 | 2018-04-02 | Method, device, equipment and storage medium for identifying table content in picture file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108734089B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184265A (en) * | 2015-09-14 | 2015-12-23 | 哈尔滨工业大学 | Self-learning-based handwritten form numeric character string rapid recognition method |
JP2016009223A (en) * | 2014-06-23 | 2016-01-18 | 株式会社日立情報通信エンジニアリング | Optica character recognition device and optical character recognition method |
US20160055376A1 (en) * | 2014-06-21 | 2016-02-25 | iQG DBA iQGATEWAY LLC | Method and system for identification and extraction of data from structured documents |
Non-Patent Citations (1)
Title |
---|
仲小挺: ""基于自学习的手写表格数字字符串快速识别方法的研究"" * |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020098078A1 (en) * | 2018-11-12 | 2020-05-22 | 平安科技(深圳)有限公司 | Method and apparatus for generating ocr training sample, device and readable storage medium |
CN109726643A (en) * | 2018-12-13 | 2019-05-07 | 北京金山数字娱乐科技有限公司 | The recognition methods of form data, device, electronic equipment and storage medium in image |
CN112818812A (en) * | 2018-12-13 | 2021-05-18 | 北京金山数字娱乐科技有限公司 | Method and device for identifying table information in image, electronic equipment and storage medium |
CN112818812B (en) * | 2018-12-13 | 2024-03-12 | 北京金山数字娱乐科技有限公司 | Identification method and device for table information in image, electronic equipment and storage medium |
CN109740135A (en) * | 2018-12-19 | 2019-05-10 | 平安普惠企业管理有限公司 | Chart generation method and device, electronic equipment and storage medium |
CN109871524A (en) * | 2019-02-21 | 2019-06-11 | 腾讯科技(深圳)有限公司 | A kind of chart generation method and device |
CN110059688A (en) * | 2019-03-19 | 2019-07-26 | 平安科技(深圳)有限公司 | Pictorial information recognition methods, device, computer equipment and storage medium |
CN110059688B (en) * | 2019-03-19 | 2024-05-28 | 平安科技(深圳)有限公司 | Picture information identification method, device, computer equipment and storage medium |
WO2020232866A1 (en) * | 2019-05-20 | 2020-11-26 | 平安科技(深圳)有限公司 | Scanned text segmentation method and apparatus, computer device and storage medium |
CN110147774A (en) * | 2019-05-23 | 2019-08-20 | 阳光保险集团股份有限公司 | Sheet format picture printed page analysis method and computer storage medium |
CN110147774B (en) * | 2019-05-23 | 2021-06-15 | 阳光保险集团股份有限公司 | Table format picture layout analysis method and computer storage medium |
CN110287854B (en) * | 2019-06-20 | 2022-06-10 | 北京百度网讯科技有限公司 | Table extraction method and device, computer equipment and storage medium |
CN112115111A (en) * | 2019-06-20 | 2020-12-22 | 上海怀若智能科技有限公司 | OCR-based document version management method and system |
CN110287854A (en) * | 2019-06-20 | 2019-09-27 | 北京百度网讯科技有限公司 | Table extraction method and device, computer equipment and storage medium |
CN112395830A (en) * | 2019-07-31 | 2021-02-23 | 腾讯科技(深圳)有限公司 | Form processing method based on Unicode and related device |
WO2021042507A1 (en) * | 2019-09-02 | 2021-03-11 | 苏州朗动网络科技有限公司 | Method and device for extracting table data from pdf file, and storage medium |
WO2021072885A1 (en) * | 2019-10-18 | 2021-04-22 | 平安科技(深圳)有限公司 | Method and apparatus for recognizing text, device and storage medium |
WO2021147222A1 (en) * | 2020-01-22 | 2021-07-29 | 平安科技(深圳)有限公司 | Ocr-based table layout restoration method and device, electronic apparatus, and storage medium |
CN113449559A (en) * | 2020-03-26 | 2021-09-28 | 顺丰科技有限公司 | Table identification method and device, computer equipment and storage medium |
CN111507230A (en) * | 2020-04-11 | 2020-08-07 | 创景未来(北京)科技有限公司 | Method and system for identifying and extracting document and table data |
CN111898528A (en) * | 2020-07-29 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer readable medium and electronic equipment |
CN111898528B (en) * | 2020-07-29 | 2023-11-10 | 腾讯科技(深圳)有限公司 | Data processing method, device, computer readable medium and electronic equipment |
CN111683285A (en) * | 2020-08-11 | 2020-09-18 | 腾讯科技(深圳)有限公司 | File content identification method and device, computer equipment and storage medium |
CN112507909A (en) * | 2020-12-15 | 2021-03-16 | 信号旗智能科技(上海)有限公司 | Document data extraction method, device, equipment and medium based on OCR recognition |
CN112509661A (en) * | 2021-02-03 | 2021-03-16 | 南京吉拉福网络科技有限公司 | Methods, computing devices, and media for identifying physical examination reports |
CN112509661B (en) * | 2021-02-03 | 2021-05-25 | 南京吉拉福网络科技有限公司 | Methods, computing devices, and media for identifying physical examination reports |
CN113504863A (en) * | 2021-06-02 | 2021-10-15 | 珠海金山办公软件有限公司 | Method and device for realizing picture screening, computer storage medium and terminal |
CN113723301A (en) * | 2021-08-31 | 2021-11-30 | 广州新丝路信息科技有限公司 | OCR recognition and branch processing method and device for import goods customs declaration |
CN113723301B (en) * | 2021-08-31 | 2024-08-30 | 广州新丝路信息科技有限公司 | OCR recognition and branch processing method and device for import goods customs declaration |
CN115019320A (en) * | 2022-06-30 | 2022-09-06 | 京东方科技集团股份有限公司 | Data extraction method, device, equipment and storage medium |
CN115019320B (en) * | 2022-06-30 | 2024-10-18 | 京东方科技集团股份有限公司 | Data extraction method, device, equipment and storage medium |
CN116156212A (en) * | 2023-02-21 | 2023-05-23 | 广州虎牙科技有限公司 | Live video processing method and system |
CN116127928A (en) * | 2023-04-17 | 2023-05-16 | 广东粤港澳大湾区国家纳米科技创新研究院 | Table data identification method and device, storage medium and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108734089B (en) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108734089A (en) | Identify method, apparatus, equipment and the storage medium of table content in picture file | |
US10482174B1 (en) | Systems and methods for identifying form fields | |
CN112185520B (en) | Text structuring processing system and method for medical pathology report picture | |
US10489644B2 (en) | System and method for automatic detection and verification of optical character recognition data | |
US11816138B2 (en) | Systems and methods for parsing log files using classification and a plurality of neural networks | |
US10489645B2 (en) | System and method for automatic detection and verification of optical character recognition data | |
WO2019075820A1 (en) | Test paper reviewing system | |
US9081412B2 (en) | System and method for using paper as an interface to computer applications | |
US9286526B1 (en) | Cohort-based learning from user edits | |
WO2023279045A1 (en) | Ai-augmented auditing platform including techniques for automated document processing | |
CN112749547A (en) | Generation of text classifier training data | |
CN109783796A (en) | Predicting style breaches within textual content | |
US11315353B1 (en) | Systems and methods for spatial-aware information extraction from electronic source documents | |
CN111090641A (en) | Data processing method and device, electronic equipment and storage medium | |
CN112509661B (en) | Methods, computing devices, and media for identifying physical examination reports | |
CN108170468A (en) | Method and system for automatically detecting consistency between comments and code | |
CN108597565A (en) | Collaborative verification method for clinical cohort data based on OCR and named entity extraction technology | |
CN106529381A (en) | Information processing apparatus and information processing method | |
JP2019212115A (en) | Inspection device, inspection method, program, and learning device | |
CN115937887A (en) | Method and device for extracting document structured information, electronic equipment and storage medium | |
RU2702967C1 (en) | Method and system for checking an electronic set of documents | |
CN111008624A (en) | Optical character recognition method and method for generating training sample for optical character recognition | |
CN112308048B (en) | Medical record integrity judging method, device and system based on small quantity of marked data | |
EP3640861A1 (en) | Systems and methods for parsing log files using classification and a plurality of neural networks | |
US20220392243A1 (en) | Method for training text classification model, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |