CN113191345A - Text line direction determining method and related equipment thereof - Google Patents
- Publication number
- CN113191345A CN113191345A CN202110468072.9A CN202110468072A CN113191345A CN 113191345 A CN113191345 A CN 113191345A CN 202110468072 A CN202110468072 A CN 202110468072A CN 113191345 A CN113191345 A CN 113191345A
- Authority
- CN
- China
- Prior art keywords
- text line
- processed
- area
- center
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/242—Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
Abstract
After an image to be processed is acquired, a global area of a text line to be processed and a local area of that text line are determined in the image to be processed. The global area represents the area occupied by the text line to be processed in the image; the local area represents the area occupied by a preset part of the text line. The position of the global area is then compared with the position of the local area to obtain the text line direction of the text line to be processed. In this way the text line direction of a text line in an image can be determined accurately, which in turn helps to improve the accuracy of character recognition for text lines in the image.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method for determining a text line direction and a related device.
Background
Character recognition (e.g., optical character recognition) refers to performing recognition processing on characters in an image to obtain textual information carried in the image.
In fact, a text line in an image can be presented in any direction (for example, rotated 90° clockwise). To improve the accuracy of character recognition, the text line direction can therefore be taken into account when recognizing characters in a text line (in particular a text line with a non-zero rotation angle). Here, the text line direction refers to the rotation information of a text line in the image.
However, how to determine the text line direction accurately remains a pressing technical problem to be solved.
Disclosure of Invention
To solve the above technical problem in the prior art, the present application provides a text line direction determining method and related devices, which can accurately determine the text line direction of a text line in an image and thereby help to improve the accuracy of character recognition for text lines in the image.
In order to achieve the above purpose, the technical solutions provided in the embodiments of the present application are as follows:
An embodiment of the present application provides a text line direction determining method, comprising: acquiring an image to be processed; determining, in the image to be processed, a global area of a text line to be processed and a local area of the text line to be processed, where the global area represents the area occupied by the text line to be processed in the image and the local area represents the area occupied by a preset part of the text line to be processed; and comparing the position of the global area with the position of the local area to obtain the text line direction of the text line to be processed.
An embodiment of the present application further provides a device for determining a direction of a text line, where the device includes:
the image acquisition unit is used for acquiring an image to be processed;
the region determining unit is used for determining a global region of a text line to be processed and a local region of the text line to be processed in the image to be processed; the global area of the text line to be processed represents the area occupied by the text line to be processed in the image to be processed; the local area of the text line to be processed represents the area occupied by the preset part of the text line to be processed in the image to be processed;
and the direction determining unit is used for comparing the positions of the global area of the text line to be processed and the local area of the text line to be processed to obtain the text line direction of the text line to be processed.
An embodiment of the present application further provides an apparatus, where the apparatus includes a processor and a memory:
the memory is used for storing a computer program;
the processor is configured to execute any implementation manner of the text line direction determination method provided by the embodiment of the application according to the computer program.
An embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any implementation manner of the text line direction determining method provided in the embodiment of the present application.
The embodiment of the present application further provides a computer program product, and when the computer program product runs on a terminal device, the terminal device is enabled to execute any implementation manner of the text line direction determination method provided by the embodiment of the present application.
Compared with the prior art, the embodiment of the application has at least the following advantages:
In the text line direction determining method and the related devices provided by the embodiments of the present application, after an image to be processed is acquired, a global area and a local area of a text line to be processed are determined in the image, where the global area represents the area occupied by the text line to be processed and the local area represents the area occupied by a preset part of that text line; the position of the global area is then compared with the position of the local area to obtain the text line direction of the text line to be processed.
Because the global area and the local area respectively represent the area occupied by the whole text line and the area occupied by its preset part, together they fully characterize how the text line is presented in the image to be processed. A text line direction determined from these two areas therefore describes that presentation accurately, so the text line direction of a text line in an image can be determined accurately, which in turn helps to improve the accuracy of character recognition for text lines in the image.
Drawings
To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in their description are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings can be derived from them by a person skilled in the art without creative effort.
Fig. 1 is a schematic diagram of an image, a global area corresponding to the image, and a local area corresponding to the image provided in an embodiment of the present application;
fig. 2 is a schematic diagram of an original image, a global mask image and a local mask image corresponding to the original image according to an embodiment of the present disclosure;
fig. 3 is a flowchart of a text line direction determining method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a directed connection provided in an embodiment of the present application;
FIG. 5 is a characteristic diagram of a rotation of a transverse text line according to an embodiment of the present application;
FIG. 6 is a characteristic diagram of a rotation manner of a vertical text line provided by an embodiment of the present application;
FIG. 7 is a characteristic diagram of another rotation manner of a horizontal text line according to an embodiment of the present application;
FIG. 8 is a characteristic diagram of another rotation of a vertical text line provided by an embodiment of the present application;
fig. 9 is a schematic structural diagram of a text line direction determining apparatus according to an embodiment of the present application.
Detailed Description
In studying text line directions, the inventors found that when a text line is presented in different directions in an image, the area occupied by the whole text line in the image (the global area shown in fig. 1) and the area occupied by a preset part of the text line (the first character area or the character string local area shown in fig. 1) exhibit different relative positions (as shown in fig. 2).
Note that, in fig. 2, "text line-1" is a horizontal text line presented in the positive direction (i.e., a rotation angle of 0°); "text line-2" is a horizontal text line presented rotated counterclockwise by a first angle between 0° and 90°; "text line-3" is a horizontal text line presented rotated 180° clockwise or 180° counterclockwise; "text line-4" is a horizontal text line presented rotated 90° clockwise; "text line-5" is a horizontal text line presented rotated 90° counterclockwise; "text line-6" is a vertical text line presented in the positive direction (i.e., a rotation angle of 0°); "text line-7" is a vertical text line presented rotated 180° clockwise or 180° counterclockwise. Here, the characters of a horizontal text line are arranged horizontally, and the characters of a vertical text line are arranged vertically.
Based on the above findings, an embodiment of the present application provides a text line direction determining method, which may include: acquiring an image to be processed; determining, in the image to be processed, a global area of a text line to be processed and a local area of the text line to be processed; and comparing the position of the global area with the position of the local area to obtain the text line direction of the text line to be processed. Because the global area and the local area respectively represent the area occupied by the whole text line and the area occupied by its preset part, together they fully characterize how the text line is presented in the image to be processed; the direction determined from them therefore describes that presentation accurately, so the text line direction of a text line in the image can be determined accurately and the accuracy of character recognition for text lines in the image can be improved.
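The core comparison can be illustrated with a minimal sketch (an illustration only, not the claimed implementation; the box format and the `text_line_direction` helper are assumptions): the offset of the local region's center from the global region's center reveals both the arrangement axis of the line and which end its first character occupies.

```python
def text_line_direction(global_box, first_char_box):
    """Infer a coarse text line direction by comparing the center of the
    first-character (local) region with the center of the global region.
    Boxes are (x_min, y_min, x_max, y_max) in image coordinates
    (y grows downward, as in most image formats)."""
    gx = (global_box[0] + global_box[2]) / 2.0
    gy = (global_box[1] + global_box[3]) / 2.0
    lx = (first_char_box[0] + first_char_box[2]) / 2.0
    ly = (first_char_box[1] + first_char_box[3]) / 2.0
    dx, dy = lx - gx, ly - gy
    # The dominant offset axis tells us how the line is laid out;
    # its sign tells us at which end the first character sits.
    if abs(dx) >= abs(dy):
        return "left-to-right" if dx < 0 else "right-to-left"
    return "top-to-bottom" if dy < 0 else "bottom-to-top"
```

For example, a 100x20 global box whose first-character box hugs the left edge yields "left-to-right", matching "text line-1" in fig. 2, while a first character at the right edge (the line rotated 180°) yields "right-to-left", matching "text line-3".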
In addition, the embodiment of the present application does not limit the execution subject of the text line direction determination method, and for example, the text line direction determination method provided by the embodiment of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, or the like. The server may be a stand-alone server, a cluster server, or a cloud server.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Method embodiment
Referring to fig. 3, this figure is a flowchart of a text line direction determining method provided in the embodiment of the present application.
The text line direction determining method provided by the embodiment of the application comprises the following steps of S1-S3:
s1: and acquiring an image to be processed.
The image to be processed refers to an image which needs to be subjected to text line direction determination processing. The text line direction is used for describing the presentation mode of one text line in the image; moreover, the text line direction is not limited in the embodiments of the present application, for example, the text line direction may include a text arrangement manner and/or a text line rotation manner.
The character arrangement mode describes how all the characters in a text line are arranged; the embodiment of the present application does not limit it, and it may, for example, be horizontal (as shown in "text line-1" in fig. 2) or vertical (as shown in "text line-6" in fig. 2). Horizontal arrangement means that all characters in a text line are laid out along the horizontal direction; vertical arrangement means that they are laid out along the vertical direction.
The text line rotation mode is used for describing rotation information of one text line in the image; moreover, the text line rotation manner is not limited in the embodiments of the present application, and for example, the text line rotation manner may include a rotation direction and/or a rotation angle.
In addition, the embodiment of the present application does not limit the text arrangement manner of the text line in the image to be processed, for example, for the nth text line in the image to be processed, the text arrangement manner of the nth text line may be horizontal, so that the nth text line belongs to the horizontal text line; or, the text arrangement mode of the nth text line may be vertical, so that the nth text line belongs to the vertical text line. In addition, the text line rotation mode of the text line in the image to be processed is not limited in the embodiment of the present application, for example, for the nth text line in the image to be processed, the text line rotation mode of the nth text line may be a positive direction (that is, rotation by 0 °); or the text line of the nth text line can be rotated by a second angle in a clockwise manner, and the second angle belongs to (0 °, 180 °); or the text line rotation mode of the nth text line can be that the third angle is rotated counterclockwise, and the third angle belongs to (0 degrees, 180 degrees); alternatively, the text line of the nth text line may be rotated 180 ° (i.e., 180 ° in a counterclockwise or clockwise manner). Wherein N is a positive integer, N is less than or equal to N, and N represents the number of text lines in the image to be processed.
S2: determining a global area of a text line to be processed in an image to be processed and a local area of the text line to be processed.
The text lines to be processed are used for representing any text line in the image to be processed. For example, if there are N text lines in the image to be processed, the 1 st text line, the 2 nd text line, … …, and the nth text line in the image to be processed may be determined as the text lines to be processed, respectively.
The global area of the text line to be processed represents the area occupied by the text line to be processed in the image to be processed.
The local area of the text line to be processed represents the area occupied by the preset part of the text line to be processed in the image to be processed.
The preset part can be preset according to an application scene. For example, the predetermined portion may include a first character portion (e.g., a portion corresponding to the mask region of each text line in the third mask map in fig. 2). For another example, the preset portion of the horizontal text line may include a bottom boundary portion of the horizontal text line (e.g., a portion corresponding to a mask region of "text line-1" in the second mask diagram in fig. 2); also, the preset portion of the vertical text line may include a central portion of the vertical text line (e.g., a portion corresponding to the mask region of "text line-6" in the second mask diagram in fig. 2).
Based on the relevant content of the above-mentioned "preset portion", the local area may include a first character area and/or a character string local area. The first character area represents the area within the image occupied by the first character of a text line; the character string local area represents the area within the image occupied by the preset local region of each character in a text line.
In addition, the preset local area can be preset, and especially can be set according to the character arrangement mode of one text line. For example, if the text arrangement manner of a text line is horizontal, the preset local area of each character in the text line may be the bottom boundary area of each character; if the text arrangement mode of a text line is vertical, the preset local area of each character in the text line may be the central area of each character (especially, the central area can vertically penetrate through the character).
Based on the above-mentioned related content of the "local area", the local area of the text line to be processed may include the first character area of the text line to be processed and/or the character string local area of the text line to be processed. The first character area of the text line to be processed refers to the area of the first character in the text line to be processed in the image to be processed. The local region of the character string of the text line to be processed is obtained by connecting preset local regions of each character in the text line to be processed, and may specifically include: if the character arrangement mode of the text line to be processed is horizontal, the local area of the character string of the text line to be processed is obtained by connecting the bottom boundary areas of all characters in the text line to be processed; if the text arrangement mode of the text line to be processed is vertical, the local area of the character string of the text line to be processed is obtained by connecting the central areas of all the characters in the text line to be processed.
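The construction of the character string local area described above can be sketched as follows (assumptions: axis-aligned per-character boxes and a `band_frac` thickness parameter, neither of which is specified in the application):

```python
def string_local_boxes(char_boxes, arrangement, band_frac=0.2):
    """Derive the preset local region of each character in a text line.
    char_boxes: list of (x_min, y_min, x_max, y_max) boxes;
    arrangement: 'horizontal' or 'vertical';
    band_frac: assumed fractional thickness of the strip/band."""
    local = []
    for x0, y0, x1, y1 in char_boxes:
        if arrangement == "horizontal":
            # bottom boundary strip of the character
            local.append((x0, y1 - band_frac * (y1 - y0), x1, y1))
        else:
            # vertical center band running through the character
            cx = (x0 + x1) / 2.0
            half = band_frac * (x1 - x0) / 2.0
            local.append((cx - half, y0, cx + half, y1))
    return local
```

Connecting the returned strips (bottom boundaries for a horizontal line, center bands for a vertical line) then yields the character string local area of the text line, in the manner described above.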
In addition, the embodiments of the present application are not limited to the implementation of S2, for example, in one possible implementation, S2 may specifically include S21 to S23:
s21: inputting the image to be processed into a pre-constructed mask image generation model to obtain a predicted text line global mask image corresponding to the image to be processed and a predicted text line local mask image corresponding to the image to be processed, which are output by the mask image generation model. The global mask image of the predicted text line corresponding to the image to be processed comprises a global mask area corresponding to the text line to be processed; the local mask image of the predicted text line corresponding to the image to be processed comprises a local mask region corresponding to the text line to be processed.
The mask map generation model is used for predicting a global mask map and a local mask map of a model input image. The "model input image" is an image (original image shown in fig. 2) input to the mask map generation model.
The global mask map is obtained by performing mask processing on the region occupied by the text line in the model input image, so that the global mask map can represent the region occupied by the text line in the model input image.
The local mask map is obtained by performing mask processing on the region occupied by the preset part of the text line in the model input image, so that the local mask map can represent the region occupied by the preset part of the text line in the model input image. For the content of the "predetermined portion", please refer to the above.
In addition, the embodiments of the present application do not limit the local mask map; for example, the local mask map may include a first character mask sub-map (e.g., the third mask map shown in fig. 2) and/or a character string local mask sub-map (e.g., the second mask map shown in fig. 2). The first character mask sub-map is obtained by masking the area occupied by the first character of each text line in the model input image, so it represents that area. The character string local mask sub-map is obtained by masking the area occupied by the character string local portion of each text line in the model input image, so it represents that area.
It should be noted that the partial portion of the character string may be set in advance, and particularly, may be set according to the character arrangement manner of one text line. For example, the character string partial portion of a horizontal line of text may be the bottom boundary portion of the horizontal line of text. As another example, the character string partial portion of the vertical text line may be a center region of the vertical text line (particularly, a center region that vertically intersects the vertical text line).
In addition, the embodiment of the present application is not limited to the model structure of the mask map generation model, and for example, the mask map generation model may be implemented by using the model structure of any deep learning model (e.g., a semantic segmentation model based on deep learning).
In addition, the mask map generation model may be constructed according to the sample image, the actual text line global mask map corresponding to the sample image, and the actual text line local mask map corresponding to the sample image. The sample image is an image required to be used for constructing a mask map generation model; the actual text line global mask image corresponding to the sample image is used for representing the actual occupied area of each text line in the sample image; the actual text line local mask map corresponding to the sample image is used for representing the actual occupied area of the preset part of each text line in the sample image.
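Label construction of the kind described above might look like the following sketch (the box annotation format and the `render_mask` helper are assumptions; the application does not prescribe any particular rasterization):

```python
import numpy as np

def render_mask(shape, boxes):
    """Rasterize annotated regions into a binary mask the same size as
    the sample image. shape is (H, W); boxes are
    (x_min, y_min, x_max, y_max) in pixel coordinates."""
    mask = np.zeros(shape, dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        mask[int(y0):int(y1), int(x0):int(x1)] = 1
    return mask

# The actual text line global mask map would be rendered from the full
# text-line boxes, and the actual text line local mask map from the
# preset-part boxes, e.g.:
#   global_mask = render_mask((H, W), text_line_boxes)
#   local_mask  = render_mask((H, W), preset_part_boxes)
```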
In addition, the embodiment of the present application does not limit the construction process of the mask map generation model, for example, in a possible implementation manner, the construction process of the mask map generation model may specifically include steps 11 to 15:
step 11: and acquiring a sample image, an actual text line global mask image corresponding to the sample image and an actual text line local mask image corresponding to the sample image.
Step 12: and inputting the sample image into a model to be trained to obtain a global mask image of a prediction text line corresponding to the sample image and a local mask image of the prediction text line corresponding to the sample image, which are output by the model to be trained.
The model to be trained refers to a model which needs to be trained when a mask map generation model is constructed; moreover, the model to be trained is not limited in the embodiments of the present application, and for example, the model to be trained may be a deep learning model.
The global mask map of the predicted text line corresponding to the sample image is used for representing the area of at least one text line in the sample image predicted in the sample image.
The local mask map of the prediction text line corresponding to the sample image is used for indicating the region of the preset part of at least one text line in the sample image predicted in the sample image.
Step 13: judging whether a preset stop condition is reached, if so, executing the step 15; if not, go to step 14.
Wherein the preset stop condition may be preset; moreover, the preset stop condition is not limited in the embodiment of the present application, for example, the preset stop condition may be that the loss value of the model to be trained is lower than a first threshold, the change rate of the loss value of the model to be trained is lower than a second threshold, or the number of times of updating the model to be trained reaches a third threshold. The first threshold, the second threshold and the third threshold are all preset.
The loss value of the model to be trained measures the prediction performance of the model to be trained; it can be determined from the difference between the prediction data (i.e., the predicted text line global mask map and the predicted text line local mask map corresponding to the sample image) and the label data (i.e., the actual text line global mask map and the actual text line local mask map corresponding to the sample image). The embodiment of the present application does not limit how the loss value is calculated; for example, it may be implemented using the Dice coefficient, the Dice loss, or Laplace smoothing.
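A minimal sketch of a Dice-style loss with Laplace smoothing, as mentioned above (the exact loss used by the application is not specified; `smooth=1.0` is an assumed constant):

```python
import numpy as np

def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss between a predicted probability mask and a binary
    label mask; `smooth` is the Laplace-smoothing constant that keeps
    the ratio well-defined when both masks are empty."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    inter = (pred * target).sum()
    dice = (2.0 * inter + smooth) / (pred.sum() + target.sum() + smooth)
    return 1.0 - dice
```

The loss is 0 for a perfect prediction and approaches 1 as the predicted and label masks become disjoint; for the two mask maps above, the global-mask and local-mask losses would simply be summed.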
Based on the relevant content in the step 13, it can be known that, for the model to be trained of the current round, it can be judged whether the model to be trained of the current round reaches the preset stop condition, and if the preset stop condition is reached, it indicates that the model to be trained of the current round has better prediction performance, so that a mask map generation model can be directly constructed according to the model to be trained of the current round, so that the mask map generation model also has better prediction performance; if the preset stopping condition is not met, the prediction performance of the model to be trained of the current round is still poor, so that the model to be trained of the current round can be updated according to the predicted text line global mask image, the actual text line global mask image, the predicted text line local mask image and the actual text line local mask image corresponding to the sample image, and the updated model to be trained has better prediction performance.
Step 14: and updating the model to be trained according to the predicted text line global mask image corresponding to the sample image, the actual text line global mask image corresponding to the sample image, the predicted text line local mask image corresponding to the sample image and the actual text line local mask image corresponding to the sample image, and returning to execute the step 12.
It should be noted that, the embodiment of the present application is not limited to the updating manner of the model to be trained, and may be implemented by any existing or future model updating manner (for example, model updating may be performed according to the loss value of the model to be trained).
Step 15: and constructing a mask map generation model according to the model to be trained.
In the embodiment of the present application, after it is determined that the model to be trained of the current round reaches the preset stop condition, a mask map generation model may be constructed from it (for example, the model to be trained of the current round is directly taken as the mask map generation model; or the model structure and model parameters of the mask map generation model are set to be identical to those of the model to be trained of the current round). The mask map generation model then has the same prediction performance as the model to be trained of the current round, which is relatively good.
Based on the relevant content of steps 11 to 15 above, after the sample image and its corresponding actual text line global mask map and actual text line local mask map are obtained, the sample image can be used as the model input data and the two actual mask maps as the model label data for constructing the mask map generation model. A mask map generation model built from this input and label data has good prediction performance and can subsequently be used to predict global mask maps and local mask maps.
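The training procedure of steps 11 to 15 can be sketched framework-agnostically as follows (all callables and thresholds here are placeholders, since the application fixes neither a deep learning framework nor concrete values for the first and third thresholds):

```python
def train_mask_model(model, sample_image, global_label, local_label,
                     loss_fn, update_fn, max_updates=1000, loss_eps=1e-3):
    """Iterate predict -> check stop condition -> update.
    model, loss_fn and update_fn stand in for whatever deep learning
    framework is actually used."""
    for step in range(max_updates):                       # third threshold
        pred_global, pred_local = model(sample_image)     # step 12
        loss = (loss_fn(pred_global, global_label) +
                loss_fn(pred_local, local_label))
        if loss < loss_eps:                               # step 13: first threshold
            break
        model = update_fn(model, loss)                    # step 14
    return model                                          # step 15: built model
```

A rate-of-change check on the loss (the second threshold) could be added to the stop condition in the same way; it is omitted here for brevity.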
The global mask map of the predicted text line corresponding to the image to be processed is obtained by performing global mask map prediction on the image to be processed by the constructed mask map generation model, so that the global mask map of the predicted text line is used for representing the area occupied by at least one text line in the image to be processed. As can be seen, the global mask map of the predicted text line corresponding to the image to be processed may include a global mask region corresponding to all or a part of the text line in the image to be processed. The global mask area is used for representing the area occupied by one text line in the image to be processed.
The local mask map of the prediction text line corresponding to the image to be processed is obtained by performing local mask map prediction on the image to be processed by the constructed mask map generation model, so that the local mask map of the prediction text line is used for representing the area occupied by the preset part of at least one text line in the image to be processed. As can be seen, the local mask map of the predicted text line corresponding to the image to be processed may include the local mask region corresponding to all or part of the text line in the image to be processed. The local mask area is used for representing the area occupied by the preset part of one text line in the image to be processed.
In addition, the embodiment of the present application does not limit the local mask map of the predicted text line corresponding to the image to be processed; for example, the local mask map of the predicted text line may include a first character mask subgraph corresponding to the image to be processed and/or a character string local mask subgraph corresponding to the image to be processed. The first character mask subgraph corresponding to the image to be processed is used for representing the area occupied by the first character of at least one text line in the image to be processed. The character string local mask subgraph corresponding to the image to be processed is used for representing the area occupied by the character string local part of at least one text line in the image to be processed.
Based on the relevant content of S21, after the image to be processed is acquired, the image to be processed may be input to a mask map generation model that is constructed in advance, so that the mask map generation model can predict a global mask map and a local mask map for the image to be processed, and obtain and output a predicted text line global mask map and a predicted text line local mask map corresponding to the image to be processed, so that the predicted text line global mask map and the predicted text line local mask map can accurately represent at least one text line in the image to be processed and a region occupied by a preset portion thereof in the image to be processed.
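The prediction step of S21 can be sketched as a single model call. The sketch below is purely illustrative: the patent does not fix any concrete interface, so `mask_model` and its output key names (`"global_mask"`, `"local_mask"`) are assumptions, standing in for any trained mask map generation model.

```python
def predict_masks(mask_model, image):
    """Run a (hypothetical) mask map generation model on an image to be
    processed, returning the predicted text line global mask map and the
    predicted text line local mask map. The callable interface and the
    output key names are illustrative assumptions, not part of the
    embodiment."""
    out = mask_model(image)
    return out["global_mask"], out["local_mask"]
```

A caller would pass any trained segmentation network wrapped to return such a dict.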
S22: and determining a global mask region corresponding to the text line to be processed in the global mask image of the predicted text line corresponding to the image to be processed as a global region of the text line to be processed.
In this embodiment of the application, after the predicted text line global mask map corresponding to the image to be processed is obtained, the global mask region corresponding to the text line to be processed recorded in the predicted text line global mask map may be determined as the global region of the text line to be processed, so that the text line direction may be subsequently determined based on the global region of the text line to be processed.
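As a minimal sketch of reading a global mask region out of the predicted map, assume (this is not specified by the embodiment) that the global mask map is a label image in which each text line's global mask region carries a distinct integer label:

```python
import numpy as np

def global_region(global_mask: np.ndarray, line_label: int) -> np.ndarray:
    """Return the (row, col) coordinates of all pixels belonging to the
    global mask region of the text line labeled `line_label`.
    Assumes `global_mask` is a 2-D integer label image."""
    return np.argwhere(global_mask == line_label)
```

In practice the labels could come from connected-component analysis of a binary predicted mask; the embodiment leaves this open.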
S23: and determining a local mask region corresponding to the text line to be processed in the local mask image of the predicted text line corresponding to the image to be processed as the local region of the text line to be processed.
The present embodiment does not limit the determination process of the local area of the text line to be processed (i.e., the implementation manner of S23), and for ease of understanding, the following description is made with reference to three examples.
Example 1, if the local mask map of the prediction text line corresponding to the image to be processed includes the first character mask sub-map corresponding to the image to be processed, S23 may specifically include steps 21 to 22:
step 21: and determining a first character area of the text line to be processed from the first character mask subgraph corresponding to the image to be processed.
The first character mask subgraph corresponding to the image to be processed is used for representing the area occupied by the first character of at least one text line in the image to be processed.
The first character area of the text line to be processed is used to represent the area occupied by the first character of the text line to be processed in the image to be processed.
Step 22: and determining the local area of the text line to be processed according to the first character area of the text line to be processed, so that the local area of the text line to be processed comprises the first character area of the text line to be processed.
Based on the related contents in the above steps 21 to 22, for the text line to be processed, after the first character mask subgraph corresponding to the image to be processed is acquired, the first character region of the text line to be processed may be determined from the first character mask subgraph, so that the first character region can accurately represent the region occupied by the first character of the text line to be processed in the image to be processed; and then, according to the first character area, the local area of the text line to be processed is determined, so that the local area can also accurately represent the area occupied by the first character of the text line to be processed in the image to be processed.
Example 2, if the local mask map of the prediction text line corresponding to the image to be processed includes the character string local mask subgraph corresponding to the image to be processed, S23 may specifically include steps 31 to 32:
step 31: and determining a character string local area of the text line to be processed from the character string local mask subgraph corresponding to the image to be processed.
The character string local mask subgraph corresponding to the image to be processed is used for representing the area occupied by the character string local part of at least one text line in the image to be processed.
The character string local area of the text line to be processed is used for representing the area occupied by the character string local part of the text line to be processed in the image to be processed.
Step 32: and determining the local area of the text line to be processed according to the character string local area of the text line to be processed, so that the local area of the text line to be processed comprises the character string local area of the text line to be processed.
Based on the related contents in the above steps 31 to 32, for the text line to be processed, after the character string local mask subgraph corresponding to the image to be processed is obtained, the character string local area of the text line to be processed may be determined from the character string local mask subgraph, so that the character string local area can accurately indicate the area occupied by the character string local part of the text line to be processed in the image to be processed; and then, according to the character string local area, the local area of the text line to be processed is determined, so that the local area can also accurately represent the area occupied by the character string local part of the text line to be processed in the image to be processed.
Example 3, if the local mask map of the predicted text line corresponding to the image to be processed includes the first character mask subgraph corresponding to the image to be processed and the character string local mask subgraph corresponding to the image to be processed, S23 may specifically include step 41 to step 43:
step 41: and determining a first character area of the text line to be processed from the corresponding first character mask subgraph of the image to be processed.
Step 42: and determining a character string local area of the text line to be processed from the character string local mask subgraph corresponding to the image to be processed.
It should be noted that, for the relevant contents of step 41 and step 42, refer to step 21 and step 31 above, respectively.
Step 43: and determining the local area of the text line to be processed according to the first character area of the text line to be processed and the character string local area of the text line to be processed, so that the local area of the text line to be processed comprises the first character area of the text line to be processed and the character string local area of the text line to be processed.
Based on the relevant content of the above steps 41 to 43, for the text line to be processed, after the first character mask subgraph and the character string local mask subgraph corresponding to the image to be processed are obtained, the first character region and the character string local region of the text line to be processed are respectively determined from the first character mask subgraph and the character string local mask subgraph, so that the first character region and the character string local region can accurately represent the regions occupied in the image to be processed by the first character of the text line to be processed and by the character string local part thereof, respectively; and then, according to the first character area and the character string local area, the local area of the text line to be processed is determined, so that the local area can also accurately represent the regions occupied by the first character and the character string local part of the text line to be processed in the image to be processed.
Based on the above-mentioned related content of S2, after the to-be-processed image is acquired, the global region and the local region of each to-be-processed text line may be determined from the to-be-processed image, so that the text line direction of each to-be-processed text line may be subsequently determined according to the global region and the local region of each to-be-processed text line.
S3: and comparing the position of the global area of the text line to be processed with the position of the local area of the text line to be processed to obtain the text line direction of the text line to be processed.
The text line direction of the text line to be processed is used for representing the presentation mode of the text line to be processed in the image to be processed.
In addition, the text line direction of the text line to be processed is not limited in the embodiment of the present application, for example, the text line direction of the text line to be processed may include a text line rotation mode of the text line to be processed and/or a text arrangement mode of the text line to be processed; and the text line rotation mode of the text line to be processed may include the text line rotation direction of the text line to be processed and/or the text line rotation angle of the text line to be processed.
The embodiment of the present application also does not limit the implementation of the "position comparison"; for example, the relative information between the global region of the text line to be processed and the local region of the text line to be processed may be matched against at least one piece of candidate standard relative information, and the text line direction corresponding to the successfully matched candidate standard relative information may be determined as the text line direction of the text line to be processed. The above "relative information" may include relative parameters between different areas (e.g., the relative relationship and relative difference in position, the relative relationship and relative difference in area range, etc.). In addition, each piece of candidate standard relative information may be set in advance, and the number of pieces of candidate standard relative information is not limited in the embodiment of the present application.
In addition, in order to further improve the accuracy of the text line direction, the embodiment of the present application further provides a possible specific implementation manner of S3, which may specifically include S31-S33:
s31: and determining the global area center of the text line to be processed, so that the global area center of the text line to be processed is used for representing the center position of the global area of the text line to be processed.
In the embodiment of the application, after the global area of the text line to be processed is obtained, the center of the global area of the text line to be processed can be determined according to the center position of the global area, so that the center of the global area of the text line to be processed can accurately represent the center position of the area occupied by the text line to be processed in the image to be processed.
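The center-of-region computation in S31 (and likewise S32 below) amounts to taking the centroid of the region's pixels. A minimal sketch, assuming the region is represented as an (N, 2) array of pixel coordinates as in the earlier sketch:

```python
import numpy as np

def region_center(region: np.ndarray) -> tuple:
    """Center of a mask region given as an (N, 2) array of (row, col)
    pixel coordinates: the mean position of its pixels (the centroid)."""
    center_row, center_col = region.mean(axis=0)
    return (float(center_row), float(center_col))
```

The same helper serves for the global area center, the first character area center, and the character string local area center, since each is defined as the center position of its region.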
S32: and determining the local area center of the text line to be processed, so that the local area center of the text line to be processed is used for representing the center position of the local area of the text line to be processed.
The present embodiment is not limited to the embodiment of S32, and for ease of understanding, the following description will be made with reference to three examples.
Example one, if the local area of the text line to be processed includes the first character area of the text line to be processed, S32 may specifically include steps 51 to 52:
step 51: and determining the center of the first character area of the text line to be processed from the first character area of the text line to be processed, so that the center of the first character area of the text line to be processed is used for representing the center position of the first character area of the text line to be processed.
In the embodiment of the present application, after the first character area of the text line to be processed is acquired, the center position of the first character area may be determined as the center of the first character area of the text line to be processed, so that the center of the first character area can accurately represent the center position of the area occupied by the first character of the text line to be processed in the image to be processed.
Step 52: and determining the center of the local area of the text line to be processed according to the center of the first character area of the text line to be processed, so that the center of the local area of the text line to be processed comprises the center of the first character area of the text line to be processed.
As can be seen from the related contents of the foregoing steps 51 to 52, after the first character region of the text line to be processed is acquired, the center of the first character region of the text line to be processed may be determined from the first character region, so that the center of the first character region can accurately indicate the central position of the region occupied by the first character of the text line to be processed in the image to be processed; and then, determining the center of the local area of the text line to be processed according to the center of the first character area, so that the center of the local area can accurately represent the central position of the area occupied by the first character of the text line to be processed in the image to be processed.
Example two, if the local area of the text line to be processed includes the character string local area of the text line to be processed, S32 may specifically include steps 61 to 62:
step 61: and determining the center of the character string local area of the text line to be processed from the character string local area of the text line to be processed, so that the center of the character string local area of the text line to be processed is used for representing the center position of the character string local area of the text line to be processed.
In this embodiment of the application, after the character string local area of the text line to be processed is obtained, the center position of the character string local area may be determined as the center of the character string local area of the text line to be processed, so that the center of the character string local area can accurately represent the center position of the area occupied by the character string local part of the text line to be processed in the image to be processed.
Step 62: and determining the local area center of the text line to be processed according to the character string local area center of the text line to be processed, so that the local area center of the text line to be processed comprises the character string local area center of the text line to be processed.
Based on the related contents in the above steps 61 to 62, after the character string local area of the text line to be processed is obtained, the center of the character string local area of the text line to be processed may be determined from the character string local area, so that the center of the character string local area can accurately represent the center position of the area occupied by the character string local part of the text line to be processed in the image to be processed; and then, determining the local area center of the text line to be processed according to the local area center of the character string, so that the local area center can also accurately represent the central position of the area occupied by the local part of the character string of the text line to be processed in the image to be processed.
Example three, if the local area of the to-be-processed text line includes the first character area of the to-be-processed text line and the character string local area of the to-be-processed text line, S32 may specifically include steps 71 to 73:
step 71: and determining the center of the first character area of the text line to be processed from the first character area of the text line to be processed, so that the center of the first character area of the text line to be processed is used for representing the center position of the first character area of the text line to be processed.
Step 72: and determining the center of the character string local area of the text line to be processed from the character string local area of the text line to be processed, so that the center of the character string local area of the text line to be processed is used for expressing the center position of the character string local area of the text line to be processed.
It should be noted that, for the relevant contents of step 71 and step 72, see step 51 and step 61 above, respectively.
Step 73: determining the center of the local area of the text line to be processed according to the center of the first character area of the text line to be processed and the center of the local area of the character string of the text line to be processed, so that the center of the local area of the text line to be processed comprises the center of the first character area of the text line to be processed and the center of the local area of the character string of the text line to be processed.
Based on the related contents in the above-mentioned steps 71 to 73, after the first character region and the character string local region of the text line to be processed are acquired, the center of the first character region and the center of the character string local region of the text line to be processed can be determined from the first character region and the character string local region, respectively; and then determining the set of the center of the first character area and the center of the local area of the character string as the center of the local area of the text line to be processed, so that the center of the local area can accurately represent the center position of the area occupied by the first character of the text line to be processed and the local part of the character string in the image to be processed.
Based on the above-mentioned relevant content of S32, after the local area of the text line to be processed is acquired, the local area center of the text line to be processed may be determined from the local area, so that the local area center can accurately indicate the center position of the area occupied by the preset portion (e.g., the local portion of the initial character and/or the character string) of the text line to be processed in the image to be processed.
S33: and determining the text line direction of the text line to be processed according to the relative position information between the global area center of the text line to be processed and the local area center of the text line to be processed.
The relative position information is used to describe relative relationships of different objects (e.g., the global area center and the local area center) in position (e.g., horizontal direction, vertical direction, etc.) and relative differences.
In addition, the present example does not limit the implementation of S33, and for ease of understanding S33, the following description will be made with reference to two examples.
Under the first example of S33, if the text line direction of the text line to be processed includes the text line rotation manner of the text line to be processed, and the local area center of the text line to be processed includes the first character area center of the text line to be processed, S33 may specifically include: determining the text line direction of the text line to be processed according to the relative position information between the global area center of the text line to be processed and the first character area center of the text line to be processed.
In addition, the embodiment of the present application is not limited to the implementation of the above determination process, for example, in a possible implementation, if the text line direction includes a text line rotation direction and/or a text line rotation angle, the determination process of the text line direction of the text line to be processed may include all or part of the steps from step 81 to step 84:
step 81: and performing directional connection on the center of the global area of the text line to be processed and the center of the first character area of the text line to be processed to obtain a first vector.
It should be noted that the present embodiment does not limit the connection manner of the "directional connection" in step 81. For example, the global area center of the text line to be processed may be taken as the starting point and the first character area center of the text line to be processed as the end point, so that the first vector is the directed line segment from the global area center of the text line to be processed to the first character area center of the text line to be processed. For another example, the first character area center of the text line to be processed may be taken as the starting point and the global area center of the text line to be processed as the end point, so that the first vector is the directed line segment from the first character area center of the text line to be processed to the global area center of the text line to be processed.
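The directed connection of step 81 is simply vector subtraction of the two center points. A sketch, assuming centers are given as (x, y) pairs:

```python
def directed_connection(start: tuple, end: tuple) -> tuple:
    """First vector of step 81: the directed line segment from `start`
    to `end`, e.g. from the global area center to the first character
    area center (the opposite connection order is equally valid, as the
    embodiment notes)."""
    return (end[0] - start[0], end[1] - start[1])
```

Swapping the arguments yields the alternative connection manner; the resulting vector is simply negated.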
Step 82: and determining an included angle between the first vector and the first preset vector as a first rotation angle of the text line to be processed.
The first preset vector is a preset reference vector for indicating a directional connection between the center of the global area and the center of the first character area.
In practice, lines of text with different word arrangements correspond to different reference vectors. Based on this, the embodiment of the present application further provides a possible implementation manner of determining the first preset vector, which may specifically include: and determining a first preset vector according to the character arrangement mode of the text line to be processed and the first mapping relation. The first mapping relation comprises a corresponding relation between a text arrangement mode of a text line to be processed and a first preset vector.
The first mapping relation is used for recording reference vectors which correspond to different character arrangement modes and are used for representing directional connection between the center of the global area and the center of the first character area; the first mapping relationship is not limited in the embodiments of the present application, and for example, the first mapping relationship may include a correspondence relationship between a horizontal row and a first standard vector, and a correspondence relationship between a vertical row and a second standard vector.
The first standard vector is used for representing the directional connection between the center of the global area and the center of the first character area of a horizontal text line. Furthermore, the first standard vector is not limited in the embodiments of the present application; for example, the first standard vector may be determined according to the directional connection between the center of the global area of a standard horizontal text line and the center of the first character area of the standard horizontal text line (for example, as shown in fig. 4, the directional connection between the center of the global area of a standard horizontal text line and the center of the first character area of the standard horizontal text line may be determined as the first standard vector), where the text direction of the standard horizontal text line is the horizontal direction and the text line direction of the standard horizontal text line is the positive direction (i.e., rotated 0°).
The second standard vector is used for representing the directional connection between the center of the global area and the center of the first character area of a vertical text line. Further, the second standard vector is not limited in the embodiments of the present application; for example, the second standard vector may be determined according to the directional connection between the center of the global area of a standard vertical text line and the center of the first character area of the standard vertical text line (for example, as shown in fig. 4, the directional connection between the center of the global area of a standard vertical text line and the center of the first character area of the standard vertical text line may be determined as the second standard vector), where the text direction of the standard vertical text line is the vertical direction and the text line direction of the standard vertical text line is the positive direction (i.e., rotated 0°).
Based on the relevant content of the "first preset vector", for the text line to be processed, if it is determined that the text arrangement mode of the text line to be processed is horizontal, the first standard vector may be determined as the first preset vector; if the text arrangement mode of the text line to be processed is determined to be vertical, the second standard vector may be determined as the first preset vector.
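The first mapping relation can be sketched as a lookup table. The concrete vector values below are illustrative assumptions only, using image coordinates with x growing rightward and y growing downward: for a 0°-rotated line, the first character sits left of (horizontal) or above (vertical) the line's center, so the global-center-to-first-character-center connection points left or up.

```python
# Illustrative first preset vectors (assumed coordinate convention:
# x rightward, y downward; values are not fixed by the embodiment).
FIRST_PRESET_VECTOR = {
    "horizontal": (-1.0, 0.0),  # global center -> first char center points left
    "vertical": (0.0, -1.0),    # global center -> first char center points up
}

def first_preset_vector(arrangement: str) -> tuple:
    """Look up the first preset vector from the text arrangement mode,
    i.e. apply the first mapping relation."""
    return FIRST_PRESET_VECTOR[arrangement]
```

The table mirrors the correspondence described above: horizontal rows map to the first standard vector, vertical rows to the second.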
In addition, the first rotation angle may be used to represent a text line rotation angle of a text line to be processed.
Based on the related content of the step 82, for the text line to be processed, after the first vector is obtained, the included angle between the first vector and the first preset vector may be determined as the first rotation angle of the text line to be processed, so that the first rotation angle can accurately represent the rotation angle of the text line to be processed.
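The included angle of step 82 can be computed with `atan2`. A sketch, assuming (x, y) vectors; note that whether the returned angle reads as clockwise or counterclockwise depends on the coordinate convention (with y growing downward, positive angles are clockwise on screen), which is why the embodiment determines the rotation direction separately in step 83.

```python
import math

def first_rotation_angle(v: tuple, v_ref: tuple) -> float:
    """Angle in degrees, normalized to [0, 360), from the first preset
    vector `v_ref` to the first vector `v`; taken as the first rotation
    angle of the text line. Vectors are (x, y) pairs."""
    ang = math.degrees(math.atan2(v[1], v[0]) - math.atan2(v_ref[1], v_ref[0]))
    return ang % 360.0
```

For a 0°-rotated line the first vector coincides with the preset vector and the angle is 0.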
Step 83: and determining a first rotation direction of the text line to be processed according to the relative relationship between the global area center of the text line to be processed and the first character area center of the text line to be processed in a first preset direction.
The first preset direction refers to a preset reference direction.
In practice, lines of text with different arrangements of words correspond to different reference directions. Based on this, the embodiment of the present application further provides a possible implementation manner of determining the first preset direction, which may specifically include: after the character arrangement mode of the text line to be processed is obtained, a first preset direction is determined according to the character arrangement mode of the text line to be processed and the second mapping relation. The second mapping relation comprises a corresponding relation between the text arrangement mode of the text line to be processed and the first preset direction.
The second mapping relation is used for recording reference directions corresponding to different character arrangement modes and related to the global area center and the first character area center; the second mapping relationship is not limited in the embodiments of the present application, and for example, the second mapping relationship may include a correspondence relationship between a horizontal row and a vertical direction, and a correspondence relationship between a vertical row and a horizontal direction.
Based on the content of the "first preset direction", for the text line to be processed, if it is determined that the text arrangement manner of the text line to be processed is horizontal, the vertical direction may be determined as the first preset direction; if the text arrangement mode of the text line to be processed is determined to be vertical, the horizontal direction can be determined to be a first preset direction.
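The second mapping relation described above is likewise a two-entry lookup, sketched here with assumed string keys:

```python
# The second mapping relation: text arrangement mode -> first preset
# direction, following the correspondence described above (horizontal
# rows pair with the vertical reference direction, and vice versa).
FIRST_PRESET_DIRECTION = {
    "horizontal": "vertical",
    "vertical": "horizontal",
}
```

The reference direction is always perpendicular to the arrangement direction, since a rotation shifts the first character's center off the line's axis.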
In addition, the first rotation direction may be used to describe a text line rotation direction of the text line to be processed.
In addition, the embodiments of the present application do not limit the implementation of step 83; for ease of understanding, the following description is made with reference to two scenarios.
Scene one: when the text arrangement manner of the text line to be processed is horizontal and the first preset direction is vertical, step 83 may specifically include: if the projection position of the global area center of the text line to be processed in the first preset direction is higher than the projection position of the first character area center of the text line to be processed in the first preset direction, determining that the first rotation direction of the text line to be processed is in the anticlockwise direction; if the projection position of the global area center of the text line to be processed in the first preset direction is lower than the projection position of the first character area center of the text line to be processed in the first preset direction, determining that the first rotation direction of the text line to be processed is clockwise.
It can be seen that, as shown in fig. 5, if the text line to be processed is arranged horizontally (that is, the text line to be processed belongs to a horizontal text line), after acquiring the relative relationship between the center of the global area of the text line to be processed (e.g., "O1" in fig. 5) and the center of the first character area of the text line to be processed (e.g., "O3" in fig. 5) in the vertical direction, if the relative relationship indicates that the projection position of the center of the global area of the text line to be processed in the vertical direction is higher than the projection position of the center of the first character area of the text line to be processed in the vertical direction, it may be determined that the first rotation direction of the text line to be processed is counterclockwise; if the relative relationship indicates that the projection position of the center of the global area of the line of text to be processed in the vertical direction is lower than the projection position of the center of the first character area of the line of text to be processed in the vertical direction, it may be determined that the first rotation direction of the line of text to be processed is clockwise.
It should be noted that, if the text line to be processed is arranged horizontally and the projection position of the center of the global area of the text line to be processed in the vertical direction coincides with the projection position of the center of the first character area of the text line to be processed in the vertical direction, it may be determined that the text line to be processed is in the positive direction or that the text line to be processed is rotated by 180 °.
Scene two: when the text arrangement mode of the text line to be processed is vertical and the first preset direction is horizontal, step 83 may specifically include: if the projection position of the global area center of the text line to be processed in the first preset direction is more right than the projection position of the first character area center of the text line to be processed in the first preset direction, determining that the first rotation direction of the text line to be processed is the counterclockwise direction; and if the projection position of the global area center of the text line to be processed in the first preset direction is more left than the projection position of the first character area center of the text line to be processed in the first preset direction, determining that the first rotation direction of the text line to be processed is clockwise.
It can be seen that, as shown in fig. 6, if the text line to be processed is arranged vertically (that is, the text line to be processed belongs to a vertical text line), after acquiring the relative relationship between the center of the global area of the text line to be processed (e.g., "O1" in fig. 6) and the center of the first character area of the text line to be processed (e.g., "O3" in fig. 6) in the horizontal direction, if the relative relationship indicates that the projection position of the center of the global area of the text line to be processed in the horizontal direction is more right than the projection position of the center of the first character area of the text line to be processed in the horizontal direction, it may be determined that the first rotation direction of the text line to be processed is the counterclockwise direction; if the relative relationship indicates that the projection position of the global area center of the text line to be processed in the horizontal direction is more left than the projection position of the first character area center of the text line to be processed in the horizontal direction, it may be determined that the first rotation direction of the text line to be processed is clockwise.
It should be noted that, if the text line to be processed is arranged vertically, and the projection position of the center of the global area of the text line to be processed in the horizontal direction coincides with the projection position of the center of the first character area of the text line to be processed in the horizontal direction, it may be determined that the text line to be processed is in the positive direction or that the text line to be processed is rotated by 180 °.
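The decision rules of the two scenes above can be sketched in a few lines. This is an illustrative implementation, not code from the patent itself: it assumes 2-D point tuples in a coordinate system where y grows upward, and the function name and return labels are invented for the example.

```python
def first_rotation_direction(global_center, first_char_center, arrangement):
    """Step 83 decision rule: compare the projections of the global-area
    center and the first-character-area center along the first preset
    direction (vertical for a horizontal line, horizontal for a vertical
    line). Returns one of "counterclockwise", "clockwise", "upright_or_180".
    """
    gx, gy = global_center
    fx, fy = first_char_center
    if arrangement == "horizontal":      # scene one: compare vertical projections
        if gy > fy:
            return "counterclockwise"    # global center projects higher
        if gy < fy:
            return "clockwise"           # global center projects lower
    elif arrangement == "vertical":      # scene two: compare horizontal projections
        if gx > fx:
            return "counterclockwise"    # global center projects further right
        if gx < fx:
            return "clockwise"           # global center projects further left
    return "upright_or_180"              # projections coincide
```

When the two projections coincide, the line is either upright or rotated by 180°, matching the notes for both scenes.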
Based on the related content of the step 83, after the global area center of the text line to be processed and the first character area center of the text line to be processed are obtained, a relative relationship between the global area center of the text line to be processed and the first character area center of the text line to be processed in a first preset direction may be obtained first; and then, according to the relative relationship, a first rotation direction of the text line to be processed is determined, so that the first rotation direction can accurately represent the rotation direction of the text line to be processed.
Step 84: and generating the text line direction of the text line to be processed according to the first rotation angle of the text line to be processed and/or the first rotation direction of the text line to be processed.
As an example, if the text line direction includes a text line rotation direction and/or a text line rotation angle, step 84 may include step 841 and/or step 842:
step 841: the text line rotation direction of the text line to be processed is determined according to the first rotation direction of the text line to be processed (e.g., the first rotation direction of the text line to be processed can be directly determined as the text line rotation direction of the text line to be processed).
Step 842: the text line rotation angle of the text line to be processed is determined according to the first rotation angle of the text line to be processed (e.g., the first rotation angle of the text line to be processed may be directly determined as the text line rotation angle of the text line to be processed).
Based on the related content of the step 84, after the first rotation angle and/or the first rotation direction of the text line to be processed are obtained, the text line direction (in particular, the text line rotation manner) of the text line to be processed may be generated according to the first rotation angle and/or the first rotation direction, so that the text line direction can accurately represent the text line rotation manner of the text line to be processed, and thus the text line direction can accurately represent the presentation manner of the text line to be processed in the image to be processed.
Based on the relevant content of the first example of S33, for the text line to be processed, after the center of the first character area of the text line to be processed is acquired, the text line direction of the text line to be processed may be determined according to the relative position information between the center of the global area and the center of the first character area of the text line to be processed, so that the text line direction can accurately indicate the presentation manner of the text line to be processed in the image to be processed.
Under the second example of S33, if the local area center of the line of text to be processed includes the first character area center of the line of text to be processed and the character string local area center of the line of text to be processed, S33 may specifically include: and determining the text line direction of the text line to be processed according to the relative position information between at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed.
For example, in a possible implementation manner, if the text line direction includes a text line rotation manner and/or a text arrangement manner, S33 may specifically include S331 and/or S332:
s331: and determining the character arrangement mode of the text line to be processed according to the relative position information between the center of the global area of the text line to be processed and the center of the local area of the character string of the text line to be processed.
In the embodiment of the application, after the global area center and the character string local area center of the text line to be processed are obtained, the text arrangement mode of the text line to be processed can be determined according to the relative relationship (especially the relative relationship in position) between the global area center and the character string local area center of the text line to be processed; and the determining process may specifically include: if the center of the global area of the text line to be processed is coincident with the center of the local area of the character string of the text line to be processed, determining that the text arrangement mode of the text line to be processed is vertical; and if the center of the global area of the text line to be processed is not coincident with the center of the local area of the character string of the text line to be processed, determining that the text arrangement mode of the text line to be processed is horizontal.
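The coincidence test of S331 can be sketched as follows. This is an illustrative helper with invented names; the floating-point tolerance `tol` is an assumption added to make the "coincides" comparison practical on real coordinates.

```python
def text_arrangement(global_center, string_local_center, tol=1e-6):
    """S331: if the global-area center coincides with the character-string
    local-area center, the line is arranged vertically; otherwise it is
    arranged horizontally."""
    gx, gy = global_center
    sx, sy = string_local_center
    coincide = abs(gx - sx) <= tol and abs(gy - sy) <= tol
    return "vertical" if coincide else "horizontal"
```

The rule rests on the observation that, for a vertical line, the local region of the character string is centered on the line itself, so its center falls on the global-area center.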
S332: and determining the text line rotation mode of the text line to be processed according to the relative position information between at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed.
The embodiment of the present application does not limit the implementation of S332; for example, S332 may be implemented in the following three ways.
The first method is as follows: and determining the text line rotation mode of the text line to be processed according to the relative position information between the global area center of the text line to be processed and the first character area center of the text line to be processed.
It should be noted that the related content of the first mode may refer to the related content of the above "first example of S33".
The second method comprises the following steps: and determining the text line rotation mode of the text line to be processed according to the relative position information between the center of the first character area of the text line to be processed and the center of the character string local area of the text line to be processed.
The embodiment of the present application does not limit the determination process, for example, in a possible implementation manner, the determination process of the text line rotation manner of the text line to be processed may include steps 91 to 94:
step 91: and performing directional connection on the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed to obtain a third vector.
It should be noted that the present embodiment does not limit the connection manner of the "directional connection" in step 91, for example, the first character region center of the text line to be processed is taken as a starting point, and the character string local region center of the text line to be processed is taken as an end point, so that the third vector becomes a directional line segment which takes the first character region center of the text line to be processed as a starting point and the character string local region center of the text line to be processed as an end point. For another example, the center of the local region of the character string of the text line to be processed may be used as a starting point, and the center of the first character region of the text line to be processed may be used as an ending point to perform directional connection, so that the third vector becomes a directional line segment that takes the center of the local region of the character string of the text line to be processed as a starting point and the center of the first character region of the text line to be processed as an ending point.
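Either connection order described above yields a usable third vector. The helper below is a minimal sketch (names are illustrative, points are 2-D tuples) that returns the directed segment as a displacement:

```python
def third_vector(first_char_center, string_local_center, from_first_char=True):
    """Step 91: directed connection between the first-character-area center
    and the character-string local-area center. `from_first_char` selects
    which of the two endpoint conventions in the text is used."""
    start, end = first_char_center, string_local_center
    if not from_first_char:
        start, end = end, start
    # represent the directed line segment by its displacement components
    return (end[0] - start[0], end[1] - start[1])
```

The two conventions differ only in sign, so downstream steps just need to use the same convention when building the third preset vector.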
And step 92: and determining an included angle between the third vector and a third preset vector as a third rotation angle of the text line to be processed.
Wherein, the third preset vector is a preset reference vector for indicating the directional connection between the center of the first character area and the center of the character string local area.
In practice, lines of text with different word arrangements correspond to different reference vectors. Based on this, the embodiment of the present application further provides a possible implementation manner of determining the third preset vector, which may specifically include: and determining a third preset vector according to the character arrangement mode of the text line to be processed and the third mapping relation. The third mapping relationship comprises a corresponding relationship between the text arrangement mode of the text line to be processed and a third preset vector.
The third mapping relation is used for recording reference vectors which are corresponding to different character arrangement modes and used for representing directional connection between the center of the first character area and the center of the local area of the character string; the third mapping relationship is not limited in the embodiments of the present application, for example, the third mapping relationship may include a correspondence relationship between a horizontal row and a fourth standard vector, and a correspondence relationship between a vertical row and a fifth standard vector.
The fourth standard vector is used for representing the directional connection between the center of the first character area of the horizontal text line and the center of the local area of the character string; also, the fourth standard vector may be determined based on the directional connection between the center of the first character region of a standard horizontal text line and the center of the character string local region of that standard horizontal text line (for example, as shown in fig. 4, the directional connection between the center of the first character region of the standard horizontal text line and the center of the character string local region of the standard horizontal text line may be determined as the fourth standard vector).
The fifth standard vector refers to a standard vector for representing a directional connection between the center of the first character region of the vertical text line and the center of the local region of the character string; also, the fifth standard vector is not limited in the embodiments of the present application, and for example, the fifth standard vector may be determined based on a directional connection between the center of the first character region of a standard vertical text line and the center of the character string local region of that standard vertical text line (for example, as shown in fig. 4, the directional connection between the center of the first character region of the standard vertical text line and the center of the character string local region of the standard vertical text line may be determined as the fifth standard vector).
Based on the content of the "third predetermined vector", if it is determined that the text line to be processed is arranged horizontally, the fourth standard vector may be determined as the third predetermined vector; if the text arrangement mode of the text line to be processed is determined to be vertical, the fifth standard vector can be determined as a third preset vector.
In addition, the third rotation angle may be used to represent a text line rotation angle of the text line to be processed.
Based on the related content of the step 92, after the third vector is acquired, an included angle between the third vector and a third preset vector may be determined as a third rotation angle of the text line to be processed, so that the third rotation angle can accurately represent the rotation angle of the text line to be processed.
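Steps 91–92, together with the third mapping relation, can be sketched as follows. The concrete preset vectors (pointing right for a horizontal line, pointing downward for a vertical line, matching a standard upright layout in a coordinate system where y grows upward) are assumptions for illustration, as are all names:

```python
import math

def third_rotation_angle(first_char_center, string_local_center, arrangement):
    """Steps 91-92: form the third vector from the first-character-area
    center to the character-string local-area center, then measure its
    angle (in degrees) against the third preset vector chosen via the
    arrangement, i.e. the third mapping relation."""
    presets = {"horizontal": (1.0, 0.0),   # assumed fourth standard vector
               "vertical":   (0.0, -1.0)}  # assumed fifth standard vector
    px, py = presets[arrangement]
    vx = string_local_center[0] - first_char_center[0]
    vy = string_local_center[1] - first_char_center[1]
    # unsigned angle between the third vector and the preset vector
    dot = vx * px + vy * py
    norm = math.hypot(vx, vy) * math.hypot(px, py)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```

An upright horizontal line yields 0°, while one whose third vector points straight up yields 90°; the sign (clockwise vs. counterclockwise) is supplied separately by step 93.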
Step 93: and determining a third rotation direction of the text line to be processed according to the relative relationship and the position difference value in a third preset direction between the center of the first character area of the text line to be processed and the center of the character string local area of the text line to be processed.
The third preset direction refers to a preset reference direction.
In practice, lines of text with different arrangements of words correspond to different reference directions. Based on this, the embodiment of the present application further provides a possible implementation manner of determining the third preset direction, which may specifically include: determining a third preset direction according to the character arrangement mode of the text line to be processed and the fourth mapping relation; the fourth mapping relationship comprises a corresponding relationship between the text arrangement mode of the text line to be processed and the third preset direction.
The fourth mapping relation is used for recording reference directions corresponding to different character arrangement modes and related to the first character area center and the character string local area center; the fourth mapping relationship is not limited in the embodiments of the present application, and for example, the fourth mapping relationship may include a correspondence relationship between a horizontal row and the vertical direction, and a correspondence relationship between a vertical row and the horizontal direction.
Based on the content of the "third preset direction", for the text line to be processed, if it is determined that the text arrangement manner of the text line to be processed is horizontal, the vertical direction may be determined as the third preset direction; if the text arrangement mode of the text line to be processed is determined to be vertical, the horizontal direction can be determined to be a third preset direction.
In addition, the third rotation direction may be used to describe a text line rotation direction of the text line to be processed.
In addition, the embodiment of step 93 is not limited in the examples of the present application, and for convenience of understanding, the following description is made with reference to two scenarios.
Scene one: when the text arrangement manner of the text line to be processed is horizontal and the third preset direction is vertical, step 93 may specifically include:
if the projection position of the center of the first character area of the text line to be processed in the third preset direction is higher than the projection position of the center of the local character string area of the text line to be processed in the third preset direction, and the difference value of the projection positions of the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed in the third preset direction is greater than the preset position difference value, determining that the third rotation direction of the text line to be processed is clockwise;
if the projection position of the center of the first character area of the text line to be processed in the third preset direction is higher than the projection position of the center of the local character string area of the text line to be processed in the third preset direction, and the difference value of the projection positions of the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed in the third preset direction is less than the preset position difference value, determining that the third rotation direction of the text line to be processed is the counterclockwise direction;
and if the projection position of the center of the first character area of the text line to be processed in the third preset direction is not higher than the projection position of the center of the local character string area of the text line to be processed in the third preset direction, determining that the third rotation direction of the text line to be processed is the counterclockwise direction.
The preset position difference may be preset, or may be determined according to a height difference in the vertical direction between the center of the first character area of the forward direction text line corresponding to the text line to be processed and the center of the character string local area of the forward direction text line (for example, a height difference in the vertical direction between the center of the first character area of the forward direction text line corresponding to the text line to be processed and the center of the character string local area of the forward direction text line is determined as the preset position difference).
It can be seen that, as shown in fig. 7, when the text line to be processed is arranged horizontally (that is, the text line to be processed belongs to a horizontal text line), after the relative relationship and the position difference value in the vertical direction between the center of the first character region of the text line to be processed (e.g., "O3" in fig. 7) and the center of the character string local region of the text line to be processed (e.g., "O2" in fig. 7) are acquired, if the relative relationship and the position difference value indicate that the projection position of the center of the first character region of the text line to be processed in the vertical direction is higher than the projection position of the center of the character string local region of the text line to be processed in the vertical direction by a distance greater than a preset position difference value (e.g., "preset position difference value" in fig. 7), it is determined that the third rotation direction of the text line to be processed is clockwise; if the relative relationship and the position difference value indicate that the projection position of the center of the first character region of the line of text to be processed in the vertical direction is higher than the projection position of the center of the character string partial region of the line of text to be processed in the vertical direction by a distance smaller than the preset position difference value, or the relative relationship and the position difference value indicate that the projection position of the center of the first character region of the line of text to be processed in the vertical direction is not higher than (e.g., lower than or equal to) the projection position of the center of the character string partial region of the line of text to be processed in the vertical direction, it is determined that the third rotation direction of the line of text to be processed is the counterclockwise direction.
It should be noted that, if the text line to be processed is arranged horizontally, the projection position of the center of the first character area of the text line to be processed in the vertical direction is higher than the projection position of the center of the local character string area of the text line to be processed in the vertical direction, and the position difference between the projection position of the center of the first character area of the text line to be processed in the vertical direction and the projection position of the center of the local character string area of the text line to be processed in the vertical direction is equal to the preset position difference, it may be determined that the text line to be processed is in the positive direction or the text line to be processed is rotated by 180 °.
Scene two: when the text arrangement mode of the text line to be processed is vertical and the third preset direction is horizontal, step 93 may specifically include: if the projection position of the center of the first character area of the text line to be processed in the third preset direction is more left than the projection position of the center of the local character string area of the text line to be processed in the third preset direction, determining that the third rotation direction of the text line to be processed is the counterclockwise direction; and if the projection position of the center of the first character area of the text line to be processed in the third preset direction is more right than the projection position of the center of the local character string area of the text line to be processed in the third preset direction, determining that the third rotation direction of the text line to be processed is clockwise.
Here, since the center of the character string local region of the vertical text line coincides with the center of the global region of the vertical text line, "the process of determining the manner of rotating the text line of the vertical text line from the center of the first character region of the vertical text line and the center of the character string local region" is similar to the above "process of determining the manner of rotating the text line of the vertical text line from the center of the global region of the vertical text line and the center of the first character region".
It can be seen that, as shown in fig. 8, if the text line to be processed is arranged vertically (that is, the text line to be processed belongs to a vertical text line), after the relative relationship between the center of the first character region of the text line to be processed (e.g., "O3" in fig. 8) and the center of the local region of the character string (e.g., "O2" in fig. 8) in the horizontal direction is obtained, if the relative relationship indicates that the projection position of the center of the first character region of the text line to be processed in the horizontal direction is more left than the projection position of the center of the local region of the character string of the text line to be processed in the horizontal direction, it is determined that the third rotation direction of the text line to be processed is in the counterclockwise direction; and if the correlation indicates that the projection position of the center of the first character area of the text line to be processed in the horizontal direction is more right than the projection position of the center of the character string local area of the text line to be processed in the horizontal direction, determining that the third rotation direction of the text line to be processed is clockwise.
It should be noted that, if the text arrangement manner of the text line to be processed is vertical, and the projection position of the center of the first character area of the text line to be processed in the horizontal direction coincides with the projection position of the center of the local character string area of the text line to be processed in the horizontal direction, it may be determined that the text line to be processed is in the positive direction or the text line to be processed is rotated by 180 °.
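The two scenes of step 93 can be sketched together. This is an illustrative implementation with invented names, assuming y grows upward; `preset_diff` is the preset position difference value, e.g. the vertical gap between the two centers of the corresponding upright (forward-direction) text line:

```python
def third_rotation_direction(first_char_center, string_local_center,
                             arrangement, preset_diff):
    """Step 93: decide the third rotation direction from the relative
    relationship (and, for horizontal lines, the position difference)
    between the first-character-area center and the character-string
    local-area center along the third preset direction."""
    fx, fy = first_char_center
    sx, sy = string_local_center
    if arrangement == "horizontal":    # scene one: compare vertical projections
        diff = fy - sy                 # signed gap between the projections
        if diff > preset_diff:
            return "clockwise"
        if diff == preset_diff:
            return "upright_or_180"    # matches the upright reference gap
        return "counterclockwise"      # smaller gap, or not higher at all
    # scene two: vertical line, compare horizontal projections
    if fx < sx:
        return "counterclockwise"      # first-character center further left
    if fx > sx:
        return "clockwise"             # first-character center further right
    return "upright_or_180"            # projections coincide
```

Note that for a horizontal line the `upright_or_180` branch only fires when the gap exactly equals the preset difference, mirroring the remark above.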
Based on the related content of step 93, after acquiring the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed, a relative relationship between the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed in the third preset direction may be acquired first; and a third rotation direction of the text line to be processed is then determined according to the relative relationship, so that the third rotation direction can accurately represent the rotation direction of the text line to be processed.
Step 94: and generating a text line rotation mode of the text line to be processed according to the third rotation angle and/or the third rotation direction of the text line to be processed.
As an example, if the text line rotation manner includes a text line rotation direction and/or a text line rotation angle, step 94 may include step 941 and/or step 942:
step 941: and determining the text line rotation direction of the text line to be processed according to the third rotation direction of the text line to be processed (e.g., the third rotation direction of the text line to be processed can be directly determined as the text line rotation direction of the text line to be processed).
Step 942: and determining the text line rotation angle of the text line to be processed according to the third rotation angle of the text line to be processed (e.g., the third rotation angle of the text line to be processed can be directly determined as the text line rotation angle of the text line to be processed).
Based on the related content of the step 94, in this embodiment of the application, after the third rotation angle and the third rotation direction of the text line to be processed are obtained, the text line rotation direction of the text line to be processed and the text line rotation angle of the text line to be processed may be determined according to the third rotation direction and the third rotation angle, respectively; and a text line rotation mode of the text line to be processed is determined according to the text line rotation direction of the text line to be processed and/or the text line rotation angle of the text line to be processed, so that the text line rotation mode can accurately represent the text line rotation direction and/or the text line rotation angle of the text line to be processed, and the text line rotation mode can accurately represent the presentation mode of the text line to be processed in the image to be processed.
Based on the related content of the second mode, after the center of the first character area and the center of the local character string area of the text line to be processed are obtained, the text line rotation mode of the text line to be processed may be determined according to the relative position information between the center of the first character area and the center of the local character string area of the text line to be processed, so that the text line rotation mode may accurately represent the text line rotation information of the text line to be processed in the image to be processed, and thus the presentation mode of the text line to be processed in the image to be processed may be accurately represented based on the text line rotation mode.
The third method comprises the following steps: and determining the text line rotation mode of the text line to be processed according to the relative position information among the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed.
The embodiment of the present application does not limit the determination process, for example, in a possible implementation manner, the determination process of the text line rotation manner of the text line to be processed may include steps 101 to 103:
step 101: and determining the rotation angle of the text line to be processed according to the relative position information among at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed.
That is, for a text line to be processed, after acquiring the global area center, the first character area center, and the character string local area center of the text line to be processed, the rotation angle of the text line to be processed may be determined in the following three ways; the three modes can be specifically as follows:
the first mode is as follows: and determining the rotation angle of the text line to be processed according to the relative position relationship between the center of the global area of the text line to be processed and the center of the first character area of the text line to be processed.
It should be noted that, the first manner may be implemented by using the above steps 81 to 82, and only the "first rotation angle of the text line to be processed" in the above steps 81 to 82 needs to be replaced by the "rotation angle of the text line to be processed".
The second mode is as follows: and determining the rotation angle of the text line to be processed according to the relative position information between the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed.
It should be noted that, the second manner may be implemented by adopting the above steps 91 to 92, and only the "third rotation angle of the text line to be processed" in the above steps 91 to 92 needs to be replaced by the "rotation angle of the text line to be processed".
The third mode is as follows: and determining the rotation angle of the text line to be processed according to the relative position information among the center of the global area of the text line to be processed, the center of the first character area of the text line to be processed and the center of the character string local area of the text line to be processed.
The embodiment of the present application is not limited to the implementation of the third method, and for example, the method may specifically include steps 111 to 113:
step 111: and determining the first rotation angle of the text line to be processed according to the relative position relationship between the global area center of the text line to be processed and the first character area center of the text line to be processed.
Step 111 may be implemented by using steps 81 to 82 described above.
Step 112: and determining a third rotation angle of the text line to be processed according to the relative position information between the center of the first character area of the text line to be processed and the center of the character string local area of the text line to be processed.
Step 112 may be implemented by using steps 91 to 92 described above.
Step 113: and determining the text line rotation angle of the text line to be processed according to the first rotation angle of the text line to be processed and the third rotation angle of the text line to be processed.
The embodiment of the present application is not limited to the implementation of step 113, for example, step 113 may specifically be: and determining the maximum value (or median value, or minimum value, or mode) of the first rotation angle of the text line to be processed and the third rotation angle of the text line to be processed as the text line rotation angle of the text line to be processed. For another example, step 113 may specifically be: and determining an average value (or a weighted average value) between the first rotation angle of the text line to be processed and the third rotation angle of the text line to be processed as the text line rotation angle of the text line to be processed.
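The combination rules listed for step 113 can be sketched as follows (a minimal illustration only: the function name, degree units, and `mode` parameter are assumptions, not from the source):

```python
import statistics

def combine_rotation_angles(first_angle, third_angle, mode="mean"):
    """Combine the first rotation angle and the third rotation angle
    (in degrees) into one text line rotation angle, as in step 113."""
    angles = [first_angle, third_angle]
    if mode == "max":
        return max(angles)
    if mode == "min":
        return min(angles)
    if mode == "median":
        return statistics.median(angles)
    if mode == "mean":
        return statistics.mean(angles)
    raise ValueError(f"unknown combination mode: {mode}")
```

For example, with a first rotation angle of 10 degrees and a third rotation angle of 20 degrees, the mean rule yields 15 degrees and the max rule yields 20 degrees; a weighted average would simply replace `statistics.mean` with a weighted sum.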
Based on the related content of step 101, after the global area center, the first character area center, and the character string local area center of the text line to be processed are obtained, the rotation angle of the text line to be processed may be determined according to the relative position information between at least two of the global area center, the first character area center, and the character string local area center of the text line to be processed, so that the text line rotation manner of the text line to be processed can be determined based on the rotation angle of the text line in the following.
Step 102: and determining the text line rotation direction of the text line to be processed according to the relative position information among at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed.
That is, for the text line to be processed, after acquiring the global area center, the first character area center, and the character string local area center of the text line to be processed, the text line rotation direction of the text line to be processed may be determined by adopting the following three embodiments; the three embodiments may specifically be:
the first embodiment: and determining the text line rotation direction of the text line to be processed according to the relative position relationship between the global area center of the text line to be processed and the first character area center of the text line to be processed.
The first embodiment may be implemented by using the step 83, and only the "first rotation direction of the text line to be processed" in the step 83 needs to be replaced by the "text line rotation direction of the text line to be processed".
The second embodiment: and determining the text line rotation direction of the text line to be processed according to the relative position information between the center of the first character area of the text line to be processed and the center of the character string local area of the text line to be processed.
It should be noted that, the second embodiment may be implemented by using the step 93, and it is only necessary to replace the "third rotation direction of the text line to be processed" in the step 93 with the "text line rotation direction of the text line to be processed".
Third embodiment: and determining the text line rotation direction of the text line to be processed according to the relative position information among the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed.
The embodiment of the present application does not limit the implementation manner of the third embodiment, and for example, the implementation may specifically include steps 121 to 123:
step 121: and determining the first rotation direction of the text line to be processed according to the relative position relationship between the global area center of the text line to be processed and the first character area center of the text line to be processed.
Step 121 may be implemented by using step 83 described above.
Step 122: and determining a third rotation direction of the text line to be processed according to the relative position information between the center of the first character area of the text line to be processed and the center of the character string local area of the text line to be processed.
Step 122 may be implemented by using step 93 described above.
Step 123: and determining the text line rotation direction of the text line to be processed according to the first rotation direction of the text line to be processed and the third rotation direction of the text line to be processed.
The embodiment of the present application is not limited to the implementation of step 123, for example, step 123 may specifically be: and determining the maximum value (or the median value, or the minimum value, or the mode) in the first rotation direction of the text line to be processed and the third rotation direction of the text line to be processed as the text line rotation direction of the text line to be processed. For another example, step 123 may specifically be: and determining an average (or a weighted average) between the first rotation direction of the text line to be processed and the third rotation direction of the text line to be processed as the text line rotation direction of the text line to be processed.
Based on the related content of step 102, after the global area center, the first character area center, and the character string local area center of the text line to be processed are obtained, the text line rotation direction of the text line to be processed may be determined according to the relative position information between at least two of the global area center, the first character area center, and the character string local area center of the text line to be processed, so that the text line rotation manner of the text line to be processed may be determined based on the text line rotation direction in the following.
Step 103: and generating a text line rotation mode of the text line to be processed according to the text line rotation angle of the text line to be processed and the text line rotation direction of the text line to be processed.
In the embodiment of the application, after the text line rotation angle and the text line rotation direction of the text line to be processed are obtained, the text line rotation mode of the text line to be processed may be generated according to the two, so that the text line rotation mode accurately represents both the text line rotation direction and the text line rotation angle of the text line to be processed. The text line rotation mode can therefore accurately represent the text line rotation information of the text line to be processed in the image to be processed, and the presentation mode of the text line to be processed in the image to be processed can further be accurately described based on the text line rotation mode.
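Step 103 can be sketched as combining the angle and direction into one signed rotation value (the signed-angle convention and direction labels below are assumptions; the source does not fix a particular encoding of the rotation mode):

```python
def text_line_rotation_mode(angle_deg, direction):
    """Combine a text line rotation angle (degrees) and a text line
    rotation direction into one signed rotation value: positive for
    clockwise, negative for counterclockwise (convention assumed)."""
    if direction == "clockwise":
        return angle_deg
    if direction == "counterclockwise":
        return -angle_deg
    raise ValueError(f"unknown rotation direction: {direction}")
```

Under this convention, a 30-degree counterclockwise rotation is encoded as -30.0, which downstream rectification code could undo with a single signed rotation.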
Based on the related content of the third mode, after the global area center, the first character area center, and the character string local area center of the text line to be processed are obtained, the text line rotation mode of the text line to be processed may be determined according to the relative position information among the three centers, so that the text line rotation mode can accurately represent the text line rotation information of the text line to be processed in the image to be processed, and thus the presentation mode of the text line to be processed in the image to be processed can be accurately described based on the text line rotation mode.
Based on the related contents of the foregoing S1 to S3, after the to-be-processed image is acquired, a global area of a to-be-processed text line in the to-be-processed image and a local area of the to-be-processed text line are determined, so that the global area of the to-be-processed text line represents an area occupied by the to-be-processed text line in the to-be-processed image, and the local area of the to-be-processed text line represents an area occupied by a preset part of the to-be-processed text line in the to-be-processed image; and comparing the position of the global area of the text line to be processed with the position of the local area of the text line to be processed to obtain the text line direction of the text line to be processed. It can be seen that, since the global region and the local region of the text line to be processed can respectively represent the region of the text line to be processed in the image to be processed and the region of the preset portion of the text line to be processed in the image to be processed, the global region and the local region of the text line to be processed can comprehensively represent the presentation manner of the text line to be processed in the image to be processed, so that the presentation manner of the text line to be processed in the image to be processed can be accurately described based on the text line direction of the text line to be processed determined by the global region and the local region of the text line to be processed, and thus the text line direction of the text line in the image can be accurately determined, which is beneficial to improving the character recognition accuracy of the text line in the image.
Based on the text line direction determination method provided by the above method embodiment, the embodiment of the present application further provides a text line direction determination device, which is explained and explained below with reference to the accompanying drawings.
Device embodiment
Please refer to the above method embodiment for technical details of the text line direction determining apparatus provided in the apparatus embodiment.
Referring to fig. 9, this figure is a schematic structural diagram of a text line direction determining apparatus according to an embodiment of the present application.
The apparatus 900 for determining a text line direction provided in the embodiment of the present application includes:
an image acquisition unit 901 configured to acquire an image to be processed;
a region determining unit 902, configured to determine a global region of a to-be-processed text line and a local region of the to-be-processed text line in the to-be-processed image; the global area of the text line to be processed represents the area occupied by the text line to be processed in the image to be processed; the local area of the text line to be processed represents the area occupied by the preset part of the text line to be processed in the image to be processed;
a direction determining unit 903, configured to compare the position of the global area of the text line to be processed with the position of the local area of the text line to be processed, so as to obtain a text line direction of the text line to be processed.
In a possible implementation manner, the area determining unit 902 is specifically configured to: inputting the image to be processed into a pre-constructed mask image generation model to obtain a predicted text line global mask image corresponding to the image to be processed and a predicted text line local mask image corresponding to the image to be processed, which are output by the mask image generation model; the mask image generation model is constructed according to a sample image, an actual text line global mask image corresponding to the sample image and an actual text line local mask image corresponding to the sample image; the global mask image of the prediction text line corresponding to the image to be processed comprises a global mask region corresponding to the text line to be processed; the local mask image of the prediction text line corresponding to the image to be processed comprises a local mask region corresponding to the text line to be processed; determining a global mask region corresponding to the text line to be processed in a global mask image of a predicted text line corresponding to the image to be processed as a global region of the text line to be processed; and determining a local mask region corresponding to the text line to be processed in the local mask image of the predicted text line corresponding to the image to be processed as the local region of the text line to be processed.
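The center comparisons performed by the direction determining unit 903 require the center position of each mask region. A minimal sketch of that step (assuming the regions are binary NumPy masks; the function name is hypothetical) computes a region's center as the mean coordinate of its foreground pixels:

```python
import numpy as np

def region_center(mask):
    """Center (x, y) of a binary mask region, computed as the mean
    coordinate of all foreground (non-zero) pixels."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("empty mask region")
    return float(xs.mean()), float(ys.mean())
```

Applied once to the global mask region and once to each local mask region, this yields the global area center, first character area center, and character string local area center used in the comparisons below.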
In one possible embodiment, the partial region of the text line to be processed comprises a first character region of the text line to be processed.
In a possible implementation manner, the direction determining unit 903 is specifically configured to:
determining the text line direction of the text line to be processed according to the relative position information between the global area center of the text line to be processed and the first character area center of the text line to be processed; the center of the global area of the text line to be processed represents the center position of the global area of the text line to be processed; the center of the first character area of the text line to be processed represents the center position of the first character area of the text line to be processed.
In one possible embodiment, the text line direction comprises a text line rotation direction and/or a text line rotation angle;
the process for determining the text line rotation direction of the text line to be processed comprises the following steps: determining a first rotation direction of the text line to be processed according to a relative relationship between the global area center of the text line to be processed and the first character area center of the text line to be processed in a first preset direction; determining the text line rotation direction of the text line to be processed according to the first rotation direction of the text line to be processed;
the process for determining the rotation angle of the text line to be processed comprises the following steps: performing directional connection on the center of the global area of the text line to be processed and the center of the first character area of the text line to be processed to obtain a first vector; determining an included angle between the first vector and a first preset vector as a first rotation angle of the text line to be processed; and determining the rotation angle of the text line to be processed according to the first rotation angle of the text line to be processed.
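The first-vector construction above can be sketched as follows (image coordinates and a horizontal unit vector as the first preset vector are assumptions; the function name is illustrative):

```python
import math

def first_rotation_angle(global_center, first_char_center,
                         preset_vector=(1.0, 0.0)):
    """Angle in degrees between the directed first vector (global area
    center -> first character area center) and a preset vector."""
    vx = first_char_center[0] - global_center[0]
    vy = first_char_center[1] - global_center[1]
    px, py = preset_vector
    norm = math.hypot(vx, vy) * math.hypot(px, py)
    if norm == 0.0:
        raise ValueError("degenerate vector")
    # clamp to guard against floating-point drift outside [-1, 1]
    cos_angle = max(-1.0, min(1.0, (vx * px + vy * py) / norm))
    return math.degrees(math.acos(cos_angle))
```

The third vector used in steps 91 to 92 can be handled by the same helper, with the first character area center and the character string local area center as the two endpoints and a third preset vector as the reference.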
In one possible embodiment, the partial region of the text line to be processed includes the first character region of the text line to be processed and the character string partial region of the text line to be processed; and the character string local area of the text line to be processed is obtained by connecting preset local areas of all characters in the text line to be processed.
In a possible implementation manner, the direction determining unit 903 is specifically configured to: determining the text line direction of the text line to be processed according to the relative position information between at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed; the center of the global area of the text line to be processed represents the center position of the global area of the text line to be processed; the center of the first character area of the text line to be processed represents the center position of the first character area of the text line to be processed; and the center of the character string local area of the text line to be processed represents the center position of the character string local area of the text line to be processed.
In a possible implementation manner, the text line direction comprises a text line rotation manner and/or a text arrangement manner;
the process for determining the text line rotation mode of the text line to be processed comprises the following steps: determining a text line rotation mode of the text line to be processed according to relative position information between the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed;
the process for determining the text arrangement mode of the text line to be processed comprises the following steps: and determining the character arrangement mode of the text line to be processed according to the relative position information between the global area center of the text line to be processed and the character string local area center of the text line to be processed.
In one possible embodiment, the text line rotation mode includes a text line rotation direction and/or a text line rotation angle;
the process for determining the text line rotation direction of the text line to be processed comprises the following steps: determining a third rotation direction of the text line to be processed according to a relative relationship and a position difference value in a third preset direction between the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed; determining the text line rotating direction of the text line to be processed according to the third rotating direction of the text line to be processed;
the process for determining the rotation angle of the text line to be processed comprises the following steps: performing directional connection on the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed to obtain a third vector; determining an included angle between the third vector and a third preset vector as a third rotation angle of the text line to be processed; and determining the rotation angle of the text line to be processed according to the third rotation angle of the text line to be processed.
In a possible implementation manner, if the text arrangement manner of the text line to be processed is horizontal, the local region of the character string of the text line to be processed is obtained by connecting the bottom boundary regions of the characters in the text line to be processed; if the text arrangement mode of the text line to be processed is vertical, the local area of the character string of the text line to be processed is obtained by connecting the central areas of the characters in the text line to be processed.
In a possible implementation manner, the process of determining the text arrangement manner of the text line to be processed includes: if the center of the global area of the text line to be processed is coincident with the center of the local area of the character string of the text line to be processed, determining that the text arrangement mode of the text line to be processed is vertical; and if the center of the global area of the text line to be processed is not coincident with the center of the local area of the character string of the text line to be processed, determining that the text arrangement mode of the text line to be processed is horizontal.
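The coincidence test above can be sketched as follows (a small pixel tolerance is an added assumption, since centers predicted from mask images rarely coincide exactly):

```python
def character_arrangement(global_center, string_local_center, tol=1.0):
    """Return 'vertical' if the global area center coincides with the
    character string local area center (within tol pixels), otherwise
    'horizontal'."""
    dx = global_center[0] - string_local_center[0]
    dy = global_center[1] - string_local_center[1]
    return "vertical" if (dx * dx + dy * dy) ** 0.5 <= tol else "horizontal"
```

This matches the rule in the text: for vertical text the string local area is built from character centers, which align with the global center; for horizontal text it is built from bottom boundary regions, which sit below it.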
In a possible embodiment, the text line direction includes at least one of a text line rotation direction, a text line rotation angle, or a character arrangement manner;
the process for determining the text line direction of the text line to be processed comprises the following steps: determining a text line rotation angle of the text line to be processed according to relative position information between at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed; determining the text line rotation direction of the text line to be processed according to the relative position information between at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed; and determining the character arrangement mode of the text line to be processed according to the relative position information between the global area center of the text line to be processed and the character string local area center of the text line to be processed.
Further, an embodiment of the present application further provides an apparatus, where the apparatus includes a processor and a memory:
the memory is used for storing a computer program;
the processor is configured to execute any implementation manner of the text line direction determination method provided by the embodiment of the application according to the computer program.
Further, an embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, where the computer program is used to execute any implementation manner of the text line direction determining method provided in the embodiment of the present application.
Further, an embodiment of the present application also provides a computer program product, which when running on a terminal device, causes the terminal device to execute any implementation of the text line direction determining method provided in the embodiment of the present application.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, they are not intended to limit it. Those skilled in the art can, using the methods and technical content disclosed above and without departing from the scope of the technical solution of the present invention, make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments with equivalent changes. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.
Claims (16)
1. A method for determining a direction of a line of text, the method comprising:
acquiring an image to be processed;
determining a global area of a text line to be processed and a local area of the text line to be processed in the image to be processed; the global area of the text line to be processed represents the area occupied by the text line to be processed in the image to be processed; the local area of the text line to be processed represents the area occupied by the preset part of the text line to be processed in the image to be processed;
and comparing the position of the global area of the text line to be processed with the position of the local area of the text line to be processed to obtain the text line direction of the text line to be processed.
2. The method according to claim 1, wherein the determining process of the global region of the text line to be processed and the local region of the text line to be processed comprises:
inputting the image to be processed into a pre-constructed mask image generation model to obtain a predicted text line global mask image corresponding to the image to be processed and a predicted text line local mask image corresponding to the image to be processed, which are output by the mask image generation model; the mask image generation model is constructed according to a sample image, an actual text line global mask image corresponding to the sample image and an actual text line local mask image corresponding to the sample image; the global mask image of the prediction text line corresponding to the image to be processed comprises a global mask region corresponding to the text line to be processed; the local mask image of the prediction text line corresponding to the image to be processed comprises a local mask region corresponding to the text line to be processed;
determining a global mask region corresponding to the text line to be processed in a global mask image of a predicted text line corresponding to the image to be processed as a global region of the text line to be processed;
and determining a local mask region corresponding to the text line to be processed in the local mask image of the predicted text line corresponding to the image to be processed as the local region of the text line to be processed.
3. The method according to claim 1, characterized in that the local area of the text line to be processed comprises a first character area of the text line to be processed.
4. The method according to claim 3, wherein the comparing the position of the global region of the text line to be processed with the position of the local region of the text line to be processed to obtain the text line direction of the text line to be processed comprises:
determining the text line direction of the text line to be processed according to the relative position information between the global area center of the text line to be processed and the first character area center of the text line to be processed; the center of the global area of the text line to be processed represents the center position of the global area of the text line to be processed; the center of the first character area of the text line to be processed represents the center position of the first character area of the text line to be processed.
5. The method of claim 4, wherein the text line direction comprises a text line rotation direction and/or a text line rotation angle;
the process for determining the text line rotation direction of the text line to be processed comprises the following steps:
determining a first rotation direction of the text line to be processed according to a relative relationship between the global area center of the text line to be processed and the first character area center of the text line to be processed in a first preset direction; determining the text line rotation direction of the text line to be processed according to the first rotation direction of the text line to be processed;
the process for determining the rotation angle of the text line to be processed comprises the following steps:
performing directional connection on the center of the global area of the text line to be processed and the center of the first character area of the text line to be processed to obtain a first vector; determining an included angle between the first vector and a first preset vector as a first rotation angle of the text line to be processed; and determining the rotation angle of the text line to be processed according to the first rotation angle of the text line to be processed.
6. The method according to claim 1, characterized in that the local area of the text line to be processed comprises a first character area of the text line to be processed and a character string local area of the text line to be processed; and the character string local area of the text line to be processed is obtained by connecting preset local areas of all characters in the text line to be processed.
7. The method of claim 6, wherein comparing the position of the global region of the text line to be processed with the position of the local region of the text line to be processed to obtain the text line direction of the text line to be processed comprises:
determining the text line direction of the text line to be processed according to the relative position information between at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed; the center of the global area of the text line to be processed represents the center position of the global area of the text line to be processed; the center of the first character area of the text line to be processed represents the center position of the first character area of the text line to be processed; and the center of the character string local area of the text line to be processed represents the center position of the character string local area of the text line to be processed.
8. The method of claim 7, wherein the text line direction comprises a text line rotation mode and/or a text arrangement mode;
the process for determining the text line rotation mode of the text line to be processed comprises the following steps:
determining a text line rotation mode of the text line to be processed according to relative position information between the center of the first character area of the text line to be processed and the center of the local character string area of the text line to be processed;
the process for determining the text arrangement mode of the text line to be processed comprises the following steps:
and determining the character arrangement mode of the text line to be processed according to the relative position information between the global area center of the text line to be processed and the character string local area center of the text line to be processed.
9. The method according to claim 8, wherein the text line rotation mode comprises a text line rotation direction and/or a text line rotation angle;
the process of determining the text line rotation direction of the text line to be processed comprises:
determining a third rotation direction of the text line to be processed according to the relative relationship and the position difference value, in a third preset direction, between the first character area center of the text line to be processed and the character string local area center of the text line to be processed; and determining the text line rotation direction of the text line to be processed according to the third rotation direction of the text line to be processed;
the process of determining the text line rotation angle of the text line to be processed comprises:
connecting the first character area center of the text line to be processed to the character string local area center of the text line to be processed in a directed manner to obtain a third vector; determining the included angle between the third vector and a third preset vector as a third rotation angle of the text line to be processed; and determining the text line rotation angle of the text line to be processed according to the third rotation angle of the text line to be processed.
10. The method according to claim 7, wherein, if the text arrangement mode of the text line to be processed is horizontal, the character string local area of the text line to be processed is obtained by connecting the bottom boundary regions of the characters in the text line to be processed;
and if the text arrangement mode of the text line to be processed is vertical, the character string local area of the text line to be processed is obtained by connecting the center regions of the characters in the text line to be processed.
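A minimal sketch of claim 10, assuming each character is given as an axis-aligned box `(x0, y0, x1, y1)` with y growing downward; reducing the connected regions to the mean of bottom-edge midpoints (horizontal) or box centers (vertical) is an illustrative simplification, not the patent's stated construction.

```python
import numpy as np

def string_local_center(char_boxes, arrangement):
    """Center of the character string local area per claim 10 (sketch):
    built from bottom boundary regions for horizontal text, or from
    character center regions for vertical text.
    """
    boxes = np.asarray(char_boxes, dtype=float)
    if arrangement == "horizontal":
        # Midpoint of each character's bottom edge, then their mean.
        pts = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2, boxes[:, 3]], axis=1)
    else:  # vertical
        # Center of each character box, then their mean.
        pts = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    return pts.mean(axis=0)

boxes = [(0, 0, 20, 20), (25, 0, 45, 20), (50, 0, 70, 20)]
print(string_local_center(boxes, "horizontal"))  # [35. 20.]
print(string_local_center(boxes, "vertical"))    # [35. 10.]
```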
11. The method according to claim 10, wherein the process of determining the text arrangement mode of the text line to be processed comprises:
if the global area center of the text line to be processed coincides with the character string local area center of the text line to be processed, determining that the text arrangement mode of the text line to be processed is vertical; and if the global area center of the text line to be processed does not coincide with the character string local area center of the text line to be processed, determining that the text arrangement mode of the text line to be processed is horizontal.
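The coincidence test of claim 11 reduces to a distance check between the two centers. The tolerance parameter is an assumption added here, since in floating-point geometry the two centers rarely coincide exactly.

```python
def arrangement_mode(global_center, local_center, tol=1e-6):
    """Sketch of claim 11: vertical if the global area center coincides
    with the character string local area center, horizontal otherwise.
    `tol` is an assumed numerical tolerance for 'coincident'.
    """
    dx = global_center[0] - local_center[0]
    dy = global_center[1] - local_center[1]
    return "vertical" if dx * dx + dy * dy <= tol * tol else "horizontal"

print(arrangement_mode((50, 10), (50, 10)))  # vertical
print(arrangement_mode((50, 10), (50, 18)))  # horizontal
```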
12. The method of claim 7, wherein the text line direction comprises at least one of a text line rotation direction, a text line rotation angle, or a text arrangement;
the process for determining the text line direction of the text line to be processed comprises the following steps:
determining a text line rotation angle of the text line to be processed according to relative position information between at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed;
determining the text line rotation direction of the text line to be processed according to the relative position information between at least two of the global area center of the text line to be processed, the first character area center of the text line to be processed and the character string local area center of the text line to be processed;
and determining the character arrangement mode of the text line to be processed according to the relative position information between the global area center of the text line to be processed and the character string local area center of the text line to be processed.
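The three determinations of claim 12 can be combined in one routine. All conventions here (y grows downward, preset vector = +x, exact-coincidence test) are assumptions for illustration; the claims themselves do not fix them.

```python
import math

def text_line_direction(global_center, first_char_center, local_center):
    """Sketch of claim 12: derive rotation angle, rotation direction and
    text arrangement mode from the three centers of claim 7.
    """
    # Vector from the first character area center to the string local area center.
    vx = local_center[0] - first_char_center[0]
    vy = local_center[1] - first_char_center[1]
    angle = math.degrees(math.atan2(vy, vx)) % 360.0      # rotation angle
    direction = "clockwise" if vy < 0 else "counterclockwise"
    coincident = (abs(global_center[0] - local_center[0]) < 1e-6 and
                  abs(global_center[1] - local_center[1]) < 1e-6)
    arrangement = "vertical" if coincident else "horizontal"
    return angle, direction, arrangement

# Example centers for an upright horizontal line (assumed values).
print(text_line_direction((50, 10), (9, 10), (50, 18)))
```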
13. An apparatus for determining a direction of a line of text, the apparatus comprising:
the image acquisition unit is used for acquiring an image to be processed;
the region determining unit is used for determining a global region of a text line to be processed and a local region of the text line to be processed in the image to be processed; the global area of the text line to be processed represents the area occupied by the text line to be processed in the image to be processed; the local area of the text line to be processed represents the area occupied by the preset part of the text line to be processed in the image to be processed;
and the direction determining unit is used for comparing the positions of the global area of the text line to be processed and the local area of the text line to be processed to obtain the text line direction of the text line to be processed.
14. An apparatus, comprising a processor and a memory:
the memory is used for storing a computer program;
the processor is configured to perform the method of any of claims 1-12 in accordance with the computer program.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for performing the method of any of claims 1-12.
16. A computer program product, characterized in that the computer program product, when run on a terminal device, causes the terminal device to perform the method of any of claims 1-12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110468072.9A CN113191345A (en) | 2021-04-28 | 2021-04-28 | Text line direction determining method and related equipment thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113191345A true CN113191345A (en) | 2021-07-30 |
Family
ID=76980044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110468072.9A Pending CN113191345A (en) | 2021-04-28 | 2021-04-28 | Text line direction determining method and related equipment thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113191345A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150023593A1 (en) * | 2012-03-05 | 2015-01-22 | Omron Corporation | Image processing method for character recognition, character recognition apparatus using this method, and program |
CN109829437A (en) * | 2019-02-01 | 2019-05-31 | 北京旷视科技有限公司 | Image processing method, text recognition method, device and electronic system |
CN110659574A (en) * | 2019-08-22 | 2020-01-07 | 北京易道博识科技有限公司 | Method and system for outputting text line contents after status recognition of document image check box |
WO2020010547A1 (en) * | 2018-07-11 | 2020-01-16 | 深圳前海达闼云端智能科技有限公司 | Character identification method and apparatus, and storage medium and electronic device |
WO2020133442A1 (en) * | 2018-12-29 | 2020-07-02 | 华为技术有限公司 | Text recognition method and terminal device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110490078B (en) | Monitoring video processing method, device, computer equipment and storage medium | |
US20180157965A1 (en) | Device and method for determining convolutional neural network model for database | |
CN112270686B (en) | Image segmentation model training method, image segmentation device and electronic equipment | |
CN110069989B (en) | Face image processing method and device and computer readable storage medium | |
WO2019152144A1 (en) | Object detection based on neural network | |
CN113313083B (en) | Text detection method and device | |
RU2697649C1 (en) | Methods and systems of document segmentation | |
CN113657274B (en) | Table generation method and device, electronic equipment and storage medium | |
JP2015176175A (en) | Information processing apparatus, information processing method and program | |
CN111292377B (en) | Target detection method, device, computer equipment and storage medium | |
CN110807110B (en) | Image searching method and device combining local and global features and electronic equipment | |
CN114021646A (en) | Image description text determination method and related equipment thereof | |
CN112597918A (en) | Text detection method and device, electronic equipment and storage medium | |
CN113205041A (en) | Structured information extraction method, device, equipment and storage medium | |
CN112802108A (en) | Target object positioning method and device, electronic equipment and readable storage medium | |
CN112733969B (en) | Object class identification method and device and server | |
CN117409419A (en) | Image detection method, device and storage medium | |
US20210042565A1 (en) | Method and device for updating database, electronic device, and computer storage medium | |
CN113191345A (en) | Text line direction determining method and related equipment thereof | |
KR20220073444A (en) | Method and apparatus for tracking object and terminal for performing the method | |
CN113780040A (en) | Lip key point positioning method and device, storage medium and electronic equipment | |
CN118115932A (en) | Image regressor training method, related method, device, equipment and medium | |
CN114445716B (en) | Key point detection method, key point detection device, computer device, medium, and program product | |
CN113160258B (en) | Method, system, server and storage medium for extracting building vector polygon | |
CN115205301A (en) | Image segmentation method and device based on characteristic space multi-view analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||