WO2021051527A1

WO2021051527A1 - Image segmentation-based text positioning method, apparatus and device, and storage medium

Info

Publication number: WO2021051527A1
Application number: PCT/CN2019/117036
Authority: WO
Inventors: 孙强
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-09-19
Filing date: 2019-11-11
Publication date: 2021-03-25
Also published as: CN110807454B; CN110807454A

Abstract

An image segmentation-based text positioning method, apparatus and device, and a storage medium, relating to the field of artificial intelligence. Said method comprises: acquiring an original image, the original image being a ticket image or a certificate image which is acquired under a text background (101); performing image segmentation on the original image by means of a preset image segmentation network model, so as to obtain a distorted image, the distorted image being the ticket image or the certificate image (102); performing affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being text in a forward direction (103); and performing text positioning on the distortion-corrected image, so as to obtain a positioning result (104). Image segmentation network processing is performed on an image under a complex text background to obtain an accurate foreground image, and text positioning processing is performed on the foreground image to obtain a positioning result, so that the accuracy of image text positioning is improved and robustness to complex backgrounds is enhanced.

Description

Character positioning method, device, equipment and storage medium based on image segmentation

This application claims priority to Chinese patent application No. 201910884634.0, filed with the Chinese Patent Office on September 19, 2019 and entitled "Image segmentation-based text positioning method, device, equipment, and storage medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of computer technology, and in particular to text positioning methods, devices, equipment and storage media based on image segmentation.

Background technique

Optical character recognition (OCR) refers to the process in which an electronic device, such as a scanner or a digital camera, examines characters printed on paper and then uses character recognition methods to translate their shapes into computer text. That is, the text material is scanned, and the resulting image file is analyzed and processed to obtain text and layout information. OCR includes text positioning and text recognition. Text positioning is the precise location of text in an image and relies mainly on extracting relevant text features.

In the prior art, a dedicated scanner is usually used to scan bills and certificates, converting the text on them into image information to obtain bill images and certificate images of relatively high quality; OCR technology then converts the information in the bill images and certificate images into computer text. The inventor realized that with this method, the accuracy of text positioning is low for bill images and certificate images collected against a complex background.

Summary of the invention

The main purpose of this application is to solve the technical problem of low accuracy of text positioning from images with complex text backgrounds.

In order to achieve the above objective, the first aspect of the present application provides a text positioning method based on image segmentation, including: acquiring an original image, the original image being a bill image or a certificate image collected against a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image; performing affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text; and performing text positioning on the distortion-corrected image to obtain a positioning result.

A second aspect of the present application provides a text positioning device based on image segmentation, including: an acquisition unit, configured to acquire an original image, the original image being a bill image or a certificate image collected against a text background; a segmentation unit, configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image; a transformation unit, configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text; and a positioning unit, configured to perform text positioning on the distortion-corrected image to obtain a positioning result.

A third aspect of the present application provides a text positioning device based on image segmentation, including: a memory and at least one processor, where the memory stores instructions, and the memory and the at least one processor are interconnected by wires; the at least one processor calls the instructions in the memory, so that the text positioning device based on image segmentation executes the method described in the first aspect.

The fourth aspect of the present application provides a computer-readable storage medium having instructions stored in the computer-readable storage medium, which when run on a computer, cause the computer to execute the method described in the first aspect.

In the technical solution provided by this application, an original image is acquired, the original image being a bill image or a certificate image collected against a text background; image segmentation is performed on the original image through a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image; affine transformation is performed on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text; and text positioning is performed on the distortion-corrected image to obtain a positioning result. In the embodiments of the application, an accurate foreground image is obtained by performing image segmentation network processing on an image with a complex background, and text positioning processing is performed on the foreground image according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.

Description of the drawings

FIG. 1 is a schematic diagram of an embodiment of a text positioning method based on image segmentation in an embodiment of the application;

FIG. 2 is a schematic diagram of another embodiment of a text positioning method based on image segmentation in an embodiment of this application;

FIG. 3 is a schematic diagram of an embodiment of a text positioning device based on image segmentation in an embodiment of the application;

FIG. 4 is a schematic diagram of another embodiment of a text positioning device based on image segmentation in an embodiment of the application;

FIG. 5 is a schematic diagram of an embodiment of a text positioning device based on image segmentation in an embodiment of the application.

Detailed description

The embodiments of the present application provide a text positioning method, device, equipment, and storage medium based on image segmentation, which obtain an accurate foreground image by performing image segmentation network processing on an image with a complex background, and perform text positioning processing on the foreground image according to a preset template to obtain a positioning result, thereby improving the accuracy of image text positioning and enhancing robustness to complex backgrounds.

In order to enable those skilled in the art to better understand the solution of the present application, the embodiments of the present application will be described below in conjunction with the accompanying drawings in the embodiments of the present application.

The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of this application and in the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way can be interchanged under appropriate circumstances, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "including" or "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to such processes, methods, products, or devices.

For ease of understanding, the following describes the specific process of the embodiment of the present application. Please refer to FIG. 1. An embodiment of the text positioning method based on image segmentation in the embodiment of the present application includes:

101. Obtain an original image, where the original image is a bill image or a certificate image collected in a text background;

The server obtains the original image, which is a bill image or a certificate image collected against a text background. The original image contains a strongly interfering text background, meaning that text targets, especially handwritten numbers and printed text, are present in the background of the original image, which makes it harder to locate the text in the original image directly. Specifically, the server receives the bill image or certificate image collected against the text background and sets it as the original image; the server stores the original image in a preset path according to a preset format, and records the storage path of the original image in a data table.

It is understandable that the server stores the original image in the preset path according to the preset format, and obtains the storage path of the original image and the name of the original image. Among them, the preset format includes preset naming rules and picture formats. The picture format is jpg, png or other types of picture formats, which are not specifically limited here. After the server names the original image according to the preset format, the server places the original image in a preset path, which is a pre-designated file directory. For example, the server receives the original image, the original image is the ID card image, the server names the ID card image card1.jpg, and stores card1.jpg in the directory /var/www/html/ID.

102. Perform image segmentation on the original image by using a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;

The server performs image segmentation on the original image through a preset image segmentation network model to obtain a distorted image. The distorted image is a bill image or a certificate image. Specifically, the server performs image segmentation on the original image according to the preset image segmentation network model to obtain a segmented label image; the server determines a mask image according to the segmented label image, and processes the original image according to the mask image to obtain the distorted image. The distorted image is the partial image obtained after the server separates out the complex background in the original image. The shape of the partial image is an irregular quadrilateral, and the partial image contains the bill image or the certificate image.

It can be understood that the server trains the image segmentation network model on preset samples and determines the parameters of the model, thereby obtaining the preset image segmentation network model that is used to perform image segmentation on the original image.

103. Perform affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is forward text;

The server performs affine transformation on the distorted image to obtain a distortion-corrected image, in which the text is forward text. Forward text refers to text that takes the horizontal reference as the positive direction and is not upside down; that is, a distorted image that deviates from the horizontal reference by 90, 180 or 270 degrees is corrected to 0 degrees from the horizontal reference, so that the text in the distortion-corrected image is forward text. Specifically, the server determines the affine transformation rule corresponding to the distorted image; the server performs affine transformation on the distorted image according to that rule and a preset size to obtain the distortion-corrected image. It can be understood that the distorted image is an irregular quadrilateral image. The server corrects the distortion of the distorted image by affine transformation to obtain the distortion-corrected image, in which the text is forward text. The size of the distortion-corrected image is a preset fixed value, consistent with the template size corresponding to the distorted image.

It should be noted that the affine transformation is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v); that is, a point on the original image is mapped to the corresponding point on the target image. It includes rotation, translation, scaling and shearing of the original image.

104. Perform text positioning on the image after the distortion correction to obtain a positioning result.

The server performs text positioning on the distortion-corrected image to obtain the positioning result. Specifically, the server performs text positioning processing on the distortion-corrected image according to a preset algorithm and a template to obtain the positioning result. The template includes at least one rectangular box, which indicates the location area of the forward text according to preset coordinate values. The positioning result is the text positioning coordinate information selected from the distortion-corrected image, and the number of text positioning coordinate entries equals the number of rectangular boxes. For example, the distortion-corrected bill image contains the texts "a rural commercial bank" and "transfer check". The server matches the corresponding template, which has two rectangular boxes indicating the positions of those two texts. The positioning result is determined from the preset coordinate values of the two rectangular boxes and includes the two texts together with those coordinate values.

It can be understood that if the original image were annotated directly, every text item in the original image would have to be annotated. Moreover, to avoid interference from the text background, a large number of original images with different text backgrounds would have to be collected and annotated in turn. For example, if a bank bill has n characters and m backgrounds, the previous approach requires n*m annotations, whereas the annotation workload is now n+m. The larger m is, the more adaptable text positioning is to complex backgrounds and the more robust it is. Here m relates to the image segmentation processing, for which enhancement training on a large number of sample images is sufficient.

In the embodiment of the application, an accurate foreground image is obtained by performing image segmentation network processing on an image with a complex background, and text positioning processing is performed on the foreground image according to a preset template to obtain the positioning result, which improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
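As a worked illustration of the workload comparison above, take hypothetical values n = 30 characters and m = 20 backgrounds (these numbers are assumptions chosen only for the arithmetic and do not come from the application):

```latex
n \times m = 30 \times 20 = 600
\qquad \text{versus} \qquad
n + m = 30 + 20 = 50
```

Under these assumed values, annotating text and backgrounds separately requires 50 annotations instead of 600.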

Referring to FIG. 2, another embodiment of the text positioning method based on image segmentation in the embodiment of the present application includes:

201. Obtain an original image, where the original image is a bill image or a certificate image collected in the context of the text;

The server obtains the original image, which is a bill image or a certificate image collected against a text background. Specifically, the server receives the bill image or certificate image collected against the text background and sets it as the original image; the server sets the name of the original image according to a preset format and stores the original image in a preset path to obtain the storage path of the original image. The preset path is a preset file directory, and the preset format includes preset naming rules and a picture format. The picture format is jpg, png or another type of picture format, which is not specifically limited here. The server then writes the storage path and the name of the original image into a target data table.

For example, the server receives a bank bill image, sets it as the original image and names it bank1.jpg; the server then stores bank1.jpg in the directory /var/www/html/bankimage and writes the storage path and the name of the original image into the target data table. Here the name of the original image is bank1.jpg and its storage path is /var/www/html/bankimage/bank1.jpg. The server generates a Structured Query Language (SQL) insert statement from the storage path and the name of the original image, and writes them into the target data table by executing the SQL insert statement.

It should be noted that the original image contains a strongly noisy text background, meaning that text targets, especially handwritten numbers and printed text, are present in the background of the original image. Locating the text directly in the original image is therefore difficult.
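The recording step in this example can be sketched as follows. This is a minimal illustration, not the application's implementation: the table name `original_image`, its two columns, the helper `record_image` and the use of an in-memory SQLite database are all assumptions, since the application names neither a database nor a schema.

```python
import sqlite3
from pathlib import PurePosixPath


def record_image(conn, directory, name):
    """Write the image name and storage path into a hypothetical table."""
    path = str(PurePosixPath(directory) / name)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS original_image (name TEXT, path TEXT)"
    )
    # Parameterized insert: values are bound, not concatenated into the SQL.
    conn.execute(
        "INSERT INTO original_image (name, path) VALUES (?, ?)", (name, path)
    )
    conn.commit()
    return path


# In-memory database so the sketch runs without touching the file system.
conn = sqlite3.connect(":memory:")
stored_path = record_image(conn, "/var/www/html/bankimage", "bank1.jpg")
rows = conn.execute("SELECT name, path FROM original_image").fetchall()
```

Binding the values as parameters, rather than building the insert statement by string concatenation, is a design choice of this sketch that avoids quoting problems in file names.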

202. Input the original image into a preset image segmentation network model, and perform image semantic segmentation on the original image through the preset image segmentation network model to obtain segmentation label images and image types;

The server inputs the original image into the preset image segmentation network model, and performs image semantic segmentation on the original image through the preset image segmentation network model to obtain the segmentation label image and the image type. Further, the server uses a preset deeplabv3+ model to perform image semantic segmentation on the original image. It can be understood that the preset deeplabv3+ model is a preset image segmentation network model. The main purpose of the server to perform semantic image segmentation on the original image through the preset deeplabv3+ model is to specify a semantic label for each pixel of the original image, that is, the value of each pixel in the segmented label image represents the type of the pixel.

It should be noted that deeplabv3+ is a state-of-the-art deep learning model for image semantic segmentation. Its goal is to assign a semantic label to each pixel of the input image. Deeplabv3+ includes a simple and efficient decoder module that improves the segmentation results.

203. The original image is segmented according to the segmented label image to obtain a distorted image, where the distorted image is a bill image or a certificate image;

The server segments the original image according to the segmented label image to obtain a distorted image. The distorted image is a bill image or a certificate image. Specifically, the server determines the area to be segmented according to the segmented label image, sets the pixel values inside the area to be segmented to 1 and the pixel values outside it to 0 to obtain a mask image; the server multiplies the original image by the mask image to obtain the distorted image. The distorted image indicates the bill image or certificate image separated from the text background of the original image.

Optionally, the server compares the original image with the segmented label image to obtain a comparison result, and determines the area to be segmented according to the comparison result; the server segments the area to be segmented to obtain a distorted image, the distorted image being a bill image or a certificate image; the server stores the distorted image.

It should be noted that since multiple certificates can be present in the original image, the file finally saved has the same name as the original image and holds the four corner coordinates of each foreground. For example, the server performs image segmentation processing on the certificate image named image1.png and obtains two certificate foregrounds with eight coordinate points in total, four for each certificate; the server saves the two certificate foregrounds under numeric indices. The content of the file is as follows:
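The mask step described above can be sketched as follows. Plain nested lists stand in for image arrays to keep the example dependency-free; the function names `build_mask` and `apply_mask`, the label value 1 for the foreground and the tiny 3 by 4 image are assumptions made for illustration.

```python
def build_mask(label_image, target_label):
    """Return a 0/1 mask: 1 where the pixel's label equals target_label."""
    return [[1 if lab == target_label else 0 for lab in row]
            for row in label_image]


def apply_mask(image, mask):
    """Multiply image and mask pixel by pixel; background pixels become 0."""
    return [[px * m for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


# Hypothetical segmented label image: label 1 marks the bill or certificate.
label_image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
# Hypothetical single-channel original image of the same size.
original = [
    [9, 8, 7, 6],
    [5, 4, 3, 2],
    [1, 2, 3, 4],
]
mask = build_mask(label_image, target_label=1)
foreground = apply_mask(original, mask)
```

After the multiplication only the pixels inside the area to be segmented keep their original values.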

1|Coordinate 1, coordinate 2, coordinate 3, coordinate 4

2|Coordinate 1, coordinate 2, coordinate 3, coordinate 4

204. Perform affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;

The server performs affine transformation on the distorted image to obtain a distortion-corrected image, in which the text is forward text. Forward text refers to text that takes the horizontal reference as the positive direction and is not upside down; that is, a distorted image that deviates from the horizontal reference by 90, 180 or 270 degrees is corrected to 0 degrees from the horizontal reference, so that the text in the distortion-corrected image is forward text. Specifically, the server determines the standard image corresponding to the distorted image according to the image type, and determines three pixel reference point coordinates from the standard image; the server determines the corresponding pixel coordinates in the distorted image according to the three pixel reference point coordinates; the server calculates the affine transformation matrix from the three pixel reference point coordinates and the corresponding pixel coordinates; the server performs affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, in which the text is forward text. For example, the server determines from the standard image of the ID card that the three pixel reference point coordinates are D(x₁, y₁), E(x₂, y₂) and F(x₃, y₃). According to the reference points D, E and F, the server determines the corresponding pixel coordinates D'(x'₁, y'₁), E'(x'₂, y'₂) and F'(x'₃, y'₃) in the distorted image, and then calculates according to the homogeneous coordinate formula, which is as follows:
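A standard homogeneous-coordinate form of a two-dimensional affine transformation, consistent with the six variables a, b, c, d, e and f used in this description, can be written as shown here. The placement of the letters within the matrix is an assumption.

```latex
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad\text{that is,}\qquad
u = a x + b y + c, \quad v = d x + e y + f .
```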

Here (x, y) corresponds to the pixel coordinates of the distorted image, and (u, v) corresponds to the three pixel reference point coordinates of the standard image of the ID card. The server substitutes D'(x'₁, y'₁), E'(x'₂, y'₂), F'(x'₃, y'₃) and D(x₁, y₁), E(x₂, y₂), F(x₃, y₃) in turn into the homogeneous coordinate formula to obtain the affine transformation matrix; that is, the server determines the values of the affine transformation matrix variables a, b, c, d, e and f. The server then performs affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected ID card image, whose corresponding size is 85.6 mm by 54 mm. It can be understood that when performing affine transformation on the distorted image, the server also determines the rotation direction and the rotation angle, so that the text in the distortion-corrected image is forward text.

It should be noted that the affine transformation is a linear transformation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v). The distorted image is an irregular quadrilateral image. The affine transformation maps a point on the original image to the corresponding point on the target image, and includes rotation, translation, scaling and shearing of the original image, so that the distorted image is finally transformed from an irregular quadrilateral into a rectangle.
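The substitution of three point pairs can be sketched as two small linear systems. The sketch assumes the mapping has the form u = a·x + b·y + c and v = d·x + e·y + f, which is one standard way to arrange the six variables; the function names and the sample points are hypothetical, and Cramer's rule is used only to keep the example free of external libraries.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m * w = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("reference points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def affine_from_points(src, dst):
    """src: three (x, y) points in the distorted image.
    dst: the matching (u, v) reference points in the standard image."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(m, [u for u, _ in dst])
    d, e, f = solve3(m, [v for _, v in dst])
    return [[a, b, c], [d, e, f]]


def apply_affine(matrix, point):
    """Map one (x, y) point to (u, v) with the 2x3 affine matrix."""
    (a, b, c), (d, e, f) = matrix
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical point pairs describing a 90-degree rotation plus a shift.
src = [(0.0, 0.0), (50.0, 0.0), (0.0, 100.0)]
dst = [(100.0, 0.0), (100.0, 50.0), (0.0, 0.0)]
matrix = affine_from_points(src, dst)
mapped = apply_affine(matrix, (50.0, 100.0))
```

Three non-collinear point pairs give six equations for the six unknowns, which is why exactly three reference points suffice. Warping every pixel of an image with the resulting matrix would normally be left to an image library.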

205. Determine a template corresponding to the distortion-corrected image according to the image type, where the template includes at least one rectangular frame, and the rectangular frame is used to indicate the location area where the forward text is located according to the preset coordinate values;

The server determines a template corresponding to the distortion-corrected image according to the image type. The template includes at least one rectangular box, which indicates the location area of the forward text according to preset coordinate values. A rectangular box is a rectangular area defined by 4 point coordinates. For example, the template corresponding to the horizontal forward image of the front of an ID card includes 6 rectangular boxes: name, gender, ethnicity, date of birth, address, and citizen ID number; the template corresponding to the horizontal forward image of the front of a bank card includes a rectangular box for the bank card number.

It should be noted that the template corresponding to the distortion-corrected image is consistent in size with the distortion-corrected image, and that the template includes rectangular boxes indicating the location areas of the forward text according to preset coordinate values. After the server matches a template to the distortion-corrected image according to the image type, the server further determines the text of the distortion-corrected image according to the rectangular boxes in the template.
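One possible way to represent such templates is a mapping from image type to named rectangular boxes, as in the sketch below. The type keys, field names and coordinates are made up for illustration, and each box is stored as two corners (left, top, right, bottom) rather than four points, which is a simplification assumed here.

```python
from typing import Dict, List, Tuple

# (left, top, right, bottom); right and bottom are exclusive in this sketch.
Box = Tuple[int, int, int, int]

# Hypothetical templates keyed by image type; coordinates are illustrative.
TEMPLATES: Dict[str, Dict[str, Box]] = {
    "id_card_front": {"name": (1, 0, 4, 1), "id_number": (0, 2, 5, 3)},
    "bank_card_front": {"card_number": (0, 1, 5, 2)},
}


def locate_with_template(image: List[List[int]], image_type: str):
    """Return, for every box in the matched template, its coordinates
    and the pixels of the corrected image that fall inside it."""
    template = TEMPLATES[image_type]
    result = {}
    for field, (left, top, right, bottom) in template.items():
        crop = [row[left:right] for row in image[top:bottom]]
        result[field] = {"box": (left, top, right, bottom), "crop": crop}
    return result


# Tiny stand-in for a distortion-corrected image of the template's size.
corrected = [
    [0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9],
    [10, 11, 12, 13, 14],
]
located = locate_with_template(corrected, "id_card_front")
```

Because the corrected image and the template share one fixed size, the preset coordinates can be applied directly without any search.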

206. Perform text positioning on the distortion-corrected image according to the preset algorithm and template, to obtain a positioning result;

The server performs text positioning on the distortion-corrected image according to the preset algorithm and the template to obtain the positioning result. Specifically, the server determines the position information of each strip-shaped target to be segmented in the distortion-corrected image according to the preset algorithm and the template. The position information of a strip-shaped target includes the coordinates of the upper-left point and the lower-right point of the corresponding area, together with the corresponding text. The positioning rule proceeds from the upper-left coordinate to the lower-right coordinate: the distortion-corrected image is scanned line by line, and information of the same category on the same line is located together. The server sets the coordinates of the upper-left point and the lower-right point and the corresponding text as the positioning result. For example, the server performs text positioning on the name area of the ID card, and the resulting positioning result includes the upper-left coordinates (13, 14), the lower-right coordinates (744, 49), and the name.

Optionally, the server uses the PixelLink algorithm to frame the text areas of the distortion-corrected image. PixelLink realizes text detection through instance segmentation. Based on a deep neural network (DNN), it makes two kinds of pixel-wise prediction: text/non-text prediction and link prediction. Specifically, according to the PixelLink algorithm, the server marks the text pixels in the distortion-corrected image as positive and the non-text pixels as negative; the server determines whether a given pixel and one of its adjacent pixels are located in the same instance; if they are, the server marks the link between them as positive; if they are not, the server marks the link between them as negative. Each pixel has 8 neighbors. The predicted positive pixels are joined into connected components (CCs) through the predicted positive links, and each CC represents one detected text. The server takes the bounding box of each connected component as the final detection result and sets the coordinate information of the final detection result as the positioning result.
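The grouping of positive pixels into connected components can be sketched with a union-find structure. This illustrates only the grouping step and does not reproduce the network itself: the inputs `pixel_pos` and `link_pos` stand in for predictions that a trained model would supply, and their layout is an assumption of this sketch. Two neighboring positive pixels are joined when either of them carries a positive link toward the other.

```python
def find(parent, i):
    """Return the representative of i, halving paths along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, i, j):
    """Merge the components containing i and j."""
    ri, rj = find(parent, i), find(parent, j)
    if ri != rj:
        parent[rj] = ri


def text_boxes(pixel_pos, link_pos):
    """pixel_pos[y][x] is True for pixels predicted as text.
    link_pos maps (y, x) to the set of (dy, dx) offsets, among the
    8 neighbors, whose link is predicted positive."""
    height, width = len(pixel_pos), len(pixel_pos[0])
    parent = {(y, x): (y, x)
              for y in range(height) for x in range(width)
              if pixel_pos[y][x]}
    # Join positive pixels through positive links.
    for (y, x) in list(parent):
        for dy, dx in link_pos.get((y, x), ()):
            neighbor = (y + dy, x + dx)
            if neighbor in parent:
                union(parent, (y, x), neighbor)
    # One bounding box (left, top, right, bottom) per connected component.
    boxes = {}
    for (y, x) in parent:
        root = find(parent, (y, x))
        left, top, right, bottom = boxes.get(root, (x, y, x, y))
        boxes[root] = (min(left, x), min(top, y), max(right, x), max(bottom, y))
    return sorted(boxes.values())


# Four adjacent text pixels; the middle pair has no positive link,
# so they fall into two separate text instances.
pixel_pos = [[True, True, True, True],
             [False, False, False, False]]
link_pos = {(0, 0): {(0, 1)}, (0, 3): {(0, -1)}}
detected = text_boxes(pixel_pos, link_pos)
```

The example shows why link prediction matters: adjacency alone would merge all four pixels into one box, while the missing link between the second and third pixel keeps two text instances apart.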

207. Store the positioning result in a preset file.

The server stores the positioning result in a preset file. Specifically, the server performs positioning on the distortion-corrected image to obtain multiple positioning rectangular areas. The server records the coordinates of the upper-left point and the lower-right point of each positioning rectangular area, and saves the multiple positioning results in txt format. For example, the server performs text positioning on a bill of a rural commercial bank, and the positioning result includes 6 rectangular boxes and the text information located by those boxes. The server saves the result in the file sds_0.txt, whose content is as follows:

standard_build/sds_0.png|13 14 744 49|

standard_build/sds_0.png|22 52 645 88|

standard_build/sds_0.png|12 94 446 130|

standard_build/sds_0.png|28 135 775 170|

standard_build/sds_0.png|13 177 544 212|

standard_build/sds_0.png|22 217 348 252|

It should be noted that the positioning result in the sds_0.txt file can be further used for text recognition. At the same time, the positioning result includes a preset mark, which is used to prompt the text recognition to discard the line. For example, for the positioning result standard_build/sds_0.png|13 14 744 49|XXXX, where XXXX is a preset mark, which is used to instruct the server not to perform text recognition. The positioning result can also be marked with other types of preset marks, which are not specifically limited here.
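A reader of such a file could be sketched as follows, assuming each line has the form image path, four integers, then an optional mark, separated by vertical bars as in the lines shown above. The four integers are read as the upper-left and lower-right coordinates; the names `Located`, `parse_line` and `lines_to_recognize` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Located(NamedTuple):
    image: str
    box: Tuple[int, int, int, int]  # left, top, right, bottom
    mark: Optional[str]             # preset mark, or None if absent


def parse_line(line: str) -> Located:
    """Split one 'path|x1 y1 x2 y2|mark' line into its parts."""
    image, coords, mark = line.rstrip("\n").split("|", 2)
    left, top, right, bottom = (int(v) for v in coords.split())
    return Located(image, (left, top, right, bottom), mark or None)


def lines_to_recognize(lines: List[str], skip_mark: str = "XXXX") -> List[Located]:
    """Parse all non-empty lines and drop those carrying the skip mark."""
    parsed = [parse_line(line) for line in lines if line.strip()]
    return [p for p in parsed if p.mark != skip_mark]


sample = [
    "standard_build/sds_0.png|13 14 744 49|XXXX",
    "standard_build/sds_0.png|22 52 645 88|",
]
kept = lines_to_recognize(sample)
```

Only the second sample line is passed on to text recognition, because the first carries the preset mark.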

Optionally, the server determines the newly-added type of bill image or credential image; the server sets the newly-added type of bill image or credential image as the sample image to be trained; the server iterates the preset image segmentation network according to the sample image to be trained optimization. For example, the current bill types include 1 to 10 categories. When the detection increases to 11 categories, the newly-added bill image is set as the sample image to be trained, and the image segmentation network is iteratively optimized based on the 11th type of bill image . It is understandable that before the iterative optimization of the preset image segmentation network, the parameters in the preset image segmentation network are frozen, and then the iterative optimization is performed.

The text positioning method based on image segmentation in the embodiment of this application is described above, and the text positioning device based on image segmentation in the embodiment of this application is described below. Referring to FIG. 3, one embodiment of the text positioning device based on image segmentation in the embodiment of this application includes:

The acquiring unit 301 is configured to acquire an original image, and the original image is a bill image or a certificate image collected in a text background;

The segmentation unit 302 is configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;

The transformation unit 303 is configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;

The positioning unit 304 is configured to perform text positioning on the image after the distortion correction to obtain a positioning result.

Referring to FIG. 4, another embodiment of the text positioning device based on image segmentation in the embodiment of the present application includes:

Optionally, the segmentation unit 302 may further include:

The input subunit 3021 is used to input the original image into the preset image segmentation network model;

The first segmentation subunit 3022 is configured to perform image semantic segmentation on the original image through a preset image segmentation network model to obtain segmentation label images and image types;

The second segmentation subunit 3023 is configured to segment the original image according to the segmented label image to obtain a distorted image, and the distorted image is a bill image or a certificate image.

Optionally, the second segmentation subunit 3023 may also be specifically used for:

Determine the area to be segmented according to the segmentation label image, and set the pixel value in the area to be segmented to 1, and set the pixel value outside the area to be segmented to 0 to obtain a mask image;

The original image and the mask image are multiplied to obtain a distorted image. The distorted image is used to indicate the bill image or the document image separated from the text background from the original image.
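The mask step above can be illustrated on a tiny single-channel image. This is a dependency-free sketch that represents images as nested lists; a real implementation would more likely use an array library. The helper names `build_mask` and `apply_mask`, and the choice of label value 1 for the bill region, are assumptions made for illustration.

```python
def build_mask(label_image, target_label):
    """Pixels carrying the target label become 1, all others become 0."""
    return [[1 if v == target_label else 0 for v in row] for row in label_image]


def apply_mask(image, mask):
    """Element-wise multiplication: background pixels are zeroed out."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [[p * m for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]


# Segmentation label image: 1 marks the bill region, 0 the text background.
label_image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
original = [
    [9, 8, 7, 6],
    [5, 4, 3, 2],
    [1, 2, 3, 4],
]

mask = build_mask(label_image, target_label=1)
distorted = apply_mask(original, mask)
# distorted == [[0, 0, 0, 0], [0, 4, 3, 0], [0, 2, 3, 0]]
```

Only the pixels inside the area to be segmented keep their original values; for a colour image the same mask would be applied to each channel.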

Optionally, the transformation unit 303 may also be specifically configured to:

Determine the standard image corresponding to the distorted image according to the image type, and determine the coordinates of three pixel reference points from the standard image;

Determine the corresponding pixel coordinates from the distorted image according to the coordinates of the three pixel reference points;

Calculate the affine transformation matrix according to the coordinates of the three pixel reference points and the corresponding pixel coordinates;

According to the affine transformation matrix, the distorted image is subjected to affine transformation to obtain the image after distortion correction.
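Three point correspondences determine a 2x3 affine matrix uniquely, provided the points are not collinear: each output coordinate satisfies x' = a·x + b·y + c, giving one 3x3 linear system per matrix row. The following sketch solves those systems with Cramer's rule in plain Python. The function names are hypothetical, and the source does not say how the matrix is computed; libraries such as OpenCV offer `cv2.getAffineTransform` and `cv2.warpAffine` for the same purpose.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(A, rhs):
    """Solve A @ x = rhs for a 3x3 system using Cramer's rule."""
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("reference points are collinear")
    solution = []
    for col in range(3):
        Ac = [row[:] for row in A]
        for r in range(3):
            Ac[r][col] = rhs[r]
        solution.append(det3(Ac) / d)
    return solution


def affine_from_points(src, dst):
    """2x3 matrix mapping three src points (distorted) onto dst points (standard)."""
    A = [[float(x), float(y), 1.0] for x, y in src]
    row_x = solve3(A, [float(x) for x, _ in dst])
    row_y = solve3(A, [float(y) for _, y in dst])
    return [row_x, row_y]


def apply_affine(M, point):
    """Map one (x, y) point through the 2x3 affine matrix."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])


src = [(0, 0), (1, 0), (0, 1)]        # pixel coordinates in the distorted image
dst = [(10, -5), (12, -5), (10, -2)]  # reference points in the standard image
M = affine_from_points(src, dst)      # scale by (2, 3), then shift by (10, -5)
```

Applying `M` to every pixel coordinate (with interpolation) would yield the distortion-corrected image; that warping step is omitted here.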

Optionally, the positioning unit 304 may also be specifically configured to:

Determine a template corresponding to the distortion-corrected image according to the image type, the template includes at least one rectangular frame, and the rectangular frame is used to indicate the location area where the forward text is located according to the preset coordinate values;

Perform text positioning on the image after distortion correction according to the preset algorithm and template to obtain the positioning result;

Store the positioning result in a preset file.
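The template lookup can be sketched as a dictionary keyed by image type. The source does not specify the preset algorithm, so this example substitutes a simple stand-in that clips each template rectangle to the bounds of the distortion-corrected image; the `TEMPLATES` table, the `bank_receipt` type name, and `locate_text` are all hypothetical.

```python
# Hypothetical template table: image type -> rectangles (x1, y1, x2, y2)
TEMPLATES = {
    "bank_receipt": [(13, 14, 744, 49), (22, 52, 645, 88)],
}


def locate_text(image_type, width, height, templates=TEMPLATES):
    """Return the template rectangles for this image type, clipped to the image."""
    boxes = templates.get(image_type)
    if not boxes:
        raise KeyError(f"no template for image type: {image_type}")
    located = []
    for x1, y1, x2, y2 in boxes:
        cx1, cy1 = max(0, x1), max(0, y1)
        cx2, cy2 = min(width, x2), min(height, y2)
        # Keep only rectangles that still have a positive area after clipping.
        if cx2 > cx1 and cy2 > cy1:
            located.append((cx1, cy1, cx2, cy2))
    return located


result = locate_text("bank_receipt", width=700, height=60)
# result == [(13, 14, 700, 49), (22, 52, 645, 60)]
```

Because the image has already been corrected to match the standard image, fixed template coordinates are enough to indicate where the forward text lies.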

Optionally, the obtaining unit 301 may also be specifically configured to:

Receive the bill image or certificate image collected in the text background, and set the bill image or certificate image as the original image;

Set the name of the original image according to the preset format, and store the original image in the preset path to obtain the storage path of the original image;

Write the storage path of the original image and the name of the original image into the target data table.
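The naming and bookkeeping steps can be sketched with the standard-library `sqlite3` module. The preset format, the preset path, and the table layout are not given in the source, so the directory `/data/original_images`, the `type_date_index.png` naming scheme, and the `original_images` table below are assumptions chosen only to make the example concrete. Writing the image bytes to disk is omitted.

```python
import sqlite3
from datetime import datetime
from pathlib import PurePosixPath

PRESET_DIR = PurePosixPath("/data/original_images")  # hypothetical preset path


def make_image_name(image_type: str, index: int, when: datetime) -> str:
    """Hypothetical preset format: <type>_<YYYYMMDD>_<4-digit index>.png"""
    return f"{image_type}_{when:%Y%m%d}_{index:04d}.png"


def register_image(conn, image_type: str, index: int, when: datetime) -> str:
    """Name the image, derive its storage path, and record both in the data table."""
    name = make_image_name(image_type, index, when)
    path = str(PRESET_DIR / name)
    conn.execute(
        "INSERT INTO original_images (name, path) VALUES (?, ?)", (name, path)
    )
    conn.commit()
    return path


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE original_images (name TEXT PRIMARY KEY, path TEXT NOT NULL)"
)
stored = register_image(conn, "bill", 1, datetime(2019, 9, 19))
# stored == "/data/original_images/bill_20190919_0001.png"
```

Later stages can then look up an original image by name in the target data table instead of scanning the storage directory.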

Optionally, the text positioning device based on image segmentation may further include:

The determining unit 305 is used to determine the newly-added type of bill image or certificate image;

The setting unit 306 is configured to set the newly-added type of bill image or certificate image as the sample image to be trained;

The iterative unit 307 is configured to iteratively optimize the preset image segmentation network model according to the sample image to be trained.

FIG. 3 and FIG. 4 above describe in detail the text positioning device based on image segmentation in the embodiment of the present application from the perspective of modular functional entities. The following describes in detail the text positioning device based on image segmentation in the embodiment of the present application from the perspective of hardware processing.

FIG. 5 is a schematic structural diagram of a text positioning device based on image segmentation provided by an embodiment of the present application. The text positioning device 500 based on image segmentation may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 501 (for example, one or more processors), a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506. The memory 509 and the storage medium 508 may be transient storage or persistent storage. The program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for text positioning based on image segmentation. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the text positioning device 500 based on image segmentation, the series of instruction operations in the storage medium 508.

The text positioning device 500 based on image segmentation may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input and output interfaces 504, and/or one or more operating systems 505, for example, Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art can understand that the structure of the text positioning device based on image segmentation shown in FIG. 5 does not constitute a limitation on the text positioning device based on image segmentation, which may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.

The present application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer executes the following steps:

Acquiring an original image, where the original image is a bill image or a certificate image collected in a text background;

Performing image segmentation on the original image by using a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image;

Performing affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;

The text positioning is performed on the image after the distortion correction to obtain the positioning result.

Those skilled in the art can clearly understand that, for the convenience and conciseness of the description, the specific working process of the above-described system, device, and unit can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A text positioning method based on image segmentation, including:

Acquiring an original image, where the original image is a bill image or a certificate image collected in a text background;

Performing image segmentation on the original image by using a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image;

Performing affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;

The text positioning is performed on the image after the distortion correction to obtain the positioning result.
The text positioning method based on image segmentation according to claim 1, wherein performing image segmentation on the original image by using a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image, includes:

Input the original image into a preset image segmentation network model;

Performing image semantic segmentation on the original image by using the preset image segmentation network model to obtain segmentation label images and image types;

The original image is segmented according to the segmented label image to obtain a distorted image, and the distorted image is the bill image or the certificate image.
The text positioning method based on image segmentation according to claim 2, wherein segmenting the original image according to the segmentation label image to obtain a distorted image, the distorted image being the bill image or the certificate image, includes:

Determine the area to be divided according to the segmentation label image, and set the pixel value in the area to be divided to 1, and set the pixel value outside the area to be divided to 0 to obtain a mask image;

A multiplication operation is performed on the original image and the mask image to obtain a distorted image, and the distorted image is used to indicate the bill image or the certificate image separated from the text background from the original image.
The method for character positioning based on image segmentation according to claim 2, wherein said performing affine transformation on said distorted image to obtain a distortion-corrected image, wherein the text in said distortion-corrected image is forward text comprises:

Determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image;

Determining corresponding pixel coordinates from the distorted image according to the coordinates of the three pixel reference points;

Calculating an affine transformation matrix according to the coordinates of the three pixel reference points and the corresponding pixel coordinates;

Perform affine transformation on the distorted image according to the affine transformation matrix to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text.
The method for text positioning based on image segmentation according to claim 4, wherein said positioning the text on the image after distortion correction to obtain a positioning result comprises:

Determining a template corresponding to the distortion-corrected image according to the image type, the template including at least one rectangular frame, the rectangular frame being used to indicate the location area where the forward text is located according to preset coordinate values;

Perform text positioning on the distortion-corrected image according to a preset algorithm and the template to obtain a positioning result;

Store the positioning result in a preset file.
The method for text positioning based on image segmentation according to claim 1, wherein said acquiring an original image, said original image being a bill image or a certificate image collected in a text background, comprises:

Receiving the bill image or the certificate image collected in the text background, and setting the bill image or the certificate image as the original image;

Setting the name of the original image according to a preset format, and storing the original image in a preset path to obtain the storage path of the original image;

The storage path of the original image and the name of the original image are written into the target data table.
The text positioning method based on image segmentation according to any one of claims 1 to 6, wherein after performing text positioning on the distortion-corrected image to obtain the positioning result, the text positioning method based on image segmentation further includes:

Determine the newly-added type of bill image or certificate image;

Setting the newly-added type of bill image or certificate image as a sample image to be trained;

Perform iterative optimization on the preset image segmentation network model according to the sample image to be trained.
A text positioning device based on image segmentation, including:

An acquiring unit, configured to acquire an original image, the original image being a bill image or a certificate image collected in a text background;

A segmentation unit, configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image;

A transformation unit, configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;

The positioning unit is used to position the text on the image after the distortion correction to obtain the positioning result.
The text positioning device based on image segmentation according to claim 8, wherein the segmentation unit comprises:

The input subunit is used to input the original image into a preset image segmentation network model;

The first segmentation subunit is configured to perform image semantic segmentation on the original image through the preset image segmentation network model to obtain segmentation label images and image types;

The second segmentation subunit is configured to segment the original image according to the segmented label image to obtain a distorted image, where the distorted image is the bill image or the certificate image.
The text positioning device based on image segmentation according to claim 9, wherein the second segmentation subunit is specifically configured to:

Determine the area to be divided according to the segmentation label image, and set the pixel value in the area to be divided to 1, and set the pixel value outside the area to be divided to 0 to obtain a mask image;

A multiplication operation is performed on the original image and the mask image to obtain a distorted image, and the distorted image is used to indicate the bill image or the certificate image separated from the text background from the original image.
The text positioning device based on image segmentation according to claim 9, wherein the transformation unit is specifically configured to:

Determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image;

Determining corresponding pixel coordinates from the distorted image according to the coordinates of the three pixel reference points;

Calculating an affine transformation matrix according to the coordinates of the three pixel reference points and the corresponding pixel coordinates;

Perform affine transformation on the distorted image according to the affine transformation matrix to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text.
The text positioning device based on image segmentation according to claim 11, wherein the positioning unit is specifically configured to:

Determining a template corresponding to the distortion-corrected image according to the image type, the template including at least one rectangular frame, the rectangular frame being used to indicate a location area where the forward text is located according to preset coordinate values;

Perform text positioning on the distortion-corrected image according to a preset algorithm and the template to obtain a positioning result; and store the positioning result in a preset file.
The text positioning device based on image segmentation according to claim 8, wherein the acquiring unit is specifically configured to:

Receiving the bill image or the certificate image collected in the text background, and setting the bill image or the certificate image as the original image;

Setting the name of the original image according to a preset format, and storing the original image in a preset path to obtain the storage path of the original image;

The storage path of the original image and the name of the original image are written into the target data table.
The text positioning device based on image segmentation according to any one of claims 8 to 13, the text positioning device based on image segmentation further comprising:

The determining unit is used to determine the newly-added type of bill image or certificate image;

A setting unit, configured to set the newly-added type of bill image or certificate image as a sample image to be trained;

The iterative unit is configured to iteratively optimize the preset image segmentation network model according to the sample image to be trained.
A text positioning device based on image segmentation, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:

Acquiring an original image, where the original image is a bill image or a certificate image collected in a text background;

Performing image segmentation on the original image by using a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image;

Performing affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;

The text positioning is performed on the image after the distortion correction to obtain the positioning result.
The text positioning device based on image segmentation according to claim 15, wherein, when the processor executes the computer program to implement performing image segmentation on the original image by using a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image, the following steps are included:

Input the original image into a preset image segmentation network model;

Performing image semantic segmentation on the original image by using the preset image segmentation network model to obtain segmentation label images and image types;

The original image is segmented according to the segmented label image to obtain a distorted image, and the distorted image is the bill image or the certificate image.
The text positioning device based on image segmentation according to claim 16, wherein, when the processor executes the computer program to implement segmenting the original image according to the segmentation label image to obtain a distorted image, the distorted image being the bill image or the certificate image, the following steps are included:

Determine the area to be divided according to the segmentation label image, and set the pixel value in the area to be divided to 1, and set the pixel value outside the area to be divided to 0 to obtain a mask image;

A multiplication operation is performed on the original image and the mask image to obtain a distorted image, and the distorted image is used to indicate the bill image or the certificate image separated from the text background from the original image.
The text positioning device based on image segmentation according to claim 16, wherein, when the processor executes the computer program to implement performing affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text, the following steps are included:

Determining a standard image corresponding to the distorted image according to the image type, and determining three pixel reference point coordinates from the standard image;

Determining corresponding pixel coordinates from the distorted image according to the coordinates of the three pixel reference points;

Calculating an affine transformation matrix according to the coordinates of the three pixel reference points and the corresponding pixel coordinates;

Perform affine transformation on the distorted image according to the affine transformation matrix to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text.
The text positioning device based on image segmentation according to claim 18, wherein, when the processor executes the computer program to implement performing text positioning on the distortion-corrected image to obtain a positioning result, the following steps are included:

Determining a template corresponding to the distortion-corrected image according to the image type, the template including at least one rectangular frame, the rectangular frame being used to indicate the location area where the forward text is located according to preset coordinate values;

Perform text positioning on the distortion-corrected image according to a preset algorithm and the template to obtain a positioning result;

Store the positioning result in a preset file.
A computer-readable storage medium in which computer instructions are stored, and when the computer instructions are executed on a computer, the computer is caused to perform the following steps:

Acquiring an original image, where the original image is a bill image or a certificate image collected in a text background;

Performing image segmentation on the original image by using a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image;

Performing affine transformation on the distorted image to obtain a distortion-corrected image, and the text in the distortion-corrected image is a forward text;

The text positioning is performed on the image after the distortion correction to obtain the positioning result.