WO2021051527A1 - Image segmentation-based text positioning method, apparatus and device, and storage medium - Google Patents


Info

Publication number
WO2021051527A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
text
segmentation
distorted
distortion
Application number
PCT/CN2019/117036
Other languages
French (fr)
Chinese (zh)
Inventor
Sun Qiang (孙强)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Application filed by Ping An Technology (Shenzhen) Co., Ltd.


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/146: Aligning or centring of the image pick-up or image-field
    • G06V30/1475: Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478: Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/02: Affine transformations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/80: Geometric correction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Definitions

  • This application relates to the field of computer technology, and in particular to a text positioning method, apparatus, device, and storage medium based on image segmentation.
  • In optical character recognition (OCR), an electronic device such as a scanner or digital camera examines characters printed on paper. It then uses character recognition methods to translate the character shapes into computer text. In other words, the text material is scanned, and the resulting image file is analyzed and processed to obtain the text and layout information.
  • OCR comprises text positioning and text recognition. Text positioning means precisely locating the text in an image, and is mainly based on extracting relevant text features.
  • The main purpose of this application is to solve the technical problem of low text-positioning accuracy in images with complex text backgrounds.
  • A first aspect of the present application provides a text positioning method based on image segmentation, including: acquiring an original image, the original image being a bill image or a certificate image collected against a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image; performing affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text; and performing text positioning on the distortion-corrected image to obtain a positioning result.
  • A second aspect of the present application provides a text positioning apparatus based on image segmentation, including: an acquisition unit, configured to acquire an original image, the original image being a bill image or a certificate image collected against a text background; a segmentation unit, configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image; a transformation unit, configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text; and a positioning unit, configured to perform text positioning on the distortion-corrected image to obtain a positioning result.
  • A third aspect of the present application provides a text positioning device based on image segmentation, including a memory and at least one processor. The memory stores instructions, and the memory and the at least one processor are interconnected by wires. The at least one processor calls the instructions in the memory so that the text positioning device based on image segmentation executes the method described in the first aspect.
  • A fourth aspect of the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the method described in the first aspect.
  • In the technical solution provided by this application, the following steps are performed:
    acquiring an original image, the original image being a bill image or a certificate image collected against a text background;
    performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, which is the bill image or the certificate image;
    performing affine transformation on the distorted image to obtain a distortion-corrected image, the text in which is forward text; and
    performing text positioning on the distortion-corrected image to obtain a positioning result.
  • An accurate foreground image is obtained by processing an image with a complex background through an image segmentation network. Text positioning is then performed on the foreground image according to a preset template to obtain the positioning result. This improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
  • FIG. 1 is a schematic diagram of an embodiment of the text positioning method based on image segmentation in an embodiment of this application;
  • FIG. 2 is a schematic diagram of another embodiment of the text positioning method based on image segmentation in an embodiment of this application;
  • FIG. 3 is a schematic diagram of an embodiment of the text positioning apparatus based on image segmentation in an embodiment of this application;
  • FIG. 4 is a schematic diagram of another embodiment of the text positioning apparatus based on image segmentation in an embodiment of this application;
  • FIG. 5 is a schematic diagram of an embodiment of the text positioning device based on image segmentation in an embodiment of this application.
  • The embodiments of the present application provide a text positioning method, apparatus, device, and storage medium based on image segmentation. An image segmentation network is applied to an image with a complex background to obtain an accurate foreground image. Text positioning is then performed on the foreground image according to a preset template to obtain a positioning result. This improves the accuracy of image text positioning and enhances robustness to complex backgrounds.
  • An embodiment of the text positioning method based on image segmentation in the embodiment of the present application includes:
  • The server acquires the original image, which is a bill image or a certificate image collected against a text background.
  • A text background with strong interference means that text targets, especially handwritten digits and printed text, are present in the background of the original image. This makes it more difficult to position the text directly in the original image. Specifically, the server receives the bill image or certificate image collected against the text background and sets it as the original image. The server then stores the original image in a preset path according to a preset format and records the storage path of the original image in a data table.
  • The server stores the original image in the preset path according to the preset format, and obtains the storage path and the name of the original image.
  • The preset format includes a preset naming rule and a picture format.
  • The picture format may be jpg, png, or another picture format, which is not specifically limited here.
  • The server performs image segmentation on the original image through a preset image segmentation network model to obtain a distorted image.
  • The distorted image is a bill image or a certificate image.
  • Specifically, the server performs image segmentation on the original image according to the preset image segmentation network model to obtain a segmentation label image. The server then determines a mask image according to the segmentation label image and processes the original image according to the mask image to obtain the distorted image, where:
  • the distorted image is the partial image obtained after the server removes the complex background from the original image; and
  • the partial image has the shape of an irregular quadrilateral and contains the bill image or the certificate image.
  • The server trains the image segmentation network model on preset samples to determine its parameters. The result is the preset image segmentation network model, which is used to perform image segmentation on the original image.
  • The server performs affine transformation on the distorted image to obtain a distortion-corrected image.
  • The text in the distortion-corrected image is forward text.
  • Forward text means text that takes the horizontal reference as its positive direction and is not upside down. That is, a distorted image deviating from the horizontal reference by 90, 180, or 270 degrees is corrected to 0 degrees from the horizontal reference, so that the text in the distortion-corrected image is forward text.
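The correction to forward text described above can be sketched for the case where the deviation is an exact multiple of 90 degrees. This minimal Python sketch treats an image as a list of pixel rows. It assumes the deviation angle is already known, because how it is detected is not specified here. The helpers `rotate_cw` and `to_forward` are hypothetical names.

```python
def rotate_cw(grid):
    """Rotate a 2-D grid (a list of pixel rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def to_forward(grid, deviation_deg):
    """Undo a clockwise deviation of 0, 90, 180 or 270 degrees.

    Hypothetical helper: `deviation_deg` is assumed to be known already.
    """
    if deviation_deg % 90 != 0:
        raise ValueError("only multiples of 90 degrees are handled here")
    k = (deviation_deg // 90) % 4
    # k clockwise quarter turns are undone by (4 - k) further clockwise turns.
    for _ in range((4 - k) % 4):
        grid = rotate_cw(grid)
    return grid


upright = [[1, 2, 3],
           [4, 5, 6]]
tilted = rotate_cw(upright)      # the image arrived rotated 90 degrees clockwise
print(to_forward(tilted, 90))    # [[1, 2, 3], [4, 5, 6]]
```

On real image arrays, the same quarter-turn logic is usually delegated to a library rotation routine. The list version only keeps the sketch self-contained.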
  • Specifically, the server determines the affine transformation rule corresponding to the distorted image. It then performs affine transformation on the distorted image according to that rule and a preset size to obtain the distortion-corrected image. It can be understood that the distorted image is an irregular quadrilateral image.
  • The server corrects the distortion of the distorted image through the affine transformation to obtain the distortion-corrected image.
  • The text in the distortion-corrected image is forward text.
  • The size of the distortion-corrected image is a preset fixed value, consistent with the size of the template corresponding to that image.
  • An affine transformation maps two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v). That is, it maps each point of the original image to the corresponding point of the target image through a linear transformation followed by a translation. It covers rotation, translation, scaling, and shearing of the original image.
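As a worked illustration of the mapping from (x, y) to (u, v), the sketch below assumes the common parameterization u = a·x + b·y + c and v = d·x + e·y + f. The variable names a to f follow the later description, but their exact layout is an assumption. The sketch shows rotation, translation, scaling, and shearing as special cases that compose into a single matrix. All function names are hypothetical.

```python
import math


def apply_affine(m, x, y):
    """Map (x, y) to (u, v) with a 2x3 matrix m = [[a, b, c], [d, e, f]]."""
    (a, b, c), (d, e, f) = m
    return a * x + b * y + c, d * x + e * y + f


def compose(m2, m1):
    """Return the single affine matrix equivalent to applying m1, then m2."""
    (a2, b2, c2), (d2, e2, f2) = m2
    (a1, b1, c1), (d1, e1, f1) = m1
    return [
        [a2 * a1 + b2 * d1, a2 * b1 + b2 * e1, a2 * c1 + b2 * f1 + c2],
        [d2 * a1 + e2 * d1, d2 * b1 + e2 * e1, d2 * c1 + e2 * f1 + f2],
    ]


def rotation(theta):
    """Rotate by theta radians (counterclockwise in a y-up frame)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0]]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty]]


def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0]]


def shear(k):
    """Horizontal shear: u = x + k*y, v = y."""
    return [[1.0, k, 0.0], [0.0, 1.0, 0.0]]


# Scale by (2, 3), then shift by (10, 5): the point (1, 1) lands on (12, 8).
m = compose(translation(10, 5), scaling(2, 3))
print(apply_affine(m, 1, 1))   # (12.0, 8.0)
```

Because every step is a 2×3 matrix of the same form, any chain of rotations, translations, scalings, and shears collapses into one set of six parameters. That is why the method only needs to determine a to f.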
  • The server performs text positioning on the distortion-corrected image to obtain the positioning result. Specifically, the server performs text positioning on the distortion-corrected image according to a preset algorithm and template.
  • The template includes at least one rectangular box. Each box indicates, by preset coordinate values, the area where forward text is located. The positioning result is the text-positioning coordinate information selected from the distortion-corrected image, and the number of text-positioning coordinate entries equals the number of rectangular boxes.
    For example, suppose the distortion-corrected bill image contains the bank name "a rural commercial bank" and the words "transfer check". The server matches the corresponding template, which contains two rectangular boxes indicating where the bank name and "transfer check" are located. The positioning result is determined from the preset coordinate values of the two boxes. It includes the bank name, "transfer check", and the preset coordinate values of the two boxes.
  • Another embodiment of the text positioning method based on image segmentation in the embodiments of the present application includes:
  • The server acquires the original image, which is a bill image or a certificate image collected against a text background. Specifically, the server performs the following steps:
    receiving the bill image or certificate image collected against the text background and setting it as the original image;
    naming the original image according to the preset format and storing it in a preset path to obtain its storage path, where the preset path is a preset file directory, the preset format includes a preset naming rule and a picture format, and the picture format may be jpg, png, or another picture format, which is not specifically limited here; and
    writing the storage path and the name of the original image into a target data table.
  • For example, the server receives a bank bill image, sets it as the original image, and names it bank1.jpg. The server then stores bank1.jpg in the directory /var/www/html/bankimage and writes the storage path and name of the original image into the target data table, where:
  • the name of the original image is bank1.jpg; and
  • the storage path of the original image is /var/www/html/bankimage/bank1.jpg.
  • The server generates a Structured Query Language (SQL) INSERT statement from the storage path and the name of the original image. It then writes them into the target data table by executing that statement.
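The bookkeeping step can be sketched with Python's built-in `sqlite3` module, standing in for whatever database the server actually uses. The table name `original_images`, its columns, and the helper `register_image` are hypothetical. A parameterized INSERT is used instead of building the SQL string by hand.

```python
import posixpath
import sqlite3


def register_image(conn, storage_path):
    """Record an original image's name and storage path in a target data table.

    The table name `original_images` and its columns are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS original_images ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, path TEXT NOT NULL)"
    )
    name = posixpath.basename(storage_path)  # e.g. "bank1.jpg"
    # Parameterized INSERT: the driver quotes the values, avoiding SQL injection.
    conn.execute(
        "INSERT INTO original_images (name, path) VALUES (?, ?)",
        (name, storage_path),
    )
    conn.commit()
    return name


conn = sqlite3.connect(":memory:")
register_image(conn, "/var/www/html/bankimage/bank1.jpg")
print(conn.execute("SELECT name, path FROM original_images").fetchall())
# [('bank1.jpg', '/var/www/html/bankimage/bank1.jpg')]
```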
  • A text background with strong noise means that text targets, especially handwritten digits and printed text, are present in the background of the original image. As a result, text is difficult to locate if positioning is performed directly on the original image.
  • The server inputs the original image into the preset image segmentation network model. Through that model, it performs semantic segmentation on the original image to obtain the segmentation label image and the image type. Further, the server uses a preset DeepLabv3+ model to perform the semantic segmentation; it can be understood that the preset DeepLabv3+ model is the preset image segmentation network model.
  • The main purpose of performing semantic segmentation on the original image with the preset DeepLabv3+ model is to assign a semantic label to every pixel of the original image. That is, the value of each pixel in the segmentation label image represents the class of that pixel.
  • DeepLabv3+ is a state-of-the-art deep learning model for image semantic segmentation whose goal is to assign a semantic label to each pixel of the input image. It includes a simple and efficient decoder module that refines the segmentation results.
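The per-pixel labelling can be illustrated without the network itself. A semantic segmentation model such as DeepLabv3+ outputs one score per class for every pixel, and the label image is the per-pixel argmax. The sketch below makes three assumptions:

- the scores are already available as nested lists;
- class 0 means background;
- the image type can be taken as the most frequent non-background label. This is a simplification, since the text does not say how the image type is obtained.

The class meanings and helper names are hypothetical.

```python
from collections import Counter


def label_map(scores):
    """Per-pixel argmax: scores[y][x] holds one score per class."""
    return [[max(range(len(s)), key=s.__getitem__) for s in row] for row in scores]


def dominant_type(labels, background=0):
    """Most frequent non-background label, used here as the image type (assumption)."""
    counts = Counter(v for row in labels for v in row if v != background)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical classes: 0 = background, 1 = ID card, 2 = bank card.
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
]
labels = label_map(scores)
print(labels)                 # [[0, 1], [1, 2]]
print(dominant_type(labels))  # 1
```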
  • The original image is segmented according to the segmentation label image to obtain a distorted image, where the distorted image is a bill image or a certificate image.
  • The server segments the original image according to the segmentation label image to obtain the distorted image.
  • The distorted image is a bill image or a certificate image. Specifically, the server determines the area to be segmented according to the segmentation label image. It sets the pixel values inside that area to 1 and those outside it to 0 to obtain a mask image. The server then multiplies the original image by the mask image to obtain the distorted image.
  • The distorted image represents the bill image or certificate image separated from the text background of the original image.
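The mask construction and multiplication just described can be sketched on a grayscale image stored as nested lists. The choice of `target` label and the helper names are assumptions for illustration. For a colour image, the same 0/1 mask would be applied to every channel.

```python
def make_mask(labels, target):
    """1 for pixels inside the area to be segmented (label == target), else 0."""
    return [[1 if v == target else 0 for v in row] for row in labels]


def apply_mask(image, mask):
    """Pixel-wise product of a grayscale image and a 0/1 mask of the same size."""
    if len(image) != len(mask) or any(len(a) != len(b) for a, b in zip(image, mask)):
        raise ValueError("image and mask sizes differ")
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


image = [[10, 20, 30],
         [40, 50, 60]]
labels = [[0, 1, 1],
          [0, 1, 0]]
mask = make_mask(labels, target=1)
print(apply_mask(image, mask))  # [[0, 20, 30], [0, 50, 0]]
```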
  • The server compares the original image with the segmentation label image to obtain a comparison result, and determines the area to be segmented according to that result. The server segments that area to obtain the distorted image, which is a bill image or a certificate image, and then stores the distorted image.
  • The file finally saved contains the four-point coordinates of the foreground and has the same name as the original image.
  • For example, the server performs image segmentation on a certificate image named image1.png that contains two certificates. It obtains eight coordinate points, four for the foreground image of each certificate, and saves the coordinates of the two certificate foreground images to a file.
  • The content of the file is as follows:
  • The server performs affine transformation on the distorted image to obtain a distortion-corrected image, in which the text is forward text.
  • Forward text means text that takes the horizontal reference as its positive direction and is not upside down. That is, a distorted image deviating from the horizontal reference by 90, 180, or 270 degrees is corrected to 0 degrees from the horizontal reference, so that the text in the distortion-corrected image is forward text.
  • Specifically, the server performs the following steps:
    determining the standard image corresponding to the distorted image according to the image type, and determining the coordinates of three pixel reference points in the standard image;
    determining the corresponding pixel coordinates in the distorted image according to the three reference-point coordinates;
    calculating the affine transformation matrix from the coordinates of the three pixel reference points and the corresponding pixel coordinates; and
    performing affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, in which the text is forward text.
  • For example, the server determines from the standard image of an ID card that the coordinates of the three pixel reference points are $D(x_1, y_1)$, $E(x_2, y_2)$, and $F(x_3, y_3)$.
  • According to the reference points D, E, and F, the server determines the corresponding pixel coordinates $D'(x'_1, y'_1)$, $E'(x'_2, y'_2)$, and $F'(x'_3, y'_3)$ in the distorted image. It then calculates according to the homogeneous coordinate formula, which is as follows:
  • where (x, y) corresponds to the pixel coordinates of the distorted image,
  • and (u, v) corresponds to the coordinates of the three pixel reference points of the standard ID card image.
  • The server substitutes the pairs $D'(x'_1, y'_1)$ and $D(x_1, y_1)$, $E'(x'_2, y'_2)$ and $E(x_2, y_2)$, and $F'(x'_3, y'_3)$ and $F(x_3, y_3)$ into the homogeneous coordinate formula in turn to obtain the affine transformation matrix. That is, it determines the values of the matrix variables a, b, c, d, e, and f. The server then applies the affine transformation matrix to the distorted image to obtain the distortion-corrected ID card image, whose corresponding size is 85.6 mm × 54 mm. It can be understood that, when affine transformation is performed on the distorted image:
  • the affine transformation maps two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v);
  • the distorted image is an irregular quadrilateral image; and
  • the affine transformation maps each point of the original image to the corresponding point of the target image, covering rotation, translation, scaling, and shearing of the original image, and finally transforms the distorted image from an irregular quadrilateral into a rectangle.
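The homogeneous coordinate formula referred to above is not spelled out in this text. Assume it has the usual form u = a·x + b·y + c and v = d·x + e·y + f, where (x, y) is a point of the distorted image and (u, v) the matching reference point of the standard image. The three point pairs then give six linear equations for the six unknowns a to f. The sketch below solves them with Cramer's rule. `solve_affine` and the sample coordinates are hypothetical.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_affine(src, dst):
    """Return [[a, b, c], [d, e, f]] such that u = a*x + b*y + c, v = d*x + e*y + f.

    src: three (x, y) pixel points in the distorted image (D', E', F').
    dst: the matching (u, v) reference points in the standard image (D, E, F).
    """
    m = [[float(x), float(y), 1.0] for x, y in src]
    det = det3(m)
    if abs(det) < 1e-12:
        raise ValueError("the three points are collinear")
    result = []
    for k in range(2):  # k = 0: a, b, c from the u values; k = 1: d, e, f from v
        rhs = [float(p[k]) for p in dst]
        row = []
        for col in range(3):
            # Cramer's rule: replace column `col` with the right-hand side.
            mc = [r[:col] + [rhs[i]] + r[col + 1:] for i, r in enumerate(m)]
            row.append(det3(mc) / det)
        result.append(row)
    return result


# Hypothetical points: three corners found in the distorted image and their
# positions in an 856 x 540 pixel standard ID-card image.
src = [(12, 30), (640, 10), (20, 420)]
dst = [(0, 0), (856, 0), (0, 540)]
a_f = solve_affine(src, dst)
```

OpenCV's `cv2.getAffineTransform` computes the same 2 × 3 matrix from three point pairs, and `cv2.warpAffine` applies it to an image. An affine map keeps parallel lines parallel, so it can turn an irregular quadrilateral into an exact rectangle only when that quadrilateral is a parallelogram. For a general four-corner fit, `cv2.getPerspectiveTransform` with four point pairs is the usual alternative.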
  • 205. Determine a template corresponding to the distortion-corrected image according to the image type, where the template includes at least one rectangular box, and each rectangular box indicates, by preset coordinate values, the area where forward text is located.
  • The server determines the template corresponding to the distortion-corrected image according to the image type. The template includes at least one rectangular box indicating, by preset coordinate values, the area where forward text is located.
  • A rectangular box is a rectangular area defined by the coordinates of 4 points.
  • For example, the template for the horizontal forward image of the front of an ID card includes 6 rectangular boxes: name, gender, ethnicity, date of birth, address, and citizen ID number. The template for the horizontal forward image of the front of a bank card includes one rectangular box for the bank card number.
  • The template corresponding to the distortion-corrected image has the same size as the distortion-corrected image.
  • The template includes rectangular boxes indicating, by preset coordinate values, the area where forward text is located.
  • After matching the template for the distortion-corrected image according to the image type, the server further determines the text of the distortion-corrected image according to the rectangular boxes in the template.
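Template-based positioning can be sketched as cutting each template box out of the distortion-corrected image. The template below, including its size and box coordinates, is purely hypothetical. Boxes are given as (x1, y1, x2, y2) with x2 and y2 exclusive. The size check reflects the requirement that the template and the corrected image have the same size.

```python
# Hypothetical template registry: sizes and boxes are illustrative numbers only.
# Boxes are (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive.
TEMPLATES = {
    "bank_card_front": {
        "size": (320, 200),  # (width, height)
        "boxes": [("card_number", (40, 120, 300, 150))],
    },
}


def crop(image, box):
    """Cut the rectangle box = (x1, y1, x2, y2) out of a list-of-rows image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def regions_for(image, image_type, templates=TEMPLATES):
    """Return {label: region} for every rectangular box of the matching template."""
    template = templates[image_type]
    width, height = template["size"]
    if len(image) != height or any(len(row) != width for row in image):
        raise ValueError("corrected image and template sizes differ")
    return {label: crop(image, box) for label, box in template["boxes"]}


corrected = [[0] * 320 for _ in range(200)]
regions = regions_for(corrected, "bank_card_front")
print(len(regions["card_number"]), len(regions["card_number"][0]))  # 30 260
```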
  • The server performs text positioning on the distortion-corrected image according to the preset algorithm and template to obtain the positioning result. Specifically, the server determines, according to the preset algorithm and template, the position information of the strip-shaped objects to be segmented in the distortion-corrected image.
  • The position information of a strip-shaped object includes the coordinates of the upper-left and lower-right points of the corresponding area, together with the corresponding text.
  • The text positioning rule proceeds from the upper-left coordinate to the lower-right coordinate.
  • The distortion-corrected image is scanned line by line, and information of the same category on the same line is located together. The server sets the coordinates of the upper-left and lower-right points, together with the corresponding text, as the positioning result.
  • For example, the server performs text positioning on the name area of an ID card. The resulting positioning result includes the upper-left point coordinates (13, 14), the lower-right point coordinates (744, 49), and the name.
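The upper-left to lower-right, line-by-line ordering can be sketched as follows. Each result is a tuple (x1, y1, x2, y2, text). A result joins the current line when its top edge is within a tolerance of the top edge of that line's first result. The tolerance value and the function name are assumptions, not taken from the source.

```python
def reading_order(results, line_tol=10):
    """Sort (x1, y1, x2, y2, text) results from upper left to lower right.

    A result joins the current line when its top edge is within `line_tol`
    pixels of the top edge of that line's first result (assumed tolerance).
    """
    lines = []
    for r in sorted(results, key=lambda r: (r[1], r[0])):
        if lines and abs(r[1] - lines[-1][0][1]) <= line_tol:
            lines[-1].append(r)
        else:
            lines.append([r])
    # Within each line, read from left to right.
    return [r for line in lines for r in sorted(line, key=lambda r: r[0])]


results = [
    (13, 60, 300, 95, "gender"),
    (400, 12, 744, 49, "name value"),
    (13, 14, 300, 49, "name"),
]
print([r[4] for r in reading_order(results)])  # ['name', 'name value', 'gender']
```

Grouping by a top-edge tolerance rather than by exact y values keeps boxes on the same printed line together even when their detected top edges differ by a few pixels.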
  • The server uses the PixelLink algorithm to draw boxes around the text areas of the distortion-corrected image.
  • PixelLink performs text detection through instance segmentation.
  • It uses a deep neural network (DNN) to make two kinds of pixel-level predictions: text/non-text prediction and link prediction.
  • According to the PixelLink algorithm, the server marks text pixels in the distortion-corrected image as positive and non-text pixels as negative. The server determines whether a given pixel and each of its adjacent pixels lie in the same instance. If they do, the link between them is marked positive; otherwise it is marked negative. Each pixel has 8 neighbors.
  • Pixels predicted as positive are joined into connected components (CCs) through the predicted positive links, and each CC represents one detected text instance.
  • Finally, the server takes the bounding box of each connected component as the final detection result. It determines the coordinates of the final detection results and sets this coordinate information as the positioning result.
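The grouping step of PixelLink can be sketched independently of the neural network. Given the predicted text/non-text map and the eight link predictions per pixel, positive pixels joined by positive links are merged with a union-find structure, and each connected component yields a bounding box. This sketch assumes two neighboring positive pixels are joined when the link predicted from either side is positive. It returns axis-aligned boxes (x1, y1, x2, y2) with inclusive coordinates, matching the upper-left/lower-right convention above. The published PixelLink method instead uses OpenCV's `minAreaRect` to obtain rotated rectangles. The function name and input layout are hypothetical.

```python
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def pixellink_boxes(text, links):
    """Group positive pixels into connected components and return their boxes.

    text[y][x] is 1 for a pixel predicted as text, 0 otherwise.
    links[y][x][k] is True when the link from (y, x) to its k-th neighbor in
    NEIGHBORS is predicted positive.
    """
    parent = {(y, x): (y, x)
              for y, row in enumerate(text) for x, v in enumerate(row) if v}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    # Join two positive pixels if the link predicted from either side is positive.
    for (y, x) in list(parent):
        for k, (dy, dx) in enumerate(NEIGHBORS):
            nb = (y + dy, x + dx)
            if nb in parent and links[y][x][k]:
                parent[find((y, x))] = find(nb)

    boxes = {}
    for (y, x) in list(parent):
        root = find((y, x))
        x1, y1, x2, y2 = boxes.get(root, (x, y, x, y))
        boxes[root] = (min(x1, x), min(y1, y), max(x2, x), max(y2, y))
    return sorted(boxes.values(), key=lambda b: (b[1], b[0]))


text = [[1, 1, 0, 1],
        [0, 0, 0, 1]]
all_links = [[[True] * 8 for _ in row] for row in text]
print(pixellink_boxes(text, all_links))  # [(0, 0, 1, 0), (3, 0, 3, 1)]
```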
  • The server saves the positioning result to a preset file. Specifically, positioning the distortion-corrected image yields multiple positioning rectangular areas. The server records the coordinates of the upper-left and lower-right points of each area and saves the positioning results in txt format. For example, the server performs text positioning on a bill from a rural commercial bank. The positioning result includes 6 rectangular boxes and the text information obtained by positioning with those boxes, and the server saves it in the file sds_0.txt.
  • The content of the file is as follows:
  • The positioning results in sds_0.txt can further be used for text recognition.
  • The positioning result may include a preset mark, which prompts text recognition to discard the corresponding line.
  • For example, in a line containing XXXX, XXXX is a preset mark instructing the server not to perform text recognition on that line.
  • The positioning result may also carry other types of preset marks, which are not specifically limited here.
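Saving the positioning result and honouring the preset skip mark can be sketched with one comma-separated line per box. The actual layout of sds_0.txt is not reproduced in this text. The `x1,y1,x2,y2,text` format, the helper names, and treating a text field equal to `XXXX` as the skip mark are therefore all assumptions.

```python
import csv
import os
import tempfile

SKIP_MARK = "XXXX"  # assumed preset mark: do not run text recognition on this line


def save_results(path, results):
    """Write one result per line as x1,y1,x2,y2,text (an assumed layout)."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for (x1, y1), (x2, y2), text in results:
            writer.writerow([x1, y1, x2, y2, text])


def lines_to_recognize(path):
    """Yield ((x1, y1), (x2, y2), text) for lines that recognition should process."""
    with open(path, newline="", encoding="utf-8") as fh:
        for x1, y1, x2, y2, text in csv.reader(fh):
            if text == SKIP_MARK:
                continue  # the preset mark tells recognition to discard this line
            yield (int(x1), int(y1)), (int(x2), int(y2)), text


path = os.path.join(tempfile.mkdtemp(), "sds_0.txt")
save_results(path, [((13, 14), (744, 49), "name"), ((20, 60), (300, 95), SKIP_MARK)])
print(list(lines_to_recognize(path)))  # [((13, 14), (744, 49), 'name')]
```

The `csv` module is used so that recognized text containing commas or quotes is still written and read back intact.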
  • The server determines a newly added type of bill image or certificate image and sets it as a sample image to be trained. It then iteratively optimizes the preset image segmentation network according to the sample image to be trained.
  • For example, the current bill types comprise categories 1 to 10.
  • When an 11th type of bill image is added, the server sets it as the sample image to be trained and iteratively optimizes the image segmentation network based on the 11th type of bill image. It can be understood that the parameters in the preset image segmentation network are frozen before the iterative optimization is performed.
  • The text positioning apparatus based on image segmentation in the embodiments of this application includes:
  • an acquiring unit 301, configured to acquire an original image, the original image being a bill image or a certificate image collected against a text background;
  • a segmentation unit 302, configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;
  • a transformation unit 303, configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, the text in which is forward text; and
  • a positioning unit 304, configured to perform text positioning on the distortion-corrected image to obtain a positioning result.
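The four units can be pictured as a pipeline in which each unit's output feeds the next. The sketch below is a hypothetical software composition; the class and field names are not from the source. Each unit is supplied as a plain callable, so that, for example, a DeepLabv3+ wrapper or a PixelLink wrapper could be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TextPositioningApparatus:
    """Hypothetical composition of units 301-304; each unit is a plain callable."""
    acquire: Callable[[], Any]        # acquiring unit 301
    segment: Callable[[Any], Any]     # segmentation unit 302
    transform: Callable[[Any], Any]   # transformation unit 303
    position: Callable[[Any], Any]    # positioning unit 304

    def run(self):
        original = self.acquire()
        distorted = self.segment(original)
        corrected = self.transform(distorted)
        return self.position(corrected)


apparatus = TextPositioningApparatus(
    acquire=lambda: "original",
    segment=lambda img: f"distorted({img})",
    transform=lambda img: f"corrected({img})",
    position=lambda img: [((13, 14), (744, 49), img)],
)
print(apparatus.run())  # [((13, 14), (744, 49), 'corrected(distorted(original))')]
```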
  • Another embodiment of the text positioning apparatus based on image segmentation in the embodiments of the present application includes:
  • an acquiring unit 301, configured to acquire an original image, the original image being a bill image or a certificate image collected against a text background;
  • a segmentation unit 302, configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, where the distorted image is a bill image or a certificate image;
  • a transformation unit 303, configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, the text in which is forward text; and
  • a positioning unit 304, configured to perform text positioning on the distortion-corrected image to obtain a positioning result.
  • the dividing unit 302 may further include:
  • the input subunit 3021 is used to input the original image into the preset image segmentation network model
  • the first segmentation subunit 3022 is configured to perform image semantic segmentation on the original image through a preset image segmentation network model to obtain segmentation label images and image types;
  • the second segmentation subunit 3023 is configured to segment the original image according to the segmented label image to obtain a distorted image, and the distorted image is a bill image or a certificate image.
  • the second dividing subunit 3023 may also be specifically used for:
  • the original image and the mask image are multiplied to obtain a distorted image.
  • the distorted image is used to indicate the bill image or the document image separated from the text background from the original image.
  • the transformation unit 303 may also be specifically configured to:
  • the distorted image is subjected to affine transformation to obtain the image after distortion correction.
  • the positioning unit 304 may also be specifically configured to:
  • the template includes at least one rectangular frame, and the rectangular frame is used to indicate the location area where the forward text is located according to the preset coordinate values;
  • the obtaining unit 301 may also be specifically configured to:
  • the text positioning device based on image segmentation may further include:
  • the determining unit 305 is configured to determine a newly added type of bill image or certificate image;
  • the setting unit 306 is configured to set the newly added type of bill image or certificate image as a sample image to be trained;
  • the iterative unit 307 is configured to iteratively optimize the preset image segmentation network model according to the sample image to be trained.
  • an accurate foreground image is obtained by performing image segmentation network processing on an image with a complex background, and text positioning is performed on the foreground image according to a preset template to obtain a positioning result, which improves the accuracy of text positioning in images and enhances robustness to complex backgrounds.
  • FIG. 5 is a schematic structural diagram of a text positioning device based on image segmentation provided by an embodiment of the present application.
  • the text positioning device 500 based on image segmentation may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPUs) 501, a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506.
  • the memory 509 and the storage medium 508 may be short-term storage or persistent storage.
  • the program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for text positioning based on image segmentation.
  • the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the text positioning device 500 based on image segmentation, the series of instruction operations in the storage medium 508.
  • the text positioning device 500 based on image segmentation may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, and FreeBSD.
  • the structure shown in FIG. 5 does not constitute a limitation on the text positioning device based on image segmentation, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer executes the following steps:
  • performing text positioning on the distortion-corrected image to obtain a positioning result.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

An image segmentation-based text positioning method, apparatus and device, and a storage medium, relating to the field of artificial intelligence. Said method comprises: acquiring an original image, the original image being a bill image or a certificate image captured against a text background (101); performing image segmentation on the original image by means of a preset image segmentation network model, so as to obtain a distorted image, the distorted image being the bill image or the certificate image (102); performing affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being text in a forward direction (103); and performing text positioning on the distortion-corrected image, so as to obtain a positioning result (104). Image segmentation network processing is performed on an image with a complex text background to obtain an accurate foreground image, and text positioning is performed on the foreground image to obtain a positioning result, which improves the accuracy of text positioning in images and enhances robustness to complex backgrounds.

Description

Image segmentation-based text positioning method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 201910884634.0, filed with the Chinese Patent Office on September 19, 2019 and entitled "Image segmentation-based text positioning method, apparatus, device and storage medium", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to an image segmentation-based text positioning method, apparatus, device, and storage medium.
Background
Optical character recognition (OCR) is the process in which an electronic device, such as a scanner or a digital camera, examines characters printed on paper and then translates their shapes into computer text using character recognition methods; that is, text material is scanned, and the resulting image file is analyzed and processed to obtain text and layout information. OCR comprises text positioning and text recognition, where text positioning is the precise localization of text in an image, based mainly on extracting relevant text features.
In the prior art, a dedicated scanner is usually used to scan bills and certificates, converting the text on them into image information to obtain high-quality bill and certificate images; OCR technology then converts the information in these images into computer text. The inventor realized that with this approach, the accuracy of text positioning is low for bill and certificate images captured against a complex background.
Summary
The main purpose of this application is to solve the technical problem of low accuracy when positioning text in images with complex text backgrounds.
To achieve the above objective, a first aspect of this application provides an image segmentation-based text positioning method, including: acquiring an original image, the original image being a bill image or a certificate image captured against a text background; performing image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image; performing affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text; and performing text positioning on the distortion-corrected image to obtain a positioning result.
A second aspect of this application provides an image segmentation-based text positioning apparatus, including: an acquisition unit configured to acquire an original image, the original image being a bill image or a certificate image captured against a text background; a segmentation unit configured to perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image; a transformation unit configured to perform affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text; and a positioning unit configured to perform text positioning on the distortion-corrected image to obtain a positioning result.
A third aspect of this application provides an image segmentation-based text positioning device, including a memory and at least one processor, where the memory stores instructions and the memory and the at least one processor are interconnected by lines; the at least one processor invokes the instructions in the memory so that the text positioning device performs the method described in the first aspect.
A fourth aspect of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method described in the first aspect.
In the technical solution provided by this application, an original image is acquired, the original image being a bill image or a certificate image captured against a text background; image segmentation is performed on the original image through a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image; affine transformation is performed on the distorted image to obtain a distortion-corrected image in which the text is forward text; and text positioning is performed on the distortion-corrected image to obtain a positioning result. In the embodiments of this application, image segmentation network processing of an image with a complex background yields an accurate foreground image, and text positioning is then performed on that foreground image according to a preset template to obtain the positioning result, which improves the accuracy of text positioning in images and enhances robustness to complex backgrounds.
Description of the Drawings
FIG. 1 is a schematic diagram of an embodiment of the image segmentation-based text positioning method according to an embodiment of this application;
FIG. 2 is a schematic diagram of another embodiment of the image segmentation-based text positioning method according to an embodiment of this application;
FIG. 3 is a schematic diagram of an embodiment of the image segmentation-based text positioning apparatus according to an embodiment of this application;
FIG. 4 is a schematic diagram of another embodiment of the image segmentation-based text positioning apparatus according to an embodiment of this application;
FIG. 5 is a schematic diagram of an embodiment of the image segmentation-based text positioning device according to an embodiment of this application.
Detailed Description
The embodiments of this application provide an image segmentation-based text positioning method, apparatus, device, and storage medium, which perform image segmentation network processing on images with complex backgrounds to obtain accurate foreground images, and perform text positioning on the foreground images according to preset templates to obtain positioning results, thereby improving the accuracy of text positioning in images and enhancing robustness to complex backgrounds.
To enable those skilled in the art to better understand the solution of this application, the embodiments of this application are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth", and the like (if any) in the specification, the claims, and the above drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
For ease of understanding, the specific process of an embodiment of this application is described below. Referring to FIG. 1, an embodiment of the image segmentation-based text positioning method in the embodiments of this application includes the following steps.
101. Acquire an original image, the original image being a bill image or a certificate image captured against a text background.
The server acquires the original image, which is a bill image or a certificate image captured against a text background. The original image contains a highly interfering text background, meaning that text targets, especially handwritten digits and printed text, appear in the background of the original image, which makes it harder to position the text in the original image directly. Specifically, the server receives the bill image or certificate image captured against the text background and sets it as the original image; the server then stores the original image in a preset path according to a preset format and records the storage path of the original image in a data table.
It can be understood that the server stores the original image in the preset path according to the preset format, thereby obtaining the storage path and the name of the original image. The preset format includes a preset naming rule and a picture format; the picture format may be jpg, png, or another picture format, which is not limited here. After naming the original image according to the preset format, the server places it in the preset path, which is a pre-designated file directory. For example, the server receives an original image that is an ID card image, names it card1.jpg, and stores card1.jpg in the directory /var/www/html/ID.
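The naming-and-storage step can be sketched as a small path builder. This is a minimal illustration only: the "<prefix><index>.<format>" naming rule and the function name `build_storage_path` are hypothetical, since the text does not specify the actual preset naming rule.

```python
import posixpath
from typing import Tuple


def build_storage_path(preset_dir: str, prefix: str, index: int,
                       picture_format: str = "jpg") -> Tuple[str, str]:
    """Name an original image with a hypothetical "<prefix><index>.<format>"
    rule and return (file_name, storage_path) under the preset directory."""
    ext = picture_format.lower().lstrip(".")  # accept "jpg", ".PNG", etc.
    file_name = f"{prefix}{index}.{ext}"
    # posixpath keeps "/" separators regardless of the host OS.
    return file_name, posixpath.join(preset_dir, file_name)


name, path = build_storage_path("/var/www/html/ID", "card", 1)
print(name, path)  # card1.jpg /var/www/html/ID/card1.jpg
```

The picture format is not restricted here, matching the statement that formats other than jpg and png are allowed.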
102. Perform image segmentation on the original image through a preset image segmentation network model to obtain a distorted image, the distorted image being a bill image or a certificate image.
The server performs image segmentation on the original image through the preset image segmentation network model to obtain a distorted image, which is a bill image or a certificate image. Specifically, the server segments the original image with the preset image segmentation network model to obtain a segmentation label image; the server determines a mask image from the segmentation label image and processes the original image with the mask image to obtain the distorted image. The distorted image is the partial image that the server obtains after separating the complex background from the original image; this partial image is an irregular quadrilateral and contains the bill image or certificate image.
It can be understood that the server trains an image segmentation network model on preset samples to determine its parameters, thereby obtaining the preset image segmentation network model used to segment the original image.
103. Perform affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text.
The server performs affine transformation on the distorted image to obtain a distortion-corrected image in which the text is forward text. Forward text is text that is aligned with the horizontal reference and is not upside down; that is, a distorted image deviating from the horizontal reference by 90, 180, or 270 degrees is corrected to a deviation of 0 degrees, so that the text in the distortion-corrected image is forward text. Specifically, the server determines the affine transformation rule corresponding to the distorted image, and then performs affine transformation on the distorted image according to this mapping rule and a preset size to obtain the distortion-corrected image. It can be understood that the distorted image is an irregular quadrilateral; the server corrects its distortion by affine transformation, the text in the resulting image is forward, and the size of the distortion-corrected image is a preset fixed value equal to the size of the template corresponding to the distorted image.
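The "forward text" requirement, undoing a 90, 180, or 270 degree deviation, can be illustrated in isolation. This sketch assumes the deviation angle is already known and measured counter-clockwise; in the described method the rotation is folded into the affine transformation, so the separate functions `rotate_90_cw` and `to_forward` are hypothetical helpers, not the patent's procedure.

```python
from typing import List

Image = List[List[int]]  # row-major grid of pixel values


def rotate_90_cw(img: Image) -> Image:
    """Rotate a row-major image 90 degrees clockwise."""
    # Reversing the rows and then transposing is a clockwise rotation.
    return [list(col) for col in zip(*img[::-1])]


def to_forward(img: Image, ccw_deviation_deg: int) -> Image:
    """Undo a counter-clockwise deviation of 0, 90, 180 or 270 degrees."""
    if ccw_deviation_deg % 90 != 0:
        raise ValueError("only multiples of 90 degrees are handled here")
    for _ in range((ccw_deviation_deg // 90) % 4):
        img = rotate_90_cw(img)
    return img


forward = [[1, 2, 3],
           [4, 5, 6]]
tilted = rotate_90_cw(rotate_90_cw(rotate_90_cw(forward)))  # 90 deg counter-clockwise
print(to_forward(tilted, 90) == forward)  # True
```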
It should be noted that an affine transformation maps two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v) through a linear transformation followed by a translation; that is, it maps each point of the original image to a corresponding point of the target image, and can include rotation, translation, scaling, and shearing of the original image.
104. Perform text positioning on the distortion-corrected image to obtain a positioning result.
The server performs text positioning on the distortion-corrected image to obtain the positioning result. Specifically, the server performs text positioning on the distortion-corrected image according to a preset algorithm and a template. The template includes at least one rectangular frame, and each rectangular frame marks, by preset coordinate values, the region where forward text is located. The positioning result is the text positioning coordinate information boxed out of the distortion-corrected image, and the number of such coordinate entries equals the number of rectangular frames. For example, for the text "a certain rural commercial bank" and "transfer check" in a distortion-corrected bill image, the server matches the corresponding template, which contains two rectangular frames indicating the bank name and the transfer check respectively; the server then determines the positioning result from the preset coordinate values of these two frames, and the positioning result includes the bank name, the transfer check, and the preset coordinate values of the two rectangular frames.
It can be understood that if the original image were annotated directly, every character within the original image region would have to be annotated; moreover, to avoid interference from the text background, a large number of original images with different text backgrounds would have to be collected, and annotation would have to continue whenever a new bill type is added. For example, for a bank bill with n characters and m backgrounds, n×m annotations were previously required, whereas the annotation workload is now n+m. The larger m is, the more adaptable and robust the positioning is to complex backgrounds; m relates to the image segmentation step, for which augmentation training on a large number of sample images suffices.
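To make the rotation, translation, scaling, and shearing components concrete, the sketch below composes them as 3×3 homogeneous matrices. The composition order and the parameter values are arbitrary illustrations, not values from the source; any such product keeps the last row [0, 0, 1], which is what makes the overall map affine.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # 3x3 homogeneous matrix


def matmul(p: Matrix, q: Matrix) -> Matrix:
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx: float, ty: float) -> Matrix:
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def rotation(theta: float) -> Matrix:
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def scaling(sx: float, sy: float) -> Matrix:
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def shear(kx: float) -> Matrix:
    return [[1, kx, 0], [0, 1, 0], [0, 0, 1]]


def apply(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) to (u, v); the last row stays [0, 0, 1] for affine maps."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Right-most factor acts first: shear, then scale, then rotate, then translate.
m = matmul(translation(10, 5),
           matmul(rotation(math.pi / 2), matmul(scaling(2, 2), shear(0.5))))
print(apply(m, 1, 0))  # approximately (10.0, 7.0)
```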
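Because the corrected image has the same size as its template, template-based positioning can be pictured as reading fixed rectangles out of it. The sketch below is an assumption-laden illustration: the "preset algorithm" is not specified in the source, and the field names `bank_name` / `transfer_check` and all coordinates are hypothetical. It returns one entry per rectangular frame, matching the statement that the number of positioning entries equals the number of frames.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Image = List[List[int]]


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in template pixel coordinates (half-open)."""
    left: int
    top: int
    right: int
    bottom: int


def locate_text(corrected: Image, template: Dict[str, Box]) -> Dict[str, Tuple[Box, Image]]:
    """Return {field: (box, crop)}: one entry per rectangular frame."""
    height, width = len(corrected), len(corrected[0])
    result = {}
    for field, box in template.items():
        if not (0 <= box.left < box.right <= width and 0 <= box.top < box.bottom <= height):
            raise ValueError(f"frame {field!r} lies outside the corrected image")
        crop = [row[box.left:box.right] for row in corrected[box.top:box.bottom]]
        result[field] = (box, crop)
    return result


# Hypothetical template for a bill type with two text fields.
BILL_TEMPLATE = {
    "bank_name": Box(0, 0, 3, 2),
    "transfer_check": Box(3, 2, 6, 4),
}
corrected = [[r * 10 + c for c in range(6)] for r in range(4)]  # toy 4x6 image
positions = locate_text(corrected, BILL_TEMPLATE)
print(positions["transfer_check"][1])  # [[23, 24, 25], [33, 34, 35]]
```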
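The n×m versus n+m comparison is simple arithmetic; the sketch below just evaluates both counts for a few hypothetical values of n (characters) and m (backgrounds) to show how quickly the gap grows as m increases.

```python
def direct_annotations(n_chars: int, m_backgrounds: int) -> int:
    # Every character must be labelled again under every background.
    return n_chars * m_backgrounds


def segmentation_first_annotations(n_chars: int, m_backgrounds: int) -> int:
    # Characters are labelled once; backgrounds only feed the segmentation model.
    return n_chars + m_backgrounds


for n, m in [(20, 10), (20, 100)]:
    print(n, m, direct_annotations(n, m), segmentation_first_annotations(n, m))
# 20 10 200 30
# 20 100 2000 120
```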
In this embodiment of the application, image segmentation network processing of an image with a complex background yields an accurate foreground image, and text positioning is performed on the foreground image according to a preset template to obtain a positioning result, which improves the accuracy of text positioning in images and enhances robustness to complex backgrounds.
Referring to FIG. 2, another embodiment of the image segmentation-based text positioning method in the embodiments of this application includes the following steps.
201. Acquire an original image, the original image being a bill image or a certificate image captured against a text background.
The server acquires the original image, which is a bill image or a certificate image captured against a text background. Specifically, the server receives the bill image or certificate image captured against the text background and sets it as the original image; the server names the original image according to a preset format and stores it in a preset path to obtain its storage path, where the preset path is a preset file directory and the preset format includes a preset naming rule and a picture format (jpg, png, or another picture format, which is not limited here); the server then writes the storage path and the name of the original image into a target data table.
For example, the server receives a bank bill image, sets it as the original image, names it bank1.jpg, and stores bank1.jpg in the directory /var/www/html/bankimage. The server then writes the storage path and the name of the original image into the target data table: with the name bank1.jpg and the storage path /var/www/html/bankimage/bank1.jpg, the server generates a Structured Query Language (SQL) INSERT statement from the storage path and the name and executes it to write them into the target data table.
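The SQL INSERT step can be sketched with Python's built-in `sqlite3` module as a stand-in database. The table name `original_images` and its columns are hypothetical, since the source does not give the target table's schema; the parameterized query is a design choice that lets the driver handle quoting of file names.

```python
import posixpath
import sqlite3


def record_original_image(conn: sqlite3.Connection, directory: str, file_name: str) -> str:
    """Insert (name, storage path) of an original image into the target table."""
    storage_path = posixpath.join(directory, file_name)
    # Placeholders let the driver quote the values, avoiding SQL injection.
    conn.execute(
        "INSERT INTO original_images (name, storage_path) VALUES (?, ?)",
        (file_name, storage_path),
    )
    conn.commit()
    return storage_path


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE original_images (name TEXT NOT NULL, storage_path TEXT NOT NULL)")
record_original_image(conn, "/var/www/html/bankimage", "bank1.jpg")
print(conn.execute("SELECT name, storage_path FROM original_images").fetchall())
# [('bank1.jpg', '/var/www/html/bankimage/bank1.jpg')]
```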
It should be noted that the original image contains a highly interfering text background, that is, text targets, especially handwritten digits and printed text, appear in the background of the original image; positioning the text in the original image directly is therefore difficult.
202. Input the original image into the preset image segmentation network model, and perform image semantic segmentation on the original image through the model to obtain a segmentation label image and an image type.
The server inputs the original image into the preset image segmentation network model and performs image semantic segmentation on it through the model to obtain the segmentation label image and the image type. Further, the server uses a preset DeepLabv3+ model, which serves as the preset image segmentation network model, to perform the semantic segmentation. The main purpose of this semantic segmentation is to assign a semantic label to each pixel of the original image; that is, the value of each pixel in the segmentation label image represents the type to which that pixel belongs.
It should be noted that DeepLabv3+ is a leading deep learning model for image semantic segmentation whose goal is to assign a semantic label to every pixel of the input image; it includes a simple and effective decoder module that refines the segmentation results.
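The relation between per-pixel class scores and the segmentation label image can be sketched as a per-pixel argmax. This does not run DeepLabv3+; the score array is a toy stand-in for model output, the label set `CLASS_NAMES` is hypothetical, and deriving the image type from the most frequent non-background label is an assumed rule, since the source does not say how the model reports the type.

```python
from collections import Counter
from typing import Dict, List, Optional

BACKGROUND = 0
# Hypothetical label set; a real model defines its own classes.
CLASS_NAMES: Dict[int, str] = {0: "background", 1: "id_card", 2: "bank_bill"}


def label_image(scores: List[List[List[float]]]) -> List[List[int]]:
    """Per-pixel argmax over class scores -> segmentation label image."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in scores]


def image_type(labels: List[List[int]]) -> Optional[str]:
    """Assumed rule: the most frequent non-background label names the image type."""
    counts = Counter(v for row in labels for v in row if v != BACKGROUND)
    if not counts:
        return None
    label, _ = counts.most_common(1)[0]
    return CLASS_NAMES[label]


scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.3, 0.6, 0.1]],
]
labels = label_image(scores)
print(labels, image_type(labels))  # [[0, 1, 1], [0, 2, 1]] id_card
```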
203. Segment the original image according to the segmentation label image to obtain a distorted image, the distorted image being a bill image or a certificate image.
The server segments the original image according to the segmentation label image to obtain the distorted image, which is a bill image or a certificate image. Specifically, the server determines the region to be segmented from the segmentation label image, sets the pixel values inside that region to 1 and those outside it to 0 to obtain a mask image, and multiplies the original image by the mask image to obtain the distorted image, which represents the bill image or certificate image separated from the text background of the original image.
Optionally, the server compares the original image with the segmentation label image to obtain a comparison result and determines the region to be segmented from that result; the server segments this region to obtain the distorted image, which is a bill image or a certificate image, and stores the distorted image.
It should be noted that because the original image may contain multiple certificates, the final saved file has the same name as the original image and records the four corner coordinates of each foreground image. For example, the server performs image segmentation on a certificate image named image1.png and obtains eight coordinate points for two certificate foreground images; the server saves the two foreground images in digitized form, and the file content is as follows:
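The mask-and-multiply step can be sketched on a toy grayscale image. Treating every non-background label as part of the region to be segmented is an assumption for illustration; for a color image the same 0/1 mask would be applied to each channel.

```python
from typing import List

Image = List[List[int]]
BACKGROUND = 0


def make_mask(labels: Image) -> Image:
    """1 inside the region to be segmented (any non-background label), 0 outside."""
    return [[0 if v == BACKGROUND else 1 for v in row] for row in labels]


def apply_mask(image: Image, mask: Image) -> Image:
    """Element-wise product of a grayscale image and a 0/1 mask of the same size."""
    if len(image) != len(mask) or any(len(a) != len(b) for a, b in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


labels = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
print(apply_mask(image, make_mask(labels)))  # [[0, 20, 30], [0, 50, 60], [0, 0, 0]]
```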
1|coordinate 1,coordinate 2,coordinate 3,coordinate 4
2|coordinate 1,coordinate 2,coordinate 3,coordinate 4
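A writer and reader for this "index|coordinate 1,...,coordinate 4" layout might look as follows. The text does not specify how a single coordinate is written, so the "x y" encoding (space-separated integers) and the sample corner values are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def format_quads(quads: Sequence[Sequence[Point]]) -> str:
    """One line per foreground: "<index>|x y,x y,x y,x y" (coordinate layout assumed)."""
    lines = []
    for index, quad in enumerate(quads, start=1):
        if len(quad) != 4:
            raise ValueError("each foreground needs exactly four corner points")
        lines.append(f"{index}|" + ",".join(f"{x} {y}" for x, y in quad))
    return "\n".join(lines)


def parse_quads(text: str) -> List[List[Point]]:
    quads = []
    for line in text.splitlines():
        if not line.strip():
            continue
        _, coords = line.split("|", 1)
        quads.append([(int(x), int(y)) for x, y in (c.split() for c in coords.split(","))])
    return quads


quads = [[(10, 20), (110, 22), (108, 90), (8, 88)],
         [(200, 30), (300, 30), (300, 100), (200, 100)]]
text = format_quads(quads)
print(text)
# 1|10 20,110 22,108 90,8 88
# 2|200 30,300 30,300 100,200 100
```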
204. Perform affine transformation on the distorted image to obtain a distortion-corrected image, the text in the distortion-corrected image being forward text.
The server performs affine transformation on the distorted image to obtain a distortion-corrected image in which the text is forward text. Forward text is text that is aligned with the horizontal reference and is not upside down; that is, a distorted image deviating from the horizontal reference by 90, 180, or 270 degrees is corrected to a deviation of 0 degrees, so that the text in the distortion-corrected image is forward text. Specifically, the server determines the standard image corresponding to the distorted image according to the image type and selects three pixel reference point coordinates in the standard image; the server determines the corresponding pixel coordinates in the distorted image from these three reference points; the server computes an affine transformation matrix from the three reference point coordinates and the corresponding pixel coordinates; and the server performs affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, in which the text is forward text.
For example, the server determines from the standard ID card image that the three pixel reference points are D(x₁, y₁), E(x₂, y₂), and F(x₃, y₃); from D, E, and F it determines the corresponding pixel coordinates D'(x'₁, y'₁), E'(x'₂, y'₂), and F'(x'₃, y'₃) in the distorted image, and then computes the matrix with the homogeneous coordinate formula, which is as follows:
Figure PCTCN2019117036-appb-000001 (homogeneous coordinate formula)
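The formula itself survives only as an image reference. Assuming the conventional arrangement of the six matrix variables a to f named in the next paragraph (this layout is an assumption, not recovered from the figure), it can be written as:

```latex
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
u = a x + b y + c, \quad v = d x + e y + f
```

Each point pair contributes two equations, so three non-collinear pairs give exactly the six equations needed to determine a to f.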
Here, $(x, y)$ are the pixel coordinates in the distorted image and $(u, v)$ are the coordinates of the corresponding pixel reference points in the standard ID card image. The server substitutes the pairs $D'(x'_1, y'_1) \to D(x_1, y_1)$, $E'(x'_2, y'_2) \to E(x_2, y_2)$ and $F'(x'_3, y'_3) \to F(x_3, y_3)$ in turn into the homogeneous coordinate formula to obtain the affine transformation matrix, that is, the server determines the values of the matrix variables a, b, c, d, e and f. The server then performs the affine transformation on the distorted image according to this matrix to obtain the distortion-corrected ID card image, whose size corresponds to 85.6 mm × 54 mm. Note that when performing the affine transformation, the server also determines the rotation direction and rotation angle, so that the text in the distortion-corrected image is upright.
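A minimal pure-Python sketch of solving for a to f from the three point pairs, assuming the layout u = a·x + b·y + c, v = d·x + e·y + f (the exact formula figure is not recoverable here). It applies Cramer's rule to two 3×3 systems; the function names are hypothetical. In practice a library routine such as OpenCV's `cv2.getAffineTransform` does the same job.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: List[List[float]], rhs: Sequence[float]) -> List[float]:
    """Solve m @ s = rhs with Cramer's rule."""
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear; the matrix is not unique")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]  # replace one column with the right-hand side
        solution.append(_det3(mc) / det)
    return solution


def affine_from_three_points(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Find (a..f) such that src[i] = (x, y) maps to dst[i] = (u, v).

    src: D', E', F' in the distorted image; dst: D, E, F in the standard image.
    """
    m = [[float(x), float(y), 1.0] for x, y in src]
    a, b, c = _solve3(m, [u for u, _ in dst])
    d, e, f = _solve3(m, [v for _, v in dst])
    return a, b, c, d, e, f


def apply_affine(params: Affine, point: Point) -> Point:
    """Map a distorted-image point (x, y) to standard-image coordinates (u, v)."""
    a, b, c, d, e, f = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f
```

Because the three points fully determine the six unknowns, any collinear choice of D, E, F makes the system singular, which the sketch reports as an error.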
It should be noted that an affine transformation is a linear transformation plus a translation from two-dimensional coordinates (x, y) to two-dimensional coordinates (u, v). The distorted image is an irregular quadrilateral; the affine transformation maps each point of the original image to the corresponding point of the target image, which can include rotation, translation, scaling and shearing of the original image, and finally transforms the distorted image from an irregular quadrilateral into a rectangle. (Strictly, an affine map keeps parallel lines parallel, so the result is an exact rectangle only when the quadrilateral is a parallelogram; a general perspective-distorted quadrilateral requires a projective transformation.)
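Once the matrix is known, the corrected image is usually produced by inverse mapping: for every pixel of the output rectangle, compute where it came from in the distorted image and sample there. The sketch below does this with nearest-neighbour sampling on a grayscale image stored as a list of rows. It assumes the same (a, b, c, d, e, f) layout as above and is an illustration, not the patent's implementation; a 90-degree rotation, one of the orientation cases the method corrects, serves as the check.

```python
from typing import List, Tuple

Affine = Tuple[float, float, float, float, float, float]  # u = a*x + b*y + c, v = d*x + e*y + f


def invert_affine(params: Affine) -> Affine:
    """Invert the forward map (x, y) -> (u, v) so we can go from output to source."""
    a, b, c, d, e, f = params
    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("affine matrix is singular")
    ia, ib = e / det, -b / det
    id_, ie = -d / det, a / det
    # x = ia*(u - c) + ib*(v - f), y = id_*(u - c) + ie*(v - f)
    return ia, ib, -(ia * c + ib * f), id_, ie, -(id_ * c + ie * f)


def warp_affine(img: List[List[int]], params: Affine, out_w: int, out_h: int,
                fill: int = 0) -> List[List[int]]:
    """Warp a grayscale image (img[y][x]) into an out_w x out_h canvas.

    Inverse mapping with nearest-neighbour sampling; output pixels whose
    source lies outside the input image are set to `fill`.
    """
    ia, ib, ic, id_, ie, if_ = invert_affine(params)
    h, w = len(img), len(img[0])
    out = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            xi = round(ia * u + ib * v + ic)   # source column
            yi = round(id_ * u + ie * v + if_)  # source row
            row.append(img[yi][xi] if 0 <= xi < w and 0 <= yi < h else fill)
        out.append(row)
    return out
```

For example, with u = -y + 1 and v = x, the 3×2 image `[[1, 2, 3], [4, 5, 6]]` becomes the 2×3 image `[[4, 1], [5, 2], [6, 3]]`, i.e. rotated by 90 degrees.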
205. Determine, according to the image type, the template corresponding to the distortion-corrected image, where the template includes at least one rectangular box, and each rectangular box marks, by preset coordinate values, the region in which the forward text is located;
The server determines the template corresponding to the distortion-corrected image according to the image type. The template includes at least one rectangular box, and each rectangular box marks, by preset coordinate values, the region in which forward text is located. A rectangular box is a rectangular region defined by four point coordinates. For example, the template for the upright front side of an ID card includes six rectangular boxes: name, gender, ethnicity, date of birth, address and citizen ID number; the template for the upright front side of a bank card includes one rectangular box, for the bank card number.
It should be noted that the template has the same size as the distortion-corrected image and includes rectangular boxes that mark, by preset coordinate values, the regions where forward text is located. After the server obtains the template matching the distortion-corrected image according to the image type, it determines the text of the distortion-corrected image from the rectangular boxes in the template.
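A minimal sketch of looking up a template by image type and cropping its boxes from the corrected image. The template table, field names and pixel coordinates are hypothetical placeholders (the patent does not publish its templates), and the image is assumed to be a list of pixel rows with the same size as the template, as stated above.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2): upper-left inclusive, lower-right exclusive

# Hypothetical template table; field names and coordinates are illustrative only.
TEMPLATES: Dict[str, Dict[str, Box]] = {
    "bank_card_front": {"card_number": (40, 200, 600, 250)},
}


def apply_template(img: List[List[int]], image_type: str,
                   templates: Dict[str, Dict[str, Box]] = TEMPLATES) -> Dict[str, List[List[int]]]:
    """Crop every rectangular box of the template that matches `image_type`."""
    h, w = len(img), len(img[0])
    regions = {}
    for field, (x1, y1, x2, y2) in templates[image_type].items():
        # The template shares the corrected image's size, so every box must fit inside it.
        if not (0 <= x1 < x2 <= w and 0 <= y1 < y2 <= h):
            raise ValueError(f"box for {field!r} does not fit a {w}x{h} image")
        regions[field] = [row[x1:x2] for row in img[y1:y2]]
    return regions
```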
206. Perform text positioning on the distortion-corrected image according to a preset algorithm and the template to obtain a positioning result;
The server performs text positioning on the distortion-corrected image according to the preset algorithm and the template to obtain the positioning result. Specifically, the server determines, according to the preset algorithm and the template, the position information of each strip-shaped object to be segmented in the distortion-corrected image. This position information includes the coordinates of the upper-left and lower-right points of the corresponding region together with the corresponding text. Text positioning proceeds from the upper-left coordinate toward the lower-right coordinate: the distortion-corrected image is scanned line by line, and information of the same category on the same line is located at the same time. The server sets the upper-left coordinates, the lower-right coordinates and the corresponding text as the positioning result. For example, when the server positions the text in the name region of an ID card, the result includes the upper-left coordinate (13, 14), the lower-right coordinate (744, 49) and the name.
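The line-by-line scan order described above (from upper left to lower right, with boxes on the same line handled together) can be sketched as follows. The grouping rule, treating boxes whose top edges lie within `row_tol` pixels of each other as one line, and the default tolerance are assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2): upper-left and lower-right corners


def order_boxes(boxes: List[Box], row_tol: int = 10) -> List[Box]:
    """Return boxes in reading order: line by line from the top, left to right within a line."""
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        # Same line if the top edge is within row_tol of the line's first (topmost) box.
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [box for row in rows for box in sorted(row, key=lambda b: b[0])]
```

A fixed pixel tolerance is the simplest choice; a vertical-overlap test between boxes would be more robust when line heights vary.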
Optionally, the server uses the PixelLink algorithm to box the text regions of the distortion-corrected image. PixelLink performs text detection through instance segmentation: a deep neural network (DNN) makes two kinds of pixel-level predictions, namely text/non-text prediction and link prediction. Specifically, following PixelLink, the server labels text pixels in the distortion-corrected image as positive and non-text pixels as negative. For a given pixel and one of its neighboring pixels, the server determines whether the two lie in the same instance: if they do, the link between them is labeled positive; otherwise it is labeled negative. Each pixel has 8 neighbors. Predicted positive pixels are joined by predicted positive links into connected components (CCs), and each CC represents one detected text instance. Finally, the server takes the bounding box of each connected component as the detection result and sets the coordinate information of these boxes as the positioning result.
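The post-processing step described above, joining positive pixels through positive links into connected components and taking each component's bounding box, can be sketched with a union-find structure. The network outputs are assumed to be already thresholded into a boolean text mask and an 8-channel boolean link map; this sketch joins two neighbouring positive pixels when either of their two link predictions is positive (an assumption here; check the PixelLink paper for the exact rule). The original method also filters small components and fits minimum-area, possibly rotated, rectangles; this sketch returns axis-aligned inclusive boxes only.

```python
from typing import List, Tuple

# Neighbour offsets (dx, dy); links[y][x][k] is the link prediction towards NEIGHBORS[k].
NEIGHBORS = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def _find(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def pixellink_boxes(text_mask: List[List[int]],
                    links: List[List[List[bool]]]) -> List[Tuple[int, int, int, int]]:
    """Group positive pixels via positive links; return one (x1, y1, x2, y2) box per component."""
    h, w = len(text_mask), len(text_mask[0])
    parent = list(range(h * w))
    for y in range(h):
        for x in range(w):
            if not text_mask[y][x]:
                continue
            for k, (dx, dy) in enumerate(NEIGHBORS):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and text_mask[ny][nx]:
                    back = NEIGHBORS.index((-dx, -dy))  # the neighbour's link pointing back
                    if links[y][x][k] or links[ny][nx][back]:
                        ra, rb = _find(parent, y * w + x), _find(parent, ny * w + nx)
                        if ra != rb:
                            parent[rb] = ra
    boxes = {}
    for y in range(h):
        for x in range(w):
            if text_mask[y][x]:
                root = _find(parent, y * w + x)
                x1, y1, x2, y2 = boxes.get(root, (x, y, x, y))
                boxes[root] = (min(x1, x), min(y1, y), max(x2, x), max(y2, y))
    return sorted(boxes.values(), key=lambda b: (b[1], b[0]))
```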
207. Store the positioning result in a preset file.
The server stores the positioning result in the preset file. Specifically, the server performs positioning on the distortion-corrected image to obtain multiple positioned rectangular regions, records the coordinates of the upper-left and lower-right points of each region, and saves the positioning results in txt format. For example, when the server performs text positioning on a document from a rural commercial bank, the positioning result includes 6 rectangular boxes and the text information located by them. The server saves it to the file sds_0.txt, whose content is as follows:
standard_build/sds_0.png|13 14 744 49|
standard_build/sds_0.png|22 52 645 88|
standard_build/sds_0.png|12 94 446 130|
standard_build/sds_0.png|28 135 775 170|
standard_build/sds_0.png|13 177 544 212|
standard_build/sds_0.png|22 217 348 252|
It should be noted that the positioning results in sds_0.txt can be used for subsequent text recognition. A positioning result may also contain a preset flag that tells text recognition to discard that line. For example, in the positioning result standard_build/sds_0.png|13 14 744 49|XXXX, XXXX is a preset flag indicating that the server should not perform text recognition on this line. Other types of preset flags may also be used; this is not specifically limited here.
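A minimal sketch of writing and reading lines in the `path|x1 y1 x2 y2|flag` format shown above and skipping flagged lines before recognition. The helper names are hypothetical, and "XXXX" is only the patent's placeholder for a skip flag.

```python
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]


def format_line(image_path: str, box: Box, flag: str = "") -> str:
    """Serialize one positioning result, e.g. 'standard_build/sds_0.png|13 14 744 49|'."""
    x1, y1, x2, y2 = box
    return f"{image_path}|{x1} {y1} {x2} {y2}|{flag}"


def parse_line(line: str) -> Tuple[str, Box, str]:
    """Split a line into (image path, box, flag); the flag is '' when absent."""
    path, coords, flag = line.rstrip("\n").split("|", 2)
    x1, y1, x2, y2 = (int(t) for t in coords.split())
    return path, (x1, y1, x2, y2), flag


def boxes_for_recognition(lines: Iterable[str], skip_flag: str = "XXXX") -> List[Tuple[str, Box]]:
    """Keep only the results that text recognition should process."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        path, box, flag = parse_line(line)
        if flag != skip_flag:
            kept.append((path, box))
    return kept
```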
Optionally, the server determines a newly added type of bill image or certificate image, sets it as a sample image to be trained, and iteratively optimizes the preset image segmentation network on the sample images to be trained. For example, if the current bill types cover categories 1 to 10 and an 11th category is detected, images of the new bill type are set as sample images to be trained, and the image segmentation network is iteratively optimized on the 11th category of bill images. Note that the parameters of the preset image segmentation network are frozen before this iterative optimization is carried out.
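The text does not say which parameters are frozen. One common reading is fine-tuning in which existing (for example, backbone) parameters are held fixed while the remaining ones are updated on the new category's samples; in PyTorch this is typically done by setting `requires_grad = False` on the frozen parameters. The framework-free toy below only illustrates that idea with plain gradient descent on a made-up quadratic loss; the parameter names, loss and step size are hypothetical.

```python
from typing import Callable, Dict, Set

Params = Dict[str, float]


def fine_tune(params: Params, grad_fn: Callable[[Params], Params], frozen: Set[str],
              steps: int = 100, lr: float = 0.1) -> Params:
    """Iterative gradient descent that never updates parameters listed in `frozen`."""
    params = dict(params)  # work on a copy
    for _ in range(steps):
        grads = grad_fn(params)
        for name in params:
            if name not in frozen:
                params[name] -= lr * grads[name]
    return params


# Toy loss sum((p - target)^2): its gradient is 2 * (p - target).
TARGETS = {"backbone.w": 5.0, "head.w": 3.0}


def toy_grad(p: Params) -> Params:
    return {k: 2.0 * (v - TARGETS[k]) for k, v in p.items()}
```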
In the embodiments of this application, an image captured against a complex background is processed by an image segmentation network to obtain an accurate foreground image, and text positioning is then performed on the foreground image according to a preset template to obtain the positioning result. This improves the accuracy of text positioning in images and enhances robustness to complex backgrounds.
The text positioning method based on image segmentation in the embodiments of this application has been described above; the text positioning apparatus based on image segmentation in the embodiments of this application is described below. Referring to FIG. 3, one embodiment of the text positioning apparatus based on image segmentation in the embodiments of this application includes:
an acquisition unit 301, configured to acquire an original image, where the original image is a bill image or a certificate image captured against a text background;
a segmentation unit 302, configured to perform image segmentation on the original image using a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image;
a transformation unit 303, configured to perform an affine transformation on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text; and
a positioning unit 304, configured to perform text positioning on the distortion-corrected image to obtain a positioning result.
In the embodiments of this application, an image captured against a complex background is processed by an image segmentation network to obtain an accurate foreground image, and text positioning is then performed on the foreground image according to a preset template to obtain the positioning result. This improves the accuracy of text positioning in images and enhances robustness to complex backgrounds.
Referring to FIG. 4, another embodiment of the text positioning apparatus based on image segmentation in the embodiments of this application includes:
an acquisition unit 301, configured to acquire an original image, where the original image is a bill image or a certificate image captured against a text background;
a segmentation unit 302, configured to perform image segmentation on the original image using a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image;
a transformation unit 303, configured to perform an affine transformation on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text; and
a positioning unit 304, configured to perform text positioning on the distortion-corrected image to obtain a positioning result.
Optionally, the segmentation unit 302 may further include:
an input subunit 3021, configured to input the original image into the preset image segmentation network model;
a first segmentation subunit 3022, configured to perform semantic segmentation on the original image using the preset image segmentation network model to obtain a segmentation label image and the image type; and
a second segmentation subunit 3023, configured to segment the original image according to the segmentation label image to obtain a distorted image, where the distorted image is the bill image or the certificate image.
Optionally, the second segmentation subunit 3023 may be specifically configured to:
determine the region to be segmented according to the segmentation label image, set the pixel values inside this region to 1 and those outside it to 0 to obtain a mask image; and
multiply the original image by the mask image to obtain the distorted image, which represents the bill image or certificate image separated from the text background of the original image.
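The two mask steps above can be sketched as follows for a single-channel image stored as a list of rows. The label value that marks the document region is a hypothetical parameter; a colour image would apply the same mask to each channel.

```python
from typing import List

Image = List[List[int]]


def make_mask(label_img: Image, document_label: int) -> Image:
    """1 inside the region to be segmented (pixels labelled as the document), 0 elsewhere."""
    return [[1 if lbl == document_label else 0 for lbl in row] for row in label_img]


def apply_mask(img: Image, mask: Image) -> Image:
    """Element-wise product: keeps document pixels and zeroes the background."""
    if len(img) != len(mask) or any(len(r) != len(m) for r, m in zip(img, mask)):
        raise ValueError("image and mask must have the same size")
    return [[p * m for p, m in zip(img_row, mask_row)] for img_row, mask_row in zip(img, mask)]
```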
Optionally, the transformation unit 303 may be specifically configured to:
determine, according to the image type, the standard image corresponding to the distorted image, and determine the coordinates of three pixel reference points in the standard image;
determine the corresponding pixel coordinates in the distorted image according to the three pixel reference point coordinates;
calculate the affine transformation matrix from the three pixel reference point coordinates and the corresponding pixel coordinates; and
perform the affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image.
Optionally, the positioning unit 304 may be specifically configured to:
determine, according to the image type, the template corresponding to the distortion-corrected image, where the template includes at least one rectangular box that marks, by preset coordinate values, the region in which the forward text is located;
perform text positioning on the distortion-corrected image according to a preset algorithm and the template to obtain a positioning result; and
store the positioning result in a preset file.
Optionally, the acquisition unit 301 may be specifically configured to:
receive a bill image or certificate image captured against a text background, and set the bill image or certificate image as the original image;
name the original image according to a preset format and store it under a preset path to obtain the storage path of the original image; and
write the storage path and the name of the original image into a target data table.
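A minimal sketch of the three acquisition steps using only the standard library: name the image by a preset pattern, store it under a preset path, and record name and path in a table. The naming pattern, folder layout, table name and columns are all hypothetical; SQLite merely stands in for whatever "target data table" a deployment uses.

```python
import sqlite3
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Tuple


def register_original_image(conn: sqlite3.Connection, image_bytes: bytes, root: str,
                            doc_type: str, captured_at: datetime) -> Tuple[str, str]:
    """Name, store and record an incoming bill/certificate image; returns (name, path)."""
    # Hypothetical preset naming format: <type>_<YYYYmmdd_HHMMSS>.jpg
    name = f"{doc_type}_{captured_at:%Y%m%d_%H%M%S}.jpg"
    folder = Path(root) / doc_type  # hypothetical preset path layout
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / name
    path.write_bytes(image_bytes)
    conn.execute("CREATE TABLE IF NOT EXISTS original_images "
                 "(name TEXT PRIMARY KEY, path TEXT NOT NULL)")
    conn.execute("INSERT INTO original_images (name, path) VALUES (?, ?)", (name, str(path)))
    conn.commit()
    return name, str(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = sqlite3.connect(":memory:")
        print(register_original_image(db, b"raw image bytes", tmp, "id_card", datetime.now()))
        db.close()
```

Using the name as primary key makes a duplicate upload fail loudly instead of silently overwriting the record.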
Optionally, the text positioning apparatus based on image segmentation may further include:
a determination unit 305, configured to determine a newly added type of bill image or certificate image;
a setting unit 306, configured to set the newly added type of bill image or certificate image as a sample image to be trained; and
an iteration unit 307, configured to iteratively optimize the preset image segmentation network model according to the sample image to be trained.
In the embodiments of this application, an image captured against a complex background is processed by an image segmentation network to obtain an accurate foreground image, and text positioning is then performed on the foreground image according to a preset template to obtain the positioning result. This improves the accuracy of text positioning in images and enhances robustness to complex backgrounds.
FIG. 3 and FIG. 4 above describe the text positioning apparatus based on image segmentation in the embodiments of this application in detail from the perspective of modular functional entities. The text positioning device based on image segmentation in the embodiments of this application is described in detail below from the perspective of hardware processing.
FIG. 5 is a schematic structural diagram of a text positioning device based on image segmentation according to an embodiment of this application. The device 500 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 501 (for example, one or more processors), a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506. The memory 509 and the storage medium 508 may provide transient or persistent storage. The program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for text positioning based on image segmentation. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the device 500, the series of instruction operations in the storage medium 508.
The text positioning device 500 based on image segmentation may further include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux and FreeBSD. Those skilled in the art will understand that the structure shown in FIG. 5 does not limit the text positioning device based on image segmentation, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
This application further provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to perform the following steps:
acquiring an original image, where the original image is a bill image or a certificate image captured against a text background;
performing image segmentation on the original image using a preset image segmentation network model to obtain a distorted image, where the distorted image is the bill image or the certificate image;
performing an affine transformation on the distorted image to obtain a distortion-corrected image, where the text in the distortion-corrected image is forward text; and
performing text positioning on the distortion-corrected image to obtain a positioning result.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (20)

  1. A text positioning method based on image segmentation, comprising:
    acquiring an original image, wherein the original image is a bill image or a certificate image captured against a text background;
    performing image segmentation on the original image using a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image;
    performing an affine transformation on the distorted image to obtain a distortion-corrected image, wherein the text in the distortion-corrected image is forward text; and
    performing text positioning on the distortion-corrected image to obtain a positioning result.
  2. The text positioning method based on image segmentation according to claim 1, wherein performing image segmentation on the original image using the preset image segmentation network model to obtain the distorted image, the distorted image being the bill image or the certificate image, comprises:
    inputting the original image into the preset image segmentation network model;
    performing semantic segmentation on the original image using the preset image segmentation network model to obtain a segmentation label image and an image type; and
    segmenting the original image according to the segmentation label image to obtain the distorted image, wherein the distorted image is the bill image or the certificate image.
  3. The text positioning method based on image segmentation according to claim 2, wherein segmenting the original image according to the segmentation label image to obtain the distorted image, the distorted image being the bill image or the certificate image, comprises:
    determining a region to be segmented according to the segmentation label image, setting the pixel values inside the region to be segmented to 1 and the pixel values outside it to 0 to obtain a mask image; and
    multiplying the original image by the mask image to obtain the distorted image, wherein the distorted image represents the bill image or the certificate image separated from the text background of the original image.
  4. The text positioning method based on image segmentation according to claim 2, wherein performing the affine transformation on the distorted image to obtain the distortion-corrected image, the text in the distortion-corrected image being forward text, comprises:
    determining a standard image corresponding to the distorted image according to the image type, and determining the coordinates of three pixel reference points in the standard image;
    determining the corresponding pixel coordinates in the distorted image according to the coordinates of the three pixel reference points;
    calculating an affine transformation matrix from the coordinates of the three pixel reference points and the corresponding pixel coordinates; and
    performing the affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, wherein the text in the distortion-corrected image is forward text.
  5. The text positioning method based on image segmentation according to claim 4, wherein performing text positioning on the distortion-corrected image to obtain the positioning result comprises:
    determining, according to the image type, a template corresponding to the distortion-corrected image, wherein the template comprises at least one rectangular box that marks, by preset coordinate values, the region in which the forward text is located;
    performing text positioning on the distortion-corrected image according to a preset algorithm and the template to obtain the positioning result; and
    storing the positioning result in a preset file.
  6. The text positioning method based on image segmentation according to claim 1, wherein acquiring the original image, the original image being a bill image or a certificate image captured against a text background, comprises:
    receiving the bill image or certificate image captured against the text background, and setting the bill image or certificate image as the original image;
    naming the original image according to a preset format and storing the original image under a preset path to obtain a storage path of the original image; and
    writing the storage path and the name of the original image into a target data table.
  7. The text positioning method based on image segmentation according to any one of claims 1 to 6, wherein, after performing text positioning on the distortion-corrected image to obtain the positioning result, the method further comprises:
    determining a newly added type of bill image or certificate image;
    setting the newly added type of bill image or certificate image as a sample image to be trained; and
    iteratively optimizing the preset image segmentation network model according to the sample image to be trained.
  8. A text positioning apparatus based on image segmentation, comprising:
    an acquisition unit, configured to acquire an original image, wherein the original image is a bill image or a certificate image captured against a text background;
    a segmentation unit, configured to perform image segmentation on the original image using a preset image segmentation network model to obtain a distorted image, wherein the distorted image is the bill image or the certificate image;
    a transformation unit, configured to perform an affine transformation on the distorted image to obtain a distortion-corrected image, wherein the text in the distortion-corrected image is forward text; and
    a positioning unit, configured to perform text positioning on the distortion-corrected image to obtain a positioning result.
  9. The image segmentation-based text positioning apparatus according to claim 8, wherein the segmentation unit comprises:
    An input subunit, configured to input the original image into the preset image segmentation network model;
    A first segmentation subunit, configured to perform semantic segmentation on the original image by using the preset image segmentation network model to obtain a segmentation label image and an image type;
    A second segmentation subunit, configured to segment the original image according to the segmentation label image to obtain the distorted image, the distorted image being the bill image or the certificate image.
  10. The image segmentation-based text positioning apparatus according to claim 9, wherein the second segmentation subunit is specifically configured to:
    Determine a region to be segmented according to the segmentation label image, set the pixel values inside the region to 1 and the pixel values outside the region to 0, to obtain a mask image;
    Multiply the original image by the mask image to obtain the distorted image, the distorted image representing the bill image or the certificate image separated from the text background of the original image.
  11. The image segmentation-based text positioning apparatus according to claim 9, wherein the transformation unit is specifically configured to:
    Determine, according to the image type, a standard image corresponding to the distorted image, and determine the coordinates of three pixel reference points in the standard image;
    Determine the corresponding pixel coordinates in the distorted image according to the coordinates of the three pixel reference points;
    Calculate an affine transformation matrix from the coordinates of the three pixel reference points and the corresponding pixel coordinates;
    Perform affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, in which the text is upright.
  12. The image segmentation-based text positioning apparatus according to claim 11, wherein the positioning unit is specifically configured to:
    Determine, according to the image type, a template corresponding to the distortion-corrected image, the template comprising at least one rectangular box that marks, by preset coordinate values, the region where the upright text is located;
    Perform text positioning on the distortion-corrected image according to a preset algorithm and the template to obtain the positioning result; and store the positioning result in a preset file.
  13. The image segmentation-based text positioning apparatus according to claim 8, wherein the acquiring unit is specifically configured to:
    Receive a bill image or a certificate image captured against a text background, and set the bill image or the certificate image as the original image;
    Set the name of the original image according to a preset format, and store the original image in a preset path to obtain the storage path of the original image;
    Write the storage path and the name of the original image into a target data table.
  14. The image segmentation-based text positioning apparatus according to any one of claims 8 to 13, further comprising:
    A determining unit, configured to determine a newly added type of bill image or certificate image;
    A setting unit, configured to set the newly added type of bill image or certificate image as a sample image for training;
    An iteration unit, configured to iteratively optimize the preset image segmentation network model according to the sample image for training.
  15. An image segmentation-based text positioning device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    Acquiring an original image, the original image being a bill image or a certificate image captured against a text background;
    Performing image segmentation on the original image by using a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image;
    Performing affine transformation on the distorted image to obtain a distortion-corrected image in which the text is upright;
    Performing text positioning on the distortion-corrected image to obtain a positioning result.
  16. The image segmentation-based text positioning device according to claim 15, wherein, when executing the computer program to perform image segmentation on the original image by using the preset image segmentation network model to obtain the distorted image (the bill image or the certificate image), the processor implements the following steps:
    Inputting the original image into the preset image segmentation network model;
    Performing semantic segmentation on the original image by using the preset image segmentation network model to obtain a segmentation label image and an image type;
    Segmenting the original image according to the segmentation label image to obtain the distorted image, the distorted image being the bill image or the certificate image.
  17. The image segmentation-based text positioning device according to claim 16, wherein, when executing the computer program to segment the original image according to the segmentation label image to obtain the distorted image (the bill image or the certificate image), the processor implements the following steps:
    Determining a region to be segmented according to the segmentation label image, setting the pixel values inside the region to 1 and the pixel values outside the region to 0, to obtain a mask image;
    Multiplying the original image by the mask image to obtain the distorted image, the distorted image representing the bill image or the certificate image separated from the text background of the original image.
  18. The image segmentation-based text positioning device according to claim 16, wherein, when executing the computer program to perform affine transformation on the distorted image to obtain the distortion-corrected image in which the text is upright, the processor implements the following steps:
    Determining, according to the image type, a standard image corresponding to the distorted image, and determining the coordinates of three pixel reference points in the standard image;
    Determining the corresponding pixel coordinates in the distorted image according to the coordinates of the three pixel reference points;
    Calculating an affine transformation matrix from the coordinates of the three pixel reference points and the corresponding pixel coordinates;
    Performing affine transformation on the distorted image according to the affine transformation matrix to obtain the distortion-corrected image, in which the text is upright.
  19. The image segmentation-based text positioning device according to claim 18, wherein, when executing the computer program to perform text positioning on the distortion-corrected image to obtain the positioning result, the processor implements the following steps:
    Determining, according to the image type, a template corresponding to the distortion-corrected image, the template comprising at least one rectangular box that marks, by preset coordinate values, the region where the upright text is located;
    Performing text positioning on the distortion-corrected image according to a preset algorithm and the template to obtain the positioning result;
    Storing the positioning result in a preset file.
  20. A computer-readable storage medium storing computer instructions that, when run on a computer, cause the computer to perform the following steps:
    Acquiring an original image, the original image being a bill image or a certificate image captured against a text background;
    Performing image segmentation on the original image by using a preset image segmentation network model to obtain a distorted image, the distorted image being the bill image or the certificate image;
    Performing affine transformation on the distorted image to obtain a distortion-corrected image in which the text is upright;
    Performing text positioning on the distortion-corrected image to obtain a positioning result.
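The claims fix the sequence of steps but not an implementation. The following is a minimal sketch, not the applicant's code, of the mask multiplication (claims 10 and 17), the three-point affine correction (claims 11 and 18) and the template-based positioning (claims 12 and 19). It assumes NumPy, a segmentation label image that uses an integer class label for the bill or certificate region (the segmentation network itself is not modelled), nearest-neighbour resampling, and a template stored as a dict of (x, y, width, height) boxes in standard-image coordinates. All function names are hypothetical.

```python
import numpy as np


def make_mask(label_image: np.ndarray, target_label: int) -> np.ndarray:
    """Mask image: 1 inside the region to be segmented, 0 outside."""
    return (label_image == target_label).astype(np.uint8)


def apply_mask(original: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Multiply the original image by the mask to separate the document
    from the background; a 2-D mask is broadcast over colour channels."""
    if original.ndim == 3:
        mask = mask[..., np.newaxis]
    return original * mask


def affine_from_three_points(src_pts, dst_pts) -> np.ndarray:
    """Solve the 2x3 matrix M with M @ [x, y, 1]^T = [u, v]^T for three
    pairs: (x, y) in the distorted image, (u, v) in the standard image."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    a = np.hstack([src, np.ones((3, 1))])  # rows are [x, y, 1]
    return np.linalg.solve(a, dst).T       # a @ M.T = dst


def warp_affine(image: np.ndarray, m: np.ndarray, out_shape) -> np.ndarray:
    """Map every output pixel back into the input (nearest neighbour)."""
    h, w = out_shape
    inv = np.linalg.inv(np.vstack([m, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    sx, sy, _ = inv @ coords
    sx = np.rint(sx).astype(int)
    sy = np.rint(sy).astype(int)
    valid = (sx >= 0) & (sx < image.shape[1]) & (sy >= 0) & (sy < image.shape[0])
    out = np.zeros((h, w) + image.shape[2:], dtype=image.dtype)
    out.reshape(h * w, *image.shape[2:])[valid] = image[sy[valid], sx[valid]]
    return out


def crop_fields(corrected: np.ndarray, template: dict) -> dict:
    """Cut each template box (x, y, width, height) out of the corrected image."""
    return {name: corrected[y:y + bh, x:x + bw]
            for name, (x, y, bw, bh) in template.items()}


# Synthetic example: the distorted image is the standard layout shifted by (2, 1).
distorted = np.zeros((20, 20), dtype=np.uint8)
distorted[5, 7] = 255
M = affine_from_three_points([(2, 1), (12, 1), (2, 11)], [(0, 0), (10, 0), (0, 10)])
corrected = warp_affine(distorted, M, (20, 20))
fields = crop_fields(corrected, {"name": (4, 3, 3, 3)})
```

Here the pixel at row 5, column 7 of the distorted image lands at row 4, column 5 of the corrected image, inside the hypothetical "name" box. In practice OpenCV's `cv2.getAffineTransform` and `cv2.warpAffine` cover the last two steps with proper interpolation; the explicit solve above only makes the three-point linear system visible.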
PCT/CN2019/117036 2019-09-19 2019-11-11 Image segmentation-based text positioning method, apparatus and device, and storage medium WO2021051527A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910884634.0 2019-09-19
CN201910884634.0A CN110807454B (en) 2019-09-19 2019-09-19 Text positioning method, device, equipment and storage medium based on image segmentation

Publications (1)

Publication Number Publication Date
WO2021051527A1 true WO2021051527A1 (en) 2021-03-25

Family

ID=69487698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117036 WO2021051527A1 (en) 2019-09-19 2019-11-11 Image segmentation-based text positioning method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN110807454B (en)
WO (1) WO2021051527A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343965A (en) * 2020-03-02 2021-09-03 北京有限元科技有限公司 Image tilt correction method, apparatus and storage medium
CN111768345B (en) * 2020-05-12 2023-07-14 北京奇艺世纪科技有限公司 Correction method, device, equipment and storage medium for identity card back image
CN113963339B (en) * 2021-09-02 2024-08-13 泰康保险集团股份有限公司 Information extraction method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101458770A (en) * 2008-12-24 2009-06-17 北京文通科技有限公司 Character recognition method and system
CN101515984A (en) * 2008-02-19 2009-08-26 佳能株式会社 Electronic document producing device and electronic document producing method
US20170124417A1 (en) * 2014-11-14 2017-05-04 Adobe Systems Incorporated Facilitating Text Identification and Editing in Images
CN108885699A (en) * 2018-07-11 2018-11-23 深圳前海达闼云端智能科技有限公司 Character identifying method, device, storage medium and electronic equipment

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
WO2005093653A1 (en) * 2004-03-25 2005-10-06 Sanyo Electric Co., Ltd Image correcting device and method, image correction database creating method, information data providing device, image processing device, information terminal, and information database device
CN105574513B (en) * 2015-12-22 2017-11-24 北京旷视科技有限公司 Character detecting method and device
CN109993160B (en) * 2019-02-18 2022-02-25 北京联合大学 Image correction and text and position identification method and system

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN113111880A (en) * 2021-05-12 2021-07-13 中国平安人寿保险股份有限公司 Certificate image correction method and device, electronic equipment and storage medium
CN113111880B (en) * 2021-05-12 2023-10-17 中国平安人寿保险股份有限公司 Certificate image correction method, device, electronic equipment and storage medium
CN113687823A (en) * 2021-07-30 2021-11-23 稿定(厦门)科技有限公司 Quadrilateral block nonlinear transformation method and system based on HTML (Hypertext markup language)
CN113687823B (en) * 2021-07-30 2023-08-01 稿定(厦门)科技有限公司 Quadrilateral block nonlinear transformation method and system based on HTML
CN114140811A (en) * 2021-11-04 2022-03-04 北京中交兴路信息科技有限公司 Certificate sample generation method and device, electronic equipment and storage medium
CN114565915A (en) * 2022-04-24 2022-05-31 深圳思谋信息科技有限公司 Sample text image acquisition method, text recognition model training method and device

Also Published As

Publication number Publication date
CN110807454B (en) 2024-05-14
CN110807454A (en) 2020-02-18

Similar Documents

Publication Publication Date Title
WO2021051527A1 (en) Image segmentation-based text positioning method, apparatus and device, and storage medium
US11645826B2 (en) Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks
CN109492643B (en) Certificate identification method and device based on OCR, computer equipment and storage medium
CN110569832B (en) Text real-time positioning and identifying method based on deep learning attention mechanism
US20190304066A1 (en) Synthesis method of chinese printed character images and device thereof
WO2018233055A1 (en) Method and apparatus for entering policy information, computer device and storage medium
CN110874618B (en) OCR template learning method and device based on small sample, electronic equipment and medium
CN109255300B (en) Bill information extraction method, bill information extraction device, computer equipment and storage medium
US11341605B1 (en) Document rectification via homography recovery using machine learning
Khare et al. Arbitrarily-oriented multi-lingual text detection in video
CN109635743A (en) A kind of text detection deep learning method and system of combination STN module
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
Zhang et al. Marior: Margin removal and iterative content rectification for document dewarping in the wild
CN112396047B (en) Training sample generation method and device, computer equipment and storage medium
CN112580499A (en) Text recognition method, device, equipment and storage medium
CN111145124A (en) Image tilt correction method and device
WO2019071476A1 (en) Express information input method and system based on intelligent terminal
US11881043B2 (en) Image processing system, image processing method, and program
CN118135584A (en) Automatic handwriting form recognition method and system based on deep learning
CN114332866B (en) Literature curve separation and coordinate information extraction method based on image processing
JPH07168910A (en) Document layout analysis device and document format identification device
WO2021098861A1 (en) Text recognition method, apparatus, recognition device, and storage medium
CN115457559A (en) Method, device and equipment for intelligently correcting text and license pictures
CN115457585A (en) Processing method and device for homework correction, computer equipment and readable storage medium
WO2023272656A1 (en) Picture book recognition method and apparatus, family education machine, and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19945752

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 19945752

Country of ref document: EP

Kind code of ref document: A1