CN106469304A

CN106469304A - Method for locating handwritten signatures in bills based on a deep convolutional neural network

Info

Publication number: CN106469304A
Application number: CN201610841643.8A
Authority: CN
Inventors: 张二虎; 李雪薇
Original assignee: Xian University of Technology
Current assignee: Xian University of Technology
Priority date: 2016-09-22
Filing date: 2016-09-22
Publication date: 2017-03-01

Abstract

The invention discloses a method for locating handwritten signatures in bills based on a deep convolutional neural network, implemented according to the following steps. Step 1: build a platform based on the Caffe deep learning framework, which includes multiple convolutional neural network models. Step 2: prepare a data set of bills. Step 3: train the network to obtain a detection and localization model. Step 4: use the detection and localization model obtained in step 3 to locate the handwritten signature in the bill to be detected. The method can accurately mark the position of the handwritten signature in a bill.

Description

Method for locating handwritten signatures in bills based on a deep convolutional neural network

Technical field

The invention belongs to the technical field of image localization and detection, and in particular relates to a method for locating handwritten signatures in bills based on a deep convolutional neural network.

Background art

At present, research in China on automatic bill inspection systems is mostly still at the development stage, with few practical applications. Deploying a system that detects the designated handwritten signature position in a bill is relatively costly, which also limits the development of automatic bill inspection technology. This creates a demand for research institutions and scholars to develop a technique for automatic bill inspection. A handwritten signature in a bill has the characteristics of characters, and the main approaches to character recognition are statistical-feature recognition, structural-feature recognition, and neural-network-based recognition. Statistical features include the two-dimensional planar position features of a character and the histogram features of its horizontal or vertical projection; recognition based on statistical features distinguishes similar-looking characters poorly and is suitable only for coarse classification. Structural features include the direction of strokes, isolated points, and whether closed strokes are present; this approach readily distinguishes characters whose shapes vary greatly. Research on neural networks is currently at a new peak, and neural networks are widely used in pattern recognition. As academic research on deep learning progresses, deep learning algorithms are becoming more mature and their applications more numerous. However, most neural networks are used only to extract features of a target and cannot localize the target.

Summary of the invention

The object of the invention is to provide a method for locating handwritten signatures in bills based on a deep convolutional neural network, which can accurately mark the position of the handwritten signature in a bill.

The technical solution adopted by the invention is a method for locating handwritten signatures in bills based on a deep convolutional neural network, implemented according to the following steps:

Step 1: build a platform based on the Caffe deep learning framework, which includes multiple convolutional neural network models;

Step 2: prepare a data set of bills;

Step 3: train the network to obtain a detection and localization model;

Step 4: use the detection and localization model obtained in step 3 to locate the handwritten signature in the bill to be detected.

The invention is further characterized in that:

Step 2 is specifically:

Step 2.1: photograph the bills to obtain the raw image data, and expand the samples of the raw image data;

Step 2.2: number and annotate all the image data obtained in step 2.1: mark the coordinates of the handwritten signature position in each bill image to obtain the top-left coordinates (Xmin, Ymin) and bottom-right coordinates (Xmax, Ymax) of the signature position, and write the numbers of all images and the corresponding coordinate information into an xml document;

Step 2.3: divide all the image data into a training data set and a test data set, then divide the training data set into training data and validation data.

In step 2.1, the sample expansion of the raw image data includes:

1. rotating the raw image data by different angles and translating it in different directions;

2. scaling the raw image data by linear interpolation;

3. adding salt-and-pepper noise and Gaussian noise of varying strength to the raw image data.
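The noise part of this expansion can be sketched as follows. This is a minimal illustration, assuming a grayscale image stored as a list of rows of 0-255 integers; the function names, the `amount` and `sigma` parameters, the parameter sweeps and the fixed seed are illustrative choices, not values from the patent. Rotation, translation and interpolated scaling would normally be done with an image library and are omitted here.

```python
import random

def add_salt_pepper(image, amount, rng):
    """Return a copy in which roughly `amount` of the pixels become 0 or 255."""
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0, 255))  # pepper or salt
    return noisy

def add_gaussian(image, sigma, rng):
    """Return a copy with zero-mean Gaussian noise added, clipped to 0..255."""
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[128] * 8 for _ in range(8)]  # hypothetical 8x8 gray image
# "Varying strength" is modelled by sweeping the noise parameters (assumed values).
salt_pepper_set = [add_salt_pepper(image, a, rng) for a in (0.01, 0.05, 0.10)]
gaussian_set = [add_gaussian(image, s, rng) for s in (5.0, 15.0, 25.0)]
```

Both functions return new images and leave the input untouched, so one raw photo can be expanded into several noisy variants.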

Step 3 is specially：

Step 3.1: resize the images in the data set obtained in step 2 to W₁*H₁ and feed them into the first five layers of the ZF network for feature extraction, outputting 256 feature maps;

Step 3.2: convolve the 256 feature maps obtained in step 3.1 with a 3*3 convolution kernel to obtain a 256-dimensional feature vector, which serves as the first layer of the RPN;

Step 3.3: feed the 256-dimensional feature vector obtained in step 3.2 into two parallel convolutional layers, the classification layer and the regression layer, and select the 300 highest-scoring candidate boxes according to the foreground probability scores of the positive samples;

Step 3.4: use the ROI_Pooling layer to map the 300 candidate boxes from step 3.3 onto the 256-dimensional feature maps obtained after the fifth convolutional layer of the ZF network, and obtain feature maps of size 6*6 after pooling and normalization;

Step 3.5: feed each 6*6 feature map into two consecutive fully connected layers, fc6 and fc7: fc6 produces a 4096-dimensional feature, which is then fed into fc7, finally giving a 1*4096-dimensional feature;

Step 3.6: feed the 1*4096-dimensional feature into two parallel fully connected layers, cls_score and bbox_predict. The cls_score layer performs classification and outputs the probability of the background and the probabilities of the K sample classes, where K is the number of sample classes. The bbox_predict layer adjusts the candidate region position and outputs (x', y', w', h') for the candidate box, where x' is the abscissa of the top-left corner of the adjusted candidate box, y' is the ordinate of its top-left corner, w' is its width, and h' is its height;

Step 3.7: judge whether the total number of iterations exceeds the threshold; if it does not, go to step 3.2; if it does, terminate.

The size W₁*H₁ to which the images in the data set are resized in step 3.1 is calculated from the image size W*H in the data set, namely:

\frac{W}{H} = \frac{W_{1}}{H_{1}} .

In step 3, when training the network, the initial learning rate is set to lr = 0.01. Whenever the current iteration count reaches an integer multiple of the step value, the learning rate decays once; training terminates when the iteration count reaches the total number of iterations. After each decay, lr = lr*gamma, where gamma = 0.1 and the iteration count ≤ the total number of iterations.

In step 3, when training the network, the mini-batch size is set to 256.

The beneficial effects of the invention are as follows. The method uses a deep neural network for image localization and improves on traditional methods in both localization speed and accuracy: the accuracy can reach 90.9%, and the speed is essentially real-time, with one image taking 0.3 s to localize. In addition, the bill image database of the invention contains a wide variety of data samples, and this diversity improves the localization accuracy.

Brief description of the drawings

Fig. 1 is a normal bill image collected in the method of the invention;

Fig. 2 is a collected bill image rotated by 45° in the method of the invention;

Fig. 3 is a collected bill image with added salt-and-pepper noise in the method of the invention;

Fig. 4 is a bill image to be detected, rotated by 90°, in the method of the invention;

Fig. 5 is the detection result for Fig. 4;

Fig. 6 is a bill image to be detected, rotated by 180°, in the method of the invention;

Fig. 7 is the detection result for Fig. 6;

Fig. 8 is a bill image to be detected, rotated by 45°, in the method of the invention;

Fig. 9 is the detection result for Fig. 8;

Fig. 10 is a bill image to be detected with added Gaussian noise in the method of the invention;

Fig. 11 is the detection result for Fig. 10;

Fig. 12 is a bill image to be detected with added salt-and-pepper noise in the method of the invention;

Fig. 13 is the detection result for Fig. 12.

Detailed description of the embodiments

The invention is described in detail below with reference to the accompanying drawings and specific embodiments.

The method of the invention for locating handwritten signatures in bills based on a deep convolutional neural network is implemented according to the following steps:

Step 1: under an Ubuntu or Windows system environment, build a platform based on the Caffe deep learning framework, which includes multiple convolutional neural network models;

Step 2: prepare the data set of bills, specifically:

Step 2.1: in a real environment, images are photographed by users themselves with a mobile phone or camera and uploaded to the billing system, so when preparing the image set it is necessary to account for photos taken by phones of different resolutions and for conditions such as the ambient lighting at capture time. The invention photographs bills with several mobile phones of different resolutions, and the images taken by these phones are called the raw image data. To make the image data rich enough to cover various practical situations, the invention expands the raw image data: 1. the raw image data are rotated by different angles and translated in different directions; 2. the raw image data are scaled by linear interpolation, since images collected by different cameras differ in size; 3. salt-and-pepper noise and Gaussian noise of varying strength are added to the raw image data;

Figs. 1-3 show some of the collected bill sample images, including a normal bill image, a rotated bill image, and a noisy bill image.

Step 2.2: number and annotate all the image data obtained in step 2.1: mark the coordinates of the handwritten signature position in each bill image to obtain the top-left coordinates (Xmin, Ymin) and bottom-right coordinates (Xmax, Ymax) of the signature position, and write all image numbers and the corresponding coordinate information into an xml document;
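An annotation record of this kind might be written as in the sketch below. The patent only says that the image number and the corner coordinates go into an xml document; the element names (`annotation`, `filename`, `object`, `bndbox`), the six-digit file naming, the `signature` label and the sample coordinates are assumptions modelled on the common PASCAL VOC layout.

```python
import xml.etree.ElementTree as ET

def make_annotation(image_id, xmin, ymin, xmax, ymax):
    """Build one annotation element for a numbered image and its signature box."""
    if not (xmin < xmax and ymin < ymax):
        raise ValueError("top-left corner must lie above and left of bottom-right")
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = f"{image_id:06d}.jpg"  # assumed naming
    obj = ET.SubElement(root, "object")
    ET.SubElement(obj, "name").text = "signature"  # assumed class label
    box = ET.SubElement(obj, "bndbox")
    for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
        ET.SubElement(box, tag).text = str(value)
    return root

annotation = make_annotation(1, 120, 340, 410, 420)  # made-up coordinates
xml_text = ET.tostring(annotation, encoding="unicode")
```

Validating the corner order at write time catches swapped coordinates before they reach training.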

Step 2.3: all the prepared image data are randomly divided into two parts, a training data set trainval and a test data set test, where trainval accounts for 8/10 of the whole data set and test accounts for 2/10. The trainval data set is further divided into train data and val data: the train data are used for training and account for 4/5 of trainval, and the val data are used for validation and account for 1/5 of trainval.
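The split fractions can be illustrated with a short sketch. The function name, the fixed seed and the data set of 1000 image numbers are hypothetical; only the 8/10, 2/10, 4/5 and 1/5 fractions come from the text.

```python
import random

def split_dataset(image_ids, seed=0):
    """Randomly split ids into train / val / test using the fractions in step 2.3."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_trainval = len(ids) * 8 // 10          # trainval: 8/10 of all images
    trainval, test = ids[:n_trainval], ids[n_trainval:]
    n_train = len(trainval) * 4 // 5         # train: 4/5 of trainval
    return trainval[:n_train], trainval[n_train:], test

train, val, test = split_dataset(range(1000))
```

With 1000 images this gives 640 training, 160 validation and 200 test images, with no image in more than one subset.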

Step 3: train the network to obtain the detection and localization model, and optimize the training parameters

Step 3.1: resize the images in the data set obtained in step 2 to W₁*H₁ (600*800 in the present invention) and feed them into the first five layers of the ZF network for feature extraction, outputting 256 feature maps (of size 37*50 in the present invention). The resized image size is calculated from the aspect ratio (longest side / shortest side) of the images in the data set. If the input image size is W*H and the resized image size is W₁*H₁, the relation is:

\frac{W}{H} = \frac{W_{1}}{H_{1}} .

The reasons for resizing the images in the data set to 600*800 are as follows:

The images in the data set vary in size. When calculating the normalized image size, the sizes that account for the majority of the data set are selected, together with the ratio of longest side to shortest side that accounts for the majority of the images. The qualifying sizes include 600*800, 1200*1600, 1500*2000, 2000*2600 and 3000*4000, and these are taken as candidate sizes after resizing. Because the resized images undergo convolution, the amount of computation and the GPU memory size must be considered, so the smallest possible size should be chosen. Images were provisionally resized to 600*800 and 1200*1600 for training: training with 600*800 images gave an accuracy of 90.84%, and training with 1200*1600 images gave 90.86%. The latter improves accuracy by only 0.02% over the former but is more costly in training time and computational complexity, so 600*800 was finally chosen as the resized size.
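The aspect-ratio relation and the resulting feature-map size can be checked with a small sketch. The stride of 16 is taken from the 1/16 coordinate factor mentioned in step 3.4; using floor division to get the feature-map size is an assumption that happens to reproduce the 37*50 figure. The function names are illustrative. The 2000*2600 entry is left out of the candidate list because its ratio (10:13) differs slightly from 3:4.

```python
from fractions import Fraction

def keeps_aspect_ratio(w, h, w1, h1):
    """True when W/H == W1/H1, compared exactly with rational arithmetic."""
    return Fraction(w, h) == Fraction(w1, h1)

def feature_map_size(w1, h1, stride=16):
    """Feature-map size under an assumed total stride, using floor division."""
    return w1 // stride, h1 // stride

candidates = [(600, 800), (1200, 1600), (1500, 2000), (3000, 4000)]
same_ratio = [keeps_aspect_ratio(600, 800, w1, h1) for w1, h1 in candidates]
fmap = feature_map_size(600, 800)  # (37, 50) under the stated assumptions
```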

Step 3.2: convolve the 256 feature maps obtained in step 3.1 with a 3*3 convolution kernel (sliding window). On each 3*3 region, every feature map yields a 1-dimensional vector, so the 256 feature maps yield a 256-dimensional feature vector, which serves as the first layer of the RPN (region proposal network);

The center of each 3*3 sliding window corresponds to predicted target candidate boxes on the input image at 3 scales (128, 256, 512) and 3 aspect ratios (1:2, 2:1, 7:1). This mapping mechanism is called an anchor, and it produces k = 9 anchors, i.e. each 3*3 region can produce 9 target candidate boxes. For the 37*50 feature map in the present invention there are therefore about 20000 (37*50*9) anchors in total, i.e. 20000 target candidate boxes are predicted on the input image.

The target candidate boxes have three scales (128, 256, 512), which in essence means that the candidate box areas are 128*128, 256*256 and 512*512. The candidate box size is related to the normalized image size in step 3.1, and the candidate box area should enclose the target in the image as far as possible.

The aspect ratios 1:2, 2:1 and 7:1 of the target candidate boxes are derived from the ratio of the width (W_box) to the height (H_box) of the target box in each image of the data set: the three width-to-height ratios that occur in the most images are chosen as the candidate box aspect ratios W_box/H_box, where W_box = Xmax - Xmin, H_box = Ymax - Ymin, and (Xmin, Ymin), (Xmax, Ymax) are the top-left and bottom-right coordinates of the handwritten signature position marked in step 2.2.
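The anchor mechanism can be sketched as follows, reading each scale s as a box of area s*s and each ratio as width:height, as the text describes. Placing each anchor at the center of its feature-map cell and using a stride of 16 are assumptions; the function names are illustrative. Note that the product 37*50*9 evaluates to 16650, which the text rounds to about 20000.

```python
import math

SCALES = (128, 256, 512)            # side length s; box area is s*s
RATIOS = ((1, 2), (2, 1), (7, 1))   # width : height

def anchors_at(cx, cy):
    """Return the 9 anchor boxes (xmin, ymin, xmax, ymax) centred on (cx, cy)."""
    boxes = []
    for s in SCALES:
        for rw, rh in RATIOS:
            r = rw / rh
            w, h = s * math.sqrt(r), s / math.sqrt(r)  # keeps w*h == s*s, w/h == r
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def all_anchors(fmap_w=37, fmap_h=50, stride=16):
    """Anchors for every feature-map cell, centred on the cell in image pixels."""
    return [
        box
        for j in range(fmap_h)
        for i in range(fmap_w)
        for box in anchors_at((i + 0.5) * stride, (j + 0.5) * stride)
    ]

anchors = all_anchors()  # 37 * 50 * 9 boxes
```

Anchors near the border extend outside the 600*800 image; how such boxes are handled is not stated in the text and is left open here.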

Step 3.3: the 256-dimensional feature vector is fed into two parallel convolutional layers, the classification layer and the regression layer, used for classification and bounding-box regression respectively. Locally, these two layers are fully connected networks; globally, because the network shares the same parameters at all positions (37*50 in total), they are actually implemented as convolutional networks of size 1 × 1. Note that no candidate window is extracted explicitly; the judgment and correction are performed entirely by the network itself.

For each candidate box, the classification layer outputs from the 256-dimensional feature the probabilities of belonging to the foreground and to the background, and each candidate box is labeled: a positive sample has an overlap with the ground-truth region greater than 0.7, and a negative sample has an overlap with the ground-truth region less than 0.3. The positive samples are retained;
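The labeling rule can be sketched as below, assuming that "overlap" means intersection over union (IoU), which is the usual measure in region proposal networks but is not named in the text. The `ignored` label for boxes between the two thresholds, the function names and the sample boxes are likewise assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_candidate(box, truth, hi=0.7, lo=0.3):
    """Return 'positive', 'negative', or 'ignored' (between the two thresholds)."""
    overlap = iou(box, truth)
    if overlap > hi:
        return "positive"
    if overlap < lo:
        return "negative"
    return "ignored"

truth = (100, 100, 300, 200)  # made-up ground-truth signature box
labels = [label_candidate(b, truth) for b in
          [(105, 100, 300, 200), (400, 400, 500, 450), (100, 100, 200, 200)]]
```

The three sample boxes have IoU 0.975, 0 and 0.5 with the ground truth, so they are labeled positive, negative and ignored respectively.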

At the same time, the regression layer outputs from the 256-dimensional feature 4 translation and scaling parameters (x, y, w, h), where x is the abscissa of the top-left corner of the candidate box, y is the ordinate of its top-left corner, w is its width, and h is its height. These four coordinate elements determine the target position.

After step 3.2, the invention has predicted 20000 candidate boxes on the input image; after step 3.3, about 2000 of the 20000 predicted candidate boxes remain. Finally, the 300 highest-scoring candidate boxes are selected according to the foreground probability scores of the positive samples.
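The final selection is a plain top-k by foreground probability. In this sketch each candidate is assumed to be a (foreground_probability, box) pair; the function name and the synthetic scores are hypothetical.

```python
import heapq

def select_top(candidates, k=300):
    """Keep the k candidates with the highest foreground probability.

    Each candidate is a (foreground_probability, box) pair.
    """
    return heapq.nlargest(k, candidates, key=lambda c: c[0])

# Hypothetical scores for 2000 boxes that survived the labelling step.
candidates = [((i * 37 % 2000) / 2000, (i, i, i + 10, i + 10)) for i in range(2000)]
kept = select_top(candidates)
```

`heapq.nlargest` returns the results sorted from highest to lowest score, so `kept[0]` is the most confident candidate.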

Step 3.4: use the ROI_Pooling layer to map the 300 candidate boxes from step 3.3 onto the 256-dimensional feature maps obtained after the fifth convolutional layer of the ZF network, and obtain feature maps of size 6*6 after pooling and normalization.

The ROI_Pooling layer implements the mapping from a region of the original image to the corresponding conv5 region, followed by pooling to a fixed size.

First the coordinates of each predicted candidate box on the feature map are calculated, i.e. the original coordinates are multiplied by 1/16. The computation is then performed for each output: the 300 candidate boxes of different sizes mapped onto the feature map are pooled, and the pooled results are uniformly normalized to feature maps of size 6*6.
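The mapping and pooling can be sketched for a single feature-map channel as follows. The 1/16 factor and the 6*6 output come from the text; the use of max pooling, the rounding of coordinates, the bin layout and the function name are assumptions, and a real layer would repeat this over all 256 channels.

```python
def roi_max_pool(fmap, box, out=6, scale=1 / 16):
    """Max-pool the part of `fmap` under `box` (image coords) to out x out."""
    h, w = len(fmap), len(fmap[0])
    # Map image coordinates onto the feature map and clamp to its bounds.
    x0 = min(w - 1, max(0, int(box[0] * scale)))
    y0 = min(h - 1, max(0, int(box[1] * scale)))
    x1 = min(w, max(x0 + 1, int(box[2] * scale) + 1))
    y1 = min(h, max(y0 + 1, int(box[3] * scale) + 1))
    pooled = []
    for j in range(out):
        # Bin edges; each bin covers at least one cell so small ROIs still work.
        ys = y0 + (y1 - y0) * j // out
        ye = max(ys + 1, y0 + (y1 - y0) * (j + 1) // out)
        row = []
        for i in range(out):
            xs = x0 + (x1 - x0) * i // out
            xe = max(xs + 1, x0 + (x1 - x0) * (i + 1) // out)
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled

# Hypothetical 37-wide, 50-tall channel whose values encode their own position.
fmap = [[y * 37 + x for x in range(37)] for y in range(50)]
pooled = roi_max_pool(fmap, (160, 320, 352, 480))
```

Whatever the size of the candidate box, the output is always 6*6, which is what lets the fully connected layers that follow accept every candidate.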

Step 3.5: feed each 6*6 feature map (300 feature maps in total) into two consecutive fully connected layers, fc6 and fc7. These two layers are consecutive rather than parallel: fc6 produces a 4096-dimensional feature, which is then fed into fc7, finally giving a 1*4096-dimensional feature.

Step 3.7: judge whether the total number of iterations (8000 in the present invention) exceeds the threshold; if it does not, go to step 3.2; if it does, terminate.

Choosing the total number of iterations: observe the loss value during training; when the loss no longer drops noticeably and tends to stabilize, the current iteration count can be taken as the final number of iterations.

When training the network, the initial learning rate is set to lr = 0.01. Whenever the current iteration count reaches an integer multiple of the step value (6000 in the present invention), the learning rate decays once; training terminates when the iteration count reaches the total number of iterations. After each decay, lr = lr*gamma, where gamma = 0.1 and the iteration count ≤ the total number of iterations.
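This decay rule can be written as a closed-form function of the iteration count, which matches the "step" learning-rate policy in Caffe. The function name is illustrative; the values 0.01, 0.1, 6000 and 8000 are those given in the text.

```python
def learning_rate(iteration, base_lr=0.01, gamma=0.1, step=6000):
    """Step decay: multiply by gamma once per completed `step` iterations."""
    return base_lr * gamma ** (iteration // step)

TOTAL_ITERATIONS = 8000
schedule = {it: learning_rate(it) for it in (0, 5999, 6000, TOTAL_ITERATIONS)}
```

With these settings the rate stays at 0.01 for the first 6000 iterations and drops by a factor of ten for the remaining 2000, so only one decay occurs during training.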

When optimizing with gradient descent, the gradient term in the weight update rule is multiplied by a coefficient, and this coefficient is the learning rate lr. If the learning rate is too small, the network converges too slowly; if it is too large, the cost function oscillates. A reasonable strategy in practice is to first set the learning rate to 0.01 and then observe the trend of the training cost: if the training cost is decreasing, the learning rate can be increased step by step, e.g. 0.1, 1.0, ...; if the training cost is increasing, the learning rate is reduced, e.g. 0.001, 0.0001, .... The value of the learning rate is determined by this method.

When the learning rate decays depends on the step value, and how much it decays depends on gamma. The step value can be chosen as close as possible to the total number of iterations.

During network training, the mini-batch size is set to 256. With a mini-batch, the weight update rule uses the average of the gradients of the 256 samples.
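The averaged-gradient update can be written out explicitly. A standard form, consistent with the description above (a learning rate lr multiplying the gradient term, with the gradient averaged over the samples of one mini-batch), is the following; the symbols w for the weights, L_i for the loss on sample i, and m for the mini-batch size are notation assumed here, not taken from the patent:

```latex
w \leftarrow w - \frac{lr}{m} \sum_{i=1}^{m} \nabla_{w} L_{i}(w), \qquad m = 256
```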

When using a mini-batch, all samples in a batch can be placed in one matrix and a linear algebra library used to accelerate the gradient computation; this is one of the optimization methods used in engineering implementations.

A large batch size makes full use of matrices and linear algebra libraries to accelerate computation; the smaller the batch size, the less noticeable the acceleration may be. However, a larger batch size is not always better: if it is too large, the weights are updated less frequently and the optimization process becomes too long. A typical batch size is 256; with a small image data set or less than 4 GB of GPU memory, reducing the batch size can be considered.

Step 4: use the detection and localization model obtained in step 3 to locate the handwritten signature in the bill to be detected, i.e. output the top-left and bottom-right coordinates of the rectangular box containing the handwritten signature.

Figs. 4-13 show bill images to be detected with different rotation angles and different noise types, together with the result images in which the handwritten signature position located by the method of the invention is marked with a rectangular box. It can be seen that the method is highly robust: it overcomes the influence of rotation, noise and similar conditions on the localization result, and offers accurate and fast localization.

Claims

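1. A method for locating handwritten signatures in bills based on a deep convolutional neural network, characterized in that it is implemented according to the following steps:

Step 1: build a platform based on the Caffe deep learning framework, which includes multiple convolutional neural network models;

Step 2: prepare a data set of bills;

Step 3: train the network to obtain a detection and localization model;

Step 4: use the detection and localization model obtained in step 3 to locate the handwritten signature in the bill to be detected.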

2. The method for locating handwritten signatures in bills based on a deep convolutional neural network according to claim 1, characterized in that step 2 is specifically:

Step 2.1: photograph the bills to obtain the raw image data, and expand the samples of the raw image data;

Step 2.2: number and annotate all the image data obtained in step 2.1: mark the coordinates of the handwritten signature position in each bill image to obtain the top-left coordinates (Xmin, Ymin) and bottom-right coordinates (Xmax, Ymax) of the signature position, and write the numbers of all images and the corresponding coordinate information into an xml document;

Step 2.3: divide all the image data into a training data set and a test data set, then divide the training data set into training data and validation data.

3. The method for locating handwritten signatures in bills based on a deep convolutional neural network according to claim 2, characterized in that, in step 2.1, the sample expansion of the raw image data includes:

1. rotating the raw image data by different angles and translating it in different directions;

2. scaling the raw image data by linear interpolation;

3. adding salt-and-pepper noise and Gaussian noise of varying strength to the raw image data.

4. The method for locating handwritten signatures in bills based on a deep convolutional neural network according to claim 1, characterized in that step 3 is specifically:

Step 3.1: resize the images in the data set obtained in step 2 to W₁*H₁ and feed them into the first five layers of the ZF network for feature extraction, outputting 256 feature maps;

Step 3.2: convolve the 256 feature maps obtained in step 3.1 with a 3*3 convolution kernel to obtain a 256-dimensional feature vector, which serves as the first layer of the RPN;

Step 3.3: feed the 256-dimensional feature vector obtained in step 3.2 into two parallel convolutional layers, the classification layer and the regression layer, and select the 300 highest-scoring candidate boxes according to the foreground probability scores of the positive samples;

Step 3.4: use the ROI_Pooling layer to map the 300 candidate boxes from step 3.3 onto the 256-dimensional feature maps obtained after the fifth convolutional layer of the ZF network, and obtain feature maps of size 6*6 after pooling and normalization;

Step 3.5: feed each 6*6 feature map into two consecutive fully connected layers, fc6 and fc7: fc6 produces a 4096-dimensional feature, which is then fed into fc7, finally giving a 1*4096-dimensional feature;

Step 3.6: feed the 1*4096-dimensional feature into two parallel fully connected layers, cls_score and bbox_predict. The cls_score layer performs classification and outputs the probability of the background and the probabilities of the K sample classes, where K is the number of sample classes. The bbox_predict layer adjusts the candidate region position and outputs (x', y', w', h') for the candidate box, where x' is the abscissa of the top-left corner of the adjusted candidate box, y' is the ordinate of its top-left corner, w' is its width, and h' is its height;

Step 3.7: judge whether the total number of iterations exceeds the threshold; if it does not, go to step 3.2; if it does, terminate.

5. The method for locating handwritten signatures in bills based on a deep convolutional neural network according to claim 4, characterized in that the size W₁*H₁ to which the images in the data set are resized in step 3.1 is calculated from the image size W*H in the data set, namely:

\frac{W}{H} = \frac{W_{1}}{H_{1}} .

6. The method for locating handwritten signatures in bills based on a deep convolutional neural network according to claim 4, characterized in that, in step 3, when training the network, the initial learning rate is set to lr = 0.01; whenever the current iteration count reaches an integer multiple of the step value, the learning rate decays once; training terminates when the iteration count reaches the total number of iterations; after each decay, lr = lr*gamma, where gamma = 0.1 and the iteration count ≤ the total number of iterations.

7. The method for locating handwritten signatures in bills based on a deep convolutional neural network according to claim 4, characterized in that, in step 3, when training the network, the mini-batch size is set to 256.