CN112785626A - Twin network small target tracking method based on multi-scale feature fusion - Google Patents
Twin network small target tracking method based on multi-scale feature fusion
- Publication number: CN112785626A
- Application number: CN202110111717.3A
- Authority: CN (China)
- Prior art keywords: layer, size, convolution, image, feature
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/246 — Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/22 — Pattern recognition; Matching criteria, e.g. proximity measures
- G06F18/253 — Pattern recognition; Fusion techniques of extracted features
- G06N3/04 — Neural networks; Architecture, e.g. interconnection topology
- G06N3/08 — Neural networks; Learning methods
- G06T7/73 — Image analysis; Determining position or orientation of objects or cameras using feature-based methods
- G06T2207/10016 — Image acquisition modality; Video; Image sequence
- G06T2207/20081 — Special algorithmic details; Training; Learning
- G06T2207/20084 — Special algorithmic details; Artificial neural networks [ANN]
- G06T2207/20132 — Special algorithmic details; Image segmentation details; Image cropping
Abstract
The invention discloses a twin network small target tracking method based on multi-scale feature fusion. By means of a multi-scale feature fusion module and an optimized twin neural network, the method comprehensively exploits both the advantage of the low layers of a deep neural network structure in locating the target accurately and the advantage of the high layers in capturing the semantic information of the target. Through effective fusion of different layers, the information of the bottom layers is fully utilized, which avoids the problem that the convolution operations of a deep network discard the information of a small target. The method thus addresses the small-target challenge in the tracking process and achieves a good tracking effect.
Description
Technical Field
The invention relates to visual recognition technology, and in particular to a twin network small target tracking method based on multi-scale feature fusion.
Background
Moving object tracking means that, given the position of an object of interest in the first frame of a video sequence, a tracker continuously and accurately tracks that object in real time in the subsequent frames and returns its position. In recent years, theoretical methods for target tracking have developed rapidly; target tracking is an important research direction in the field of computer vision and has been successfully applied in many fields such as video surveillance, autonomous driving, and semantic segmentation. The emergence of deep learning methods has greatly advanced the tracking problem, but small target tracking remains a major challenge; in particular, how to accurately track small targets in real time against a complex background is a key research problem.
At present, the challenges of small target tracking mainly come from two aspects. On the one hand, the features of a small target object become very difficult to obtain as the depth of the neural network increases, so the extracted features are hard to make representative. On the other hand, during tracking, small objects tend to drift suddenly and substantially compared with normal-sized objects because of camera shake. Current research focuses only on the tracking results of normal-sized target objects on generic data sets and ignores the small target tracking problem.
Existing small target tracking algorithms are based on traditional machine learning and are strongly limited in either accuracy or real-time performance. Deep neural networks, thanks to their larger number of layers, can extract high-level semantic information and thus represent features better; however, for small target objects, the successive convolution operations gradually lose the position information of the small target as the network deepens.
Therefore, by using the deep neural network structure of a twin network and fusing the complementary feature information of different network layers from the perspective of multi-scale feature fusion, real-time and robust tracking of small target objects in complex scenes and environments can be realized. However, the application of existing twin networks still faces the following problems: how to effectively fuse the multi-scale features of different network layers; the target position given by existing deep neural networks is blurred and the semantic information is limited; as a result, small target features remain difficult to obtain.
Disclosure of Invention
Purpose of the invention: the invention aims to overcome the defects in the prior art and provides a twin network small target tracking method based on multi-scale feature fusion.
The technical scheme is as follows: the invention discloses a twin network small target tracking method based on multi-scale feature fusion, which comprises the following steps:
Step (1), size modification and data augmentation preprocessing are carried out sequentially on the template image x and the image y to be searched, respectively, to obtain a pair of cropped training samples of the corresponding fixed sizes, which are then input into the template branch and the search branch of the twin network structure, respectively;
Step (2), the template branch and the search branch share a feature extractor, namely a multi-scale feature fusion module, which is used to obtain the multi-scale fused feature vectors; this comprises two stages: bottom-up feature extraction and top-down lateral feature fusion;
during bottom-up feature extraction, an optimized twin network structure is constructed, which comprises 5 convolutional layers whose outputs are denoted in order as {C1, C2, C3, C4, C5};
during top-down lateral feature fusion, the higher-layer features are first upsampled and enlarged and then fused with the lower-layer features; iterating this step generates the multi-scale fused feature maps of the template branch and the branch to be searched, respectively;
Step (3), the template feature map and the search feature map obtained in step (2) are input into a similarity function and a cross-correlation operation is performed to obtain a response map; the position with the highest value in the response map is taken as the most similar location of the target object between the two images, i.e., the target position in the image to be searched (the frame to be tracked);
Step (4), the response map is enlarged to the size of the original image y to be searched (for example, 255 × 255) and analysed to obtain the final tracking result: the position with the maximum score is multiplied by the total stride of the five convolutional layers of the optimized twin network structure to obtain the position of the current target in the image to be searched.
Further, the specific method for modifying the size of the template image x in the step (1) is as follows:
In the target tracking process the target frame of the first frame is known and is denoted (x_min, y_min, w, h); the size of the template image x is then calculated from this first-frame target frame, i.e. a square region centred on the target to be tracked is cropped out, according to the following formula:
s(w+2p)×s(h+2p)=A
where (x_min, y_min) are the coordinates of the lower-left corner of the target frame, w and h are the width and height of the frame, s is the scale factor, p is the context margin added around the target (in the standard SiamFC formulation p = (w + h)/4), and A is set to 127 × 127; the target frame is enlarged by the above operation, and the crop is then resized to 127 × 127 to obtain the template image x.
In the present invention, the first frame of a video is called the template frame (i.e., the template image x); all subsequent frames are frames in which the target position is to be searched (i.e., the image y to be searched), and each position is represented by the four values of the lower-left corner coordinates together with the width and the height.
The specific method for modifying the size of the image y to be searched comprises the following steps:
First, the centre of the target frame predicted in the previous frame is taken as the cropping centre; the side length of the square region to be cropped is then determined in proportion to that cropped for the template image x; finally, the crop is resized to 255 × 255.
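As a minimal illustrative sketch (not the reference implementation of the invention), the cropping described above can be written as follows; the context margin p = (w + h)/4 is an assumption borrowed from the standard SiamFC formulation, zero padding at image borders is a simplification, and the function names are purely illustrative:

```python
# Sketch of the size-modification step: crop a square around the target and resize it.
import cv2
import numpy as np

A_TEMPLATE = 127   # template size from step (1)
A_SEARCH = 255     # search size from step (1)

def crop_and_resize(image, center, crop_side, out_size):
    """Crop a square of side `crop_side` centred at `center` and resize to `out_size`."""
    cx, cy = center
    half = crop_side / 2.0
    x0, y0 = int(round(cx - half)), int(round(cy - half))
    x1, y1 = int(round(cx + half)), int(round(cy + half))
    # zero-pad when the crop exceeds the image border (SiamFC pads with the mean colour;
    # zero padding keeps the sketch simple)
    pad = max(0, -x0, -y0, x1 - image.shape[1], y1 - image.shape[0])
    if pad > 0:
        image = cv2.copyMakeBorder(image, pad, pad, pad, pad, cv2.BORDER_CONSTANT, value=0)
        x0, y0, x1, y1 = x0 + pad, y0 + pad, x1 + pad, y1 + pad
    patch = image[y0:y1, x0:x1]
    return cv2.resize(patch, (out_size, out_size))

def make_template(image, box):
    """box = (x_min, y_min, w, h): apply s(w+2p) x s(h+2p) = A with A = 127 x 127."""
    x_min, y_min, w, h = box
    center = (x_min + w / 2.0, y_min + h / 2.0)
    p = (w + h) / 4.0                              # assumed context margin
    side = np.sqrt((w + 2 * p) * (h + 2 * p))      # crop side before scaling by s
    return crop_and_resize(image, center, side, A_TEMPLATE), side

def make_search(image, prev_center, template_side):
    """Crop the search region around the previous prediction, proportional to the template crop."""
    side = template_side * A_SEARCH / A_TEMPLATE   # keep the same scale ratio
    return crop_and_resize(image, prev_center, side, A_SEARCH)
```

During tracking, make_search is called with the centre predicted in the previous frame, so that the search crop side stays proportional to the template crop side, as described above.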
Further, an optimized twin network structure is constructed in the step (2) to extract features from bottom to top, and the optimized twin network structure is set as follows:
Firstly, the first layer is a convolutional layer: 96 convolution kernels of size 11 × 11 with a stride of 2 are applied to the image, followed by a 3 × 3 max-pooling operation and a batch normalization operation, and C1 is output;
Secondly, the second layer is a convolutional layer: 256 convolution kernels of size 5 × 5 with a stride of 1 are applied, computed separately on two GPU groups, followed by a 3 × 3 max-pooling operation and a batch normalization operation to extract feature information, and C2 is output;
Thirdly, the third layer is a convolutional layer: grouped 3 × 3 convolution kernels with 192 channels are applied, batch normalization is then performed, and C3 is output;
Fourthly, the fourth layer is a convolutional layer: grouped 3 × 3 convolution kernels with 192 channels are applied, batch normalization is then performed, and C4 is output;
Fifthly, the fifth layer is a convolutional layer: only a 3 × 3 convolution with 128 channels per group is used, and the 256-dimensional high-level semantic feature C5 is finally output.
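The five layers above can be sketched in PyTorch as follows; this is one illustrative reading of the layer descriptions (96 kernels of 11 × 11, 256 of 5 × 5 in two groups, 192 of 3 × 3, 192 of 3 × 3, and 3 × 3 with 128 channels per group), and the pooling strides, the ReLU activations, and the strides of layers 2–5 are assumptions rather than values given in the text:

```python
# Minimal sketch of one branch of the optimized twin network backbone.
import torch
import torch.nn as nn

class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(                              # -> C1
            nn.Conv2d(3, 96, kernel_size=11, stride=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(96), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(                              # -> C2 (2 groups, as on 2 GPUs)
            nn.Conv2d(96, 256, kernel_size=5, stride=1, groups=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(                              # -> C3
            nn.Conv2d(256, 192, kernel_size=3, groups=2),
            nn.BatchNorm2d(192), nn.ReLU(inplace=True))
        self.conv4 = nn.Sequential(                              # -> C4
            nn.Conv2d(192, 192, kernel_size=3, groups=2),
            nn.BatchNorm2d(192), nn.ReLU(inplace=True))
        self.conv5 = nn.Conv2d(192, 256, kernel_size=3, groups=2)  # -> C5 (128 per group)

    def forward(self, x):
        c1 = self.conv1(x)
        c2 = self.conv2(c1)
        c3 = self.conv3(c2)
        c4 = self.conv4(c3)
        c5 = self.conv5(c4)
        return c2, c3, c4, c5          # layers used later for multi-scale fusion
```

With these assumed strides the total stride of the backbone is 8, and a 255 × 255 search crop and a 127 × 127 template crop yield C5 maps of 22 × 22 and 6 × 6 respectively, which is consistent with the feature sizes quoted in step (3).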
Further, the specific method for transversely fusing the features from top to bottom in the step (2) is as follows:
(A) Using interpolation, 2× upsampling (nearest-neighbour upsampling) is applied to the pixels of the fifth-layer feature map, inserting new elements between the existing pixels so that its size becomes the feature size of the fourth layer; this enlarges the high-layer feature map and prepares it for the fusion of the next step; the feature maps of the fourth, third and second layers are then enlarged in turn in the same way;
(B) A 1 × 1 convolution is applied to layer C5 to obtain the low-resolution feature P5; a 1 × 1 convolution kernel is then used to change the number of channels of the fourth-layer feature map C4 generated in the bottom-up stage, uniformly fixing the channels to 256-d to facilitate the subsequent feature fusion; the processed fourth-layer result is then added to the upsampled fifth-layer result, and a 3 × 3 convolution kernel is applied to the fused result to suppress the aliasing that the upsampling may introduce; the final result is denoted P4;
(C) Process (B) is iterated to finally generate more accurate feature maps, yielding the multi-scale fused feature maps of the template branch and the branch to be searched, respectively.
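A minimal sketch of this top-down lateral fusion, in the spirit of steps (A)–(C), is given below; the input channel counts are carried over from the backbone sketch above, and the upsampling is performed to the exact size of the lower layer (rather than a strict factor of 2) so that feature maps whose sizes do not differ by exactly 2× can still be added:

```python
# Sketch of the top-down lateral fusion of the multi-scale feature fusion module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    def __init__(self, in_channels=(256, 192, 192, 256), out_channels=256):
        super().__init__()
        # 1x1 convolutions fix every lateral input to 256-d, as in step (B)
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        # 3x3 convolutions smooth the fused maps to reduce upsampling aliasing
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels[:-1]])

    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)                                     # low-resolution P5
        p4 = self.smooth[2](self.lateral[2](c4) +
                            F.interpolate(p5, size=c4.shape[-2:], mode="nearest"))
        p3 = self.smooth[1](self.lateral[1](c3) +
                            F.interpolate(p4, size=c3.shape[-2:], mode="nearest"))
        p2 = self.smooth[0](self.lateral[0](c2) +
                            F.interpolate(p3, size=c2.shape[-2:], mode="nearest"))
        return p2, p3, p4, p5            # fused maps of one branch
```

Which fused level is ultimately fed to the cross-correlation of step (3) is left open here; the text only states that the template and search features used there have 256 channels.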
Further, in step (3), the multi-scale fused feature maps corresponding to the template branch and the branch to be searched are subjected to a cross-correlation operation to obtain a response map. Specifically, the two fused features have sizes of 22 × 22 × 256 and 6 × 6 × 256, respectively; the 6 × 6 × 256 feature is used as a convolution kernel and convolved over the 22 × 22 × 256 feature to obtain a 17 × 17 response map, on which the tracked target position corresponds to a higher score.
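A minimal sketch of this correlation step is shown below (22 − 6 + 1 = 17); folding the batch into the channel dimension and using grouped convolution is only an implementation convenience, not something prescribed by the method:

```python
# Sketch of the cross-correlation between template and search features.
import torch
import torch.nn.functional as F

def cross_correlation(template_feat, search_feat):
    """template_feat: (B, 256, 6, 6), search_feat: (B, 256, 22, 22) -> (B, 1, 17, 17)."""
    b, c, h, w = template_feat.shape
    # fold the batch into the channel dimension and use groups=b so that each search
    # feature is correlated only with its own template
    search = search_feat.reshape(1, b * c, *search_feat.shape[-2:])
    response = F.conv2d(search, template_feat, groups=b)   # (1, B, 17, 17)
    return response.permute(1, 0, 2, 3)                    # (B, 1, 17, 17)

# quick shape check
z = torch.randn(2, 256, 6, 6)
x = torch.randn(2, 256, 22, 22)
print(cross_correlation(z, x).shape)    # torch.Size([2, 1, 17, 17])
```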
During training, after the 17 × 17 response map is obtained, positive and negative samples are determined: if the distance of a position from the target on the search image is smaller than R, that position is counted as a positive sample; otherwise it is regarded as a negative sample;
finally, a binary cross-entropy logistic loss function is adopted and stochastic gradient descent is used, with the number of training iterations set to 50, the mini-batch size set to 8, and the learning rate set to 10^-2 and decayed to 10^-8, to train the whole deep network;
where f(z, x) = φ(z) * φ(x) + b·1, with φ(z) serving as the convolution kernel, φ(x) being the feature that is convolved, and b·1 denoting the value b taken at every position of the score map.
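A minimal sketch of this training objective is given below; the radius R (expressed here in response-map cells), the model interface, and the reading of "decayed to 10^-8" as an exponential schedule over the 50 epochs are assumptions:

```python
# Sketch of the label map and binary cross-entropy (logistic) training step.
import torch
import torch.nn as nn

def make_label(size=17, radius=2):
    """Binary label map: 1 within `radius` cells of the centre of the response map."""
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    dist = torch.sqrt((xs - size // 2).float() ** 2 + (ys - size // 2).float() ** 2)
    return (dist <= radius).float()

criterion = nn.BCEWithLogitsLoss()   # binary cross-entropy on raw scores (logistic loss)

def training_step(model, optimizer, template, search):
    """One SGD step on a mini-batch of (template, search) crops (batch size 8 in the text)."""
    response = model(template, search)                      # assumed (B, 1, 17, 17) raw scores
    labels = make_label().to(response.device).expand_as(response)
    loss = criterion(response, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer and an assumed exponential schedule taking the rate from 1e-2 to 1e-8 over 50 epochs:
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=(1e-8 / 1e-2) ** (1 / 50))
```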
Beneficial effects: the invention is provided with a multi-scale feature fusion module that comprehensively exploits both the accurate target localization offered by the low layers of a deep neural network structure and the ability of the high layers to capture the semantic information of the target; through effective fusion of different layers, the information of the bottom layers is fully utilized, avoiding the problem that the convolution operations of a deep network discard the information of a small target. In addition, the invention optimizes the existing twin network structure and provides a visual target tracking method that can accurately track small target objects.
In conclusion, the invention comprehensively and effectively fuses the features of different network layers, addresses the small-target challenge in the tracking process, and achieves a good tracking effect.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a block diagram illustrating a multi-scale feature fusion module for a branch to be searched according to an embodiment of the present invention;
FIG. 3 is a comparative illustration of an embodiment of the present invention;
FIG. 3(a) is a visualized feature map obtained with the present invention, and FIG. 3(b) is a visualized feature map obtained with an existing twin network.
Detailed Description
The technical solution of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.
In practical target tracking applications, the target captured by the camera is often tracked from medium or high altitude, and how to continuously and accurately track the target in such long-range scenes is a difficult research problem in the tracking field.
The invention is based on an optimized twin network and performs feature fusion through a top-down multi-scale fusion method, which solves the difficulty of tracking small targets in the prior art. As shown in FIG. 1, the twin network small target tracking method based on multi-scale feature fusion of the invention comprises the following steps:
Step (1), size modification and data augmentation preprocessing are carried out sequentially on the template image x and the image y to be searched, respectively, to obtain a pair of cropped training samples of fixed size, which are input into the template branch and the search branch of the twin network structure, respectively;
in the target tracking process, the target frame of the first frame is denoted (x_min, y_min, w, h); the size of the template image x is then calculated from this first-frame target frame, i.e. a square region centred on the target to be tracked is cropped out, according to the following formula:
s(w+2p)×s(h+2p)=A
where s is the scale factor and A is set to 127 × 127; the target frame is enlarged by the above operation, and the crop is then resized to 127 × 127 to obtain the template image x;
in training, the specific method for modifying the size of the image y to be searched comprises the following steps:
first, the centre of the target frame predicted in the previous frame is taken as the cropping centre; the side length of the square region to be cropped is then determined in proportion to that cropped for the template image x; finally, the crop is resized to 255 × 255.
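A minimal sketch of the data augmentation applied to the cropped pair in Step (1) is given below; the four modes (random stretch, random crop, normalization, conversion to tensor) follow claim 3, while the stretch range, crop handling and normalization constants are illustrative values only:

```python
# Sketch of the four data augmentation modes applied to a cropped training image.
import random
import numpy as np
import torch
import cv2

def random_stretch(img, max_stretch=0.05):
    """Randomly stretch the image by up to +/- max_stretch."""
    scale = 1.0 + random.uniform(-max_stretch, max_stretch)
    h, w = img.shape[:2]
    return cv2.resize(img, (round(w * scale), round(h * scale)))

def random_crop(img, out_size):
    """Randomly crop a square of side out_size."""
    h, w = img.shape[:2]
    y0 = random.randint(0, max(0, h - out_size))
    x0 = random.randint(0, max(0, w - out_size))
    return img[y0:y0 + out_size, x0:x0 + out_size]

def to_tensor(img):
    """Convert an HxWxC float image to a CxHxW tensor."""
    return torch.from_numpy(img.transpose(2, 0, 1)).float()

def augment(img, out_size, mean=0.0, std=255.0):
    img = random_stretch(img)
    h, w = img.shape[:2]
    if h < out_size or w < out_size:                 # guard after a shrinking stretch
        img = cv2.resize(img, (max(w, out_size), max(h, out_size)))
    img = random_crop(img, out_size)
    img = (img.astype(np.float32) - mean) / std      # normalization
    return to_tensor(img)
```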
Step (2), the template branch and the search branch share a feature extractor, namely a multi-scale feature fusion module, which is used to obtain the multi-scale fused feature vectors; this comprises two stages: bottom-up feature extraction and top-down lateral feature fusion;
As shown in FIG. 2, an optimized twin network structure is constructed to extract features from bottom to top; the optimized twin network structure is set as follows:
Firstly, the first layer is a convolutional layer: 96 convolution kernels of size 11 × 11 with a stride of 2 are applied to the image, followed by a 3 × 3 max-pooling operation and a batch normalization operation, and C1 is output;
Secondly, the second layer is a convolutional layer: 256 convolution kernels of size 5 × 5 with a stride of 1 are applied, computed separately on two GPU groups, followed by a 3 × 3 max-pooling operation and a batch normalization operation to extract feature information, and C2 is output;
Thirdly, the third layer is a convolutional layer: grouped 3 × 3 convolution kernels with 192 channels are applied, batch normalization is then performed, and C3 is output;
Fourthly, the fourth layer is a convolutional layer: grouped 3 × 3 convolution kernels with 192 channels are applied, batch normalization is then performed, and C4 is output;
Fifthly, the fifth layer is a convolutional layer: only a 3 × 3 convolution with 128 channels per group is used, and the 256-dimensional high-level semantic feature C5 is finally output.
The specific method for transversely fusing the features from top to bottom comprises the following steps:
(A) Using interpolation, 2× upsampling (nearest-neighbour upsampling) is applied to the pixels of the fifth-layer feature map, inserting new elements between the existing pixels so that its size becomes the feature size of the fourth layer; this enlarges the high-layer feature map and prepares it for the fusion of the next step; the feature maps of the fourth, third and second layers are then enlarged in turn in the same way;
(B) A 1 × 1 convolution is applied to layer C5 to obtain the low-resolution feature P5; a 1 × 1 convolution kernel is then used to change the number of channels of the fourth-layer feature map C4 generated in the bottom-up stage, uniformly fixing the channels to 256-d to facilitate the subsequent feature fusion; the processed fourth-layer result is then added to the upsampled fifth-layer result, and a 3 × 3 convolution kernel is applied to the fused result to suppress the aliasing that the upsampling may introduce; the final result is denoted P4;
Process (B) is iterated to finally generate more accurate feature maps, yielding the multi-scale fused feature maps of the template branch and the branch to be searched, respectively;
In step (3), the multi-scale fused feature maps corresponding to the template branch and the branch to be searched are subjected to a cross-correlation operation to obtain a response map. Specifically, the two fused features have sizes of 22 × 22 × 256 and 6 × 6 × 256, respectively; the 6 × 6 × 256 feature is used as a convolution kernel and convolved over the 22 × 22 × 256 feature to obtain a 17 × 17 response map, on which the tracked target position corresponds to a higher score;
in the training process, positive and negative samples need to be determined after the response map is obtained: if the distance of a position from the target on the search image is smaller than R, that position is counted as a positive sample; otherwise it is regarded as a negative sample;
finally, a binary cross-entropy logistic loss function is adopted and stochastic gradient descent is used, with the number of training iterations set to 50, the mini-batch size set to 8, and the learning rate set to 10^-2 and decayed to 10^-8, to train the whole deep network;
where f(z, x) = φ(z) * φ(x) + b·1, with φ(z) serving as the convolution kernel, φ(x) being the feature that is convolved, and b·1 denoting the value b taken at every position of the score map;
In step (4), the response map is enlarged to the size of the original image and analysed to obtain the final tracking result: the position with the maximum score is multiplied by the total stride of the five convolutional layers of the optimized twin network structure to obtain the position of the current target in the image to be searched.
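A minimal sketch of this final localization is given below; the response-map upsampling factor and the total stride of 8 (which follows from the strides assumed in the backbone sketch) are assumptions:

```python
# Sketch of step (4): upsample the response map, take its maximum, and map it back
# to image coordinates using the total stride of the backbone.
import torch
import torch.nn.functional as F

def locate_target(response, search_center, total_stride=8, response_up=16):
    """response: (1, 1, 17, 17); returns the predicted target centre in image coordinates."""
    up = F.interpolate(response, scale_factor=response_up, mode="bicubic", align_corners=False)
    up = up.squeeze()                                   # (17 * response_up, 17 * response_up)
    idx = torch.argmax(up)
    row, col = divmod(idx.item(), up.shape[1])
    # displacement from the response-map centre, converted back to search-image pixels
    disp_y = (row - up.shape[0] / 2) * total_stride / response_up
    disp_x = (col - up.shape[1] / 2) * total_stride / response_up
    return (search_center[0] + disp_x, search_center[1] + disp_y)
```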
As shown in FIG. 3, the target obtained by the method of the present invention is located accurately and the visualized features are clearer.
As can be seen from the above embodiments, the present invention treats target tracking as a similarity-metric learning problem. The template image x and the image y to be searched are input into the twin network structure and undergo the same transformation; the designed multi-scale feature fusion module produces the corresponding feature vectors for each branch; finally, the template feature map is used as a convolution kernel to perform a cross-correlation operation on the search features, generating a response map that measures the similarity between the two. A position with higher similarity returns a high score, i.e. the target position; otherwise a low score is returned.
Claims (6)
1. A twin network small target tracking method based on multi-scale feature fusion is characterized in that: the method comprises the following steps:
step (1), size modification and data augmentation preprocessing are carried out sequentially on the template image x and the image y to be searched, respectively, to obtain a pair of cropped training samples of the corresponding fixed sizes, which are then input into the template branch and the search branch of the twin network structure, respectively;
step (2), the template branch and the search branch share a feature extractor, namely a multi-scale feature fusion module, which is used to obtain the multi-scale fused feature vectors; this comprises two stages: bottom-up feature extraction and top-down lateral feature fusion;
during bottom-up feature extraction, an optimized twin network structure is constructed, which comprises 5 convolutional layers whose outputs are denoted in order as {C1, C2, C3, C4, C5};
during top-down lateral feature fusion, the higher-layer features are first upsampled and enlarged and then fused with the lower-layer features; iterating this step generates the multi-scale fused feature maps of the template branch and the branch to be searched, respectively;
step (3), the template feature map and the search feature map obtained in step (2) are input into a similarity function and a cross-correlation operation is performed to obtain a response map; the position with the highest value in the response map is taken as the most similar location of the target object between the two images, i.e., the position of the target object in the image y to be searched;
and step (4), the response map is enlarged to the size of the original image y to be searched and analysed to obtain the final tracking result: the position with the maximum score is multiplied by the total stride of the five convolutional layers of the optimized twin network structure to obtain the position of the current target in the image to be searched.
2. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: the specific method for modifying the size of the template image x in the step (1) is as follows:
the target frame of the first frame is denoted (x_min, y_min, w, h); the size of the template image x is then calculated from this first-frame target frame, i.e. a square region centred on the target to be tracked is cropped out, according to the following formula:
s(w+2p)×s(h+2p)=A
where s is the scale factor and A is set to 127 × 127; the target frame is enlarged by the above operation, and the crop is then resized to 127 × 127 to obtain the template image x;
the specific method for modifying the size of the image y to be searched comprises the following steps:
first, the centre of the target frame predicted in the previous frame is taken as the cropping centre; the side length of the square region to be cropped is then determined in proportion to that cropped for the template image x; finally, the crop is resized to 255 × 255.
3. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: the method for data augmentation in the step (1) comprises the following steps of in order to increase deep learning training data, wherein four data augmentation modes are utilized: random stretching of randomtretch, random crop random, normalization and totensor conversion into tensor;
finally, the size is modified to the size that needs to be input into the network structure.
4. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: in the step (2), an optimized twin network structure is constructed to extract features from bottom to top, and the optimized twin network structure is set as follows:
firstly, the first layer is a convolutional layer: 96 convolution kernels of size 11 × 11 with a stride of 2 are applied to the image, followed by a 3 × 3 max-pooling operation and a batch normalization operation, and C1 is output;
secondly, the second layer is a convolutional layer: 256 convolution kernels of size 5 × 5 with a stride of 1 are applied, computed separately on two GPU groups, followed by a 3 × 3 max-pooling operation and a batch normalization operation to extract feature information, and C2 is output;
thirdly, the third layer is a convolutional layer: grouped 3 × 3 convolution kernels with 192 channels are applied, batch normalization is then performed, and C3 is output;
fourthly, the fourth layer is a convolutional layer: grouped 3 × 3 convolution kernels with 192 channels are applied, batch normalization is then performed, and C4 is output;
fifthly, the fifth layer is a convolutional layer: only a 3 × 3 convolution with 128 channels per group is used, and the 256-dimensional high-level semantic feature C5 is finally output.
5. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: the specific method for transversely fusing the features from top to bottom in the step (2) comprises the following steps:
(A) using interpolation, 2× upsampling is applied to the pixels of the fifth-layer feature map, inserting new elements between the existing pixels so that its size becomes the feature size of the fourth layer; this enlarges the high-layer feature map and prepares it for the fusion of the next step; the feature maps of the fourth, third and second layers are then enlarged in turn in the same way;
(B) a 1 × 1 convolution is applied to layer C5 to obtain the low-resolution feature P5; a 1 × 1 convolution kernel is then used to change the number of channels of the fourth-layer feature map C4 generated in the bottom-up stage, uniformly fixing the channels to 256-d; the processed fourth-layer result is then added to the upsampled fifth-layer result, the fused result is processed with a 3 × 3 convolution kernel, and the final result is denoted P4;
and (C) process (B) is iterated to finally generate the feature maps, yielding the multi-scale fused feature maps of the template branch and the branch to be searched, respectively.
6. The twin network small target tracking method based on multi-scale feature fusion as claimed in claim 1, wherein: the step (3) is to acquire a response graph by utilizing cross correlation operation on the multi-scale fused feature graph corresponding to the template branch and the branch to be searched;
the specific process of the cross-correlation operation is as follows: utilizing the template branches and the multi-scale fused features corresponding to the branches to be searched, wherein the sizes of the two features are 22 × 256 and 6 × 256 respectively, and then performing convolution operation on the features 22 × 256 by taking 6 × 256 as a convolution kernel to obtain a response graph 17 × 17;
during training, after the 17 × 17 response map is obtained, positive and negative samples are determined: if the distance of a position from the target on the search image is smaller than R, that position is counted as a positive sample; otherwise it is regarded as a negative sample;
finally, the whole deep network is trained iteratively using a binary cross-entropy logistic loss function and a stochastic gradient descent method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110111717.3A CN112785626A (en) | 2021-01-27 | 2021-01-27 | Twin network small target tracking method based on multi-scale feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112785626A (en) | 2021-05-11 |
Family
ID=75758302
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110111717.3A Pending CN112785626A (en) | 2021-01-27 | 2021-01-27 | Twin network small target tracking method based on multi-scale feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112785626A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109191491A (en) * | 2018-08-03 | 2019-01-11 | 华中科技大学 | The method for tracking target and system of the twin network of full convolution based on multilayer feature fusion |
CN111179307A (en) * | 2019-12-16 | 2020-05-19 | 浙江工业大学 | Visual target tracking method for full-volume integral and regression twin network structure |
CN111291679A (en) * | 2020-02-06 | 2020-06-16 | 厦门大学 | Target specific response attention target tracking method based on twin network |
CN111489361A (en) * | 2020-03-30 | 2020-08-04 | 中南大学 | Real-time visual target tracking method based on deep feature aggregation of twin network |
CN111681259A (en) * | 2020-05-17 | 2020-09-18 | 天津理工大学 | Vehicle tracking model establishing method based on Anchor-free mechanism detection network |
CN111898504A (en) * | 2020-07-20 | 2020-11-06 | 南京邮电大学 | Target tracking method and system based on twin circulating neural network |
CN112184752A (en) * | 2020-09-08 | 2021-01-05 | 北京工业大学 | Video target tracking method based on pyramid convolution |
Non-Patent Citations (4)
Title |
---|
崔洲涓 et al.: "Lightweight Siamese Attention Network Target Tracking for UAVs" (面向无人机的轻量级Siamese注意力网络目标跟踪), Acta Optica Sinica (《光学学报》) *
杨哲 et al.: "Target Tracking Algorithm Based on a Siamese Network Fusing Multiple Templates" (基于孪生网络融合多模板的目标跟踪算法), Computer Engineering and Applications (《计算机工程与应用》) *
武玉伟: "Fundamentals and Applications of Deep Learning" (《深度学习基础与应用》), 30 April 2020, Beijing: Beijing Institute of Technology Press *
董洪义: "Deep Learning with PyTorch: Object Detection in Practice" (《深度学习之PyTorch物体检测实战》), 31 January 2020, Beijing: China Machine Press *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113223053A (en) * | 2021-05-27 | 2021-08-06 | 广东技术师范大学 | Anchor-free target tracking method based on fusion of twin network and multilayer characteristics |
CN113627488A (en) * | 2021-07-13 | 2021-11-09 | 武汉大学 | Twin network online update-based single target tracking method and device |
CN113627488B (en) * | 2021-07-13 | 2023-07-21 | 武汉大学 | Single-target tracking method and device based on online update of twin network |
CN113808166A (en) * | 2021-09-15 | 2021-12-17 | 西安电子科技大学 | Single-target tracking method based on clustering difference and depth twin convolutional neural network |
CN113808166B (en) * | 2021-09-15 | 2023-04-18 | 西安电子科技大学 | Single-target tracking method based on clustering difference and depth twin convolutional neural network |
CN114372999A (en) * | 2021-12-20 | 2022-04-19 | 浙江大华技术股份有限公司 | Object detection method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210511 |