CN114842051B - Unmanned aerial vehicle tracking model migration learning method based on depth attribution map - Google Patents
Unmanned aerial vehicle tracking model migration learning method based on depth attribution map
- Publication number: CN114842051B (application CN202210473138.8A)
- Authority
- CN
- China
- Prior art keywords
- layer
- model
- aerial vehicle
- unmanned aerial
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to an unmanned aerial vehicle tracking model migration learning method based on a depth attribution map, belonging to the technical field of migration learning. The method comprises: collecting detection data; selecting a deep neural network pre-training model; constructing a forward propagation path and collecting the output features of each convolution layer of the pre-training model; calculating the similarity between different data points within the same feature and constructing an edge similarity sequence; constructing a node attribution value sequence; for each convolution layer in turn, calculating the cosine similarity between its node attribution values and those of the last layer, and the Spearman correlation coefficient between its edge similarities and those of the last layer; constructing a depth attribution map similarity function; obtaining the correlation coefficient of each convolution layer, setting a threshold, and screening out the correlation coefficients that exceed the threshold, the corresponding convolution layer serving as the critical point for model parameter fine tuning so that only the parameters after that layer are trained. The method is simple to operate, has a short training period, and does not require a large amount of image data.
Description
Technical Field
The invention relates to an unmanned aerial vehicle tracking model migration learning method based on a depth attribution map, and belongs to the technical field of migration learning.
Background
Currently, mainstream target tracking algorithms are mainly designed to track arbitrary targets in a video sequence, and tracking a specific target depends chiefly on the generalization ability of the tracking algorithm, so applying such algorithms in a concrete practical scene often fails to give a satisfactory tracking result. In addition, deep learning usually requires a large number of labeled training samples whose data distribution matches that of the test samples, yet collecting a data set for a specific tracking scene is difficult and time-consuming, and training a deep neural network from scratch on such a data set easily leads to overfitting, so the resulting model has little practical value. Transfer learning theory provides an important method and path for solving this problem, but most work in the field still adopts the traditional model-based pre-training and fine-tuning approach; although this can yield a migrated model with good performance, selecting the fine-tuning critical point of the model requires a large number of experiments, which is time-consuming, occupies substantial computing resources, and is therefore not well suited to solving practical problems.
The literature (Maqsood M, Nazir F, Khan U, et al. Transfer learning assisted classification and detection of Alzheimer's disease stages using 3D MRI scans [J]. Sensors, 2019, 19(11): 2645.) proposes a method for detecting and classifying the stages of Alzheimer's disease with the aid of transfer learning. The method adopts an AlexNet network trained on the large image data set ImageNet, replaces the last three fully connected layers of the network with a softmax layer, a fully connected layer and an output layer, then trains the network on an Alzheimer's disease medical image data set using the pre-training and fine-tuning method of transfer learning, and finally realizes the detection of Alzheimer's disease. The method adopts pre-training and fine-tuning transfer learning, but the medical images must be strictly annotated, so it has considerable limitations.
The literature (Rathi D. Optimization of Transfer Learning for Sign Language Recognition Targeting Mobile Platform [J]. arXiv preprint arXiv:1805.06618, 2018.) proposes an American sign language recognition algorithm for mobile platforms, whose pre-training models are a MobileNet model and an Inception V model trained on ImageNet; the pre-training models are trained on Sign Language MNIST by transfer learning and deployed on the mobile platform. The method still uses only the traditional transfer learning approach, and the efficiency of transfer learning is not improved.
The literature (Nguyen D, Nguyen K, Sridharan S, et al. Meta transfer learning for facial emotion recognition [C] // 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018: 3543-3548.) proposes an algorithm for automatic facial expression recognition, which trains a system capable of automatically recognizing facial expressions on the SAVEE and ENTERFACE data sets using the meta transfer learning method PathNet, and overcomes the loss of prior knowledge in multiple cross-domain transfers caused by the scarcity of facial expression data sets and by the pre-training and fine-tuning transfer method. The method uses meta transfer learning and is not an efficient transfer learning method.
Disclosure of Invention
In view of these problems, the invention provides an unmanned aerial vehicle tracking model migration learning method based on a depth attribution map, which can obtain the critical point for model migration fine tuning in a simple way, thereby improving the efficiency of migration learning.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
an unmanned aerial vehicle tracking model migration learning method based on a depth attribution map comprises the following steps:
S1: collecting image data containing a target unmanned aerial vehicle model as detection data D p, detection data D p={x1,x2,…xa,xn and n unmanned aerial vehicle image data;
S2: using a tracking model SiamRPN ++ which is trained by using the universal tracking data set as a deep neural network pre-training model m 1;
S3: constructing a forward propagation path, inputting detection data D p acquired in the step S1 into a deep neural network pre-training model m 1 in the step S2, calculating an output characteristic F k 1 of a convolution layer every time an image in the detection data D p passes through the convolution layer, and storing a result; through n convolution layers of the deep neural network pre-training model m 1, constructing a knowledge pool omega containing n output features, wherein omega= { F 1 1,F2 1,…,Fk 1,Fn 1 };
S4: calculating the similarity of the same output characteristic F k 1 between every two image data points in the detection data D p by using cosine similarity to obtain the similarity of edges
S5: constructing a reverse propagation path, inputting the detection data D p into a deep neural network pre-training model m 1, and calculating the attribution value of the input data x a for the characteristic output F k 1(xa) by using a gradient input modeThe node attribution value for obtaining the output characteristic of the layer is
S6: constructing a node attribution value sequence, and calculating the similarity of each layer of convolution and the last layer of convolution characteristic embedded space node attribution value in the deep neural network pretraining model m 1 by using cosine similarity
S7: constructing a similarity sequence of the feature embedding space according to the arrangement sequence of the convolution layers, and calculating the edge similarity of the feature embedding space of each layer of convolution and the final layer of convolutionAnd (3) withIs obtained as the correlation coefficient
S8: constructing a depth attribution map, wherein the similarity function is as followsThen according to the similarity function, the correlation coefficient r k of the depth attribution map of each layer convolution and the last layer convolution is obtained;
S9: setting a correlation coefficient threshold r set, comparing the correlation coefficient r k obtained by each calculation with r set, and reserving if the correlation coefficient is larger than or equal to the threshold, otherwise discarding;
s10: and taking the k layer where r k is positioned as a model parameter fine tuning critical point.
The technical scheme of the invention is further improved as follows: the detection data D_p in step S1 are randomly sampled from a self-made multi-rotor unmanned aerial vehicle data set; the unmanned aerial vehicle contained in this data set is the DJI drone model Mavic; the attributes of the self-made multi-rotor unmanned aerial vehicle data set include fast motion, background clutter, similar-object interference, deformation, occlusion, motion blur, illumination change, scale change and out-of-view, its scenes include cities, crowds, schools and beaches, and it also covers different viewing angles, different relative distances to the unmanned aerial vehicle and different flight attitudes.
The technical scheme of the invention is further improved as follows: the SiamRPN++ model m_1 in step S2 is a pre-training model trained for tracking tasks on a tracking data set comprising ILSVRC2015-DET, ILSVRC2015-VID, COCO2017 and YouTube-BoundingBoxes; the SiamRPN++ model comprises a feature extraction network ResNet-50 and a region proposal network RPN.
The technical scheme of the invention is further improved as follows: in step S3, the SiamRPN++ model m_1 is composed of a plurality of nonlinear primitive functions; when the forward propagation path is constructed, the output features F_k^1 of different convolution layers are selected according to the convolution layer structure of the model; each layer of the SiamRPN++ pre-training model contains several convolutions, and the output feature of the last convolution of each layer is obtained and retained as that layer's output.
The technical scheme of the invention is further improved as follows: in step S4, cosine similarity is used to calculate, over all image data points in the detection data, the similarity with respect to the output feature F_k^1, which can be expressed by the edges e_pq^k.
Here e_pq^k denotes the edge between the p-th node and the q-th node and expresses, through cosine similarity, the similarity between the features of the two nodes in the feature space F_k^1; the specific calculation is:
e_pq^k = ( F_k^1(x_p) · F_k^1(x_q) ) / ( ‖F_k^1(x_p)‖ · ‖F_k^1(x_q)‖ )
The technical scheme of the invention is further improved as follows: the specific way of calculating the attribution value of an input data node for the output feature with the gradient×input method in step S5 is:
For the pre-training model m_1, given one input data x_a ∈ D_p, the attribution value a_i^k of the i-th element of x_a for F_k^1(x_a) is calculated as:
a_i^k = x_{a,i} · ∂F_k^1(x_a) / ∂x_{a,i}
The technical scheme of the invention is further improved as follows: in step S7, the Spearman correlation coefficient is used to calculate the correlation coefficient between the edge similarities of each convolution layer's feature embedding space and those of the last convolution layer for the detection data, in the following way:
s_k^edge = 1 − 6 Σ_i d_i² / ( N (N² − 1) )
where d_i denotes the difference in rank of the i-th element between the edge similarity sequence of layer k and that of the last layer, and N is the length of the sequences.
By adopting the technical scheme, the invention has the following technical effects:
1) The migration fine tuning critical point of the unmanned aerial vehicle tracking model can be quickly found;
2) The required image data volume is small, the calculation time is short, and the calculation cost is reduced;
3) The training time is short, and the efficiency of the whole transfer learning process is improved.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the structure of the SiamRPN++ model used by the present invention.
Detailed Description
The invention is further described in detail below with reference to the attached drawings and specific examples:
An unmanned aerial vehicle tracking model migration learning method based on a depth attribution map, as shown in FIG. 1, comprises the following steps:
S1: by means of random sampling, 200 pieces of unmanned aerial vehicle data containing different attributes and different backgrounds are collected in Mavic multi-rotor unmanned aerial vehicle to serve as detection data D p, the detection data is represented as D p={x1,x2,…xa,xn, and n pieces of unmanned aerial vehicle image data are contained.
The detection data D_p are randomly sampled from the self-made multi-rotor unmanned aerial vehicle data set, and the DJI drone model contained in this data set is the Mavic. The data set covers multiple attributes such as fast motion, background clutter, similar-object interference, deformation, occlusion, motion blur, illumination change, scale change and out-of-view, multiple scenes such as cities, crowds, schools and beaches, as well as different viewing angles, relative distances to the unmanned aerial vehicle and flight attitudes, so the actual scenes the unmanned aerial vehicle tracker may need to handle are fully considered. When the detection data are collected by random sampling, attributes, backgrounds and other factors are fully taken into account to ensure the richness of the detection data.
S2: the completed tracking model SiamRPN ++ trained on the generic tracking dataset ILSVRC-DET, ILSVRC2015-VID, COCO2017, YOUTUBE-BoundingBoxes will be utilized as the deep neural network pre-training model m 1.
The SiamRPN++ model m_1 is a pre-training model trained for tracking tasks on a tracking data set comprising ILSVRC2015-DET, ILSVRC2015-VID, COCO2017 and YouTube-BoundingBoxes; the SiamRPN++ model comprises a feature extraction network ResNet-50 and a region proposal network RPN.
S3: constructing a forward propagation path, inputting detection data D p into a selected deep neural network pre-training model m 1, calculating an output characteristic F k 1 of a convolution layer every time an image in the detection data D p passes through the convolution layer, and storing a result; through n convolution layers of the pre-training model m 1, a knowledge pool Ω, Ω= { F 1 1,F2 1,…,Fk 1,Fn 1 } containing n output features F k 1 is constructed.
The SiamRPN++ model m_1 consists of a plurality of nonlinear primitive functions. When the forward propagation path is constructed, the output features F_k^1 of different convolution layers are selected according to the convolution layer structure of the model; each layer of the SiamRPN++ pre-training model contains several convolutions, and the output feature of the last convolution of each layer is obtained and retained as that layer's output.
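The patent does not provide source code; a minimal sketch of how these per-layer output features could be collected is given below, using forward hooks on a torchvision ResNet-50 as a stand-in for the SiamRPN++ backbone. The module names layer1 to layer4, the input size and the batch of random probe images are assumptions of this sketch, not anything specified in the text.

```python
# Sketch only: collect the output of the last convolutional block of each
# backbone stage via forward hooks. A torchvision ResNet-50 stands in for the
# SiamRPN++ feature extraction network described above.
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()

knowledge_pool = {}  # plays the role of Omega = {F_1^1, ..., F_k^1, ...}

def make_hook(name):
    def hook(module, inputs, output):
        # keep one feature vector per image by flattening the spatial dimensions
        knowledge_pool[name] = output.detach().flatten(start_dim=1)
    return hook

for name, stage in [("layer1", model.layer1), ("layer2", model.layer2),
                    ("layer3", model.layer3), ("layer4", model.layer4)]:
    stage.register_forward_hook(make_hook(name))

probe_batch = torch.randn(8, 3, 224, 224)  # stands in for the detection data D_p
with torch.no_grad():
    model(probe_batch)

for k, feat in knowledge_pool.items():
    print(k, tuple(feat.shape))
```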
S4: calculating the similarity of the same output characteristic F k 1 between every two image data points in the detection data D p by using cosine similarity to obtain the similarity of edgesCan be expressed as:
In the formula, Representing the edges of the p-th node and the q-th node, and expressing the similarity of the edges between the two nodes by using cosine similarity.
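As an illustration of this edge computation, the sketch below builds the full n×n cosine-similarity matrix for one feature space; the number of probe images and the feature dimension are placeholder values, not taken from the patent.

```python
# Sketch: pairwise cosine similarity between the features of all probe images in
# one embedding space F_k^1, i.e. the edge weights e_pq^k of that layer's graph.
import torch
import torch.nn.functional as F

def edge_similarity(features: torch.Tensor) -> torch.Tensor:
    """features: an (n, d) matrix with one row per image x_a in D_p."""
    normed = F.normalize(features, dim=1)  # divide each row by its L2 norm
    return normed @ normed.T               # (n, n) matrix of cosine similarities

edges_k = edge_similarity(torch.randn(200, 2048))  # 200 probe images, 2048-d features
```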
S5: constructing a reverse propagation path, inputting the detection data D p into a selected deep neural network pre-training model m 1, and calculating the attribution value of the input data x a aiming at the characteristic output F k 1(xa) by using a gradient input modeThe node attribution value for obtaining the output characteristic of the layer is
Specifically, the attribution value of an input data node for the output feature is calculated with the gradient×input method as follows:
For the pre-training model m_1, given one input data x_a ∈ D_p, the attribution value a_i^k of the i-th element of x_a for F_k^1(x_a) is calculated as:
a_i^k = x_{a,i} · ∂F_k^1(x_a) / ∂x_{a,i}
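A sketch of the gradient×input attribution follows. Because F_k^1(x_a) is a feature map rather than a scalar, the sketch sums it before back-propagating; that reduction, like the torchvision ResNet-50 stand-in and the random input, is an assumption of this example rather than something stated in the text.

```python
# Sketch: gradient x input attribution of an input image for one layer's feature
# output, a_i = x_i * dF_k(x)/dx_i. The feature map is summed to a scalar before
# backprop (an assumption of this sketch).
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()

def grad_times_input(net, layer_module, x):
    captured = {}
    handle = layer_module.register_forward_hook(
        lambda module, inputs, output: captured.update(feat=output))
    x = x.clone().requires_grad_(True)
    net(x)
    handle.remove()
    captured["feat"].sum().backward()  # scalar surrogate for F_k^1(x_a)
    return (x * x.grad).detach()       # element-wise gradient x input

attribution = grad_times_input(model, model.layer3, torch.randn(1, 3, 224, 224))
print(attribution.shape)               # same shape as the input image
```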
S6: constructing a node attribution value sequence, and calculating the similarity of each layer of convolution and the last layer of convolution characteristic embedded space node attribution value in the pre-training model m 1 by using cosine similarity
S7: constructing a similarity sequence of the feature embedding space according to the arrangement sequence of the convolution layers, and calculating the edge similarity of the feature embedding space of each layer of convolution and the final layer of convolutionAnd (3) withIs obtained as the correlation coefficientThe calculation method is as follows:
wherein d i represents AndIs the difference in the ith element order.
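A short sketch of the Spearman computation is given below, using scipy; flattening the two edge-similarity matrices into sequences before ranking is how this example interprets the step, and the random matrices are placeholders.

```python
# Sketch: Spearman rank correlation between the flattened edge-similarity matrix
# of layer k and that of the last layer (the RPN output in the example below).
import numpy as np
from scipy.stats import spearmanr

def edge_correlation(edges_k: np.ndarray, edges_last: np.ndarray) -> float:
    rho, _ = spearmanr(edges_k.ravel(), edges_last.ravel())
    return float(rho)

print(edge_correlation(np.random.rand(200, 200), np.random.rand(200, 200)))
```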
S8: constructing a depth attribution map, and constructing a similarity function expressed as: And solving a correlation coefficient r k of the depth attribution map of each layer convolution and the last layer convolution according to the similarity function.
S9: setting a correlation coefficient threshold r set, comparing the correlation coefficient r k obtained by each calculation with r set, reserving if the correlation coefficient is larger than or equal to the threshold, otherwise discarding, and sorting the reserved correlation coefficient according to the increment of the value, wherein r k can be selected as a final result.
S10: and taking the k layer where r k is positioned as a model parameter fine tuning critical point.
Example 1
An unmanned aerial vehicle tracking model migration learning method based on a depth attribution map, as shown in FIG. 1, comprises the following steps:
S1: by means of random sampling, 200 pieces of unmanned aerial vehicle data containing different attributes and different backgrounds are collected in Mavic multi-rotor unmanned aerial vehicle to serve as detection data D p, the detection data is represented as D p={x1,x2,…xa,xn, and n pieces of unmanned aerial vehicle image data are contained.
The detection data are randomly sampled from the self-made multi-rotor unmanned aerial vehicle data set, which contains the DJI drone model Mavic. The data set covers multiple attributes such as fast motion, background clutter, similar-object interference, deformation, occlusion, motion blur, illumination change, scale change and out-of-view, multiple scenes such as cities, crowds, schools and beaches, as well as different viewing angles, relative distances to the unmanned aerial vehicle and flight attitudes, so the actual scenes the unmanned aerial vehicle tracker may need to handle are fully considered. When the detection data are collected by random sampling, attributes, backgrounds and other factors are fully taken into account to ensure the richness of the detection data.
S2: Referring to FIG. 2, the invention takes the SiamRPN++ model as the pre-training model m_1 for the unmanned aerial vehicle tracking task, where the SiamRPN++ model consists of a feature extraction network ResNet-50 and a region proposal network RPN. The ResNet-50 network is built from residual modules and adopts the Bottleneck structure; it can be divided into 5 convolution layers, where layer 1 contains 1 convolution, layer 2 contains 3 convolutions, layer 3 contains 4 convolutions, layer 4 contains 6 convolutions and layer 5 contains 3 convolutions.
S3: The output feature of the last convolution of each convolution layer is adopted as F_k^1, the output feature knowledge pool is constructed as Ω = {F_1^1, F_2^1, F_3^1, F_4^1, F_5^1}, and the final output feature of the RPN part is taken as F_e^1.
S4: Cosine similarity is used to calculate the similarity of the detection data D_p's features in each of layers 1 to 5, and the edge similarity matrices E_1 to E_5 are constructed; the edge similarity E_e of the RPN part is calculated in the same way.
S5: The gradient×input method is used to calculate the node attribution values of the detection data's output features in each of layers 1 to 5, and the node attribution value sequences A_1 to A_5 are constructed; the node attribution values A_e of the RPN part are calculated in the same way.
S6: Cosine similarity is used to calculate the similarity s_k^node between the node attribution values of each convolution layer's output feature and the node attribution values of the RPN part's output feature, and the results are retained.
S7: The Spearman correlation coefficient is used to calculate the correlation coefficient s_k^edge between each layer's output-feature edge similarities and the RPN output-feature edge similarities in the edge similarity sequence, and the results are retained.
S8: The depth attribution map is constructed from s_k^node and s_k^edge, and the similarity function is calculated to obtain the correlation coefficient r_k.
S9: Depending on the amount of training data, λ is set differently: with λ = 1, 10% of the unmanned aerial vehicle data are used for transfer learning, and with λ = 0.01, 1% of the data are used; the corresponding correlation coefficients r_k are then obtained.
TABLE 1 depth attribution map correlation coefficient
Table 1 lists the depth attribution map correlation coefficient between each convolution layer and the RPN output feature. The correlation coefficient is found to increase essentially layer by layer, and apart from layer 1 the correlation coefficients of the other layers do not differ markedly; this is because the ResNet-50 network is a deep neural network and can already learn strongly correlated features in its shallower layers. A threshold can therefore be set on the correlation coefficient to select the critical point for model migration fine tuning.
S10: When λ = 1, the threshold r_set = 1 is selected, and the correlation coefficients of layers 2, 3, 4 and 5 are all larger than the threshold, so the corresponding k-th layer can be selected as the critical point for model parameter fine tuning; similarly, when λ = 0.01, the threshold r_set = 5 is selected, and the critical point for fine tuning can likewise be chosen from layers 2, 3, 4 and 5.
Claims (6)
1. An unmanned aerial vehicle tracking model migration learning method based on a depth attribution map, characterized by comprising the following steps:
S1: collecting image data containing a target unmanned aerial vehicle model as detection data D p, detection data D p={x1,x2,…xa,xn and n unmanned aerial vehicle image data;
S2: using a tracking model SiamRPN ++ which is trained by using the universal tracking data set as a deep neural network pre-training model m 1;
s3: constructing a forward propagation path, inputting detection data D p acquired in the step S1 into a deep neural network pre-training model m 1 in the step S2, and calculating the output characteristics of a convolution layer once when an image in the detection data D p passes through the convolution layer And saving the result; through n convolution layers of the deep neural network pre-training model m 1, a knowledge pool omega containing n output features is constructed,
S4: calculating the similarity of the same output characteristic F k 1 between every two image data points in the detection data D p by using cosine similarity to obtain the similarity of edges
S5: constructing a reverse propagation path, inputting the detection data D p into a deep neural network pre-training model m 1, and calculating the characteristic output of input data x a by using a gradient input modeIs the value of the cause of (2)The node attribution value for obtaining the output characteristic of the layer isThe specific way of calculating the attribution value of the input data node to the output characteristic by using the gradient input mode is as follows:
For a deep neural network pre-training model m 1, one input data is given Then calculate the ith element pair in x a Is the value of the cause of (2)The calculation method is as follows:
S6: constructing a node attribution value sequence, and calculating the similarity of each layer of convolution and the last layer of convolution characteristic embedded space node attribution value in the deep neural network pretraining model m 1 by using cosine similarity
S7: constructing a similarity sequence of the feature embedding space according to the arrangement sequence of the convolution layers, and calculating the edge similarity of the feature embedding space of each layer of convolution and the final layer of convolutionAnd (3) withIs obtained as the correlation coefficient
S8: constructing a depth attribution map, wherein the similarity function is as followsThen according to the similarity function, the correlation coefficient r k of the depth attribution map of each layer convolution and the last layer convolution is obtained;
S9: setting a correlation coefficient threshold r set, comparing the correlation coefficient r k obtained by each calculation with r set, and reserving if the correlation coefficient is larger than or equal to the threshold, otherwise discarding;
s10: and taking the k layer where r k is positioned as a model parameter fine tuning critical point.
2. The unmanned aerial vehicle tracking model migration learning method based on the depth attribution map according to claim 1, characterized in that: the detection data D_p in step S1 are randomly sampled from a self-made multi-rotor unmanned aerial vehicle data set; the unmanned aerial vehicle contained in this data set is the DJI drone model Mavic; the attributes of the self-made multi-rotor unmanned aerial vehicle data set include fast motion, background clutter, similar-object interference, deformation, occlusion, motion blur, illumination change, scale change and out-of-view, its scenes include cities, crowds, schools and beaches, and it also covers different viewing angles, different relative distances to the unmanned aerial vehicle and different flight attitudes.
3. The unmanned aerial vehicle tracking model migration learning method based on the depth attribution map according to claim 1, characterized in that: the SiamRPN++ model m_1 in step S2 is a pre-training model trained for tracking tasks on a tracking data set comprising ILSVRC2015-DET, ILSVRC2015-VID, COCO2017 and YouTube-BoundingBoxes; the SiamRPN++ model comprises a feature extraction network ResNet-50 and a region proposal network RPN.
4. The unmanned aerial vehicle tracking model migration learning method based on the depth attribution map according to claim 1, characterized in that: in step S3, the SiamRPN++ model m_1 is composed of a plurality of nonlinear primitive functions; when the forward propagation path is constructed, the output features F_k^1 of different convolution layers are selected according to the convolution layer structure of the model; each layer of the SiamRPN++ model contains several convolutions, and the output feature of the last convolution of each layer is obtained and retained as that layer's output.
5. The unmanned aerial vehicle tracking model migration learning method based on the depth attribution map according to claim 1, characterized in that: in step S4, cosine similarity is used to calculate, over all image data points in the detection data, the similarity with respect to the output feature F_k^1, which can be expressed by the edges e_pq^k, where e_pq^k denotes the edge between the p-th node and the q-th node and expresses, through cosine similarity, the similarity between the features of the two nodes in the feature space F_k^1; the specific calculation is:
e_pq^k = ( F_k^1(x_p) · F_k^1(x_q) ) / ( ‖F_k^1(x_p)‖ · ‖F_k^1(x_q)‖ )
6. The unmanned aerial vehicle tracking model migration learning method based on the depth attribution map according to claim 1, characterized in that: in step S7, the Spearman correlation coefficient is used to calculate the correlation coefficient between the edge similarities of each convolution layer's feature embedding space and those of the last convolution layer for the detection data, in the following way:
s_k^edge = 1 − 6 Σ_i d_i² / ( N (N² − 1) )
where d_i denotes the difference in rank of the i-th element between the edge similarity sequence of layer k and that of the last layer, and N is the length of the sequences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210473138.8A CN114842051B (en) | 2022-04-29 | 2022-04-29 | Unmanned aerial vehicle tracking model migration learning method based on depth attribution map |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210473138.8A CN114842051B (en) | 2022-04-29 | 2022-04-29 | Unmanned aerial vehicle tracking model migration learning method based on depth attribution map |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114842051A CN114842051A (en) | 2022-08-02 |
CN114842051B (en) | 2024-11-08
Family
ID=82566971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210473138.8A Active CN114842051B (en) | 2022-04-29 | 2022-04-29 | Unmanned aerial vehicle tracking model migration learning method based on depth attribution map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114842051B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110796232A (en) * | 2019-10-12 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Attribute prediction model training method, attribute prediction method and electronic equipment |
CN111091179A (en) * | 2019-12-03 | 2020-05-01 | 浙江大学 | Heterogeneous depth model mobility measurement method based on attribution graph |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107273872B (en) * | 2017-07-13 | 2020-05-05 | 北京大学深圳研究生院 | Depth discrimination network model method for re-identification of pedestrians in image or video |
CN114329029B (en) * | 2021-10-28 | 2024-05-14 | 腾讯科技(深圳)有限公司 | Object retrieval method, device, equipment and computer storage medium |
-
2022
- 2022-04-29 CN CN202210473138.8A patent/CN114842051B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110796232A (en) * | 2019-10-12 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Attribute prediction model training method, attribute prediction method and electronic equipment |
CN111091179A (en) * | 2019-12-03 | 2020-05-01 | 浙江大学 | Heterogeneous depth model mobility measurement method based on attribution graph |
Also Published As
Publication number | Publication date |
---|---|
CN114842051A (en) | 2022-08-02 |
Similar Documents
Publication | Title |
---|---|
Wang et al. | RSNet: The search for remote sensing deep neural networks in recognition tasks |
CN110443818B | Graffiti-based weak supervision semantic segmentation method and system |
CN110956185B | Method for detecting image salient object |
CN106909924B | Remote sensing image rapid retrieval method based on depth significance |
CN111126360B | Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model |
CN113065558A | Lightweight small target detection method combined with attention mechanism |
CN108764006B | SAR image target detection method based on deep reinforcement learning |
CN108052966A | Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique |
CN106815323B | Cross-domain visual retrieval method based on significance detection |
CN107025440A | A kind of remote sensing images method for extracting roads based on new convolutional neural networks |
CN113807188B | Unmanned aerial vehicle target tracking method based on anchor frame matching and Siamese network |
CN111582091B | Pedestrian recognition method based on multi-branch convolutional neural network |
CN113240697B | Lettuce multispectral image foreground segmentation method |
CN111027627A | Vibration information terrain classification and identification method based on multilayer perceptron |
CN115049841A | Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method |
Han et al. | Research on remote sensing image target recognition based on deep convolution neural network |
CN114495170A | Pedestrian re-identification method and system based on local self-attention inhibition |
Jiang et al. | Arbitrary-shaped building boundary-aware detection with pixel aggregation network |
Kampffmeyer et al. | Dense dilated convolutions merging network for semantic mapping of remote sensing images |
Wang et al. | Detecting occluded and dense trees in urban terrestrial views with a high-quality tree detection dataset |
Sjahputera et al. | Clustering of detected changes in high-resolution satellite imagery using a stabilized competitive agglomeration algorithm |
CN108830172A | Aircraft remote sensing images detection method based on depth residual error network and SV coding |
Wang et al. | Big Map R-CNN for object detection in large-scale remote sensing images |
CN114842051B | Unmanned aerial vehicle tracking model migration learning method based on depth attribution map |
Yin et al. | M2F2-RCNN: Multi-functional faster RCNN based on multi-scale feature fusion for region search in remote sensing images |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |