CN111783526B

CN111783526B - Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment

Info

Publication number: CN111783526B
Application number: CN202010434344.9A
Authority: CN
Inventors: 李华锋; 庞健; 严双林; 欧洋汛; 张亚飞; 余正涛
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2020-05-21
Filing date: 2020-05-21
Publication date: 2022-08-05
Anticipated expiration: 2040-05-21
Also published as: CN111783526A

Abstract

The invention provides a cross-domain pedestrian re-identification method using posture invariance and graph structure alignment, in the field of computer vision. It proposes a dictionary learning algorithm based on matrix decomposition that eliminates the influence of inter-dataset domain information and pedestrian posture information on cross-domain pedestrian re-identification. The method has two parts. (1) Based on matrix decomposition, the original visual features are decomposed into a posture-invariant component, a domain information component and an interference component, with the aim of extracting visual components unaffected by domain and posture information. (2) To further improve the generalization ability of the model, a hypergraph structure alignment constraint establishes a relation between the posture-invariant features and the semantic attributes, so that the pedestrian attributes of the target dataset can be accurately predicted at a later stage. Finally, pedestrian similarity is measured by combining the posture-invariant features and the semantic attributes, which further improves recognition performance.

Description

Cross-domain pedestrian re-identification method using posture invariance and graph structure alignment

Technical Field

The invention relates to a cross-domain pedestrian re-identification method by utilizing posture invariance and graph structure alignment, belonging to the field of computer vision.

Background

With the rapid development of artificial intelligence, there is a growing need to apply pedestrian re-identification technology based on high-dimensional features in real life, and researchers in China and abroad have made a series of significant advances and developed many methods. Some methods design discriminative hand-crafted features that are robust to changes in illumination, viewpoint, etc. for a target dataset, or cluster the unlabeled target data. However, such methods perform poorly, mainly because the target data are unlabeled and it is very difficult for the model to mine discriminative information from them. More advanced approaches treat pedestrian re-identification as an unsupervised domain adaptation problem, focusing on transferring knowledge from the source domain to the target domain. Compared with traditional unsupervised domain adaptation, the challenge is greater because the pedestrian identities in the source and target domains are completely different. Such methods still perform poorly compared with supervised methods.

Disclosure of Invention

The invention aims to provide a cross-domain pedestrian re-identification method using posture invariance and graph structure alignment, to address the difficulty of deploying existing pedestrian re-identification algorithms. An effective hypergraph structure alignment constraint is introduced to establish a mapping between the posture-invariant features and the semantic attributes, and the strengths of the two are fully combined in a joint similarity measure; the overall flow is shown in FIG. 1. Unlike existing methods, the method performs cross-domain re-identification, i.e., the trained model is deployed to a brand-new camera network for pedestrian recognition.

A cross-domain pedestrian re-identification method using posture invariance and graph structure alignment comprises the following steps:

1) defining the dataset variables and the features and attributes of pedestrians;

2) designing a feature decomposition module and determining its objective function, which contains a posture-invariant component dictionary, a domain information component dictionary, an interference component dictionary and a transformation matrix;

3) designing a hypergraph structure alignment module using semantic attribute information;

4) designing a domain adaptation module that reduces domain shift;

5) merging the proposed loss functions into a final optimization objective;

6) obtaining the dictionaries and the transformation matrices with an alternating optimization algorithm, and from them the coding coefficients of the target-domain data;

7) predicting the identity representation and the attributes of each pedestrian from the target-domain coding coefficients;

8) computing the similarity between pedestrians with cosine similarity, combining the predicted identity representations and attributes.

The method comprises the following specific steps:

Step 1: Suppose the source dataset contains K pedestrians, where $x_{s,i} \in \mathbb{R}^{d}$ is the feature of the i-th pedestrian sample in source domain s and d is the feature dimension, $a_{s,i} \in \mathbb{R}^{c}$ is the attribute vector of the i-th pedestrian and c is the attribute dimension, $y_{s,i}$ is the label of the i-th pedestrian, and $N_s$ is the number of samples. $X_s$, $A_s$ and $Y_s$ denote the source-domain feature set, attribute set and label set, respectively. The target dataset $X_t$ contains $N_t$ samples in total, where $x_{t,i} \in \mathbb{R}^{d}$ is the feature of the i-th pedestrian in target domain t and d is the feature dimension. GOG pedestrian features are used as the visual features, and the attributes annotated in existing datasets are used as the pedestrian attributes.
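The Step 1 notation can be pinned down as a small data container. This is a minimal sketch assuming NumPy and a column-per-sample layout for the feature and attribute matrices; the class and field names are hypothetical and only mirror the symbols above.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class CrossDomainData:
    """Hypothetical container for the Step 1 variables (column-per-sample layout assumed)."""
    X_s: np.ndarray  # d x N_s source-domain features (e.g. GOG descriptors)
    A_s: np.ndarray  # c x N_s source-domain attribute vectors
    Y_s: np.ndarray  # N_s source-domain identity labels
    X_t: np.ndarray  # d x N_t unlabeled target-domain features

    def __post_init__(self) -> None:
        d, n_s = self.X_s.shape
        # Features, attributes and labels must describe the same N_s samples.
        if self.A_s.shape[1] != n_s or self.Y_s.shape != (n_s,):
            raise ValueError("features, attributes and labels must describe the same N_s samples")
        # Source and target features live in the same d-dimensional space.
        if self.X_t.shape[0] != d:
            raise ValueError("source and target features must share the dimension d")

    @property
    def num_identities(self) -> int:
        """K, the number of distinct pedestrians in the source set."""
        return int(np.unique(self.Y_s).size)
```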

Step 2: The feature decomposition (FD) loss $L_{FD}$ below is designed to decompose the source-domain feature set into a posture-invariant component, a domain component and an interference component:

where $V_s$ is the total number of views in the source domain, and $X_{s,v,i}$ denotes the features of the i-th identity under the v-th view in training set s. $D_p$, $D_d$ and $D_r$ are the posture-invariant component dictionary, the domain information component dictionary and the interference component dictionary, respectively, and the corresponding coding coefficients represent $X_{s,v,i}$ over these three dictionaries. $\|\cdot\|_*$ denotes the nuclear norm of a matrix and $\|\cdot\|_{2,1}$ denotes the structured sparse ($\ell_{2,1}$) norm. $\eta$, $\lambda_1$ and $\lambda_2$ are regularization parameters. $\Phi(D_r, C^p, C^r)$ is a regularization term that promotes domain separation, defined as follows:

where $C^p$ and $C^r$ are the coding coefficients of the whole dataset, $\lambda_3$ and $\lambda_4$ are regularization parameters, and $I$ and $Q$ denote identity matrices.
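The two matrix norms named in Step 2 can be computed directly. The sketch below assumes NumPy, the row-wise convention for the $\ell_{2,1}$ norm (the sum of the $\ell_2$ norms of the rows; some works use columns), and a hypothetical reconstruction-error term $\|X - D_p C_p - D_d C_d - D_r C_r\|_F^2$. The original equation did not survive extraction, so which component carries which norm is not shown here, and the function names are illustrative.

```python
import numpy as np


def nuclear_norm(M: np.ndarray) -> float:
    """||M||_*: the sum of the singular values of M."""
    return float(np.linalg.svd(M, compute_uv=False).sum())


def l21_norm(M: np.ndarray) -> float:
    """||M||_{2,1}: the sum of the l2 norms of the rows of M (row-wise convention assumed)."""
    return float(np.linalg.norm(M, axis=1).sum())


def reconstruction_residual(X, D_p, C_p, D_d, C_d, D_r, C_r) -> float:
    """Hypothetical data-fit term ||X - D_p C_p - D_d C_d - D_r C_r||_F^2."""
    R = X - D_p @ C_p - D_d @ C_d - D_r @ C_r
    return float(np.sum(R ** 2))
```

The nuclear norm encourages low rank and the $\ell_{2,1}$ norm encourages whole rows to vanish, which is why both are natural choices for separating structured components.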

Step 3: To enhance the robustness and domain invariance of the semantic attributes, semantic attributes are introduced to assist cross-domain pedestrian re-identification. The hypergraph structure alignment (HSA) loss $L_{HSA}$ is defined as follows:

First, a hypergraph $G(X, E)$ is constructed from the source-domain image samples and the pedestrian identities. It comprises a vertex set and a hyperedge set, where $|N_j|$ and $|N_r|$ denote the numbers of vertices and hyperedges, respectively. The hyperedges of any given hypergraph can readily be converted into an incidence matrix.

$\alpha_1$, $\alpha_2$ and $\beta_1$ are hyperparameters,

the trace terms are two hypergraph Laplacian regularizers, $P$ and $E$ are linear transformation coefficient matrices, and $L = I - W$ is the hypergraph Laplacian matrix, with $W$ the weight matrix of the hypergraph, which measures the degree of correlation between two vertices;

$D_x$ and $D_e$ are the diagonal matrices of vertex degrees and hyperedge degrees, respectively, and $W_e$ is the diagonal matrix of hyperedge weights.
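Step 3 names the ingredients of the hypergraph Laplacian (incidence matrix, vertex-degree and hyperedge-degree diagonal matrices, hyperedge weights, $L = I - W$), but the formula for $W$ was lost in extraction. The sketch below assumes the widely used normalized form of Zhou, Huang and Schölkopf (2006), $W = D_v^{-1/2} H W_e D_e^{-1} H^\top D_v^{-1/2}$, and a hypothetical construction with one hyperedge per source identity; neither choice is confirmed by the text, and the function names are illustrative.

```python
import numpy as np


def incidence_from_labels(labels) -> np.ndarray:
    """Hypothetical hypergraph construction: one hyperedge per identity.

    H[v, e] = 1 if source sample v belongs to identity e, else 0.
    """
    labels = np.asarray(labels)
    identities = np.unique(labels)
    return (labels[:, None] == identities[None, :]).astype(float)


def hypergraph_laplacian(H: np.ndarray, edge_weights=None) -> np.ndarray:
    """Normalized hypergraph Laplacian L = I - W (assumed Zhou et al. form),
    with W = Dv^{-1/2} H We De^{-1} H^T Dv^{-1/2}."""
    n_vertices, n_edges = H.shape
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    vertex_deg = H @ w          # d(v): summed weight of the hyperedges containing v
    edge_deg = H.sum(axis=0)    # delta(e): number of vertices in hyperedge e
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(vertex_deg))
    # diag(w / edge_deg) combines the two diagonal factors We and De^{-1}.
    W = dv_inv_sqrt @ H @ np.diag(w / edge_deg) @ H.T @ dv_inv_sqrt
    return np.eye(n_vertices) - W


def laplacian_regularizer(C: np.ndarray, L: np.ndarray) -> float:
    """tr(C L C^T): small when the codes (columns of C) of samples sharing a hyperedge agree."""
    return float(np.trace(C @ L @ C.T))
```

For labels `[0, 0, 1]` the regularizer reduces to half the squared distance between the codes of the first two samples, so a term such as $\mathrm{tr}(C^p L C^{p\top})$ pulls same-identity codes together.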

Step 4: To address domain shift, a domain adaptation (DA) term is introduced so that part of the unlabeled target-domain data takes part in training the feature decomposition model. The DA loss $L_{DA}$ is defined as follows:

where $V_t$ is the total number of views in the target domain, $N_t$ is the number of target-domain samples, and $X_{t,v,i}$ is the pedestrian image feature sequence of the i-th identity under the v-th view in target dataset t. The corresponding coding coefficients represent $X_{t,v,i}$ over the three component dictionaries $D_p$, $D_d$ and $D_r$, respectively, and $\lambda_2$ is a regularization parameter. Finally, the overall objective function is:

$$L = L_{FD} + L_{HSA} + L_{DA} \tag{6}$$

Step 5: The proposed loss terms are then merged, and the overall loss $L$ of Step 4 expands into the following form:

Step 6: The objective of Step 5 involves 9 variables. Each is solved with an alternating iterative optimization algorithm: while one variable is being updated, all other variables are held fixed. Solving yields the posture-invariant component dictionary $D_p$, the domain information component dictionary $D_d$, the interference component dictionary $D_r$, and the transformation matrices $P$ and $E$. With these dictionaries, the corresponding coding coefficients are computed by the following formula:

where $\zeta$ is a regularization parameter.
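The coding formula itself is not recoverable here. As a hedged illustration of the alternating scheme described in Step 6, the sketch below solves a much simpler two-block problem, $\min_{D,C} \|X - DC\|_F^2 + \zeta\|C\|_F^2 + \delta\|D\|_F^2$, by alternately fixing one block and solving the other in closed form. The ridge-regression form of the coding step, the $\delta$ term and all function names are assumptions; this is not the patent's 9-variable solver.

```python
import numpy as np


def ridge_codes(X: np.ndarray, D: np.ndarray, zeta: float) -> np.ndarray:
    """Codes with D fixed: argmin_C ||X - D C||_F^2 + zeta ||C||_F^2 (assumed form)."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + zeta * np.eye(k), D.T @ X)


def ridge_dictionary(X: np.ndarray, C: np.ndarray, delta: float) -> np.ndarray:
    """Dictionary with C fixed: argmin_D ||X - D C||_F^2 + delta ||D||_F^2."""
    k = C.shape[0]
    return np.linalg.solve(C @ C.T + delta * np.eye(k), C @ X.T).T


def objective(X, D, C, zeta, delta) -> float:
    return float(np.sum((X - D @ C) ** 2) + zeta * np.sum(C ** 2) + delta * np.sum(D ** 2))


def alternate(X, n_atoms, zeta=0.1, delta=1e-3, n_iter=20, seed=0):
    """Alternating minimization: each block update is an exact minimizer,
    so the recorded objective never increases between iterations."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    history = []
    for _ in range(n_iter):
        C = ridge_codes(X, D, zeta)        # fix D, solve for C
        D = ridge_dictionary(X, C, delta)  # fix C, solve for D
        history.append(objective(X, D, C, zeta, delta))
    return D, C, history
```

If the dictionary is formed as the horizontal stack $[D_p, D_d, D_r]$, the rows of the resulting codes split into the three components with `np.split(C, [d_p, d_p + d_d])`.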

Step 7: Once the coding coefficients have been computed, the transformation matrices $P$ and $E$ obtained in Step 6 give $h_{t,i}$ and $a_{t,i}$ through equations (9) and (10):

In these equations, $h_{t,i}$ and $E$ can be treated as constants, and $a_{t,i}$ is taken as the minimizer of the squared Frobenius norm of the right-hand term. Each test sample thus obtains a predicted identity representation $h_{t,i}$ and a semantic attribute vector $a_{t,i}$. $\alpha_2$ is a regularization parameter.
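Equations (9) and (10) are not reproduced in the text. Assuming, for illustration only, that the identity representation is a linear map of the posture-invariant code, $h = P c^p$, and that (10) is a regularized least-squares problem $\min_a \|h - E a\|_2^2 + \alpha_2 \|a\|_2^2$, the attribute vector has the closed form $a = (E^\top E + \alpha_2 I)^{-1} E^\top h$. Both forms and the function names are hypothetical.

```python
import numpy as np


def predict_identity(P: np.ndarray, c_p: np.ndarray) -> np.ndarray:
    """Hypothetical equation (9): identity representation h = P c_p."""
    return P @ c_p


def predict_attributes(E: np.ndarray, h: np.ndarray, alpha2: float) -> np.ndarray:
    """Assumed equation (10): argmin_a ||h - E a||^2 + alpha2 ||a||^2,
    solved in closed form as (E^T E + alpha2 I)^{-1} E^T h."""
    c = E.shape[1]
    return np.linalg.solve(E.T @ E + alpha2 * np.eye(c), E.T @ h)
```

Treating $h_{t,i}$ and $E$ as constants, as the text does, is what makes this a single linear solve per test sample.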

Step 8: Finally, the similarity scores $\mathrm{sim}_h$ and $\mathrm{sim}_a$ of a pedestrian image pair in the identity space and in the semantic space, respectively, are computed with the cosine similarity formula of equation (11).

Wherein z is _a And z _b Respectively representing the current pedestrian identity expression vector and the semantic attribute vector and h obtained in the step 7 _t,i And a _t,i Are represented by the same, with the difference that z _a And z _b Broadly refers to the identity representation and semantic attributes of the current pedestrian, and h _t,i ，a _t,i An identity representation and semantic attributes representing the ith pedestrian. ε is a constant of 0.0000001. And (4) weighting and summing the similarity scores respectively obtained by the identity space and the semantic attribute space, and taking the weighted similarity score as a final pedestrian to perform similarity measurement on the similarity score.

$$\mathrm{sim}_{final} = \tau\,\mathrm{sim}_a + (1-\tau)\,\mathrm{sim}_h \tag{12}$$

where $\tau$ ($0 < \tau < 1$) is the weight of the semantic attribute space and $1-\tau$ the weight of the identity space; in the present invention $\tau = 0.2$. With the solved variables, the similarity of pedestrians in the target dataset can then be measured.
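Step 8 can be sketched in a few lines. Equation (11) is missing from the text, so this assumes $\varepsilon$ is added to the denominator of the cosine, $\mathrm{sim}(z_a, z_b) = z_a^\top z_b / (\|z_a\|\,\|z_b\| + \varepsilon)$, which keeps the score defined for an all-zero vector; the fusion follows equation (12) with $\tau = 0.2$. Function names are illustrative.

```python
import numpy as np

EPS = 1e-7  # the constant epsilon from Step 8


def cosine_sim(z_a, z_b, eps: float = EPS) -> float:
    """Assumed form of equation (11): z_a . z_b / (||z_a|| ||z_b|| + eps)."""
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    return float(z_a @ z_b / (np.linalg.norm(z_a) * np.linalg.norm(z_b) + eps))


def final_similarity(h_a, h_b, a_a, a_b, tau: float = 0.2) -> float:
    """Equation (12): sim_final = tau * sim_a + (1 - tau) * sim_h."""
    sim_h = cosine_sim(h_a, h_b)  # identity space
    sim_a = cosine_sim(a_a, a_b)  # semantic attribute space
    return tau * sim_a + (1.0 - tau) * sim_h
```

For example, two pedestrians with identical identity representations but orthogonal attribute vectors score about 0.8 instead of 1.0, which is how the attribute term helps separate look-alikes.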

The invention has the following beneficial effects:

(1) The proposed decomposition model eliminates the influence of inter-dataset domain information and pedestrian posture information on cross-domain pedestrian re-identification and reduces the differences between domains, which helps the model extract more robust pedestrian features in real scenes.

(2) An effective hypergraph structure alignment constraint establishes a mapping between the posture-invariant features and the semantic attributes, and the similarity measure that combines the two makes the model more discriminative between different pedestrians. For example, when two pedestrians look very similar, the attribute information can prevent them from being identified as the same person, avoiding a false match.

Drawings

FIG. 1 is a flow chart of the cross-domain pedestrian re-identification method using posture invariance and graph structure alignment according to the present invention.

Detailed Description

The invention is further described with reference to the following figures and specific examples.

Example 1: As shown in FIG. 1, the cross-domain pedestrian re-identification method using posture invariance and graph structure alignment is carried out according to Steps 1 to 8 described above.


The model above has 11 parameters to set: the dictionary sizes (numbers of atoms) $d_p$, $d_d$, $d_r$ of $D_p$, $D_d$, $D_r$, and the regularization parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$, $\alpha_1$, $\alpha_2$, $\beta$, $\zeta$. In the experiments they were set to $d_p = 600$, $d_d = 180$, $d_r = 180$, $\lambda_1 = 0.0001$, $\lambda_2 = 0.0001$, $\lambda_3 = 0.01$, $\lambda_4 = 1$, $\alpha_1 = 0.1$, $\alpha_2 = 0.1$, $\beta = 0.1$, $\zeta = 0.1$.
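The eleven settings listed above can be kept together in one configuration object. A minimal sketch as a frozen Python dataclass; the field names are hypothetical transliterations of the symbols, and the defaults are the experimental values stated in the text. The parameter $\eta$ of Step 2 and the weight $\tau = 0.2$ of Step 8 are not among the eleven and are left out.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelParams:
    # Dictionary sizes (numbers of atoms) of D_p, D_d, D_r.
    d_p: int = 600
    d_d: int = 180
    d_r: int = 180
    # Regularization parameters.
    lambda1: float = 1e-4
    lambda2: float = 1e-4
    lambda3: float = 1e-2
    lambda4: float = 1.0
    alpha1: float = 0.1
    alpha2: float = 0.1
    beta: float = 0.1
    zeta: float = 0.1

    def as_dict(self) -> dict:
        """Name-to-value mapping, e.g. for logging an experiment's settings."""
        return {f.name: getattr(self, f.name) for f in fields(self)}
```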

GOG features are used as the visual features of pedestrians, and the standard semantic attributes already annotated for the datasets are used as the pedestrian attributes. To show that the algorithm can be deployed in practice, experiments were conducted on the VIPeR dataset. It was captured by two cameras, each recording one image per person, and contains large variations in pedestrian posture, viewpoint and illumination. PRID2011 and GRID are used as the source datasets, and the data are split equally into training and test sets. Training was repeated 10 times and the average is reported as the final performance. The comparison results are shown in Table 1. The experiments show that the trained model can be deployed directly to the VIPeR scene while maintaining a good recognition rate.
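The protocol above (equal split, ten repetitions, averaged results) can be sketched as follows. This assumes a single-shot setting matching VIPeR's two cameras, where each probe from one camera has exactly one true match in the gallery from the other, and reports cumulative matching characteristic (CMC) accuracies at illustrative ranks. `score_fn` is a hypothetical stand-in for training the model and scoring the test probes against the gallery.

```python
import numpy as np


def cmc(sim: np.ndarray, ranks=(1, 5, 10, 20)) -> dict:
    """Single-shot CMC: sim[i, j] scores probe i against gallery j,
    and the true match of probe i is gallery i."""
    true_scores = np.diag(sim)
    # 0-based rank of the true match = number of gallery items scored strictly higher
    # (ties are resolved in favor of the true match).
    positions = (sim > true_scores[:, None]).sum(axis=1)
    return {k: float(np.mean(positions < k)) for k in ranks}


def random_half_split(n_ids: int, rng: np.random.Generator):
    """Split identities into two equal halves (training / testing)."""
    perm = rng.permutation(n_ids)
    half = n_ids // 2
    return perm[:half], perm[half:]


def averaged_cmc(score_fn, n_ids: int, n_trials: int = 10, seed: int = 0) -> dict:
    """Repeat the split n_trials times and average the CMC values."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_trials):
        train_ids, test_ids = random_half_split(n_ids, rng)
        sim = score_fn(train_ids, test_ids)  # hypothetical: train, then score probes vs gallery
        results.append(cmc(sim))
    return {k: float(np.mean([r[k] for r in results])) for k in results[0]}
```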

TABLE 1 VIPeR data set

The invention was also evaluated on the CUHK01 dataset, which was collected on the campus of the Chinese University of Hong Kong with cameras placed in a teaching building and in outdoor scenes, and which covers large viewpoint variation. VIPeR was used as the source dataset and CUHK01 as the target dataset. The results, together with the performance of other methods, are shown in Table 2; the method achieves comparatively high performance.

TABLE 2 CUHK01 dataset

While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims

1. A cross-domain pedestrian re-identification method using posture invariance and graph structure alignment, characterized in that the method comprises the following steps:

4) designing a domain adaptation module that reduces domain shift;

5) merging the proposed loss functions into a final optimization objective;

8) calculating the similarity between the pedestrians by using the cosine similarity in combination with the predicted identity and attribute;

the method comprises the following specific steps:

Step 1: suppose the source dataset contains K pedestrians, where $x_{s,i} \in \mathbb{R}^{d}$ is the feature of the i-th pedestrian in source domain s and d is the feature dimension, $a_{s,i} \in \mathbb{R}^{c}$ is the attribute vector of the i-th pedestrian and c is the attribute dimension, $y_{s,i}$ is the label of the i-th pedestrian, $N_s$ is the number of samples, and $X_s$, $A_s$ and $Y_s$ denote the source-domain feature set, attribute set and label set, respectively; a target dataset $X_t$ is defined that contains $N_t$ samples in total, where $x_{t,i} \in \mathbb{R}^{d}$ is the feature of the i-th pedestrian in target domain t; GOG pedestrian features are used at the feature level, and the attributes of existing datasets are used as the pedestrian attributes;

Step 2: the feature decomposition loss term $L_{FD}$ is designed as follows to decompose the source-domain features into a posture-invariant component, a domain component and an interference component:

where $V_s$ is the total number of source-domain views, $X_{s,v,i}$ denotes the features of the i-th identity under the v-th view in training set s, $D_p$, $D_d$ and $D_r$ are the posture-invariant component dictionary, the domain information component dictionary and the interference component dictionary, respectively, the corresponding coding coefficients represent $X_{s,v,i}$ over the three component dictionaries, $\|\cdot\|_*$ denotes the nuclear norm of a matrix, $\|\cdot\|_{2,1}$ denotes the structured sparse norm, $\eta$, $\lambda_1$ and $\lambda_2$ are regularization parameters, and $\Phi(D_r, C^p, C^r)$ is a regularization term that promotes domain separation, defined as follows:

where $C^p$ and $C^r$ are the coding coefficients of the whole dataset, $\lambda_3$ and $\lambda_4$ are regularization parameters, and $I$ and $Q$ denote identity matrices;

Step 3: to enhance the robustness and domain invariance of the semantic attributes, semantic attributes are introduced to assist cross-domain pedestrian re-identification, and the hypergraph structure alignment loss $L_{HSA}$ is expressed as follows:

a hypergraph $G(X, E)$ is first constructed from the source-domain image samples and the pedestrian identities, comprising a vertex set and a hyperedge set, where $|N_j|$ and $|N_r|$ denote the numbers of vertices and hyperedges, respectively; the hyperedges of any given hypergraph can readily be converted into an incidence matrix;

$\alpha_1$, $\alpha_2$ and $\beta_1$ denote hyperparameters, trace terms such as $\mathrm{tr}(C^p L C^{p\top})$ represent the two hypergraph Laplacian regularizations, $P$ and $E$ are linear transformation coefficient matrices, $L = I - W$ is the hypergraph Laplacian matrix, and $W$ is the weight matrix of the hypergraph;

$D_x$ and $D_e$ are the diagonal matrices of vertex degrees and hyperedge degrees, respectively, and $W_e$ is the diagonal matrix of hyperedge weights;

Step 4: to address domain shift, a domain adaptation term is introduced so that part of the unlabeled target-domain data takes part in training the feature decomposition model, and the domain adaptation loss $L_{DA}$ is expressed as follows:

where $V_t$ is the total number of target-domain views, $N_t$ is the number of target-domain samples, $X_{t,v,i}$ is the pedestrian image feature sequence of the i-th identity under the v-th view in target dataset t, the corresponding coding coefficients represent $X_{t,v,i}$ over the three component dictionaries $D_p$, $D_d$ and $D_r$, respectively, and $\lambda_2$ is a regularization parameter; finally, the overall objective function is expressed as:

$$L = L_{FD} + L_{HSA} + L_{DA} \tag{6}$$

Step 6: in Step 5, 9 variables need to be solved; each variable is solved with an alternating iterative optimization algorithm, in which all other variables are held fixed while one variable is solved; solving yields the posture-invariant component dictionary $D_p$, the domain information component dictionary $D_d$, the interference component dictionary $D_r$ and the transformation matrices $P$ and $E$, and with these dictionaries the corresponding coding coefficients are computed by the following formula

where $\zeta$ is a regularization parameter;

Step 7: once the coding coefficients have been computed, $h_{t,i}$ and $a_{t,i}$ are obtained with the transformation matrices $P$ and $E$ from Step 6;

in the above formula, $h_{t,i}$ and $E$ can be treated as constants, and $a_{t,i}$ is taken as the minimizer of the squared Frobenius norm of the right-hand term; each test sample thus obtains a predicted identity representation $h_{t,i}$ and a semantic attribute vector $a_{t,i}$, and $\alpha_2$ is a regularization parameter;

Step 8: finally, the similarity scores $\mathrm{sim}_h$ and $\mathrm{sim}_a$ of a pedestrian image pair in the identity space and in the semantic space, respectively, are computed with the cosine similarity formula of equation (11),

Wherein z is _a And z _b Respectively representing the current pedestrian identity expression vector and the semantic attribute vector and h obtained in the step 7 _t,i And a _t,i Are represented by the same, with the difference that z _a And z _b Broadly refers to the identity representation and semantic attributes of the current pedestrian, and h _t,i ，a _t,i Representing the identity representation and semantic attribute of the ith pedestrian, wherein epsilon is a constant of 0.0000001, weighting and summing similarity scores obtained from an identity space and a semantic attribute space respectively, and taking the weighted similarity score as a final pedestrian to perform similarity measurement on the similarity score:

$$\mathrm{sim}_{final} = \tau\,\mathrm{sim}_a + (1-\tau)\,\mathrm{sim}_h \tag{12}$$

where $\tau$ ($0 < \tau < 1$) is the weight of the semantic attribute space and $1-\tau$ the weight of the identity space, with $\tau$ set to 0.2; finally, the similarity of pedestrians in the target dataset can be measured using the solved variables.