CN110033007A - Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion - Google Patents

Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion

Info

Publication number
CN110033007A
Authority
CN
China
Prior art keywords
image
pixel
model
label
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910321093.0A
Other languages
Chinese (zh)
Other versions
CN110033007B (en
Inventor
柯逍
李振达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201910321093.0A priority Critical patent/CN110033007B/en
Publication of CN110033007A publication Critical patent/CN110033007A/en
Application granted granted Critical
Publication of CN110033007B publication Critical patent/CN110033007B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24143Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion. The method first performs appearance-feature matching and selects part of the retrieval results for subsequent attribute recognition; then, through an SSD-based deep human pose estimation method, it effectively locates the foreground region belonging to the pedestrian in the image and largely excludes interference from background factors; finally, it fuses the parsing results of multiple modes and, combined with an iterative smoothing process, assigns labels by maximum a posteriori probability so as to strengthen the correlation between attribute labels and pixels, obtaining the final attribute parsing and recognition result. The invention solves problems such as inaccurate label recognition and pixel parsing region deviation under a single parsing mode. The method is simple and flexible and has strong practical applicability.

Description

Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion
Technical Field
The invention belongs to the fields of computer vision, deep learning and image processing, and is applied to scenes such as intelligent monitoring, pedestrian re-identification and the like, in particular to a pedestrian clothing attribute identification method based on deep attitude estimation and multi-feature fusion.
Background
The identification of pedestrian attributes in surveillance images acquired from real-world surveillance videos is challenging for the following reasons: (1) imaging quality is poor, resolution is generally low, and the images are susceptible to motion blur; (2) the attributes are affected by the appearance of clothes worn or carried by the pedestrian, and because pedestrians take different postures in different images, the corresponding attributes appear at different spatial positions in the image; (3) attribute-labeled data from surveillance video images is difficult to collect and can only be obtained in small quantities. These factors make it very difficult to learn a pedestrian attribute model through training. Early attribute identification methods relied primarily on manually extracted features such as color or textual annotations of items. In recent years, pedestrian attribute recognition models based on deep learning have attracted more and more research attention, because models obtained by deep learning have strong and stable learning ability on large-scale datasets and can yield general models capable of representing complex characteristics. Meanwhile, the poor quality, low resolution, and complex variation of clothing appearance in images obtained from surveillance videos undoubtedly make deep-learning-based pedestrian attribute identification more difficult.
Pedestrian clothing attribute identification corresponds to a multi-label image classification (MLIC) problem. Existing approaches have explored sequential multi-label prediction, designed around CNN-RNN models. Importantly, these existing MLIC models assume (1) the availability of large-scale labeled training data and (2) sufficiently good image quality. Both assumptions are invalid for pedestrian attribute identification in surveillance images. A recent multi-person image annotation approach advances this sequential MLIC paradigm by incorporating additional interpersonal social relationships and scene context. That approach specifically exploits high-resolution photographs centered on family members and friends, but it does not extend to open-world surveillance scenes with poor-quality image data. Furthermore, it requires strong attribute-level labels, whereas pedestrian attributes are mostly weak labels at the image level.
The weak, image-level nature of pedestrian attribute labels in images obtained from surveillance scenes is also why existing attribute identification methods suffer localization deviation under the influence of environmental factors during recognition.
To solve these problems, a pedestrian posture estimation method is introduced to further delimit the foreground and background regions of the pedestrian in the image and to eliminate interference from background factors. In addition, the image quality in the surveillance scene is improved through image processing, and the accuracy of the identification method is enhanced by means of fused features and a fused attribute identification scheme.
Disclosure of Invention
The invention aims to provide a pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion, which overcomes the defects in the prior art and solves the problems of inaccurate label identification and pixel analysis area deviation in a single analysis mode.
In order to achieve the purpose, the technical scheme of the invention is as follows: a pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion comprises the following steps:
s1, preprocessing an input image in a monitoring scene in an image denoising and image enhancement mode to improve the image quality;
step S2, performing attitude estimation based on a Deep Convolutional Neural Network (DCNN) on the preprocessed input image, defining foreground and background areas in the image, and taking attitude characteristics as one part of fusion characteristics;
s3, extracting fusion features from the foreground region of the image processed in the S2, and performing feature dimensionality reduction through PCA;
step S4, performing a style search on the input image using a common data set, the search including: labeling similar image samples and garment labels of the image samples;
and step S5, inputting the obtained fusion characteristics of different forms into the designed pedestrian clothing attribute identification frame to obtain the final clothing attribute identification result.
In an embodiment of the present invention, the step S1 is specifically implemented as follows:
step S11, solving the interference caused by motion blur in a monitoring scene through an image denoising method based on blind deconvolution, and obtaining a restored image:
Assume an initial restored image f_0 and a degradation function g_0, and assume the blur function is the same over all parts of the image; the input image before blur interference is obtained through the following iterative formula:
where g_i^k denotes the degradation function at the k-th iteration of the i-th round, f_i^k denotes the restored image at the i-th round, c(x) is the degraded image, i.e. the original input image, and ⊗ denotes the convolution operation;
step S12, image enhancement is carried out through a multi-scale Retinex algorithm with color recovery, and the color expression of the image under the monitoring scene is enhanced:
Defining the incident light as L(x, y), the reflectance image of the object as R(x, y), and the observed image as S(x, y), there is: S(x, y) = L(x, y) · R(x, y). Mapping all three components to the log domain gives the corresponding results
log S(x, y), log L(x, y), log R(x, y)
The above formula can then be converted into:
log S(x, y) = log L(x, y) + log R(x, y)
Introducing the color channel weight ω_i in the multi-scale case, there is:
where K denotes the number of center-surround functions F;
Define C_i as the color recovery factor of one of the three channels, used to balance the ratio between the three channels and to highlight relatively dark regions so as to eliminate distortion; the MSRCR model can therefore be expressed as:
and the result is finally mapped from the log domain back to the real-number domain to obtain the final enhancement result.
In an embodiment of the present invention, in the step S2, a specific manner of performing the pose estimation based on the deep convolutional neural network DCNN on the preprocessed input image is as follows:
Step S21: construct an image model G = (V, E) to visually represent a human body model, where V represents the joint points of the human body or certain body parts, E is the set of edges connecting nodes, e ∈ V × V represents the spatial relationship between adjacent nodes, and K = |V| denotes the number of joints. Let I denote an image, i denote the i-th node in the image, l denote the pixel coordinate of a node, and t denote the mixed spatial relationship of the nodes, abstracted and combined by clustering over different pose instances. Then, according to the definitions in the image model, i ∈ {1, ..., K}, l_i ∈ {1, ..., L}, t_i ∈ {1, ..., T}, where l_i = (x_i, y_i) denotes the pixel coordinate of node i, and t_i denotes the spatial-relationship type of node i with its adjacent nodes, i.e. one of T pose types at that node;
Step S22: from step S21, the appearance model of a human joint part may be represented as:
with
φ(l_i, t_i | I; θ) = log p(l_i, t_i | I; θ)
where p(l_i, t_i | I; θ) is the probability obtained by mapping, through the forward-propagation Softmax function in the DCNN, the finally computed score that node part i in image I has mixed pose type t_i and pixel coordinate l_i, and θ is a parameter of the model;
the inter-joint spatial relationship may be expressed as:
adding a standard quadratic deformation to the spatial-relationship model, with the definition ⟨d(l_i − l_j)⟩ = [dx dx² dy dy²]ᵀ, where dx = x_i − x_j and dy = y_i − y_j denote the relative pixel location of node i with respect to node j;
finally we obtain
the above formula, which represents the human body pose estimation model and whose tree structure allows efficient accelerated computation;
s23, carrying out clustering operation on the local image blocks obtained by preprocessing according to the spatial relative position of the central joint and the adjacent joints by a K-means clustering method to obtain a pre-training model;
Step S24: train with the DCNN, mapping images to different pose types through the score-function model; according to the annotation information, adjust the weights and parameters through a loss function so that the mapped score results agree with the actual categories, completing the classification of pose types and obtaining the DCNN multi-classification model.
In an embodiment of the present invention, the step S3 is specifically implemented as follows:
s31, extracting simple features including color, gradient and texture from the input image, and fusing the simple features into different fusion complex features according to different attribute identification stages;
step S32, obtaining m pieces of n-dimensional data after passing through a feature descriptor, and forming a matrix X with m rows and n columns from the original data according to columns;
step S33, subtracting the average value of each line of X;
step S34, solving a covariance matrix;
step S35, solving the eigenvalue of the covariance matrix and the corresponding eigenvector r;
step S36, arranging the eigenvectors r into a matrix from top to bottom according to the corresponding eigenvalue size, and taking the first n' rows to form a new matrix P;
in step S37, the matrix P is the data after dimension reduction to n'.
In an embodiment of the present invention, the step S4 is specifically implemented as follows:
step S41, firstly, establishing a KD-tree index tree on a public data set;
s42, selecting fusion characteristics as sample characteristics, comparing the appearance fusion characteristics of the input image and the samples in the data set, and searching a KD-tree by a KNN clustering method in combination with L2-distance;
and step S43, selecting the first 25 results to form a nearest neighbor sample set, and forming candidate labels by the labeled clothing label information of the samples, so as to provide help for the subsequent clothing attribute identification.
In an embodiment of the present invention, the step S5 is specifically implemented as follows:
Step S51: define a pixel of an image in the sample as i, the predicted clothing label of the pixel as l_i, and the complex feature of the pixel as f_i. Define the nearest-neighbor sample set obtained through nearest-neighbor retrieval as D, the set of annotated labels in the sample set as τ(D), and let t denote a clothing category label. Each parsing mode is assigned a mixing parameter, Λ ≡ [λ_1, λ_2, λ_3], for the final confidence combination;
Step S52: fuse the results of global parsing based on logistic regression, approximate parsing based on the nearest-neighbor samples, and transferred parsing based on mask transformation, to improve recognition accuracy:
The functional model of the global-parsing pixel-label confidence is as follows:
P denotes the logistic-regression result given the complex feature f_i and the model parameter θ_t^g, i.e. the probability that label t is present in this sample; 1[·] is an indicator function indicating that label t is a member of the label set of the nearest-neighbor samples; the model parameter θ_t^g is trained using the Fashionista dataset as positive samples;
the functional model for approximate recognition pixel-tag confidence is as follows:
model parametersTraining by using a nearest neighbor sample set D as a training set;
the function model of migration-resolved pixel-label confidence based on mask transformation is:
where j denotes a pixel in a nearest-neighbor sample superpixel block, the parameter θ_t^g is the model parameter from global parsing, and M(l_i, s_i, d) denotes the mean of the logistic-regression results obtained by globally parsing the nearest-neighbor sample superpixel-block region;
Step S53: a single confidence is not enough to guarantee the accuracy of the label-assignment result; therefore a fusion of the three parsing modes is considered, with Λ ≡ [λ_1, λ_2, λ_3] defining the respective weight ratios of the three modes, and the confidence of each clothing label-pixel pair is computed through the fusion model:
Step S54: combine with the iterative smoothing procedure. Define the label assignment of all pixels as L ≡ {l_i}, i.e. the clothing-label assignment of each pixel in the image, and the clothing appearance model set as Θ ≡ {θ_t^c}, where θ_t^c is the fused appearance model of clothing-category label t; the final optimization is to find the optimal pixel-label assignment L* and appearance model set Θ*. At the start of the iterative process, the initial pixel-label assignment L^0 denotes the pixel-label result obtained by MAP assignment using the confidences from the first pass of fusion parsing, and the initial set of fused clothing-category appearance models Θ^0, a set of logistic-regression models over the clothing categories, is trained using L^0 as training data. Let the constant k denote the number of iterations performed and E denote the set of adjacent pixel pairs; at the k-th iteration the pixel-label assignment is L^k and the fused appearance model set is Θ^k. On this basis, the model for optimizing the pixel-label assignment is as follows:
whereby the final identification result is obtained.
Compared with the prior art, the invention has the following beneficial effects: the method further defines the foreground region and the background region of the pedestrians in the image by combining the human body posture estimation based on the deep learning, and eliminates the interference of background factors. Meanwhile, the image quality in the monitoring scene is improved by image denoising and image enhancement methods. Moreover, a plurality of simple features are fused to form complex description features, the expressive force of the clothing attributes is strengthened, the result of a single recognition mode is fused, and the recognition accuracy is improved through iterative smoothing processing. The method integrates the depth human body posture estimation result and multiple fusion characteristics, and can accurately identify the clothes attribute of the pedestrian. The method is simple, flexible to implement and high in practicability.
Drawings
FIG. 1 is a flow chart of a pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
As shown in FIG. 1, the invention provides a pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion. Aiming at the problems that the existing attribute identification method has environmental factor interference and further influences positioning accuracy and the like, the pedestrian attribute identification method based on pedestrian attitude estimation and multi-feature fusion is provided. The method comprises the steps of firstly, selecting a part of retrieval results for subsequent attribute identification through appearance feature matching. And then, a foreground region belonging to the pedestrian in the image can be effectively positioned by a depth human body posture estimation method based on the SSD, and the interference of background factors is well eliminated. And finally, combining the analysis results in various modes, combining an iterative smoothing process, and adopting a mode of maximum posterior probability distribution to strengthen the correlation between the attribute labels and the pixels to obtain a final attribute analysis recognition result. The invention solves the problems of inaccurate label identification, pixel analysis area deviation and the like in a single analysis mode. The method comprises the following specific steps:
and step S1, improving the image quality of the input image in the monitoring scene through a classical image denoising and image enhancement mode.
Step S2: perform pose estimation based on a deep convolutional neural network (DCNN) on the input image to delimit the foreground and background regions in the image; the pose features are also used as one part of the fusion features, and the subsequent recognition work is carried out based on the foreground region of the image.
And step S3, extracting multiple simple features such as color, gradient, texture and the like from the input image, fusing the multiple simple features into complex features, and performing feature dimensionality reduction through PCA.
Step S4: perform style retrieval on the input image using a public dataset, the retrieved results being used to assist subsequent attribute identification. The results comprise (a) similar image samples and (b) the clothing labels of those image samples.
And step S5, inputting the obtained fusion characteristics of different forms into the designed pedestrian clothing attribute identification frame to obtain the final clothing attribute identification result.
Further, in the present embodiment, in the step S1, the image quality is improved by:
and step S11, solving the interference caused by motion blur in the monitored scene to a certain extent through an image denoising method based on blind deconvolution, and obtaining a restored image. Assume an initial restored image f0And a degradation function g0And the fuzzy functions of all parts of the image are the same, and the input image before fuzzy interference is obtained through the following iterative formula:
whereinRepresenting the degradation function at the kth iteration of the ith round, fi kRepresenting the restored image at the ith iteration, c (x) is the degraded image, i.e. the original input image,is a convolution operation.
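For illustration, the following is a minimal numpy/scipy sketch of an alternating (blind) Richardson-Lucy-style deconvolution of the kind the iterative formula above describes; the flat initial kernel, its 9×9 support, and the iteration counts are assumptions for the sketch and are not values taken from the patent.

```python
import numpy as np
from scipy.signal import fftconvolve

def blind_deconvolution(c, n_rounds=10, n_inner=5, eps=1e-7):
    """Alternately refine the degradation function g and the restored image f.

    c : observed (blurred) grayscale image, float array scaled to [0, 1].
    Returns the restored image f (the estimate of the input before blur).
    """
    f = np.full_like(c, 0.5)                      # initial restored image f0
    g = np.zeros_like(c)                          # initial degradation function g0
    h, w = c.shape
    g[h // 2 - 4 : h // 2 + 5, w // 2 - 4 : w // 2 + 5] = 1.0 / 81.0  # flat 9x9 start
    for _ in range(n_rounds):                     # outer rounds (index i)
        for _ in range(n_inner):                  # update the kernel g with f fixed
            ratio = c / (fftconvolve(f, g, mode="same") + eps)
            g = g * fftconvolve(ratio, f[::-1, ::-1], mode="same")
            g = g / (g.sum() + eps)               # keep the kernel normalized
        for _ in range(n_inner):                  # update the image f with g fixed
            ratio = c / (fftconvolve(f, g, mode="same") + eps)
            f = f * fftconvolve(ratio, g[::-1, ::-1], mode="same")
    return np.clip(f, 0.0, 1.0)
```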
Step S12: perform image enhancement using the multi-scale Retinex with color restoration (MSRCR) algorithm to enhance the color representation of the image in the surveillance scene. Define the incident light as L(x, y), the reflectance image of the object as R(x, y), and the observed image as S(x, y); then S(x, y) = L(x, y) · R(x, y). Mapping all three components into the log domain gives the corresponding results log S(x, y), log L(x, y), log R(x, y), and the above equation can be converted into: log S(x, y) = log L(x, y) + log R(x, y). Introducing the color channel weight ω_i in the multi-scale case, there is:
k represents the number of the center surround functions F and takes a value of 3.
Define C_i as the color recovery factor of one of the three channels, used to balance the ratio between the three channels and to highlight relatively dark regions so as to eliminate distortion; the MSRCR model can therefore be expressed as log R_MSRCRi(x, y) = C_i(x, y) · log R_MSRi(x, y), and the result is finally mapped from the log domain back to the real-number domain to obtain the final enhancement result.
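A minimal sketch of such an MSRCR pass is given below; the Gaussian surround scales (15, 80, 250) and the color-restoration constants alpha and beta are common default choices assumed for illustration, not values specified by the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def msrcr(img, sigmas=(15, 80, 250), alpha=125.0, beta=46.0, eps=1.0):
    """Multi-scale Retinex with color restoration on an H x W x 3 uint8 image."""
    s = img.astype(np.float64) + eps                      # S(x, y); offset avoids log(0)
    weight = 1.0 / len(sigmas)                            # equal weights omega_i, K = 3 scales
    msr = np.zeros_like(s)
    for sigma in sigmas:                                  # sum_i omega_i (log S - log(F_i * S))
        blurred = np.stack([gaussian_filter(s[..., ch], sigma) for ch in range(3)], axis=-1)
        msr += weight * (np.log(s) - np.log(blurred + eps))
    # color recovery factor C_i balances the three channels
    c = beta * (np.log(alpha * s) - np.log(s.sum(axis=-1, keepdims=True)))
    out = c * msr                                         # log R_MSRCRi = C_i * log R_MSRi
    # map from the log domain back to a displayable real-number range
    out = (out - out.min()) / (out.max() - out.min() + 1e-12) * 255.0
    return out.astype(np.uint8)
```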
Further, in the present embodiment, in the step S2, the DCNN-based human body pose estimation is performed through the following steps:
step S21, building an image model G ═ (V, E) to visually represent a human body model, where V represents a joint point of the human body or a certain part of the body, E is an edge connecting between nodes, E ∈ V × V represents a spatial relationship between adjacent nodes, and K ═ V | represents the number of joints. Defining I to represent an image, I to represent the ith node in the image, l to represent the pixel coordinate of the node, and t to represent the mixed spatial relationship of the nodes (cluster extraction is carried out by different pose instances)Like union), then according to the definition in the image model, there is i e {1i∈{1,...,L},ti∈{1,...,T},liPixel coordinate { (x) representing node ii,yi)},tiRepresenting a set of spatial relationship types of node i with its neighbors (i.e., T pose types at that node).
Step S22, the appearance model of the human joint part can be expressed as:and has phi (l)i,ti|I;θ)=logp(li,tiI; θ) where p (l)i,tiI; theta) is a probability domain obtained by mapping a score result finally calculated by forward propagation Softmax function in DCNN, and the mixed posture type of the node part I in the image I is predicted to be tiAnd the pixel coordinate is at liIs a parameter of the model. The inter-joint space can be expressed as: adding standard quadratic variation to the spatial relationship model, with definition < d (l)i-lj)>=[dx dx2dy dy2]TAnd dx ═ xi-xj、dy=yi-yjIndicating the relative pixel location of node i with respect to node j.
Finally obtaining
The above formula represents a human body posture estimation model, and represents high efficiency in accelerated computation in a tree structure.
And step S23, carrying out clustering operation on the local image blocks obtained by preprocessing according to the spatial relative position of the central joint and the adjacent joints by a K-means clustering method to obtain a pre-training model, and providing help for subsequent training.
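By way of illustration, the following is a minimal sketch of the clustering in step S23; the joint count, the edge list, and the number of pose types T per joint are placeholders assumed for the sketch, not values fixed by the patent.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pose_types(joint_coords, edges, n_types=6):
    """Cluster each joint's offsets to its neighboring joint to derive T pose types.

    joint_coords : (N, K, 2) array of (x, y) joint annotations over N training poses
    edges        : list of (i, j) index pairs, the tree edges E of the body model
    n_types      : assumed number of pose types T per joint
    Returns a dict mapping joint index i to a fitted KMeans model over offsets (dx, dy).
    """
    type_models = {}
    for i, j in edges:
        offsets = joint_coords[:, i, :] - joint_coords[:, j, :]  # relative position of i w.r.t. j
        type_models[i] = KMeans(n_clusters=n_types, n_init=10).fit(offsets)
    return type_models

# usage sketch with toy annotations for a hypothetical 14-joint chain model
poses = np.random.rand(200, 14, 2) * 100
edges = [(i, i - 1) for i in range(1, 14)]
models = cluster_pose_types(poses, edges)
```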
Step S24: train with the DCNN, mapping images to different pose types through the score-function model; according to the annotation information, adjust the weights and parameters through a loss function so that the mapped score results agree with the actual categories, completing the classification of pose types and obtaining the DCNN multi-classification model.
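A compact PyTorch sketch of the classification step S24 follows; the network depth, the patch size, the 14 parts, the 6 types per part, and the background class are assumptions made only for this sketch, since the patent does not fix the DCNN architecture.

```python
import torch
import torch.nn as nn

class PartTypeDCNN(nn.Module):
    """Map a local image patch to scores over (joint part, pose type) classes."""
    def __init__(self, n_parts=14, n_types=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))
        self.classifier = nn.Linear(64 * 16, n_parts * n_types + 1)  # +1 for background

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))  # softmax is applied in the loss

model = PartTypeDCNN()
criterion = nn.CrossEntropyLoss()     # aligns the mapped scores with the annotated class
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
patches = torch.randn(8, 3, 36, 36)               # toy local patches from step S23
labels = torch.randint(0, 14 * 6 + 1, (8,))       # annotated (part, type) class per patch
loss = criterion(model(patches), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```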
Further, in this embodiment, in step S3, the image fusion features are extracted and feature dimensionality reduction is performed to reduce the computational complexity by:
and step S31, extracting simple features such as color, gradient, texture and the like from the input image, and fusing the simple features into different fusion complex features according to different attribute identification stages.
Step S32, obtaining m pieces of 39168-dimensional data after passing through a feature descriptor, and forming a matrix X with m rows and 39168 columns by the original data according to columns;
Step S33, subtracting from each line of X (each representing an attribute field) the average value of that line;
step S34, solving a covariance matrix;
step S35, solving the eigenvalue of the covariance matrix and the corresponding eigenvector r;
Step S36, arranging the eigenvectors into a matrix from top to bottom according to the size of the corresponding eigenvalues, and taking the first 441 rows to form a new matrix P;
in step S37, the matrix P is the data after dimension reduction to 441 dimensions.
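The PCA reduction of steps S32-S37 can be sketched as follows; note that forming a full 39168 × 39168 covariance matrix is memory-heavy, so in practice a truncated SVD would normally stand in for the eigen-decomposition — the literal version is shown only to mirror the steps described above.

```python
import numpy as np

def pca_reduce(X, n_keep=441):
    """Reduce an m x n fused-feature matrix X (n = 39168 here) to n' = 441 dimensions.

    Returns the projected data (m x n') and the projection matrix P (n' x n)."""
    Xc = X - X.mean(axis=0)                    # step S33: remove the mean of each attribute field
    cov = np.cov(Xc, rowvar=False)             # step S34: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # step S35: eigenvalues and eigenvectors r
    order = np.argsort(eigvals)[::-1]          # step S36: sort by decreasing eigenvalue
    P = eigvecs[:, order[:n_keep]].T           # first n' eigenvectors as rows of P
    return Xc @ P.T, P                         # step S37: data reduced to n' dimensions
```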
Further, in the present embodiment, in the step S4, the style search and the acquisition of the candidate tag are performed to assist the subsequent clothing attribute identification by:
step S41 is to first build a KD index tree on a common data set.
Step S42: select the fused features as the sample features, compare the appearance fused features of the input image with those of the samples in the dataset, and search the KD-tree using a KNN (K-nearest neighbors) method combined with the L2 distance.
And step S43, selecting the first 25 results to form a nearest neighbor sample set, and forming candidate labels by the labeled clothing label information of the samples, so as to provide help for the subsequent clothing attribute identification.
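A minimal sketch of steps S41-S43 with scipy's KD-tree is shown below; the feature dimensionality and the per-sample label format of the public dataset are placeholders assumed for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def style_retrieval(query_feat, dataset_feats, dataset_labels, k=25):
    """Build a KD index tree over the dataset's appearance features and return the
    25 nearest samples (L2 distance) plus the union of their clothing labels tau(D)."""
    tree = cKDTree(dataset_feats)              # step S41: KD index tree on the public dataset
    _, idx = tree.query(query_feat, k=k)       # step S42: KNN search with L2 distance
    neighbor_set = list(idx)                   # step S43: nearest-neighbor sample set D
    candidate_labels = set()
    for i in neighbor_set:
        candidate_labels.update(dataset_labels[i])
    return neighbor_set, candidate_labels

# usage sketch: 441-dim features after PCA, toy label lists per sample
feats = np.random.rand(1000, 441)
labels = [["skirt"] if n % 2 else ["jeans", "jacket"] for n in range(1000)]
neighbors, candidates = style_retrieval(np.random.rand(441), feats, labels)
```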
Further, in this embodiment, in step S5, the results obtained by fusing the depth human body pose estimation exclude the interference of background factors as much as possible, and the results of multiple recognition modes are fused, and the iterative smoothing process is combined to enhance the recognition accuracy:
step S51, defining the pixel in the sample as i, and the predicted clothing label of the pixel is liThe complex characteristic of the pixel is fi. A nearest neighbor sample set obtained by nearest neighbor search is defined as D, a labeled label set in the sample set is defined as τ (D), and t represents a label of the clothing category. Each analysis is defined with a mixing parameter of lambda ≡ [ lambda ] respectively123]Thereby making the final confidence combination.
And step S52, fusing global identification based on logistic regression, approximate identification based on nearest neighbor samples and identification results of migration identification based on mask conversion, and improving identification accuracy. The functional model of global recognition pixel-tag confidence is as follows:
Cglobal(lifi,D)≡P(li=tfit g)·1[t∈τ(D)]p denotes a given complex feature fiAnd a model parameter thetat gRepresents the probability value of the existence of a certain label t in this sample, 1[ ·]Is an indicator function, which indicates that the label t is a member of the label set of nearest neighbor samples, and the model parameter θt gTraining was performed using the fashionistadataset as a positive sample. The functional model for approximate recognition pixel-tag confidence is as follows:p denotes a given complex feature fiAnd model parametersResults of logistic regression, model parametersThe nearest neighbor sample set D is used as a training set for training. The function model of migration-resolved pixel-label confidence based on mask transformation is:
where j denotes the pixel present in the nearest neighbor sample superpixel block, the parameter θt gIs a model parameter in global parsing because M (l)i,siAnd d) represents the average value of the logistic regression results obtained by performing global analysis on the nearest neighbor sample superpixel block region.
Step S53: a single confidence is not enough to guarantee the accuracy of the label-assignment result; therefore a fusion of the three parsing modes is considered, with Λ ≡ [λ_1, λ_2, λ_3] defining the respective weight ratios, and the confidence of each clothing label-pixel pair is computed through the fusion model:
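A sketch of this weighted combination and of the initial MAP assignment is shown below; the particular weight values are placeholders, not values taken from the patent.

```python
import numpy as np

def fused_confidence(c_global, c_nearest, c_transfer, lam=(0.5, 0.3, 0.2)):
    """Combine the three (n_pixels x n_labels) confidence maps with weights
    Lambda = [lambda_1, lambda_2, lambda_3]."""
    l1, l2, l3 = lam
    return l1 * c_global + l2 * c_nearest + l3 * c_transfer

def map_assignment(conf):
    """Initial pixel-label assignment: maximum a posteriori label per pixel."""
    return conf.argmax(axis=1)

# usage sketch with toy confidences for 5 pixels and 4 clothing labels
rng = np.random.default_rng(0)
conf = fused_confidence(rng.random((5, 4)), rng.random((5, 4)), rng.random((5, 4)))
print(map_assignment(conf))
```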
and step S54, combining with the iterative smoothing processing procedure. Define the label assignment of all pixels as L ≡ { L ≡ LiThe (i.e. the case of labeling the clothing label for each pixel in the image) and the clothing item appearance modelWhereinIs a fused appearance model of the labels t of the clothing categories, the final optimization result is to find the optimal pixel-label distribution L*And appearance model set theta*. At the beginning of the iterative process, an initial pixel-label assignment is defined asRepresenting the pixel-label result obtained by MAP allocation in combination with the confidence of the first pass through fusion analysis, the fusion appearance model set of the initial clothing category label is(set of clothing category logistic regression models), useTrained as training data. The process of defining the iteration is represented by a constant k for the number of iterations performed and E for the adjacent pixel pair, then at the kth iteration its pixel-label assignment isFusion appearance modelFor this, the model for optimizing the pixel-label distribution case is as follows:
wherein:and obtaining a final identification result.
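For illustration, the alternation between refitting the appearance models and MAP re-assignment with a neighborhood smoothness term might be sketched as below; the ICM-style update, the Potts-like smoothness weight, and the iteration count are assumptions of this sketch and do not reproduce the patent's exact energy or optimizer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iterative_smoothing(features, fused_conf, neighbors, n_iters=5, smooth_w=0.5):
    """Alternate (a) fitting appearance models on the current assignment and
    (b) MAP re-assignment using confidence, appearance, and neighbor agreement.

    features   : (n_pixels, d) complex features f_i
    fused_conf : (n_pixels, n_labels) confidences from step S53
    neighbors  : list of integer index arrays, one per pixel (the adjacent pairs E)
    """
    labels = fused_conf.argmax(axis=1)                 # initial assignment L^0
    n_labels = fused_conf.shape[1]
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=200).fit(features, labels)   # appearance models
        probs = np.zeros_like(fused_conf)
        probs[:, clf.classes_] = clf.predict_proba(features)
        for i in range(len(labels)):                   # ICM-style sweep over pixels
            votes = np.bincount(labels[neighbors[i]], minlength=n_labels)
            score = (np.log(fused_conf[i] + 1e-8)
                     + np.log(probs[i] + 1e-8)
                     + smooth_w * votes)
            labels[i] = int(score.argmax())
    return labels
```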
The above are preferred embodiments of the present invention, and all changes made according to the technical scheme of the present invention that produce functional effects do not exceed the scope of the technical scheme of the present invention belong to the protection scope of the present invention.

Claims (6)

1. A pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion is characterized by comprising the following steps:
s1, preprocessing an input image in a monitoring scene in an image denoising and image enhancement mode to improve the image quality;
step S2, performing attitude estimation based on a Deep Convolutional Neural Network (DCNN) on the preprocessed input image, defining foreground and background areas in the image, and taking attitude characteristics as one part of fusion characteristics;
s3, extracting fusion features from the foreground region of the image processed in the S2, and performing feature dimensionality reduction through PCA;
step S4, performing a style search on the input image using a common data set, the search including: labeling similar image samples and garment labels of the image samples;
and step S5, inputting the obtained fusion characteristics of different forms into the designed pedestrian clothing attribute identification frame to obtain the final clothing attribute identification result.
2. The pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion as claimed in claim 1, wherein the step S1 is implemented as follows:
step S11, solving the interference caused by motion blur in a monitoring scene through an image denoising method based on blind deconvolution, and obtaining a restored image:
assume an initial restored image f_0 and a degradation function g_0, and assume the blur function is the same over all parts of the image; the input image before blur interference is obtained through the following iterative formula:
where g_i^k denotes the degradation function at the k-th iteration of the i-th round, f_i^k denotes the restored image at the i-th round, c(x) is the degraded image, i.e. the original input image, and ⊗ denotes the convolution operation;
step S12, image enhancement is carried out through a multi-scale Retinex algorithm with color recovery, and the color expression of the image under the monitoring scene is enhanced:
defining the incident light as L(x, y), the reflectance image of the object as R(x, y), and the observed image as S(x, y), there is: S(x, y) = L(x, y) · R(x, y); mapping all three components to the log domain gives the corresponding results
log S(x,y),log L(x,y),log R(x,y)
The above formula can then be converted into:
log S(x,y)=log L(x,y)+log R(x,y)
introducing the color channel weight ω_i in the multi-scale case, there is:
where K denotes the number of center-surround functions F;
defining C_i as the color recovery factor of one of the three channels, used to balance the ratio between the three channels and to highlight relatively dark regions so as to eliminate distortion, the MSRCR model can then be expressed as:
log R_MSRCRi(x, y) = C_i(x, y) · log R_MSRi(x, y)
and the result is finally mapped from the log domain back to the real-number domain to obtain the final enhancement result.
3. The pedestrian clothing attribute recognition method based on depth pose estimation and multi-feature fusion of claim 1, wherein in the step S2, the pose estimation based on the depth convolution neural network DCNN is performed on the preprocessed input image in the following specific manner:
step S21, constructing an image model G = (V, E) to visually represent a human body model, where V represents the joint points of the human body or certain body parts, E is the set of edges connecting nodes, e ∈ V × V represents the spatial relationship between adjacent nodes, and K = |V| denotes the number of joints; defining I to represent an image, i to represent the i-th node in the image, l to represent the pixel coordinate of a node, and t to represent the mixed spatial relationship of the nodes, abstracted and combined by clustering over different pose instances; then, according to the definitions in the image model, i ∈ {1, ..., K}, l_i ∈ {1, ..., L}, t_i ∈ {1, ..., T}, where l_i = (x_i, y_i) denotes the pixel coordinate of node i, and t_i denotes the spatial-relationship type of node i with its adjacent nodes, namely one of T pose types at that node;
step S22, from step S21, the appearance model of a human joint part may be represented as:
with
φ(l_i, t_i | I; θ) = log p(l_i, t_i | I; θ)
where p(l_i, t_i | I; θ) is the probability obtained by mapping, through the forward-propagation Softmax function in the DCNN, the finally computed score that node part i in image I has mixed pose type t_i and pixel coordinate l_i, and θ is a parameter of the model;
the inter-joint spatial relationship may be expressed as:
adding a standard quadratic deformation to the spatial-relationship model, with the definition ⟨d(l_i − l_j)⟩ = [dx dx² dy dy²]ᵀ, where dx = x_i − x_j and dy = y_i − y_j denote the relative pixel location of node i with respect to node j;
finally we obtain
the above formula, which represents the human body pose estimation model and whose tree structure allows efficient accelerated computation;
s23, carrying out clustering operation on the local image blocks obtained by preprocessing according to the spatial relative position of the central joint and the adjacent joints by a K-means clustering method to obtain a pre-training model;
and step S24, training with the DCNN, mapping images to different pose types through the score-function model, adjusting the weights and parameters through a loss function according to the annotation information so that the mapped score results agree with the actual categories, completing the classification of pose types, and obtaining the DCNN multi-classification model.
4. The pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion as claimed in claim 1, wherein the step S3 is implemented as follows:
s31, extracting simple features including color, gradient and texture from the input image, and fusing the simple features into different fusion complex features according to different attribute identification stages;
step S32, obtaining m pieces of n-dimensional data after passing through a feature descriptor, and forming a matrix X with m rows and n columns from the original data according to columns;
step S33, subtracting the average value of each line of X;
step S34, solving a covariance matrix;
step S35, solving the eigenvalue of the covariance matrix and the corresponding eigenvector r;
step S36, arranging the eigenvectors r into a matrix from top to bottom according to the corresponding eigenvalue size, and taking the first n' rows to form a new matrix P;
in step S37, the matrix P is the data after dimension reduction to n'.
5. The pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion as claimed in claim 1, wherein the step S4 is implemented as follows:
step S41, firstly, establishing a KD-tree index tree on a public data set;
s42, selecting fusion characteristics as sample characteristics, comparing the appearance fusion characteristics of the input image and the samples in the data set, and searching a KD-tree by a KNN clustering method in combination with L2-distance;
and step S43, selecting the first 25 results to form a nearest neighbor sample set, and forming candidate labels by the labeled clothing label information of the samples, so as to provide help for the subsequent clothing attribute identification.
6. The pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion as claimed in claim 5, wherein the step S5 is implemented as follows:
step S51, defining a pixel of an image in the sample as i, the predicted clothing label of the pixel as l_i, and the complex feature of the pixel as f_i; defining the nearest-neighbor sample set obtained through nearest-neighbor retrieval as D, the set of annotated labels in the sample set as τ(D), and letting t denote a clothing category label; each parsing mode is assigned a mixing parameter, Λ ≡ [λ_1, λ_2, λ_3], for the final confidence combination;
step S52, fusing the results of global parsing based on logistic regression, approximate parsing based on the nearest-neighbor samples, and transferred parsing based on mask transformation, to improve recognition accuracy:
the functional model of the global-parsing pixel-label confidence is as follows:
P denotes the logistic-regression result given the complex feature f_i and the model parameters, i.e. the probability that label t is present in this sample; 1[·] is an indicator function indicating that label t is a member of the label set of the nearest-neighbor samples; the model parameters are trained using the Fashionista dataset as positive samples;
the functional model for approximate recognition pixel-tag confidence is as follows:
the model parameters are trained using the nearest-neighbor sample set D as the training set;
the function model of migration-resolved pixel-label confidence based on mask transformation is:
where j denotes a pixel in a nearest-neighbor sample superpixel block, the parameter is the model parameter from global parsing, and M(l_i, s_i, d) denotes the mean of the logistic-regression results obtained by globally parsing the nearest-neighbor sample superpixel-block region;
step S53, a single confidence is not enough to guarantee the accuracy of the label-assignment result; therefore a fusion of the three parsing modes is considered, with Λ ≡ [λ_1, λ_2, λ_3] defining the respective weight ratios of the three modes, and the confidence of each clothing label-pixel pair is computed through the fusion model:
step S54, combining with the iterative smoothing procedure: defining the label assignment of all pixels as L ≡ {l_i}, i.e. the clothing-label assignment of each pixel in the image, and the clothing appearance model set as Θ ≡ {θ_t^c}, where θ_t^c is the fused appearance model of clothing-category label t; the final optimization is to find the optimal pixel-label assignment L* and appearance model set Θ*; at the start of the iterative process, the initial pixel-label assignment L^0 denotes the pixel-label result obtained by MAP assignment using the confidences from the first pass of fusion parsing, and the initial set of fused clothing-category appearance models Θ^0, a set of logistic-regression models over the clothing categories, is trained using L^0 as training data; the process of iteration is defined with a constant k for the number of iterations performed and E for the adjacent pixel pairs; at the k-th iteration the pixel-label assignment is L^k and the fused appearance model set is Θ^k, on which basis the model for optimizing the pixel-label assignment is:
whereby the final identification result is obtained.
CN201910321093.0A 2019-04-19 2019-04-19 Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion Active CN110033007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910321093.0A CN110033007B (en) 2019-04-19 2019-04-19 Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910321093.0A CN110033007B (en) 2019-04-19 2019-04-19 Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion

Publications (2)

Publication Number Publication Date
CN110033007A true CN110033007A (en) 2019-07-19
CN110033007B CN110033007B (en) 2022-08-09

Family

ID=67239540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910321093.0A Active CN110033007B (en) 2019-04-19 2019-04-19 Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion

Country Status (1)

Country Link
CN (1) CN110033007B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555393A (en) * 2019-08-16 2019-12-10 北京慧辰资道资讯股份有限公司 method and device for analyzing pedestrian wearing characteristics from video data
CN111126153A (en) * 2019-11-25 2020-05-08 北京锐安科技有限公司 Safety monitoring method, system, server and storage medium based on deep learning
CN111142671A (en) * 2019-12-27 2020-05-12 江西服装学院 Intelligent garment with intelligent interaction system
CN111368637A (en) * 2020-02-10 2020-07-03 南京师范大学 Multi-mask convolution neural network-based object recognition method for transfer robot
CN113191443A (en) * 2021-05-14 2021-07-30 清华大学深圳国际研究生院 Clothing classification and attribute identification method based on feature enhancement
CN113420173A (en) * 2021-06-22 2021-09-21 桂林电子科技大学 Minority dress image retrieval method based on quadruple deep learning
WO2021233051A1 (en) * 2020-05-21 2021-11-25 华为技术有限公司 Interference prompting method and device
CN114117040A (en) * 2021-11-08 2022-03-01 重庆邮电大学 Text data multi-label classification method based on label specific features and relevance
WO2022047662A1 (en) * 2020-09-02 2022-03-10 Intel Corporation Method and system of neural network object recognition for warpable jerseys with multiple attributes
CN116030418A (en) * 2023-02-14 2023-04-28 北京建工集团有限责任公司 Automobile lifting line state monitoring system and method
CN116206369A (en) * 2023-04-26 2023-06-02 北京科技大学 WMSD risk real-time monitoring method and device based on data fusion and machine vision

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090190798A1 (en) * 2008-01-25 2009-07-30 Sungkyunkwan University Foundation For Corporate Collaboration System and method for real-time object recognition and pose estimation using in-situ monitoring
CN104134076A (en) * 2014-07-10 2014-11-05 杭州电子科技大学 SAR image target recognition method based on CS and SVM decision fusion
CN105678321A (en) * 2015-12-31 2016-06-15 北京工业大学 Human body posture estimation method based on fusion model
WO2016110005A1 (en) * 2015-01-07 2016-07-14 深圳市唯特视科技有限公司 Gray level and depth information based multi-layer fusion multi-modal face recognition device and method
CN108537136A (en) * 2018-03-19 2018-09-14 复旦大学 The pedestrian's recognition methods again generated based on posture normalized image
CN109035329A (en) * 2018-08-03 2018-12-18 厦门大学 Camera Attitude estimation optimization method based on depth characteristic

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090190798A1 (en) * 2008-01-25 2009-07-30 Sungkyunkwan University Foundation For Corporate Collaboration System and method for real-time object recognition and pose estimation using in-situ monitoring
CN104134076A (en) * 2014-07-10 2014-11-05 杭州电子科技大学 SAR image target recognition method based on CS and SVM decision fusion
WO2016110005A1 (en) * 2015-01-07 2016-07-14 深圳市唯特视科技有限公司 Gray level and depth information based multi-layer fusion multi-modal face recognition device and method
CN105678321A (en) * 2015-12-31 2016-06-15 北京工业大学 Human body posture estimation method based on fusion model
CN108537136A (en) * 2018-03-19 2018-09-14 复旦大学 The pedestrian's recognition methods again generated based on posture normalized image
CN109035329A (en) * 2018-08-03 2018-12-18 厦门大学 Camera Attitude estimation optimization method based on depth characteristic

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DANGWEI LI ET AL.: "Pose Guided Deep Model for Pedestrian Attribute Recognition in Surveillance Scenarios", 《2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME)》 *
MELTEM DEMIRKUS ET AL.: "Hierarchical Spatio-Temporal Probabilistic Graphical Model with Multiple Feature Fusion for Binary Facial Attribute Classification in Real-World Face Videos", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 》 *
SHI Xiangbin et al.: "Action recognition method based on multi-feature fusion", Journal of Shenyang Aerospace University *
LEI Qing et al.: "New advances in human action recognition research in complex scenes", Computer Science *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555393A (en) * 2019-08-16 2019-12-10 北京慧辰资道资讯股份有限公司 method and device for analyzing pedestrian wearing characteristics from video data
CN111126153A (en) * 2019-11-25 2020-05-08 北京锐安科技有限公司 Safety monitoring method, system, server and storage medium based on deep learning
CN111142671A (en) * 2019-12-27 2020-05-12 江西服装学院 Intelligent garment with intelligent interaction system
CN111368637A (en) * 2020-02-10 2020-07-03 南京师范大学 Multi-mask convolution neural network-based object recognition method for transfer robot
CN111368637B (en) * 2020-02-10 2023-08-11 南京师范大学 Transfer robot target identification method based on multi-mask convolutional neural network
WO2021233051A1 (en) * 2020-05-21 2021-11-25 华为技术有限公司 Interference prompting method and device
WO2022047662A1 (en) * 2020-09-02 2022-03-10 Intel Corporation Method and system of neural network object recognition for warpable jerseys with multiple attributes
CN113191443B (en) * 2021-05-14 2023-06-13 清华大学深圳国际研究生院 Clothing classification and attribute identification method based on feature enhancement
CN113191443A (en) * 2021-05-14 2021-07-30 清华大学深圳国际研究生院 Clothing classification and attribute identification method based on feature enhancement
CN113420173A (en) * 2021-06-22 2021-09-21 桂林电子科技大学 Minority dress image retrieval method based on quadruple deep learning
CN114117040A (en) * 2021-11-08 2022-03-01 重庆邮电大学 Text data multi-label classification method based on label specific features and relevance
CN116030418A (en) * 2023-02-14 2023-04-28 北京建工集团有限责任公司 Automobile lifting line state monitoring system and method
CN116030418B (en) * 2023-02-14 2023-09-12 北京建工集团有限责任公司 Automobile lifting line state monitoring system and method
CN116206369A (en) * 2023-04-26 2023-06-02 北京科技大学 WMSD risk real-time monitoring method and device based on data fusion and machine vision
CN116206369B (en) * 2023-04-26 2023-06-27 北京科技大学 Human body posture data acquisition method and device based on data fusion and machine vision

Also Published As

Publication number Publication date
CN110033007B (en) 2022-08-09

Similar Documents

Publication Publication Date Title
CN110033007B (en) Pedestrian clothing attribute identification method based on depth attitude estimation and multi-feature fusion
Lian et al. Road extraction methods in high-resolution remote sensing images: A comprehensive review
CN109325952B (en) Fashionable garment image segmentation method based on deep learning
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
Zhang et al. Deep hierarchical guidance and regularization learning for end-to-end depth estimation
CN110458077B (en) Vehicle color identification method and system
WO2019136591A1 (en) Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network
CN108288051B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN108564012B (en) Pedestrian analysis method based on human body feature distribution
CN110796026A (en) Pedestrian re-identification method based on global feature stitching
CN106127197B (en) Image saliency target detection method and device based on saliency label sorting
CN110060273B (en) Remote sensing image landslide mapping method based on deep neural network
CN109509191A (en) A kind of saliency object detection method and system
CN108985298B (en) Human body clothing segmentation method based on semantic consistency
CN110298248A (en) A kind of multi-object tracking method and system based on semantic segmentation
CN111062928A (en) Method for identifying lesion in medical CT image
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
CN109840518B (en) Visual tracking method combining classification and domain adaptation
Hu et al. Hypergraph video pedestrian re-identification based on posture structure relationship and action constraints
Jia et al. Saliency detection via a unified generative and discriminative model
CN111091129A (en) Image salient region extraction method based on multi-color characteristic manifold sorting
Kong et al. Detection model based on improved faster-RCNN in apple orchard environment
Li et al. Arbitrary body segmentation in static images
Khan et al. Image segmentation via multi dimensional color transform and consensus based region merging

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant