Summary of the invention:
The main object of the present invention is to propose a human action recognition method based on depth motion maps with fuzzy-boundary slicing, which captures the temporal information of motion and performs classification with the robust probabilistic collaborative representation classifier R-ProCRC (Robust Probabilistic Collaborative Representation based Classifier) [5], thereby improving recognition accuracy.
To achieve the above object, the present invention provides the following technical solution, which comprises a training stage and a test stage.
The training stage of the human action recognition method based on depth motion maps with fuzzy-boundary slicing is as follows:
Step 1: a training set {(X^(k), Y^(k))}_{k∈[1,M]} of depth map samples of human action video sequences is given, wherein X^(k) = {x_i^(k)}_{i∈[1,N_k]} denotes the depth map sequence of the k-th training sample, x_i^(k) is the original depth image of the i-th frame in the k-th sample, and N_k is the total number of frames of the k-th sample; Y^(k) denotes the action class of the k-th training sample; M denotes the number of samples in the training set;
Step 2: each video sequence training sample X^(k) in the training set is divided directly along the time axis into DIV equal-length time slices, each of length L = ⌊N_k/DIV⌋; the time slices after division are denoted {S_j^(k)}_{j∈[1,DIV]};
Step 3: a suitable fuzzy parameter α is selected, and fuzzy slicing is applied to each time slice; the fuzzy time slices are denoted {FS_j^(k)}_{j∈[1,DIV]}; to avoid index overflow, no fuzzy extension toward earlier frames is applied to the first time slice, and no fuzzy extension toward later frames is applied to the last time slice;
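The slicing of Steps 2 and 3 can be sketched as follows. This is a minimal illustration in Python under our own reading of the text (all names are ours, not the patent's): each slice is extended by ⌊αL⌋ frames into its neighbours, except at the two sequence ends, and the last slice absorbs any remainder frames.

```python
def fuzzy_slices(n_frames, div, alpha):
    """Return (start, end) frame-index pairs (end exclusive) for `div`
    equal-length time slices whose boundaries are blurred by fraction
    `alpha`. The first slice gets no extension toward earlier frames
    and the last none toward later frames, avoiding index overflow."""
    length = n_frames // div          # slice length, rounded down
    ext = int(alpha * length)         # number of shared boundary frames
    slices = []
    for j in range(div):
        start = j * length
        # last slice takes the remainder frames of the sequence
        end = (j + 1) * length if j < div - 1 else n_frames
        if j > 0:
            start -= ext              # fuzzy extension toward earlier frames
        if j < div - 1:
            end += ext                # fuzzy extension toward later frames
        slices.append((start, end))
    return slices
```

For example, a 30-frame sequence with DIV = 3 and α = 0.8 gives slices (0, 18), (2, 28), (12, 30): adjacent slices overlap, so boundary information is shared.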
Step 4: for each fuzzy time slice FS_j^(k), the depth motion maps DMM_{j,v}^(k) in three different projection directions are computed, where v ∈ {f, s, t} indexes the three projection views (three directions) of the video sequence, namely the front, side and top views; thus the depth motion map set {DMM_{j,v}^(k)}_{j∈[1,DIV],v∈{f,s,t}} corresponding to every training sample X^(k) is obtained;
Step 5: the depth motion maps DMM_{j,v}^(k) obtained in Step 4 are resized to a common size using bicubic interpolation, and these depth motion maps are normalized to the range 0 to 1;
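The normalization of Step 5 can be sketched as follows; a minimal Python illustration of a linear rescaling into [0, 1] (the resize to the fixed per-view size would be done beforehand, e.g. with a bicubic resize from an image library; the function name is ours):

```python
import numpy as np

def normalize01(dmm):
    """Scale one depth motion map linearly into the range [0, 1]."""
    d = np.asarray(dmm, dtype=float)
    lo, hi = d.min(), d.max()
    if hi == lo:                      # constant map: avoid division by zero
        return np.zeros_like(d)
    return (d - lo) / (hi - lo)
```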
Step 6: the set of normalized depth motion maps corresponding to any training sample X^(k) is vectorized, and the vectorized motion maps are concatenated in sequence, completing the feature construction for training sample X^(k); this feature is denoted H^(k), so the feature set of all samples is {H^(k)}_{k∈[1,M]};
Step 7: the output features {H^(k)}_{k∈[1,M]} of all samples obtained in Step 6 are reduced in dimension by PCA, and the dimension-reduced features of all samples are saved.
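Steps 6 and 7 can be sketched as follows; a minimal Python illustration assuming the normalized DMMs of each sample are already available as equally sized 2-D arrays. The function names and the plain SVD-based PCA are our own sketch, not the patent's exact implementation:

```python
import numpy as np

def build_features(dmm_sets):
    """Step 6: vectorize every normalized DMM of a sample and chain the
    vectors into one feature per sample. `dmm_sets` holds one list of
    2-D arrays per sample (DIV slices x 3 views)."""
    return np.stack([np.concatenate([np.asarray(d).ravel() for d in dmms])
                     for dmms in dmm_sets])

def pca_reduce(H, n_components):
    """Step 7: plain PCA via SVD on the centered feature matrix,
    keeping `n_components` dimensions."""
    mean = H.mean(axis=0)
    _, _, Vt = np.linalg.svd(H - mean, full_matrices=False)
    comps = Vt[:n_components]         # principal directions
    return (H - mean) @ comps.T, mean, comps
```

The same `mean` and `comps` would be reused to project a test feature, matching the requirement that the test-stage PCA be identical to the training stage.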
The test stage of the human action recognition method based on depth motion maps with fuzzy-boundary slicing is as follows:
Step 1: a test sample (TestX, TestY) of the depth map sequence of a human action video is given, wherein TestX = {x_i}_{i∈[1,N_T]} denotes the depth map sequence of the test sample, x_i is the original depth image of the i-th frame in the test sample, and N_T is the total number of frames of the test sample; TestY denotes the action class of the test sample;
Step 2: the test sample TestX is divided directly along the time axis into DIV equal-length time slices (in the same manner as the training stage), each of length L = ⌊N_T/DIV⌋; the time slices after division are denoted {S_j}_{j∈[1,DIV]};
Step 3: using the fuzzy parameter α of the training stage, fuzzy slicing is applied to each time slice; the fuzzy time slices are denoted {FS_j}_{j∈[1,DIV]}; to avoid index overflow, no fuzzy extension toward earlier frames is applied to the first time slice, and no fuzzy extension toward later frames is applied to the last time slice;
Step 4: for each fuzzy time slice FS_j, the depth motion maps DMM_{j,v} in three different projection directions are computed, where v ∈ {f, s, t} indexes the three projection views (three directions) of the video sequence, namely the front, side and top views; thus the depth motion map set {DMM_{j,v}}_{j∈[1,DIV],v∈{f,s,t}} corresponding to the test sample TestX is obtained;
Step 5: the test sample depth motion maps {DMM_{j,v}}_{j∈[1,DIV],v∈{f,s,t}} obtained in Step 4 are resized with bicubic interpolation to the same size as in the training stage, and, following the training-stage normalization method, these depth motion maps are normalized to the range 0 to 1;
Step 6: the set of normalized depth motion maps corresponding to the test sample TestX is vectorized, and the vectorized motion maps are concatenated in sequence, completing the feature construction for TestX; this feature is denoted H_T;
Step 7: the output feature H_T of the test sample TestX obtained in Step 6 is reduced in dimension by PCA, yielding the dimension-reduced feature; the PCA dimension reduction is identical to that of the training stage;
Step 8: the dimension-reduced output feature is then fed into the R-ProCRC [5] classifier, obtaining the classification output PridY;
Step 9: PridY is compared with TestY; if PridY = TestY the recognition is correct, otherwise it is wrong.
Compared with the prior art, the invention has the following advantages:
1. This method performs human action recognition on depth data. Compared with traditional color video data, depth data allows efficient segmentation of the human body while preserving its shape and structure, which helps improve classification accuracy.
2. The traditional depth motion map (DMM) feature extraction method projects the entire video onto a single DMM and thereby loses temporal information. The proposed human action recognition method based on depth motion maps with fuzzy-boundary slicing slices the depth map sequence along the time dimension and thus effectively captures how the temporal features evolve.
3. To address the temporal variability of human actions, the proposed method controls the boundaries between slices with the fuzzy parameter α so that adjacent slices share information, further improving the robustness of the features against temporal differences; used together with the R-ProCRC classifier, recognition accuracy is significantly improved.
Specific embodiment
To better illustrate the object, specific steps and features of the present invention, the invention is described in further detail below with reference to the accompanying drawings, taking the MSR Action3D dataset as an example.
In the human action recognition method based on depth motion maps with fuzzy-boundary slicing proposed by the present invention, the feature extraction flow is shown in Figure 1. A sample is first divided into equal-length slices, and the fuzziness of the slice boundaries is determined by the parameter α; the depth motion map DMM of the video sequence within each slice is then computed; the DMMs of all samples are resized to a common size using bicubic interpolation and normalized; and the subsequence features are concatenated after vectorization, completing the construction of the output feature of a training sample.
The human action recognition method based on depth motion maps with fuzzy-boundary slicing proposed by the present invention comprises a training stage and a test stage.
In the above technical solution, in the equal-length time slicing of the video sequence in Step 2 of the training stage, the number of slices DIV is chosen as the optimum for the specific human action dataset; taking the MSR Action3D dataset as an example, DIV = 3.
In the above technical solution, in the equal-length time slicing of the video sequence in Step 2 of the training stage, each time slice has length L = ⌊N_k/DIV⌋, i.e. the slice length is rounded down; if the last time slice falls short of this length, it is taken at its actual length.
In the above technical solution, in the fuzzy slicing of the time slices in Step 3 of the training stage, the fuzzy parameter α is chosen as the optimum for the specific human action dataset; taking the MSR Action3D dataset as an example, α = 0.8.
In the above technical solution, in the fuzzy slicing of the time slices in Step 3 of the training stage, no fuzzy extension toward earlier frames is applied to the first time slice of each sample, and no fuzzy extension toward later frames is applied to the last time slice of each sample.
In the above technical solution, Steps 2 and 3 of the training stage together complete the fuzzy slicing of the video sequence. As shown in Figure 2, the fuzzy parameter α controls the boundaries between slices so that adjacent slices share information, further improving the robustness of the features against temporal differences in human actions.
In the above technical solution, in Step 4 of the training stage the depth motion map DMM_{j,v}^(k) of each fuzzy time slice FS_j^(k) in the three projection directions is computed by superimposing absolute differences of video frames. Specifically: the video frames within the same time slice are projected in the three view directions to obtain the front, side and top views of each frame; the projections of consecutive frames at the same view are then subtracted, and the absolute values of the differences are superimposed, so that the trajectory of the action is preserved. With L = ⌊N_k/DIV⌋, the formulas are:

DMM_{j,v}^(k) = Σ_{i=(j−1−α)L+1}^{(j+α)L−1} | p_v^{(k),i+1} − p_v^{(k),i} |,   j ∈ (2, …, DIV−1),

DMM_{1,v}^(k) = Σ_{i=1}^{(1+α)L−1} | p_v^{(k),i+1} − p_v^{(k),i} |,

DMM_{DIV,v}^(k) = Σ_{i=(DIV−1−α)L+1}^{N_k−1} | p_v^{(k),i+1} − p_v^{(k),i} |,

where DMM_{1,v}^(k) and DMM_{DIV,v}^(k) correspond respectively to the first and the last time slice of the k-th sample, v ∈ {f, s, t} indexes the three projection views (three directions) of the video sequence, namely the front, side and top views, p_v^{(k),i} denotes the v-direction projection of the i-th depth frame in the k-th sample, j ∈ (2, …, DIV−1), and α ∈ (0, 1). As shown in Figure 3, the DMM of each projection direction effectively preserves the trajectory information of a single fuzzy slice of the human action sequence in the three projection directions.
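The absolute-difference superposition above can be sketched as follows; a minimal Python illustration assuming each frame has already been projected to one of the three views. The binary projection helper is our own simplified construction for illustration, not the patent's exact projection:

```python
import numpy as np

def depth_motion_map(frames):
    """Superimpose absolute differences of consecutive projected frames:
    DMM = sum_i |p^(i+1) - p^(i)|."""
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)

def three_views(depth, depth_bins):
    """Simplified binary projections of one depth frame onto the front
    (f), side (s) and top (t) planes; `depth_bins` quantizes depth."""
    h, w = depth.shape
    front = (depth > 0).astype(float)
    side = np.zeros((h, depth_bins))
    top = np.zeros((depth_bins, w))
    ys, xs = np.nonzero(depth)
    ds = np.clip(depth[ys, xs].astype(int), 0, depth_bins - 1)
    side[ys, ds] = 1.0                # project onto the (row, depth) plane
    top[ds, xs] = 1.0                 # project onto the (depth, column) plane
    return {"f": front, "s": side, "t": top}
```

A fuzzy-slice DMM would then be `depth_motion_map([three_views(x, B)[v] for x in slice_frames])` for each view v.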
In the above technical solution, in Step 5 of the training stage the depth motion maps DMM_{j,v}^(k) are resized to a common size using bicubic interpolation; the front, side and top view sizes used in this patent are defined as 50 × 25, 50 × 40 and 40 × 20 respectively. In an actual implementation, different interpolation methods can be chosen to scale the depth motion maps; the criterion is to minimize the loss of image information.
In the above technical solution, in Step 7 of the training stage the output features {H^(k)}_{k∈[1,M]} of all samples computed in Step 6 are reduced in dimension by PCA. The feature dimensionality after reduction may depend on the number of training samples; in the embodiment of this patent, if the number of training samples is M, the final feature dimensionality is (M − 20) × 1.
In the above technical solution, Steps 2 to 6 of the test stage use the same feature construction method and parameters as the training stage.
In the above technical solution, in Step 7 of the test stage the output feature H_T of the test sample TestX computed in Step 6 is reduced in dimension by PCA; the dimension after reduction is (M − 20) × 1, where M is the number of training samples.
In the above technical solution, in Step 8 of the test stage the dimension-reduced feature obtained in Step 7 is classified with the R-ProCRC classifier [5]. The specific method is:
(1) The optimal coding vector α̂ is computed as

α̂ = argmin_α { ‖W^{1/2}(H·α − H_T)‖₂² + λ‖α‖₂² + (γ/‖C‖) Σ_{c∈C} ‖H·α − H_c·α_c‖₂² },

where H is the matrix of dimension-reduced training features, H_T is the test sample feature, H_c is the set of all input feature vectors belonging to class c (c ∈ C), α_c is the part of the coding vector α associated with class c, ‖C‖ is the total number of classes, and λ and γ are parameters between 0 and 1. The construction of H_c is as follows: H_c is first initialized as a zero matrix of the same size as the dictionary H, and the training features of class c are then assigned into H_c at their relative positions in H. W is a diagonal weight matrix whose i-th diagonal element is determined by the residual between all elements of the i-th row of H (weighted by α) and the i-th value of the test sample feature vector, so that unreliable feature dimensions are down-weighted;
(2) The probability that the test sample output feature H_T belongs to class c is estimated as

P(c | H_T) ∝ exp{ −( ‖W^{1/2}(H·α̂ − H_T)‖₂² + λ‖α̂‖₂² + (γ/‖C‖)·‖H·α̂ − H_c·α̂_c‖₂² ) }.

Since ‖W^{1/2}(H·α̂ − H_T)‖₂² + λ‖α̂‖₂² is identical for all classes, the above can be simplified to

PridY = argmin_{c∈C} ‖H·α̂ − H_c·α̂_c‖₂²,

which yields the class to which the feature H_T belongs.
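The decision rule of Step 8 can be sketched as follows; a minimal Python illustration of the plain (non-robust) ProCRC closed-form solution, with the weight matrix W taken as the identity for simplicity (the robust variant of [5] additionally reweights the residuals). All names are ours:

```python
import numpy as np

def procrc_classify(X, labels, y, lam=1e-2, gamma=1e-2):
    """X: d x n matrix of training features (columns = samples);
    labels: length-n class labels; y: length-d test feature.
    Solves a = argmin ||X a - y||^2 + lam ||a||^2
                     + (gamma/K) sum_c ||X a - X_c a_c||^2
    in closed form, then assigns argmin_c ||X a - X_c a_c||^2."""
    classes = sorted(set(labels))
    K, n = len(classes), X.shape[1]
    A = X.T @ X + lam * np.eye(n)
    masks = {}
    for c in classes:
        m = np.array([lbl == c for lbl in labels], dtype=float)
        D = X * (1.0 - m)             # columns of class c zeroed out
        A += (gamma / K) * (D.T @ D)  # accumulate the class-wise term
        masks[c] = m
    a = np.linalg.solve(A, X.T @ y)   # closed-form collaborative code
    recon = X @ a
    resid = {c: float(np.linalg.norm(recon - X @ (a * masks[c])))
             for c in classes}
    return min(resid, key=resid.get)  # class with the smallest residual
```

Here `X * masks[c]` plays the role of H_c above: the dictionary with all columns outside class c set to zero.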
To verify the effectiveness of the invention, experiments were carried out successively on the well-known human action depth databases MSR Action3D and Action Pair. Table 1 gives the characteristics of the two human action depth databases.
Table 1: characteristics of the depth databases
As shown in Table 2, in the experiments the MSR Action3D database is divided into three fixed subsets. Each subset is evaluated with three experimental protocols: Test One uses the first demonstration of each subject as the training set and the rest as the test set; Test Two uses the first two demonstrations of each subject as the training set and the rest as the test set; Cross Test uses all video sequences of subjects 1, 3, 5, 7 and 9 as the training set and the rest as the test set. The experimental results are shown in Table 3; the action recognition accuracy of the invention is in most cases better than that of the traditional DMM method.
Table 2: MSR Action3D database subsets
Table 3: comparison on the MSR Action3D database subsets
Table 4 shows the recognition rate of the invention on the Action Pair database and its comparison with DMM. Since the Action Pair database contains many pairs of actions whose sequences are opposite to each other, such as "pick up" and "put down", or "stand up" and "sit down", it is very sensitive to temporal information. Traditional DMM achieves only a 50.6% recognition rate, while the recognition rate of the invention reaches 97.2%.
Table 4: recognition rates of different algorithms on the Action Pair database
Since the invention performs human action recognition on depth data, compared with traditional color video data the depth data enables fast and accurate segmentation of the human body while preserving its shape and structure, which is conducive to improving accuracy; at the same time, the depth motion map (DMM) processing is more robust than skeleton points estimated by skeleton tracking techniques. The traditional DMM feature extraction method projects the entire video onto a single DMM and loses temporal information; the proposed method slices the existing DMM computation over time and controls the boundaries between slices with the fuzzy parameter α, so that adjacent slices share information and the DMMs capture temporal information better; used together with the R-ProCRC [5] classifier, excellent recognition accuracy is obtained.
The specific embodiments of the invention have been described above in detail with reference to the accompanying drawings, but the invention is not limited to the above embodiments; various changes can also be made within the knowledge of a person skilled in the art without departing from the concept of the invention.
Bibliography
[1] Bian W, Tao D, Rui Y. Cross-domain human action recognition [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012, 42(2): 298-307.
[2] Niebles J C, Wang H, Li F F. Unsupervised learning of human action categories using spatial-temporal words [J]. International Journal of Computer Vision, 2008, 79(3): 299-318.
[3] Wang J, Liu Z, Wu Y, et al. Mining actionlet ensemble for action recognition with depth cameras [C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2012: 1290-1297.
[4] Chen C, Liu K, Kehtarnavaz N. Real-time human action recognition based on depth motion maps [J]. Journal of Real-Time Image Processing, 2013: 1-9.
[5] Cai S, Zhang L, et al. A probabilistic collaborative representation based approach for pattern classification [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[6] Xu H, Chen E, Liang C, et al. Spatio-temporal pyramid model based on depth maps for action recognition [C]// IEEE International Workshop on Multimedia Signal Processing (MMSP), 2015.