CN113065506B - Human body posture recognition method and system - Google Patents
- Publication number
- CN113065506B (CN202110411237.9A)
- Authority
- CN
- China
- Prior art keywords
- human body
- dimensional
- model
- pixel block
- body part
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Abstract
The invention discloses a human body posture recognition method, which comprises the following steps: acquiring a plurality of human body images of a current frame from different viewing angles; establishing a three-dimensional human body model from the plurality of human body images by using a convolutional neural network algorithm; clustering adjacent pixels on each human body image to obtain a plurality of pixel blocks; establishing a two-dimensional pixel block model of each pixel block; optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model; and determining the human body posture of the current frame according to the optimized three-dimensional human body model. The invention builds a three-dimensional human body model simply and quickly with a convolutional neural network algorithm, then optimizes that model using the human body posture in the images, and performs posture recognition with the optimized model, thereby improving the speed of human body posture estimation while ensuring its accuracy.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and a system for recognizing human body postures.
Background
3D human body pose estimation refers to estimating the pose of a human target from an image, video or point cloud, and is a fundamental task in 3D research on the human body. It is an important precondition for 3D human body reconstruction and can also serve as an important source of motion for human body motion driving. Currently, there are two main ways of obtaining the human body posture. 1. A neural network is trained on a specific data set to estimate the human body posture in the corresponding scene. This method requires a large amount of manually labeled data to train the neural network, and its accuracy is low. 2. A human body model is established and fitted to the human body in the picture. This method relies on creating the human body model, which is relatively complex and slow. Therefore, how to improve the speed of human body posture estimation while ensuring its accuracy is a technical problem to be solved urgently.
Disclosure of Invention
The invention aims to provide a human body posture recognition method and system, so as to improve the speed of human body posture estimation while ensuring the accuracy of human body posture estimation.
In order to achieve the above object, the present invention provides the following solutions:
the invention provides a human body posture recognition method, which comprises the following steps:
acquiring a plurality of human body images under different visual angles of a current frame;
according to a plurality of human body images, a three-dimensional human body model is established by adopting a convolutional neural network algorithm;
clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
establishing a two-dimensional pixel block model of each pixel block;
optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;
and determining the human body posture of the current frame according to the optimized three-dimensional human body model.
Optionally, before the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model, the method further includes:
predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames before the current frame;
and pre-adjusting the three-dimensional human body model according to the predicted position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
Optionally, the building a three-dimensional human body model according to the plurality of human body images by adopting a convolutional neural network algorithm specifically includes:
obtaining K key points from each human body image by adopting an Hourglass network structure;
for each k = 1, 2, …, K, calculating the back projection ray of the kth key point in the human body image under each viewing angle based on the parameters of the camera, to obtain a plurality of back projection rays corresponding to each key point;
determining, for each key point, the coordinates of the common point with the shortest total distance to the plurality of back projection rays corresponding to that key point, and taking them as the coordinates of the key point in three-dimensional space, to obtain the coordinates of each key point in three-dimensional space;
and carrying out a linear interpolation operation according to the coordinates of each key point in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
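The linear interpolation step can be illustrated with a minimal sketch. It assumes that body-part centers are placed at evenly spaced positions along the segment between two connected key points; the `bones` list and the `parts_per_bone` count are hypothetical, since the text does not say how the parts are distributed over the skeleton.

```python
import numpy as np

def interpolate_parts(keypoints, bones, parts_per_bone):
    """Place body-part centers by linear interpolation along each bone.

    keypoints: (K, 3) array of 3D key point coordinates.
    bones: list of (start_index, end_index) pairs (hypothetical skeleton).
    parts_per_bone: number of part centers placed on each bone (assumption).
    """
    keypoints = np.asarray(keypoints, dtype=float)
    # Interior fractions in (0, 1), e.g. 0.25, 0.5, 0.75 for three parts.
    ts = np.arange(1, parts_per_bone + 1) / (parts_per_bone + 1)
    centers = []
    for a, b in bones:
        for t in ts:
            centers.append((1.0 - t) * keypoints[a] + t * keypoints[b])
    return np.array(centers)
```

For example, a single bone from (0, 0, 0) to (4, 0, 0) with three parts yields centers at x = 1, 2 and 3.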
Optionally, the three-dimensional human body model is:
wherein $A(x)$ is the three-dimensional human body model, $A_j(x)$ is the three-dimensional model of the jth human body part, $\mu_j$ represents the coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $x$ represents the position of any point on the human body.
Optionally, the two-dimensional pixel block model is:
wherein $B_i(x)$ is the two-dimensional pixel block model of the ith pixel block, $c_i$ represents the color of the center of the ith pixel block, $\mu_i$ represents the coordinates of the center of the ith pixel block, $\delta_i$ represents the half length of the ith pixel block, and $x_i$ represents the projection of the position $x$ of any point on the human body onto the ith pixel block.
Optionally, the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model specifically includes:
calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;
judging whether the objective function value is larger than a preset threshold value or not to obtain a judging result;
if the judging result is negative, optimizing the model of the jth human body part in the three-dimensional human body model by adopting a gradient descent method, and returning to the step of calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;
if the judging result is affirmative, increasing the value of j by 1 and returning to the step of calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value, so as to optimize the model of the next human body part, until the model of every human body part in the three-dimensional human body model has been optimized.
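The per-part loop above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `objective` and `gradient` are hypothetical callbacks standing in for the similarity sum and its derivative, the fixed `step` is a placeholder, and `max_iters` is a safety cap that the text does not mention. Because a larger similarity is better, the update moves along the gradient.

```python
import numpy as np

def optimize_parts(parts, objective, gradient, threshold, step=0.1, max_iters=1000):
    """Optimize each body-part model in turn until its objective exceeds threshold.

    parts: list of per-part parameter vectors.
    objective(j, p): sum of similarities for part j with parameters p (hypothetical).
    gradient(j, p): gradient of that objective with respect to p (hypothetical).
    """
    optimized = []
    for j, p in enumerate(parts):
        p = np.asarray(p, dtype=float).copy()
        for _ in range(max_iters):  # safety cap, an added assumption
            if objective(j, p) > threshold:
                break  # part j is good enough; move on to part j + 1
            p = p + step * gradient(j, p)  # move so that the similarity grows
        optimized.append(p)
    return optimized
```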
Optionally, the calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model, as the objective function value, specifically includes:
calculating the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by adopting the following formula:
wherein $E_{ij}$ represents the similarity between the model of the jth human body part of the three-dimensional human body model and the ith pixel block model; $d(c_i, c_j)$ represents the closeness of the color $c_j$ of the model of the jth human body part to the color $c_i$ of the ith pixel block; $B_i(x)$ represents the two-dimensional pixel block model of the ith pixel block; $A_j(x)$ represents the model of the jth human body part of the three-dimensional human body model; $x$ represents the position of any point on the human body; $\mu_i$ represents the coordinates of the center of the ith pixel block; $\delta_i$ represents the half length of the ith pixel block; and the two projected quantities are the projection coordinates of the model of the jth human body part on the ith pixel block and the projection length of the radius of the jth human body part on the ith pixel block;
$w$ is the penalty value of the color, $\varepsilon$ is the color threshold, $\mu_{jx}$, $\mu_{jy}$ and $\mu_{jz}$ are the x-axis, y-axis and z-axis coordinates of the jth human body part, respectively, $\sigma_j$ represents the radius of the jth human body part, and $f_{il}$ represents the parameters of the camera that obtained the ith pixel block;
and calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by using a summation formula.
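The formulas themselves are not reproduced in this text, so the following is only a sketch of one plausible form and every expression in it is an assumption: isotropic Gaussians $\exp(-\lVert x-\mu\rVert^2 / (2\sigma^2))$ for both models, a pinhole projection with a single hypothetical `focal` parameter standing in for $f_{il}$, and a color closeness that rewards colors within $\varepsilon$ and applies the penalty $-w$ otherwise. The overlap of two such 2D Gaussians has the closed form used in `gaussian_overlap`.

```python
import numpy as np

def project_part(mu, sigma, focal):
    """Pinhole projection of a part center and radius (assumed camera-frame coords)."""
    mu = np.asarray(mu, dtype=float)
    return focal * mu[:2] / mu[2], focal * sigma / mu[2]

def color_closeness(c_i, c_j, eps=0.1, w=0.05):
    """Hypothetical d(c_i, c_j): positive when colors are within eps, else -w."""
    dist = np.linalg.norm(np.asarray(c_i, dtype=float) - np.asarray(c_j, dtype=float))
    return 1.0 - dist / eps if dist < eps else -w

def gaussian_overlap(mu_a, s_a, mu_b, s_b):
    """Integral over the plane of the product of two unnormalized 2D Gaussians."""
    var = s_a ** 2 + s_b ** 2
    diff = np.asarray(mu_a, dtype=float) - np.asarray(mu_b, dtype=float)
    return 2.0 * np.pi * s_a ** 2 * s_b ** 2 / var * np.exp(-np.sum(diff ** 2) / (2.0 * var))

def part_objective(part, blocks, focal):
    """Sum over pixel blocks i of E_ij for one body part j."""
    center, radius = project_part(part["mu"], part["sigma"], focal)
    return sum(
        color_closeness(b["color"], part["color"])
        * gaussian_overlap(b["mu"], b["delta"], center, radius)
        for b in blocks
    )
```

The dictionary keys (`mu`, `sigma`, `color`, `delta`) are illustrative names only.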
A human body posture recognition system, the recognition system comprising:
the image acquisition module is used for acquiring a plurality of human body images under different visual angles of the current frame;
the three-dimensional human body model building module is used for building a three-dimensional human body model by adopting a convolutional neural network algorithm according to a plurality of human body images;
the pixel clustering module is used for clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
the two-dimensional pixel block model building module is used for building a two-dimensional pixel block model of each pixel block;
the three-dimensional human body model optimizing module is used for optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;
and the human body posture recognition module is used for determining the human body posture of the current frame according to the optimized three-dimensional human body model.
Optionally, the recognition system further comprises:
the position coordinate prediction module is used for predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames before the current frame;
the three-dimensional human body model pre-adjustment module is used for pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
Optionally, the three-dimensional human body model building module specifically includes:
the key point acquisition sub-module is used for acquiring K key points from each human body image by adopting an Hourglass network structure;
the back projection operation sub-module is used for calculating, for each k = 1, 2, …, K, the back projection ray of the kth key point in the human body image under each viewing angle based on the parameters of the camera, to obtain a plurality of back projection rays corresponding to each key point;
the key point coordinate determining sub-module is used for determining, for each key point, the coordinates of the common point with the shortest total distance to the plurality of back projection rays corresponding to that key point, and taking them as the coordinates of the key point in three-dimensional space, to obtain the coordinates of each key point in three-dimensional space;
and the three-dimensional human body model building sub-module is used for carrying out a linear interpolation operation according to the coordinates of each key point in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention discloses a human body posture recognition method, which comprises the following steps: acquiring a plurality of human body images of a current frame from different viewing angles; establishing a three-dimensional human body model from the plurality of human body images by using a convolutional neural network algorithm; clustering adjacent pixels on each human body image to obtain a plurality of pixel blocks; establishing a two-dimensional pixel block model of each pixel block; optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model; and determining the human body posture of the current frame according to the optimized three-dimensional human body model. The invention builds a three-dimensional human body model simply and quickly with a convolutional neural network algorithm, then optimizes that model using the human body posture in the images, and performs posture recognition with the optimized model, thereby improving the speed of human body posture estimation while ensuring its accuracy.
The invention also predicts the result of the current frame from the results of the preceding frames, exploiting the continuity of human body motion; this reduces the number of iterations in the optimization process and further improves the speed of human body posture estimation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the human body posture recognition method provided by the invention;
FIG. 2 is a schematic diagram of the human body posture recognition method provided by the invention;
FIG. 3 is a diagram of the arrangement of cameras for acquiring multiple images of a human body at different viewing angles provided by the invention;
FIG. 4 is a three-dimensional human body model provided by the invention;
FIG. 5 is a diagram of a human body model composed of a plurality of two-dimensional pixel block models provided by the invention;
FIG. 6 is a view of an optimized three-dimensional human body model provided by the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a human body posture recognition method and system, so as to improve the speed of human body posture estimation while ensuring the accuracy of human body posture estimation.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
The invention relates to a human body posture estimation method that requires neither manually labeled data nor sensor data. Color images of a target person are collected from different angles in a fixed scene by a plurality of color cameras; coarse human body key point coordinates are obtained using a pre-trained convolutional neural network, and three-dimensional human body coordinates are obtained from them by back projection. A general three-dimensional human body model is generated from the three-dimensional human body coordinates, the similarity between the three-dimensional human body model and the human body in the images is calculated, and the human body model is optimized with the consistency of the multiple viewing angles in the three-dimensional world as a constraint, yielding the final human body posture. The calculation is performed with CUDA and the parameters of the human body model are constrained, so the calculation can be completed in constant time; 25 frames of images are processed per second, and video can be processed in real time. The invention needs only color images as input, requires no manual operation or extra sensor equipment, and can be widely applied in the field of human body posture acquisition.
As shown in fig. 1 and 2, the present invention provides a human body posture recognition method, which includes the steps of:
step 101, acquiring a plurality of human body images under different visual angles of a current frame.
A plurality of color cameras, located at different positions within the room, are used to obtain a sequence of successive, synchronized color images of the human body from multiple viewing angles. All color cameras are controlled by a sync box.
The color cameras are distributed in a circle to obtain human body information from different angles; a typical eight-camera array is shown in FIG. 3. The sync box sends out a square wave signal, and all cameras capture an image when they receive it, so that the human body information is collected at the same moment. The colors of the subject's clothing should be as rich as possible, so that the human body can be distinguished from background objects and different parts of the human body can be identified more easily. The background should be as simple as possible, which makes the resulting body posture more robust.
Step 102, building a three-dimensional human body model by adopting a convolutional neural network algorithm according to the plurality of human body images.
From the joint coordinates output by the neural network for the first frame, three-dimensional human body coordinates are obtained by back projection, and a coarse pose matching the human body in the image is generated. The body model should describe the build, height, characteristic colors and similar information of the human body as fully as possible. In a body model made up of a set of three-dimensional Gaussian functions, each Gaussian function is described by its mean and variance. To avoid the overfitting caused by a high number of degrees of freedom, L is used to describe the length of the human body trunk and R its width, and the mean and variance of each Gaussian function are then calculated from them; this greatly reduces the degrees of freedom and gives better generalization. The invention adjusts this model to generate an actor-specific body model that generally represents the shape and color of each Gaussian.
This step uses the Hourglass neural network to extract 16 human key points $(x_i, y_i)$ from each of the 8 photographs input at different viewing angles. For the 8 views of the same key point, 8 back projection rays are calculated based on the camera parameters. Using the least squares method, the common point $(x_i, y_i, z_i)$ closest to the 8 rays is found; this common point gives the coordinates of the key point in three-dimensional space. From the three-dimensional coordinates $(x_i, y_i, z_i)$, linear interpolation yields a three-dimensional human body model having the three-dimensional coordinates of 63 human body parts:
wherein $A(x)$ is the three-dimensional human body model, $A_j(x)$ is the three-dimensional model of the jth human body part, $\mu_j$ represents the coordinates of the jth human body part, $\sigma_j$ represents the radius of the jth human body part, and $x$ represents the position of any point on the human body.
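The model expression is not reproduced in this text. As a minimal sketch, assuming that $A(x)$ is the sum of isotropic Gaussians $A_j(x) = \exp(-\lVert x - \mu_j \rVert^2 / (2\sigma_j^2))$ (an assumption consistent with the description of a set of three-dimensional Gaussian functions, not a confirmed formula), the model could be evaluated as follows.

```python
import numpy as np

def body_model(x, centers, radii):
    """Evaluate an assumed sum-of-Gaussians body model A(x) at a 3D point x.

    centers: (J, 3) array of part coordinates mu_j.
    radii: (J,) array of part radii sigma_j.
    """
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Squared distance from x to every part center.
    d2 = np.sum((centers - x) ** 2, axis=1)
    return float(np.sum(np.exp(-d2 / (2.0 * radii ** 2))))
```

With this form, each part contributes 1 at its own center and decays smoothly with distance, at a rate set by its radius.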
The method comprises the following specific steps:
K key points are obtained from each human body image by adopting an Hourglass network structure.
For each k = 1, 2, …, K, the back projection ray of the kth key point in the human body image under each viewing angle is calculated based on the parameters of the camera, giving a plurality of back projection rays corresponding to each key point.
For each key point, the coordinates of the common point with the shortest total distance to the plurality of back projection rays corresponding to that key point are determined and taken as the coordinates of the key point in three-dimensional space, giving the coordinates of each key point in three-dimensional space.
A linear interpolation operation is performed according to the coordinates of each key point in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body, as shown in FIG. 4.
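The back projection and common-point steps can be sketched as a standard least squares ray intersection. The sketch assumes a pinhole camera with intrinsic matrix `K` and world-to-camera extrinsics `R`, `t` (so that a world point X maps to `R @ X + t`); these names and conventions are assumptions, since the text only refers to "parameters of the camera". It minimizes the sum of squared distances from the point to all rays.

```python
import numpy as np

def backproject_ray(K, R, t, pixel):
    """Return (origin, unit direction) of the world-space ray through a pixel."""
    K, R, t = (np.asarray(a, dtype=float) for a in (K, R, t))
    origin = -R.T @ t  # camera center in world coordinates
    direction = R.T @ np.linalg.solve(K, np.array([pixel[0], pixel[1], 1.0]))
    return origin, direction / np.linalg.norm(direction)

def closest_point_to_rays(origins, directions):
    """Point minimizing the sum of squared distances to all rays."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        # Projector onto the plane orthogonal to the ray direction.
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ o
    return np.linalg.solve(A, b)
```

The linear system is solvable as long as the rays are not all parallel, which holds for cameras arranged around the subject.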
Step 103, clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks.
For the picture at each viewing angle, the human body region of interest is first obtained by projecting the optimized three-dimensional human body model of the previous frame. Extracting only the human body region removes the irrelevant background. Neighboring similar pixels are then clustered to generate color blocks (pixel blocks); the result is shown in FIG. 5. Each color block is approximated using a two-dimensional Gaussian function. During clustering, the invention uses a threshold to determine which pixels are clustered together. Representing each picture by a picture model made up of a set of two-dimensional Gaussian functions saves a large amount of computation compared with similarity matching performed directly on the pixels of the picture, and greatly improves the overall speed.
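The text does not name the clustering algorithm. One simple possibility, shown here purely as an illustrative assumption, is region growing over 4-connected neighbors: a pixel joins a block when its color lies within `threshold` of the block's seed color.

```python
from collections import deque

import numpy as np

def cluster_pixels(image, threshold):
    """Group adjacent, similarly colored pixels into blocks by region growing.

    image: (H, W, 3) array of colors.
    Returns (labels, n_blocks), where labels[y, x] is the block index of a pixel.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape[:2]
    labels = -np.ones((h, w), dtype=int)
    n_blocks = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue  # already assigned to a block
            seed = image[sy, sx]
            labels[sy, sx] = n_blocks
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1:
                        if np.linalg.norm(image[ny, nx] - seed) < threshold:
                            labels[ny, nx] = n_blocks
                            queue.append((ny, nx))
            n_blocks += 1
    return labels, n_blocks
```

Each resulting block can then be summarized by its center, mean color and extent before being approximated by a two-dimensional Gaussian.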
Step 104, establishing a two-dimensional pixel block model of each pixel block.
The two-dimensional pixel block model for each pixel block is:
wherein $B_i(x)$ is the two-dimensional pixel block model of the ith pixel block, $c_i$ represents the color of the center of the ith pixel block, $\mu_i$ represents the coordinates of the center of the ith pixel block, $\delta_i$ represents the half length of the ith pixel block, and $x_i$ represents the projection of the position $x$ of any point on the human body onto the ith pixel block.
The entire image is divided into a plurality of pixel blocks, and thus the entire image can be expressed as:
$\mathrm{Im}(x) = \sum_i c_i \cdot B_i(x)$
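As a minimal sketch of evaluating this image model, assume each block is an isotropic two-dimensional Gaussian $B_i(x) = \exp(-\lVert x - \mu_i \rVert^2 / (2\delta_i^2))$; that form is an assumption, since the expression for $B_i(x)$ is not reproduced in the text.

```python
import numpy as np

def image_model(x, colors, centers, half_lengths):
    """Evaluate Im(x) = sum_i c_i * B_i(x) at a 2D image position x.

    colors: (N, 3) block colors c_i.
    centers: (N, 2) block centers mu_i.
    half_lengths: (N,) block half lengths delta_i.
    """
    x = np.asarray(x, dtype=float)
    colors = np.asarray(colors, dtype=float)
    centers = np.asarray(centers, dtype=float)
    half_lengths = np.asarray(half_lengths, dtype=float)
    d2 = np.sum((centers - x) ** 2, axis=1)
    weights = np.exp(-d2 / (2.0 * half_lengths ** 2))  # assumed form of B_i(x)
    return weights @ colors
```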
Step 105, optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model.
The human body model is projected to each viewing angle:
wherein $\mu_{jx}$, $\mu_{jy}$ and $\mu_{jz}$ are the x-axis, y-axis and z-axis coordinates of the jth human body part, respectively, $\sigma_j$ represents the radius of the jth human body part, and $f_{il}$ represents the parameters of the camera that obtained the ith pixel block. With this formula, the three-dimensional human body model, a set of three-dimensional Gaussian functions, is projected to two dimensions, so that its similarity to the two-dimensional pixel block models, a set of two-dimensional Gaussian functions, can be calculated. The formula computes only an approximation of the true projection, which is an ellipse rather than a circle, but the error introduced by the approximation is negligible. The similarity of the three-dimensional human body model and the image model can be expressed as:
wherein $d(c_i, c_j)$ is the closeness of two colors:
$w$ is a penalty for colors that differ too much, and $\varepsilon$ is a threshold that determines whether two colors are close. Generally, $\varepsilon = 0.1$ and $w = 0.05$ give good results. The RGB color space is closely tied to illumination intensity, and better results can be obtained using other color spaces, such as the Lab color space. Differentiating the similarity $E_{ij}$ and moving along the gradient direction increases $E_{ij}$:
$E_{ij}^{(k+1)} = E_{ij}^{(k)} + \rho_k s^{(k)}$
wherein $s^{(k)}$ represents the gradient direction and $\rho_k$ represents the search step in the gradient direction. After a certain number of iterations, once $E_{ij}$ approaches a constant, the parameters of the human body model at that moment are recorded as the current human body posture. The step size is dynamic. An initial step size is determined from statistics over a number of videos. During optimization, when the signs of the derivative in two successive iterations are the same, the current pose has not yet reached the optimum, and the step size is enlarged by a factor of 1.1. When the signs differ, the current pose has skipped over the optimum, and the step size is reduced by a factor of 0.5 (it need not be changed to -0.5, because the sign of the derivative itself has changed and the direction of optimization has therefore already reversed). Because human body motion is continuous, the result of the next frame can be predicted from the results of the previous frames.
pose_(i+1) = t_1 * pose_i + t_2 * pose_(i-1) + t_3 * pose_(i-2)
where pose is the posture result of a given frame and t is the weight given to each frame's result in the prediction. The posture of the current frame is predicted from the historical postures and used as the initial posture of the current frame, instead of directly using the result of the previous frame. This initial posture is closer to the picture model of the current frame (its set of two-dimensional Gaussian functions), which reduces the number of optimization iterations and speeds up the whole experiment.
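The two ideas above, the dynamic step size and the prediction of the initial posture from earlier frames, can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the function names `predict_pose` and `adaptive_ascent`, the default weights (2, -1, 0), the initial step of 0.1, the per-parameter handling of the step size and the toy quadratic objective are all assumptions made for illustration. Only the factors 1.1 and 0.5 and the three-frame weighted sum are taken from the description.

```python
import numpy as np

def predict_pose(history, weights=(2.0, -1.0, 0.0)):
    """Weighted sum of the last three poses (most recent last).

    The default weights are a hypothetical constant-velocity guess,
    not values given in the description.
    """
    t1, t2, t3 = weights
    return t1 * history[-1] + t2 * history[-2] + t3 * history[-3]

def adaptive_ascent(grad, x0, step0=0.1, grow=1.1, shrink=0.5,
                    iters=200, tol=1e-12):
    """Gradient ascent with a per-parameter dynamic step size.

    The step is multiplied by `grow` while the derivative keeps its sign
    and by `shrink` when the sign flips (the optimum was skipped).
    """
    x = np.asarray(x0, dtype=float).copy()
    step = np.full_like(x, step0)
    prev_g = grad(x)
    for _ in range(iters):
        if np.all(np.abs(prev_g) < tol):
            break
        x = x + step * prev_g
        g = grad(x)
        same_sign = g * prev_g > 0
        step = np.where(same_sign, step * grow, step * shrink)
        prev_g = g
    return x

# Toy example: similarity E(x) = -||x - target||^2, maximal at `target`.
target = np.array([3.2, -1.4])
history = [np.array([0.0, 0.0]), np.array([1.0, -0.5]), np.array([2.0, -1.0])]
initial_pose = predict_pose(history)   # warm start from earlier frames
refined_pose = adaptive_ascent(lambda x: -2.0 * (x - target), initial_pose)
```

In this toy setting the predicted pose already lies close to the optimum, which is the reason the description gives for using it as the starting point instead of the previous frame's result.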
Step 105: optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model, which specifically includes: calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value; judging whether the objective function value is larger than a preset threshold to obtain a judgment result; if the judgment result is negative, optimizing the model of the jth human body part in the three-dimensional human body model by a gradient descent method, and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value; if the judgment result is positive, increasing the value of j by 1 and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value, so as to optimize the model of the next human body part, until the model of every human body part in the three-dimensional human body model has been optimized.
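The control flow of step 105 can be sketched as follows. This is only an illustration under stated assumptions: `optimize_parts`, `numerical_gradient`, the fixed step of 0.1, the iteration cap and the toy objective are hypothetical and do not come from the invention. The update is written as ascent on the similarity, which is equivalent to gradient descent on its negative.

```python
import numpy as np

def numerical_gradient(f, p, h=1e-5):
    """Central-difference gradient of a scalar function f at point p."""
    g = np.zeros_like(p)
    for k in range(p.size):
        d = np.zeros_like(p)
        d[k] = h
        g[k] = (f(p + d) - f(p - d)) / (2.0 * h)
    return g

def optimize_parts(parts, objective, threshold, step=0.1, max_iters=1000):
    """Optimize body parts one after another.

    parts:     list of parameter vectors, one per body part
    objective: objective(j, p) -> summed similarity of part j to all blocks
    threshold: a part is accepted once its objective exceeds this value
    """
    optimized = []
    for j, p in enumerate(parts):
        p = np.asarray(p, dtype=float).copy()
        for _ in range(max_iters):
            if objective(j, p) > threshold:   # judgment result is positive
                break                         # move on to part j + 1
            g = numerical_gradient(lambda q: objective(j, q), p)
            p = p + step * g                  # judgment negative: keep optimizing
        optimized.append(p)
    return optimized

# Toy objective standing in for the summed similarity (an assumption).
targets = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]

def toy_objective(j, p):
    return float(np.exp(-np.sum((p - targets[j]) ** 2)))

result = optimize_parts([np.zeros(2), np.zeros(2)], toy_objective, threshold=0.99)
```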
Calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as the objective function value specifically includes: calculating the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by the following formula:
where d(c_i, c_j) represents the similarity between the color c_j of the model of the jth human body part and the color c_i of the ith pixel block, B_i(x) represents the two-dimensional pixel block model of the ith pixel block, A_j(x) represents the model of the jth human body part of the three-dimensional human body model, x represents the position of any point of the human body, μ_i represents the coordinates of the center of the ith pixel block, and δ_i represents the half length of the ith pixel block; the formula also uses the projection coordinates of the model of the jth human body part on the ith pixel block and the projection length of the radius of the jth human body part on the ith pixel block;
w is the penalty value of the color, ε is the color threshold, μ_jx, μ_jy and μ_jz are the x-axis, y-axis and z-axis coordinates of the jth human body part, σ_j represents the radius of the jth body part, and f_il represents the parameters of the camera that obtained the ith pixel block.
The sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model is then calculated with a summation formula.
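The projection of a body part onto an image, which the similarity calculation relies on, can be sketched under a simple pinhole-camera assumption. The projection formula of the invention is not reproduced in this text, so the sketch below is a common approximation and not necessarily the one used: it assumes the part center (μ_jx, μ_jy, μ_jz) is given in camera coordinates and that the camera parameter is a single focal length `f`. The name `project_part` is hypothetical.

```python
import numpy as np

def project_part(mu, sigma, f):
    """Approximate 2-D projection of a spherical body part.

    mu:    (x, y, z) center of the part in camera coordinates (assumed)
    sigma: radius of the part
    f:     focal length of the camera in pixels (assumed single parameter)
    Returns the projected center and the projected radius.
    """
    x, y, z = np.asarray(mu, dtype=float)
    if z <= 0:
        raise ValueError("the body part must lie in front of the camera")
    center_2d = np.array([f * x / z, f * y / z])
    radius_2d = f * sigma / z   # circle approximation of the true projection
    return center_2d, radius_2d

center, radius = project_part([1.0, 2.0, 4.0], sigma=0.5, f=800.0)
```

The projected radius shrinks in proportion to the depth of the part, which is why a part far from the camera overlaps fewer pixel blocks.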
A color similarity is calculated for each pixel block in Fig. 5. The parameters of the three-dimensional human body model are then optimized along the gradient direction so that the similarity increases. Once the similarity is stable, the three-dimensional human body model is projected onto the picture; the result is shown in Fig. 6. The three-dimensional human body model parameters at that point are the current human body posture.
Step 106: determining the human body posture of the current frame according to the optimized three-dimensional human body model.
In order to reduce the number of iterations in the optimization process, in step 105, before the three-dimensional human body model is optimized according to the two-dimensional pixel block model of each pixel block to obtain the optimized three-dimensional human body model, the method further includes: predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames before the current frame; and pre-adjusting the three-dimensional human body model according to the predicted position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
The invention performs picture preprocessing on the CPU to obtain a mathematical representation of the picture. The picture and the human body model parameters are then transferred to CUDA (Compute Unified Device Architecture, a parallel computing framework) for computation. The same number of pixel blocks is used for every picture. Each CUDA kernel calculates the similarity between each pixel block and one part of the human body; because the human body model has a fixed set of parameters, each calculation takes constant time.
The invention also provides a human body posture recognition system, which comprises:
the image acquisition module is used for acquiring a plurality of human body images under different visual angles of the current frame;
the three-dimensional human body model building module is used for building a three-dimensional human body model by adopting a convolutional neural network algorithm according to a plurality of human body images;
the pixel clustering module is used for clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
and the two-dimensional pixel block model building module is used for building a two-dimensional pixel block model of each pixel block.
The three-dimensional human body model building module specifically comprises: a key point acquisition sub-module, used for acquiring K key points from each human body image by adopting an hourglass network structure; a back projection operation sub-module, used for calculating, for k = 1, 2, …, K and based on the parameters of the camera, the back projection ray of the kth key point in the human body image under each view angle, to obtain a plurality of back projection rays corresponding to each key point; a key point coordinate determining sub-module, used for determining the coordinates of the common point with the shortest total distance to the plurality of back projection rays corresponding to each key point and taking them as the coordinates of that key point in three-dimensional space, to obtain the coordinates of each key point in three-dimensional space; and a three-dimensional human body model building sub-module, used for carrying out a linear interpolation operation according to the coordinates of each key point in three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
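The key point coordinate determination can be illustrated with a short sketch. It rests on assumptions that are not stated in the description: each back projection ray is treated as an infinite line given by a camera center and a direction, and the "shortest total distance" is replaced by the sum of squared distances, which has a closed-form least-squares solution. The function name `closest_point_to_rays` is hypothetical.

```python
import numpy as np

def closest_point_to_rays(origins, directions):
    """Least-squares point nearest to a set of 3-D lines.

    origins:    one point on each line (for example the camera centers)
    directions: direction vector of each line (need not be normalized)
    Minimizes the sum of squared point-to-line distances, an assumed
    stand-in for the total distance mentioned in the description.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        o = np.asarray(o, dtype=float)
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector onto the plane normal to d
        A += P
        b += P @ o
    # A is invertible as long as the lines are not all parallel.
    return np.linalg.solve(A, b)

# Two rays that intersect at (1, 1, 1).
point = closest_point_to_rays(
    origins=[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    directions=[[1.0, 1.0, 1.0], [-1.0, 1.0, 1.0]],
)
```

With noisy key point detections the rays no longer intersect exactly, and the returned point is then a compromise between all views.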
The three-dimensional human body model optimizing module is used for optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model.
The human body posture recognition module is used for determining the human body posture of the current frame according to the optimized three-dimensional human body model.
The recognition system further comprises: a position coordinate prediction module, used for predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames before the current frame; and a three-dimensional human body model pre-adjustment module, used for pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention discloses a human body posture recognition method and system. The recognition method comprises the following steps: acquiring a plurality of human body images under different visual angles of a current frame; establishing a three-dimensional human body model from the plurality of human body images by adopting a convolutional neural network algorithm; clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks; establishing a two-dimensional pixel block model of each pixel block; optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model; and determining the human body posture of the current frame according to the optimized three-dimensional human body model. By adopting a convolutional neural network algorithm, the invention builds the three-dimensional human body model simply and rapidly, then optimizes the three-dimensional human body model using the human body posture in the image, and performs posture recognition with the optimized three-dimensional human body model, so that the speed of human body posture estimation is improved while its accuracy is ensured.
The invention predicts the result of the current frame by utilizing the continuity of human body actions and the result of the previous frames, reduces the iteration times in the optimization process and further improves the speed of human body posture estimation.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar, the embodiments may be referred to one another.
The principles and embodiments of the present invention have been described herein with reference to specific examples; the description of these examples is intended only to assist in understanding the method of the present invention and its core ideas. Meanwhile, those of ordinary skill in the art may, in light of the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the foregoing, this description should not be construed as limiting the invention.
Claims (8)
1. A human body posture recognition method, characterized in that the recognition method comprises the steps of:
acquiring a plurality of human body images under different visual angles of a current frame;
according to a plurality of human body images, a three-dimensional human body model is established by adopting a convolutional neural network algorithm;
clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
establishing a two-dimensional pixel block model of each pixel block;
optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;
determining the human body posture of the current frame according to the optimized three-dimensional human body model;
the three-dimensional human body model is as follows:
where A(x) is the three-dimensional human body model, A_j(x) is the three-dimensional human body model of the jth human body part, μ_j represents the coordinates of the jth human body part, σ_j represents the radius of the jth human body part, and x represents the position of any point of the human body;
the two-dimensional pixel block model is as follows:
where B_i(x) represents the two-dimensional pixel block model of the ith pixel block, c_i represents the color of the center of the ith pixel block, μ_i represents the coordinates of the center of the ith pixel block, δ_i represents the half length of the ith pixel block, and x_i represents the projection of the position x of any point on the human body onto the ith pixel block.
2. The method according to claim 1, wherein the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model further comprises:
predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames before the current frame;
and pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
3. The human body posture recognition method of claim 1, wherein the building a three-dimensional human body model according to a plurality of human body images by adopting a convolutional neural network algorithm specifically comprises:
obtaining K key points from each human body image by adopting an hourglass network structure;
for k = 1, 2, …, K, calculating, based on the parameters of the camera, the back projection ray of the kth key point in the human body image under each view angle, to obtain a plurality of back projection rays corresponding to each key point;
determining the coordinates of the common point with the shortest total distance to the plurality of back projection rays corresponding to each key point, and taking them as the coordinates of that key point in three-dimensional space, to obtain the coordinates of each key point in three-dimensional space;
and carrying out linear interpolation operation according to the coordinates of each key point in the three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
4. The human body posture recognition method according to claim 1, wherein the optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model specifically comprises:
calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;
judging whether the objective function value is larger than a preset threshold value or not to obtain a judging result;
if the judgment result is negative, optimizing the model of the jth human body part in the three-dimensional human body model by a gradient descent method, and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value;
if the judgment result is positive, increasing the value of j by 1 and returning to the step of calculating the sum of the similarities between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model as an objective function value, so as to optimize the model of the next human body part of the three-dimensional human body model, until the model of every human body part in the three-dimensional human body model has been optimized.
5. The human body posture identifying method according to claim 4, characterized in that said calculating a sum of the similarity of the model of the jth human body part of the three-dimensional human body model and each of said two-dimensional pixel block models as an objective function value, specifically comprises:
calculating the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by adopting the following formula:
where E_ij represents the similarity between the model of the jth human body part of the three-dimensional human body model and the ith two-dimensional pixel block model, d(c_i, c_j) represents the similarity between the color c_j of the model of the jth human body part and the color c_i of the ith pixel block, B_i(x) represents the two-dimensional pixel block model of the ith pixel block, A_j(x) represents the model of the jth human body part of the three-dimensional human body model, x represents the position of any point of the human body, μ_i represents the coordinates of the center of the ith pixel block, and δ_i represents the half length of the ith pixel block; the formula also uses the projection coordinates of the model of the jth human body part on the ith pixel block and the projection length of the radius of the jth human body part on the ith pixel block;
where w is the penalty value of the color, ε is the color threshold, μ_jx, μ_jy and μ_jz are the x-axis, y-axis and z-axis coordinates of the jth human body part, σ_j represents the radius of the jth body part, and f_il represents the parameters of the camera that obtained the ith pixel block;
and calculating the sum of the similarity between the model of the jth human body part of the three-dimensional human body model and each two-dimensional pixel block model by using a summation formula.
6. A human body posture recognition system, the recognition system comprising:
the image acquisition module is used for acquiring a plurality of human body images under different visual angles of the current frame;
the three-dimensional human body model building module is used for building a three-dimensional human body model by adopting a convolutional neural network algorithm according to a plurality of human body images;
the pixel clustering module is used for clustering adjacent pixels on each human body image respectively to obtain a plurality of pixel blocks;
the two-dimensional pixel block model building module is used for building a two-dimensional pixel block model of each pixel block;
the three-dimensional human body model optimizing module is used for optimizing the three-dimensional human body model according to the two-dimensional pixel block model of each pixel block to obtain an optimized three-dimensional human body model;
the human body posture recognition module is used for determining the human body posture of the current frame according to the optimized three-dimensional human body model;
the three-dimensional human body model is as follows:
where A(x) is the three-dimensional human body model, A_j(x) is the three-dimensional human body model of the jth human body part, μ_j represents the coordinates of the jth human body part, σ_j represents the radius of the jth human body part, and x represents the position of any point of the human body;
the two-dimensional pixel block model is as follows:
where B_i(x) represents the two-dimensional pixel block model of the ith pixel block, c_i represents the color of the center of the ith pixel block, μ_i represents the coordinates of the center of the ith pixel block, δ_i represents the half length of the ith pixel block, and x_i represents the projection of the position x of any point on the human body onto the ith pixel block.
7. The human body posture recognition system of claim 6, wherein the recognition system further comprises:
the position coordinate prediction module is used for predicting the position coordinates of each human body part in the current frame according to the position coordinates of each human body part in each of a preset number of frames before the current frame;
the three-dimensional human body model pre-adjustment module is used for pre-adjusting the three-dimensional human body model according to the position coordinates of each human body part in the current frame to obtain a pre-adjusted three-dimensional human body model.
8. The human body posture recognition system of claim 6, wherein the three-dimensional human body model building module specifically comprises:
the key point acquisition sub-module is used for acquiring K key points from each human body image by adopting an hourglass network structure;
the back projection operation sub-module is used for calculating, for k = 1, 2, …, K and based on the parameters of the camera, the back projection ray of the kth key point in the human body image under each view angle, to obtain a plurality of back projection rays corresponding to each key point;
the key point coordinate determining submodule is used for determining the coordinate of a common point with the shortest total distance from a plurality of back projection rays corresponding to each key point, and the coordinate is used as the coordinate of the key point in the three-dimensional space to obtain the coordinate of each key point in the three-dimensional space;
and the three-dimensional human body model building sub-module is used for carrying out linear interpolation operation according to the coordinates of each key point in the three-dimensional space to obtain a three-dimensional human body model containing the three-dimensional coordinates of each human body part of the human body.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110411237.9A CN113065506B (en) | 2021-04-16 | 2021-04-16 | Human body posture recognition method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110411237.9A CN113065506B (en) | 2021-04-16 | 2021-04-16 | Human body posture recognition method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113065506A CN113065506A (en) | 2021-07-02 |
CN113065506B CN113065506B (en) | 2023-12-26 |
Family
ID=76566830
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110411237.9A Active CN113065506B (en) | 2021-04-16 | 2021-04-16 | Human body posture recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113065506B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115035769A (en) * | 2022-07-21 | 2022-09-09 | 四川嘉义索隐科技有限公司 | Training system for simulating electronic countermeasure |
CN115984972B (en) * | 2023-03-20 | 2023-08-11 | 乐歌人体工学科技股份有限公司 | Human body posture recognition method based on motion video driving |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104715493A (en) * | 2015-03-23 | 2015-06-17 | 北京工业大学 | Moving body posture estimating method |
CN106910247A (en) * | 2017-03-20 | 2017-06-30 | 厦门幻世网络科技有限公司 | Method and apparatus for generating three-dimensional head portrait model |
CN108876814A (en) * | 2018-01-11 | 2018-11-23 | 南京大学 | A method of generating posture stream picture |
CN109949368A (en) * | 2019-03-14 | 2019-06-28 | 郑州大学 | A kind of human body three-dimensional Attitude estimation method based on image retrieval |
CN110008915A (en) * | 2019-04-11 | 2019-07-12 | 电子科技大学 | The system and method for dense human body attitude estimation is carried out based on mask-RCNN |
CN111428586A (en) * | 2020-03-09 | 2020-07-17 | 同济大学 | Three-dimensional human body posture estimation method based on feature fusion and sample enhancement |
CN111753747A (en) * | 2020-06-28 | 2020-10-09 | 高新兴科技集团股份有限公司 | Violent motion detection method based on monocular camera and three-dimensional attitude estimation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11069131B2 (en) * | 2019-09-26 | 2021-07-20 | Amazon Technologies, Inc. | Predictive personalized three-dimensional body models |
Non-Patent Citations (5)
Title |
---|
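Human body posture recognition algorithm for still images; Naigong Yu; JOURNAL OF ENGINEERING-JOE; full text * |
A cross-arm pose measurement method based on a fully convolutional neural network; Wu Wei; Guo Fei; Guo Yu; Guo Jian; Journal of Huazhong University of Science and Technology (Natural Science Edition) (12); full text * |
Three-dimensional face modeling and verification of its effectiveness in cross-pose face matching; Li Xinxin; Gong Xun; Journal of Computer Applications (01); full text * |
Three-dimensional human body modeling method based on two-dimensional point cloud images; Zhang Guangpian; Ji Zhongping; Computer Engineering and Applications (19); full text * |
A survey of human pose estimation algorithms based on convolutional neural networks; Peng Shuai; Journal of Beijing Information Science and Technology University (Natural Science Edition); Vol. 35 (No. 03); full text * |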
Also Published As
Publication number | Publication date |
---|---|
CN113065506A (en) | 2021-07-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180012411A1 (en) | Augmented Reality Methods and Devices | |
WO2019136591A1 (en) | Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network | |
CN109903331B (en) | Convolutional neural network target detection method based on RGB-D camera | |
CN113065546B (en) | Target pose estimation method and system based on attention mechanism and Hough voting | |
CN113421328B (en) | Three-dimensional human body virtual reconstruction method and device | |
JP6207210B2 (en) | Information processing apparatus and method | |
US20170278302A1 (en) | Method and device for registering an image to a model | |
CN108171133B (en) | Dynamic gesture recognition method based on characteristic covariance matrix | |
CN111105432A (en) | Unsupervised end-to-end driving environment perception method based on deep learning | |
CN114666564A (en) | Method for synthesizing virtual viewpoint image based on implicit neural scene representation | |
CN111862278B (en) | Animation obtaining method and device, electronic equipment and storage medium | |
CN113065506B (en) | Human body posture recognition method and system | |
CN115661246A (en) | Attitude estimation method based on self-supervision learning | |
CN110827320B (en) | Target tracking method and device based on time sequence prediction | |
CN117994480A (en) | Lightweight hand reconstruction and driving method | |
CN115953447A (en) | Point cloud consistency constraint monocular depth estimation method for 3D target detection | |
CN111222459A (en) | Visual angle-independent video three-dimensional human body posture identification method | |
Gibson et al. | Quadruped gait analysis using sparse motion information | |
CN118071932A (en) | Three-dimensional static scene image reconstruction method and system | |
Wan et al. | Boosting image-based localization via randomly geometric data augmentation | |
CN116433822B (en) | Neural radiation field training method, device, equipment and medium | |
CN112509129A (en) | Spatial view field image generation method based on improved GAN network | |
JP2021071749A (en) | Three dimensional model generation apparatus and method | |
CN117726747A (en) | Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene | |
CN115497029A (en) | Video processing method, device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||