CN115359507A - Hand gesture recognition method and device, electronic equipment and computer-readable storage medium - Google Patents


Info

Publication number
CN115359507A
Authority
CN
China
Prior art keywords
gesture
recognized
output data
picture
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210291130.XA
Other languages
Chinese (zh)
Inventor
丁述勇
陈少辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhijiang College of ZJUT
Original Assignee
Zhijiang College of ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhijiang College of ZJUT filed Critical Zhijiang College of ZJUT
Priority to CN202210291130.XA priority Critical patent/CN115359507A/en
Publication of CN115359507A publication Critical patent/CN115359507A/en
Withdrawn legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to a hand gesture recognition method and device, an electronic device and a computer-readable storage medium. The method comprises the following steps: acquiring a gesture picture to be recognized, and extracting the contour features of the hand in the gesture picture to be recognized based on an anti-noise robust image segmentation algorithm; vectorizing the contour features to obtain feature vectors identifying the contour features; and determining the gesture in the gesture picture to be recognized according to the vector distances of the feature vectors. The embodiments of the application can use a fully convolutional neural network architecture to locate the hand against the background in a low-illumination natural environment, and then match the corresponding hand gesture through the radial vectors of the hand contour sectors. Satisfactory results can be obtained even for a single image with severe mixed noise, and the foreground and background of a moving target can be separated accurately in real time to identify the gesture type.

Description

Hand gesture recognition method and device, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a hand gesture recognition method, apparatus, electronic device, and computer-readable storage medium.
Background
Gesture behavior recognition against a complex background is still a challenging problem. In real applications, gestures usually occur in a complex environment: the light may be too bright or too dark, the gesture may be at varying distances from the capture device, and there may be many interfering gestures. A complex background often reduces the accuracy of gesture recognition and prevents the gesture from being extracted and recognized accurately, so continuously strengthening the robustness of gesture recognition methods and improving their accuracy and applicability remain challenging subjects. In natural scenes there are often images with over-bright or over-dark illumination, motion blur, multi-source illumination and uneven illumination; for such images it is difficult to separate the foreground and background of a moving target and hence to recognize gesture behavior. Most processing algorithms of camera-based computer vision systems place high demands on illumination and color, and their processing effect in natural scenes is not ideal. It is well known that images in natural scenes generally suffer from degradations such as Gaussian noise, camera imaging noise and motion blur. These factors are key factors that influence how well an algorithm works and constrain its processing effect.
Disclosure of Invention
In order to solve or at least partially solve the problems in the related art, the application provides a hand gesture recognition method, a hand gesture recognition device, an electronic device and a computer-readable storage medium, which can locate the hand against the background in a low-light natural environment by using a fully convolutional neural network architecture, and then match the corresponding hand gesture through the radial vectors of the hand contour sectors. Satisfactory results can be obtained even for a single image with severe mixed noise, and the foreground and background of a moving target can be separated accurately in real time to identify the gesture type.
The application provides a hand gesture recognition method in a first aspect, comprising:
acquiring a gesture picture to be recognized, wherein the gesture picture to be recognized comprises a hand to be recognized;
extracting the contour features of the hand in the gesture picture to be recognized based on an anti-noise robust image segmentation algorithm;
vectorizing the contour features to obtain feature vectors for identifying the contour features;
and determining the gesture in the gesture picture to be recognized according to the vector distance of the feature vector.
As a possible embodiment of the present application, in this embodiment, the extracting of the contour features of the human hand in the gesture picture to be recognized based on the anti-noise robust image segmentation algorithm includes:
carrying out binarization processing on the gesture picture to be recognized to obtain a gray scale image of the picture to be recognized;
inputting the gray-scale image into a data refining layer, and carrying out data refining and enhancing on the gray-scale image to obtain first output data;
inputting the first output data into a data normalization layer, and performing mean-variance normalization preprocessing on the gray-scale image to obtain second output data;
inputting the second output data to a downsampling branch, and downsampling the second output data to obtain third output data;
inputting the third output data to an up-sampling branch, and up-sampling the second output data to obtain fourth output data;
and inputting the fourth output data into a weighted loss branch, and calculating the network error of the fourth output data to obtain the contour features.
As a possible embodiment of the present application, in this embodiment, the vectorizing the contour feature to obtain a feature vector for identifying the contour feature includes:
determining the edge points of the gesture to be recognized in the gray-scale image by adopting a preset contour tracking algorithm, constructing an edge point set, and calculating the centroid point of the gesture to be recognized based on the edge point set;
dividing the outline of the gesture to be recognized into sector areas with the same central angle by taking the centroid point as the circle center;
and calculating the vector distance from the points of the outline of the gesture to be recognized in each sector area to the circle center, and normalizing the vector distances to obtain the feature vector of the contour features.
As a possible embodiment of the present application, in this embodiment, the determining a gesture in the gesture picture to be recognized by using the vector distance of the feature vector includes:
and searching a corresponding target gesture in a preset gesture library based on the vector distance of the feature vector of the gesture to be recognized, wherein the vector distance of the feature vector of the target gesture has the highest similarity with the vector distance of the feature vector of the gesture to be recognized.
A second aspect of the application provides a hand gesture recognition device, which comprises a picture acquisition module, a feature extraction module, a vector extraction module and a gesture recognition module, wherein the picture acquisition module is used for acquiring a gesture picture to be recognized, and the gesture picture to be recognized comprises a hand to be recognized;
the feature extraction module is used for extracting the contour features of the hand in the gesture picture to be recognized based on an anti-noise robust image segmentation algorithm;
the vector extraction module is used for vectorizing the contour features to obtain feature vectors for identifying the contour features;
and the gesture recognition module is used for determining the gesture in the gesture picture to be recognized according to the vector distance of the feature vector.
As a possible implementation manner of the embodiment of the present application, in this implementation manner, when the feature extraction module extracts the contour features of the hand in the gesture picture to be recognized based on the anti-noise robust image segmentation algorithm, the feature extraction module may be configured to:
carrying out binarization processing on the gesture picture to be recognized to obtain a gray scale image of the picture to be recognized;
inputting the gray-scale image into a data refinement layer, and performing data refinement and enhancement on the gray-scale image to obtain first output data;
inputting the first output data into a data normalization layer, and performing mean-variance normalization preprocessing on the gray-scale image to obtain second output data;
inputting the second output data to a down-sampling branch, and performing down-sampling on the second output data to obtain third output data;
inputting the third output data to an up-sampling branch, and up-sampling the second output data to obtain fourth output data;
and inputting the fourth output data into a weighted loss branch, and calculating the network error of the fourth output data to obtain the contour features.
As a possible embodiment of the present application, in this embodiment, when vectorizing the contour features and obtaining feature vectors for identifying the contour features, the feature extraction module may be configured to:
determining the edge points of the gesture to be recognized in the gray-scale image by adopting a preset contour tracking algorithm, constructing an edge point set, and calculating the centroid point of the gesture to be recognized based on the edge point set;
dividing the outline of the gesture to be recognized into sector areas with the same central angle by taking the centroid point as the circle center;
and calculating the vector distance from the points of the outline of the gesture to be recognized in each sector area to the circle center, and normalizing the vector distances to obtain the feature vector of the contour features.
As a possible embodiment of the present application, in this embodiment, when determining a gesture in the gesture picture to be recognized according to the vector distance of the feature vector, the gesture recognition module may be configured to:
and searching a corresponding target gesture in a preset gesture library based on the vector distance of the feature vector of the gesture to be recognized, wherein the vector distance of the feature vector of the target gesture has the highest similarity with the vector distance of the feature vector of the gesture to be recognized.
A third aspect of the present application provides an electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to perform the method as described above.
According to the embodiments of the application, the contour features in the image to be recognized are obtained, the feature vectors of the gesture to be recognized are determined based on the contour features, and the gesture is determined based on the vector distances of the feature vectors. A fully convolutional neural network architecture can thus be used to locate the hand against the background in a low-light natural environment, and the corresponding hand gesture is then matched through the radial vectors of the hand contour sectors. Satisfactory results can be obtained even for a single image with severe mixed noise, and the foreground and background of a moving target can be separated accurately in real time to identify the gesture type.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the application.
Fig. 1 is a flow chart of a hand gesture recognition method according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a contour feature extraction method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a feature vector extraction method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an Nv-Net architecture according to an embodiment of the present application;
FIG. 5 is a diagram illustrating a hand segmentation result according to an embodiment of the present application;
FIG. 6 shows partial samples of a hand gesture standard template library according to an embodiment of the present application;
FIG. 7 shows example contour samples in a hand gesture template library according to an embodiment of the present application;
FIG. 8 shows contour samples of a hand gesture image sequence according to an embodiment of the present application;
FIG. 9 is a schematic diagram of the contour extracted after object segmentation according to an embodiment of the present application;
FIG. 10 shows vectorization samples of part of a hand gesture standard template library according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a hand gesture recognition apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While embodiments of the present application are illustrated in the accompanying drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Gesture behavior recognition against a complex background is still a challenging problem. In real applications, gestures usually occur in a complex environment: the light may be too bright or too dark, the gesture may be at varying distances from the capture device, and there may be many interfering gestures. A complex background often reduces the accuracy of gesture recognition and prevents the gesture from being extracted and recognized accurately, so continuously strengthening the robustness of gesture recognition methods and improving their accuracy and applicability remain challenging subjects. In natural scenes there are often images with over-bright or over-dark illumination, motion blur, multi-source illumination and uneven illumination; for such images it is difficult to separate the foreground and background of a moving target and hence to recognize gesture behavior. Most processing algorithms of camera-based computer vision systems place high demands on illumination and color, and their processing effect in natural scenes is not ideal. It is well known that images in natural scenes generally suffer from degradations such as Gaussian noise, camera imaging noise and motion blur. These factors are key factors that influence how well an algorithm works and constrain its processing effect.
In view of the above problems, embodiments of the present application provide a hand gesture recognition method which can use a fully convolutional neural network architecture to locate the hand against the background in a low-light natural environment, and then match the corresponding hand gesture through the radial vectors of the hand contour sectors. Satisfactory results can be obtained even for a single image with severe mixed noise, and the foreground and background of a moving target can be separated accurately in real time to identify the gesture type.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a hand gesture recognition method according to an embodiment of the present application.
Referring to fig. 1, a hand gesture recognition method provided in the embodiment of the present application includes:
step S101, acquiring a gesture picture to be recognized, wherein the gesture picture to be recognized comprises a hand to be recognized;
s102, extracting contour features of a hand in the gesture picture to be recognized based on an anti-robustness image segmentation algorithm;
step S103, vectorizing the contour features to obtain feature vectors for identifying the contour features;
and step S104, determining the gesture in the gesture picture to be recognized according to the vector distance of the feature vector.
As a possible embodiment of the present application, as shown in fig. 2, in this embodiment, the extracting of the contour features of the human hand in the gesture picture to be recognized based on the anti-noise robust image segmentation algorithm includes:
step S201, performing binarization processing on the gesture picture to be recognized to obtain a gray scale image of the picture to be recognized;
step S202, inputting the gray-scale image into a data refinement layer, and performing data refinement and enhancement on the gray-scale image to obtain first output data;
step S203, inputting the first output data into a data normalization layer, and performing mean-variance normalization preprocessing on the gray-scale image to obtain second output data;
step S204, inputting the second output data to a down-sampling branch, and performing down-sampling on the second output data to obtain third output data;
step S205, inputting the third output data to an up-sampling branch, and up-sampling the second output data to obtain fourth output data;
step S206, inputting the fourth output data into a weighted loss branch, and calculating the network error of the fourth output data to obtain the contour features.
As a possible embodiment of the present application, as shown in fig. 3, in this embodiment, the vectorizing of the contour features includes: determining, on the gray-scale map, the edge points of the gesture to be recognized by adopting a preset contour tracking algorithm, constructing an edge point set, and calculating the centroid point of the gesture to be recognized based on the edge point set;
step S301, taking the centroid point as the circle center, and dividing the outline of the gesture to be recognized into sector areas with the same central angle;
step S302, calculating the vector distance from the points of the outline of the gesture to be recognized in each sector area to the circle center, and normalizing the vector distances to obtain the feature vector of the contour features.
In the embodiment of the application, a gesture picture to be recognized is obtained, the gesture picture containing at least one human hand to be recognized, and a human hand positioning model for human posture skeleton detection is applied to the gesture picture to be recognized. In this model, the gesture picture to be recognized serves as the input image; the first 10 layers of a VGG-19 network extract primary features of the input image, and a multi-stage CNN is used to calculate the image feature maps. The multi-stage CNN comprises two important branches: the first branch predicts a set of confidence maps of joint regions, and the second branch predicts a set of confidence maps of joint points. Each branch of the network can be optimized individually using an iterative predictor, i.e. relay supervision, to alleviate the gradient vanishing problem. Except for the first stage, the inputs of the second and third stages merge the output of the previous stage with the VGG-19 features. In this way the feature information of each stage is fused, and a more accurate prediction result is finally obtained.
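For illustration only, the following is a minimal PyTorch sketch of such a two-branch, multi-stage prediction network; the backbone stand-in, channel counts, stage count and class names (Stage, HandLocator) are assumptions made for the example and are not taken from the patent.

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One refinement stage with two branches: one predicts confidence maps
    of joint regions, the other confidence maps of joint points."""
    def __init__(self, in_ch, region_ch, joint_ch):
        super().__init__()
        def branch(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 128, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(128, out_ch, 1))
        self.region_branch = branch(region_ch)
        self.joint_branch = branch(joint_ch)

    def forward(self, x):
        return self.region_branch(x), self.joint_branch(x)

class HandLocator(nn.Module):
    """Backbone features plus three stages; from the second stage on, the
    previous predictions are concatenated with the backbone features
    (relay supervision would add a loss on every stage's output)."""
    def __init__(self, feat_ch=128, region_ch=38, joint_ch=19, n_stages=3):
        super().__init__()
        # Stand-in for the first 10 layers of VGG-19 (assumption).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.stages = nn.ModuleList()
        in_ch = feat_ch
        for _ in range(n_stages):
            self.stages.append(Stage(in_ch, region_ch, joint_ch))
            in_ch = feat_ch + region_ch + joint_ch  # fuse features with outputs

    def forward(self, img):
        feats = self.backbone(img)
        x, outputs = feats, []
        for stage in self.stages:
            regions, joints = stage(x)
            outputs.append((regions, joints))
            x = torch.cat([feats, regions, joints], dim=1)
        return outputs
```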
In the embodiment of the application, an Nv-Net architecture is adopted when recognizing the human hand contour in the image to be recognized. The network architecture, shown in fig. 4, consists of four parts: a data refinement part (marked "DR"), a data normalization part (marked "DN"), a downsampling part and an upsampling part.

Data refinement layer. This part, labeled "DR", is located at the very front of the framework and introduces the input channels of the three-channel grayscale image. After the data passes through the data refinement and enhancement layer at the front end of the framework, the data in the Feature Map of each layer of the framework becomes more detailed and accurate.

Data normalization layer. This part, labeled "DN" and located at the front of the framework, performs mean-variance normalization preprocessing on the input image data to accelerate network convergence. Convolution kernels: the convolution kernels are multi-channel so as to be compatible with the data type of the processed image and of the Feature Maps.

Downsampling part. The downsampling part follows the representative architecture of a convolutional neural network. It includes two stages using two repeated 3 × 3 convolutions and three stages using four repeated 3 × 3 convolutions, each convolution followed by a rectified linear unit (ReLU), with a 2 × 2 mean pooling operation for downsampling between stages. In each downsampling step the number of feature channels is doubled, while the data in the Feature Maps and convolution kernels becomes more detailed and accurate.

Upsampling part. The upsampling part has three stages in total. In the first two stages, each step applies an "up-convolution" with a 4 × 4 convolution kernel, which upsamples the Feature Maps and halves their number; the result is then fused with the related Feature Map from the downsampling part, the corresponding outputs being named "pool4" and "pool3", respectively. The last stage is similar to the first two, with a convolution kernel of size 16 × 16. The network structure has a total of 19 convolutional layers and 3 deconvolution layers.

Weighted feature fusion based on a channel attention mechanism is the weighted fusion of the feature maps extracted in the downsampling (convolution) part with the feature maps obtained in the upsampling (deconvolution) part. The fusion of Feature Maps from the convolution and deconvolution stages is essential for the overall effect of the framework; jointly estimating the feature-map data of the two phases for the final object localization generally improves overall performance to some extent.

Weighted loss. The network uses a WSCE (Weighted Sigmoid Cross-Entropy) loss function to calculate the network error, performing pixel-level data enhancement of the foreground object regions through the weights. In the model framework, this pixel-level weighted sigmoid cross-entropy loss function is used to compute the error between the predicted values and the true values of the network model.
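For orientation, the sketch below mirrors the overall shape of a segmentation framework of this kind in PyTorch: a data refinement and normalization front end, a downsampling path of repeated 3 × 3 convolutions with ReLU and 2 × 2 mean pooling, an upsampling path of transposed convolutions, and skip fusion with the downsampling features. The layer counts, channel numbers and the simple concatenation-plus-1×1-convolution fusion are assumptions; they stand in for, rather than reproduce, the exact Nv-Net configuration and its channel-attention weighted fusion.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, repeats):
    """`repeats` 3x3 convolutions, each followed by ReLU."""
    layers = []
    for i in range(repeats):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class NvNetSketch(nn.Module):
    """Encoder-decoder sketch; input height/width must be divisible by 16."""
    def __init__(self, in_ch=3, base=32):
        super().__init__()
        self.refine = nn.Conv2d(in_ch, base, 3, padding=1)   # "DR" stand-in
        self.norm = nn.BatchNorm2d(base)                      # "DN" stand-in
        self.pool = nn.AvgPool2d(2)                           # 2x2 mean pooling
        self.enc1 = conv_block(base, base, 2)                 # stages of 2 convs
        self.enc2 = conv_block(base, base * 2, 2)
        self.enc3 = conv_block(base * 2, base * 4, 4)         # stages of 4 convs
        self.enc4 = conv_block(base * 4, base * 8, 4)
        self.up1 = nn.ConvTranspose2d(base * 8, base * 4, 4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 4, stride=2, padding=1)
        self.up3 = nn.ConvTranspose2d(base * 2, 1, 16, stride=4, padding=6)
        self.fuse1 = nn.Conv2d(base * 12, base * 4, 1)        # fuse with "pool4"-like skip
        self.fuse2 = nn.Conv2d(base * 6, base * 2, 1)         # fuse with "pool3"-like skip

    def forward(self, x):
        x = self.norm(self.refine(x))
        e1 = self.enc1(x)                 # full resolution
        e2 = self.enc2(self.pool(e1))     # 1/2
        e3 = self.enc3(self.pool(e2))     # 1/4
        e4 = self.enc4(self.pool(e3))     # 1/8
        d = self.pool(e4)                 # 1/16 bottleneck
        d = self.up1(d)                                   # back to 1/8
        d = self.fuse1(torch.cat([d, e4], dim=1))         # skip fusion stand-in
        d = self.up2(d)                                   # back to 1/4
        d = self.fuse2(torch.cat([d, e3], dim=1))
        return torch.sigmoid(self.up3(d))                 # back to full resolution
```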
The cross-entropy function can be used to define a loss function for a binary classification problem in machine learning and optimization, where $p$ denotes the true label distribution and $q$ the distribution predicted by the model. Given an input vector $x$, the prediction probability of the model is

$q_z = \Pr(z \mid x)$,

where $z \in \{0, 1\}$.

The prediction probability of $z = 1$ is given by

$q_{z=1} = \hat{y} = \frac{1}{1 + e^{-(w \cdot x + b)}}$,

where the weight vector $w$ is optimized by an algorithm such as gradient descent. Similarly, the prediction probability of $z = 0$ is simply

$q_{z=0} = 1 - \hat{y}$.

The true (observed) probability can be similarly expressed as $p_{z=1} = z$ and $p_{z=0} = 1 - z$, where $p \in \{z, 1 - z\}$.

With this notation agreed, cross entropy can be used to obtain a measure of similarity between $p$ and $q$:

$H(p, q) = -\sum_{z} p_z \log q_z = -z \log \hat{y} - (1 - z)\log(1 - \hat{y})$.

Assuming $N$ samples, the loss function can be expressed as

$L = -\frac{1}{N}\sum_{n=1}^{N}\left[z_n \log \hat{y}_n + (1 - z_n)\log(1 - \hat{y}_n)\right]$,

where $n = 1, 2, \ldots, N$.

The Sigmoid function is $g(y) = \frac{1}{1 + e^{-y}}$. Typically, $y = f(x) = w \cdot x + b$ is calculated, and the Sigmoid function then compresses the input to the interval $(0, 1)$. Since the prediction probability $\hat{y}$ must lie in the range $[0, 1]$, the Sigmoid function is introduced into the prediction probability calculation.

The loss function uses a weighted cross-entropy form rather than the usual averaged form:

$L_{WSCE} = -\sum_{n=1}^{N} \omega_n\left[z_n \log \hat{y}_n + (1 - z_n)\log(1 - \hat{y}_n)\right]$,

where $\omega$ represents the weight of the pixel under training: the more important the pixel, the greater the weight.
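As a concrete reading of the formulas above, the following NumPy sketch evaluates the weighted sigmoid cross-entropy over a per-pixel weight map; the function names, the clipping constant and the 5x foreground weighting in the example are assumptions for illustration.

```python
import numpy as np

def sigmoid(y):
    """g(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

def weighted_sigmoid_cross_entropy(logits, labels, weights, eps=1e-7):
    """Weighted pixel-wise cross entropy.

    logits  : raw network outputs y = w.x + b, any shape
    labels  : ground-truth z in {0, 1}, same shape
    weights : per-pixel weight omega (larger = more important), same shape
    """
    q = np.clip(sigmoid(logits), eps, 1.0 - eps)   # predicted probability of z = 1
    ce = -(labels * np.log(q) + (1.0 - labels) * np.log(1.0 - q))
    return np.sum(weights * ce)                     # weighted sum, not a mean

# Example: weight foreground (hand) pixels 5x more than background pixels.
labels = np.array([[1.0, 0.0], [1.0, 0.0]])
logits = np.array([[2.0, -1.5], [0.3, -3.0]])
weights = np.where(labels > 0, 5.0, 1.0)
print(weighted_sigmoid_cross_entropy(logits, labels, weights))
```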
The hand segmentation result is schematically shown in fig. 5.
In the embodiment of the application, for typical repeated human behaviors such as gestures, hundreds of contours can be extracted over a whole video sequence, but the contour shapes show an almost periodic similarity, so the motion can be represented by a set of basic shapes covering the whole cycle. A multi-view hand gesture template library is therefore established under controlled conditions; part of the hand gesture sample images is shown in fig. 6.
Furthermore, the key poses in the basic gesture set differ between different gestures. For example, the outline of the palm can be represented by the four key poses shown in fig. 7, and a multi-view hand pose contour template library based on a bag of key poses is established; partial samples are shown in fig. 8.
In the embodiment of the application, on the binary image output after segmentation, a contour tracking algorithm is used to record the whole contour information of the gesture, namely the edge points, which form an edge point set $M_{edge} = \{M_1, M_2, \ldots, M_n\}$. The centroid $C_m = (X_{cm}, Y_{cm})$ of the object $Z$ is calculated with the following equations:

$X_{cm} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad Y_{cm} = \frac{1}{n}\sum_{i=1}^{n} y_i$,

where $n$ is the number of points of the object $Z$ and $(x_i, y_i)$ are the coordinates of point $M_i$.
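A short NumPy sketch of this centroid computation, assuming the edge point set is given as an array of (x, y) coordinates produced by a contour tracking step:

```python
import numpy as np

def centroid(contour_points):
    """Centroid C_m = (X_cm, Y_cm) of an edge point set M = {M_1, ..., M_n}."""
    pts = np.asarray(contour_points, dtype=float)   # shape (n, 2): columns x, y
    x_cm = pts[:, 0].mean()                          # X_cm = (1/n) * sum(x_i)
    y_cm = pts[:, 1].mean()                          # Y_cm = (1/n) * sum(y_i)
    return x_cm, y_cm

# Example with a square contour: the centroid is at its center.
print(centroid([(0, 0), (4, 0), (4, 4), (0, 4)]))    # (2.0, 2.0)
```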
Fig. 9 shows the contour extracted after object segmentation. Another feature extracted from the foreground object region is the contour distance signal. Let $M = \{M_1, M_2, \ldots, M_n\}$ be the outline of an object $Z$, consisting of $n$ points ordered clockwise starting from the top center point, and let $C_m$ be the centroid point of $Z$. The distance signal $D = \{d_1, d_2, \ldots, d_n\}$ is generated by calculating the distance between $C_m$ and each contour point $M_i$, as shown in the following equation:

$d_i = \mathrm{Dist}(C_m, M_i), \quad i = 1, 2, \ldots, n$,

where the distance function $\mathrm{Dist}$ is the Euclidean distance.
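The distance signal itself is then straightforward to compute; the following sketch assumes the same (x, y) point representation as above:

```python
import numpy as np

def distance_signal(contour_points, cm):
    """D = {d_1, ..., d_n} with d_i = Euclidean distance from centroid C_m to M_i."""
    pts = np.asarray(contour_points, dtype=float)
    return np.hypot(pts[:, 0] - cm[0], pts[:, 1] - cm[1])

# Example: distances of a square contour's corners to its centroid.
pts = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(distance_signal(pts, (2.0, 2.0)))   # approx. [2.83, 2.83, 2.83, 2.83]
```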
Using the centroid as the origin, a central-angle radial averaging scheme is applied, so that the contour is divided into $K$ sectors of the same central angle, each spanning an angle of $2\pi / K$ around $C_m$.

For each sector area $S_j$, the vector distance of the contour points lying in that sector is calculated. This value is defined from the distances between the contour points and the centroid, and the per-sector values are normalized to a unit sum and concatenated to obtain a normalized feature vector of centroid-distance magnitudes:

$v_j = \max(d_k, d_{k+1}, \ldots, d_i) - \min(d_k, d_{k+1}, \ldots, d_i)$,

$\hat{v}_j = \frac{v_j}{\sum_{j=1}^{K} v_j}$,

where $d_k, \ldots, d_i$ are the distance-signal values of the contour points falling in sector $S_j$. The normalized feature values of all sector areas $S$ are spliced into the contour feature vector

$V = (\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_K)$.
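Putting the sector scheme together, a sketch of the radial feature vector computation might look as follows: contour points are binned by the angle they subtend at the centroid, the spread (max minus min) of the centroid distances is taken per sector, and the sector values are normalized to unit sum. The number of sectors K and the angular binning details are assumptions for the example.

```python
import numpy as np

def radial_feature_vector(contour_points, cm, k_sectors=16):
    """Normalized radial contour feature vector over k equal central-angle sectors."""
    pts = np.asarray(contour_points, dtype=float)
    dx, dy = pts[:, 0] - cm[0], pts[:, 1] - cm[1]
    dist = np.hypot(dx, dy)                                  # distance signal d_i
    angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)            # angle of each point
    sector = np.minimum((angle / (2 * np.pi) * k_sectors).astype(int),
                        k_sectors - 1)                       # sector index 0..k-1
    v = np.zeros(k_sectors)
    for j in range(k_sectors):
        d_j = dist[sector == j]
        if d_j.size:
            v[j] = d_j.max() - d_j.min()                     # v_j = max - min in sector
    total = v.sum()
    return v / total if total > 0 else v                     # normalize to unit sum

# Example: an elliptical contour produces non-uniform sector values.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
ellipse = np.c_[np.cos(theta) * 60 + 100, np.sin(theta) * 40 + 100]
print(radial_feature_vector(ellipse, (100.0, 100.0)).round(3))
```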
A template database of sample object contours is created and the object types are labeled manually, which is done offline. Fig. 10 shows vectorization samples of part of the hand gesture standard template library.
The similarity between the shapes of two objects $A$ and $B$ is compared by finding the distance between their distance signals $D_A$ and $D_B$. The distance between the two scaled and normalized distance signals $D_A$ and $D_B$ is calculated as

$\mathrm{Dist}(D_A, D_B) = \sqrt{\sum_{i}\left(d^A_i - d^B_i\right)^2}$.
To find the type $T_O$ of a query object $O$, its distance signal $D_O$ is compared with the distance signals of all objects in the template database. The type $T_P$ of the template object $P$ is assigned as the type of the query object $O$, i.e. $T_O = T_P$, where $P$ satisfies the following condition:

$P = \arg\min_{Q \in \text{template database}} \mathrm{Dist}(D_O, D_Q)$.
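Illustratively, the template lookup can be a nearest-neighbor search over the library; the template data structure and the use of the Euclidean distance follow the reading above and are assumptions rather than the patent's exact procedure:

```python
import numpy as np

def classify_gesture(query_vec, template_library):
    """Return the type of the template whose feature vector is closest to the query.

    template_library : list of (gesture_type, feature_vector) pairs,
                       all vectors computed with the same number of sectors.
    """
    best_type, best_dist = None, np.inf
    for gesture_type, template_vec in template_library:
        d = np.linalg.norm(np.asarray(query_vec) - np.asarray(template_vec))
        if d < best_dist:
            best_type, best_dist = gesture_type, d
    return best_type, best_dist

# Example with two toy templates.
library = [("fist", np.array([0.4, 0.3, 0.2, 0.1])),
           ("palm", np.array([0.1, 0.2, 0.3, 0.4]))]
print(classify_gesture(np.array([0.35, 0.3, 0.25, 0.1]), library))  # ('fist', ...)
```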
The query object is classified into a type by comparing it with the template database objects. The distance between two objects can be calculated using the Euclidean distance; however, its high computational complexity may not be suitable for real-time systems.
To reduce noise in object classification, a maximum likelihood scheme is employed: the object type assigned per frame is accumulated over a window of k (= 5) frames, and the most frequent type in the window is assigned as the final type. This reduces misclassifications caused by segmentation errors.
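A sketch of such a per-window vote is given below; the window size k = 5 follows the text, while the class name and tie-breaking behavior are assumptions:

```python
from collections import Counter, deque

class TemporalVote:
    """Smooth per-frame gesture labels with a majority vote over the last k frames."""
    def __init__(self, k=5):
        self.window = deque(maxlen=k)

    def update(self, frame_label):
        self.window.append(frame_label)
        # Most common label in the window wins; ties resolved by first occurrence.
        return Counter(self.window).most_common(1)[0][0]

vote = TemporalVote(k=5)
for label in ["palm", "palm", "fist", "palm", "palm", "fist"]:
    print(vote.update(label))
```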
According to the embodiments of the application, the contour features in the image to be recognized are obtained, the feature vectors of the gesture to be recognized are determined based on the contour features, and the gesture is determined based on the vector distances of the feature vectors. A fully convolutional neural network architecture can thus be used to locate the hand against the background in a low-light natural environment, and the corresponding hand gesture is then matched through the radial vectors of the hand contour sectors. Satisfactory results can be obtained even for a single image with severe mixed noise, and the foreground and background of a moving target can be separated accurately in real time to identify the gesture type.
The embodiment of the application function implementation method corresponds to the embodiment of the application function implementation method, and the application also provides a hand gesture recognition device, electronic equipment and a corresponding embodiment.
Fig. 11 is a schematic structural diagram of a hand gesture recognition apparatus according to an embodiment of the present application.
Referring to fig. 11, the hand gesture recognition apparatus 110 provided in the embodiment of the present application includes a picture obtaining module 1110, a feature extraction module 1120, a vector extraction module 1130, and a gesture recognition module 1140, wherein:
the image obtaining module 1110 is configured to obtain a gesture image to be recognized, where the gesture image to be recognized includes a human hand to be recognized;
the feature extraction module 1120 is used for extracting the contour features of the hand in the gesture picture to be recognized based on an anti-noise robust image segmentation algorithm;
a vector extraction module 1130, configured to vectorize the contour feature to obtain a feature vector for identifying the contour feature;
a gesture recognition module 1140, for determining the gesture in the gesture picture to be recognized according to the vector distance of the feature vector.
As a possible implementation manner of the embodiment of the present application, in this implementation manner, when the feature extraction module extracts the contour features of the hand in the gesture picture to be recognized based on the anti-noise robust image segmentation algorithm, the feature extraction module may be configured to:
carrying out binarization processing on the gesture picture to be recognized to obtain a gray scale image of the picture to be recognized;
inputting the gray-scale image into a data refinement layer, and performing data refinement and enhancement on the gray-scale image to obtain first output data;
inputting the first output data into a data normalization layer, and performing mean-variance normalization preprocessing on the gray-scale image to obtain second output data;
inputting the second output data to a down-sampling branch, and performing down-sampling on the second output data to obtain third output data;
inputting the third output data to an up-sampling branch, and up-sampling the second output data to obtain fourth output data;
and inputting the fourth output data into a weighted loss branch, and calculating the network error of the fourth output data to obtain the contour features.
As a possible embodiment of the present application, in this embodiment, when vectorizing the contour features and obtaining feature vectors for identifying the contour features, the feature extraction module may be configured to:
determining the edge points of the gesture to be recognized in the gray-scale image by adopting a preset contour tracking algorithm, constructing an edge point set, and calculating the centroid point of the gesture to be recognized based on the edge point set;
dividing the outline of the gesture to be recognized into sector areas with the same central angle by taking the centroid point as the circle center;
and calculating the vector distance from the points of the outline of the gesture to be recognized in each sector area to the circle center, and normalizing the vector distances to obtain the feature vector of the contour features.
As a possible embodiment of the present application, in this embodiment, when determining a gesture in the gesture picture to be recognized according to the vector distance of the feature vector, the gesture recognition module may be configured to:
and searching a corresponding target gesture in a preset gesture library based on the vector distance of the feature vector of the gesture to be recognized, wherein the similarity between the vector distance of the feature vector of the target gesture and the vector distance of the feature vector of the gesture to be recognized is highest.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the embodiments of the application, the contour features in the image to be recognized are obtained, the feature vectors of the gesture to be recognized are determined based on the contour features, and the gesture is determined based on the vector distances of the feature vectors. A fully convolutional neural network architecture can thus be used to locate the hand against the background in a low-light natural environment, and the corresponding hand gesture is then matched through the radial vectors of the hand contour sectors. Satisfactory results can be obtained even for a single image with severe mixed noise, and the foreground and background of a moving target can be separated accurately in real time to identify the gesture type.
Fig. 12 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
Referring to fig. 12, the electronic device 1000 includes a memory 1010 and a processor 1020.
The Processor 1020 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 1010 may include various types of storage units, such as system memory, Read-Only Memory (ROM), and permanent storage. The ROM may store, among other things, static data or instructions for the processor 1020 or other modules of the computer. The persistent storage device may be a read-write storage device. The persistent storage may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered down. In some embodiments, the persistent storage device employs a mass storage device (e.g., magnetic or optical disk, flash memory) as the persistent storage device. In other embodiments, the permanent storage may be a removable storage device (e.g., floppy disk, optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as a dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 1010 may comprise any combination of computer-readable storage media, including various types of semiconductor memory chips (e.g., DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic and/or optical disks, among others. In some embodiments, memory 1010 may include a removable storage device that is readable and/or writable, such as a Compact Disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, micro-SD card, etc.), a magnetic floppy disc, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 1010 has stored thereon executable code that, when processed by the processor 1020, may cause the processor 1020 to perform some or all of the methods described above.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.
Alternatively, the present application may also be embodied as a computer-readable storage medium (or non-transitory machine-readable storage medium or machine-readable storage medium) having executable code (or a computer program or computer instruction code) stored thereon, which, when executed by a processor of an electronic device (or server, etc.), causes the processor to perform part or all of the steps of the above-described methods according to the present application.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or improvements to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method of hand gesture recognition, the method comprising:
acquiring a gesture picture to be recognized, wherein the gesture picture to be recognized comprises a hand to be recognized;
extracting the contour features of the hand in the gesture picture to be recognized based on an anti-noise robust image segmentation algorithm;
vectorizing the contour features to obtain feature vectors for identifying the contour features;
and determining the gesture in the gesture picture to be recognized according to the vector distance of the feature vector.
2. The gesture recognition method according to claim 1, wherein the extracting of the contour features of the human hand in the gesture picture to be recognized based on the anti-noise robust image segmentation algorithm comprises the following steps:
carrying out binarization processing on the gesture picture to be recognized to obtain a gray scale image of the picture to be recognized;
inputting the gray-scale image into a data refinement layer, and performing data refinement and enhancement on the gray-scale image to obtain first output data;
inputting the first output data into a data normalization layer, and performing mean-variance normalization preprocessing on the gray-scale image to obtain second output data;
inputting the second output data to a down-sampling branch, and performing down-sampling on the second output data to obtain third output data;
inputting the third output data to an up-sampling branch, and up-sampling the second output data to obtain fourth output data;
and inputting the fourth output data into a weighted loss branch, and calculating the network error of the fourth output data to obtain the contour features.
3. The gesture recognition method according to claim 2, wherein the vectorizing the contour features to obtain feature vectors for identifying the contour features comprises:
determining the edge points of the gesture to be recognized in the gray-scale image by adopting a preset contour tracking algorithm, constructing an edge point set, and calculating the centroid point of the gesture to be recognized based on the edge point set;
dividing the outline of the gesture to be recognized into sector areas with the same central angle by taking the centroid point as the circle center;
and calculating the vector distance from the points of the outline of the gesture to be recognized in each sector area to the circle center, and normalizing the vector distances to obtain the feature vector of the contour features.
4. The gesture recognition method according to claim 1, wherein the determining of the gesture in the gesture picture to be recognized according to the vector distance of the feature vector comprises:
and searching a corresponding target gesture in a preset gesture library based on the vector distance of the feature vector of the gesture to be recognized, wherein the vector distance of the feature vector of the target gesture has the highest similarity with the vector distance of the feature vector of the gesture to be recognized.
5. A gesture recognition apparatus, comprising:
the image acquisition module is used for acquiring a gesture image to be recognized, wherein the gesture image to be recognized comprises a hand to be recognized;
the feature extraction module is used for extracting the contour features of the hand in the gesture picture to be recognized based on an anti-noise robust image segmentation algorithm;
the vector extraction module is used for vectorizing the contour features to obtain feature vectors for identifying the contour features;
and the gesture recognition module is used for determining the gesture in the gesture picture to be recognized according to the vector distance of the feature vector.
6. The gesture recognition device according to claim 5, wherein the feature extraction module, when extracting the contour features of the human hand in the gesture picture to be recognized based on the anti-noise robust image segmentation algorithm, is configured to:
carrying out binarization processing on the gesture picture to be recognized to obtain a gray scale image of the picture to be recognized;
inputting the gray-scale image into a data refinement layer, and performing data refinement and enhancement on the gray-scale image to obtain first output data;
inputting the first output data into a data normalization layer, and performing mean-variance normalization preprocessing on the gray-scale image to obtain second output data;
inputting the second output data to a downsampling branch, and downsampling the second output data to obtain third output data;
inputting the third output data to an up-sampling branch, and up-sampling the second output data to obtain fourth output data;
and inputting the fourth output data into a weighted loss branch, and calculating the network error of the fourth output data to obtain the contour features.
7. The gesture recognition apparatus according to claim 6, wherein the feature extraction module, when vectorizing the contour features to obtain feature vectors for identifying the contour features, is configured to:
determining the edge points of the gesture to be recognized in the gray-scale image by adopting a preset contour tracking algorithm, constructing an edge point set, and calculating the centroid point of the gesture to be recognized based on the edge point set;
dividing the outline of the gesture to be recognized into sector areas with the same central angle by taking the centroid point as the circle center;
and calculating the vector distance from the points of the outline of the gesture to be recognized in each sector area to the circle center, and normalizing the vector distances to obtain the feature vector of the contour features.
8. The gesture recognition device of claim 5, wherein the gesture recognition module, when determining the gesture in the picture of gestures to be recognized from the vector distance of the feature vector, is configured to:
and searching a corresponding target gesture in a preset gesture library based on the vector distance of the feature vector of the gesture to be recognized, wherein the similarity between the vector distance of the feature vector of the target gesture and the vector distance of the feature vector of the gesture to be recognized is highest.
9. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-5.
10. A computer-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-5.
CN202210291130.XA 2022-03-23 2022-03-23 Hand gesture recognition method and device, electronic equipment and computer-readable storage medium Withdrawn CN115359507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210291130.XA CN115359507A (en) 2022-03-23 2022-03-23 Hand gesture recognition method and device, electronic equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210291130.XA CN115359507A (en) 2022-03-23 2022-03-23 Hand gesture recognition method and device, electronic equipment and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN115359507A true CN115359507A (en) 2022-11-18

Family

ID=84030511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210291130.XA Withdrawn CN115359507A (en) 2022-03-23 2022-03-23 Hand gesture recognition method and device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN115359507A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024183653A1 (en) * 2023-03-08 2024-09-12 虹软科技股份有限公司 Gesture recognition method, apparatus and system, and storage medium
CN118644896A (en) * 2024-08-16 2024-09-13 山东商务职业学院 Motion gesture recognition method and system for VR equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221118