CN115079818B - Hand capturing method and system

Info

Publication number
CN115079818B
Authority
CN
China
Prior art keywords
gesture
current frame
image
probability
skeleton
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210497950.4A
Other languages
Chinese (zh)
Other versions
CN115079818A (en)
Inventor
赵天奇
李志豪
巴君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Juli Dimension Technology Co ltd
Original Assignee
Beijing Juli Dimension Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Juli Dimension Technology Co ltd
Priority to CN202210497950.4A
Publication of CN115079818A
Application granted
Publication of CN115079818B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/014 Hand-worn input/output arrangements, e.g. data gloves
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a hand capturing method and system. The method includes: predicting the probabilities of different gesture semantics from the gesture image of the current frame; inputting the gesture image of the current frame into the gesture-capture neural network corresponding to each gesture semantic to obtain the image-module matching probability and skeleton rotation value of each gesture semantic; multiplying the image-module matching probability of each gesture semantic by the probability of that gesture semantic to obtain the fusion probability of each gesture semantic; normalizing the fusion probabilities of all gesture semantics, and obtaining the skeleton rotation distribution function of the current frame image from the normalized fusion probabilities and the corresponding skeleton rotation values; and outputting a skeleton rotation value according to the skeleton rotation distribution function of the current frame image, so as to drive the virtual hand's skeleton motion according to that rotation value. While capture accuracy is preserved, the efficiency of the whole capture process is significantly improved.

Description

Hand capturing method and system
Technical Field
Embodiments of the present application relate to the technical field of data processing, and in particular to a hand capturing method and system.
Background
The traditional way of controlling a virtual human's hand gestures is to capture them with motion-capture gloves. However, such gloves are expensive, and a dedicated pair must be customized for each consumer's hand shape, which is a major obstacle for ordinary consumers wanting to experience virtual-human technology. In addition, when a motion-capture performer wears the gloves for a long time, their constraint easily makes the performer's hands uncomfortable, degrading the virtual-human control experience.
Disclosure of Invention
Therefore, embodiments of the present application provide a hand capturing method and system that reduce the cost of traditional hand-capture schemes and eliminate the wearing-comfort problem: a performer can achieve high-precision hand capture with nothing more than a camera, without purchasing or wearing traditional hand-capture equipment, and the efficiency of the whole capture process is significantly improved while capture accuracy is preserved.
In order to achieve the above object, the embodiment of the present application provides the following technical solutions:
according to a first aspect of an embodiment of the present application, there is provided a hand capturing method, the method including:
collecting a gesture image of the current frame;
predicting the probabilities of different gesture semantics from the gesture image of the current frame;
inputting the gesture image of the current frame into the gesture-capture neural network corresponding to each gesture semantic, and obtaining the image-module matching probability and skeleton rotation value of each gesture semantic;
multiplying the image-module matching probability of each gesture semantic by the probability of that gesture semantic to obtain the fusion probability of each gesture semantic;
normalizing the fusion probabilities of all gesture semantics, and obtaining the skeleton rotation distribution function of the current frame image from the normalized fusion probabilities and the corresponding skeleton rotation values;
and outputting a skeleton rotation value according to the skeleton rotation distribution function of the current frame image, so as to drive the virtual hand skeleton motion according to that skeleton rotation value.
Optionally, after obtaining the skeleton rotation distribution function of the current frame image from the processed fusion probabilities and the corresponding skeleton rotation values, and before outputting the skeleton rotation value according to the skeleton rotation distribution function of the current frame image, the method further includes:
acquiring the hand-capture pose distribution function and the motion description quantity of the previous frame image;
estimating a skeleton rotation value from the motion description quantity of the previous frame image;
and fusing, based on Kalman filtering, the hand-capture pose distribution function and estimated skeleton rotation value of the previous frame image with the skeleton rotation distribution function of the current frame image, to obtain the fused skeleton rotation distribution function of the current frame image.
Optionally, obtaining the skeleton rotation distribution function of the current frame image from the processed fusion probabilities and the corresponding skeleton rotation values includes:
solving a maximum likelihood function over all the processed fusion probabilities and the corresponding skeleton rotation values to obtain the skeleton rotation distribution function of the current frame image.
Optionally, outputting the skeleton rotation value according to the skeleton rotation distribution function of the current frame image includes:
computing the mean of the skeleton rotation distribution function of the current frame image, and outputting that mean as the skeleton rotation value.
Optionally, after predicting the probabilities of different gesture semantics from the gesture image of the current frame, the method further includes:
screening out the gesture semantics whose probabilities satisfy a set probability threshold.
According to a second aspect of an embodiment of the present application, there is provided a hand capture system, the system comprising:
The data acquisition module is used for collecting a gesture image of the current frame;
The classification prediction module is used for predicting the probabilities of different gesture semantics from the gesture image of the current frame;
The skeleton rotation value calculation module is used for inputting the gesture image of the current frame into the gesture-capture neural network corresponding to each gesture semantic, to obtain the image-module matching probability and skeleton rotation value of each gesture semantic;
The fusion probability calculation module is used for multiplying the image-module matching probability of each gesture semantic by the probability of that gesture semantic, to obtain the fusion probability of each gesture semantic;
The skeleton rotation distribution module is used for normalizing the fusion probabilities of all gesture semantics, and obtaining the skeleton rotation distribution function of the current frame image from the normalized fusion probabilities and the corresponding skeleton rotation values;
The skeleton rotation value output module is used for outputting a skeleton rotation value according to the skeleton rotation distribution function of the current frame image, so as to drive the virtual hand skeleton motion according to that skeleton rotation value.
Optionally, the system further comprises:
the data acquisition module is further used for acquiring the hand-capture pose distribution function and the motion description quantity of the previous frame image;
the skeleton rotation value calculation module is further used for estimating a skeleton rotation value from the motion description quantity of the previous frame image;
and the fusion module is used for fusing, based on Kalman filtering, the hand-capture pose distribution function and skeleton rotation value of the previous frame image with the skeleton rotation distribution function of the current frame image, to obtain the fused skeleton rotation distribution function of the current frame image.
Optionally, the bone rotation distribution module is specifically configured to:
solving a maximum likelihood function over all the processed fusion probabilities and the corresponding skeleton rotation values to obtain the skeleton rotation distribution function of the current frame image.
According to a third aspect of an embodiment of the present application, there is provided an electronic apparatus including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the computer program to perform the method of the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium having stored thereon computer readable instructions executable by a processor to implement the method of the first aspect described above.
In summary, the embodiments of the present application provide a hand capturing method and system that: collect a gesture image of the current frame; predict the probabilities of different gesture semantics from the gesture image of the current frame; input the gesture image of the current frame into the gesture-capture neural network corresponding to each gesture semantic to obtain the image-module matching probability and skeleton rotation value of each gesture semantic; multiply the image-module matching probability of each gesture semantic by the probability of that gesture semantic to obtain the fusion probability of each gesture semantic; normalize the fusion probabilities of all gesture semantics, and obtain the skeleton rotation distribution function of the current frame image from the normalized fusion probabilities and the corresponding skeleton rotation values; and output a skeleton rotation value according to the skeleton rotation distribution function of the current frame image, so as to drive the virtual hand skeleton motion according to that skeleton rotation value. While capture accuracy is preserved, the efficiency of the whole capture process is significantly improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only, and that other implementations can be derived from them without inventive effort.
The structures, proportions, and sizes shown in the present specification are provided only for purposes of illustration and description and are not intended to limit the scope of the invention, which is defined by the claims; any structural modification, change in proportion, or adjustment of size that does not affect the efficacy or purpose achievable by the invention shall fall within its scope.
Fig. 1 is a schematic flow chart of a hand capturing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a hand capture embodiment provided by an embodiment of the present application;
FIG. 3 is a block diagram of a hand capture system according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 5 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Other advantages and benefits of the present invention will become apparent to those skilled in the art from the following detailed description, which describes, by way of illustration, some but not all embodiments of the invention. All other embodiments obtained by those skilled in the art from the disclosed embodiments without inventive effort fall within the scope of the invention.
Fig. 1 shows a hand capturing method according to an embodiment of the present application, where the method includes:
Step 101: collecting a gesture image of a current frame;
Step 102: predicting the probabilities of different gesture semantics according to the gesture image of the current frame;
Step 103: inputting the gesture image of the current frame into a gesture capturing neural network corresponding to each gesture semantic, and obtaining the image module matching probability and skeleton rotation value of each gesture semantic;
step 104: multiplying the image module matching probability and the skeleton rotation value of each gesture semantic with the probability of each gesture semantic respectively to obtain the fusion probability of each gesture semantic;
step 105: normalizing the fusion probability of all gesture semantics, and obtaining a skeleton rotation distribution function of the current frame image according to the processed fusion probability and the corresponding skeleton rotation value;
Step 106: and outputting a skeleton rotation value according to the skeleton rotation distribution function of the current frame image so as to drive the virtual hand skeleton motion according to the frame skeleton rotation value.
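The following minimal sketch wires steps 101-106 together for a single frame; every helper name (recognizer, capture_modules, skeleton.apply) is a hypothetical stand-in under assumed interfaces, not the patent's actual implementation:

```python
import numpy as np

# Hypothetical per-frame pipeline for steps 101-106; all interfaces assumed.
def process_frame(frame, recognizer, capture_modules, skeleton):
    sem_probs = recognizer(frame)                                    # step 102
    top = sorted(sem_probs, key=sem_probs.get, reverse=True)[:3]
    match, rots = zip(*(capture_modules[s](frame) for s in top))     # step 103
    fused = np.array([sem_probs[s] for s in top]) * np.array(match)  # step 104
    fused /= fused.sum()                                             # step 105
    rotation = sum(w * np.asarray(r) for w, r in zip(fused, rots))   # mean, step 106
    skeleton.apply(rotation)                                         # drive the virtual hand
    return rotation
```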
In a possible implementation manner, after predicting the probabilities of different gesture semantics from the gesture image of the current frame in step 102, the method further includes:
screening out the gesture semantics whose probabilities satisfy a set probability threshold, as sketched below.
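A minimal sketch of this screening step, assuming the distribution is held in a dict; the 0.05 threshold is an assumed example value, not one the patent specifies:

```python
# Keep only the semantics whose predicted probability clears the threshold.
def screen(sem_probs: dict, threshold: float = 0.05) -> dict:
    return {s: p for s, p in sem_probs.items() if p >= threshold}
```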
In a possible implementation manner, after obtaining the skeleton rotation distribution function of the current frame image from the processed fusion probabilities and the corresponding skeleton rotation values in step 105, and before outputting the skeleton rotation value according to the skeleton rotation distribution function of the current frame image in step 106, the method further includes:
acquiring the hand-capture pose distribution function and the motion description quantity of the previous frame image; estimating a skeleton rotation value from the motion description quantity of the previous frame image; and fusing, based on Kalman filtering, the hand-capture pose distribution function and estimated skeleton rotation value of the previous frame image with the skeleton rotation distribution function of the current frame image, to obtain the fused skeleton rotation distribution function of the current frame image.
In a possible implementation manner, obtaining the skeleton rotation distribution function of the current frame image from the processed fusion probabilities and the corresponding skeleton rotation values in step 105 includes:
solving a maximum likelihood function over all the processed fusion probabilities and the corresponding skeleton rotation values to obtain the skeleton rotation distribution function of the current frame image.
In a possible implementation manner, outputting the skeleton rotation value according to the skeleton rotation distribution function of the current frame image in step 106 includes:
computing the mean of the skeleton rotation distribution function of the current frame image, and outputting that mean as the skeleton rotation value.
The hand capturing method according to the embodiment of the present application is described in detail below with reference to fig. 2.
In the first aspect, a gesture motion image of the person in front of the camera is acquired.
An ordinary RGB camera is placed in front of the performer, who may stand or sit. The camera should be positioned so that the fingers and palms of both hands remain within the image even when the performer's arms are fully extended. Once the camera is in place, the performer can begin making gestures. Gesture actions include, but are not limited to: palm open, OK sign, thumb down, thumb up, phone call, the numbers 1-9, finger heart, fingers spread, fingers together, pointing at the camera, and so on.
In the second aspect, the gesture type in each collected RGB frame is recognized, and the probabilities of the different gesture-action semantics of the person in front of the camera are predicted. The gesture semantics cover roughly 20 action types: palm open, OK sign, thumb down, thumb up, phone call, the numbers 1-9, finger heart, fingers spread, fingers together, pointing at the camera, etc. The result of this step is a probability distribution over the gesture semantics of the current frame, for example: palm open 0.9, fingers together 0.05, OK sign 0.05, and so on across the 20 semantics.
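As an illustration of the kind of distribution this step produces, the following sketch applies a softmax to the raw scores of an assumed classifier; the class list, raw scores, and function names are hypothetical, since the patent only requires that a network output per-semantic probabilities:

```python
import numpy as np

# Hypothetical illustration of the semantic-classification step.
GESTURE_SEMANTICS = ["palm_open", "fingers_together", "ok_sign"]  # ... up to 20

def predict_semantics(logits: np.ndarray) -> dict:
    """Softmax over raw classifier scores -> per-semantic probabilities."""
    z = logits - logits.max()            # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return dict(zip(GESTURE_SEMANTICS, p))

# Assumed raw scores for the current frame reproduce the example above:
print(predict_semantics(np.array([2.9, 0.0, 0.0])))
# -> approximately {'palm_open': 0.90, 'fingers_together': 0.05, 'ok_sign': 0.05}
```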
By incorporating gesture-recognition semantics, each gesture action is split into distinct semantics, so that each gesture action has its own prediction path and fusion path, which improves the accuracy and robustness of the whole hand-capture system.
In the third aspect, based on the result of the gesture recognition module, each recognized gesture is mapped to a motion semantic, and the gesture-capture module corresponding to the motion semantics obtained in the second aspect is selected. The data collected in the first aspect is input into the gesture-capture module, which outputs a gesture-capture result.
The embodiment of the present application selects the three gesture types with the highest predicted probabilities, selects the three corresponding gesture-capture modules, and obtains a capture result from each. A gesture-capture result consists of a skeleton rotation value and a matching value between the input image and the module (the image-module matching probability). During the subsequent fusion, the larger the matching value, the larger the proportion of that module's skeleton rotation value in the final result.
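A minimal sketch of this selection step under an assumed interface: capture_modules maps each semantic to a per-semantic network that, given the frame, returns an image-module matching probability and a vector of skeleton rotation values. These names and signatures are illustrative only:

```python
# Assumed interface: capture_modules[sem](frame) -> (match_prob, rotations).
def run_top3_capture(semantic_probs: dict, frame, capture_modules: dict):
    top3 = sorted(semantic_probs, key=semantic_probs.get, reverse=True)[:3]
    results = []
    for sem in top3:
        match_prob, rotations = capture_modules[sem](frame)  # per-semantic net
        results.append((sem, semantic_probs[sem], match_prob, rotations))
    return results
```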
In the fourth aspect, the hand skeleton joint rotation values output by the three different modules are fused, using the image-module matching values predicted by the capture modules together with the gesture probability values output by the gesture recognition module. Specifically:
The image-module matching probability predicted by the gesture-capture module corresponding to each of the three output skeleton rotation values is multiplied by the gesture probability value output by the gesture recognition module, yielding three new probabilities; these three probabilities are then normalized so that they sum to one. This step guarantees the soundness of the probabilities used during fusion: a hand-joint rotation value that both the gesture recognition module and the gesture-capture module predict with high probability receives the highest proportion in the fusion.
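A worked sketch of this product-and-normalize step, reusing the example semantic probabilities from the second aspect; the matching values are assumed for illustration:

```python
import numpy as np

# Semantic probabilities from the recognition module (example above) and
# assumed image-module matching values from the three capture modules.
semantic_prob = np.array([0.90, 0.05, 0.05])
match_prob    = np.array([0.80, 0.50, 0.30])   # assumed values

fused = semantic_prob * match_prob             # element-wise product
fused /= fused.sum()                           # normalize to sum to one
print(fused)                                   # -> approx. [0.947, 0.033, 0.020]
```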
In the fifth aspect, a maximum-likelihood solve based on a Gaussian mixture model is carried out over the fusion probabilities and the corresponding skeleton rotation values obtained in the previous step, yielding the skeleton rotation distribution function of the current frame image, from which the mean and variance of the current observation distribution are computed.
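A minimal sketch of these observation statistics, treating each module's rotation value as one mixture component weighted by its normalized fusion probability; the fixed per-component variance is an assumption, since the patent does not state how component spreads are obtained:

```python
import numpy as np

# Mixture mean and variance of the observation distribution for one joint angle.
def observation_stats(weights, rotations, sigma2=1e-3):
    w = np.asarray(weights)       # normalized fusion probabilities
    mu = np.asarray(rotations)    # one rotation value per component (radians)
    mean = np.sum(w * mu)                          # E[X]
    var = np.sum(w * (sigma2 + mu**2)) - mean**2   # E[X^2] - E[X]^2
    return mean, var

mean, var = observation_stats([0.947, 0.033, 0.020], [0.52, 0.48, 0.61])
```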
In the sixth aspect, Kalman filtering is used to fuse the hand-capture pose distribution function of the previous frame image with the Gaussian-mixture pose distribution obtained for the current frame: a skeleton rotation value is predicted from the previous frame's motion description quantity, the distribution of the resulting skeleton rotation value is obtained, and the mean of that distribution is taken as the final output of the system.
Combining the current hand-capture result with the capture results before the current moment improves the temporal smoothness of the overall capture effect, dynamically adjusts the fusion parameters of the capture process, and automatically removes jitter caused by noise. By combining the observation distribution and the prediction distribution, the Kalman-filtering method smooths the overall hand-capture effect in the time dimension and strengthens the interference resistance of the whole hand-capture system.
Because the system must deliver the same real-time performance as traditional capture gloves, its input is only a single frame at a time, so the captured gesture motion tends to jitter temporally when the gesture type changes. A Kalman-filter-based fusion smoothing module is therefore proposed to dynamically adjust the fusion smoothing coefficients in step with the gesture-type transition process. This module fuses and smooths gesture actions over time, increasing the smoothness of virtual-character hand capture and its adaptability to different scenes.
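The following is a minimal per-joint sketch of this fusion, under the assumptions that the motion description quantity is a simple angular velocity and that the process noise is a fixed scalar; none of these modeling choices are specified by the patent:

```python
# Scalar Kalman update for one joint angle; all noise parameters assumed.
def kalman_fuse(prev_mean, prev_var, velocity, dt, obs_mean, obs_var, q=1e-4):
    # Predict from the previous frame's pose plus its motion description.
    pred_mean = prev_mean + velocity * dt
    pred_var = prev_var + q                  # q: assumed process noise
    # Update with the current frame's observation distribution (fifth aspect).
    gain = pred_var / (pred_var + obs_var)   # Kalman gain
    mean = pred_mean + gain * (obs_mean - pred_mean)
    var = (1.0 - gain) * pred_var
    return mean, var                         # `mean` drives the virtual skeleton
```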
In summary, the embodiment of the present application provides a hand capturing method that: collects a gesture image of the current frame; predicts the probabilities of different gesture semantics and screens out those satisfying a set probability threshold; inputs the gesture image of the current frame into the gesture-capture neural network corresponding to each gesture semantic, obtaining the image-module matching probability and skeleton rotation value of each gesture semantic; multiplies the image-module matching probability of each gesture semantic by the probability of that gesture semantic to obtain the fusion probability of each gesture semantic; normalizes the fusion probabilities of all gesture semantics, and obtains the skeleton rotation distribution function of the current frame image from the normalized fusion probabilities and the corresponding skeleton rotation values; and outputs a skeleton rotation value according to the skeleton rotation distribution function of the current frame image, so as to drive the virtual hand skeleton motion according to that skeleton rotation value. While capture accuracy is preserved, the efficiency of the whole capture process is significantly improved.
Based on the same technical concept, the embodiment of the application further provides a hand capturing system, as shown in fig. 3, the system includes:
the data acquisition module 301 is configured to acquire a current frame gesture image;
the classification prediction module 302 is configured to screen probabilities of gesture semantics meeting the conditions according to a set probability threshold;
The skeleton rotation value calculation module 303 is configured to input the gesture image of the current frame into the gesture-capture neural network corresponding to each gesture semantic, to obtain the image-module matching probability and skeleton rotation value of each gesture semantic;
The fusion probability calculation module 304 is configured to multiply the image-module matching probability of each gesture semantic by the probability of that gesture semantic, to obtain the fusion probability of each gesture semantic;
The skeleton rotation distribution module 305 is configured to normalize the fusion probabilities of all gesture semantics, and to obtain the skeleton rotation distribution function of the current frame image from the normalized fusion probabilities and the corresponding skeleton rotation values;
The skeleton rotation value output module 306 is configured to output a skeleton rotation value according to the skeleton rotation distribution function of the current frame image, so as to drive the virtual hand skeleton motion according to that skeleton rotation value.
In one possible embodiment, the system further comprises: the data acquisition module 301, further configured to acquire the hand-capture pose distribution function and the motion description quantity of the previous frame image;
the skeleton rotation value calculation module 303, further configured to estimate a skeleton rotation value from the motion description quantity of the previous frame image;
and a fusion module, configured to fuse, based on Kalman filtering, the hand-capture pose distribution function and skeleton rotation value of the previous frame image with the skeleton rotation distribution function of the current frame image, to obtain the fused skeleton rotation distribution function of the current frame image.
In one possible embodiment, the bone rotation distribution module 305 is specifically configured to:
solve a maximum likelihood function over all the processed fusion probabilities and the corresponding skeleton rotation values to obtain the skeleton rotation distribution function of the current frame image.
The embodiment of the application also provides electronic equipment corresponding to the method provided by the embodiment. Referring to fig. 4, a schematic diagram of an electronic device according to some embodiments of the present application is shown. The electronic device 20 may include: a processor 200, a memory 201, a bus 202 and a communication interface 203, the processor 200, the communication interface 203 and the memory 201 being connected by the bus 202; the memory 201 stores a computer program executable on the processor 200, and the processor 200 executes the method according to any of the foregoing embodiments of the present application when the computer program is executed.
The memory 201 may include a high-speed random access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. The communication connection between the system network element and at least one other network element is implemented through at least one communication interface 203 (which may be wired or wireless), and may use the internet, a wide area network, a local network, a metropolitan area network, etc.
Bus 202 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. The memory 201 is configured to store a program, and the processor 200 executes the program after receiving an execution instruction, and the method disclosed in any of the foregoing embodiments of the present application may be applied to the processor 200 or implemented by the processor 200.
The processor 200 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 200 or by instructions in the form of software. The processor 200 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware decoding processor, or in a combination of hardware and software modules within a decoding processor. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and, in combination with its hardware, performs the steps of the above method.
Since the electronic device provided by this embodiment of the application arises from the same inventive concept as the method provided by the embodiments of the application, it has the same beneficial effects as the method it adopts, runs, or implements.
The present application further provides a computer-readable storage medium corresponding to the method provided by the foregoing embodiments. Referring to fig. 5, the computer-readable storage medium is shown as an optical disc 30, on which a computer program (i.e., a program product) is stored; when executed by a processor, the computer program performs the method provided by any of the foregoing embodiments.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.
Since the computer-readable storage medium provided by the above embodiments of the present application arises from the same inventive concept as the method provided by the embodiments of the present application, it has the same beneficial effects as the method adopted, run, or implemented by the application program stored on it.
It should be noted that:
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may also be used with the teachings herein. The required structure for the construction of such devices is apparent from the description above. In addition, the present application is not directed to any particular programming language. It will be appreciated that the teachings of the present application described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components of the apparatus according to an embodiment of the present application may be implemented in practice using a microprocessor or digital signal processor (DSP). The present application can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program embodying the present application may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names.
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A hand capture method, the method comprising:
collecting a gesture image of the current frame;
predicting the probabilities of different gesture semantics from the gesture image of the current frame;
inputting the gesture image of the current frame into the gesture-capture neural network corresponding to each gesture semantic, and obtaining the image-module matching probability and skeleton rotation value of each gesture semantic;
multiplying the image-module matching probability of each gesture semantic by the probability of that gesture semantic to obtain the fusion probability of each gesture semantic;
normalizing the fusion probabilities of all gesture semantics, and obtaining the skeleton rotation distribution function of the current frame image from the normalized fusion probabilities and the corresponding skeleton rotation values;
and outputting a skeleton rotation value according to the skeleton rotation distribution function of the current frame image, so as to drive the virtual hand skeleton motion according to the skeleton rotation value.
2. The method of claim 1, wherein obtaining the skeleton rotation distribution function of the current frame image from the processed fusion probabilities and the corresponding skeleton rotation values comprises:
solving a maximum likelihood function over all the processed fusion probabilities and the corresponding skeleton rotation values to obtain the skeleton rotation distribution function of the current frame image.
3. The method of claim 1, wherein outputting the skeleton rotation value according to the skeleton rotation distribution function of the current frame image comprises:
computing the mean of the skeleton rotation distribution function of the current frame image, and outputting that mean as the skeleton rotation value.
4. The method of claim 1, wherein after predicting the probabilities of different gesture semantics from the gesture image of the current frame, the method further comprises:
screening out the gesture semantics whose probabilities satisfy a set probability threshold.
5. A hand capture system, the system comprising:
The data acquisition module is used for collecting a gesture image of the current frame;
The classification prediction module is used for predicting the probabilities of different gesture semantics from the gesture image of the current frame;
The skeleton rotation value calculation module is used for inputting the gesture image of the current frame into the gesture-capture neural network corresponding to each gesture semantic, to obtain the image-module matching probability and skeleton rotation value of each gesture semantic;
The fusion probability calculation module is used for multiplying the image-module matching probability of each gesture semantic by the probability of that gesture semantic, to obtain the fusion probability of each gesture semantic;
The skeleton rotation distribution module is used for normalizing the fusion probabilities of all gesture semantics, and obtaining the skeleton rotation distribution function of the current frame image from the normalized fusion probabilities and the corresponding skeleton rotation values;
And the skeleton rotation value output module is used for outputting a skeleton rotation value according to the skeleton rotation distribution function of the current frame image, so as to drive the virtual hand skeleton motion according to the skeleton rotation value.
6. The system of claim 5, wherein the skeleton rotation distribution module is configured to:
solve a maximum likelihood function over all the processed fusion probabilities and the corresponding skeleton rotation values to obtain the skeleton rotation distribution function of the current frame image.
7. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when running the computer program, implements the method according to any one of claims 1-4.
8. A computer readable storage medium having stored thereon computer readable instructions executable by a processor to implement the method of any of claims 1-4.
CN202210497950.4A 2022-05-07 2022-05-07 Hand capturing method and system Active CN115079818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210497950.4A CN115079818B (en) 2022-05-07 2022-05-07 Hand capturing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210497950.4A CN115079818B (en) 2022-05-07 2022-05-07 Hand capturing method and system

Publications (2)

Publication Number Publication Date
CN115079818A CN115079818A (en) 2022-09-20
CN115079818B 2024-07-16

Family

ID=83247308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210497950.4A Active CN115079818B (en) 2022-05-07 2022-05-07 Hand capturing method and system

Country Status (1)

Country Link
CN (1) CN115079818B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116719416B (en) * 2023-08-07 2023-12-15 海马云(天津)信息技术有限公司 Gesture motion correction method and device for virtual digital person, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965592A (en) * 2015-07-08 2015-10-07 苏州思必驰信息科技有限公司 Voice and gesture recognition based multimodal non-touch human-machine interaction method and system
CN108196679A (en) * 2018-01-23 2018-06-22 河北中科恒运软件科技股份有限公司 Gesture-capture and grain table method and system based on video flowing

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140083848A (en) * 2012-12-26 2014-07-04 전남대학교산학협력단 Gesture Recognition Method and Apparatus Using Sensor Data
US9946354B2 (en) * 2014-08-29 2018-04-17 Microsoft Technology Licensing, Llc Gesture processing using a domain-specific gesture language
CN105205454A (en) * 2015-08-27 2015-12-30 深圳市国华识别科技开发有限公司 System and method for capturing target object automatically
EP3467707B1 (en) * 2017-10-07 2024-03-13 Tata Consultancy Services Limited System and method for deep learning based hand gesture recognition in first person view
CN108846378A (en) * 2018-07-03 2018-11-20 百度在线网络技术(北京)有限公司 Sign Language Recognition processing method and processing device
CN108921942B (en) * 2018-07-11 2022-08-02 北京聚力维度科技有限公司 Method and device for 2D (two-dimensional) conversion of image into 3D (three-dimensional)
US11756291B2 (en) * 2018-12-18 2023-09-12 Slyce Acquisition Inc. Scene and user-input context aided visual search
CN110084192B (en) * 2019-04-26 2023-09-26 南京大学 Rapid dynamic gesture recognition system and method based on target detection
CN111160114B (en) * 2019-12-10 2024-03-19 深圳数联天下智能科技有限公司 Gesture recognition method, gesture recognition device, gesture recognition equipment and computer-readable storage medium
US11295120B2 (en) * 2020-05-06 2022-04-05 Nec Corporation Of America Hand gesture habit forming
CN113792651B (en) * 2021-09-13 2024-04-05 广州广电运通金融电子股份有限公司 Gesture interaction method, device and medium integrating gesture recognition and fingertip positioning
CN114217792A (en) * 2021-11-29 2022-03-22 上海瑞家信息技术有限公司 Page loading method, equipment and device and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965592A (en) * 2015-07-08 2015-10-07 苏州思必驰信息科技有限公司 Voice and gesture recognition based multimodal non-touch human-machine interaction method and system
CN108196679A (en) * 2018-01-23 2018-06-22 河北中科恒运软件科技股份有限公司 Gesture-capture and grain table method and system based on video flowing

Also Published As

Publication number Publication date
CN115079818A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN109871781B (en) Dynamic gesture recognition method and system based on multi-mode 3D convolutional neural network
CN108875732B (en) Model training and instance segmentation method, device and system and storage medium
TWI714834B (en) Human face live detection method, device and electronic equipment
CN108010031B (en) Portrait segmentation method and mobile terminal
CN107392842B (en) Image stylization processing method and device, computing equipment and computer storage medium
CN111583097A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
WO2023280148A1 (en) Blood vessel segmentation method and apparatus, and electronic device and readable medium
CN109785246B (en) Noise reduction method, device and equipment for non-local mean filtering
CN109840883B (en) Method and device for training object recognition neural network and computing equipment
CN107277615B (en) Live broadcast stylization processing method and device, computing device and storage medium
CN111028006B (en) Service delivery auxiliary method, service delivery method and related device
CN111723687A (en) Human body action recognition method and device based on neural network
CN111008935B (en) Face image enhancement method, device, system and storage medium
CN108399599B (en) Image processing method and device and electronic equipment
CN108921131B (en) Method and device for generating face detection model and three-dimensional face image
CN112602319B (en) Focusing device, method and related equipment
CN115079818B (en) Hand capturing method and system
CN107959798B (en) Video data real-time processing method and device and computing equipment
CN107610046A (en) Background-blurring method, apparatus and system
CN111081266A (en) Training generation countermeasure network, and voice enhancement method and system
CN111597966B (en) Expression image recognition method, device and system
CN112184580A (en) Face image enhancement method, device, equipment and storage medium
WO2024011859A1 (en) Neural network-based face detection method and device
CN112561822B (en) Beautifying method and device, electronic equipment and storage medium
CN112949348A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant