CN116091875A - Model training method, living body detection method, electronic device, and storage medium - Google Patents

Info

Publication number
CN116091875A
Authority
CN
China
Prior art keywords
network
network layer
branch
living body
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310375684.2A
Other languages
Chinese (zh)
Other versions
CN116091875B (en)
Inventor
刘冲冲
付贤强
何武
朱海涛
户磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Dilusense Technology Co Ltd
Original Assignee
Hefei Dilusense Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Dilusense Technology Co Ltd
Priority to CN202310375684.2A
Publication of CN116091875A
Application granted
Publication of CN116091875B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application relate to the field of image recognition and disclose a model training method, a living body detection method, an electronic device, and a storage medium. The model training method comprises: performing feature extraction on a face image through a plurality of branch networks in a feature extraction network to obtain a plurality of face features; determining a plurality of first prediction probabilities based on the plurality of face features, and obtaining a second prediction probability based on the plurality of first prediction probabilities; and performing iterative training on the feature extraction network. Each branch network has one network layer serving as a specific network layer, and in each training iteration each specific network layer is configured so that its input features comprise the output features of the previous network layer in its own branch network and, optionally, a feature obtained by fusing the output features of the previous network layers of at least one of the specific network layers. The training method improves the accuracy, stability, and efficiency of feature extraction of each branch network.

Description

Model training method, living body detection method, electronic device, and storage medium
Technical Field
The embodiments of the application relate to the technical field of image recognition, and in particular to a model training method, a living body detection method, an electronic device, and a storage medium.
Background
Face image recognition has been a very popular AI technology in recent years and is widely used in production and daily life across many fields. Products that employ face image recognition typically also require living body detection, so that authorization can be denied to malicious attacks that use props such as photographs, videos, masks, dummy models, and head covers.
The most widely used image-based living body detection technology at present takes a face image as the input of a living body detection model and outputs whether the image shows a living body. However, the props available for malicious attacks keep changing, and the performance of general living body detection techniques on entirely new attack types is far below expectations. To address this problem, some living body detection methods design multiple models or multiple branches, each responsible for handling a different attack type. However, such methods require the attack types to be divided manually and rely on the prior knowledge of human experts. The training process is complex and time-consuming, the manual division of attack types carries a certain subjective bias, and when an attack type falls outside the prior knowledge there is a considerable risk of misjudgment and therefore a potential safety hazard.
Disclosure of Invention
The embodiments of the application aim to provide a model training method, a living body detection method, an electronic device, and a storage medium. Through the configuration of a specific network layer in each of a plurality of branch networks, the feature extraction network automatically determines, during training, the prosthesis types that each branch network is responsible for handling, so that the potential of each branch network is fully exploited and the accuracy, stability, and efficiency of feature extraction of each branch network are greatly improved.
To solve the above technical problem, embodiments of the present application provide a model training method, including: extracting features of a face image through a plurality of branch networks included in a feature extraction network to obtain a plurality of face features; determining, based on the plurality of face features, a plurality of first prediction probabilities that the face image belongs to a living body, and obtaining, based on the plurality of first prediction probabilities, a second prediction probability that the face image belongs to a living body; and performing iterative training on the feature extraction network. Each branch network has one network layer serving as a specific network layer; the specific network layers are located at the same layer position in their respective branch networks and are configured in each training iteration so that the input features of each specific network layer comprise the output features of the previous network layer in its own branch network and, optionally, a feature obtained by fusing the output features of the previous network layers of at least one of the specific network layers.
The embodiments of the application also provide a living body detection method, including: inputting a face image to be detected into a trained feature extraction network to obtain a plurality of face features; obtaining, according to the plurality of face features, a plurality of first prediction probabilities that the face image belongs to a living body, and obtaining, based on the plurality of first prediction probabilities, a second prediction probability that the face image belongs to a living body; when the second prediction probability is greater than or equal to a preset living body threshold, determining that the face image to be detected is a living body; and when the second prediction probability is smaller than the preset living body threshold, determining that the face image to be detected is a prosthesis. The trained feature extraction network is obtained through the model training method of the above embodiment.
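The threshold decision in the living body detection method can be illustrated with a minimal Python sketch. The function name `classify_liveness` and the default threshold of 0.5 are assumptions made for illustration only; the application does not specify a threshold value.

```python
def classify_liveness(second_prob: float, live_threshold: float = 0.5) -> str:
    """Map the second prediction probability to a decision.

    live_threshold is a hypothetical default; the preset living body
    threshold is left open in the text.
    """
    if not 0.0 <= second_prob <= 1.0:
        raise ValueError("second_prob must lie in [0, 1]")
    # Greater than or equal to the threshold: living body; otherwise: prosthesis.
    return "living body" if second_prob >= live_threshold else "prosthesis"


decision = classify_liveness(0.73)
```

Note that a probability exactly equal to the threshold is treated as a living body, matching the "greater than or equal to" wording above.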
The embodiment of the application also provides electronic equipment, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method as mentioned in the above embodiments or to perform the living detection method as mentioned in the above embodiments.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the model training method mentioned in the above embodiments, or is capable of executing the living body detection method mentioned in the above embodiments.
The feature extraction network trained by the model training method comprises a plurality of branch networks. In each training iteration, each branch network has a specific network layer located at the same layer position, and the input features of the specific network layer comprise the output features of the previous network layer in its own branch network and, optionally, a feature obtained by fusing the output features of the previous network layers of at least one of the specific network layers. Therefore, over repeated training iterations, each branch network is trained according to its own final output features while its specific network layer ties in the intermediate-layer output features of other branch networks, so that the branch networks are trained jointly. This mutual auxiliary training reduces redundant training and lets the feature extraction network automatically determine, during training, the prosthesis types that each branch network is responsible for handling, which addresses the misidentification caused by the prior art's reliance on subjective manual division of prosthesis attack types. Meanwhile, through the configuration of the specific network layer, each branch network is able to distinguish living bodies from prostheses both according to its own output and according to the outputs of some or all of the branch networks, which fully exploits the potential of each branch network and greatly improves the accuracy, stability, and efficiency of its feature extraction.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.
FIG. 1 is a flow chart of a model training method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a feature extraction network according to an embodiment of the present application;
FIG. 3 is a flow chart of a living body detection method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of each embodiment of the present application will be given with reference to the accompanying drawings. However, those of ordinary skill in the art will understand that in various embodiments of the present application, numerous technical details have been set forth in order to provide a better understanding of the present application. However, the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.
The implementation details of the model training method of the present embodiment are exemplified below. The following is merely an implementation detail provided for ease of understanding and is not necessary to practice the present embodiments.
Embodiments of the present application relate to a model training method, as shown in fig. 1, including:
Step 101: extract features of the face image through a plurality of branch networks included in the feature extraction network to obtain a plurality of face features. Each branch network has one network layer serving as a specific network layer; the specific network layers are located at the same layer position in their respective branch networks and are configured in each training iteration so that the input features of each specific network layer comprise the output features of the previous network layer in its own branch network and, optionally, a feature obtained by fusing the output features of the previous network layers of at least one of the specific network layers.
Specifically, in this embodiment, the face images in a sample set are input into the feature extraction network to obtain a plurality of face features. The sample set comprises living body face images and prosthesis face images, and each face image has a label marking whether it belongs to a living body or a prosthesis. The living body face images may be face images of the same person at different shooting angles, with different accessories, and at different ages, or face images of different persons at different shooting angles, with different accessories, and at different ages. The prosthesis face images may cover a variety of prosthesis types, for example: a prosthesis face image obtained by photographing a photo, one obtained by photographing a dummy model, one obtained by photographing a real person wearing a head cover, and the like.
In this embodiment, the plurality of branch networks may have the same network structure or different network structures. It will be appreciated that even when the network structures are identical, the specific values of the learnable parameters of the branch networks after training is completed may differ. Branch networks with different network structures may differ in network type, for example: convolutional neural networks, residual neural networks, SVM (support vector machine) networks, and the like. They may also share a network type but differ in specific structure, for example: neural networks with different numbers of pooling layers, concatenation layers, or fully connected layers, or neural networks with different internal connection relations. They may also differ in network configuration parameters, for example: neural networks with different convolution kernel sizes, different learning rates, different back-propagation weight decay values, and the like.
Each branch network has a specific network layer, which may be the network layer at any one layer position that is the same across the branch networks. Taking the feature extraction network including 4 branch networks as shown in FIG. 2 as an example, the specific network layer may be the k-th network layer of each branch network, and the input features of the k-th network layer (the specific network layer) of each branch network fall into two cases. In the first case, the input features include two parts: one part is the output features of the (k-1)-th network layer in the same branch network (indicated by solid arrows in FIG. 2), and the other part is the feature obtained by fusing the output features of the previous network layers of at least one of the k-th network layers (indicated by the summation of the dashed arrows in FIG. 2). In the second case, the input features are only the output features of the (k-1)-th network layer of the same branch network (indicated by solid arrows in FIG. 2).
That is, in each training iteration, the input of a specific network layer must include the output features of the previous network layer in its own branch network, and may also include the output features of the previous network layers of the specific network layers of any n branch networks, where n ranges from 0 to N and N is the number of branch networks.
For example, when n=1 and the selected branch network is branch network 1, the input of the k-th network layer of branch network 1 includes the output features of the (k-1)-th network layer of branch network 1 twice (once as its own previous-layer output and once as the selected feature); the input of the k-th network layer of branch network 2 includes the output features of the (k-1)-th network layer of branch network 2 and the output features of the (k-1)-th network layer of branch network 1. The other branch networks follow by analogy.
As another example, when n=2 and the two selected branch networks are branch network 2 and branch network 3, the input of the k-th network layer of branch network 1 includes the output features of the (k-1)-th network layer of branch network 1 and the feature obtained by fusing the output features of the (k-1)-th network layers of branch network 2 and branch network 3; the input of the k-th network layer of branch network 2 includes the output features of the (k-1)-th network layer of branch network 2 and the same fused feature of the (k-1)-th network layer outputs of branch network 2 and branch network 3. The other branch networks follow by analogy.
That is, when n=0, the input of each specific network layer is only the output features of the previous network layer in its own branch network, i.e., each branch network is trained alone, so that each branch network is able to distinguish living bodies from prostheses according to its own output. When n takes a value in the range of 1 to N, the input of each specific network layer includes the output features of the previous network layer in its own branch network and the feature obtained by fusing the output features of the previous network layers of at least one of the specific network layers. In this way, training one branch network also helps train the other branch networks, which achieves auxiliary training, reduces redundant training, and gives each branch network the ability to distinguish living bodies from prostheses according to the outputs of some or all of the branch networks.
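The n=0, n=1, and n=2 cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: features are plain lists of floats, fusion is element-wise addition (suggested by the summed dashed arrows in FIG. 2), and the helper names `fuse` and `specific_layer_inputs` are hypothetical. How the own-branch part and the fused part are combined inside the specific network layer is not spelled out in the text, so the sketch simply returns them as a pair.

```python
from typing import List, Optional, Sequence, Tuple

Feature = List[float]


def fuse(features: Sequence[Feature]) -> Optional[Feature]:
    """Fuse features by element-wise addition; None when nothing is selected (n = 0)."""
    if not features:
        return None
    return [sum(values) for values in zip(*features)]


def specific_layer_inputs(
    prev_outputs: Sequence[Feature], selected: Sequence[int]
) -> List[Tuple[Feature, Optional[Feature]]]:
    """For every branch, pair its own (k-1)-th layer output with the fused feature.

    prev_outputs[i] is the (k-1)-th layer output of branch i (0-based index);
    selected holds the 0-based indices of the n selected branches.
    """
    fused = fuse([prev_outputs[j] for j in selected])
    return [(own, fused) for own in prev_outputs]


# Three branches with 2-dimensional (k-1)-th layer outputs.
prev = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
alone = specific_layer_inputs(prev, [])         # n = 0: each branch trains alone
single = specific_layer_inputs(prev, [0])       # n = 1: branch 1 selected
pair = specific_layer_inputs(prev, [1, 2])      # n = 2: branches 2 and 3 selected
```

With `selected=[0]`, the first branch receives its own output twice, which mirrors the n=1 example in the text.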
It should be noted that the specific network layer cannot be the first network layer of a branch network. The input of the first network layer is a face image rather than a feature vector, and the first layer has no previous network layer, so the specific network layer must be a non-first layer (the layer index k of the specific network layer ranges from 2 to K, where K is the number of network layers included in each branch network). In addition, each network layer in each branch network may, besides feature extraction, have other functions such as a feature shaping function, a feature scalar function, and a feature dimension reduction function. Of course, to reduce training complexity, the number of network layers included in each branch network is preferably set to be the same.
In addition, the input of each specific network layer is reconfigured in every training iteration, i.e., the input of a specific network layer may differ from one iteration to the next.
Step 102, determining a plurality of first prediction probabilities that the face image belongs to the living body based on the plurality of face features, and obtaining a second prediction probability that the face image belongs to the living body based on the plurality of first prediction probabilities.
Specifically, the plurality of first prediction probabilities are the prediction results of the plurality of branch networks for the input face image. The first prediction probabilities may be obtained with an activation function commonly used in deep learning, for example the sigmoid, tanh, ReLU, or Leaky ReLU activation function, or the plurality of face features may be input into a classifier commonly used in deep learning to obtain the plurality of first prediction probabilities.
The second prediction probability that the face image belongs to a living body, i.e., the probability value finally used for living body detection, is obtained from the plurality of first prediction probabilities. Specifically, the second prediction probability may be obtained by averaging the plurality of first prediction probabilities, by multiplying the plurality of first prediction probabilities together, or by removing the maximum value and the minimum value from the plurality of first prediction probabilities and averaging the remaining ones.
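The three ways of combining the first prediction probabilities named above can be sketched as follows. The function names are hypothetical, and the requirement of at least three probabilities for the trimmed mean is an assumption added so that something remains after the maximum and minimum are removed.

```python
import math
from typing import Sequence


def mean_prob(probs: Sequence[float]) -> float:
    """Second prediction probability as the average of the first probabilities."""
    return sum(probs) / len(probs)


def product_prob(probs: Sequence[float]) -> float:
    """Second prediction probability as the product of the first probabilities."""
    return math.prod(probs)


def trimmed_mean_prob(probs: Sequence[float]) -> float:
    """Drop one maximum and one minimum, then average the remaining probabilities."""
    if len(probs) < 3:
        raise ValueError("need at least three first prediction probabilities")
    kept = sorted(probs)[1:-1]
    return sum(kept) / len(kept)


first_probs = [0.9, 0.8, 0.6, 0.1]  # one value per branch network (made-up numbers)
second_mean = mean_prob(first_probs)
second_product = product_prob(first_probs)
second_trimmed = trimmed_mean_prob(first_probs)
```

The product rule is the strictest of the three, since a single low branch probability pulls the result toward zero, whereas the trimmed mean discounts a single outlying branch.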
Step 103: perform iterative training on the feature extraction network.
Specifically, a plurality of face images, including both prostheses and living bodies, are input to train the feature extraction network. The specific network layers in the feature extraction network are configured anew in each training iteration, and training proceeds with a training method commonly used in deep learning (such as gradient descent, Newton's method, the conjugate gradient method, or the Levenberg-Marquardt algorithm) until a converged feature extraction network is obtained.
In this embodiment, the face features output by each branch network both cover the output of each network layer before a specific network layer in the branch network itself and cover the output of each network layer before a specific network layer in other branch networks in the feature extraction process, so that the face features finally output by each branch network are different and related to each other. Therefore, a plurality of facial features which are different from each other and are associated with each other can automatically determine the type of the prosthesis which is more suitable for processing of the corresponding branch network through training.
In an embodiment, the feature extraction network further comprises an output network and a fusion network. The configuration process of each specific network layer in each training iteration comprises: outputting, through the output network, a plurality of random parameters that follow a Bernoulli distribution, one per branch network, and selecting zero or at least one branch network according to the plurality of random parameters; and, for each specific network layer, fusing through the fusion network the output features of the previous network layers of the specific network layers in the selected branch networks to obtain the fused feature.
In this embodiment, the output network is configured to output a plurality of random parameters $\alpha_i$ ($i = 1, 2, \ldots, N$, where $N$ is the number of branch networks) that follow a Bernoulli distribution, one per branch network. Each random parameter takes the value 0 or 1: 0 indicates that the corresponding branch network is not selected, and 1 indicates that it is selected. Taking the feature extraction network including 4 branch networks as shown in FIG. 2 as an example, if the 4 random parameters output by the output network are $\{\alpha_1 = 0, \alpha_2 = 1, \alpha_3 = 0, \alpha_4 = 1\}$, the 2nd and 4th branch networks are selected. If they are $\{\alpha_1 = 0, \alpha_2 = 0, \alpha_3 = 0, \alpha_4 = 0\}$, none of the 4 branch networks is selected.
For the selected branch networks, the output features of the previous network layers of the specific network layers in the selected branch networks are fused through the fusion network to obtain the fused feature.
Further, in each training iteration, the probability value corresponding to the random parameters is a hyperparameter of the training process. Since the random parameters follow a Bernoulli distribution, $P(\alpha_i = 1) = p$ and $P(\alpha_i = 0) = 1 - p$, where $p$ is the hyperparameter. Adjusting the probability value $p$ therefore indirectly adjusts the values of the random parameters output by the output network, which is equivalent to indirectly adjusting which branch networks are selected each time. Of course, even if the probability values are the same, the values of the corresponding random parameters may differ. For example, the probability values corresponding to the random parameters {0,1,0,1} and {1, 0} are equal, but the specific branch networks selected are different.
In addition, the vector dimensions of the output features of the previous network layers of the specific network layers are the same across the branch networks. Specifically, to make the output features easy to fuse, the previous network layer of the specific network layer in each branch network has both a feature extraction function and a feature shaping function, so that the vector dimensions of its output features are the same.
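The selection and fusion steps of this embodiment can be sketched in Python as follows. The sketch rests on several assumptions: the output network is reduced to independent Bernoulli draws with a shared probability `p`, the fusion network is reduced to element-wise addition, and the names `sample_selection` and `fuse_selected` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_selection(num_branches: int, p: float, rng: random.Random) -> List[int]:
    """Draw one Bernoulli(p) indicator alpha_i per branch network (1 = selected)."""
    return [1 if rng.random() < p else 0 for _ in range(num_branches)]


def fuse_selected(
    prev_outputs: Sequence[Sequence[float]], alphas: Sequence[int]
) -> Optional[List[float]]:
    """Add up the previous-layer outputs of the selected branches.

    Returns None when no branch is selected. All features must share the
    same vector dimension, as the text requires.
    """
    chosen = [feat for feat, alpha in zip(prev_outputs, alphas) if alpha == 1]
    if not chosen:
        return None
    return [sum(values) for values in zip(*chosen)]


rng = random.Random(0)                 # fixed seed only for reproducibility
alphas = sample_selection(4, 0.5, rng)  # redrawn in every training iteration
prev = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
fused = fuse_selected(prev, [0, 1, 0, 1])  # 2nd and 4th branches selected
```

Redrawing `alphas` in every iteration is what makes the input of each specific network layer vary between iterations, and raising or lowering `p` shifts how many branches are fused on average.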
In one embodiment, iteratively training a feature extraction network includes: constructing a loss function based on the second predictive probability; for a plurality of learnable parameters of the feature extraction network, determining a neighborhood range by taking the minimum loss value of the loss function as the center, and acquiring a plurality of parameter offsets of the plurality of learnable parameters corresponding to the maximum loss value in the neighborhood range; updating the plurality of learnable parameters by adopting a plurality of parameter offsets to obtain a plurality of offset learnable parameters; and training and updating a plurality of offset learning parameters of the feature extraction network according to the loss function until the feature extraction network converges.
It should be noted that, in the conventional training method, only the loss value of the loss function reaches a minimum value at a certain point, and whether the minimum value is stable is not concerned, so that the input of the network is slightly disturbed (for example, the type of the prosthesis of the input face image does not appear in the previous training set, the quality of the input face image is poor, etc.), and the network may miss the minimum value point to obtain a larger loss value. That is, when the conventional training method is used for training the feature extraction network and applying the feature extraction network to prediction, the problem of false facial image recognition of a brand new prosthesis type which does not appear in the training set is easy to appear, and the network prediction is unstable.
Based on this, the present embodiment trains the offset learnable parameters of the feature extraction network, which are obtained from the maximum loss value within the neighborhood range, so that even the maximum loss value within the neighborhood range is driven to a small value. Consequently, even if the input of the feature extraction network is perturbed, a network trained in this way remains stable and robust and can accurately identify face images of a brand-new prosthesis type.
Specifically, the minimum loss value of the loss function is obtained and a neighborhood range is determined from it; the neighborhood range can be adjusted as needed according to requirements on training time, network stability, and the like. A plurality of parameter offsets of the plurality of learnable parameters corresponding to the maximum loss value within the neighborhood range are then acquired. A parameter offset can be understood simply as the difference between the learnable parameter corresponding to the maximum loss value within the neighborhood range and the learnable parameter corresponding to the minimum loss value. The learnable parameters are updated with the parameter offsets to obtain the offset learnable parameters, which are then trained and updated using the loss function.
The loss function constructed based on the second prediction probability is as follows:
Figure SMS_1
where loss is the loss function; r and m are hyperparameters greater than 0; pred^(b) is the second prediction probability that the b-th face image belongs to a living body; y^(b) is the label indicating whether the b-th face image belongs to a living body or a prosthesis, with y^(b) = 0 when the b-th face image belongs to a living body and y^(b) = 1 when it belongs to a prosthesis; b = 1, 2, ..., B; and B is the number of face images.
The calculation formula of the parameter offset is as follows:
Figure SMS_2
the formula for updating the offset learnable parameters is as follows:
params=params+offset
where offset is the parameter offset, q is a hyperparameter greater than 0,
Figure SMS_3
is the gradient of the loss function Loss with respect to the learnable parameters params, and ||·||_2 denotes the L2 norm.
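The perturb-then-train procedure described above resembles sharpness-aware minimization, and one iteration can be sketched as follows. Several points here are assumptions, not statements from the text: the offset is taken to be the normalized gradient scaled by q (the form the symbol definitions suggest, since the formula image itself is not available), the gradient evaluated at the offset parameters is applied to the original parameters (borrowed from sharpness-aware minimization), and a toy quadratic loss stands in for the actual loss function. The names `perturbed_step`, `grad_fn`, and `lr` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def perturbed_step(
    params: Vector,
    grad_fn: Callable[[Vector], Vector],
    q: float,
    lr: float,
) -> Tuple[Vector, Vector]:
    """One training step on offset parameters (assumed formulation).

    1. offset = q * grad / ||grad||_2        (assumed form of the offset)
    2. shifted = params + offset             (the update given in the text)
    3. new = params - lr * grad(shifted)     (assumed, as in sharpness-aware
                                              minimization)
    """
    grad = grad_fn(params)
    norm = l2_norm(grad)
    # With a zero gradient there is no ascent direction, so use no offset.
    offset = [0.0] * len(params) if norm == 0.0 else [q * g / norm for g in grad]
    shifted = [p + o for p, o in zip(params, offset)]
    grad_shifted = grad_fn(shifted)
    new_params = [p - lr * g for p, g in zip(params, grad_shifted)]
    return new_params, offset


def toy_grad(w: Vector) -> Vector:
    """Gradient of the stand-in loss sum(w_i ** 2); not the patent's loss."""
    return [2.0 * x for x in w]


new_params, offset = perturbed_step([3.0, 4.0], toy_grad, q=0.5, lr=0.1)
```

Under these assumptions the offset always has L2 norm q, so q plays the role of the neighborhood radius mentioned above.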
In the model training method provided by the embodiments of the present application, over repeated training iterations the specific network layer in each branch network allows each branch network to be trained on its own final output features while also being trained jointly with the other branch networks through the output features of their intermediate network layers. This design, in which the branch networks assist one another's training, reduces redundant training and lets the feature extraction network determine automatically, during training, which prosthesis types each branch network is responsible for processing, thereby solving the misrecognition problem caused by the prior art's reliance on subjective human division of prosthesis attack types. At the same time, through the configuration of the specific network layer, each branch network is able to distinguish a living body from a prosthesis based on its own output as well as based on the outputs of some or all of the branch networks, which fully exploits the potential of each branch network and greatly improves the accuracy, stability, and efficiency of its feature extraction.
Embodiments of the present application relate to a living body detection method, as shown in fig. 3, including:
Step 201, inputting the face image to be detected into a trained feature extraction network to obtain a plurality of face features.
Step 202, obtaining a plurality of first prediction probabilities that the face image belongs to the living body according to the plurality of face features, and obtaining a second prediction probability that the face image belongs to the living body based on the plurality of first prediction probabilities.
In this embodiment, the trained feature extraction network is obtained by the model training method described in the above embodiments. The first prediction probabilities may be calculated using an activation function commonly used in deep learning, such as the sigmoid, tanh, ReLU, or Leaky ReLU activation function; alternatively, the plurality of face features may be input into a classifier commonly used in deep learning to obtain the first prediction probabilities.
The second prediction probability that the face image belongs to a living body, i.e., the probability value finally used for living body detection, is obtained from the plurality of first prediction probabilities. It may be obtained by averaging the plurality of first prediction probabilities, by multiplying them together, or by removing the maximum and minimum values from the plurality of first prediction probabilities and averaging the remaining ones.
Of course, the methods of calculating the first prediction probability and the second prediction probability in the living body detection stage and the training stage are consistent.
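The three ways of combining the first prediction probabilities mentioned above can be sketched as follows. The function names are hypothetical, and the sample probabilities are made-up values for illustration only.

```python
import math
from typing import Sequence


def fuse_mean(probs: Sequence[float]) -> float:
    """Average of the first prediction probabilities."""
    return sum(probs) / len(probs)


def fuse_product(probs: Sequence[float]) -> float:
    """Product of the first prediction probabilities."""
    return math.prod(probs)


def fuse_trimmed_mean(probs: Sequence[float]) -> float:
    """Average after removing one maximum and one minimum value."""
    if len(probs) < 3:
        raise ValueError("need at least three probabilities to trim both ends")
    kept = sorted(probs)[1:-1]  # drop the smallest and the largest value
    return sum(kept) / len(kept)


# Made-up first prediction probabilities from four branch networks.
first_probs = [0.9, 0.8, 0.7, 0.2]
second_by_mean = fuse_mean(first_probs)
second_by_product = fuse_product(first_probs)
second_by_trimmed_mean = fuse_trimmed_mean(first_probs)
```

Note that the product is never larger than the smallest individual probability, so a single low branch output pulls the fused value down strongly; a threshold chosen for one fusion rule would not generally suit another.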
Step 203, when the second prediction probability is greater than or equal to a preset living body threshold, determining that the face image to be detected is a living body; and when the second prediction probability is smaller than the preset living body threshold, determining that the face image to be detected is a prosthesis.
In this embodiment, the preset living body threshold may be set and adjusted according to the required identification accuracy and the application scenario.
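The decision rule of step 203 can be sketched as below. The function name `classify` and the default threshold of 0.5 are assumptions chosen for illustration; the text leaves the threshold value to the application.

```python
def classify(second_prob: float, living_threshold: float = 0.5) -> str:
    """Apply the step 203 rule to a second prediction probability.

    The default threshold of 0.5 is an illustrative assumption only.
    """
    if not 0.0 <= second_prob <= 1.0:
        raise ValueError("second_prob must be a probability in [0, 1]")
    # A probability equal to the threshold counts as a living body.
    return "living body" if second_prob >= living_threshold else "prosthesis"


at_threshold = classify(0.5)
below_threshold = classify(0.49)
strict_result = classify(0.8, living_threshold=0.9)
```

Raising the threshold rejects more prostheses at the cost of rejecting more genuine faces, which is the trade-off between identification accuracy and application scenario that the text refers to.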
The above methods are divided into steps only for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is preserved, such variations fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without altering the core design of the algorithm and flow, also falls within the protection scope of this patent.
Embodiments of the present application relate to an electronic device, as shown in fig. 4, including:
at least one processor 301; and a memory 302 communicatively coupled to the at least one processor 301; the memory 302 stores instructions executable by the at least one processor 301, the instructions being executed by the at least one processor 301 to enable the at least one processor 301 to perform the model training method mentioned in the above embodiments or the living body detection method mentioned in the above embodiments.
The electronic device includes one or more processors 301 and a memory 302; one processor 301 is taken as an example in Fig. 4. The processor 301 and the memory 302 may be connected by a bus or in another manner; connection by a bus is taken as an example in Fig. 4. The memory 302 is a non-volatile computer-readable storage medium that may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules; for example, the algorithms corresponding to the processing strategies in the strategy space in the embodiments of the present application are stored in the memory 302. The processor 301 executes the various functional applications and data processing of the device, i.e., implements the above-described model training method or living body detection method, by running the non-volatile software programs, instructions, and modules stored in the memory 302.
The memory 302 may include a storage program area and a storage data area. The storage program area may store an operating system and at least one application program required for a function; the storage data area may store a list of options, etc. In addition, the memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some implementations, the memory 302 may optionally include memory located remotely from the processor 301, which may be connected to an external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 302 and, when executed by the one or more processors 301, perform the model training method in any of the above-described embodiments or the living body detection method mentioned in the above-described embodiments.
The above product can perform the methods provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing those methods. For technical details not described in detail in this embodiment, reference may be made to the methods provided by the embodiments of the present application.
Embodiments of the present application relate to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments described herein. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of implementing the present application and that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (10)

1. A method of model training, comprising:
extracting features of the face image through a plurality of branch networks included in the feature extraction network to obtain a plurality of face features;
determining a plurality of first prediction probabilities that the face image belongs to a living body based on the plurality of face features, and obtaining a second prediction probability that the face image belongs to the living body based on the plurality of first prediction probabilities;
performing iterative training on the feature extraction network;
wherein, each branch network has one network layer as a specific network layer, each specific network layer has the same layer position in the affiliated branch network and is configured in each iterative training process:
the input features of each specific network layer comprise output features of a previous network layer of the branched network, or further comprise features after the output features of the previous network layer of the branched network of at least one specific network layer in all the specific network layers are fused.
2. The model training method of claim 1, wherein the feature extraction network further comprises: an output network and a converged network;
the configuration process of each specific network layer in each iterative training process comprises the following steps:
outputting a plurality of random parameters which obey Bernoulli distribution and are the same as the number of the branch networks through the output network, and selecting zero or at least one branch network according to the plurality of random parameters;
and fusing output characteristics of a previous network layer of the selected zero or at least one specific network layer in the branch network through the fusion network aiming at each specific network layer to obtain the fused characteristics.
3. The model training method according to claim 2, wherein in each iterative training process, probability values corresponding to the plurality of random parameters are hyperparameters in the training process.
4. The model training method of claim 1, wherein the vector dimensions of the output features of the network layer preceding the particular network layer in each branched network are the same.
5. The model training method according to claim 1, wherein the obtaining a second prediction probability that the face image belongs to a living body based on the plurality of first prediction probabilities includes:
multiplying the plurality of first prediction probabilities to obtain a second prediction probability that the face image belongs to a living body.
6. The model training method according to any one of claims 1-5, characterized in that the iterative training of the feature extraction network comprises:
constructing a loss function based on the second predictive probability;
for a plurality of learnable parameters of a feature extraction network, determining a neighborhood range by taking a minimum loss value of the loss function as a center, and acquiring a plurality of parameter offsets of the plurality of learnable parameters corresponding to a maximum loss value in the neighborhood range;
updating the plurality of learnable parameters by adopting the plurality of parameter offsets to obtain a plurality of offset learnable parameters;
training and updating a plurality of offset learnable parameters of the feature extraction network according to the loss function until the feature extraction network converges.
7. The model training method of claim 6, wherein the parameter offset is calculated by the following formula:
Figure QLYQS_1
the offset learnable parameter is calculated by the following formula:
params=params+offset
wherein offset is the parameter offset, q is a hyperparameter greater than 0,
Figure QLYQS_2
is the gradient of the loss function Loss with respect to the learnable parameters params, and ||·||_2 denotes the L2 norm.
8. A living body detecting method, characterized by comprising:
inputting the face image to be detected into a trained feature extraction network to obtain a plurality of face features;
obtaining a plurality of first prediction probabilities that the face image belongs to a living body according to the plurality of face features, and obtaining a second prediction probability that the face image belongs to the living body based on the plurality of first prediction probabilities;
when the second prediction probability is greater than or equal to a preset living body threshold value, determining that the face image to be detected is a living body; when the second prediction probability is smaller than the preset living body threshold value, determining that the face image to be detected is a prosthesis;
wherein the trained feature extraction network is obtained by the model training method of any one of claims 1-7.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1 to 7 or the living body detection method of claim 8.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the model training method of any one of claims 1 to 7 or implements the living body detection method of claim 8.
CN202310375684.2A 2023-04-11 2023-04-11 Model training method, living body detection method, electronic device, and storage medium Active CN116091875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310375684.2A CN116091875B (en) 2023-04-11 2023-04-11 Model training method, living body detection method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310375684.2A CN116091875B (en) 2023-04-11 2023-04-11 Model training method, living body detection method, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN116091875A true CN116091875A (en) 2023-05-09
CN116091875B CN116091875B (en) 2023-08-29

Family

ID=86199547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310375684.2A Active CN116091875B (en) 2023-04-11 2023-04-11 Model training method, living body detection method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN116091875B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993220A (en) * 2019-03-23 2019-07-09 西安电子科技大学 Multi-source Remote Sensing Images Classification method based on two-way attention fused neural network
CN110348322A (en) * 2019-06-19 2019-10-18 西华师范大学 Human face in-vivo detection method and equipment based on multi-feature fusion
US20190347823A1 (en) * 2018-05-10 2019-11-14 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for detecting living body, system, electronic device, and storage medium
US20200210773A1 (en) * 2019-01-02 2020-07-02 Boe Technology Group Co., Ltd. Neural network for image multi-label identification, related method, medium and device
CN111709409A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Face living body detection method, device, equipment and medium
CN112766158A (en) * 2021-01-20 2021-05-07 重庆邮电大学 Multi-task cascading type face shielding expression recognition method
CN113792713A (en) * 2021-11-16 2021-12-14 北京的卢深视科技有限公司 Model training method, face recognition model updating method, electronic device and storage medium
CN114333011A (en) * 2021-12-28 2022-04-12 北京的卢深视科技有限公司 Network training method, face recognition method, electronic device and storage medium
CN115131858A (en) * 2022-06-27 2022-09-30 合肥的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Also Published As

Publication number Publication date
CN116091875B (en) 2023-08-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant