CN114357417A - Self-learning dynamic voiceprint identity verification method based on unknown corpus - Google Patents
Self-learning dynamic voiceprint identity verification method based on unknown corpus
- Publication number
- CN114357417A (application CN202111677950.4A)
- Authority
- CN
- China
- Prior art keywords
- character string
- text
- corpus
- verification
- special
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a self-learning dynamic voiceprint identity verification method based on unknown corpus. The general registration character strings and a small number of special registration character strings of each registrant are received and processed by pre-trained discriminators according to the registration-corpus rule format to construct a registered voiceprint library. Verification corpora are preprocessed according to the verification-corpus rule format; once the preset conditions are met, the numeric string yields a first embedded code through a text-dependent recognition technique and the special string yields a second embedded code through a text-independent recognition technique, and the codes are checked for a match. The text-independent discriminator, the general character-string and numeric-string text-dependent discriminator, and the special character-string text-dependent voiceprint discriminators are updated according to factors such as the accumulated amount of data saved in the text-dependent database, the degree of performance improvement, and the number of persons who have entered corpora. The method performs well, improves security by relying on unknown corpora, and automatically improves as the models and databases grow dynamically.
Description
Technical Field
The invention belongs to the field of voiceprint-based identity verification, and in particular relates to a self-learning dynamic voiceprint identity verification method based on unknown corpus.
Background
Voiceprint recognition is a biometric technology that authenticates identity using a person's voiceprint features; it offers low acquisition cost, simple implementation, and good user experience. More importantly, among all biometric features the voiceprint is uniquely low-sensitivity information: compared with other biometric technologies such as face recognition and fingerprint recognition, voiceprint recognition involves less personal privacy and is more readily accepted by users. According to the recognized content, voiceprint recognition can be divided into three types: text-independent, text-dependent, and text-prompted. However, software based on these three recognition schemes currently lacks anti-fraud capability, so an impostor may forge speech close to the target speaker's by various means and thereby launch spoofing attacks on the voiceprint recognition system; for example, such a security risk exists in the voice locks released by WeChat and Alipay. For text-dependent voiceprint recognition, the commonly used fixed authentication codes are easily defeated by recording fraud, in which an impostor plays back a recording of the target user and passes voiceprint verification. Text-prompted voiceprint recognition has some anti-fraud capability, but it is strongly constrained by the size of the text set: if the text set is too small, the security of the recognition system is insufficient; if it is too large, the user must record a large amount of audio and the experience degrades. In addition, text-prompted voiceprint recognition lacks protection against recording-splicing fraud: for example, once the target speaker's pronunciations of the fixed digits 0-9 or letters A-Z have been fully collected into a text-speech library of single digits or letters, a computer program can quickly splice together the pronunciation of any target verification code from that library at confirmation time.
Therefore, a technology is urgently needed that can improve the security of voiceprint recognition systems without involving more personal privacy or degrading the user experience, so as to effectively solve the problem of poor security in voiceprint-based identity authentication during software logins and financial payments.
Disclosure of Invention
The invention aims to provide a self-learning dynamic voiceprint identity verification method based on unknown corpus, so as to solve the technical problem of poor security in voiceprint recognition.
In order to solve the problems, the technical scheme of the invention is as follows:
A self-learning dynamic voiceprint identity verification method based on unknown corpus, configured with a dynamic voiceprint identity verification system, comprises the following steps:
A1: receiving and processing the general registration character strings and special registration character strings of a registrant in preset discriminators according to the registration-corpus rule format, and constructing a registered voiceprint library; the preset discriminators comprise 1 text-independent discriminator, 1 general character-string and numeric-string text-dependent discriminator, and several special character-string text-dependent voiceprint discriminators;
A2: according to the verification-corpus rule format, recognizing the input verification corpus and splitting it in two to obtain a general verification character string and a special verification character string, each of which is recognized and compared by its own speech recognition engine; if the general verification character string passes, it is preprocessed and sent to the general character-string and numeric-string text-dependent discriminator, which yields a first embedded code based on a text-dependent recognition technique; the special verification character string is preprocessed and sent either to the text-independent discriminator or to the corresponding special character-string text-dependent voiceprint discriminator, depending on whether such a discriminator exists and whether at least one registration embedded code exists, and a second embedded code is obtained from the text-independent discriminator or the text-dependent voiceprint discriminator based on a text-independent recognition technique;
A3: updating the general character-string and numeric-string text-dependent discriminator when the accumulated amount of general verification character strings stored in the text-dependent database and the degree of performance improvement both meet certain thresholds;
updating the text-dependent corpus and the special character-string text-dependent voiceprint discriminators when the number of persons who have entered corpora and the number of corpora stored in the text-dependent database for the special verification character string both meet certain thresholds;
and updating the text-independent discriminator when the accumulated amount of special verification character strings stored in the text-independent corpus and the degree of performance improvement both meet certain thresholds.
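For illustration only, the following sketch (Python; not part of the claimed method, and all class and field names are assumptions introduced here) shows one way the registered voiceprint library and the three kinds of discriminators of steps A1-A3 could be organized:

from dataclasses import dataclass, field
from typing import Dict, List, Optional
import numpy as np

@dataclass
class EnrolledSpeaker:
    # first embedded code, extracted from the general (numeric) registration strings
    first_embedding: Optional[np.ndarray] = None
    # second embedded codes, one per special string (e.g. "teacup")
    second_embeddings: Dict[str, np.ndarray] = field(default_factory=dict)
    # registration corpora kept for later retraining / activation (see step S7)
    text_dependent_corpus: Dict[str, List[np.ndarray]] = field(default_factory=dict)

@dataclass
class VoiceprintSystem:
    text_independent_model: object                 # 1 text-independent discriminator
    digit_string_model: object                     # 1 general/numeric-string text-dependent discriminator
    special_string_models: Dict[str, object]       # several special-string discriminators (e.g. 100)
    enrolled: Dict[str, EnrolledSpeaker] = field(default_factory=dict)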
Further preferably, during the verification process of step A2, words for which no special character-string text-dependent voiceprint discriminator has yet been formed are collected; once the accumulated data amount and the degree of performance improvement meet certain thresholds, and after data augmentation and robustness processing, they provide the text-dependent corpus required to train a model for the not-yet-formed special character-string text-dependent voiceprint discriminator.
Specifically, step A1 includes:
S11: inputting the general registration character string and the special registration character string into the dynamic voiceprint identity verification system, where front-end audio features are formed and distinguished for them by the general character-string and numeric-string text-dependent discriminator and the special character-string text-dependent voiceprint discriminator respectively, and then stored in the text-dependent corpus;
S12: inputting a verification corpus and splitting it in two to obtain the general verification character string and the special verification character string; a first embedded code is obtained through the text-dependent recognition technique and can be extracted by the general character-string and numeric-string text-dependent discriminator, and a second embedded code is obtained through the text-independent recognition technique and can be extracted by the special character-string text-dependent voiceprint discriminator.
Specifically, step A2 includes:
S2: inputting a dynamic verification corpus given by the dynamic voiceprint identity verification system according to its dynamic verification rule, splitting the dynamic verification corpus in two to obtain a general dynamic verification character string and a special dynamic verification character string, and recognizing and comparing each with its own speech recognition engine; if the general dynamic verification character string passes, it undergoes preprocessing such as speech-speed testing, rearrangement, and completion/reconstruction in sequence, and processing continues in step S3; if the string is judged to be a special dynamic verification character string, the speech recognition engine judges whether it is homophonic with the expected text, and if so, processing continues in step S4; the general dynamic verification character string consists of several digits, and the special dynamic verification characters consist of Chinese words, English letters, or English phrases;
S3: sending the numeric string into the preset general character-string and numeric-string text-dependent discriminator for discrimination against a preset threshold; if the discrimination succeeds, the numeric string is simultaneously stored in the text-independent corpus and the text-dependent database; otherwise, the verification is deemed failed and stops;
S4: judging whether the special dynamic verification character string has a corresponding special character-string text-dependent voiceprint discriminator and whether the text-dependent corpus contains at least 1 registration corpus of that string for the person to be verified; if both exist, go to step S5; if the corresponding special character-string text-dependent voiceprint discriminator does not exist, or neither exists, go to step S6; if the corresponding special character-string text-dependent voiceprint discriminator exists but is not activated and at least 1 registration corpus exists in the text-dependent corpus, go to step S7;
S5: sending the special dynamic verification character string into the corresponding special character-string text-dependent voiceprint discriminator for similarity discrimination; if the similarity exceeds a preset threshold, the judgment succeeds and the string is stored in the text-independent corpus and the text-dependent database; otherwise, the verification is deemed failed and stops;
S6: sending the special dynamic verification character string into the text-independent discriminator for similarity discrimination; if the similarity exceeds a preset threshold, the judgment succeeds and the string is stored in the text-independent corpus and the text-dependent corpus; otherwise, the verification is deemed failed and stops;
S7: sending the special dynamic verification character string into the text-independent discriminator for similarity discrimination; if the similarity exceeds a preset threshold, the judgment succeeds, the string is stored in the text-independent corpus, the text-dependent corpus is post-processed, the corresponding special character-string text-dependent voiceprint discriminator is activated, at least one registration corpus that this discriminator can discriminate is added, and a new registration embedded code is extracted from it; otherwise, the verification is deemed failed and stops.
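The branch made in step S4 can be summarized by the following hedged sketch; it assumes the illustrative data structures sketched after step A3, and any case not explicitly named in S4 is shown falling back to text-independent verification:

def route_special_string(word, system, speaker):
    # returns which step (S5, S6 or S7) should handle this special verification string
    model = system.special_string_models.get(word)
    has_registration = bool(speaker.text_dependent_corpus.get(word))
    activated = getattr(model, "activated", False) if model is not None else False
    if model is not None and has_registration:
        # discriminator and registration corpus both exist
        return "S5" if activated else "S7"
    # discriminator missing, or neither exists: verify text-independently
    return "S6"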
Further preferably, in step S2, before segmenting the dynamic verification corpus, the method further includes the following steps:
B1: performing environmental speech-quality detection on the dynamic verification corpus to judge the ambient noise level; the computed signal-to-noise ratio is compared with a preset threshold, and if it is below the threshold, go to step B2, otherwise go to step B3;
B2: prompting the user to enter the dynamic verification corpus again and returning to step B1; each time step B2 is entered a counter N (initially 0) is incremented by 1, and if N > 3 the verification is deemed failed and stops, where the N > 3 condition is adjustable;
B3: clearing N and converting the dynamic verification corpus into the wav format for segmentation.
further, in step B1, noise processing can be performed on the dynamic verification corpus before the ambient voice quality detection is performed.
Wherein, step S2 specifically includes the following steps:
S21: performing digit recognition on the dynamic verification character-string speech through the speech recognition engine and judging it with a speech-speed detector; if the recognition passes and the speech speed is normal, go to step S22, otherwise the verification is deemed failed and stops;
S22: fully segmenting the dynamic verification character string to obtain several digit segments, each containing only one digit;
S23: rearranging the digit segments according to the order of the general registration character string;
S24: adjusting and filling in missing corpora according to a priori rule, and completing and reconstructing the rearranged digit segments to obtain a numeric string.
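A simplified sketch of the preprocessing chain S21-S24, using digit labels in place of the actual audio segments (the function name and the gap-filling rule shown are assumptions):

def reconstruct_digit_string(recognized_digits, registration_order, registration_bank):
    # recognized_digits:  digits decoded from the verification audio, e.g. "81567903"
    # registration_order: digit order of the general registration string, e.g. "0123456789"
    # registration_bank:  per-digit registration segments used to fill the gaps
    present = set(recognized_digits)
    rebuilt = []
    for d in registration_order:                          # S23: rearrange to registration order
        if d in present:
            rebuilt.append(("verify", d))                 # segment taken from the verification audio
        else:
            rebuilt.append(("enroll", registration_bank[d]))  # S24: fill from the registration corpus
    return rebuilt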
Wherein, step S3 specifically includes the following steps:
S31: sending the numeric string into the general character-string and numeric-string text-dependent discriminator to generate a first verification embedded code, and computing the similarity between the first verification embedded code and the first embedded code for voiceprint matching; if the numeric-string similarity exceeds a preset threshold, the judgment succeeds and processing goes to step S32, otherwise the verification is deemed failed and stops;
S32: storing the numeric string in the text-dependent corpus and the text-independent corpus, and combining the first verification embedded code with the remaining first embedded codes to form a new first embedded code.
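Steps S31-S32 could look roughly as follows; the cosine-similarity measure, the threshold value, and the running-average update are illustrative assumptions rather than the exact formulas of the invention:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def verify_and_update(first_embedding, verify_embedding, threshold=0.7, count=1):
    # count: how many utterances already contributed to first_embedding
    score = cosine_similarity(first_embedding, verify_embedding)
    if score < threshold:
        return False, first_embedding                     # verification failed, keep the old code
    new_embedding = (first_embedding * count + verify_embedding) / (count + 1)  # S32 update
    return True, new_embedding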
Wherein, step S5 specifically includes the following steps:
S51: sending the special dynamic verification character string into the corresponding special character-string text-dependent voiceprint discriminator to generate a second verification embedded code, and computing the similarity between the second verification embedded code and the second embedded code for voiceprint matching; if the similarity of the special dynamic verification character string exceeds a preset threshold, the judgment succeeds and processing goes to step S52, otherwise the verification is deemed failed and stops;
S52: storing the special dynamic verification character string in the text-independent corpus and the text-dependent corpus, and combining the second verification embedded code with the remaining second embedded codes to form a new second embedded code.
Wherein, step S6 specifically includes the following steps:
S61: sending the special dynamic verification character string to the text-independent discriminator to generate a second verification embedded code, and computing, at sentence level or frame level, the similarity between the second verification embedded code and the second embedded code;
S62: judging the special dynamic verification character string with the speech recognition engine; if it meets the threshold, go to step S63, otherwise the verification is deemed failed and stops; if the string is one of the verifier's registered corpora, go directly to step S63;
S63: performing voiceprint matching on the special dynamic verification character string; if the matching succeeds, go to step S64, otherwise the verification is deemed failed and stops;
S64: storing the special dynamic verification character string in the text-independent corpus, generating special character-string-dependent corpora, and placing them in a temporary corpus; when the amount of special character-string-dependent corpora in the temporary corpus and the number of persons who have entered test corpora both exceed preset thresholds, they form the special character-string-dependent corpus used to train a new dependent discriminator;
S65: storing the special character-string-dependent corpus in the text-dependent corpus and training on it to form a new special character-string text-dependent voiceprint discriminator.
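A hedged sketch of the accumulation logic behind steps S64-S65; the threshold values and the train_fn callback are placeholders, not values taken from this disclosure:

from collections import defaultdict

class TemporaryCorpus:
    def __init__(self, min_utterances=3000, min_speakers=1000):
        self.store = defaultdict(list)        # word -> list of (speaker_id, audio)
        self.min_utterances = min_utterances
        self.min_speakers = min_speakers

    def add(self, word, speaker_id, audio):
        self.store[word].append((speaker_id, audio))

    def ready_for_training(self, word):
        entries = self.store[word]
        speakers = {sid for sid, _ in entries}
        return len(entries) >= self.min_utterances and len(speakers) >= self.min_speakers

def maybe_train_new_discriminator(word, temp_corpus, train_fn):
    if temp_corpus.ready_for_training(word):
        return train_fn(temp_corpus.store[word])   # new special-string text-dependent discriminator
    return None                                    # keep collecting corpus audio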
Wherein, step S7 specifically includes the following steps:
S71: sending the special dynamic verification character string into the text-independent discriminator to generate a second verification embedded code, and computing the similarity between the second verification embedded code and the second embedded code; if the preset threshold is met, go to step S72, otherwise the verification is deemed failed and stops;
S72: storing the special dynamic verification character string in the text-independent corpus and post-processing the text-dependent corpus;
S73: activating the corresponding special character-string text-dependent voiceprint discriminator in the text-dependent corpus, and adding at least one registration corpus that this discriminator can discriminate.
Further preferably, step A3 further includes:
for the text-dependent corpus, training all of its corpora in an adversarial-training manner, with artificial data organized in a specific verification format, to form a dependent voiceprint synthesis network; the existing corpus text is used as input to synthesize audio as augmented corpus files, and noise signals are added to form new corpus files in the text-dependent corpus.
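Only the additive-noise part of this corpus amplification is easy to show in a few lines; the following sketch (the target SNR and the scaling rule are assumptions) produces a new, noisier corpus file from an existing one:

import numpy as np

def add_noise(clean: np.ndarray, noise: np.ndarray, target_snr_db: float = 10.0) -> np.ndarray:
    noise = np.resize(noise, clean.shape)                 # match lengths
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10.0)))
    return clean + scale * noise                          # new corpus file at roughly target_snr_db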
Owing to the above technical scheme, compared with the prior art the invention has the following advantages and positive effects:
The invention combines speech recognition technology with voiceprint recognition technology and achieves lower false acceptance and false rejection rates.
The invention adopts a new registration and verification mode with a "general + special" verification code (for example, digits + Chinese characters), adds a certain amount of visual interference, and, combined with voiceprint recognition, improves security; it fully exploits the advantages of both text-dependent and text-independent voiceprint recognition and resists recording fraud and recording-splicing fraud. In addition, because the method is based on a large amount of unknown corpora, security is improved to a great extent.
The invention visibly adds noise to the input verification corpus, improving security and resistance to theft.
The text-independent discriminator, the general character-string and numeric-string text-dependent discriminator, and the special character-string text-dependent voiceprint discriminators each discriminate independently yet work together; in addition, the models and databases are dynamically grown and updated, providing automatic-update and self-learning functions, so the recognition system keeps improving as the user uses it, enhancing both the recognition of the target speaker and the defense against impostors.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
FIG. 1 is a flow chart of a self-learning dynamic voiceprint authentication method based on unknown corpus according to the present invention;
FIG. 2 is a flow chart illustrating an actual process of the self-learning dynamic voiceprint authentication method based on unknown corpus according to the present invention;
FIG. 3 is a diagram illustrating a registration string according to the present invention;
FIG. 4 is a schematic diagram of a validation string according to the present invention;
FIG. 5 is a diagram illustrating a relationship between corpora of the model according to the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
For the sake of simplicity, the drawings only schematically show the parts relevant to the present invention, and they do not represent the actual structure as a product. In addition, in order to make the drawings concise and understandable, components having the same structure or function in some of the drawings are only schematically illustrated or only labeled. In this document, "one" means not only "only one" but also a case of "more than one".
The following describes the self-learning dynamic voiceprint authentication method based on unknown corpus in detail with reference to the accompanying drawings and specific embodiments. Advantages and features of the present invention will become apparent from the following description and from the claims.
Examples
Referring to fig. 1 and fig. 2, this embodiment provides a self-learning dynamic voiceprint identity verification method based on unknown corpus, which is deployed in a dynamic voiceprint identity verification system and can, for example, unlock or open devices by voice.
The method is as follows. A1: receiving and processing the general registration character strings and special registration character strings of a registrant in preset discriminators according to the registration-corpus rule format, and constructing a registered voiceprint library; the preset discriminators comprise 1 text-independent discriminator, 1 general character-string and numeric-string text-dependent discriminator, and several special character-string text-dependent voiceprint discriminators.
A2: according to the verification-corpus rule format, recognizing the input verification corpus and splitting it in two to obtain a general verification character string and a special verification character string, each of which is recognized and compared by its own speech recognition engine; if the general verification character string passes, it is preprocessed and sent to the general character-string and numeric-string text-dependent discriminator, which yields a first embedded code based on a text-dependent recognition technique; the special verification character string is preprocessed and sent either to the text-independent discriminator or to the corresponding special character-string text-dependent voiceprint discriminator, depending on whether such a discriminator exists and whether at least one registration embedded code exists, and a second embedded code is obtained from the text-independent discriminator or the text-dependent voiceprint discriminator based on a text-independent recognition technique.
A3: updating the general character-string and numeric-string text-dependent discriminator when the accumulated amount of general verification character strings stored in the text-dependent database and the degree of performance improvement both meet certain thresholds;
updating the text-dependent corpus and the special character-string text-dependent voiceprint discriminators when the number of persons who have entered corpora and the number of corpora stored in the text-dependent database for the special verification character string both meet certain thresholds;
and updating the text-independent discriminator when the accumulated amount of special verification character strings stored in the text-independent corpus and the degree of performance improvement both meet certain thresholds.
The present embodiment will now be described based on a practical application.
First, referring to fig. 3 and 4, before identity verification, related information needs to be entered, namely, a registration phase.
In steps S11 and S12, the user needs to enter registration character strings into the system; a registration character string is corpus information, and the registration character strings include general registration character strings and special registration character strings. Specifically, the user only needs K groups of registration character strings at registration (generally K ≤ 5); the content of each group is randomly different, but the number of characters is the same. The first T groups are general registration character strings and the remaining K - T groups are special character strings. Generally, the general character strings are composed of numeric characters and the special character strings of letters, English phrases, or Chinese characters, and together they build the registered voiceprint library, in which each person has an independent part.
Before registration, several groups of discriminators that simultaneously satisfy the FRR and FAR requirements already exist, including 1 general character-string and numeric-string text-dependent discriminator, several special character-string text-dependent voiceprint discriminators (100 in this embodiment), and 1 text-independent discriminator.
Referring to fig. 3, the group of registration character strings is shown for convenience of presentation and is not limited to its specific content. The design requires recording registration character strings for K = 5 corpora, including general registration character strings and special registration character strings, for extracting the initial registration features. Each character content of letters, English phrases, or Chinese characters is repeated at least 2 times, and each character string has 10 characters; the first 2 groups of corpora are general character strings and the last 3 groups are special character strings, where the 3rd group consists of letters whose last 5 characters repeat the first 5, and the 4th and 5th groups are Chinese phrases of 2 characters each.
Then, in step S2, according to the verification corpus input given by the identity verification system, the system splits the corpus in two, obtaining a general verification character string and a special verification character string. A first embedded code is obtained based on the text-dependent recognition technique and can be extracted by the general character-string and numeric-string text-dependent discriminator; a second embedded code is obtained based on the text-independent recognition technique and can be extracted by a preset special character-string text-dependent voiceprint discriminator. The first and second embedded codes are the user's initial voiceprint embedded codes. In subsequent operation there is only 1 general character-string and numeric-string text-dependent discriminator and 1 text-independent discriminator, while several special character-string text-dependent voiceprint discriminators can be preset, each corresponding to a Chinese phrase or to several letters or English phrases, and their number can grow through autonomous learning.
Assuming that a verification corpus has M characters, it can be split into a general verification string of N characters and a special verification string of M - N characters. The general character string generally consists of N digits, and the M - N special characters generally consist of letters, English phrases, or Chinese characters, which effectively avoids the performance-degradation problem the algorithm faces when distinguishing different channels.
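At the text level, the split can be pictured with the following simplified sketch; in the actual system the segmentation is performed on the audio by the respective speech recognition engines, so this is illustrative only:

def split_verification_transcript(transcript: str):
    general = "".join(ch for ch in transcript if ch.isdigit())                           # N-digit general part
    special = "".join(ch for ch in transcript if not ch.isdigit() and not ch.isspace())  # (M - N)-character special part
    return general, special

# e.g. split_verification_transcript("81567903 teacup") -> ("81567903", "teacup")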
Referring to fig. 4, a group of verification corpora (also called corpora to be verified) is shown for convenience of demonstration and is not limited to its specific content. Specifically, each verification corpus is 8 characters long, slightly shorter than the aforementioned registration character strings, and consists of a general character string of N = 6 characters and a special character string (letters, English phrases, or Chinese characters) of M - N = 2 characters. Verification corpus 1 contains only content from the initial registration library; verification corpus 2 contains unknown corpus, namely "spelling", which belongs to the Chinese phrases of the special character string and appears at the end of the sentence; verification corpus 3 contains the English letters "AX" of the special character string, located at the end of the sentence; verification corpus 4 is an initial registration corpus, with "vexation" belonging to the Chinese phrases of the special character string and located in the middle of the sentence; verification corpus 5 contains unknown corpus, namely the Chinese phrase "lovely" of the special character string, located at the beginning of the sentence.
After the preparation work is finished, the user can carry out identity verification through the dynamic verification code given by the system.
First, referring to fig. 1 and 2, in step S2 the user enters a dynamic verification corpus according to the dynamic verification code provided by the system; different audio recording devices may be used. After entry, visual noise can be added to the dynamic verification corpus to improve security and resistance to theft. Environmental speech-quality detection (including SNR calculation) is then performed to judge the ambient noise level; if the computed signal-to-noise ratio is too low (below a preset threshold), the user is prompted to adjust and enter the corpus again. After more than three attempts the system terminates directly; the exact number of attempts is adjustable. A dynamic verification corpus above the preset threshold is converted into the wav format (though not limited to this single format) and segmented, yielding a general dynamic verification character string and a special dynamic verification character string, which are recognized separately; they can be divided into registered general dynamic verification character strings, registered special dynamic verification character strings, and unregistered special dynamic verification character strings. If the string is determined to be a general dynamic verification character string, step S3 performs the subsequent processing; if it is determined to be a special dynamic verification character string, step S4 performs the subsequent processing. The general dynamic verification character string consists of several digits, and the special dynamic verification characters consist of Chinese words, English letters, or English phrases.
Assuming that the dynamic verification corpus to be input at this time is '81567903 teacup', the segmented number '81567903' and the Chinese phrase 'teacup' are respectively sent to a speech recognition engine for recognition, namely, the verification of the number '81567903' jumps to step S3, and the verification of the 'teacup' jumps to step S4.
Referring to fig. 1 and 2, in this embodiment, in step S2 the dynamic verification character string undergoes digit recognition; if it passes, it enters the digit preprocessing module for subsequent processing, otherwise the verification is deemed failed and stops. The speech speed is then checked with a speech-speed detector; if it is too fast, re-entry is prompted, and if repeated prompts are ineffective the process can be closed. For the general character string, because the corpus text order generated at verification is random, segmentation and reordering are needed to produce the same order as the general character string of the registered voiceprint library, after which verification is performed with an end-to-end voiceprint recognition method from text-dependent recognition. The dynamic verification character string is therefore segmented digit by digit in the digit preprocessing module, giving 8 digit segments each containing only one digit. The 8 segments are then rearranged according to the order of the general registration string to form "01356789" (for ease of understanding, the general registration string here is assumed to be 0123456789). The reconstructed-digit-string module then adjusts and fills in the missing corpora (namely "2" and "4", filled at random with the corresponding digits from registration sentences whose signal-to-noise ratio reaches the threshold) according to a priori rule, forming "0123456789" after completion and ordering.
Referring to fig. 1 and 2, in step S3 the reconstructed numeric string is sent to the general character-string and numeric-string text-dependent discriminator for discrimination, generating a first verification embedded code; the similarity between the first verification embedded code and the first embedded code is computed for voiceprint matching. If the numeric-string similarity exceeds a preset threshold the subsequent step is deemed successful, otherwise the verification is deemed failed and stops. After a successful judgment, the numeric string is simultaneously stored in the text-independent corpus and the text-dependent corpus, enriching the user's speech data.
Assume that a text-dependent corpus of T sets, SL = {RC0, RS1, RS2, ..., RS(T-1)}, has already been formed within the system, where RC0 is the general character-string corpus containing N0 entries of length Lrc0, and RS1, RS2, ..., RS(T-1) are registered special character-string corpora containing N1 entries of length Lrc1, N2 entries of length Lrc2, ..., and N(T-1) entries of length Lrc(T-1), respectively. Two types of processing are applied: one for registered special strings (RS) appearing in the corpus, and one for special strings that are not currently registered (URS).
For the general character string, preprocessing is performed with the preprocessing techniques described above, and the aligned corpus obtained after segmentation and reordering is added to the existing general registration corpus RC0 = {RC0_1, RC0_2, ..., RC0_N0}, giving a new RC0 = {RC0_1, RC0_2, ..., RC0_N0, RC0_(N0+1)}. This is fed into the neural network encoder again to generate a new embedded code, which is then combined with the other embedded codes to form a new embedded code. Registered special character strings are handled in a similar way and are not described again. The first verification embedded code is combined with the remaining first embedded codes to form a new first embedded code.
Referring to fig. 1 and 2, step S4 judges the special character string.
In step S2, the segmented special dynamic verification character string is recognized; if it is homophonic with the expected text, processing goes to step S4, otherwise the verification is deemed failed and stops. Next, step S4 judges whether the special dynamic verification character string has a corresponding special character-string text-dependent voiceprint discriminator, i.e. a "teacup" discriminator (whether one exists has in fact already been determined when the system issued the dynamic verification corpus), and whether the text-dependent corpus contains at least 1 registration corpus of this special dynamic verification character string for the person to be verified. If both exist, go to step S5; if the corresponding special character-string text-dependent voiceprint discriminator does not exist, or neither exists, go to step S6; if the corresponding special character-string text-dependent voiceprint discriminator exists but is not activated and at least 1 registration corpus exists in the text-dependent corpus, go to step S7.
Referring to fig. 1 and 2, in step S5 the special dynamic verification character string "teacup" is sent to the corresponding special character-string text-dependent voiceprint discriminator for discrimination, generating a second verification embedded code; the similarity between the second verification embedded code and the second embedded code is computed for voiceprint matching. If the similarity exceeds a preset threshold, the subsequent step is deemed successful, otherwise the verification is deemed failed and stops. In step S52 the special dynamic verification character string is stored in the corresponding speaker's text-dependent corpus and text-independent corpus, enriching that speaker's speech data, and the second verification embedded code is combined with the remaining second embedded codes to form a new second embedded code.
Referring to fig. 1, 2, and 5, for an unregistered special dynamic verification character string, voiceprint recognition is carried out with the text-independent method: the sentence-level or frame-level embedded code is compared, as the physical quantity, with the embedded codes in the registered voiceprint library, and a relatively low second threshold is set to make a binary judgment; a relatively high third threshold is set for the judgment made together with the speech recognition engine; and a fourth threshold for the binary judgment of the unregistered character string is determined by whether the second and third thresholds are met simultaneously. In addition, the system has automatic-update and self-learning functions; when the first and fourth thresholds are met simultaneously, the whole corpus to be tested is deemed to have passed the recognition system's judgment, after which it is classified and stored.
Specifically, in step S6, the special dynamic verification character string "teacup" is sent to the text-independent discriminator, and the sentence-level or frame-level embedded code is compared, as the physical quantity, with the embedded codes in the registered voiceprint library for voiceprint matching; if the threshold is met, matching continues, otherwise the verification is deemed failed and stops. The special dynamic verification character string is then judged by the speech recognition engine; if it meets the threshold, the next step is entered, otherwise the verification is deemed failed and stops; if the string is one of the verifier's registered corpora, the next step is entered directly. The special dynamic verification character string is then stored in the text-independent corpus, and a new text-independent voiceprint recognition model is prepared so that a text-dependent discriminator for "teacup" can subsequently be established in the text-dependent corpus. Special character-string-dependent corpora are generated and placed in a temporary corpus; when the traversed speakers and corpora both reach the threshold amounts, they are moved into the text-dependent corpus for storage and a new text-dependent voiceprint discriminator is constructed (a "teacup" classifier, numbered one higher than the last existing text-dependent voiceprint discriminator); if the required amount of speech has not been reached, corpus audio data continue to be collected. In addition, during the execution of step S6, at least one corpus forms a group (one corpus or five corpora may form a group), and after the "teacup" text-dependent voiceprint recognition classifier has been successfully established, another new, unrelated phrase may appear in a later dynamic verification corpus. That is, an unregistered special character string is added to a temporary SL and recorded as RS_T; when its corpus length N_T exceeds a set threshold, RS_T is put into SL to form SL', which is trained to form a new neural network encoder Net_T that can be used to generate second embedded codes. The dynamic verification corpus can therefore be split, labeled, and retrained together with the registered corpora of the voiceprint database to form a new registered voiceprint library, new first and second embedded codes, and the corresponding new embedded codes.
Of course, there is a special case as described above, namely step S7. A preset word such as "teacup" may already have its word discriminator formed (the system itself knows this), yet no registration corpus for it may exist in the text-dependent corpus. The reason is that at registration time only, for example, 5 corpora were recorded, 3 of them numeric and the remaining 2 words; those 2 word corpora cannot contain all the corpora required by the 100 sets of classifiers (say only 5 of them are covered), and the remaining 95 word corpora are acquired during subsequent verification, which is assumed here to be the case for "teacup". At that moment the "teacup" corpus must first enter the text-independent corpus, the text-dependent corpus is then post-processed, and the discriminator for "teacup" is activated. The 5 corpora are then statistically processed (most commonly averaged; other statistics may be fused) to form a new registration embedded code. If there are N such corpora, the statistical processing is performed over the N, and the registration embedded code is also continuously strengthened through dynamic training.
Referring to fig. 5, step A3 preferably describes the self-updating and self-learning of this embodiment.
For the numeric-string-dependent model, the FAR and FRR requirements are met from the start, so the model is generally not updated. One data item is added at each verification, and when the accumulated data reach a threshold (for example, the data have grown by 10%) and the tested FAR and FRR are simultaneously better than the previous indices, the model is updated and a new first embedded code is generated.
Among the 100 text-dependent discriminators, when a speaker lacks the corpus for a given word, the corpus is updated and the model is updated as well. The text-dependent corpus and the special character-string text-dependent voiceprint discriminators are updated when the number of persons who have entered corpora and the number of corpora stored in the text-dependent database for the special verification character string both meet certain thresholds. For the text-dependent corpus, all of its corpora are trained in an adversarial-training manner, with artificial data organized in a specific verification format, to form a dependent voiceprint synthesis network; existing corpus text is used as input to synthesize audio as augmented corpus files, and noise signals are added to form new corpus files in the text-dependent corpus. For example, suppose the word "teacup" is newly added: the classifier not yet formed at the earlier stage satisfies a certain condition (for example, 10,000 speakers each with 3 repeated phrases), and data augmentation and robustness-improvement methods are applied to those repeated phrases to train a model. After the model is formed, the registration corpora of subsequent verifiers who lack a registration corpus for that word are updated. Several text-dependent voiceprint discriminators in the same situation as "teacup" can be added one by one, or the next 5 can be accumulated and trained as a batch after the previous batch is completed.
The text-independent discriminator is updated when the data in the text-independent database have accumulated by a certain proportion, for example a 10% increase, and the tested FAR and FRR are simultaneously better than the previous indices, yielding a new text-independent discriminator and new embedded codes.
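The update rule described for all three discriminator types (accumulate data, then update only if the tested FAR and FRR do not regress) can be summarized by a small sketch; the 10% growth figure follows the example above, while the remaining names are assumptions:

def should_update(prev_count, new_count, prev_far, prev_frr, test_far, test_frr, growth_ratio=0.10):
    # update the model only when enough new data has accumulated AND the test metrics do not regress
    grew_enough = new_count >= prev_count * (1.0 + growth_ratio)
    no_regression = (test_far <= prev_far) and (test_frr <= prev_frr)
    return grew_enough and no_regression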
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments. Even if various changes are made to the present invention, it is still within the scope of the present invention if they fall within the scope of the claims of the present invention and their equivalents.
Claims (11)
1. A self-learning dynamic voiceprint identity verification method based on unknown corpus, configured with a dynamic voiceprint identity verification system, characterized by comprising the following steps:
A1: receiving and processing the general registration character strings and special registration character strings of a registrant in preset discriminators according to the registration-corpus rule format, and constructing a registered voiceprint library; the preset discriminators comprise 1 text-independent discriminator, 1 general character-string and numeric-string text-dependent discriminator, and several special character-string text-dependent voiceprint discriminators;
A2: according to the verification-corpus rule format, recognizing the input verification corpus and splitting it in two to obtain a general verification character string and a special verification character string, each of which is recognized and compared by its own speech recognition engine; if the general verification character string passes, preprocessing it and sending it to the general character-string and numeric-string text-dependent discriminator, which yields a first embedded code based on a text-dependent recognition technique; preprocessing the special verification character string and sending it to the text-independent discriminator or to the corresponding special character-string text-dependent voiceprint discriminator, according to whether such a discriminator exists and whether at least one registration embedded code exists, and obtaining a second embedded code from the text-independent discriminator or the text-dependent voiceprint discriminator based on a text-independent recognition technique;
A3: updating the general character-string and numeric-string text-dependent discriminator when the accumulated amount of general verification character strings stored in the text-dependent database and the degree of performance improvement both meet certain thresholds;
updating the text-dependent corpus and the special character-string text-dependent voiceprint discriminators when the number of persons who have entered corpora and the number of corpora stored in the text-dependent database for the special verification character string both meet certain thresholds;
and updating the text-independent discriminator when the accumulated amount of special verification character strings stored in the text-independent corpus and the degree of performance improvement both meet certain thresholds.
2. The self-learning dynamic voiceprint identity verification method according to claim 1, characterized in that, in the verification process of step A2, words for which no special character-string text-dependent voiceprint discriminator has yet been formed are used; after the accumulated data amount and the degree of performance improvement meet certain thresholds, and after data augmentation and robustness processing, they provide the text-dependent corpus required for training a model for the not-yet-formed special character-string text-dependent voiceprint discriminator; this corpus is obtained by updating and enlarging the text-dependent corpus as some verifiers verify, so that the number of persons who have entered corpora and the number of their corpora simultaneously meet certain thresholds, whereupon the corresponding new special character-string text-dependent voiceprint discriminator is generated through training and performance verification.
3. The self-learning dynamic voiceprint identity verification method based on unknown corpus according to claim 1, wherein said step A1 comprises:
S11: inputting the general registration character string and the special registration character string into the dynamic voiceprint identity verification system, where front-end audio features are formed and distinguished for them by the general character-string and numeric-string text-dependent discriminator and the special character-string text-dependent voiceprint discriminator respectively, and then stored in the text-dependent corpus;
S12: inputting a verification corpus and splitting it in two to obtain the general verification character string and the special verification character string; a first embedded code is obtained through the text-dependent recognition technique and can be extracted by the general character-string and numeric-string text-dependent discriminator, and a second embedded code is obtained through the text-independent recognition technique and can be extracted by the special character-string text-dependent voiceprint discriminator.
4. The self-learning dynamic voiceprint identity verification method based on an unknown corpus according to claim 1, wherein step A2 comprises:
S2: inputting a dynamic verification corpus prompted by the dynamic voiceprint identity verification system according to its dynamic verification rule, splitting it into a general dynamic verification character string and a special dynamic verification character string, and recognizing and comparing each with its own speech recognition engine; if the general dynamic verification character string passes the recognition and comparison, it is preprocessed in sequence by speech-rate testing, rearrangement and completion reconstruction, and the process proceeds to step S3; if the speech recognition engine determines that the special dynamic verification character string matches the expected utterance, the process proceeds to step S4; the general dynamic verification character string consists of a plurality of digits, and the special dynamic verification character string consists of Chinese words, English letters or English phrases;
S3: sending the numeric string into the preset general-string/numeric-string text-dependent discriminator for discrimination against a preset threshold; if the discrimination succeeds, the numeric string is simultaneously sent to the text-independent corpus and the text-dependent database for storage; otherwise the verification is deemed failed and the process stops;
S4: judging whether a special-string text-dependent voiceprint discriminator corresponding to the special dynamic verification character string exists and whether the text-dependent corpus contains at least one registered corpus of that string for the person to be verified; if both exist, proceed to step S5; if the corresponding special-string text-dependent voiceprint discriminator does not exist, or no such registered corpus exists, proceed to step S6; and if the corresponding special-string text-dependent voiceprint discriminator exists but has not been activated while at least one registered corpus exists in the text-dependent corpus, proceed to step S7;
S5: sending the special dynamic verification character string to the corresponding special-string text-dependent voiceprint discriminator for similarity discrimination against the registration embedded code of the person to be verified; if the similarity exceeds a preset threshold, the discrimination succeeds and the special dynamic verification character string is sent to the text-independent corpus and the text-dependent database for storage; otherwise the verification is deemed failed and the process stops;
S6: sending the special dynamic verification character string to the text-independent discriminator for similarity discrimination against the registration embedded code of the person to be verified; if the similarity exceeds a preset threshold, the discrimination succeeds and the special dynamic verification character string is sent to the text-independent corpus and the text-dependent corpus for storage; otherwise the verification is deemed failed and the process stops;
S7: sending the special dynamic verification character string to the text-independent discriminator for similarity discrimination; if the similarity exceeds a preset threshold, the discrimination succeeds, the special dynamic verification character string is sent to the text-independent corpus for storage, the text-dependent corpus is post-processed, the corresponding special-string text-dependent voiceprint discriminator is activated, and at least one registration corpus that this discriminator can discriminate is added; otherwise the verification is deemed failed and the process stops.
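The branching in step S4 amounts to a three-way dispatch. The sketch below, with an assumed Route enum and assumed flag names, shows one way such a dispatcher might look; it is not the claimed implementation.

```python
from enum import Enum, auto

class Route(Enum):
    TEXT_DEPENDENT = auto()    # step S5: dedicated, active discriminator with an enrolled code
    TEXT_INDEPENDENT = auto()  # step S6: fall back to the text-independent discriminator
    ACTIVATE = auto()          # step S7: discriminator exists but is dormant; activate it

def dispatch(has_discriminator: bool, discriminator_active: bool,
             has_enrolled_corpus: bool) -> Route:
    if has_discriminator and discriminator_active and has_enrolled_corpus:
        return Route.TEXT_DEPENDENT
    if has_discriminator and not discriminator_active and has_enrolled_corpus:
        return Route.ACTIVATE
    return Route.TEXT_INDEPENDENT

print(dispatch(True, True, True))    # -> Route.TEXT_DEPENDENT (step S5)
print(dispatch(False, False, True))  # -> Route.TEXT_INDEPENDENT (step S6)
print(dispatch(True, False, True))   # -> Route.ACTIVATE (step S7)
```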
5. The self-learning dynamic voiceprint identity verification method according to claim 4, wherein, in step S2, before the dynamic verification corpus is parsed, the method further comprises the following steps:
B1: performing environmental voice-quality detection on the dynamic verification corpus to judge the environmental noise level, and comparing the calculated signal-to-noise ratio with a preset threshold; if it is lower than the preset threshold, proceed to step B2, otherwise proceed to step B3;
B2: prompting for the dynamic verification corpus to be input again, incrementing N by 1 (N being initialized to 0), and returning to step B1; if N > 3, the verification is deemed failed and the process stops, the condition N > 3 being adjustable;
B3: resetting N to zero and converting the dynamic verification corpus to wav format for segmentation;
further, in step B1, visual noise processing may be performed on the dynamic verification corpus before the environmental voice-quality detection is carried out.
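Steps B1-B3 describe a signal-to-noise-ratio gate with a bounded number of retries. The following sketch assumes a simple power-ratio SNR estimator, a 15 dB threshold and a retry limit of 3; only the comparison against an adjustable threshold is required by the claim.

```python
import numpy as np

def estimate_snr_db(speech: np.ndarray, noise_floor: np.ndarray) -> float:
    # Crude SNR estimate: power of the speech segment vs. power of a noise-only segment.
    p_speech = max(np.mean(speech.astype(np.float64) ** 2), 1e-12)
    p_noise = max(np.mean(noise_floor.astype(np.float64) ** 2), 1e-12)
    return 10.0 * np.log10(p_speech / p_noise)

def snr_gate(record_fn, threshold_db: float = 15.0, max_retries: int = 3):
    # record_fn() returns (speech, leading_silence); retry while the SNR is too low (B1/B2).
    retries = 0
    while True:
        speech, silence = record_fn()
        if estimate_snr_db(speech, silence) >= threshold_db:
            return speech          # B3: proceed to wav conversion and segmentation
        retries += 1               # B2: ask the user to speak again
        if retries > max_retries:
            raise RuntimeError("verification failed: environment too noisy")

# Simulated recordings: the first take is too noisy, the second passes.
fake_takes = iter([(np.zeros(16000), np.ones(16000)),
                   (np.sin(np.linspace(0, 100, 16000)), 0.01 * np.ones(16000))])
speech = snr_gate(lambda: next(fake_takes))
print(speech.shape)
```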
6. The self-learning dynamic voiceprint identity verification method based on an unknown corpus according to claim 4, wherein step S2 specifically comprises the following steps:
S21: performing digit recognition on the speech of the dynamic verification character string with a speech recognition engine and judging it with a speech-rate detector; if the recognition passes and the speech rate is normal, proceed to step S22, otherwise the verification is deemed failed and the process stops;
S22: fully segmenting the dynamic verification character string to obtain a plurality of digit segments, each containing only one digit;
S23: rearranging the digit segments according to the arrangement order of the general registration character string;
S24: adjusting and filling in missing corpora according to a prior rule, and completing and reconstructing the rearranged digit segments to obtain the numeric string.
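Steps S21-S24 reduce, for the general string, to segmenting the recognized digits, reordering them to the enrolled order, and filling gaps by a prior rule. The sketch below works on already-recognized digit symbols; real segmentation operates on audio, and the fill-in rule shown is only one possible prior rule.

```python
def rearrange_and_complete(recognized_digits, enrolled_order):
    # Keep one recognized segment per enrolled digit, in enrollment order (S23),
    # and fill any missing digit by a prior rule (S24); here: reuse the enrolled digit.
    available = list(recognized_digits)
    rebuilt = []
    for digit in enrolled_order:
        if digit in available:
            available.remove(digit)
        rebuilt.append(digit)
    return "".join(rebuilt)

# The system prompted "2 9 4 7 1" but the recognizer missed the "7".
print(rearrange_and_complete(["9", "2", "1", "4"], ["2", "9", "4", "7", "1"]))  # -> "29471"
```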
7. The self-learning dynamic voiceprint identity verification method based on an unknown corpus according to claim 4, wherein step S3 specifically comprises the following steps:
S31: sending the numeric string into the general-string/numeric-string text-dependent discriminator for discrimination to obtain a first verification embedded code, and computing the similarity between the first verification embedded code and the first embedded code for voiceprint matching; if the numeric-string similarity exceeds a preset threshold, the matching succeeds and the process proceeds to step S32, otherwise the verification is deemed failed and the process stops;
S32: sending the numeric string into the text-dependent corpus and the text-independent corpus for storage, and combining the first verification embedded code with the remaining first embedded codes to form a new first embedded code.
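Steps S31-S32 leave the similarity measure and the embedded-code update unspecified; a common reading is cosine similarity plus a re-normalized running average, as in the hedged sketch below. The threshold value and the averaging scheme are assumptions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def verify_and_update(verification_code: np.ndarray, enrolled_code: np.ndarray,
                      n_previous: int, threshold: float = 0.75):
    # S31: compare the verification embedded code with the stored embedded code.
    score = cosine_similarity(verification_code, enrolled_code)
    if score < threshold:
        return False, enrolled_code                  # verification fails, no update
    # S32: fold the verification code into a new (re-normalized) embedded code.
    updated = (enrolled_code * n_previous + verification_code) / (n_previous + 1)
    return True, updated / np.linalg.norm(updated)

rng = np.random.default_rng(0)
enrolled = rng.standard_normal(192)
enrolled /= np.linalg.norm(enrolled)
probe = enrolled + 0.02 * rng.standard_normal(192)
probe /= np.linalg.norm(probe)
print(verify_and_update(probe, enrolled, n_previous=5)[0])  # -> True
```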
8. The self-learning dynamic voiceprint identity verification method based on an unknown corpus according to claim 4, wherein step S5 specifically comprises the following steps:
S51: sending the special dynamic verification character string to the corresponding special-string text-dependent voiceprint discriminator to generate a second verification embedded code, and computing the similarity between the second verification embedded code and the second embedded code for voiceprint matching; if the similarity of the special dynamic verification character string exceeds a preset threshold, the matching succeeds and the process proceeds to step S52, otherwise the verification is deemed failed and the process stops;
S52: sending the special dynamic verification character string into the text-independent corpus and the text-dependent corpus for storage, and combining the second verification embedded code with the remaining second embedded codes to form a new second embedded code.
9. The self-learning dynamic voiceprint identity verification method based on an unknown corpus according to claim 4, wherein step S6 specifically comprises the following steps:
S61: sending the special dynamic verification character string to the text-independent discriminator to generate a second verification embedded code, and computing the similarity between the second verification embedded code and the second embedded code at the utterance level or the frame level to obtain the similarity of the special dynamic verification character string;
S62: judging the special dynamic verification character string with the speech recognition engine; if it satisfies the threshold, proceed to step S63, otherwise the verification is deemed failed and the process stops; if the person to be verified has already registered this corpus, proceed directly to step S63;
S63: performing voiceprint matching on the special dynamic verification character string; if the matching succeeds, proceed to step S64, otherwise the verification is deemed failed and the process stops;
S64: sending the special dynamic verification character string into the text-independent corpus for storage, generating a special-string text-dependent corpus entry and sending it into a temporary corpus; when the total length of special-string text-dependent corpora in the temporary corpus and the number of speakers who have contributed test corpora both exceed preset thresholds, a special-string text-dependent corpus for training a new text-dependent discriminator is formed;
S65: sending the special-string text-dependent corpus into the text-dependent corpus for storage, and training on it to form a new special-string text-dependent voiceprint discriminator.
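Step S61 allows the similarity to be computed at the utterance level or the frame level. The sketch below contrasts the two, assuming frame embeddings are stacked as rows of a matrix; the pooling and averaging choices are illustrative, not mandated by the claim.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def utterance_level_score(frame_embeddings: np.ndarray, enrolled: np.ndarray) -> float:
    pooled = frame_embeddings.mean(axis=0)           # one embedding per utterance
    return cosine(pooled, enrolled)

def frame_level_score(frame_embeddings: np.ndarray, enrolled: np.ndarray) -> float:
    scores = [cosine(frame, enrolled) for frame in frame_embeddings]
    return float(np.mean(scores))                     # average per-frame similarity

rng = np.random.default_rng(1)
enrolled = rng.standard_normal(64)
frames = enrolled + 0.3 * rng.standard_normal((20, 64))  # noisy frame embeddings
print(utterance_level_score(frames, enrolled), frame_level_score(frames, enrolled))
```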
10. The self-learning dynamic voiceprint identity verification method based on an unknown corpus according to claim 4, wherein step S7 specifically comprises the following steps:
S71: sending the special dynamic verification character string to the text-independent discriminator to generate a second verification embedded code, and computing the similarity between the second verification embedded code and the second embedded code for voiceprint matching; if the preset threshold is satisfied, proceed to step S72, otherwise the verification is deemed failed and the process stops;
S72: sending the special dynamic verification character string into the text-independent corpus for storage, and performing post-processing on the text-dependent corpus;
S73: activating the corresponding special-string text-dependent voiceprint discriminator in the text-dependent corpus, and adding at least one registration corpus that this discriminator can discriminate.
11. The self-learning dynamic voiceprint identity verification method based on an unknown corpus according to claim 1, wherein step A3 further comprises:
for the text-dependent corpus, training all of its corpora in an adversarial manner under a verification-specific organization of artificial data to form a text-dependent voiceprint synthesis network; taking the existing corpus texts as input to synthesize audio as amplified corpus files, and adding noise signals to form new corpus files in the text-dependent corpus.
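Claim 11 combines adversarial training of a synthesis network with noise-based corpus amplification. The sketch below covers only the simpler half, adding noise at a chosen SNR to turn an existing corpus file into a new training file; the synthesis network itself is not sketched, and the SNR value is an assumption.

```python
import numpy as np

def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    # Scale the noise so the mixture has the requested signal-to-noise ratio.
    noise = np.resize(noise, clean.shape).astype(np.float64)
    p_clean = max(np.mean(clean.astype(np.float64) ** 2), 1e-12)
    p_noise = max(np.mean(noise ** 2), 1e-12)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

rng = np.random.default_rng(2)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # stand-in for a corpus file
noisy = add_noise_at_snr(clean, rng.standard_normal(16000), snr_db=10.0)
print(noisy.shape)  # new amplified corpus file, same length as the original
```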
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111677950.4A CN114357417A (en) | 2021-12-31 | 2021-12-31 | Self-learning dynamic voiceprint identity verification method based on unknown corpus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114357417A true CN114357417A (en) | 2022-04-15 |
Family
ID=81105185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111677950.4A Pending CN114357417A (en) | 2021-12-31 | 2021-12-31 | Self-learning dynamic voiceprint identity verification method based on unknown corpus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114357417A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102238189A (en) * | 2011-08-01 | 2011-11-09 | 安徽科大讯飞信息科技股份有限公司 | Voiceprint password authentication method and system |
CN106098068A (en) * | 2016-06-12 | 2016-11-09 | 腾讯科技(深圳)有限公司 | A kind of method for recognizing sound-groove and device |
US20200175961A1 (en) * | 2018-12-04 | 2020-06-04 | Sorenson Ip Holdings, Llc | Training of speech recognition systems |
CN111081256A (en) * | 2019-12-31 | 2020-04-28 | 苏州思必驰信息科技有限公司 | Digital string voiceprint password verification method and system |
CN111613230A (en) * | 2020-06-24 | 2020-09-01 | 泰康保险集团股份有限公司 | Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |