CN111145758A - Voiceprint recognition method, system, mobile terminal and storage medium

Voiceprint recognition method, system, mobile terminal and storage medium

Info

Publication number
CN111145758A
CN111145758A (application CN201911357829.6A)
Authority
CN
China
Prior art keywords: text, recognized, voice data, recognition model, voiceprint recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911357829.6A
Other languages
Chinese (zh)
Inventor
叶林勇
肖龙源
李稀敏
蔡振华
刘晓葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2019-12-25
Publication date: 2020-05-12
Application filed by Xiamen Kuaishangtong Technology Co Ltd
Priority to CN201911357829.6A
Publication of CN111145758A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification techniques
    • G10L 17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04 - Training, enrolment or model building
    • G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 - Noise filtering

Abstract

The invention provides a voiceprint recognition method, a system, a mobile terminal and a storage medium. The method comprises the following steps: acquiring voice data to be recognized, and performing feature extraction on the voice data to be recognized to obtain acoustic features; decoding and recognizing the acoustic features to obtain text content, and performing text cutting on the voice data to be recognized according to the text content; judging the text type of the voice data to be recognized according to the text cutting result, and querying a target recognition model according to the text type; and performing voiceprint recognition on the voice data to be recognized according to the target recognition model to obtain a voiceprint recognition result. By performing text cutting on the voice data to be recognized according to the text content, the invention judges the text type of the voice data to be recognized, so that the voice data can be routed to the corresponding voiceprint recognition model for voiceprint recognition according to the judged text type. This prevents a mismatch between the enrolled voice and the voice to be recognized during voiceprint recognition and improves the accuracy of voiceprint recognition.

Description

Voiceprint recognition method, system, mobile terminal and storage medium
Technical Field
The invention belongs to the technical field of voiceprint recognition, and particularly relates to a voiceprint recognition method, a voiceprint recognition system, a mobile terminal and a storage medium.
Background
Each person's voice carries unique biological characteristics, and voiceprint recognition is a technique for identifying a speaker from his or her voice. Like fingerprint recognition, voiceprint recognition offers high security and reliability and can be applied wherever identity recognition is needed, such as criminal investigation and financial fields including banking, securities and insurance. Compared with traditional identity recognition technologies, voiceprint recognition has the advantages of a simple extraction process, low cost, uniqueness and difficulty of counterfeiting.
In the existing voiceprint recognition scheme, voice data of at least one user is collected in advance, feature values are extracted from the voice data, and the extracted feature values are input into a voiceprint model to obtain an N-dimensional voiceprint vector. During verification or identification, the voice data of a user is first acquired, its feature values are extracted and input into the voiceprint model to obtain an N-dimensional voiceprint vector, and this vector is then matched for similarity against the original voiceprint vectors in a voiceprint library. Each enrolled user receives a score, and the voiceprint with the highest score above a threshold identifies the user corresponding to the voice under test. However, in the prior art, when the voice under test does not match the enrollment text type, for example when the voice under test is a passage of random text while the enrolled voice is a fixed sentence, the recognition result is inaccurate, so the voiceprint recognition accuracy is low.
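For illustration only (this sketch is not part of the patent text), the conventional scoring scheme described above can be written as follows, assuming N-dimensional voiceprint vectors and cosine similarity as the matching score; the function names and the 0.7 threshold are assumptions.

```python
import numpy as np

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity between two voiceprint vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe: np.ndarray, enrolled: dict, threshold: float = 0.7):
    # Score the probe vector against every enrolled vector and return the
    # best-scoring user if that score exceeds the threshold, otherwise None.
    scores = {user: cosine_score(probe, vec) for user, vec in enrolled.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > threshold else None
```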
Disclosure of Invention
An embodiment of the invention aims to provide a voiceprint recognition method, a voiceprint recognition system, a mobile terminal and a storage medium, so as to solve the problem that the existing voiceprint recognition method has low recognition accuracy.
The embodiment of the invention is realized in such a way that a voiceprint recognition method comprises the following steps:
acquiring voice data to be recognized, and performing feature extraction on the voice data to be recognized to obtain acoustic features;
decoding and recognizing the acoustic features to obtain text content, and performing text cutting on the voice data to be recognized according to the text content;
judging the text type of the voice data to be recognized according to the text cutting result, and inquiring a target recognition model according to the text type, wherein the target recognition model is a text-related recognition model, a text-unrelated recognition model or a text semi-related recognition model;
and carrying out voiceprint recognition on the voice data to be recognized according to the target recognition model so as to obtain a voiceprint recognition result.
Further, the step of decoding and identifying the acoustic features comprises:
inputting the acoustic features into an acoustic model to obtain phoneme information;
and inputting the phoneme information into a language model and decoding according to a preset text dictionary to obtain the text content.
Further, the step of performing text segmentation on the speech data to be recognized according to the text content includes:
judging whether text characters are stored in the text content;
when the text characters are judged to be stored in the text content, carrying out text marking on corresponding voice in the voice data to be recognized according to the text characters;
when the text word is judged not to be stored in the text content, judging whether a number is stored in the text content;
and when the number is judged to be stored in the text content, carrying out digital marking on the corresponding voice in the voice data to be recognized according to the number.
Further, the step of determining the text type of the speech data to be recognized according to the text cutting result comprises:
judging whether the text characters are fixed texts prestored locally;
if so, judging that the voice data to be recognized is a text related type;
and if not, judging that the voice data to be recognized is a text-independent type.
Further, the step of determining the text type of the speech data to be recognized according to the text cutting result comprises:
when the number is judged to be stored in the text content, judging whether the number value of the number is a number threshold value;
if so, judging that the voice data to be recognized is a text semi-correlation type;
if not, sending out a text content error prompt.
Further, the step of querying the target recognition model according to the text type comprises:
when the voice data to be recognized is judged to be the text-related type, judging that the target recognition model is the text-related recognition model;
when the voice data to be recognized is judged to be the text-independent type, judging that the target recognition model is the text-independent recognition model;
and when the voice data to be recognized is judged to be the text semi-correlation type, judging that the target recognition model is the text semi-correlation recognition model.
Further, the step of performing voiceprint recognition on the voice data to be recognized according to the target recognition model comprises:
inputting the acoustic features into the target recognition model to obtain feature vectors;
calculating a matching value between the characteristic vector and a locally pre-stored sample vector according to an Euclidean distance formula, and acquiring a serial number value of the sample vector corresponding to the maximum value in the matching value;
and when the number value is judged to be larger than the number threshold value, judging that the voiceprint recognition of the voice data to be recognized is qualified.
Another object of an embodiment of the present invention is to provide a voiceprint recognition system, which includes:
the acoustic feature extraction module is used for acquiring voice data to be recognized and extracting features of the voice data to be recognized to obtain acoustic features;
the text cutting module is used for decoding and recognizing the acoustic features to obtain text content and performing text cutting on the voice data to be recognized according to the text content;
the model query module is used for judging the text type of the voice data to be recognized according to the text cutting result and querying a target recognition model according to the text type, wherein the target recognition model is a text-related recognition model, a text-unrelated recognition model or a text semi-related recognition model;
and the voiceprint recognition module is used for carrying out voiceprint recognition on the voice data to be recognized according to the target recognition model so as to obtain a voiceprint recognition result.
Another object of an embodiment of the present invention is to provide a mobile terminal, including a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal execute the above voiceprint recognition method.
Another object of an embodiment of the present invention is to provide a storage medium, which stores a computer program used in the above-mentioned mobile terminal, wherein the computer program, when executed by a processor, implements the steps of the above-mentioned voiceprint recognition method.
According to the method and the device, the text type of the voice data to be recognized is judged by performing text cutting on the voice data to be recognized according to the text content, so that the voice data to be recognized can be routed to the corresponding voiceprint recognition model for voiceprint recognition according to the judged text type. This avoids a mismatch between the enrolled voice and the voice to be recognized during voiceprint recognition and effectively improves the accuracy of voiceprint recognition.
Drawings
Fig. 1 is a flowchart of a voiceprint recognition method provided by a first embodiment of the invention;
FIG. 2 is a flow chart of a voiceprint recognition method provided by a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a voiceprint recognition system provided by a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a mobile terminal according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Referring to fig. 1, a flowchart of a voiceprint recognition method according to a first embodiment of the present invention is shown, which includes the following steps:
step S10, acquiring voice data to be recognized, and performing feature extraction on the voice data to be recognized to obtain acoustic features;
wherein the acoustic features are Mel-frequency cepstral coefficients (MFCCs), extracted from the voice data using the MFCC algorithm;
specifically, the extraction of the Mel-frequency cepstral coefficients includes: pre-emphasis, framing, windowing, FFT, Mel filterbank processing, logarithm operation and discrete cosine transform, as sketched below;
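For illustration only (not part of the patent text), a compact Python sketch of these steps, assuming a 16 kHz mono signal no shorter than one frame; the frame length, hop size, filter count and coefficient count are illustrative defaults rather than values taken from the patent.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    # Pre-emphasis
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing (assumes len(signal) >= frame_len)
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # Windowing (Hamming), FFT, power spectrum
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular Mel filterbank
    hz2mel = lambda hz: 2595 * np.log10(1 + hz / 700.0)
    mel2hz = lambda mel: 700 * (10 ** (mel / 2595.0) - 1)
    mel_points = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    # Log filterbank energies, then DCT to obtain the cepstral coefficients
    feats = np.log(power @ fbank.T + 1e-10)
    return dct(feats, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

In practice a library routine such as librosa.feature.mfcc would typically be used instead of a hand-rolled implementation.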
step S20, decoding and identifying the acoustic features to obtain text content, and performing text cutting on the voice data to be identified according to the text content;
specifically, in this embodiment, the acoustic model and the language model are each trained on locally pre-stored sample voice data and sample text data, so that character recognition can be effectively performed on the voice in the voice data to be recognized according to the trained acoustic model and language model to obtain the text content;
in this step, the decoded text content may contain characters, digits, letters or the like, and the decoded text content corresponds one-to-one to the information in the voice data to be recognized;
step S30, judging the text type of the voice data to be recognized according to the text cutting result, and inquiring a target recognition model according to the text type;
the target identification model is a text-related identification model, a text-unrelated identification model or a text semi-related identification model;
specifically, the text-related model is used for recognizing fixed speech, such as fixed texts like 'there is a magpie on the tree', 'shopping is really convenient' and 'there is a jujube tree in front of my house'; enrollment for the text-related model requires a voice recording in which the phrase is repeated three times, to ensure the quality of the extracted voiceprint feature values;
the text-independent voiceprint model is used for recognizing random text, and its enrollment requires more than 30 s of effective speech;
the text semi-related voiceprint model is used for recognizing random 8-digit dynamic numbers, and its enrollment requires five recordings of random 8-digit dynamic numbers;
preferably, in this step, the step of querying the target recognition model according to the text type includes:
when the voice data to be recognized is judged to be the text-related type, the target recognition model is judged to be the text-related recognition model; that is, when the voice data to be recognized is judged to be a locally pre-stored fixed text, voiceprint recognition of the voice data to be recognized can currently be performed with the text-related recognition model;
when the voice data to be recognized is judged to be the text-independent type, the target recognition model is judged to be the text-independent recognition model; that is, when the voice data to be recognized is judged to be random text, voiceprint recognition of the voice data to be recognized can currently be performed with the text-independent recognition model;
when the voice data to be recognized is judged to be the text semi-related type, the target recognition model is judged to be the text semi-related recognition model; that is, when the voice data to be recognized is judged to be a digit sequence, voiceprint recognition of the voice data to be recognized can currently be performed with the text semi-related recognition model, as sketched below;
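For illustration only (not part of the patent text), a minimal sketch of this lookup, with the judged text type used as a dictionary key; the key names and placeholder model objects are assumptions.

```python
# Placeholder model objects; in a real system these would be the trained
# text-related, text-independent and text semi-related voiceprint models.
MODELS = {
    "text_related": "text-related voiceprint model",            # fixed phrases
    "text_independent": "text-independent voiceprint model",    # random text, >= 30 s enrollment
    "text_semi_related": "text-semi-related voiceprint model",  # 8-digit dynamic numbers
}

def query_target_model(text_type: str):
    # Return the recognition model registered for the judged text type.
    if text_type not in MODELS:
        raise ValueError(f"no recognition model registered for text type: {text_type}")
    return MODELS[text_type]
```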
step S40, performing voiceprint recognition on the voice data to be recognized according to the target recognition model to obtain a voiceprint recognition result;
that is, the MFCC algorithm is used to extract the acoustic features of the voice data to be recognized, and the extracted acoustic features are then input into the corresponding target recognition model (the text-related model, the text-independent model or the text semi-related model) to output the voiceprint recognition result, which indicates whether the voice data to be recognized passes or fails voiceprint verification;
in this embodiment, the text type of the voice data to be recognized is judged by performing text cutting on the voice data to be recognized according to the text content, so that the voice data to be recognized can be routed to the corresponding voiceprint recognition model for voiceprint recognition according to the judged text type. This prevents a mismatch between the enrolled voice and the voice to be recognized during voiceprint recognition and effectively improves the accuracy of voiceprint recognition.
Example two
Referring to fig. 2, a flowchart of a voiceprint recognition method according to a second embodiment of the present invention is shown, which includes the following steps:
step S11, acquiring voice data to be recognized, and performing feature extraction on the voice data to be recognized to obtain acoustic features;
step S21, inputting the acoustic features into an acoustic model to obtain phoneme information, inputting the phoneme information into a language model and decoding according to a preset text dictionary to obtain the text content;
the acoustic model and the language model are each trained on locally pre-stored sample voice data and sample text data, so that phoneme extraction and text decoding can be effectively performed on the voice in the voice data to be recognized according to the trained acoustic model and language model to obtain the text content; preferably, in this step, the preset text dictionary may be encoded with one-hot encoding, as sketched below;
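For illustration only (not part of the patent text), a minimal sketch of one-hot encoding a preset text dictionary; the example vocabulary is an assumption.

```python
import numpy as np

def one_hot_dictionary(entries):
    # Map each dictionary entry to a one-hot row vector.
    index = {tok: i for i, tok in enumerate(sorted(set(entries)))}
    eye = np.eye(len(index), dtype=np.float32)
    return {tok: eye[i] for tok, i in index.items()}

codes = one_hot_dictionary(["树", "上", "有", "只", "喜", "鹊"])
print(codes["喜"])  # exactly one 1.0, at the position assigned to this character
```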
step S31, judging whether the text content stores text characters;
the text characters are any preset characters, and the preset characters can be Chinese, English, Japanese or Korean;
specifically, in this step, whether the text content contains text characters is determined by sequentially matching the characters in the text content against the preset characters; the matching between the text content and the preset characters may also be performed in an image-matching manner, and a simple character-based variant is sketched below;
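For illustration only (not part of the patent text), a minimal character-based sketch of this check, assuming the preset characters are detected by Unicode ranges (CJK ideographs or Latin letters) rather than by image matching; the chosen ranges are an assumption.

```python
import re

# CJK ideographs and basic Latin letters; extend the ranges for Japanese kana,
# Korean Hangul, etc. as required.
TEXT_CHAR_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z]")

def contains_text_characters(text_content: str) -> bool:
    return TEXT_CHAR_PATTERN.search(text_content) is not None
```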
when step S31 determines that text characters are contained in the text content, the corresponding voice in the voice data to be recognized is text-labeled according to the text characters, and step S41 is executed;
labeling the corresponding voice with text effectively facilitates subsequent retrieval of that voice and improves the recognition efficiency and accuracy of the voiceprint recognition method;
step S41, judging whether the text characters are fixed texts pre-stored locally;
the fixed text can be set as required, for example to 'there is a magpie on the tree', 'shopping is really convenient' or 'there is a jujube tree in front of my house'; that is, this step judges whether the speech to be recognized is of the text-related type by judging whether the text characters match a locally pre-stored fixed text;
when the judgment result of the step S41 is yes, step S51 is performed;
step S51, judging the voice data to be recognized as text-related type, and setting the text-related recognition model as a target recognition model;
when the voice data to be recognized is judged to be the text-related type, the speech to be recognized is judged to be speech uttered for the fixed text; for example, when the locally pre-stored fixed text is 'there is a magpie on the tree' and the text contained in the text content is 'there is a magpie on the tree', the voice data to be recognized is judged to be the text-related type;
preferably, in this embodiment, when the repetition probability between the text characters and the fixed text is judged to be greater than or equal to a probability threshold, the voice data to be recognized is judged to be the text-related type; the probability threshold can be set as required and is 50% in this embodiment, for example:
when the locally pre-stored fixed text is 'there is a magpie on the tree' and the text contained in the text content covers only half of that phrase, the repetition probability is 50%, so the voice data to be recognized is judged to be the text-related type, as sketched below;
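For illustration only (not part of the patent text), a minimal sketch of this repetition-probability check, assuming the probability is measured as the fraction of the fixed text's characters that also appear in the recognized text; the exact definition of the probability is an assumption.

```python
def repetition_probability(recognized: str, fixed_text: str) -> float:
    # Fraction of the fixed text's characters that also occur in the recognized text.
    if not fixed_text:
        return 0.0
    matched = sum(1 for ch in fixed_text if ch in recognized)
    return matched / len(fixed_text)

def is_text_related(recognized: str, fixed_text: str, threshold: float = 0.5) -> bool:
    return repetition_probability(recognized, fixed_text) >= threshold
```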
when the judgment result of the step S41 is no, step S61 is performed;
step S61, judging the voice data to be recognized as a text-independent type, and setting a text-independent recognition model as a target recognition model;
when the voice data to be recognized is judged to be the text-independent type, the speech to be recognized is judged to be speech uttered for random text, where the characters of the random text change dynamically over time;
when step S31 determines that no text characters are contained in the text content, step S71 is executed;
step S71, judging whether the text content stores numbers;
preferably, in other embodiments, the step may further determine whether a preset identifier is stored in the text content, where the preset identifier may be a letter or a symbol;
when step S71 determines that digits are contained in the text content, the corresponding voice in the voice data to be recognized is digit-labeled according to the digits, and step S81 is executed;
step S81, judging whether the number of digits meets the number threshold;
in this embodiment, the number threshold is 8, that is, it is judged whether the number of digits in the text content is 8;
preferably, in this step, when the ratio between the number of digits and the number threshold is judged to be greater than or equal to a preset ratio, the number of digits is judged to meet the number threshold; the preset ratio can be set as required and is 0.5 in this embodiment, for example:
when the number of digits is 4, the ratio of the number of digits to the number threshold is 4:8, i.e. 0.5, so the number of digits is judged to meet the number threshold, as sketched below;
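For illustration only (not part of the patent text), a minimal sketch of steps S71 and S81, assuming the 8-digit threshold and the 0.5 ratio stated above; the function name is an assumption.

```python
def is_text_semi_related(text_content: str, number_threshold: int = 8, preset_ratio: float = 0.5) -> bool:
    # Count the digits in the decoded text and compare against the threshold.
    digit_count = sum(ch.isdigit() for ch in text_content)
    return digit_count > 0 and digit_count / number_threshold >= preset_ratio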
when the judgment result of the step S81 is yes, step S91 is performed;
step S91, judging the voice data to be recognized as a text semi-correlation type, and setting a text semi-correlation recognition model as a target recognition model;
when the voice data to be recognized is judged to be of a text semi-correlation type, judging that the voice to be recognized is the voice data sent out aiming at the dynamic number;
when the judgment result of step S71 or step S81 is no, step 101 is executed;
step 101, sending out a text content error prompt;
that is, when the text content is judged to contain neither characters nor digits, a text content error prompt is sent out; the prompt informs the user that an error exists in the current acquisition or decoding of the voice data to be recognized;
step S111, performing voiceprint recognition on the voice data to be recognized according to the target recognition model to obtain a voiceprint recognition result;
specifically, in this step, the step of performing voiceprint recognition on the speech data to be recognized according to the target recognition model includes:
inputting the acoustic features into the target recognition model to obtain feature vectors;
calculating matching values between the feature vector and locally pre-stored sample vectors according to the Euclidean distance formula, and acquiring the serial number of the sample vector corresponding to the maximum matching value;
when the number value is judged to be larger than the number threshold value, judging that the voiceprint recognition of the voice data to be recognized is qualified;
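For illustration only (not part of the patent text), a minimal sketch of this matching step, assuming the matching value is a similarity derived from the Euclidean distance (here 1 / (1 + d)) and that acceptance compares the best matching value against a decision threshold; the conversion and the 0.8 threshold are assumptions rather than values taken from the patent.

```python
import numpy as np

def match_voiceprint(feature_vector: np.ndarray, sample_vectors: np.ndarray, threshold: float = 0.8):
    # Euclidean distance to every locally stored sample vector, converted to a
    # similarity score; returns (index of best sample, best score, accepted flag).
    distances = np.linalg.norm(sample_vectors - feature_vector, axis=1)
    scores = 1.0 / (1.0 + distances)
    best = int(np.argmax(scores))
    return best, float(scores[best]), bool(scores[best] > threshold)
```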
in this embodiment, the text type of the voice data to be recognized is judged by performing text cutting on the voice data to be recognized according to the text content, so that the voice data to be recognized can be routed to the corresponding voiceprint recognition model for voiceprint recognition according to the judged text type. This prevents a mismatch between the enrolled voice and the voice to be recognized during voiceprint recognition and effectively improves the accuracy of voiceprint recognition.
EXAMPLE III
Referring to fig. 3, a schematic structural diagram of a voiceprint recognition system 100 according to a third embodiment of the present invention is shown, including: the acoustic feature extraction module 10, the text cutting module 11, the model query module 12 and the voiceprint recognition module 13, wherein:
the acoustic feature extraction module 10 is configured to acquire voice data to be recognized and perform feature extraction on the voice data to be recognized to obtain acoustic features;
and the text cutting module 11 is configured to decode and recognize the acoustic features to obtain text content, and perform text cutting on the voice data to be recognized according to the text content.
Wherein the text cutting module 11 is further configured to: inputting the acoustic features into an acoustic model to obtain phoneme information; and inputting the phoneme information into a language model and decoding according to a preset text dictionary to obtain the text content.
Preferably, the text cutting module 11 is further configured to: judging whether text characters are stored in the text content;
when the text characters are judged to be stored in the text content, carrying out text marking on corresponding voice in the voice data to be recognized according to the text characters;
when the text word is judged not to be stored in the text content, judging whether a number is stored in the text content;
and when the number is judged to be stored in the text content, carrying out digital marking on the corresponding voice in the voice data to be recognized according to the number.
And the model query module 12 is configured to determine a text type of the speech data to be recognized according to the result of the text segmentation, and query a target recognition model according to the text type, where the target recognition model is a text-related recognition model, a text-unrelated recognition model, or a text semi-related recognition model.
Further, the model query module 12 is further configured to determine whether the text characters are fixed texts pre-stored locally; if so, judging that the voice data to be recognized is a text related type; and if not, judging that the voice data to be recognized is a text-independent type.
Preferably, the model query module 12 is further configured to: when digits are judged to be contained in the text content, judge whether the number of digits meets the number threshold; if so, judge that the voice data to be recognized is of the text semi-related type; if not, send out a text content error prompt.
Further, the model query module 12 is further configured to: when the voice data to be recognized is judged to be the text-related type, judging that the target recognition model is the text-related recognition model; when the voice data to be recognized is judged to be the text-independent type, judging that the target recognition model is the text-independent recognition model; and when the voice data to be recognized is judged to be the text semi-correlation type, judging that the target recognition model is the text semi-correlation recognition model.
And the voiceprint recognition module 13 is configured to perform voiceprint recognition on the voice data to be recognized according to the target recognition model to obtain a voiceprint recognition result.
Wherein the voiceprint recognition module 13 is further configured to: inputting the acoustic features into the target recognition model to obtain feature vectors; calculating a matching value between the characteristic vector and a locally pre-stored sample vector according to an Euclidean distance formula, and acquiring a serial number value of the sample vector corresponding to the maximum value in the matching value; and when the number value is judged to be larger than the number threshold value, judging that the voiceprint recognition of the voice data to be recognized is qualified.
In this embodiment, the text type of the voice data to be recognized is judged by performing text cutting on the voice data to be recognized according to the text content, so that the voice data to be recognized can be routed to the corresponding voiceprint recognition model for voiceprint recognition according to the judged text type. This prevents a mismatch between the enrolled voice and the voice to be recognized during voiceprint recognition and effectively improves the accuracy of voiceprint recognition.
Example four
Referring to fig. 4, a mobile terminal 101 according to a fourth embodiment of the present invention includes a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal 101 execute the above voiceprint recognition method.
The present embodiment also provides a storage medium on which a computer program used in the above-mentioned mobile terminal 101 is stored; when executed, the computer program implements the following steps:
acquiring voice data to be recognized, and performing feature extraction on the voice data to be recognized to obtain acoustic features;
decoding and recognizing the acoustic features to obtain text content, and performing text cutting on the voice data to be recognized according to the text content;
judging the text type of the voice data to be recognized according to the text cutting result, and inquiring a target recognition model according to the text type, wherein the target recognition model is a text-related recognition model, a text-unrelated recognition model or a text semi-related recognition model;
and carrying out voiceprint recognition on the voice data to be recognized according to the target recognition model so as to obtain a voiceprint recognition result. The storage medium may be, for example, a ROM/RAM, a magnetic disk or an optical disk.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is used as an example, in practical applications, the above-mentioned function distribution may be performed by different functional units or modules according to needs, that is, the internal structure of the storage device is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application.
Those skilled in the art will appreciate that the component structures shown in fig. 3 are not intended to be limiting of the voiceprint recognition system of the present invention and can include more or fewer components than shown, or some components in combination, or a different arrangement of components, and that the voiceprint recognition method of fig. 1-2 can also be implemented using more or fewer components than shown in fig. 3, or some components in combination, or a different arrangement of components. The units, modules, etc. referred to herein are a series of computer programs that can be executed by a processor (not shown) in the target voiceprint recognition system and that are functionally capable of performing certain functions, all of which can be stored in a storage device (not shown) of the target voiceprint recognition system.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A voiceprint recognition method, the method comprising:
acquiring voice data to be recognized, and performing feature extraction on the voice data to be recognized to obtain acoustic features;
decoding and identifying the acoustic features to obtain text content, and performing text cutting on the voice data to be identified according to the text content;
judging the text type of the voice data to be recognized according to the text cutting result, and inquiring a target recognition model according to the text type, wherein the target recognition model is a text-related recognition model, a text-unrelated recognition model or a text semi-related recognition model;
and carrying out voiceprint recognition on the voice data to be recognized according to the target recognition model so as to obtain a voiceprint recognition result.
2. The voiceprint recognition method of claim 1 wherein said step of decoding and recognizing said acoustic features comprises:
inputting the acoustic features into an acoustic model to obtain phoneme information;
and inputting the phoneme information into a language model and decoding according to a preset text dictionary to obtain the text content.
3. The voiceprint recognition method according to claim 1, wherein the step of text-cutting the speech data to be recognized according to the text content comprises:
judging whether text characters are stored in the text content;
when the text characters are judged to be stored in the text content, carrying out text marking on corresponding voice in the voice data to be recognized according to the text characters;
when the text word is judged not to be stored in the text content, judging whether a number is stored in the text content;
and when the number is judged to be stored in the text content, carrying out digital marking on the corresponding voice in the voice data to be recognized according to the number.
4. The voiceprint recognition method according to claim 3, wherein the step of determining the text type of the speech data to be recognized according to the result of the text cutting comprises:
judging whether the text characters are fixed texts prestored locally;
if so, judging that the voice data to be recognized is a text related type;
and if not, judging that the voice data to be recognized is a text-independent type.
5. The voiceprint recognition method according to claim 4, wherein the step of determining the text type of the speech data to be recognized according to the result of the text cutting comprises:
when the number is judged to be stored in the text content, judging whether the number value of the number is a number threshold value;
if so, judging that the voice data to be recognized is a text semi-correlation type;
if not, sending out a text content error prompt.
6. The voiceprint recognition method of claim 5 wherein said step of querying a target recognition model based on said text type comprises:
when the voice data to be recognized is judged to be the text-related type, judging that the target recognition model is the text-related recognition model;
when the voice data to be recognized is judged to be the text-independent type, judging that the target recognition model is the text-independent recognition model;
and when the voice data to be recognized is judged to be the text semi-correlation type, judging that the target recognition model is the text semi-correlation recognition model.
7. The voiceprint recognition method according to claim 1, wherein the step of voiceprint recognizing the speech data to be recognized according to the target recognition model comprises:
inputting the acoustic features into the target recognition model to obtain feature vectors;
calculating a matching value between the characteristic vector and a locally pre-stored sample vector according to an Euclidean distance formula, and acquiring a serial number value of the sample vector corresponding to the maximum value in the matching value;
and when the number value is judged to be larger than the number threshold value, judging that the voiceprint recognition of the voice data to be recognized is qualified.
8. A voiceprint recognition system, said system comprising:
the acoustic feature extraction module is used for acquiring voice data to be recognized and extracting features of the voice data to be recognized to obtain acoustic features;
the text cutting module is used for decoding and identifying the acoustic features to obtain text contents and performing text cutting on the voice data to be identified according to the text contents;
the model query module is used for judging the text type of the voice data to be recognized according to the text cutting result and querying a target recognition model according to the text type, wherein the target recognition model is a text-related recognition model, a text-unrelated recognition model or a text semi-related recognition model;
and the voiceprint recognition module is used for carrying out voiceprint recognition on the voice data to be recognized according to the target recognition model so as to obtain a voiceprint recognition result.
9. A mobile terminal, characterized in that it comprises a storage device for storing a computer program and a processor running the computer program to make the mobile terminal execute the voiceprint recognition method according to any one of claims 1 to 7.
10. A storage medium, characterized in that it stores a computer program for use in a mobile terminal according to claim 9, which computer program, when executed by a processor, implements the steps of the voiceprint recognition method according to any one of claims 1 to 7.
CN201911357829.6A 2019-12-25 2019-12-25 Voiceprint recognition method, system, mobile terminal and storage medium Pending CN111145758A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911357829.6A CN111145758A (en) 2019-12-25 2019-12-25 Voiceprint recognition method, system, mobile terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911357829.6A CN111145758A (en) 2019-12-25 2019-12-25 Voiceprint recognition method, system, mobile terminal and storage medium

Publications (1)

Publication Number Publication Date
CN111145758A true CN111145758A (en) 2020-05-12

Family

ID=70520061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911357829.6A Pending CN111145758A (en) 2019-12-25 2019-12-25 Voiceprint recognition method, system, mobile terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111145758A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1547191A (en) * 2003-12-12 2004-11-17 北京大学 Semantic and sound groove information combined speaking person identity system
US20060111905A1 (en) * 2004-11-22 2006-05-25 Jiri Navratil Method and apparatus for training a text independent speaker recognition system using speech data with text labels
CN101506828A (en) * 2006-06-09 2009-08-12 索尼爱立信移动通讯股份有限公司 Media identification
CN102238189A (en) * 2011-08-01 2011-11-09 安徽科大讯飞信息科技股份有限公司 Voiceprint password authentication method and system
CN103456304A (en) * 2012-05-31 2013-12-18 新加坡科技研究局 Method and system for dual scoring for text-dependent speaker verification
US10255922B1 (en) * 2013-07-18 2019-04-09 Google Llc Speaker identification using a text-independent model and a text-dependent model
CN109473107A (en) * 2018-12-03 2019-03-15 厦门快商通信息技术有限公司 A kind of relevant method for recognizing sound-groove of text half and system
CN109616124A (en) * 2019-01-25 2019-04-12 厦门快商通信息咨询有限公司 Lightweight method for recognizing sound-groove and system based on ivector
CN111835522A (en) * 2020-05-19 2020-10-27 北京捷通华声科技股份有限公司 Audio processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李弼程 (Li Bicheng), "Principles and Applications of Pattern Recognition" (《模式识别原理与应用》), 28 February 2008 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111835522A (en) * 2020-05-19 2020-10-27 北京捷通华声科技股份有限公司 Audio processing method and device
CN111783939A (en) * 2020-05-28 2020-10-16 厦门快商通科技股份有限公司 Voiceprint recognition model training method and device, mobile terminal and storage medium
CN112269897A (en) * 2020-10-20 2021-01-26 上海明略人工智能(集团)有限公司 Method and device for determining voice acquisition equipment
CN112269897B (en) * 2020-10-20 2024-04-05 上海明略人工智能(集团)有限公司 Method and device for determining voice acquisition equipment
CN113744727A (en) * 2021-07-16 2021-12-03 厦门快商通科技股份有限公司 Model training method, system, terminal device and storage medium
CN113744727B (en) * 2021-07-16 2023-12-26 厦门快商通科技股份有限公司 Model training method, system, terminal equipment and storage medium
CN113488059A (en) * 2021-08-13 2021-10-08 广州市迪声音响有限公司 Voiceprint recognition method and system
CN115862638A (en) * 2023-03-01 2023-03-28 北京海上升科技有限公司 Financial transaction operation and big data secure storage method and system based on block chain
CN115862638B (en) * 2023-03-01 2023-12-12 北京海上升科技有限公司 Big data safe storage method and system based on block chain

Similar Documents

Publication Publication Date Title
CN111145758A (en) Voiceprint recognition method, system, mobile terminal and storage medium
US6401063B1 (en) Method and apparatus for use in speaker verification
CN108447471B (en) Speech recognition method and speech recognition device
CN111243603B (en) Voiceprint recognition method, system, mobile terminal and storage medium
CN109147797B (en) Customer service method, device, computer equipment and storage medium based on voiceprint recognition
CN109410664B (en) Pronunciation correction method and electronic equipment
KR100655491B1 (en) Two stage utterance verification method and device of speech recognition system
KR101259558B1 (en) apparatus and method for detecting sentence boundaries
US11348590B2 (en) Methods and devices for registering voiceprint and for authenticating voiceprint
CN110265037B (en) Identity verification method and device, electronic equipment and computer readable storage medium
US8340429B2 (en) Searching document images
CN111881297B (en) Correction method and device for voice recognition text
JP4136316B2 (en) Character string recognition device
CN109033212B (en) Text classification method based on similarity matching
CN111783939A (en) Voiceprint recognition model training method and device, mobile terminal and storage medium
CN111312259B (en) Voiceprint recognition method, system, mobile terminal and storage medium
CN112927679A (en) Method for adding punctuation marks in voice recognition and voice recognition device
CN112309406A (en) Voiceprint registration method, voiceprint registration device and computer-readable storage medium
CN111429921B (en) Voiceprint recognition method, system, mobile terminal and storage medium
US6499012B1 (en) Method and apparatus for hierarchical training of speech models for use in speaker verification
CN113051384A (en) User portrait extraction method based on conversation and related device
CN113887239A (en) Statement analysis method and device based on artificial intelligence, terminal equipment and medium
CN111798841A (en) Acoustic model training method and system, mobile terminal and storage medium
US20240126851A1 (en) Authentication system and method
CN116913279A (en) Voice instruction recognition method and device, electronic equipment and vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200512