CN102509547B - Method and system for voiceprint recognition based on vector quantization - Google Patents
Method and system for voiceprint recognition based on vector quantization
- Publication number
- CN102509547B CN102509547B CN2011104503646A CN201110450364A CN102509547B CN 102509547 B CN102509547 B CN 102509547B CN 2011104503646 A CN2011104503646 A CN 2011104503646A CN 201110450364 A CN201110450364 A CN 201110450364A CN 102509547 B CN102509547 B CN 102509547B
- Authority
- CN
- China
- Prior art keywords
- speaker
- code word
- code book
- sound
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Telephonic Communication Services (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a method and a system for voiceprint recognition based on vector quantization, offering good recognition performance and noise immunity with little modeling data, fast decisions, and low complexity. The method comprises the steps of: acquiring speech signals; preprocessing the speech signals; extracting speech-signal feature parameters as MFCC (mel-frequency cepstral coefficient) parameters, the order of the MFCC ranging from 12 to 16; template training, in which the LBG (Linde-Buzo-Gray) clustering algorithm builds a codebook for each speaker and stores it in a speech database as that speaker's voice template; and voiceprint recognition, in which the extracted feature parameters of the speech to be recognized are compared against the speaker templates in the database and judged under a weighted Euclidean distance measure; the speaker whose template yields the minimum average distance measure for the feature vectors X of the speech to be recognized is taken as the recognition result.
Description
Technical field
The invention belongs to the field of speech processing technology, and in particular relates to a vector-quantization-based voiceprint recognition method and system that identify a speaker's identity from the speaker's speech signal.
Background technology
In recent years, with the widespread application of information processing and artificial intelligence technology and people's urgent demand for fast, effective identity authentication, conventional password-based authentication has gradually lost its dominance, while in the field of biometrics, identity recognition based on the speaker's voice has won the favor of more and more people.
Because everyone's vocal organs differ physiologically, and acquired behavioral differences lead to different articulation styles and speaking habits, it is possible to identify a person by voice. Besides being impossible to forget, requiring nothing to memorize, and being easy to use, voiceprint recognition has the following properties. First, the authentication mode is easy to accept: the "password" is the voice itself, and the user only needs to speak. Second, the recognized text can be arbitrary, is hard to steal, and offers high security. Third, the terminal device used for recognition is a microphone or telephone, which is inexpensive and easy to integrate with existing communication systems. The application prospects of voiceprint recognition are therefore broad: in economic activity, it can support bank remittance, balance inquiry, transfers, and so on; in security, it can verify personnel at classified sites by the sound of their voice, responding only to designated speakers; in forensics, it can judge the true identity of a criminal among suspects from a recording; in biomedicine, it can make a system respond only to a patient's commands, enabling control of the user's prosthesis.
The key technologies of voiceprint recognition are speech-signal feature extraction and model matching. Speech-signal feature parameters fall roughly into two classes. One class consists of low-level features that mainly reflect the physiology of the speaker's vocal organs, such as mel-frequency cepstral coefficients (MFCC), extracted according to the human ear's sensitivity to speech at different frequencies, and linear prediction cepstral coefficients (LPCC), obtained from an all-pole model of the speech signal. The other class consists of high-level features that mainly reflect the speaker's usage habits and pronunciation characteristics, such as prosodic features, which reflect the intonation of the speaker's voice, and phone features, which reflect phoneme statistics in the speaker's idiolect. LPCC is built on an assumed production model of the speech signal and is easily affected by that model's assumptions; and although some literature uses high-level features, their recognition rates are not very high.
Model matching methods proposed for the various speech feature parameters mainly include dynamic time warping (DTW), vector quantization (VQ), Gaussian mixture models (GMM), and artificial neural networks (ANN). The DTW model depends on the temporal ordering of the parameters and has relatively poor real-time performance, making it suitable for isolated-word speaker recognition. GMM is mainly used for speaker recognition over large amounts of speech; it needs more model training data, longer training and recognition times, and more memory. For ANN models, training algorithms for designing the best topology do not necessarily guarantee convergence, and overfitting can occur. In VQ-based speaker recognition, template matching does not depend on temporal ordering, real-time performance is relatively good, little modeling data is needed, decisions are fast, and complexity is low. The principle of VQ-based speaker recognition is to quantize each speaker's speech feature parameters into a codebook stored in a speech database as that speaker's voice template; at recognition time, the feature vectors of the speech to be recognized are compared with the existing speaker templates in the database, the overall average quantization distortion of each is computed, and the template with minimum distortion gives the recognition result. A shortcoming, however, is that speech features follow an ellipsoidal normal distribution in which the components are unequally distributed, which the plain Euclidean distance measure of a traditional VQ speaker recognition system fails to reflect.
Summary of the invention
The technical problem to be solved by the present invention is to propose a voiceprint recognition method and system based on vector quantization with good recognition performance and noise immunity, good recognition results, little modeling data, fast decisions, and low complexity.
A voiceprint recognition method based on vector quantization comprises the following concrete steps:
1. Speech signal acquisition: using the telephone of a program-controlled switching experiment box as the acquisition terminal, speech signals are captured through a sound card.
2. Speech signal preprocessing: the captured signal is divided into frames and windowed by computer, each frame containing 256 samples, with a frame shift of 128 samples and a Hamming window; endpoint detection combines short-time energy with short-time zero-crossing rate; pre-emphasis is applied with an emphasis coefficient of 0.90 to 1.00.
3. Speech feature extraction: MFCC parameters of order 12 to 16 are used.
4. Template training: the LBG clustering algorithm builds a codebook for each speaker in the system and stores it in the speech database as that speaker's voice template.
5. Voiceprint recognition: the feature parameters of the speech to be recognized are compared against the speaker templates built in steps 1 to 4 and stored in the database, judged under the weighted Euclidean distance measure; if a speaker's template yields the minimum average distance measure for the feature vectors X of the speech to be recognized, that speaker is taken as identified.
The above speech feature extraction proceeds as follows:
(1) Apply the short-time Fourier transform to the preprocessed speech signal to obtain its spectrum X(k); the DFT of the speech signal is

X(k) = sum_{n=0}^{N-1} x(n) e^{-j*2*pi*n*k/N}, 0 <= k < N

where x(n) is one frame of the input speech signal and N is the number of points of the Fourier transform, taken as 256.
(2) Square the magnitude of the spectrum X(k) to obtain the energy spectrum |X(k)|^2, then smooth the spectrum of the speech signal with a mel-frequency filterbank, eliminating harmonics and highlighting the formants of the original speech. The mel filterbank is a set of Q triangular bandpass filters H_q(k) with center frequencies f(q), q = 1, 2, ..., Q.
(3) Take the logarithm of the mel spectrum output by the filterbank, which compresses the dynamic range of the speech spectrum and turns multiplicative noise components in the frequency domain into additive ones. The log mel spectrum is

S(q) = ln( sum_{k=0}^{N-1} |X(k)|^2 H_q(k) ), q = 1, 2, ..., Q

(4) Discrete cosine transform (DCT): transform the log mel spectrum S(q) obtained in step (3) back to the time domain; the result is the mel-frequency cepstral coefficients (MFCC), whose n-th coefficient is

C(n) = sum_{q=1}^{Q} S(q) cos( pi*n*(q - 0.5)/Q ), n = 1, 2, ..., L

where L is the order of the MFCC parameters, taken as 12 to 16, and Q is the number of mel filters, taken as 23 to 26.
The concrete steps of the LBG clustering algorithm used in the above template training are as follows:
(1) Obtain all training vectors X in the input feature vector set S, and obtain the codeword Y of the initial codebook by the split-codebook method.
(2) Using a small splitting parameter epsilon (0 < epsilon < 1), split each codeword Y in two according to the rule

Y1 = Y(1 + epsilon), Y2 = Y(1 - epsilon)

(3) By the nearest-neighbor criterion, assign each training vector to the nearest codeword of the new codebook, partitioning S into M subsets S_m; that is, X belongs to S_m when

d(X, Y_m) <= d(X, Y_i), i = 1, 2, ..., M

where M is the number of codewords in the current codebook.
(4) Compute the centroid of the feature vectors in each subset and replace the codeword of that subset with its centroid, yielding a new codebook.
(5) Iterate steps (3) and (4) until the codewords of the new codebook converge.
(6) Then repeat step (2), splitting each newly obtained codeword in two, and iterate steps (3) and (4) again; continue until the required number of codewords, 2^r (r an integer), is reached. In total r rounds of the above loop are needed; when clustering is complete, the centroids of the classes are the required codewords.
The initial codebook of the above LBG clustering algorithm is initialized by the split-codebook method, as follows:
(1) Take the mean of the feature vectors of all extracted frames as the single codeword of the initial codebook.
(2) Split each codeword Y_m of the current codebook in two:

Y_m^(1) = Y_m(1 + epsilon), Y_m^(2) = Y_m(1 - epsilon)

where m runs from 1 to the codeword count of the current codebook and epsilon is the splitting parameter.
(3) Cluster all feature vectors against the new codewords, then compute the total distance measure D, the sum over training feature vectors X of the distance measure d(X, Y) between X and its trained codeword Y. If (D' - D)/D < delta, where D' is the total distance measure of the preceding iteration and delta is a small convergence threshold, stop the iteration; the current codebook is the designed codebook. Otherwise go to the next step.
(4) Recompute the new centroid of each region.
(5) Repeat steps (3) and (4) until an optimal codebook of 2m codewords is formed.
(6) Repeat steps (2), (3), and (4) until a codebook of M codewords is formed.
For the above discrete cosine transform, L = 13 and Q = 25.
A voiceprint recognition system based on vector quantization is composed as follows:
a speech signal acquisition module, a speech signal preprocessing module, a speech feature extraction module, a voice template training module, and a voiceprint recognition module.
Compared with the prior art, the beneficial effects of the present invention are:
Speech signals are captured through a sound card, preprocessed with speech processing technology, and their feature parameters extracted; vector quantization then builds a speech model from the feature parameters, forming a speaker recognition system. The MFCC parameters give good recognition performance and noise immunity and closely model human auditory perception; the speaker information most useful for speaker recognition is contained in orders 2 through 16 of the MFCC parameters. The vector quantization (VQ) method likewise gives good recognition performance and noise immunity, works in real time, recognizes well, needs little modeling data, uses a simple algorithm, decides quickly, and has low complexity.
Description of drawings
Fig. 1 is the system block diagram of the present invention;
Fig. 2 is the main flow chart of the present invention;
Fig. 3 is the flow chart of the LBG algorithm;
Fig. 4 is the human-computer interaction interface of VQ-based voiceprint recognition.
Embodiment
As shown in Fig. 1, the voiceprint recognition system based on vector quantization completes recognition of the speaker's speech through a combination of software and hardware, and is composed as follows:
a speech signal acquisition module, a speech signal preprocessing module, a speech feature extraction module, a speech model training module, and a voiceprint recognition module.
As shown in Figs. 2 and 3, the concrete steps of the voiceprint recognition method based on vector quantization are as follows:
1. Speech signal acquisition
Speech acquisition converts the original analog speech signal into a digital signal, with the channel count and sampling frequency set in advance. The present invention acquires speech with an SHT-8B/PCI sound card produced by Hangzhou Sanhui; the channel count is 2 (the sound card default) and the sampling frequency is 8 kHz (the sound card default). The recognition terminal is the telephone of the experimental program-controlled switching box; the box uses space-division switching, and the speech channel is channel A-2 (of four channels in total, A-1, A-2, B-1, B-2; the present invention chose A-2 at random, with no effect on the experimental results).
2. Speech signal preprocessing
(1) Windowing and framing
The time-varying nature of speech requires processing over short segments, so the signal is divided into frames. To avoid losing information at frame boundaries, consecutive frames overlap by the frame shift, whose ratio to the frame length generally lies between 0 and 1/2. The present invention uses a frame length of 256 samples and a frame shift of 128 samples. The window function w(n) is a Hamming window, chosen for its good smoothness:

w(n) = 0.54 - 0.46 cos( 2*pi*n/(N - 1) ), 0 <= n <= N - 1    (10)

where N is the window length, 256 points in the present invention.
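As a minimal illustration of the framing and windowing just described (NumPy assumed; the function name and shapes are my own, not part of the patent):

```python
import numpy as np

def frame_signal(x, frame_len=256, frame_shift=128):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.

    frame_len = 256 samples and frame_shift = 128 samples are the values
    used in this description (50% overlap).
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    window = np.hamming(frame_len)  # 0.54 - 0.46*cos(2*pi*n/(N-1))
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        start = i * frame_shift
        frames[i] = x[start:start + frame_len] * window
    return frames

# One second of 8 kHz speech yields 61 overlapping windowed frames.
frames = frame_signal(np.random.randn(8000))
print(frames.shape)  # (61, 256)
```

With a 128-sample shift, every sample except those at the very edges contributes to two frames, so no information is lost at frame boundaries.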
(2) Endpoint detection
The present invention detects the endpoints of the speech signal with a method combining short-time energy and short-time average zero-crossing rate, thereby judging the start and end points of the speech: short-time energy detects voiced sounds, and the zero-crossing rate detects unvoiced sounds. Let x(m) be the speech signal and w(n) the Hamming window function; the short-time energy E_n is defined as

E_n = sum_m [ x(m) w(n - m) ]^2
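A sketch of the frame-level measurements behind this kind of endpoint detector (NumPy assumed; the threshold handling is deliberately simplified and illustrative, since in practice thresholds are estimated from leading silence):

```python
import numpy as np

def short_time_energy(frame):
    """Short-time energy of one (already windowed) frame: sum of squares."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sum(frame ** 2))

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(np.asarray(frame, dtype=float))
    return float(np.mean(np.abs(np.diff(signs)) > 0))

def is_speech(frame, energy_thresh, zcr_thresh):
    """High energy flags voiced sound; a high zero-crossing rate flags
    unvoiced sound; a frame passing either test is kept as speech."""
    return (short_time_energy(frame) > energy_thresh
            or zero_crossing_rate(frame) > zcr_thresh)
```

Scanning the frame sequence with `is_speech` and taking the first and last frames that pass gives the start and end points of the utterance.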
(3) Pre-emphasis
Because the average power spectrum of the speech signal is affected by glottal excitation and mouth-nose radiation, the high end falls off at about 6 dB per octave above roughly 800 Hz, so pre-emphasis is applied to boost the high-frequency part of the signal and flatten its spectrum. Pre-emphasis is implemented with a digital filter that boosts high frequencies at 6 dB per octave, generally a first-order filter

H(z) = 1 - u*z^{-1}

The system's recognition rate is highest for u between 0.90 and 1.00; the present invention takes u = 0.97.
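The first-order pre-emphasis filter can be written directly (a sketch assuming NumPy, with u = 0.97 as in the text):

```python
import numpy as np

def pre_emphasis(x, u=0.97):
    """Pre-emphasis filter y(n) = x(n) - u*x(n-1), i.e. H(z) = 1 - u*z^-1.
    u = 0.97 is the value the text reports as giving the best recognition."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                  # first sample passes through unchanged
    y[1:] = x[1:] - u * x[:-1]
    return y

# A constant (zero-frequency) signal is almost entirely suppressed:
# [1.0, 1.0, 1.0] becomes approximately [1.0, 0.03, 0.03].
y = pre_emphasis([1.0, 1.0, 1.0])
```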
3. Speech feature extraction
Speech feature extraction draws from the speaker's speech signal the parameters that reflect the speaker's individuality, as follows:
(1) Apply the short-time Fourier transform to the preprocessed speech signal to obtain its spectrum X(k). The DFT of the speech signal is

X(k) = sum_{n=0}^{N-1} x(n) e^{-j*2*pi*n*k/N}, 0 <= k < N

where x(n) is one frame of the input speech signal and N is the number of points of the Fourier transform, taken as 256.
(2) Square the magnitude of X(k) to obtain the energy spectrum |X(k)|^2, then pass it through the mel filterbank to smooth the spectrum of the speech signal, eliminating harmonics and highlighting the formants of the original speech. The mel filterbank is a set of Q triangular bandpass filters H_q(k) with center frequencies f(q), q = 1, 2, ..., Q.
(3) Take the logarithm of the filterbank outputs, which compresses the dynamic range of the speech spectrum and turns multiplicative noise components in the frequency domain into additive ones; the resulting log mel spectrum is

S(q) = ln( sum_{k=0}^{N-1} |X(k)|^2 H_q(k) ), q = 1, 2, ..., Q

(4) Discrete cosine transform (DCT): transform the log mel spectrum S(q) obtained above back to the time domain; the result is the mel-frequency cepstral coefficients (MFCC), whose n-th coefficient is

C(n) = sum_{q=1}^{Q} S(q) cos( pi*n*(q - 0.5)/Q ), n = 1, 2, ..., L    (17)

where L is the order of the MFCC and Q is the number of mel filters, both usually chosen by experiment. The present embodiment takes L = 13 and Q = 25, though the invention is not limited to these values.
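The four extraction steps above can be sketched end to end as follows (NumPy assumed; the triangular-filter construction and the small log-flooring constant are my own implementation choices, not taken from the patent):

```python
import numpy as np

def mel(f):
    """Hz to mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(Q=25, n_fft=256, fs=8000):
    """Q triangular filters spaced uniformly on the mel scale (Q = 25 as
    in this embodiment); returns a (Q, n_fft//2 + 1) weight matrix."""
    mel_pts = np.linspace(0.0, mel(fs / 2), Q + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / fs).astype(int)
    fb = np.zeros((Q, n_fft // 2 + 1))
    for q in range(1, Q + 1):
        l, c, r = bins[q - 1], bins[q], bins[q + 1]
        for k in range(l, c):                      # rising edge
            fb[q - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                      # falling edge
            fb[q - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(frame, L=13, Q=25, fs=8000):
    """MFCC of one windowed frame: |DFT|^2 -> mel filterbank -> log -> DCT."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame)) ** 2                     # energy spectrum
    logmel = np.log(mel_filterbank(Q, n_fft, fs) @ spec + 1e-10)
    n = np.arange(1, L + 1)[:, None]
    q = np.arange(1, Q + 1)[None, :]
    dct = np.cos(np.pi * n * (q - 0.5) / Q)                    # DCT basis
    return dct @ logmel

c = mfcc(np.hamming(256) * np.random.randn(256))
print(c.shape)  # (13,)
```

The 256-point frame matches the framing step, so one 13-dimensional MFCC vector is produced per frame.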
4. Template training
(1) Basic principle
In voiceprint recognition, a vector quantization codebook generally serves as the speaker's voice template: each speaker's speech in the system is quantized into a codebook and deposited in the speech database as that speaker's voice template. At recognition time, feature parameters are extracted from the input speech feature vector sequence, the overall average quantization distortion from these parameters to each voice template is computed, and the speaker whose template has the minimum total average error is the recognition result.
(2) Distance measure
Let X be the K-dimensional feature vector of the unknown pattern, compared against a K-dimensional codeword vector Y in a codebook, with x_k and y_k denoting the k-th components of X and Y. The Euclidean distance measure is

d(X, Y) = sqrt( sum_{k=1}^{K} (x_k - y_k)^2 )

The traditional Euclidean distance measure gives every component of the feature vector equal weight, which yields good recognition only when the natural distribution of the feature vectors is spherical or near-spherical, that is, when the components of the feature vector are equally distributed. But speech features follow an ellipsoidal normal distribution whose components are unequally distributed, which the Euclidean distance measure does not reflect well; adjudicating speakers directly by Euclidean distance therefore degrades the system's recognition rate.
The present invention uses MFCC of order 13. To reflect the components' differing contributions to clustering, a weighted Euclidean distance measure is adopted: vectors with more dispersed distributions are given very small weights, and vectors with concentrated distributions are given very large weights, the dispersion of a distribution being measured by the Euclidean distance of the vectors to the cluster center (the vector mean). The weighting factor w_k is defined accordingly, with K the dimension of the feature vector. During training and recognition, the Euclidean distances obtained are sorted in descending order and then pre-emphasized with the weighting factors; in essence this is equivalent to using the unweighted Euclidean distance while pre-emphasizing each dimension of the feature vector with a scale factor. Vectors that rank very high and are destructive, such as isolated points or noise, thus receive very small weights, while well-behaved low-ranking vectors receive larger weights, so each vector's contribution to recognition is properly reflected.
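A sketch of the weighted distance measure (NumPy assumed). The patent's exact weighting formula did not survive extraction, so the inverse-dispersion weighting below is only one plausible reading of the description: dimensions whose values scatter widely get small weights, and concentrated ones get large weights:

```python
import numpy as np

def dispersion_weights(features):
    """Weight each feature dimension by the inverse of its dispersion
    around the mean, normalised so the K weights sum to K. This concrete
    form is an assumption; only the qualitative rule (more dispersed ->
    smaller weight) is stated in the text."""
    features = np.asarray(features, dtype=float)
    var = np.var(features, axis=0) + 1e-10   # dispersion per dimension
    w = 1.0 / var
    return w * len(w) / np.sum(w)

def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance d = sqrt(sum_k w_k * (x_k - y_k)^2)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))
```

The weights are computed once from the training features and then reused unchanged during both codebook training and recognition.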
(3) Template training
The present invention adopts the LBG algorithm based on the splitting method, with the following concrete steps:
1) Obtain all training vectors X in the input feature vector set S, and obtain the codeword Y of the initial codebook by the split-codebook method (a codebook is a set of vectors, that is, a set of codewords).
2) Using a small splitting parameter epsilon (0 < epsilon < 1), split each codeword Y in two according to the rule

Y1 = Y(1 + epsilon), Y2 = Y(1 - epsilon)

3) By the nearest-neighbor criterion, assign each training vector to the nearest codeword of the new codebook, partitioning S into M subsets S_m; that is, X belongs to S_m when

d(X, Y_m) <= d(X, Y_i), i = 1, 2, ..., M

where M is the number of codewords in the current codebook.
4) Compute the centroid of the feature vectors in each subset and replace the codeword of that subset with its centroid, yielding a new codebook.
5) Iterate steps 3) and 4) until the codewords of the new codebook converge.
6) Then repeat step 2), splitting each newly obtained codeword in two, and iterate steps 3) and 4) again; continue until the required number of codewords, 2^r (r an integer), is reached. In total r rounds of the above loop are needed; when clustering is complete, the centroids of the classes are the required codewords.
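The six LBG steps can be sketched as follows (NumPy assumed; the eps and delta values and the exact form of the convergence test are illustrative choices, and plain squared Euclidean distance stands in for the weighted measure):

```python
import numpy as np

def lbg(features, n_codewords=16, eps=0.01, delta=1e-3):
    """LBG codebook training by repeated splitting. n_codewords must be a
    power of two, 2^r. Each codeword y is split into y*(1+eps) and
    y*(1-eps); k-means-style iterations then run until the relative drop
    in total distortion falls below delta."""
    features = np.asarray(features, dtype=float)
    codebook = features.mean(axis=0, keepdims=True)  # initial 1-word codebook
    while len(codebook) < n_codewords:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev_d = np.inf
        while True:
            # nearest-neighbour partition of the training set
            d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            total_d = d[np.arange(len(features)), labels].sum()
            if prev_d < np.inf and (prev_d - total_d) / total_d < delta:
                break
            prev_d = total_d
            # centroid update (empty cells keep their old codeword)
            for m in range(len(codebook)):
                pts = features[labels == m]
                if len(pts):
                    codebook[m] = pts.mean(axis=0)
    return codebook
```

One codebook trained this way per speaker, stored in the database, constitutes that speaker's voice template.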
The initial codebook of the above LBG clustering algorithm is initialized by the split-codebook method, as follows:
1. Take the mean of the feature vectors of all extracted frames as the single codeword of the initial codebook.
2. Split each codeword Y_m of the current codebook in two:

Y_m^(1) = Y_m(1 + epsilon), Y_m^(2) = Y_m(1 - epsilon)    (22)

where m runs from 1 to the codeword count of the current codebook and epsilon is the splitting parameter.
3. Cluster all feature vectors against the new codewords, then compute the total distance measure D:

D = sum_X d(X, Y)    (23)

where d(X, Y) is the distance measure between training feature vector X and its trained codeword Y. If (D' - D)/D < delta, where D' is the total distance measure of the preceding iteration and delta is a small convergence threshold, stop the iteration; the current codebook is the designed codebook. Otherwise go to the next step.
4. Recompute the new centroid of each region.
5. Repeat steps 3 and 4 until an optimal codebook of 2m codewords is formed.
6. Repeat steps 2, 3, and 4 until a codebook of M codewords is formed.
5. Voiceprint recognition
(1) Extract the feature vector sequence X = {X_1, X_2, ..., X_T} of the speaker's speech signal to be recognized, of length T frames. The codebooks in the speech database formed in the training stage are Y^1, Y^2, ..., Y^N, where N is the number of speakers.
(2) Compute the distance measure between the feature vectors and each existing speaker template in the database, that is, the average distortion

D_i = (1/T) sum_{j=1}^{T} min_{1<=m<=M} d(X_j, Y_m^i)

where X_j is the feature vector of the j-th frame of X, Y_m^i is the m-th codeword of the i-th speaker (M codewords in total), K is the dimension of the feature vectors, and d is the weighted Euclidean distance measure with weighting factor w_k as defined above.
This system performs closed-set recognition; that is, every speaker to be recognized belongs to the known set of speakers. The human-computer interaction interface for speaker recognition is shown in Fig. 4. In the interface of the voiceprint recognition system, the "sound card status" list view shows the available speech channels of the current sound card and their status; the "speech sample library" list view shows the number of speaker samples and the speakers' names in the current library; and the "voiceprint recognition parameter settings" panel shows the parameters to set for speech acquisition, including training duration (default 23 s), test speech length (default 15 s), and number of candidates (default 1).
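Closed-set identification over the stored codebooks reduces to a minimum-average-distortion search, sketched below (NumPy assumed; plain squared Euclidean distance is used for brevity where the patent would apply the per-dimension weighting described earlier):

```python
import numpy as np

def avg_distortion(features, codebook):
    """Average over frames of the distance to the nearest codeword."""
    features = np.asarray(features, dtype=float)
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return float(d.min(axis=1).mean())

def identify(features, codebooks):
    """Closed-set identification: return the index of the speaker whose
    codebook gives the minimum average distortion for the test features."""
    scores = [avg_distortion(features, cb) for cb in codebooks]
    return int(np.argmin(scores))
```

Because the search is closed-set, the minimum-distortion speaker is always reported; an open-set variant would additionally compare that minimum against a rejection threshold.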
The following concrete example assumes that 100 people's voices have been deposited in the speech sample library in advance, and describes how the voice of a caller XX is recognized when XX calls in.
Case 1: XX does not belong to the known speech sample library.
(1) Speech signal acquisition: using the telephone of the program-controlled switching experiment box as the acquisition terminal, speech is captured through the sound card. First, set the "training duration" parameter for the training speech to be collected (range: 10-39 s), then enter the speaker's name "XX" in the name edit box and click the "add speaker" button. After the addition is complete, click "OK", then dial the telephone of the program-controlled switching experiment box (number: 8700). Once connected, the status of sound card channel 2 (the default channel) updates to "recording", and the sound card captures speech. When the captured speech reaches the preset training duration, the telephone hangs up automatically.
(2) Speech signal preprocessing: the extracted speech signal is framed and windowed by computer together with VC software; each frame contains 256 samples, the frame shift is 128 samples, and the window is a Hamming window. Endpoint detection combines short-time energy and short-time zero-crossing rate; pre-emphasis uses an emphasis coefficient of 0.97.
(3) Speech feature extraction: the computer together with VC software extracts MFCC parameters of order 13.
(4) Template training: the codebook is initialized with the split-codebook method, then the LBG clustering algorithm builds a codebook for each speaker in the system and stores it in the speech database as that speaker's voice template.
(5) Speaker recognition
First, set the "test speech length" parameter for the test speech to be collected (range: 5-20 s), dial the telephone of the program-controlled switching experiment box (number: 8700), and capture speech through the sound card (channel 2). When the captured speech reaches the preset test length, the telephone hangs up automatically.
The software then disables the "perform speaker recognition" button, performs steps (2) and (3) on the speaker's speech, and finally compares the extracted speech of the speaker under test with the voice templates in the library. Click the "perform speaker recognition" button and select the number of candidates to display (range 1-3). If a speaker's template yields the minimum average distance measure for the feature vectors X of the speech to be recognized, that speaker is taken as identified, and the recognition result "XX" and the resolution are shown in the "speaker recognition" list view.
Case 2: XX belongs to the known speech sample library.
If XX belongs to the known speech sample library, speaker recognition is performed directly against the library. First, set the "test speech length" parameter for the test speech to be collected (range: 5-20 s), dial the telephone of the program-controlled switching experiment box (number: 8700), and capture speech through the sound card (channel 2). When the captured speech reaches the preset test length, the telephone hangs up automatically.
The software then disables the "perform speaker recognition" button, performs steps (2) and (3) on the speaker's speech, and finally compares the extracted speech of the speaker under test with the voice templates in the library. If a speaker's template yields the minimum average distance measure for the feature vectors X of the speech to be recognized, that speaker is taken as identified, and the recognition result "XX" and the resolution are shown simultaneously in the "speaker recognition" list view.
Claims (2)
1. the method for recognizing sound-groove based on vector quantization, is characterized in that, concrete steps are as follows:
(1), the collection of voice signal: as the terminal device that gathers voice, gather voice signal by sound card with the phone of programme-controlled exchange comprehensive experiment box;
(2), voice signal pre-service: divide frame windowing operation by computing machine with the voice signal that extracts, a frame comprises 256 sampled points in minute frame process, and it is 128 sampled points that frame moves, and added window function is Hamming window; End-point detection adopts the end-point detection method that combines based on short-time energy and short-time zero-crossing rate; Pre-emphasis, the value that increases the weight of coefficient is 0.90 ~ 1.00;
(3), phonic signal character parameter extraction: adopt the MFCC parameter, the exponent number of MFCC is 12 ~ 16;
(4), template training: adopting the LBG clustering algorithm is that each speaker in system sets up a code book and is stored in speech database sound template as this speaker, the concrete steps of LBG clustering algorithm that adopt as follows:
(4.1) Collect all training vectors X in the input feature-vector set S, and obtain the codewords of the initial codebook by the splitting codebook method;
(4.2) Using a small splitting parameter ε (0 < ε ≪ 1), split each codeword y into two codewords, y(1 + ε) and y(1 − ε);
(4.3) By the nearest-neighbour criterion, assign each training vector to its closest codeword, so that S is partitioned into M subsets S_i: a vector x belongs to S_i when d(x, y_i) ≤ d(x, y_j) for all j = 1, …, M, where M is the number of codewords in the current codebook;
(4.4) Compute the centroid of the feature vectors in each subset and replace that subset's codeword with the centroid, yielding a new codebook;
(4.5) Iterate steps (4.3) and (4.4) until the codewords of the new codebook converge;
(4.6) Then repeat step (4.2): split each newly obtained codeword into two, iterate steps (4.3) and (4.4) again, and continue until the codebook reaches the required size N = 2^r (r an integer); r rounds of the above splitting are needed in total. When clustering is complete, the class centroids are the required codewords.
(5) Voiceprint recognition: the feature parameters extracted from the speech to be identified are compared against the speaker voice templates built in steps (1) to (4) and stored in the database, and the decision is made by a weighted Euclidean distance measure. If a speaker template yields the minimum average distance measure for the feature-vector sequence X of the speech to be identified, that speaker is taken as identified.
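The pre-processing of step (2) can be sketched in a few lines of Python. The frame length (256 samples), frame shift (128 samples), Hamming window, and a pre-emphasis coefficient of 0.95 (inside the patent's 0.90 to 1.00 range) come from the text; the function names are illustrative, not from the patent:

```python
import math

def pre_emphasis(signal, alpha=0.95):
    # y[n] = x[n] - alpha * x[n-1]; alpha is the pre-emphasis coefficient (0.90-1.00)
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_and_window(signal, frame_len=256, frame_shift=128):
    # Split into overlapping frames of 256 samples with a 128-sample shift,
    # then apply a Hamming window to each frame.
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
               for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, hamming)])
    return frames
```

With 512 samples these parameters yield three overlapping frames, which would then feed the MFCC extraction of step (3).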
2. The vector-quantization-based voiceprint recognition method according to claim 1, characterized in that the initial codebook of the LBG clustering algorithm is initialized by the splitting codebook method, as follows:
(1) The mean of the feature vectors of all extracted frames is taken as the single codeword of the initial codebook;
(2) Each codeword y_m (m = 1, …, current codebook size) is split into two codewords, y_m(1 + ε) and y_m(1 − ε), where ε is a small splitting parameter;
(3) All feature vectors are clustered to the new codewords, and the total distance measure D is computed as the sum, over all training vectors X, of d(X, y_i), the distance between X and its nearest codeword y_i in the trained codebook. If the relative change between the total distance measures of successive iterations falls below a small threshold δ, the iteration stops and the current codebook is the designed codebook; otherwise, go to the next step;
(4) The new centroid of each region is recomputed;
(5) Steps (3) and (4) are repeated until the optimal codebook of 2m codewords is formed;
(6) Steps (2), (3) and (4) are repeated until a codebook with M codewords is formed.
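Steps (1) to (6) above (split initialization plus nearest-neighbour/centroid iteration) form the standard LBG design loop, and step (5) of claim 1 is a minimum-average-distortion decision. A minimal pure-Python sketch follows; it uses a plain (unweighted) squared Euclidean distance in place of the patent's weighted measure, and the function names, ε = 0.01, and convergence threshold δ are illustrative assumptions:

```python
def _dist(x, y):
    # Squared Euclidean distance (unweighted stand-in for the weighted measure)
    return sum((a - b) ** 2 for a, b in zip(x, y))

def _centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def lbg_codebook(training, size, eps=0.01, delta=1e-4):
    """Design a codebook of `size` (a power of two) codewords by splitting."""
    codebook = [_centroid(training)]          # step (1): mean of all frames
    while len(codebook) < size:
        # step (2): split every codeword y into y(1+eps) and y(1-eps)
        codebook = ([[c * (1 + eps) for c in cw] for cw in codebook] +
                    [[c * (1 - eps) for c in cw] for cw in codebook])
        prev = float("inf")
        while True:
            # step (3): nearest-neighbour partition and total distortion D
            cells = [[] for _ in codebook]
            D = 0.0
            for x in training:
                dists = [_dist(x, cw) for cw in codebook]
                i = dists.index(min(dists))
                cells[i].append(x)
                D += dists[i]
            # step (4): replace each codeword with the centroid of its cell
            codebook = [_centroid(cell) if cell else codebook[i]
                        for i, cell in enumerate(cells)]
            # stop when the relative drop in distortion is below delta
            if D == 0 or (prev - D) / D < delta:
                break
            prev = D
    return codebook

def identify(features, codebooks):
    # Step (5) of claim 1: pick the speaker whose codebook gives the
    # minimum average distortion over all feature vectors.
    def avg(cb):
        return sum(min(_dist(x, cw) for cw in cb) for x in features) / len(features)
    return min(codebooks, key=lambda name: avg(codebooks[name]))
```

For example, training on two well-separated clusters and asking for a two-codeword codebook recovers the two cluster centroids, and `identify` then assigns a test utterance to the nearer speaker template.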
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011104503646A CN102509547B (en) | 2011-12-29 | 2011-12-29 | Method and system for voiceprint recognition based on vector quantization based |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102509547A CN102509547A (en) | 2012-06-20 |
CN102509547B true CN102509547B (en) | 2013-06-19 |
Family
ID=46221622
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011104503646A Expired - Fee Related CN102509547B (en) | 2011-12-29 | 2011-12-29 | Method and system for voiceprint recognition based on vector quantization based |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102509547B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109102810A (en) * | 2017-06-21 | 2018-12-28 | 北京搜狗科技发展有限公司 | Method for recognizing sound-groove and device |
Families Citing this family (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103794207A (en) * | 2012-10-29 | 2014-05-14 | 西安远声电子科技有限公司 | Dual-mode voice identity recognition method |
CN103714826B (en) * | 2013-12-18 | 2016-08-17 | 讯飞智元信息科技有限公司 | Formant automatic matching method towards vocal print identification |
CN103794219B (en) * | 2014-01-24 | 2016-10-05 | 华南理工大学 | A kind of Codebook of Vector Quantization based on the division of M code word generates method |
CN104485102A (en) * | 2014-12-23 | 2015-04-01 | 智慧眼(湖南)科技发展有限公司 | Voiceprint recognition method and device |
CN105989842B (en) * | 2015-01-30 | 2019-10-25 | 福建星网视易信息系统有限公司 | The method, apparatus for comparing vocal print similarity and its application in digital entertainment VOD system |
CN104994400A (en) * | 2015-07-06 | 2015-10-21 | 无锡天脉聚源传媒科技有限公司 | Method and device for indexing video by means of acquisition of host name |
CN106340298A (en) * | 2015-07-06 | 2017-01-18 | 南京理工大学 | Voiceprint unlocking method integrating content recognition and speaker recognition |
CN105304087B (en) * | 2015-09-15 | 2017-03-22 | 北京理工大学 | Voiceprint recognition method based on zero-crossing separating points |
CN105355206B (en) * | 2015-09-24 | 2020-03-17 | 车音智能科技有限公司 | Voiceprint feature extraction method and electronic equipment |
US10262654B2 (en) * | 2015-09-24 | 2019-04-16 | Microsoft Technology Licensing, Llc | Detecting actionable items in a conversation among participants |
CN105355195A (en) * | 2015-09-25 | 2016-02-24 | 小米科技有限责任公司 | Audio frequency recognition method and audio frequency recognition device |
CN106920558B (en) * | 2015-12-25 | 2021-04-13 | 展讯通信(上海)有限公司 | Keyword recognition method and device |
CN106971726A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of adaptive method for recognizing sound-groove and system based on code book |
CN106971729A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of method and system that Application on Voiceprint Recognition speed is improved based on sound characteristic scope |
CN106971711A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of adaptive method for recognizing sound-groove and system |
CN106971735B (en) * | 2016-01-14 | 2019-12-03 | 芋头科技(杭州)有限公司 | A kind of method and system regularly updating the Application on Voiceprint Recognition of training sentence in caching |
CN106971712A (en) * | 2016-01-14 | 2017-07-21 | 芋头科技(杭州)有限公司 | A kind of adaptive rapid voiceprint recognition methods and system |
CN106981287A (en) * | 2016-01-14 | 2017-07-25 | 芋头科技(杭州)有限公司 | A kind of method and system for improving Application on Voiceprint Recognition speed |
CN105931637A (en) * | 2016-04-01 | 2016-09-07 | 金陵科技学院 | User-defined instruction recognition speech photographing system |
CN106057212B (en) * | 2016-05-19 | 2019-04-30 | 华东交通大学 | Driving fatigue detection method based on voice personal characteristics and model adaptation |
CN106448682A (en) * | 2016-09-13 | 2017-02-22 | Tcl集团股份有限公司 | Open-set speaker recognition method and apparatus |
CN107945807B (en) * | 2016-10-12 | 2021-04-13 | 厦门雅迅网络股份有限公司 | Voice recognition method and system based on silence run |
CN108269573A (en) * | 2017-01-03 | 2018-07-10 | 蓝盾信息安全技术有限公司 | Speaker Recognition System based on vector quantization and gauss hybrid models |
CN106847292B (en) | 2017-02-16 | 2018-06-19 | 平安科技(深圳)有限公司 | Method for recognizing sound-groove and device |
CN107039036B (en) * | 2017-02-17 | 2020-06-16 | 南京邮电大学 | High-quality speaker recognition method based on automatic coding depth confidence network |
CN107068154A (en) * | 2017-03-13 | 2017-08-18 | 平安科技(深圳)有限公司 | The method and system of authentication based on Application on Voiceprint Recognition |
CN107799114A (en) * | 2017-04-26 | 2018-03-13 | 珠海智牧互联科技有限公司 | A kind of pig cough sound recognition methods and system |
CN107993663A (en) * | 2017-09-11 | 2018-05-04 | 北京航空航天大学 | A kind of method for recognizing sound-groove based on Android |
CN108022584A (en) * | 2017-11-29 | 2018-05-11 | 芜湖星途机器人科技有限公司 | Office Voice identifies optimization method |
CN107993661A (en) * | 2017-12-07 | 2018-05-04 | 浙江海洋大学 | The method and system that a kind of anti-spoken language impersonates |
CN108417226A (en) * | 2018-01-09 | 2018-08-17 | 平安科技(深圳)有限公司 | Speech comparison method, terminal and computer readable storage medium |
CN108460081B (en) * | 2018-01-12 | 2019-07-12 | 平安科技(深圳)有限公司 | Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium |
CN110047491A (en) * | 2018-01-16 | 2019-07-23 | 中国科学院声学研究所 | A kind of relevant method for distinguishing speek person of random digit password and device |
CN108922541B (en) * | 2018-05-25 | 2023-06-02 | 南京邮电大学 | Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models |
CN109147798B (en) * | 2018-07-27 | 2023-06-09 | 北京三快在线科技有限公司 | Speech recognition method, device, electronic equipment and readable storage medium |
CN109146002B (en) * | 2018-09-30 | 2021-06-01 | 佛山科学技术学院 | Quick identification method of GMM (Gaussian mixture model) identifier |
CN109841229A (en) * | 2019-02-24 | 2019-06-04 | 复旦大学 | A kind of Neonate Cry recognition methods based on dynamic time warping |
CN110889009B (en) * | 2019-10-18 | 2023-07-21 | 平安科技(深圳)有限公司 | Voiceprint clustering method, voiceprint clustering device, voiceprint processing equipment and computer storage medium |
CN111128198B (en) * | 2019-12-25 | 2022-10-28 | 厦门快商通科技股份有限公司 | Voiceprint recognition method, voiceprint recognition device, storage medium, server and voiceprint recognition system |
CN111341327A (en) * | 2020-02-28 | 2020-06-26 | 广州国音智能科技有限公司 | Speaker voice recognition method, device and equipment based on particle swarm optimization |
CN111583938B (en) * | 2020-05-19 | 2023-02-03 | 威盛电子股份有限公司 | Electronic device and voice recognition method |
CN113611284B (en) * | 2021-08-06 | 2024-05-07 | 工银科技有限公司 | Speech library construction method, speech library recognition method, speech library construction system and speech library recognition system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011004098A1 (en) * | 2009-07-07 | 2011-01-13 | France Telecom | Allocation of bits in an enhancement coding/decoding for improving a hierarchical coding/decoding of digital audio signals |
CN102231277A (en) * | 2011-06-29 | 2011-11-02 | 电子科技大学 | Method for protecting mobile terminal privacy based on voiceprint recognition |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011004098A1 (en) * | 2009-07-07 | 2011-01-13 | France Telecom | Allocation of bits in an enhancement coding/decoding for improving a hierarchical coding/decoding of digital audio signals |
CN102231277A (en) * | 2011-06-29 | 2011-11-02 | 电子科技大学 | Method for protecting mobile terminal privacy based on voiceprint recognition |
Non-Patent Citations (1)
Title |
---|
Zhang Caijuan, Huo Chunbao, Wu Feng, Wei Chunli. "Application of an Improved K-means Algorithm in Voiceprint Recognition". Journal of Liaoning University of Technology. 2011, Vol. 31, No. 5, Sections 1-4. *
Also Published As
Publication number | Publication date |
---|---|
CN102509547A (en) | 2012-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102509547B (en) | Method and system for voiceprint recognition based on vector quantization based | |
CN102324232A (en) | Method for recognizing sound-groove and system based on gauss hybrid models | |
CN102800316B (en) | Optimal codebook design method for voiceprint recognition system based on nerve network | |
CN102820033B (en) | Voiceprint identification method | |
Chavan et al. | An overview of speech recognition using HMM | |
CN101540170B (en) | Voiceprint recognition method based on biomimetic pattern recognition | |
CN108922541A (en) | Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model | |
Todkar et al. | Speaker recognition techniques: A review | |
CN112735435A (en) | Voiceprint open set identification method with unknown class internal division capability | |
Zhang et al. | Voice biometric identity authentication system based on android smart phone | |
Kekre et al. | Speaker identification by using vector quantization | |
Rudresh et al. | Performance analysis of speech digit recognition using cepstrum and vector quantization | |
CN106297769B (en) | A kind of distinctive feature extracting method applied to languages identification | |
Sun et al. | A novel convolutional neural network voiceprint recognition method based on improved pooling method and dropout idea | |
JPH09507921A (en) | Speech recognition system using neural network and method of using the same | |
Goh et al. | Robust computer voice recognition using improved MFCC algorithm | |
CN104464738A (en) | Vocal print recognition method oriented to smart mobile device | |
Sarangi et al. | A novel approach in feature level for robust text-independent speaker identification system | |
Nijhawan et al. | Speaker recognition using support vector machine | |
Panda et al. | Study of speaker recognition systems | |
Wang et al. | Robust Text-independent Speaker Identification in a Time-varying Noisy Environment. | |
Khetri et al. | Automatic speech recognition for marathi isolated words | |
Chelali et al. | MFCC and vector quantization for Arabic fricatives speech/speaker recognition | |
CN109003613A (en) | The Application on Voiceprint Recognition payment information method for anti-counterfeit of combining space information | |
Kabir et al. | Vector quantization in text dependent automatic speaker recognition using mel-frequency cepstrum coefficient |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20130619 Termination date: 20131229 |