JPH11249697A

JPH11249697A - Method and device for extracting pitch position

Info

Publication number: JPH11249697A
Application number: JP10046765A
Authority: JP
Inventors: Tadashi Yamaura (山浦 正)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1998-02-27
Filing date: 1998-02-27
Publication date: 1999-09-17

Abstract

PROBLEM TO BE SOLVED: To reproduce speech with little quality degradation due to transmission errors, when the method is applied to a speech encoding/decoding method, by limiting the polarity of the periodic pulses. SOLUTION: A time-series vector output from periodic pulse generation means 101 is weighted by a gain given by pulse polarity limiting means 105 and supplied to a synthesis filter 103. The pulse polarity limiting means 105 limits the polarity of the periodic pulses in the weighted time-series vector used to generate synthesized speech: it outputs the gain received from optimum gain calculation means 102 unchanged when the sign of that gain is positive, and outputs zero when the sign is negative. After this processing has been performed for all time-series vectors generated by the periodic pulse generation means 101, the time-series vector that minimizes the distance between the input speech and the synthesized speech is selected, and the leading pulse position of its pulse train is output as a pitch position S6.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a pitch position extraction method and a pitch position extraction device that extract a feature point in each pitch period of a speech signal or an acoustic signal, and in particular to a pitch position extraction method and a pitch position extraction device applied to a speech encoding/decoding method used for communication.

[0002]

[Prior Art] A speech encoding/decoding method using a conventional pitch position extraction method is disclosed in Japanese Patent Laid-Open No. 6-202699. FIG. 5 shows an example of the overall configuration of a CELP-based speech encoding/decoding method using this conventional pitch position extraction method. In the figure, 1 denotes an encoding unit, 2 a decoding unit, 3 multiplexing means, and 4 separating means. The encoding unit 1 comprises linear prediction parameter analysis means 5, linear prediction parameter encoding means 6, a synthesis filter 7, an adaptive codebook 8, pitch position extraction means 9, a noise codebook 10, pitch synchronization means 11, gain encoding means 12, and distance calculation means 13. The decoding unit 2 comprises linear prediction parameter decoding means 14, a synthesis filter 15, an adaptive codebook 16, pitch position decoding means 17, a noise codebook 18, pitch synchronization means 19, and gain decoding means 20.

[0003] In CELP-based speech coding, a span of about 5 to 50 ms is treated as one frame, and the speech of each frame is encoded separately as spectrum information and sound source information. The operation of the CELP-based speech encoding/decoding method configured as shown in FIG. 5 is described below. First, in the encoding unit 1, the linear prediction parameter analysis means 5 analyzes the input speech S1 and extracts linear prediction parameters, which are the spectrum information of the speech. The linear prediction parameter encoding means 6 encodes the linear prediction parameters and sets the encoded linear prediction parameters as the coefficients of the synthesis filter 7.

[0004] Next, the encoding of the sound source information is described. The adaptive codebook 8 stores past driving excitation vectors and outputs, for a given adaptive code, a time-series vector obtained by periodically repeating the past driving excitation vector. The pitch position extraction means 9 creates pulse trains whose period corresponds to the adaptive code, uses each pulse train as a sound source to generate synthesized speech with the linear prediction parameters supplied by the linear prediction parameter analysis means 5, and searches for the pulse train whose synthesized speech has the minimum distance to the input speech S1. It then outputs the pulse position of that pulse train to the pitch synchronization means 11 as the pitch position.
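The periodic repetition performed by the adaptive codebook in paragraph [0004] can be sketched as follows. This is a minimal illustration, not the codec's actual implementation: the function name `adaptive_codebook_vector` and the interpretation of the adaptive code as a lag in samples are assumptions made for the example.

```python
def adaptive_codebook_vector(past_excitation, lag, frame_len):
    """Fill one frame by cyclically repeating the last `lag` samples.

    `past_excitation` is the stored past driving excitation (a list of floats),
    and `lag` is the period, in samples, that the adaptive code is assumed
    to select.
    """
    if lag <= 0 or lag > len(past_excitation):
        raise ValueError("lag must be between 1 and len(past_excitation)")
    segment = past_excitation[-lag:]          # most recent pitch cycle
    return [segment[n % lag] for n in range(frame_len)]


if __name__ == "__main__":
    print(adaptive_codebook_vector([1, 2, 3, 4, 5], lag=2, frame_len=5))
```

With a lag of 2, the last two stored samples are repeated until the frame is full.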

[0005] FIG. 6 shows an example of the internal configuration of the pitch position extraction means 9. In the figure, 101 denotes periodic pulse generation means, 102 optimum gain calculation means, 103 a synthesis filter, and 104 distance calculation means. The operation of the pitch position extraction method with this configuration is as follows. The periodic pulse generation means 101 sequentially generates time-series vectors in which the leading pulse position of a pulse train, whose period corresponds to the adaptive code S4, is shifted one sample at a time from the start of the frame (FIG. 7). Each time-series vector output from the periodic pulse generation means 101 is weighted by the gain given by the optimum gain calculation means 102 and supplied to the synthesis filter 103. The synthesis filter 103 uses coefficients corresponding to the linear prediction parameters S5 to produce synthesized speech from the weighted time-series vector. The distance calculation means 104 computes the distance between the synthesized speech and the input speech S1, and the optimum gain calculation means 102 calculates and outputs the gain that minimizes this distance. After this processing has been performed for all time-series vectors generated by the periodic pulse generation means 101, the time-series vector that minimizes the distance between the input speech and the synthesized speech is selected, and the leading pulse position of its pulse train is output as the pitch position S6.
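The conventional search of paragraph [0005] can be sketched as below. This is a hedged illustration under several assumptions that the text does not state: the distance is taken to be the squared error, the synthesis filter is an all-pole filter with zero initial state, the leading pulse position is searched over one pitch period, and all function names are hypothetical. Because the filter is linear, applying the gain after filtering is equivalent to weighting the pulse train before filtering.

```python
def pulse_train(first_pos, period, frame_len):
    """Unit pulses at first_pos, first_pos + period, ... within one frame."""
    vec = [0.0] * frame_len
    for n in range(first_pos, frame_len, period):
        vec[n] = 1.0
    return vec


def synthesize(excitation, lpc):
    """All-pole filter y[n] = x[n] - sum_k lpc[k-1] * y[n-k], zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out


def extract_pitch_position(speech, lpc, period):
    """Return (leading pulse position, gain) minimizing the squared error."""
    frame_len = len(speech)
    best_pos, best_gain, best_dist = 0, 0.0, float("inf")
    for pos in range(min(period, frame_len)):
        synth = synthesize(pulse_train(pos, period, frame_len), lpc)
        energy = sum(s * s for s in synth)
        corr = sum(t * s for t, s in zip(speech, synth))
        gain = corr / energy if energy > 0.0 else 0.0   # optimum gain, either sign
        dist = sum((t - gain * s) ** 2 for t, s in zip(speech, synth))
        if dist < best_dist:
            best_pos, best_gain, best_dist = pos, gain, dist
    return best_pos, best_gain
```

Note that the optimum gain may be negative here, so a pulse train of either polarity can win the search; this is the behavior that the invention later restricts.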

[0006] The noise codebook 10 stores a plurality of time-series vectors generated from, for example, random noise, and outputs the code vector corresponding to a noise code. A pitch synchronization position is set for each code vector. The pitch synchronization means 11 cuts the code vector to the pitch period length so that its pitch synchronization position coincides with the pitch position extracted by the pitch position extraction means 9, and generates a time-series vector by repeating this segment periodically. The time-series vectors from the adaptive codebook 8 and the pitch synchronization means 11 are weighted by their respective gains given by the gain encoding means 12 and added, and the sum is supplied to the synthesis filter 7 as the driving excitation vector to obtain coded speech. The distance calculation means 13 computes the distance between the coded speech and the input speech S1 and searches for the adaptive code, noise code, and gains that minimize it. After this encoding is finished, the code of the linear prediction parameters, the adaptive code, noise code, and gain code that minimize the distortion between the input speech and the coded speech, and the code of the pitch position are output as the encoding result.
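The pitch synchronization step of paragraph [0006] might look like the following sketch. The text does not say exactly how the segment is cut out of the code vector, so this example assumes that one pitch period is taken starting at the pitch synchronization position; the function name and parameter names are hypothetical.

```python
def pitch_synchronize(code_vector, sync_pos, pitch_pos, period, frame_len):
    """Cut one pitch period from the code vector and repeat it periodically.

    Assumption: the segment is code_vector[sync_pos : sync_pos + period], and
    it is placed so that the sample at sync_pos lands on pitch_pos in the frame.
    """
    if sync_pos < 0 or sync_pos + period > len(code_vector):
        raise ValueError("segment does not fit inside the code vector")
    segment = code_vector[sync_pos:sync_pos + period]
    # Python's % always returns a non-negative result, so samples before
    # pitch_pos are filled from the tail of the segment.
    return [segment[(n - pitch_pos) % period] for n in range(frame_len)]


if __name__ == "__main__":
    print(pitch_synchronize(list(range(10)), sync_pos=2, pitch_pos=1,
                            period=4, frame_len=8))
```

In this sketch the code vector sample at the synchronization position appears at the pitch position and again every `period` samples after it.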

[0007] In the decoding unit 2, the linear prediction parameter decoding means 14 decodes the linear prediction parameters from their code and sets them as the coefficients of the synthesis filter 15. Next, the adaptive codebook 16 outputs, for the adaptive code, a time-series vector obtained by periodically repeating the past driving excitation vector. The pitch position decoding means 17 decodes the pitch position from its code and outputs it to the pitch synchronization means 19. The noise codebook 18 outputs the code vector corresponding to the noise code. The pitch synchronization means 19 generates a time-series vector from the code vector supplied by the noise codebook 18 and the pitch position supplied by the pitch position decoding means 17, in the same way as the pitch synchronization means 11 of the encoding unit 1. The time-series vectors from the adaptive codebook 16 and the pitch synchronization means 19 are weighted by their respective gains, decoded from the gain code by the gain decoding means 20, and added. The sum is supplied to the synthesis filter 15 as the driving excitation vector, and the output speech S3 is obtained.

[0008] Japanese Patent Laid-Open No. 6-202699 also discloses a speech encoding/decoding method using a conventional pitch position extraction method that does not require transmission of the pitch position code. FIG. 8, in which parts corresponding to those in FIG. 5 are given the same reference numerals, shows the configuration of this conventional speech encoding/decoding method. In the figure, 21 and 22 denote pitch position extraction means. The operation of the encoding/decoding method with this configuration is as follows. First, in the encoding unit 1, the pitch position extraction means 21 extracts as the pitch position, for example, the periodic points at which the periodically repeated time-series vector output from the adaptive codebook 8 takes its maximum amplitude. This is equivalent to creating pulse trains whose period corresponds to the adaptive code and searching for the pulse train that has the minimum distance to the time-series vector from the adaptive codebook.
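The equivalence stated in paragraph [0008] can be made concrete with a small sketch that searches pulse trains directly against the adaptive codebook vector. It assumes a squared-error distance and unit pulses scaled by a single optimum gain of either sign; the function name is hypothetical. For a pulse train with N pulses, the optimum gain is the mean of the samples at the pulse positions, and the minimum distance is the vector energy minus gain times the sum of those samples.

```python
def search_pulse_position(vector, period):
    """Return (leading pulse position, gain) of the best-matching pulse train.

    `vector` is the time-series vector from the adaptive codebook for one
    frame. The gain may be positive or negative (no polarity restriction).
    """
    frame_len = len(vector)
    energy = sum(x * x for x in vector)
    best_pos, best_gain, best_dist = 0, 0.0, float("inf")
    for pos in range(min(period, frame_len)):
        positions = range(pos, frame_len, period)
        corr = sum(vector[n] for n in positions)   # correlation with unit pulses
        gain = corr / len(positions)               # least-squares gain
        dist = energy - gain * corr                # squared error at that gain
        if dist < best_dist:
            best_pos, best_gain, best_dist = pos, gain, dist
    return best_pos, best_gain


if __name__ == "__main__":
    frame = [0.1, -0.9, 0.2, 0.0, 0.1, -0.9, 0.2, 0.0]
    print(search_pulse_position(frame, period=4))
```

In this example the largest-magnitude samples are negative, so the unrestricted search selects a negative-polarity pulse train located at those samples.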

[0009] The noise codebook 10 stores a plurality of time-series vectors generated from, for example, random noise, and outputs the code vector corresponding to a noise code. A pitch synchronization position is set for each code vector. The pitch synchronization means 11 cuts the code vector to the pitch period length so that its pitch synchronization position coincides with the pitch position extracted by the pitch position extraction means 21, and generates a time-series vector by repeating this segment periodically. Coded speech is then generated using the time-series vectors from the adaptive codebook 8 and the pitch synchronization means 11, and the adaptive code, noise code, and gains that minimize the distance between this coded speech and the input speech S1 are selected and encoded.

[0010] Next, in the decoding unit 2, the pitch position extraction means 22 extracts the pitch position from the periodically repeated time-series vector output from the adaptive codebook 16, in the same way as the pitch position extraction means 21 of the encoding unit 1. The noise codebook 18 outputs the code vector corresponding to the noise code. The pitch synchronization means 19 generates a time-series vector from the code vector supplied by the noise codebook 18 and the pitch position supplied by the pitch position extraction means 22, in the same way as the pitch synchronization means 11 of the encoding unit 1. The output speech S3 is then obtained using the time-series vectors from the adaptive codebook 16 and the pitch synchronization means 19. With this configuration, the pitch position used when making the code vector periodic is derived adaptively from the time-series vector from the adaptive codebook, so the pitch position code need not be transmitted and the amount of transmitted information can be reduced.

[0011]

[Problems to Be Solved by the Invention] In the conventional pitch position extraction methods described above, a search is made in each frame for the pulse train that minimizes either the distance between the speech signal and synthesized speech generated using periodic pulses as the sound source, or the distance between the sound source signal and the periodic pulses, and its pulse position is extracted as the pitch position. The polarity of the periodic pulses that minimizes the distance within a frame is determined by the shape of the speech signal or sound source signal in that frame interval. As a result, even in a stationary voiced portion, the extracted pitch positions are not necessarily periodic when viewed across frames. For example, as shown in FIG. 9, when the polarity of the periodic pulses is positive in frames (A) and (C) and negative in frame (B), the pulses are not periodic between frames (A) and (B) or between frames (B) and (C).

[0012] Consequently, in a speech encoding/decoding method using the conventional pitch position extraction method, the pitch positions extracted from the input speech and transmitted have no correlation between frames. If an error occurs in the pitch position code during transmission, the pitch position cannot be estimated even when the decoding side detects the error. Even if all codes other than the pitch position code are transmitted correctly, suitable reproduced speech cannot be generated, and the reproduced speech is severely degraded.

[0013] Furthermore, when the conventional pitch position extraction method is applied to a plurality of speech signals or sound source signals between which no audible difference can be perceived, a slight difference in the distance used to evaluate the pitch position can change whether the positive-polarity or the negative-polarity periodic pulse train is selected, and the extracted pitch positions can differ greatly. FIG. 10 shows an example of the shift between the pitch positions extracted for speech signals (A) and (B), which have no audible difference.

[0014] Consequently, in a speech encoding/decoding method using the conventional pitch position extraction method that requires no transmission of the pitch position, when a transmission error occurs and the driving excitation vector is affected by it, the time-series vector from the adaptive codebook, which is generated from this driving excitation vector, is also affected. In that case, even if the effect of the error is small, the extracted pitch position can differ greatly from the correct one.

[0015] Once the pitch position is badly wrong because of a transmission error, the time-series vector generated by the pitch synchronization processing differs greatly from the correct one even if correct codes are transmitted afterwards, so the driving excitation vector can no longer be generated correctly. This in turn prevents the pitch position from being determined correctly, creating a vicious circle. The effect of a transmission error therefore propagates over a long time, and the reproduced speech is severely degraded when a transmission error occurs.

[0016] The present invention has been made to solve these problems, and provides a pitch position extraction method that, when applied to a speech encoding/decoding method, makes it possible to reproduce speech with little quality degradation due to transmission errors.

[0017]

[Means for Solving the Problems] To solve the above problems, the pitch position extraction method of the present invention extracts feature positions arranged at pitch period intervals in a speech signal or sound source signal by evaluating, for each frame, the distance between the speech signal and synthesized speech generated using periodic pulses as the sound source, or the distance between the sound source signal and the periodic pulses, and in this method the polarity of the periodic pulses is limited.

[0018] The pitch position extraction device of the present invention comprises: periodic pulse generation means for sequentially generating time-series vectors in which the leading pulse position of a pulse train, whose period corresponds to the pitch period of the input speech, is shifted one sample at a time from the start of the frame; a synthesis filter that receives the time-series vector output from the periodic pulse generation means, weighted by the gain given by optimum gain calculation means, and produces synthesized speech from the weighted time-series vector using coefficients corresponding to linear prediction parameters obtained from the input speech; distance calculation means for computing the distance between this synthesized speech and the input speech, searching for the time-series vector that minimizes the distance, and outputting the leading pulse position of its pulse train as the pitch position; and pulse polarity limiting means for limiting the polarity of the periodic pulses in the weighted time-series vector used to generate the synthesized speech.

[0019]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0020] Embodiment 1. FIG. 1, in which parts corresponding to those in FIG. 6 are given the same reference numerals, shows the internal configuration of the pitch position extraction means 9 that carries out Embodiment 1 of the pitch position extraction method according to the present invention. In the figure, 105 denotes pulse polarity limiting means.

[0021] The operation is as follows. The periodic pulse generation means 101 sequentially generates time-series vectors in which the leading pulse position of a pulse train, whose period corresponds to the adaptive code S4, is shifted one sample at a time from the start of the frame. Each time-series vector output from the periodic pulse generation means 101 is weighted by the gain given by the pulse polarity limiting means 105 and supplied to the synthesis filter 103. The synthesis filter 103 uses coefficients corresponding to the linear prediction parameters S5 to produce synthesized speech from the weighted time-series vector. The distance calculation means 104 computes the distance between the synthesized speech and the input speech S1, and the optimum gain calculation means 102 calculates and outputs the gain that minimizes this distance. The pulse polarity limiting means 105 limits the polarity of the periodic pulses in the weighted time-series vector used to generate the synthesized speech, for example by outputting the gain received from the optimum gain calculation means 102 unchanged when its sign is positive and outputting zero as the gain when its sign is negative. After this processing has been performed for all time-series vectors generated by the periodic pulse generation means 101, the time-series vector that minimizes the distance between the input speech and the synthesized speech is selected, and the leading pulse position of its pulse train is output as the pitch position S6.
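The polarity-limited search of paragraph [0021] can be sketched as follows. As with any short illustration, this rests on assumptions not stated in the text: a squared-error distance, an all-pole synthesis filter with zero initial state, a search over one pitch period, and hypothetical function names. The only part taken directly from the description is the rule in `limit_polarity`: a positive gain passes through and a negative gain is replaced by zero.

```python
def pulse_train(first_pos, period, frame_len):
    """Unit pulses at first_pos, first_pos + period, ... within one frame."""
    vec = [0.0] * frame_len
    for n in range(first_pos, frame_len, period):
        vec[n] = 1.0
    return vec


def synthesize(excitation, lpc):
    """All-pole filter y[n] = x[n] - sum_k lpc[k-1] * y[n-k], zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out


def limit_polarity(gain):
    """Role of means 105: keep a positive gain, replace a negative one by zero."""
    return gain if gain > 0.0 else 0.0


def extract_pitch_position_limited(speech, lpc, period):
    """Return (leading pulse position, gain) using positive-polarity pulses only."""
    frame_len = len(speech)
    best_pos, best_gain, best_dist = 0, 0.0, float("inf")
    for pos in range(min(period, frame_len)):
        synth = synthesize(pulse_train(pos, period, frame_len), lpc)
        energy = sum(s * s for s in synth)
        corr = sum(t * s for t, s in zip(speech, synth))
        gain = limit_polarity(corr / energy) if energy > 0.0 else 0.0
        dist = sum((t - gain * s) ** 2 for t, s in zip(speech, synth))
        if dist < best_dist:
            best_pos, best_gain, best_dist = pos, gain, dist
    return best_pos, best_gain
```

A candidate whose optimum gain would be negative is evaluated with zero gain, so its distance equals the energy of the input frame and it cannot beat any candidate that matches the speech with positive polarity.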

[0022] FIG. 2, shown for comparison with FIG. 9, is a schematic diagram explaining the operation of Embodiment 1 of the pitch position extraction method according to the present invention, for the case in which, for example, the polarity of the pitch period pulses used for evaluation in the pitch position extraction process is limited to positive. In contrast to FIG. 9, the polarity of the periodic pulses in frame (B) is positive in FIG. 2, so the pulses are periodic between frames (A) and (B) and between frames (B) and (C) as well.

[0023] When the pitch position extraction method of Embodiment 1 is applied to the speech encoding/decoding method shown in FIG. 5, the pitch positions extracted from the input speech and transmitted keep their pitch periodicity across frames. Therefore, if an error occurs in the pitch position code during transmission and the decoding side detects the error, the pitch position of the current frame can be estimated from the pitch position of the previous frame and the pitch period (FIG. 3).
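The estimation mentioned in paragraph [0023] can be sketched as a simple extrapolation. This example assumes that pitch positions are frame-relative sample indices, that the pitch period is constant across the frame boundary, and that the estimate is the first extrapolated pulse that falls in the current frame; the function name is hypothetical and the text does not specify the estimator.

```python
def estimate_pitch_position(prev_pos, period, frame_len):
    """Extrapolate the previous frame's pulse train into the current frame.

    Pulses are assumed at prev_pos, prev_pos + period, ... The first pulse at
    or beyond the end of the previous frame gives the estimated position,
    expressed relative to the start of the current frame.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if not 0 <= prev_pos < frame_len:
        raise ValueError("prev_pos must lie inside the previous frame")
    remaining = frame_len - prev_pos
    steps = -(-remaining // period)           # ceiling division
    return prev_pos + steps * period - frame_len


if __name__ == "__main__":
    # Pulses at 5, 17, 29, 41 in a 40-sample frame: the next frame starts at 40.
    print(estimate_pitch_position(prev_pos=5, period=12, frame_len=40))
```

This only works because the limited polarity keeps the pulse train periodic across frames; with unrestricted polarity the extrapolated position would not be a reliable estimate.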

[0024] According to Embodiment 1, limiting the polarity of the pitch period pulses used for evaluation in the pitch position extraction process makes the extracted pitch positions periodic across frames as well. When the method is applied to a speech encoding/decoding method, even if a transmission error occurs in the pitch position code and the correct pitch position cannot be decoded, the pitch position of the frame in which the error occurred can be estimated from the pitch position of the previous frame and the pitch period. This avoids degradation of the reproduced speech caused by transmission errors and makes it possible to reproduce speech with little quality degradation due to transmission errors.

[0025] Embodiment 2. FIG. 4, shown for comparison with FIG. 10, is a schematic diagram explaining the operation when, as in Embodiment 1, the polarity of the pitch period pulses used for evaluation in the pitch position extraction process is limited, for example, to positive. In contrast to FIG. 10, the polarity of the periodic pulses for input speech (B) is positive in FIG. 4, so the shift between the pitch positions extracted for the similar speech signals (A) and (B) becomes small.
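The effect described for similar signals (A) and (B) can be illustrated with a toy example. This is a minimal sketch that assumes a simplified peak-picking extractor rather than the full distance search, and the two signals are invented for illustration; they are not taken from FIG. 4 or FIG. 10.

```python
def pitch_position_from_peak(vector, period, positive_only):
    """Pick the pitch position as the location of the dominant peak.

    With positive_only=False the peak of largest magnitude is used (either
    polarity); with positive_only=True only the largest positive value counts.
    """
    if positive_only:
        peak = max(range(len(vector)), key=lambda n: vector[n])
    else:
        peak = max(range(len(vector)), key=lambda n: abs(vector[n]))
    return peak % period


# Two nearly identical frames: the positive and negative peaks swap rank.
signal_a = [0.0, 0.0, 1.00, 0.0, 0.0, -0.99, 0.0, 0.0]
signal_b = [0.0, 0.0, 0.99, 0.0, 0.0, -1.00, 0.0, 0.0]

if __name__ == "__main__":
    for limited in (False, True):
        print(limited,
              pitch_position_from_peak(signal_a, 8, limited),
              pitch_position_from_peak(signal_b, 8, limited))
```

Without the restriction the two signals yield positions three samples apart, while with positive-only peaks both yield the same position.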

[0026] When the pitch position extraction method of Embodiment 2 is applied to the speech encoding/decoding method shown in FIG. 8, which requires no transmission of the pitch position, the pitch position is found close to the correct one even when a transmission error occurs and the driving excitation vector is affected by it, provided the effect of the error is small.

[0027] According to Embodiment 2, limiting the polarity of the pitch period pulses used for evaluation in the pitch position extraction process means that, when pitch positions are extracted from a plurality of speech signals or sound source signals between which no audible difference can be perceived, the extracted pitch positions are almost the same. When the method is applied to a speech encoding/decoding method that requires no transmission of the pitch position, the pitch position is found close to the correct one even when a transmission error occurs and the driving excitation vector is affected by it, provided the effect of the error is small. The time-series vector generated by the pitch synchronization processing then also differs little from the correct one. This avoids degradation of the reproduced speech caused by transmission errors and makes it possible to reproduce speech with little quality degradation due to transmission errors.

[0028] Embodiment 3. In Embodiments 1 and 2 above, the polarity of the periodic pulses need not be fixed, for example by always limiting it to positive. Instead, the polarity may be kept the same throughout each continuous voiced interval while being allowed to differ between voiced intervals that are separated by unvoiced or silent intervals. According to Embodiment 3, the periodic pulses used for pitch position extraction can take either positive or negative polarity, so a polarity better suited to the character of the speech can be selected for each voiced interval, and higher-quality speech can be reproduced than when the polarity is always limited to one side.
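One way to realize the per-interval polarity of Embodiment 3 is sketched below. The text does not say how the polarity of a voiced interval is chosen, so this example assumes it is taken from the sign preferred by the first frame of the interval and then held until the interval ends; the voicing flags, the preferred signs, and both function names are hypothetical inputs and helpers.

```python
def assign_segment_polarity(voiced_flags, preferred_signs):
    """Return one polarity per frame: +1, -1, or 0 for unvoiced/silent frames.

    The polarity is fixed at the first frame of each voiced interval and kept
    for all following frames until an unvoiced or silent frame ends it.
    """
    polarities = []
    current = 0                                 # 0 means no active voiced interval
    for voiced, sign in zip(voiced_flags, preferred_signs):
        if not voiced:
            current = 0
        elif current == 0:
            current = 1 if sign >= 0 else -1    # choose once per interval
        polarities.append(current)
    return polarities


def limit_gain(gain, polarity):
    """Keep the gain only if its sign matches the interval's polarity."""
    return gain if gain * polarity > 0 else 0.0


if __name__ == "__main__":
    voiced = [True, True, True, False, True, True]
    signs = [+1, -1, +1, +1, -1, +1]
    print(assign_segment_polarity(voiced, signs))
```

Within an interval the limiter behaves like the fixed positive limit of Embodiment 1, except that the allowed sign is the one chosen for that interval.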

[0029] Embodiment 4. In Embodiments 1 to 3 above, the polarity of the periodic pulses is limited in a pitch position extraction method that evaluates the distance between the speech signal and synthesized speech generated using periodic pulses as the sound source. The same effect is obtained if, instead, the polarity of the periodic pulses is limited in a pitch position extraction method that evaluates the distance between the sound source signal and the periodic pulses.

[0030]

[Effects of the Invention] As described in detail above, according to the present invention, in a pitch position extraction method and a pitch position extraction device that extract feature positions arranged at pitch period intervals in a speech signal or sound source signal by evaluating, for each frame, the distance between the speech signal and synthesized speech generated using periodic pulses as the sound source, or the distance between the sound source signal and the periodic pulses, the polarity of the periodic pulses is limited. When applied to a speech encoding/decoding method, the pitch position can therefore be estimated even if a transmission error occurs in the pitch position code, which avoids degradation of the reproduced speech caused by transmission errors and makes it possible to reproduce speech with little quality degradation due to transmission errors. When applied to a speech encoding/decoding method that requires no transmission of the pitch position, the pitch position is found close to the correct one even when a transmission error occurs and the driving excitation vector is affected by it, provided the effect of the error is small. The time-series vector generated by the pitch synchronization processing then also differs little from the correct one, which likewise avoids degradation of the reproduced speech and makes it possible to reproduce speech with little quality degradation due to transmission errors.

[Brief description of the drawings]

【図１】この発明によるピッチ位置抽出装置の実施の
形態１の内部構成を示すブロック図である。FIG. 1 is a block diagram showing an internal configuration of a pitch position extracting device according to a first embodiment of the present invention.

【図２】この発明によるピッチ位置抽出装置の実施の
形態１の動作の説明に供する略線図である。FIG. 2 is a schematic diagram for explaining the operation of the pitch position extracting device according to the first embodiment of the present invention.

【図３】この発明によるピッチ位置抽出装置の実施の
形態１を適用した音声符号化復号化方法の伝送誤り時の
ピッチ位置推定処理の動作の説明に供する略線図であ
る。FIG. 3 is a schematic diagram for explaining the operation of the pitch position estimation processing performed when a transmission error occurs in a speech coding/decoding method to which the pitch position extracting device according to the first embodiment of the present invention is applied.

【図４】この発明によるピッチ位置抽出方法の実施の
形態２の動作の説明に供する略線図である。FIG. 4 is a schematic diagram for explaining the operation of a pitch position extracting method according to a second embodiment of the present invention.

【図５】従来のピッチ位置抽出方法を用いた音声符号
化復号化装置の全体構成を示すブロック図である。FIG. 5 is a block diagram showing an overall configuration of a speech encoding / decoding device using a conventional pitch position extraction method.

【図６】従来のピッチ位置抽出装置の内部構成を示す
ブロック図である。FIG. 6 is a block diagram showing an internal configuration of a conventional pitch position extracting device.

【図７】従来の周期パルス生成手段の動作の説明に供
する略線図である。FIG. 7 is a schematic diagram for explaining the operation of a conventional periodic pulse generating means.

【図８】ピッチ位置の符号の伝送が不要な従来のピッ
チ位置抽出装置を用いた音声符号化復号化方法の全体構
成を示すブロック図である。FIG. 8 is a block diagram illustrating an overall configuration of a speech encoding / decoding method using a conventional pitch position extracting device that does not require transmission of a code of a pitch position.

【図９】従来のピッチ位置抽出方法の動作の説明に供
する略線図である。FIG. 9 is a schematic diagram for explaining the operation of a conventional pitch position extracting method.

【図１０】従来のピッチ位置抽出方法の動作の説明に
供する略線図である。FIG. 10 is a schematic diagram for explaining the operation of a conventional pitch position extracting method.

[Explanation of symbols]

１：符号化部、２：復号化部、３：多重化手段、
４：分離手段、５：線形予測パラメータ分析手段、
６：線形予測パラメータ符号化手段、７、１５：合成フ
ィルタ、８、１６：適応符号帳、９：ピッチ
位置抽出手段、１０、１８：雑音符号帳、１
１、１９：ピッチ同期化手段、１２：ゲイン符号
化手段、１３：距離計算手段、１４：線形
予測パラメータ復号化手段、１７：ピッチ位置復号化手
段、２０：ゲイン復号化手段、２１、２２：ピッチ
位置抽出手段、１０１：周期パルス生成手段、１０
２：最適ゲイン計算手段、１０３：合成フィル
タ、１０４：距離計算手段、１０５：パル
ス極性制限手段。1: encoding unit; 2: decoding unit; 3: multiplexing means; 4: separation means; 5: linear prediction parameter analysis means; 6: linear prediction parameter coding means; 7, 15: synthesis filter; 8, 16: adaptive codebook; 9: pitch position extracting means; 10, 18: noise codebook; 11, 19: pitch synchronization means; 12: gain encoding means; 13: distance calculation means; 14: linear prediction parameter decoding means; 17: pitch position decoding means; 20: gain decoding means; 21, 22: pitch position extracting means; 101: periodic pulse generating means; 102: optimal gain calculation means; 103: synthesis filter; 104: distance calculation means; 105: pulse polarity limiting means.

Claims

[Claims]

1. A pitch position extracting method for extracting characteristic positions arranged at pitch-period intervals in a speech signal or a sound source signal by evaluating, frame by frame, the distance between the speech signal and a synthesized speech generated using periodic pulses as a sound source, or the distance between the sound source signal and the periodic pulses, wherein the polarity of the periodic pulses is limited.

2. A pitch position extracting device comprising: periodic pulse generating means for sequentially generating time-series vectors in which the leading pulse position of a pulse train having a period corresponding to the pitch period of an input speech is shifted one sample at a time from the head of a frame; a synthesis filter for obtaining, from the time-series vector output from the periodic pulse generating means and weighted according to a gain given by optimal gain calculating means, a synthesized speech using coefficients corresponding to linear prediction parameters obtained from the input speech; distance calculating means for obtaining the distance between the synthesized speech and the input speech, searching for the time-series vector that minimizes the distance, and outputting the leading pulse position of its pulse train as the pitch position; and pulse polarity limiting means for limiting the polarity of the periodic pulses in the time-series vector weighted to generate the synthesized speech.
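The search described in claim 2 can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names (`pulse_train`, `synthesize`, `extract_pitch_position`), the least-squares gain, the squared-error distance, the all-pole filter form, and the choice to search leading positions only within one pitch period are all assumptions made for illustration. The polarity limit follows the abstract: a negative gain is replaced by zero.

```python
def pulse_train(start, period, length):
    """Unit pulses at start, start + period, ... within a frame (hypothetical helper)."""
    return [1.0 if n >= start and (n - start) % period == 0 else 0.0
            for n in range(length)]


def synthesize(excitation, lpc):
    """Assumed all-pole synthesis filter: y[n] = x[n] - sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def extract_pitch_position(speech, pitch_period, lpc, limit_polarity=True):
    """Return the leading pulse position whose synthesized speech is closest to the input."""
    frame_len = len(speech)
    best_pos, best_dist = 0, float("inf")
    # Shift the leading pulse one sample at a time from the head of the frame.
    # Assumption: only positions within the first pitch period are searched.
    for start in range(min(pitch_period, frame_len)):
        synth = synthesize(pulse_train(start, pitch_period, frame_len), lpc)
        # Assumed optimal gain: least-squares fit of the synthesized pulses to the input.
        # The filter is linear, so scaling its output equals scaling its input.
        energy = sum(s * s for s in synth)
        gain = sum(x * s for x, s in zip(speech, synth)) / energy if energy > 0 else 0.0
        # Pulse polarity limit: pass a positive gain through, output zero otherwise.
        if limit_polarity and gain < 0:
            gain = 0.0
        dist = sum((x - gain * s) ** 2 for x, s in zip(speech, synth))
        if dist < best_dist:
            best_pos, best_dist = start, dist
    return best_pos
```

With the limit enabled, a candidate whose best fit needs inverted pulses gets zero gain and can no longer win the search by matching the waveform with the opposite sign, so only positive-polarity pulse trains compete for the pitch position.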