JP3466507B2

JP3466507B2 - Audio coding method, audio coding device, and data recording medium

Info

Publication number: JP3466507B2
Application number: JP16038399A
Authority: JP
Inventors: 栄治河原
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1998-06-15
Filing date: 1999-06-08
Publication date: 2003-11-10
Anticipated expiration: 2019-06-08
Also published as: JP2000078018A

Abstract

PROBLEM TO BE SOLVED: To produce high-quality coded data in real time and without sound breaks by switching the bit allocation means in use, so that bit allocation and then coding are performed with a predetermined one of a plurality of bit allocation means. SOLUTION: Subband analysis means 102 divides the input digital audio signal into 32 frequency components. Grouping means 111 divides the 32 frequency components into the number of groups specified by processing amount control information 121. Allocatable bit calculation means 113 determines, from the total number of allocatable bits, the number of bits allocatable to psychoacoustic model bit allocation means 110 and to band output adaptive bit allocation means 109. Each subband signal is coded by quantizing/coding means 106 and auxiliary information coding means 107, and bit stream forming means 108 forms and outputs a bit stream.

Description

Detailed Description of the Invention

[0001]

TECHNICAL FIELD The present invention relates to an audio coding method, an audio coding device, and a data storage medium. In particular, it relates to an audio coding method and an audio coding device that use subband coding of the kind used in MPEG (Motion Picture Experts Group), and to a data storage medium storing a program for executing that audio coding method.

[0002]

DESCRIPTION OF THE RELATED ART In recent years, as personal computers have become multimedia-capable and the Internet has spread, environments in which software on a personal computer (hereinafter also called a PC) can play back MPEG and other video and audio have become established, and the range of use of coded data such as MPEG is widening. Encoders that create coded data, however, still mainly rely on expensive hardware. Some encoders create coded data in software, but they spend many times the real-time playback duration of the video or audio being coded, so they demand a great deal of time and effort and have not become widespread.

[0003] For this reason, and in particular so that ordinary PC users can create coded data cheaply and easily, there is a demand for creating coded data in real time by software processing.

[0004] An example of a conventional audio coding method is described below. FIG. 11 is a block diagram of the MPEG audio encoder standardized in ISO/IEC 11172-3 as a coded data format for audio. In FIG. 11, the input digital audio signal is divided into 32 frequency components by subband analysis means 202, and scale factor extraction means 203 calculates a scale factor for each subband signal to align the dynamic ranges. The input digital audio signal is also subjected to a fast Fourier transform (FFT) in FFT means 204. Using this result, psychoacoustic analysis means 205 derives a relational model of signal-to-mask ratio (SMR) values based on a psychoacoustic model that exploits the characteristics of human hearing, and bit allocation means 206 uses this model to determine the number of bits allocated to each subband signal. Quantizing/coding means 207 quantizes and codes each subband signal according to the number of bits allocated to it. Bit stream forming means 209 then forms a bit stream that includes the header information and auxiliary information coded by auxiliary information coding means 208, and outputs it.

[0005] This conventional audio coding method codes each band (subband) by exploiting the uneven distribution of power across bands, so the allocation of bits to each subband signal using the psychoacoustic model determines the sound quality. Moreover, because the method was standardized for use with storage media, it is suited to producing high-quality coded data but not to real-time coding, and the psychoacoustic model that determines the sound quality requires a very large amount of computation.

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION The conventional audio coding method and audio coding device are configured as described above and are suited to creating high-quality coded data for storage media. However, because the psychoacoustic model requires a great deal of processing capacity, software processing with current CPU capability is unsuitable for real-time processing on a PC. Even on a PC with a high-performance CPU capable of real-time processing, real-time processing may become impossible when, for example, the CPU occupancy of other applications increases, and sound breaks may occur as a result.

[0007] The present invention was made to solve these problems. Its object is to provide an audio coding method, an audio coding device, and a data storage medium storing a program for executing the coding, which allow high-quality coded data to be created by software processing in real time and without sound breaks, regardless of the CPU processing capability of the PC and the CPU occupancy of other applications.

[0008]

MEANS FOR SOLVING THE PROBLEMS The audio coding method according to claim 1 of the present invention divides a digital audio signal into a plurality of frequency bands and codes each band. It has a plurality of bit allocation means, each with a different processing amount, that generate bit allocation information for each of the divided bands. Based on control information from outside, it switches the bit allocation means in use so that processing is performed with a predetermined one of the plurality of bit allocation means, executes bit allocation, and performs coding. As the control information from outside, it uses a load value representing the processing amount of the central processing unit that can be occupied during coding. Based on the load value, it refers to a data table that stores in advance the processing amount required when coding is performed with each bit allocation means on the central processing unit, and selects the bit allocation means so that the occupiable processing amount of the central processing unit is not exceeded.
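The table lookup described in claim 1 can be illustrated with a minimal Python sketch. The allocator names, the cost figures in `COST_TABLE`, and the fallback to the cheapest allocator when nothing fits are all assumptions made for illustration; the text gives no concrete table contents.

```python
# Hypothetical data table: relative CPU cost of coding one frame with each
# bit allocation means on this CPU. Names and figures are illustrative only.
COST_TABLE = {
    "psychoacoustic": 60,  # high-efficiency, high-load allocation
    "mixed": 35,           # psychoacoustic on the low band, low-load on the high band
    "band_output": 15,     # low-load allocation
}


def select_bit_allocator(load_value, cost_table=COST_TABLE):
    """Pick the costliest allocator whose stored cost does not exceed load_value.

    load_value is the CPU processing amount that coding may occupy.
    Assumption: if no entry fits, fall back to the cheapest allocator so that
    coding can continue.
    """
    affordable = [(cost, name) for name, cost in cost_table.items()
                  if cost <= load_value]
    if affordable:
        return max(affordable)[1]
    return min((cost, name) for name, cost in cost_table.items())[1]
```

Calling `select_bit_allocator` once per frame with the current load value would let the selection follow the CPU load as it changes.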

[0009] The audio coding method according to claim 2 of the present invention is the audio coding method of claim 1, in which processing amount control information from monitoring means, which monitors the processing amount of the central processing unit that can be occupied for coding, is used as the load value.

[0010] The audio coding method according to claim 3 of the present invention is the audio coding method of claim 1, in which the bit allocation processing by the bit allocation means comprises processing using a high-efficiency bit allocation method, which allocates bits with high efficiency and can achieve high sound quality in the coded data, and processing using a low-load bit allocation method, which allocates bits at low load with a smaller processing amount than the high-efficiency bit allocation method.

[0011] The audio coding method according to claim 4 of the present invention is the audio coding method of claim 1, in which the bit allocation means used during coding is switched in units of frames, a frame being the minimum unit that can be decoded into an audio signal.

[0012] The audio coding method according to claim 5 of the present invention is the audio coding method of claim 1, in which the subband signals of the bands obtained by division into a plurality of frequency bands are grouped into groups each consisting of a predetermined number of subband signals, bit allocation is performed independently for each group, and bit allocation information for each band is generated.

[0013] The audio coding method according to claim 6 of the present invention is the audio coding method of claim 5, in which the grouping is performed variably so that the number of groups, or the number of subband signals contiguous along the frequency axis within a group, becomes the number specified by the control information from outside or the number specified based on the processing amount control information from the monitoring means.

[0014] The audio coding method according to claim 7 of the present invention is the audio coding method of claim 6, in which the number of subband signals is changed in units of frames, a frame being the minimum unit that can be decoded into an audio signal.

[0015] The audio coding method according to claim 8 of the present invention is the audio coding method of claim 7, in which at least one group to which no bits are allocated is provided at the time of grouping.

[0016] The audio coding method according to claim 9 of the present invention is the audio coding method of claim 5, in which, following the grouping of the subband signals, bits are allocated to the subband signals in groups belonging to the low band by processing using the high-efficiency bit allocation method, while bits are allocated to the subband signals in groups belonging to the high band by processing using the low-load bit allocation method. The audio coding method according to claim 10 of the present invention is the audio coding method of claim 5, in which allocatable bit calculation means is provided that determines the number of allocatable bits for the independent bit allocation means of each group, and the number of allocatable bits for all groups is distributed to the independent bit allocation means of each group using each group's proportion of the whole, adjusted by weighting based on the characteristics of each band belonging to each group.

[0017] The audio coding method according to claim 11 of the present invention is the audio coding method of claim 10, in which the weighting based on the characteristics of each band belonging to each group is weighting based on a predetermined minimum audible limit value for each band. The audio coding method according to claim 12 of the present invention is the audio coding method of claim 10, in which the weighting based on the characteristics of each band belonging to each group is weighting based on the subband signal levels of the frequency bands belonging to each group, obtained by applying subband analysis to the input digital audio signal.

[0018] The audio coding method according to claim 13 of the present invention is the audio coding method of claim 10, in which the weighting based on the characteristics of each band belonging to each group is weighting based on the spectrum signal levels belonging to each group, obtained by applying a linear transform to the input digital audio signal.

[0019] The audio coding method according to claim 14 of the present invention is the audio coding method of claim 5, in which bits are allocated by processing using the high-efficiency bit allocation method to high-level signals, whose signal level belonging to each group is at or above a predetermined threshold, and by processing using the low-load bit allocation method to low-level signals, whose signal level belonging to each group is at or below the predetermined threshold.

[0020] The audio coding method according to claim 15 of the present invention is the audio coding method of claim 14, in which the signal level belonging to each group is the subband signal level obtained by applying subband analysis to the input digital audio signal.

[0021] The audio coding method according to claim 16 of the present invention is the audio coding method of claim 14, in which the signal level belonging to each group is the spectrum signal level obtained by applying a linear transform to the input digital audio signal.

[0022] The audio coding method according to claim 17 of the present invention is the audio coding method of claim 14, in which the signal level belonging to each group is the predetermined minimum audible limit value for each band.

[0023] The audio coding method according to claim 18 of the present invention is the audio coding method of any one of claims 3, 9, and 14, in which the processing using the high-efficiency bit allocation method is performed using the relationship of signal-to-mask ratio values based on a predetermined psychoacoustic model, and the processing using the low-load bit allocation method is performed by taking the signal levels divided into a plurality of frequency bands and adding a predetermined minimum audible limit value for each band.

[0024] The audio coding method according to claim 19 of the present invention is the audio coding method of claim 18, in which the psychoacoustic model is the psychoacoustic model specified by MPEG (Motion Picture Experts Group).

[0025] The audio coding method according to claim 20 of the present invention is the audio coding method of claim 4 or claim 7, in which the frame that is the minimum unit decodable into the audio signal is the frame specified by MPEG (Motion Picture Experts Group).

[0026] The audio coding method according to claim 21 of the present invention is the audio coding method of claim 1, in which the bit allocation means generates bit allocation information for each divided band based on information output from a predetermined psychoacoustic model. Once every N frames (N = 1, 2, 3, ...), it generates bit allocation information based on the information output from the predetermined psychoacoustic model. For frames in which that bit allocation information was not generated, it generates bit allocation information based on the information previously output from the psychoacoustic model and the signal information of each divided band, and performs coding.
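The once-every-N-frames scheme of claim 21 can be sketched in Python as follows. The function names `run_psy_model` and `allocate` and the shape of the frame data are hypothetical placeholders; the sketch only shows how the costly model output is refreshed every N frames and reused in between.

```python
def allocate_frames(frames, n, run_psy_model, allocate):
    """Run the psychoacoustic model once every n frames and reuse its output.

    frames        : iterable of per-frame subband signal information
    run_psy_model : callable(frame) -> model output (assumed to be costly)
    allocate      : callable(model_output, frame) -> bit allocation
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    allocations = []
    model_output = None
    for index, frame in enumerate(frames):
        if index % n == 0:
            # Refresh the model output on frames 0, n, 2n, ...
            model_output = run_psy_model(frame)
        # Other frames combine the last model output with their own signal data.
        allocations.append(allocate(model_output, frame))
    return allocations
```

With N = 1 the model runs on every frame; larger N trades allocation accuracy for a lower processing amount.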

[0027] The audio coding method according to claim 22 of the present invention is the audio coding method of claim 1, which has a psychoacoustic model whose processing amount can be controlled in stages. Based on control information from outside, it controls the processing amount of the psychoacoustic model and generates bit allocation information for each band so that processing is performed using the psychoacoustic model at a predetermined processing amount.

[0028] The audio coding method according to claim 23 of the present invention is the audio coding method of claim 1, which has a plurality of psychoacoustic models, each with a different processing amount. Based on control information from outside, it switches the psychoacoustic model in use so that processing is performed with a predetermined one of the plurality of psychoacoustic models, and generates bit allocation information for each band.

[0029] The audio coding method according to claim 24 of the present invention is the audio coding method of claim 1, in which, according to the performance of the central processing unit on which the coding is executed, processing load value information for each of the plurality of bit allocation means or each of the plurality of psychoacoustic models is output to the outside at initialization, before the coding operation starts.

[0030] The audio coding method according to claim 25 of the present invention is the audio coding method of claim 24, in which the processing load value information for each of the plurality of bit allocation means or each of the plurality of psychoacoustic models, output as information to the outside, is output in descending or ascending order.

[0031] The audio coding device according to claim 26 of the present invention performs audio coding using the audio coding method according to any one of claims 1 to 25.

[0032] The recording medium according to claim 27 of the present invention has recorded on it the steps of the audio coding method according to any one of claims 1 to 25.

[0033]

[0034]

[0035]

[0036]

[0037]

[0038]

[0039]

[0040]

[0041]

BEST MODE FOR CARRYING OUT THE INVENTION Audio coding methods and audio coding devices according to embodiments of the present invention are described below with reference to the drawings. (Embodiment 1) Here, as an example, a coding method is described that divides an input signal into a plurality of frequency components and codes each band (subband) by exploiting the uneven distribution of power across bands. FIG. 1 is a conceptual diagram of the whole system when, for example, a personal computer (hereinafter also called a PC) is used as the audio coding device for this coding method. In the figure, 1 is a so-called multimedia PC that can receive data from external devices such as a camera 17 and a microphone 19. It has a hard disk drive (HDD) 11, a fixed recording medium with a large storage capacity that stores various data and programs, and a PD drive 12a and an FD drive 12b for removable storage media of relatively small capacity, used to exchange programs, data, and the like with the HDD 11. Programs stored on the HDD are read as needed into a memory 13, made up of random access memory (RAM) or the like, and executed in response to instructions from a central processing unit (CPU) 14. A video capture card 16 and a sound card 18 are built in to capture the video and audio from the external camera 17 and microphone 19, respectively. The elements of the PC 1 configured in this way are connected by an internal data bus 15.

[0042] FIG. 2 is a block diagram of the encoder 20 of the audio coding device that implements the audio coding processing executed by the PC 1 shown in FIG. 1. In practice it is implemented by a program read from the HDD 11 into the memory 13. In FIG. 2, 21 is CPU load monitoring information for monitoring the load state of the CPU 14, and 22 is coding means control means, which controls the operation of low-band coding processing means 23 and high-band coding processing means 24 based on the CPU load monitoring information 21. 25 is bit stream forming processing means, which turns the outputs of the two coding processing means 23 and 24 into stream signals. 26 is a coding mode designation signal, which is specified by the user and input to the coding means control means 22.

[0043] As the configuration of the low-band coding processing means 23 in FIG. 2, the configuration shown in the conventional example of FIG. 11 can be used, for example. The high-band coding processing means 24, as shown in FIG. 3 for example, uses a coding method that, like the example of FIG. 11, codes each band (subband) by exploiting the uneven distribution of power across bands, but it does not allocate bits to each subband signal using a psychoacoustic model. Instead, it provides band output adaptive bit allocation means 304, which weights the scale factor of each subband signal based on human auditory characteristics. The primary aim is low-load processing rather than high sound quality, so the configuration requires little computation. In addition, to prevent bit allocation from concentrating excessively in a particular band, the weighting is adjusted according to each band every time a bit is allocated.
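A low-load allocation of this kind might look like the following Python sketch. It is not the method of band output adaptive bit allocation means 304, whose weights and adjustment rule are not given here. It is a generic greedy loop under stated assumptions: a multiplicative `penalty` stands in for the per-allocation weight adjustment, `max_bits` is a hypothetical per-band cap, and a "bit" is an abstract allocation unit.

```python
def band_output_allocate(scale_factors, weights, total_bits,
                         penalty=0.5, max_bits=15):
    """Greedy low-load bit allocation sketch.

    scale_factors : per-subband scale factors (larger means a stronger signal)
    weights       : per-subband weights reflecting auditory sensitivity (assumed)
    total_bits    : number of allocation units to hand out
    penalty       : assumed factor applied to a band's score after each grant,
                    which keeps bits from piling up in one band
    """
    scores = [sf * w for sf, w in zip(scale_factors, weights)]
    bits = [0] * len(scores)
    for _ in range(total_bits):
        candidates = [i for i in range(len(scores)) if bits[i] < max_bits]
        if not candidates:
            break  # every band has reached its cap
        best = max(candidates, key=lambda i: scores[i])
        bits[best] += 1
        scores[best] *= penalty  # lower this band's priority for the next round
    return bits
```

Each grant costs one pass over the bands and no FFT or masking calculation, which is the kind of saving the low-load path is after.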

[0044] FIG. 4 is a block diagram showing the detailed configuration of the encoder 20 shown in FIG. 2. 101 is the encoder, which comprises the following means, each described later: subband analysis means 102, scale factor extraction means 103, FFT means 104, psychoacoustic analysis means 105, quantizing/coding means 106, auxiliary information coding means 107, bit stream forming means 108, band output adaptive bit allocation means 109, psychoacoustic model bit allocation means 110, grouping means 111, bit allocation processing control means 112, and allocatable bit calculation means 113.

[0045] Subband analysis means 102 divides the input digital audio signal into 32 frequency components. Scale factor extraction means 103 calculates a scale factor for each subband signal and aligns the dynamic ranges of the subbands. Grouping means 111 divides the 32 frequency components into the number of groups specified by processing amount control information 121, which is control information from outside. In Embodiment 1, as shown in FIG. 5, the number of groups is 3 and each group consists of subband signals contiguous along the frequency axis: low-band group A of subbands 0 to 15, high-band group B of subbands 16 to 29, and invalid group C of subbands 30 to 31, to which no bits are allocated. The processing amount control information 121 is assumed to include the CPU load monitoring information 21 and the coding mode setting signal 26. In Embodiment 1, two bit allocation means allocate bits to these subband groups. For the low band, to which the human ear is sensitive, psychoacoustic model bit allocation means 110 is used; it allocates bits with high efficiency using the relationship of signal-to-mask ratio values based on the psychoacoustic model specified by MPEG. For the high band, to which the human ear is relatively insensitive, band output adaptive bit allocation means 109 is used; it takes the scale factor information from scale factor extraction means 103, adds a preset minimum audible limit value for each band, and allocates bits at lower load than the psychoacoustic model bit allocation method.
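The grouping of Embodiment 1 can be written out as a short Python sketch. The boundaries 16 and 30 come from the group definitions above; the function name, the parameter names (borrowed from the `psy_end` and `subband_end` symbols that appear later in the description), and the dictionary layout are illustrative assumptions.

```python
def group_subbands(psy_end=16, subband_end=30, total=32):
    """Split subband indices 0..total-1 into three contiguous groups.

    A: low band, psychoacoustic model bit allocation (subbands 0 to psy_end-1)
    B: high band, band output adaptive bit allocation (psy_end to subband_end-1)
    C: invalid group, no bits allocated (subband_end to total-1)
    """
    if not 0 <= psy_end <= subband_end <= total:
        raise ValueError("boundaries must satisfy 0 <= psy_end <= subband_end <= total")
    return {
        "A": list(range(0, psy_end)),
        "B": list(range(psy_end, subband_end)),
        "C": list(range(subband_end, total)),
    }
```

Because the boundaries are parameters, a controller could move them from frame to frame, for example shrinking group A when the CPU load rises.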

[0046] Bit allocation processing control means 112 controls FFT means 104 so that FFT means 104 applies a fast Fourier transform (FFT) to the input digital audio signal. This is done to perform the psychoacoustic analysis required for low-band group A of subbands 0 to 15, to which psychoacoustic model bit allocation is applied. Using the transform result, psychoacoustic analysis means 105 derives a relational model of signal-to-mask ratio (SMR) values based on a psychoacoustic model that exploits the characteristics of human hearing.

[0047] The allocatable bit arithmetic means 113 first determines, from the sampling frequency and the coding bit rate, the number of bits allocatable to all groups together. It then calculates the number of allocatable bits for the independent bit allocation means of each group, using the share of each group subject to bit allocation within the whole, weighted according to the characteristics of each band belonging to that group. In the first embodiment, the numbers of bits allocatable to the psychoacoustic model bit allocation means 110 and to the band output adaptive bit allocation means 109 are determined from the total number of allocatable bits, taking into account the scale factor index values and the ratio between the low band and high band regions. Specifically, from the scale factor index values scf_index[i] obtained by the scale factor extraction means 103, the sums Vpsy and Vnon of scf_index[i] over the two regions are calculated as shown in Equations 1 and 2 below.

[Equation 1]

$$V_{psy} = \sum_{i=0}^{psy\_end-1} scf\_index[i]$$

[Equation 2]

$$V_{non} = \sum_{i=psy\_end}^{subband\_end-1} scf\_index[i]$$

where
psy_end = 16: number of subbands subject to psychoacoustic model bit allocation
subband_end = 30: total number of subbands subject to bit allocation

[0048] Next, in order to allocate more bits to the low band, to which the human ear is sensitive, Vpsy is weighted:

Vpsy = Vpsy * 0.75

The number of bits allocatable by psychoacoustic model bit allocation, psy_num, and the number allocatable by band output adaptive bit allocation, non_num, are then obtained from the following expressions:

Vnon = Vnon * psy_ratio
psy_num = all_alloc_num * Vnon / (Vpsy + Vnon)
non_num = all_alloc_num - psy_num

where
all_alloc_num: total number of allocatable bits
psy_ratio: psy_end / (subband_end - psy_end)
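The split between the two allocators can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes that Vpsy and Vnon are plain sums of scf_index[i] over the two regions, as the text describes them, and it rounds psy_num down, although the text does not state a rounding rule. The function name `split_allocatable_bits` is hypothetical. In MPEG-1 audio a larger scale factor index denotes a smaller scale factor, which is presumably why psy_num grows with Vnon: a quieter high band yields a larger share for the low band.

```python
from fractions import Fraction


def split_allocatable_bits(scf_index, all_alloc_num, psy_end=16, subband_end=30):
    """Split all_alloc_num into (psy_num, non_num) as in paragraph [0048]."""
    if len(scf_index) < subband_end:
        raise ValueError("need one scale factor index per allocated subband")
    # Assumed form of Equations 1 and 2: plain sums of the indices per region.
    v_psy = Fraction(sum(scf_index[:psy_end]))
    v_non = Fraction(sum(scf_index[psy_end:subband_end]))
    # Weight Vpsy by 0.75 so that the low band receives a larger share.
    v_psy *= Fraction(3, 4)
    # Scale Vnon by the ratio of the two region widths.
    v_non *= Fraction(psy_end, subband_end - psy_end)
    if v_psy + v_non == 0:
        raise ValueError("all scale factor indices are zero; split is undefined")
    # Rounding down is an assumption of this sketch.
    psy_num = int(all_alloc_num * v_non / (v_psy + v_non))
    non_num = all_alloc_num - psy_num
    return psy_num, non_num


# Hypothetical input: every subband has scale factor index 10, 280 bits in total.
print(split_allocatable_bits([10] * 32, 280))  # (160, 120)
```

Exact fractions are used here only to keep the arithmetic free of floating-point rounding; a real encoder would more likely use fixed-point integers.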

[0049] Within the number of allocatable bits for each group (psy_num, non_num), the psychoacoustic model bit allocation means 110 allocates bits to subbands 0 to 15 of the low band group A, using the relational model of SMR values from the psychoacoustic analysis means 105. The band output adaptive bit allocation means 109, in turn, allocates bits to subbands 16 to 29 of the high band group B. No bits are allocated to subbands 30 to 31 of the invalid group C, since these are treated as invalid subbands.

[0050] Each subband signal is quantized and encoded by the quantization/encoding means 106 according to the number of bits allocated to it by these bit allocation means. The bit stream forming means 108 then forms a bit stream from the result together with the header information and auxiliary information encoded by the auxiliary information encoding means 107, and outputs it.

[0051] If the processing amount control information 121 indicates, for example, that the encoding processing amount should be reduced, then, as shown in FIG. 6, the bandwidth handled by the psychoacoustic model bit allocation means 110, which has a large processing amount, is narrowed from the low band group A of subbands 0 to 15 to a low band group A' of subbands 0 to 7. Conversely, the bandwidth handled by the band output adaptive bit allocation means 109, which has a small processing amount, is widened to a high band group B' of subbands 8 to 29. In the extreme case of reducing the encoding processing amount, the band output adaptive bit allocation means 109 handles a single group of subbands 0 to 29. In that case the psychoacoustic model bit allocation means 110 effectively does not operate, so the FFT means 104 and the psychoacoustic analysis means 105 do not operate either.

[0052] On the other hand, if the processing amount control information 121 indicates, for example, that the sound quality of the encoded data should be improved, the bandwidth handled by the psychoacoustic model bit allocation means 110, which is capable of highly efficient (high sound quality) bit allocation, is widened. In the extreme case of prioritizing sound quality, the psychoacoustic model bit allocation means 110 handles a single group of subbands 0 to 29. In this embodiment, the subband groups are resized and the bit allocation means are switched in units of frames, the smallest unit that can be decoded into an audio signal, so that the encoding processing amount can be controlled in real time.
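The regrouping described in the two preceding paragraphs amounts to moving one boundary between the two allocators on a per-frame basis. The following Python sketch illustrates this under stated assumptions: the function names `make_groups` and `needs_fft` and the dictionary keys are hypothetical, and the boundary value `psy_end` stands in for whatever mode the processing amount control information selects.

```python
def make_groups(psy_end, subband_end=30, total_subbands=32):
    """Return the subband ranges handled by each allocator for one frame."""
    if not 0 <= psy_end <= subband_end <= total_subbands:
        raise ValueError("boundaries must satisfy 0 <= psy_end <= subband_end <= total")
    return {
        "psychoacoustic": range(0, psy_end),              # means 110
        "band_adaptive": range(psy_end, subband_end),     # means 109
        "invalid": range(subband_end, total_subbands),    # no bits allocated
    }


def needs_fft(groups):
    """FFT and psychoacoustic analysis are only needed if means 110 has work."""
    return len(groups["psychoacoustic"]) > 0


# The configurations named in the text: FIG. 5, FIG. 6, and the two extremes.
for psy_end in (16, 8, 0, 30):
    g = make_groups(psy_end)
    print(psy_end, len(g["psychoacoustic"]), len(g["band_adaptive"]), needs_fft(g))
```

Because the grouping is recomputed for each frame, switching modes does not require any state to be carried between frames in this sketch.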

[0053] Next, the overall operation flow of the audio coding apparatus according to the first embodiment is described with reference to FIG. 7. First, using the configuration shown in FIG. 7(a), the processing load of each encoder (23, 24) is determined: dummy data of a predetermined duration is encoded in each mode of each encoder (each mode corresponding to a different bandwidth subject to bit allocation), and the CPU load monitoring unit 700 stores the CPU load value of each mode in the data table 701. Then, when samples (data) are input, subband analysis is performed in step S70 of FIG. 7(b) to divide the signal into 32 frequency components, and the scale factor of each subband signal is calculated in step S71.

[0054] Next, in step S72, it is determined whether CPU load detection data is available. Immediately after the start of operation there is none, so the process proceeds to step S74, where the normal grouping, which gives the highest sound quality, is applied, and then to step S75, where psychoacoustic model bit allocation is performed. The process then proceeds to step S76 for quantization/encoding, and the bit stream is formed in step S79, which completes the sequence. At the end of the sequence, the time required to encode the predetermined number of input samples is reported to the CPU load monitoring unit 700, and the current CPU load is thereby detected.

[0055] From the next pass onward, step S72 determines that CPU load detection data is available. If step S73 determines that, at the detected CPU load, encoding in real time is not possible, the process proceeds to step S77, where the optimum mode (grouping) is selected by referring to the data table 701. The band output adaptive bit allocation of step S78 and the psychoacoustic model bit allocation of step S75 are then each performed in the proportion set by that mode. The process proceeds to step S76 for quantization/encoding, and in step S79 a bit stream is formed from the encoded data.
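The mode selection of step S77 can be illustrated as a lookup in a table of measured loads. The sketch below is an assumption-laden illustration: the text does not say how the data table 701 is keyed or what "optimum" means in detail, so the table is assumed here to map the width of the psychoacoustic group to a measured encode time per frame, and the widest group that still fits the real-time budget is chosen. The name `select_mode` and the numbers in the example are hypothetical.

```python
def select_mode(load_table, real_time_budget):
    """Pick the widest psychoacoustic group whose measured load fits the budget.

    load_table maps psy_end (number of subbands given to the psychoacoustic
    allocator) to the encode time measured for that mode on dummy data.
    """
    if not load_table:
        raise ValueError("load table is empty")
    # Try modes from highest quality (widest psychoacoustic group) downward.
    for psy_end in sorted(load_table, reverse=True):
        if load_table[psy_end] <= real_time_budget:
            return psy_end
    # Nothing fits: fall back to the cheapest mode (an assumption of this sketch).
    return min(load_table, key=load_table.get)


# Hypothetical measurements in milliseconds per frame.
table = {16: 30.0, 8: 20.0, 0: 10.0}
print(select_mode(table, 24.0))  # 8
```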

[0056] In the first embodiment, the allocatable bit arithmetic means 113 calculates the number of allocatable bits for the independent bit allocation means of each group by taking into account the scale factor index values and the ratio between the low band and high band regions. Instead of the scale factor index values, however, the spectrum signal levels belonging to each group, obtained from the FFT means 104, may be used, or the minimum audible limit value preset for each band may be used.

[0057] In the first embodiment, the information for controlling the processing amount of the encoder 101 is obtained by providing, inside the encoder 101, the CPU load monitoring means 700, which monitors the processing amount of the CPU, so that the encoder 101 operates without exceeding the processing capacity of the CPU. However, control information supplied from outside, such as user input, may be used instead. With user input, the encoding process can give priority to sound quality or image quality according to the user's preference.

[0058] In the first embodiment, the bit allocation means of the encoder 101 are used in a fixed manner: the psychoacoustic model bit allocation means 110, which allocates bits with high efficiency, for the low band to which the human ear is sensitive, and the band output adaptive bit allocation means 109, which allocates bits with low load, for the high band. However, the bit allocation means need not be fixed by band. When the signal from the scale factor extraction means 103 shows that the subband signal levels belonging to a group are at or below the preset threshold of each band, for example when, as shown in FIG. 8, the low band contains less signal that is meaningful as encoded data than the high band does, the psychoacoustic model bit allocation means 110 may be used for the high band.

[0059] Further, as shown in FIG. 9, instead of comparing the subband signal levels of each group with the thresholds on the basis of the signal from the scale factor extraction means 103, the signal from the FFT means 104, which has a higher frequency resolution than the signal from the scale factor extraction means 103, may be input to the bit allocation processing control means 112 and used to compare the level of the subband signals belonging to each group with the preset threshold of each band.

[0060] (Second Embodiment) Next, a data recording medium according to a second embodiment of the present invention is described. By recording an encoding program that implements the configuration of the audio coding apparatus or the coding method of the first embodiment on a data storage medium such as a floppy disk, the processing described in the embodiments of the present invention can easily be carried out on an independent computer system.

[0061] FIG. 10 illustrates the case where the encoding process of the first embodiment is carried out by a computer system using a floppy disk that stores the encoding program. FIG. 10(a) shows the external appearance of the floppy disk as seen from the front, its cross-sectional structure, and the floppy disk body, and FIG. 10(b) shows an example of the physical format of the floppy disk body.

[0062] The floppy disk FD has a structure in which the floppy disk body D is housed in a floppy disk case FC. On the surface of the floppy disk body D, a plurality of tracks Tr are formed concentrically from the outer circumference toward the inner circumference, and each track Tr is divided in the angular direction into 16 sectors Se. Accordingly, in the floppy disk FD storing the program, the data constituting the program is recorded in the areas (sectors) Se allocated on the floppy disk body D.

[0063] FIG. 10(c) shows a configuration for recording the program on the floppy disk FD and for performing audio encoding using the program stored on the floppy disk FD. To record the program on the floppy disk FD, the data constituting the program is written from the computer system Cs to the floppy disk FD via the floppy disk drive FDD. To build the audio coding apparatus in the computer system Cs from the program recorded on the floppy disk FD, the program is read from the floppy disk FD by the floppy disk drive FDD and loaded into the computer system Cs.

[0064] Although the above description uses a floppy disk as the data recording medium, audio encoding by software can be performed in the same way using an optical disc. The recording medium is not limited to optical discs and floppy disks; any medium on which a program can be recorded, such as an IC card or a ROM cassette, may be used, and with such media audio encoding by software can be carried out just as with a floppy disk.

[0065] (Third Embodiment) Next, an audio coding method and an audio coding apparatus according to a third embodiment of the present invention are described with reference to the drawings. The high band encoding processing means 24 shown in FIG. 2 uses, for example, the configuration shown in FIG. 3. The low band encoding processing means 23, as shown for example in FIG. 12, uses a coding scheme that, like the one shown in FIG. 11, encodes each band (subband) by exploiting the uneven distribution of power across bands. However, bits are not allocated to the subband signals using the predetermined psychoacoustic model analysis means alone. Instead, a simplified psychoacoustic model unit 4062 with a small processing amount is newly provided, which makes it possible to allocate bits from bit allocation information generated from the masking threshold output by the psychoacoustic model unit 4061 in a previous frame and the band-divided signal of the current frame.

[0066] FIG. 12 is a block diagram showing the detailed configuration of the low band encoding processing means 23 shown in FIG. 2. Reference numeral 401 denotes an encoder, which comprises the subband analysis means 402, scale factor extraction means 403, bit allocation processing control means 404, FFT processing means 405, psychoacoustic analysis means 406, psychoacoustic model bit allocation means 407, quantization/encoding means 408, auxiliary information encoding means 409, and bit stream forming means 410 described below.

[0067] The operation is as follows. The subband analysis means 402 divides the input digital audio signal into 32 frequency components. The scale factor extraction means 403 calculates a scale factor for each subband signal and aligns the dynamic range of each subband. The FFT processing means 405 applies a fast Fourier transform to the input digital audio signal. The psychoacoustic analysis means 406 consists of, for example, a normal psychoacoustic model unit 4061 specified by MPEG and the simplified psychoacoustic model unit 4062, which has a smaller processing amount than the normal psychoacoustic model unit 4061; each model calculates a signal-to-mask ratio.

[0068] The normal psychoacoustic model unit 4061 calculates the signal-to-mask ratio of each subband signal according to Equation 3 below. The simplified psychoacoustic model unit 4062, by contrast, as shown in Equation 4 below, does not calculate the minimum masking level of each subband in the current frame. Instead, it uses the minimum masking level most recently calculated by the normal psychoacoustic model unit 4061 in a previous frame, and takes the sound pressure from the scale factor values extracted by the scale factor extraction means 403 for the current frame, to calculate the signal-to-mask ratio.

[Equation 3]

$$SMR_{sb}(n) = L_{sb}(n) - LT_{min}(n) \quad \text{[dB]}$$

where
Lsb(n): sound pressure of each subband
LTmin(n): minimum masking level of each subband

[Equation 4]

$$SMR_{sb}(n) = L_{sb}(n) - LT_{min}(n) \quad \text{[dB]}$$

where
Lsb(n) = 20 · log10(scf_max(n) · 32768) − 10 dB
scf_max(n): scale factor value for each subband of the current frame
LTmin(n): minimum masking level of each subband most recently calculated by the normal psychoacoustic model unit 4061
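The simplified model described above can be sketched in Python as follows. This is an illustration under assumptions rather than the patented implementation: the signal-to-mask ratio is taken to be the subband sound pressure minus the minimum masking level, in dB, and the sound pressure is taken as 20 · log10(scf_max · 32768) − 10 dB, the scale-factor term used in the MPEG-1 psychoacoustic model. The function names are hypothetical.

```python
import math


def subband_sound_pressure_db(scf_max):
    """Sound pressure of one subband estimated from its scale factor alone."""
    if scf_max <= 0:
        raise ValueError("scale factor must be positive")
    return 20.0 * math.log10(scf_max * 32768.0) - 10.0


def simplified_smr(scf_max_per_band, lt_min_prev):
    """SMR per subband using the current frame's scale factors and the
    minimum masking levels kept from the last normal-model frame."""
    if len(scf_max_per_band) != len(lt_min_prev):
        raise ValueError("need one masking level per subband")
    return [
        subband_sound_pressure_db(scf) - lt_min
        for scf, lt_min in zip(scf_max_per_band, lt_min_prev)
    ]


# Hypothetical values: one subband with scale factor 1000/32768 and a
# stored minimum masking level of 20 dB.
print(simplified_smr([1000 / 32768], [20.0]))
```

No FFT is involved on this path, which is where the saving in processing amount comes from.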

[0069] On the basis of the processing amount control information 121, the bit allocation processing control means 404 controls once in how many frames the normal psychoacoustic model unit 4061 is used, with the simplified psychoacoustic model unit 4062 used in the remaining frames, and whether the FFT processing means 405 performs the fast Fourier transform. The simplified unit 4062 allows low-load processing, while the normal unit 4061 can output optimum bit allocation information for high sound quality. In the third embodiment, as shown in FIG. 13, N is set to 3. For example, if, in the state of FIG. 13, the processing amount control information 121 tells the bit allocation processing control means 404 to lower the share of the CPU devoted to encoding, the value of N is increased so that the simplified psychoacoustic model unit 4062, which has a small processing amount, is used more often. Conversely, if the bit allocation processing control means 404 is told that a larger share of the CPU may be devoted to encoding, the value of N is decreased so that the normal psychoacoustic model unit 4061, which can achieve high sound quality, is used more often. The processing amount can thus be controlled.
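The frame schedule controlled by N can be illustrated with a short Python sketch. It assumes that the normal model runs on the first frame of every group of N frames and the simplified model on the others; the text fixes only the rate, not the phase, so the choice of frame 0 is an assumption, and the name `model_schedule` is hypothetical.

```python
def model_schedule(num_frames, n):
    """Return which psychoacoustic model each frame uses for a given N."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return [
        "normal" if frame % n == 0 else "simplified"
        for frame in range(num_frames)
    ]


# N = 3 as in FIG. 13: one normal-model frame followed by two simplified ones.
print(model_schedule(6, 3))
# Raising N lowers the share of normal-model (FFT) frames, and thus the load.
print(model_schedule(6, 6).count("normal"))
```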

[0070] The psychoacoustic model bit allocation means 407 allocates bits to each subband signal divided by the subband analysis means 402, based on the signal-to-mask ratio relationship, which is information from the bit allocation processing control means 404. The quantization/encoding means 408 quantizes and encodes each subband signal, and the bit stream forming means 410 forms a bit stream from the result together with the auxiliary data from the auxiliary information encoding means 409 and outputs it.

[0071] As described above, according to the third embodiment, bit allocation is performed at a rate of once every N frames, so that the CPU load along the time axis can be reduced.

[0072] Although the encoder 401 is assumed here to be the low band encoding processing means 23 shown in FIG. 2, it need not be applied to the low band signal only and may be applied to the full band signal.

[0073] (Fourth Embodiment) Next, an audio coding method and an audio coding apparatus according to a fourth embodiment of the present invention are described with reference to the drawings. The encoder of FIG. 14, like the one shown in FIG. 11, uses a coding scheme that encodes each band (subband) by exploiting the uneven distribution of power across bands, but differs in that it has a function for adding external data other than audio data to the output bit stream. Image data, text data, and the like are envisaged as the external data.

[0074] The encoder 501 shown in FIG. 14 comprises the subband analysis means 502, scale factor extraction means 503, FFT processing means 504, psychoacoustic analysis means 505, bit allocation means 506, quantization/encoding means 507, auxiliary information encoding means 508, bit stream forming means 509, bit allocation processing control means 510, and additional data encoding means 511 described below.

[0075] The operation is as follows. The subband analysis means 502 divides the input digital audio signal into 32 frequency components. The scale factor extraction means 503 calculates a scale factor for each subband signal and aligns the dynamic range of each subband. The FFT processing means 504 applies a fast Fourier transform to the input digital audio signal. The psychoacoustic analysis means 505 calculates the signal-to-mask ratio using, for example, the psychoacoustic model specified by MPEG.

[0076] The bit allocation processing control means 510 monitors the additional data buffer 512, which temporarily stores the data to be added to the output bit stream. On the basis of the allocation range control information 513, which is generated by determining whether additional data is present or whether the additional data will overflow, it specifies to the bit allocation means 506 the range over which bits are to be allocated.

[0077] For example, when there is no data in the additional data buffer 512, bits are allocated to subbands 0 to 29 as shown in FIG. 15. In this case, with the total number of allocatable bits taken as 100, 80 bits are allocated to subbands 0 to 15 and 20 bits to subbands 16 to 29.

[0078] When data is written to the additional data buffer 512 from outside so that additional data is present, an instruction to insert the additional data is given to the bit allocation processing control means 510 as the allocation range control information 513. In the fourth embodiment, for example, 80 bits are allocated to subbands 0 to 15, no bits are allocated to subbands 16 to 29, which would otherwise receive them, and the remaining 20 bits are assigned as the number of bits for the additional data. In addition, to reduce the processing amount, the FFT processing and psychoacoustic analysis for the range from subband 16 upward, to which no bits are allocated, may be skipped.
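The worked example in the two preceding paragraphs (100 bits, split 80/20) can be written as a small Python function. This is a sketch under assumptions: the function name `plan_frame_bits` and the dictionary keys are hypothetical, and the low band share is passed in as a parameter because the text gives it only as an example value.

```python
def plan_frame_bits(total_bits, low_band_bits, additional_data_pending):
    """Decide where one frame's allocatable bits go.

    With no additional data waiting, subbands 16 to 29 receive the remainder;
    with data waiting, those bits carry the additional data instead.
    """
    if not 0 <= low_band_bits <= total_bits:
        raise ValueError("low band share must lie within the total")
    remainder = total_bits - low_band_bits
    if additional_data_pending:
        return {"sb0_15": low_band_bits, "sb16_29": 0, "additional": remainder}
    return {"sb0_15": low_band_bits, "sb16_29": remainder, "additional": 0}


print(plan_frame_bits(100, 80, additional_data_pending=False))
print(plan_frame_bits(100, 80, additional_data_pending=True))
```

In both cases the three entries sum to the frame total, which matches transmission at a constant bit rate.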

[0079] The subbands to which bits have been allocated are quantized and encoded by the quantization/encoding means 507, and the bit stream forming means 509 forms and outputs a bit stream from the result together with the auxiliary data from the auxiliary information encoding means 508 and the additional data encoded, for example, as MPEG ancillary data.

[0080] As described above, according to the fourth embodiment, when transmitting at a constant bit rate, the bit allocation range used in encoding is controlled according to the amount of additional data other than audio data, so that the amount of audio data to be encoded is variable and the additional data is inserted into the encoded data stream. Various data can therefore be superimposed on the surplus bandwidth, and the bandwidth can be used effectively. The bit allocation range is controlled by the bit allocation processing control means 510 frame by frame, and the range can also be varied according to the amount of data in the additional data buffer 512.

[0081] Through this processing, the amount of inserted data can be controlled in real time, without impairing the sound quality within the bit allocation range even when additional data is inserted.

[0082] (Fifth Embodiment) Next, an audio coding method and an audio coding apparatus according to a fifth embodiment of the present invention are described with reference to the drawings. FIG. 16 is a block diagram showing the configuration of the encoder of an audio coding apparatus using the audio coding method according to the fifth embodiment. In the figure, the same reference numerals as in FIG. 2 denote the same or corresponding parts. Reference numerals 160 to 162 denote encoding processing means A to C, each of which can operate independently; 163 denotes a processing load value storage buffer for storing the processing load value information of the encoding processing means A to C; and 164 denotes a sample data buffer for supplying sample data to the encoding processing means A to C.

[0083] The operation is as follows. At initialization, before encoding is performed, predetermined sample data stored in the sample data buffer 164 is first supplied to each of the encoding processing means A to C, and the resulting processing load value of each encoding processing means A to C, or of its psychoacoustic model, is stored in the processing load value storage buffer 163.

【００８４】そして、上記処理負荷値を昇順、もしくは
降順にて出力することによって、装置にて使用されるＣ
ＰＵの性能に見合った符号化処理手段を迅速に選択し、
当該符号化処理手段によって符号化処理を行う。符号化
処理の内容については、実施の形態１で示したものと同
じであるので個々では省略する。Then, by outputting the processing load values in ascending or descending order, the encoding processing means that matches the performance of the CPU used in the apparatus is quickly selected, and the encoding process is performed by that encoding processing means. The content of the encoding process is the same as that described in the first embodiment and is therefore not repeated here.

【００８５】このように本実施の形態によれば、符号化
処理前の初期化時に、サンプルデータを用いて各符号化
処理手段を動作させ、そのときの負荷値を取得して、使
用するＣＰＵの処理能力に適した符号化処理手段を選択
して用いるようにしたので、ＣＰＵの負荷が減少して、
最適な符号化処理を行うことができるようになる。As described above, according to this embodiment, each encoding processing means is operated on sample data at initialization before the encoding process, the load value at that time is acquired, and the encoding processing means suited to the processing capacity of the CPU in use is selected and used. The load on the CPU is thereby reduced, and an optimum encoding process can be performed.
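The initialization step of Embodiment 5 (run each encoding processing means on sample data, record its load, and output the loads in order) might look like the sketch below. The names `measure_loads` and `select_encoder`, the use of wall-clock time as the load value, and the rule "pick the heaviest encoder that still fits the budget" are assumptions made for illustration; the specification only states that the load values are output in ascending or descending order so that a suitable means can be selected quickly.

```python
import bisect
import time
from typing import Callable, Sequence


def measure_loads(encoders: dict[str, Callable[[Sequence[float]], object]],
                  sample: Sequence[float]) -> list[tuple[str, float]]:
    """Run each encoder once on the sample data.

    Returns (name, seconds) pairs sorted in ascending order of load.
    """
    loads = []
    for name, encode in encoders.items():
        start = time.perf_counter()
        encode(sample)
        loads.append((name, time.perf_counter() - start))
    return sorted(loads, key=lambda item: item[1])


def select_encoder(sorted_loads: list[tuple[str, float]], budget: float) -> str:
    """Pick the heaviest encoder whose load fits the budget.

    Because the list is already in ascending order, a binary search is
    enough. If nothing fits, fall back to the lightest encoder (assumed policy).
    """
    if not sorted_loads:
        raise ValueError("no encoders were measured")
    costs = [load for _, load in sorted_loads]
    index = bisect.bisect_right(costs, budget) - 1
    return sorted_loads[max(index, 0)][0]
```

With loads of 0.1, 0.3, and 0.9 for means C, B, and A and a budget of 0.5, `select_encoder` returns B; the ordering is what makes the lookup cheap.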

【００８６】なお、以上の各実施の形態では、音声符号
化装置として、ＰＣを用いて実現する構成を例に挙げて
説明したが、例えばＶＴＲカメラやＤＶＤエンコーダな
どの機器に組み込んで用いるような場合にも適用するこ
とができる。In each of the above embodiments, a configuration in which the speech coding apparatus is realized on a PC has been described as an example, but the invention can also be applied when it is built into equipment such as a VTR camera or a DVD encoder.

【００８７】また、上記各実施の形態では、音声のみを
取り扱うようにしたが、音声とともに映像を処理する場
合には、図１７に示すように、図２の構成において、音
声信号とは別に映像信号を入力し、低帯域符号化処理手
段と高帯域符号化処理手段に代えて、映像符号化処理手
段１７０と音声符号化処理手段１７１を設け、さらに、
ビットストリーム形成処理手段に代えてシステムストリ
ーム処理手段１７２を設けた構成とすることにより、対
応することができる。以上のような構成を用いて、外部
からの制御情報に基づいて、上記各実施の形態で説明し
たような方法で、音声符号化の演算量を変更したり、演
算量の異なる複数の音声符号化方式を切り替えたりする
ことによって、ＣＰＵとしての全体の演算量を制御する
ことが可能となる。また、あるいは、符号化すべき音声
信号の量に応じて、符号化する映像信号の処理量を変化
させるように構成してもよい。In each of the above embodiments, only audio is handled. When video is processed together with audio, this can be accommodated, as shown in FIG. 17, by modifying the configuration of FIG. 2 so that a video signal is input separately from the audio signal, video encoding processing means 170 and audio encoding processing means 171 are provided in place of the low-band encoding processing means and the high-band encoding processing means, and system stream processing means 172 is provided in place of the bit stream forming processing means. With this configuration, the total amount of computation on the CPU can be controlled, based on external control information, by changing the amount of computation of the audio encoding or by switching among a plurality of audio coding schemes with different amounts of computation, using the methods described in the above embodiments. Alternatively, the processing amount of the video signal to be encoded may be changed according to the amount of the audio signal to be encoded.

【００８８】さらに、サブバンド方式のコーディングを
行うＭＰＥＧ１以外に、時間／周波数変換を行うＭＰＥ
Ｇ２，ＡＡＣ，ＤｏｌｂｙＡＣ−３、ＡＴＲＡＣ（Ｍ
Ｄ）などのコーディング方式を行う場合についても、符
号化処理に関わる各手段を、図１８に示すように、演算
量の異なる第１の量子情報算出手段１８１と第2の量子
化情報算出手段１８２に置換するとともに、これらを量
子化手段制御手段１８０によって選択して使用し、符号
化情報の代わりに量子化情報を取り扱う構成とすること
で、同様に対応することが可能である。Further, besides MPEG1, which performs subband coding, coding schemes that perform time/frequency conversion, such as MPEG2, AAC, Dolby AC-3, and ATRAC (MD), can be handled in the same way. As shown in FIG. 18, the means involved in the encoding process are replaced with first quantization information calculating means 181 and second quantization information calculating means 182, which differ in amount of computation; these are selected and used by quantization means control means 180; and quantization information is handled instead of encoding information.

【００８９】[0089]

【発明の効果】以上のように、この発明の請求項１にか
かる音声符号化方式によれば、デジタルオーディオ信号
を複数の周波数帯域に分割し、各帯域毎に符号化を行う
音声符号化方式であって、上記分割された各帯域に対す
るビット割り当て情報を生成し、それぞれ処理量の異な
るビット割り当て手段を複数有し、外部からの制御情報
に基づいて、上記複数のビット割り当て手段の中から、
所定のビット割り当て手段を用いて処理がなされるよう
に、使用するビット割り当て手段を切り替えてビット割
り当てを実行して、符号化を行い、上記外部からの制御
情報として、符号化を行う時に占有可能な中央演算処理
装置の処理量を表す負荷値を用い、上記負荷値に基づい
て、上記中央演算処理装置上で各ビット割り当て手段を
用いて符号化を行った時の各処理量を予め記憶したデー
タテーブルを参照して、上記占有可能な中央演算処理装
置の処理量を超えないようビット割り当て手段の選択を
行うようにしたので、常に最適な処理量のビット割り当
て手段を選択して使用することができ、稼動状態におい
て占有できるＣＰＵの処理量を超えないような符号化が
可能となり、リアルタイムエンコード時に処理が入力信
号に対して間に合わないということがない、つまり再生
音に音切れがない符号化を行うことができるという効果
がある。さらに、中央演算処理装置は常に稼動能力を超
えるような要求を受けることが無くなり、システム全体
の制御をスムーズに行うことができるという効果もあ
る。As described above, the speech coding system according to claim 1 of the present invention divides a digital audio signal into a plurality of frequency bands and encodes each band. It has a plurality of bit allocation means, each of which generates bit allocation information for the divided bands and each of which has a different processing amount. Based on external control information, the bit allocation means to be used is switched so that processing is performed by a predetermined one of the plurality of bit allocation means, bit allocation is executed, and encoding is performed. As the external control information, a load value representing the processing amount of the central processing unit that can be occupied during encoding is used. Based on this load value, and with reference to a data table that stores in advance the processing amount required when encoding is performed on the central processing unit with each bit allocation means, the bit allocation means is selected so that the occupiable processing amount of the central processing unit is not exceeded. Consequently, the bit allocation means with the optimum processing amount can always be selected and used, and encoding can be performed without exceeding the CPU processing amount that can be occupied in the operating state. During real-time encoding, processing never fails to keep up with the input signal; that is, encoding can be performed without sound breaks in the reproduced sound. Furthermore, the central processing unit never receives requests that exceed its operating capacity, so the entire system can be controlled smoothly.
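The table-driven selection described for claim 1 can be illustrated with a small sketch. The table contents, the names `COST_TABLE` and `choose_allocator`, the expression of load as a fraction of CPU capacity, and the fallback to the cheapest means when nothing fits are all assumptions made for this example.

```python
# Hypothetical pre-stored data table: processing amount of each bit allocation
# means on this CPU, expressed as a fraction of total CPU capacity.
COST_TABLE = {
    "psychoacoustic": 0.40,        # high-efficiency allocation (assumed cost)
    "band_output_adaptive": 0.10,  # low-load allocation (assumed cost)
}


def choose_allocator(available_load: float,
                     table: dict[str, float] = COST_TABLE) -> str:
    """Return the most expensive allocator that does not exceed the load value.

    If no entry fits, the cheapest one is used as a fallback (assumed policy).
    """
    fitting = {name: cost for name, cost in table.items()
               if cost <= available_load}
    if fitting:
        return max(fitting, key=fitting.get)
    return min(table, key=table.get)


# Load value reported for five consecutive frames (made-up numbers).
per_frame_load = [0.6, 0.5, 0.2, 0.05, 0.45]
choices = [choose_allocator(load) for load in per_frame_load]
print(choices)
```

Evaluating the table once per frame lets the choice follow changes in the occupiable CPU load without ever requesting more than is available, except in the fallback case.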

【００９０】また、本発明の請求項２にかかる音声符号
化方式によれば、上記請求項１記載の行う音声符号化方
式において、上記負荷値として、符号化処理を行うため
に占有可能な上記中央演算処理装置の処理量を監視する
監視手段からの処理量制御情報を用いるとしたので、占
有可能な中央演算処理装置の最高パフォーマンスの範囲
内で、最適な処理量のビット割り当て手段を選択するこ
とができ、リアルタイムエンコード時に処理が入力信号
に対して間に合わないということがない、つまり再生音
に音切れがない符号化を行うことができるという効果が
ある。According to the speech coding system of claim 2 of the present invention, in the speech coding system of claim 1, processing amount control information from monitoring means, which monitors the processing amount of the central processing unit that can be occupied for the encoding process, is used as the load value. The bit allocation means with the optimum processing amount can therefore be selected within the maximum performance of the central processing unit that can be occupied, and during real-time encoding the processing never fails to keep up with the input signal; that is, encoding can be performed without sound breaks in the reproduced sound.

【００９１】また、本発明の請求項３にかかる音声符号
化方式によれば、上記請求項１記載の音声符号化方式に
おいて、上記ビット割り当て手段によるビット割り当て
処理として、符号化データの高音質化を実現可能な高効
率にビット割り当てを行う高効率ビット割り当て方法を
用いた処理と、該高効率ビット割り当て方法を用いた処
理と比較して処理量の少ない低負荷でビット割り当てを
行う低負荷ビット割り当て方法を用いた処理を行うよう
にしたので、符号化器が符号化データの高音質化を優先
するか、又は音質よりも符号化処理の低負荷性を優先す
るかの処理を適宜切り替えて実行することができる符号
化を実現することができるという効果がある。According to the speech coding system of claim 3 of the present invention, in the speech coding system of claim 1, the bit allocation processing by the bit allocation means comprises processing using a high-efficiency bit allocation method, which allocates bits efficiently so that high sound quality of the encoded data can be achieved, and processing using a low-load bit allocation method, which allocates bits at a lower load with a smaller processing amount than the high-efficiency method. The encoder can therefore switch as appropriate between giving priority to high sound quality of the encoded data and giving priority to a low encoding load over sound quality.

【００９２】また、本発明の請求項４にかかる音声符号
化方式によれば、上記請求項１記載の音声符号化方式に
おいて、上記符号化時に使用されるビット割り当て手段
の切り替えを、オーディオ信号に復号可能な最小単位で
あるフレーム単位で行うようにしたので、リアルタイム
エンコード時に、動作ＣＰＵ上で該ＣＰＵを共有する他
のアプリケーションなどのＣＰＵ占有率が突然大きくな
った場合などにおいても、フレーム単位時間で符号化処
理が占有できるＣＰＵの処理量に追従可能となり、ま
た、リアルタイムで音質や処理量を制御可能とすること
ができるという効果がある。According to the speech coding system of claim 4 of the present invention, in the speech coding system of claim 1, the bit allocation means used during encoding is switched in units of frames, a frame being the minimum unit that can be decoded into an audio signal. Therefore, even if the CPU occupancy of another application sharing the CPU suddenly increases during real-time encoding, the encoding process can follow, within one frame period, the CPU processing amount it is able to occupy, and sound quality and processing amount can be controlled in real time.

【００９３】また、本発明の請求項５にかかる音声符号
化方式によれば、上記請求項１記載の音声符号化方式に
おいて、複数の周波数帯域に分割された各帯域のサブバ
ンド信号を、各々予め定められた所定個数のサブバンド
信号からなるグループとなるようにグループ分けを行
い、各グループに対して独立したビット割り当て処理を
行い、各帯域に対するビット割り当て情報を生成するよ
うにしたので、各帯域毎の特性に応じたビット割り当て
処理を選択して符号化を行うことができるという効果が
ある。According to the speech coding system of claim 5 of the present invention, in the speech coding system of claim 1, the subband signals of the bands obtained by dividing into a plurality of frequency bands are grouped so that each group consists of a predetermined number of subband signals, independent bit allocation processing is performed for each group, and bit allocation information for each band is generated. Bit allocation processing suited to the characteristics of each band can therefore be selected for encoding.

【００９４】また、本発明の請求項６にかかる音声符号
化方式によれば、上記請求項５記載の音声符号化方式に
おいて、上記グループ分けが、グループの数、又はグル
ープ内の周波数軸方向に連続したサブバンド信号の数
を、上記外部からの制御情報により指定された数、又は
上記監視手段からの処理量制御情報に基づいて指定され
た数となるように可変的に行われるようにしたので、Ｃ
ＰＵの使用状況に応じてダイナミックにグループ分けを
行うことができるという効果がある。According to the speech coding system of claim 6 of the present invention, in the speech coding system of claim 5, the grouping is performed variably so that the number of groups, or the number of subband signals contiguous along the frequency axis within a group, equals the number specified by the external control information or the number specified based on the processing amount control information from the monitoring means. Grouping can therefore be performed dynamically according to the CPU usage.

【００９５】また、本発明の請求項７にかかる音声符号
化方式によれば、上記請求項６記載の音声符号化方式に
おいて、上記サブバンド信号の数の変更を、オーディオ
信号に復号可能な最小単位であるフレーム単位で行うよ
うにしたので、ビット割り当て方式の変更をきめ細かく
行うことができ、高精度な符号化器を実現することがで
きるという効果がある。According to the speech coding system of claim 7 of the present invention, in the speech coding system of claim 6, the number of subband signals is changed in units of frames, a frame being the minimum unit that can be decoded into an audio signal. The bit allocation method can therefore be changed at a fine granularity, and a highly accurate encoder can be realized.

【００９６】また、本発明の請求項８にかかる音声符号
化方式によれば、請求項７記載の音声符号化方式におい
て、上記グループ分け時に、ビット割り当てを行わない
グループを少なくとも１つ設けるようにしたので、オー
ディオ信号に復号可能な最小単位であるフレーム単位
で、グループの数、またはグループ内の周波数軸方向に
連続したサブバンド信号の数を、外部からの制御情報に
より指定された数、または監視手段からの処理量制御情
報により指定された数に変えることで、ビット割り当て
が行われないグループに属する帯域の信号を符号化処理
する必要がなくなり、また、ビット割り当てが行われな
いグループに属する帯域に割り当てられるべきビット
を、ビット割り当てが行われる他のグループの帯域に分
配することができ、その結果、符号化処理が占有するＣ
ＰＵの処理量を制御可能となるとともに、ビット割り当
てが行われる他のグループの帯域の音質を向上すること
ができるという効果がある。According to the speech coding system of claim 8 of the present invention, in the speech coding system of claim 7, at least one group to which no bits are allocated is provided at the time of grouping. By changing, in units of frames (the minimum unit that can be decoded into an audio signal), the number of groups or the number of subband signals contiguous along the frequency axis within a group to the number specified by the external control information or by the processing amount control information from the monitoring means, the signals of bands belonging to the group without bit allocation no longer need to be encoded, and the bits that would have been allocated to those bands can be distributed to the bands of the other groups, to which bits are allocated. As a result, the CPU processing amount occupied by the encoding process can be controlled, and the sound quality of the bands in the other groups can be improved.

【００９７】また、本発明の請求項９にかかる音声符号
化方式によれば、上記請求項５記載の音声符号化方式に
おいて、上記サブバンド信号のグループ分けにより、低
帯域に属するグループにグループ分けされたサブバンド
信号に対し、上記高効率ビット割り当て方法を用いた処
理により、グループ内のサブバンド信号にビット割り当
てを行い、一方、高帯域に属するグループにグループ分
けされたサブバンド信号に対し、上記低負荷ビット割り
当て方法を用いた処理により、グループ内のサブバンド
信号にビット割り当てを行うようにしたので、人間の耳
に対して感度のよい低帯域については符号化データの高
音質化を図ることができ、一方、人の耳に対して感度の
悪い高帯域については処理量優先の低負荷ビット割り当
てを行うことができるようになり、全体として処理量を
削減した符号化を行うことができるという効果がある。According to the speech coding system of claim 9 of the present invention, in the speech coding system of claim 5, bits are allocated to the subband signals in groups belonging to the low band by processing using the high-efficiency bit allocation method, while bits are allocated to the subband signals in groups belonging to the high band by processing using the low-load bit allocation method. High sound quality of the encoded data is thus obtained in the low band, to which the human ear is highly sensitive, while low-load bit allocation that gives priority to processing amount is applied to the high band, to which the human ear is less sensitive, so that encoding with a reduced overall processing amount can be performed.

【００９８】また、本発明の請求項１０にかかる音声符
号化方式によれば、上記請求項５記載の音声符号化方式
において、各グループ毎に独立したビット割り当て手段
に対する割り当て可能ビット数を決定する割り当て可能
ビット演算手段を設け、各グループのグループ全体に対
する割合に各グループに属する各帯域毎の特性に基づい
た重み付けを加味したものを用いて、グループ全体に対
する割り当て可能ビット数を、各グループ毎に独立した
ビット割り当て手段に対し振り分けるようにしたので、
入力信号、又は各帯域の特性に対して、聴覚特性を考慮
した符号化データの高音質を実現するのに最適な各グル
ープのビット割り当て手段に対するビット配分が可能な
符号化を行うことができるという効果がある。According to the speech coding system of claim 10 of the present invention, in the speech coding system of claim 5, allocatable bit calculating means is provided for determining the number of allocatable bits for the independent bit allocation means of each group. The number of bits allocatable to all groups together is distributed among the independent bit allocation means of the groups using each group's share of the whole, adjusted by a weighting based on the characteristics of the bands belonging to that group. Bits can therefore be apportioned among the bit allocation means of the groups in the way best suited to achieving high sound quality of the encoded data, taking auditory characteristics into account for the input signal or for the characteristics of each band.
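One way to read the distribution rule of claim 10 is that each group receives a share of the total allocatable bits proportional to its number of bands multiplied by a per-group weight. The sketch below follows that reading. The function name `split_bits`, the use of band count as the "share of the whole", and the largest-remainder rounding are assumptions; the specification does not give a formula here.

```python
def split_bits(total_bits: int, band_counts: list[int],
               weights: list[float]) -> list[int]:
    """Share total_bits among groups in proportion to band_count * weight."""
    if not band_counts or len(band_counts) != len(weights):
        raise ValueError("need one weight per group")
    scores = [count * weight for count, weight in zip(band_counts, weights)]
    total_score = sum(scores)
    if total_score <= 0:
        raise ValueError("weights must give a positive total")
    exact = [total_bits * score / total_score for score in scores]
    bits = [int(value) for value in exact]
    # Hand out the bits lost to rounding, largest fractional part first,
    # so that the shares always sum to total_bits.
    leftover = total_bits - sum(bits)
    order = sorted(range(len(bits)), key=lambda i: exact[i] - bits[i],
                   reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return bits


# Two groups of 16 subbands each; the low-band group is weighted 3:1 (assumed).
print(split_bits(1000, [16, 16], [3.0, 1.0]))
```

The weights could be derived from the minimum audible limit, the subband signal level, or the spectral signal level of each group, which are the three options named in claims 11 to 13.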

【００９９】また、本発明の請求項１１にかかる音声符
号化方式によれば、上記請求項１０記載の音声符号化方
式において、各グループに属する各帯域毎の特性に基づ
いた重み付けを、各帯域毎の所定の最小可聴限界値に基
づいた重み付けとすることにより、上記人間が聴く際に
意味の有る効果的なビット割り当て処理を行うことがで
きるという効果がある。According to the speech coding system of claim 11 of the present invention, in the speech coding system of claim 10, the weighting based on the characteristics of the bands belonging to each group is a weighting based on a predetermined minimum audible limit value for each band, so that bit allocation that is meaningful and effective for human listening can be performed.

【０１００】また、本発明の請求項１２にかかる音声符
号化方式によれば、上記請求項１０記載の音声符号化方
式において、各グループに属する各帯域毎の特性に基づ
いた重み付けを、入力デジタルオーディオ信号にサブバ
ンド分析を施して得られる、各グループに属する各周波
数帯域のサブバンド信号レベルに基づいた重み付けとす
ることにより、効果的なビット割り当て処理を行うこと
ができるという効果がある。According to the speech coding system of claim 12 of the present invention, in the speech coding system of claim 10, the weighting based on the characteristics of the bands belonging to each group is a weighting based on the subband signal level of each frequency band belonging to each group, obtained by applying subband analysis to the input digital audio signal, so that effective bit allocation can be performed.

【０１０１】また、本発明の請求項１３にかかる音声符
号化方式は、上記請求項１０記載の音声符号化方式にお
いて、各グループに属する各帯域毎の特性に基づいた重
み付けを、入力デジタルオーディオ信号に線形変換を施
して得られる、各グループに属するスペクトル信号レベ
ルに基づいた重み付けとすることにより、効果的なビッ
ト割り当て処理を行うことができるという効果がある。According to the speech coding system of claim 13 of the present invention, in the speech coding system of claim 10, the weighting based on the characteristics of the bands belonging to each group is a weighting based on the spectral signal level belonging to each group, obtained by applying a linear transform to the input digital audio signal, so that effective bit allocation can be performed.

【０１０２】また、本発明の請求項１４にかかる音声符
号化方式によれば、上記請求項５記載の音声符号化方式
において、各グループに属する信号レベルが、所定のし
きい値以上の高レベルな信号に対しては、上記高効率ビ
ット割り当て方法を用いた処理により、ビット割り当て
を行い、各グループに属する信号レベルが、所定のしき
い値以下の低レベルな信号に対しては、上記低負荷ビッ
ト割り当て方法を用いた処理により、ビット割り当てを
行うようにしたので、他の帯域に比べ符号化データとし
てそれほど重要でない信号に対し、処理負荷を割くこと
なく、符号化データの高音質化を図ることができる符号
化を行うことができるという効果がある。According to the speech coding system of claim 14 of the present invention, in the speech coding system of claim 5, bits are allocated by processing using the high-efficiency bit allocation method for high-level signals whose signal level, among those belonging to each group, is at or above a predetermined threshold, and by processing using the low-load bit allocation method for low-level signals whose signal level is at or below the predetermined threshold. High sound quality of the encoded data can therefore be achieved without spending processing load on signals that are less important as encoded data than those of other bands.
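The threshold rule of claim 14 can be sketched as a per-group decision. Using the strongest band level as the representative level of a group, and sending a level exactly equal to the threshold to the high-efficiency method, are assumptions made here; the names `pick_method` and `plan_groups` and the example levels are likewise hypothetical.

```python
def pick_method(level_db: float, threshold_db: float) -> str:
    """High-efficiency allocation for strong signals, low-load for weak ones."""
    return "high_efficiency" if level_db >= threshold_db else "low_load"


def plan_groups(group_levels_db: dict[str, list[float]],
                threshold_db: float) -> dict[str, str]:
    """Choose a method per group from its strongest band level (assumed rule)."""
    return {name: pick_method(max(levels), threshold_db)
            for name, levels in group_levels_db.items()}


plan = plan_groups(
    {"low": [62.0, 55.0], "mid": [48.0, 51.0], "high": [20.0, 12.0]},
    threshold_db=50.0,
)
print(plan)
```

The level fed into the comparison may be a subband signal level, a spectral signal level, or a minimum audible limit value, as claims 15 to 17 describe.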

【０１０３】また、本発明の請求項１５にかかる音声符
号化方式によれば、上記請求項１４記載の音声符号化方
式において、上記各グループに属する信号レベルを、入
力デジタルオーディオ信号にサブバンド分析を施して得
られるサブバンド信号レベルとすることにより、効果的
なビット割り当て処理を行うことができるという効果が
ある。According to the speech coding system of claim 15 of the present invention, in the speech coding system of claim 14, the signal level belonging to each group is the subband signal level obtained by applying subband analysis to the input digital audio signal, so that effective bit allocation can be performed.

【０１０４】また、本発明の請求項１６にかかる音声符
号化方式によれば、上記請求項１４記載の音声符号化方
式において、上記各グループに属する信号レベルを、入
力デジタルオーディオ信号に線形変換を施して得られる
スペクトル信号レベルとしたので、効果的なビット割り
当て処理を行うことができるという効果がある。According to the speech coding system of claim 16 of the present invention, in the speech coding system of claim 14, the signal level belonging to each group is the spectral signal level obtained by applying a linear transform to the input digital audio signal, so that effective bit allocation can be performed.

【０１０５】また、本発明の請求項１７にかかる音声符
号化方式によれば、上記請求項１４記載の音声符号化方
式において、上記各グループに属する信号レベルを、所
定の各帯域毎の最小可聴限界値としたので、上記人間が
聴く際に意味の有る効果的なビット割り当て処理を行う
ことができるという効果がある。According to the speech coding system of claim 17 of the present invention, in the speech coding system of claim 14, the signal level belonging to each group is a predetermined minimum audible limit value for each band, so that bit allocation that is meaningful and effective for human listening can be performed.

【０１０６】また、本発明の請求項１８にかかる音声符
号化方式によれば、上記請求項３,９, １４のいずれか
に記載の音声符号化方式において、上記高効率ビット割
り当て方法を用いた処理は、所定の聴覚心理モデルに基
づく信号対マスク比値の関係を使用して行われ、上記低
負荷ビット割り当て方法を用いた処理は、複数の周波数
帯域に分割された信号レベルに各帯域毎の所定の最小可
聴限界値を加味して行われるものとしたので、全体とし
て人間の耳で聞いた限りは音質を損なうことなく、シス
テムの処理量を軽減することができるという効果があ
る。According to the speech coding system of claim 18 of the present invention, in the speech coding system of any one of claims 3, 9, and 14, the processing using the high-efficiency bit allocation method is performed using the signal-to-mask ratio relationship based on a predetermined psychoacoustic model, and the processing using the low-load bit allocation method is performed by taking the signal levels of the divided frequency bands and factoring in a predetermined minimum audible limit value for each band. The processing amount of the system can therefore be reduced without any loss of sound quality as perceived by the human ear.
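A low-load allocation that uses only the band signal levels and the minimum audible limit, with no psychoacoustic model, could be sketched as a greedy loop. This is a common textbook scheme chosen for illustration and is not necessarily the procedure of the band output adaptive bit allocation means 109. The name `allocate_low_load`, the budget counted in quantizer bits, the cap of 15 bits per band, and the rule of thumb of about 6.02 dB of noise reduction per added bit are assumptions of this sketch.

```python
def allocate_low_load(levels_db: list[float], thresholds_db: list[float],
                      bit_budget: int, max_bits: int = 15,
                      db_per_bit: float = 6.02) -> list[int]:
    """Greedy allocation from signal level minus minimum audible limit.

    Each step gives one quantizer bit to the band with the largest remaining
    audible margin. Bands at or below their limit receive no bits.
    """
    if len(levels_db) != len(thresholds_db):
        raise ValueError("one threshold per band is required")
    margin = [lvl - thr for lvl, thr in zip(levels_db, thresholds_db)]
    bits = [0] * len(margin)
    for _ in range(bit_budget):
        candidates = [i for i, m in enumerate(margin)
                      if m > 0 and bits[i] < max_bits]
        if not candidates:
            break  # every remaining band is already below its limit
        best = max(candidates, key=lambda i: margin[i])
        bits[best] += 1
        margin[best] -= db_per_bit
    return bits


# Three bands: loud, barely audible, and below the audible limit (made up).
print(allocate_low_load([60.0, 30.0, 10.0], [20.0, 25.0, 15.0], bit_budget=8))
```

The loop needs only a subtraction and a maximum search per bit, which is why a scheme of this kind is far cheaper than evaluating a signal-to-mask ratio from a full psychoacoustic model.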

【０１０７】また、本発明の請求項１９にかかる音声符
号化方式は、上記請求項１８記載の音声符号化方式にお
いて、上記聴覚心理モデルがＭＰＥＧによって指定され
た聴覚心理モデルとすることにより、ＭＰＥＧ（Motion
Picture Experts Group）を用いた音声符号化処理にお
いても上記同様の効果を得ることができるという効果が
ある。According to the speech coding system of claim 19 of the present invention, in the speech coding system of claim 18, the psychoacoustic model is a psychoacoustic model specified by MPEG (Motion Picture Experts Group), so that the same effects as above can also be obtained in audio encoding using MPEG.

【０１０８】また、本発明の請求項２０にかかる音声符
号化方式は、上記請求項４または請求項７記載の音声符
号化方式において、上記オーディオ信号に復号可能な最
小単位であるフレームが、ＭＰＥＧ（Motion Picture E
xperts Group）によって指定されたフレームとすること
により、ＭＰＥＧを用いた音声符号化処理においても上
記同様の効果を得ることができるという効果がある。According to the speech coding system of claim 20 of the present invention, in the speech coding system of claim 4 or claim 7, the frame, which is the minimum unit that can be decoded into an audio signal, is a frame specified by MPEG (Motion Picture Experts Group), so that the same effects as above can also be obtained in audio encoding using MPEG.

【０１０９】また、本発明の請求項２１にかかる音声符
号化方式によれば、上記請求項１記載の音声符号化方式
において、上記ビット割り当て手段は、分割化された各
帯域に対し、所定の聴覚心理モデルから出力される情報
に基づいてビット割り当て情報を生成するものであり、
N(N＝１，２，３...)フレームに１度、上記所定の聴覚
心理モデルから出力される情報に基づいてビット割り当
て情報を生成し、上記ビット割り当て情報を生成しなか
ったフレームに対しては、上記聴覚心理モデルから出力
された情報と上記分割された各帯域の信号情報に基づい
てビット割り当て情報を生成し、符号化を行なうように
したので、時間軸方向でのＣＰＵ負荷を低減することが
できるという効果がある。According to the speech coding system of claim 21 of the present invention, in the speech coding system of claim 1, the bit allocation means generates bit allocation information for the divided bands based on information output from a predetermined psychoacoustic model. Once every N frames (N = 1, 2, 3, ...), bit allocation information is generated based on the information output from the psychoacoustic model; for the frames in which it is not generated in that way, bit allocation information is generated based on the information already output from the psychoacoustic model and the signal information of the divided bands, and encoding is performed. The CPU load along the time axis can therefore be reduced.
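The "once every N frames" scheme of claim 21 can be sketched as a generator that runs an expensive model only on every N-th frame and reuses its last output in between. Here the model output is taken to be a per-band masking threshold in dB, and the in-between frames combine that cached threshold with the current band levels to form a signal-to-mask ratio. Those choices, the name `signal_to_mask`, and the stand-in model are assumptions for illustration.

```python
from typing import Callable, Iterable, Iterator


def signal_to_mask(frames: Iterable[list[float]],
                   mask_model: Callable[[list[float]], list[float]],
                   period: int) -> Iterator[list[float]]:
    """Yield one per-band signal-to-mask ratio list (dB) per frame.

    `mask_model` is evaluated only on every `period`-th frame; other frames
    reuse the cached thresholds together with their own band levels.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    cached_mask: list[float] = []
    for index, levels in enumerate(frames):
        if index % period == 0:
            cached_mask = mask_model(levels)  # expensive step
        yield [lvl - mask for lvl, mask in zip(levels, cached_mask)]


def toy_model(levels: list[float]) -> list[float]:
    """Stand-in for a real psychoacoustic model: mask 10 dB below the level."""
    return [lvl - 10.0 for lvl in levels]


frames = [[50.0, 40.0], [52.0, 38.0], [48.0, 41.0]]
for smr in signal_to_mask(frames, toy_model, period=2):
    print(smr)
```

With `period=1` every frame runs the model; larger periods trade some accuracy for a proportionally smaller model cost per frame.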

【０１１０】また、本発明の請求項２２にかかる音声符
号化方式によれば、上記請求項１記載の音声符号化方式
において、段階的に処理量の制御可能な聴覚心理モデル
を有し、外部からの制御情報に基づいて、上記聴覚心理
モデルの処理量制御を行ない、所定の処理量の聴覚心理
モデルを用いて処理がなされるように、各帯域に対する
ビット割り当て情報を生成するようにしたので、聴覚的
な効果を加味したＣＰＵ負荷制御を行うことができると
いう効果がある。According to the speech coding system of claim 22 of the present invention, in the speech coding system of claim 1, a psychoacoustic model whose processing amount can be controlled in stages is provided, the processing amount of the psychoacoustic model is controlled based on external control information, and bit allocation information for each band is generated so that processing is performed using the psychoacoustic model at a predetermined processing amount. CPU load control that takes auditory effects into account can therefore be performed.

【０１１１】また、本発明の請求項２３にかかる音声符
号化方式によれば、上記請求項１記載の音声符号化方式
において、それぞれ処理量の異なる聴覚心理モデルを複
数有し、外部からの制御情報に基づいて、上記複数の聴
覚心理モデルの中から、所定の聴覚心理モデルを用いて
処理が成されるように、使用する聴覚心理モデルを切り
替えて、各帯域に対するビット割り当て情報を生成する
ようにしたので、より簡単に聴覚的な効果を加味したＣ
ＰＵ負荷制御を行うことができるという効果がある。According to the speech coding system of claim 23 of the present invention, in the speech coding system of claim 1, a plurality of psychoacoustic models with different processing amounts are provided, and, based on external control information, the psychoacoustic model to be used is switched so that processing is performed using a predetermined one of them, and bit allocation information for each band is generated. CPU load control that takes auditory effects into account can therefore be performed more simply.

【０１１２】また、本発明の請求項２４にかかる音声符
号化方式によれば、上記請求項１記載の音声符号化方式
において、符号化処理が実行される中央演算処理装置の
性能に応じて、符号化処理動作前の初期化時に複数のビ
ット割り当て手段、または複数の聴覚心理モデルの各処
理負荷値情報を外部へ出力するようにしたので、実際に
符号化を行う前に、使用されるＣＰＵの性能に関する情
報を取得することができ、ＣＰＵの処理負荷を効率的に
低減することができるという効果がある。According to the speech coding system of claim 24 of the present invention, in the speech coding system of claim 1, at initialization before the encoding operation, processing load value information for each of the plurality of bit allocation means, or each of the plurality of psychoacoustic models, is output to the outside in accordance with the performance of the central processing unit on which the encoding process is executed. Information on the performance of the CPU in use can therefore be obtained before encoding actually begins, and the processing load of the CPU can be reduced efficiently.

【０１１３】また、本発明の請求項２５にかかる音声符
号化方式によれば、上記請求項２４記載の音声符号化方
式において、外部への情報として出力される複数のビッ
ト割り当て手段、または複数の聴覚心理モデルの各処理
負荷値情報が、降順、あるいは昇順で出力されるように
したので、符号化処理手段の選択を迅速に行うことがで
きるという効果がある。According to the speech coding system of claim 25 of the present invention, in the speech coding system of claim 24, the processing load value information for each of the plurality of bit allocation means or psychoacoustic models, which is output as information to the outside, is output in descending or ascending order, so that the encoding processing means can be selected quickly.

【０１１４】また、本発明の請求項２６にかかる音声符
号化装置によれば、上記請求項１ないし請求項２５のい
ずれかに記載の音声符号化方式を用いて音声符号化を行
うようにしたので、該音声符号化方式を組み込んだＶＴ
Ｒカメラなどの機器においても上記同様の効果を得るこ
とができるという効果がある。According to the speech coding apparatus of claim 26 of the present invention, speech encoding is performed using the speech coding system of any one of claims 1 to 25, so that the same effects as above can also be obtained in equipment, such as a VTR camera, that incorporates the speech coding system.

【０１１５】また、本発明の請求項２７にかかるデータ
記憶媒体は、上記請求項１ないし請求項２５のいずれか
に記載の音声符号化方式のステップが記録されているの
で、該記憶媒体を用いて上記音声符号化方式を装置に組
み込むことにより、上記同様の効果を得ることができる
という効果がある。The data storage medium according to claim 27 of the present invention has recorded on it the steps of the speech coding system of any one of claims 1 to 25, so that the same effects as above can be obtained by using the storage medium to install the speech coding system in an apparatus.

【０１１６】[0116]

【０１１７】[0117]

【０１１８】[0118]

【０１１９】[0119]

【０１２０】[0120]

【０１２１】[0121]

【０１２２】[0122]

【０１２３】[0123]

[Brief description of drawings]

【図１】本発明の実施の形態１による音声符号化方式を
用いた音声符号化装置を実現するために、パーソナルコ
ンピュータを用いた全体的なシステムとしての構成を示
すブロック図である。FIG. 1 is a block diagram showing a configuration of an entire system using a personal computer in order to realize a voice encoding device using a voice encoding method according to a first embodiment of the present invention.

【図２】上記実施の形態１による音声符号化装置を構成
する符号化器の構成を示すブロック図である。FIG. 2 is a block diagram showing a configuration of an encoder that constitutes the speech encoding apparatus according to the first embodiment.

【図３】上記符号化器を構成する高帯域符号化処理手段
の詳細な構成を示すブロック図である。FIG. 3 is a block diagram showing a detailed configuration of high-band encoding processing means that constitutes the encoder.

【図４】上記実施の形態１による音声符号化装置を構成
する符号化器のより詳細な構成を示すブロック図であ
る。FIG. 4 is a block diagram showing a more detailed configuration of an encoder that constitutes the speech encoding apparatus according to the first embodiment.

【図５】上記実施の形態１による音声符号化方式で使用
する、各グループに対するビット割り当て処理の一例を
示した模式図である。FIG. 5 is a schematic diagram showing an example of a bit allocation process for each group, which is used in the speech coding method according to the first embodiment.

【図６】上記実施の形態１による音声符号化方式で使用
する、各グループに対するビット割り当て処理の他の一
例を示した模式図である。FIG. 6 is a schematic diagram showing another example of the bit allocation process for each group used in the speech coding system according to the first embodiment.

【図７】上記実施の形態１による音声符号化装置を構成
する符号化器の符号化動作を説明するためのフローを示
す図である。FIG. 7 is a diagram showing a flow for explaining an encoding operation of an encoder forming the speech encoding apparatus according to the first embodiment.

【図８】上記実施の形態１による音声符号化方式で使用
する、各グループに対するビット割り当て処理の、閾値
を用いて処理を行う例を示した模式図である。FIG. 8 is a schematic diagram showing an example of performing a bit allocation process for each group using a threshold, which is used in the speech coding method according to the first embodiment.

【図９】本発明の上記実施の形態１による音声符号化装
置を構成する符号化器の変形例の詳細な構成を示すブロ
ック図である。FIG. 9 is a block diagram showing the detailed configuration of a modified example of the encoder constituting the speech encoding apparatus according to the first embodiment of the present invention.

【図１０】本発明の実施の形態２によるデータ記憶媒体
及び該記憶媒体を用いて音声符号化装置を構成する場合
の構成を示すブロック図である。FIG. 10 is a block diagram showing a data storage medium according to a second embodiment of the present invention and a configuration when a speech encoding apparatus is configured using the storage medium.

【図１１】従来の音声符号化装置を構成する符号化器の
構成を示すブロック図である。FIG. 11 is a block diagram showing the configuration of an encoder constituting a conventional speech encoding apparatus.

【図１２】本発明の実施の形態３による音声符号化装置
を構成する低帯域符号化処理手段の詳細な構成を示す図
である。FIG. 12 is a diagram showing the detailed configuration of the low-band encoding processing means constituting a speech encoding apparatus according to a third embodiment of the present invention.

【図１３】上記実施の形態３による音声符号化装置によ
る低帯域符号化時の各フレームにおける聴覚心理モデル
の状態を説明するための図である。FIG. 13 is a diagram for explaining a state of a psychoacoustic model in each frame at the time of low-band encoding by the speech encoding apparatus according to the third embodiment.

【図１４】本発明の実施の形態４による音声符号化装置
を構成する低帯域符号化処理手段の詳細な構成を示す図
である。FIG. 14 is a diagram showing the detailed configuration of the low-band encoding processing means constituting a speech encoding apparatus according to a fourth embodiment of the present invention.

【図１５】上記実施の形態４による音声符号化装置を用
いたビット割り当て処理の一例を示す図である。FIG. 15 is a diagram showing an example of a bit allocation process using the speech coding apparatus according to the fourth embodiment.

【図１６】本発明の実施の形態５による音声符号化装置
を構成する符号化器の構成を示すブロック図である。FIG. 16 is a block diagram showing the configuration of the encoder constituting a speech encoding apparatus according to a fifth embodiment of the present invention.

【図１７】音声信号とともに映像信号を取り扱う場合の
符号化器の構成を示すブロック図である。FIG. 17 is a block diagram showing the configuration of the encoder when a video signal is handled together with an audio signal.

【図１８】時間／周波数変換方式のコーディングを行う
符号化処理装置における符号化処理において本発明を適
用した場合の構成を示すブロック図である。FIG. 18 is a block diagram showing the configuration when the present invention is applied to the encoding process of an encoding apparatus that performs time/frequency conversion coding.

[Explanation of symbols]

1 Personal computer (speech encoding apparatus)
11 HDD
12a PDD
12b FDD
13 Memory
14 CPU (central processing unit)
15 Data bus
16 Video capture card
17 Camera
18 Sound card
19 Microphone
20 Encoder
21 CPU load monitoring information
22 Encoding means control means
23 Low-band encoding processing means
24 High-band encoding processing means
25 Bitstream formation processing means
26 Encoding mode designation signal
101 Encoder
102 Subband analysis means
103 Scale factor extraction means
104 FFT means
105 Psychoacoustic analysis means
106 Quantization/encoding means
107 Auxiliary information encoding means
108 Bitstream forming means
109 Band output adaptive bit allocation means
110 Psychoacoustic model bit allocation means
111 Grouping means
112 Bit allocation processing control means
121 Processing amount control information
160 to 162 Encoding processing means A to C
163 Processing load value storage buffer
164 Sample data buffer
170 Video encoding processing means
171 Audio encoding processing means
172 System stream formation processing means
180 Quantization means control means
181 First quantization information calculation means
182 Second quantization information calculation means
FC Floppy disk case
FD Floppy disk
D Floppy disk body
Se Sector
Tr Track
Cs Computer system
FDD Floppy disk drive

Continuation of front page
(58) Fields searched (Int. Cl.⁷, DB name): H03M 7/30; G10L 19/00

Claims

(57) [Claims]

1. A speech coding method in which a digital audio signal is divided into a plurality of frequency bands and coding is performed for each band, wherein a plurality of bit allocation means are provided that generate bit allocation information for each of the divided bands and that differ from one another in processing amount; on the basis of external control information, the bit allocation means to be used is switched so that bit allocation is performed by a prescribed one of the plurality of bit allocation means, and coding is then performed; a load value representing the processing amount of a central processing unit that can be occupied when performing the coding is used as the external control information; and, on the basis of the load value, the bit allocation means is selected, by referring to a data table that stores in advance the processing amount on the central processing unit when coding is performed using each bit allocation means, so as not to exceed the occupiable processing amount of the central processing unit.

2. The speech coding method according to claim 1, wherein processing amount control information from monitoring means that monitors the processing amount of the central processing unit that can be occupied for the coding processing is used as the load value.

3. The speech coding method according to claim 1, wherein the bit allocation processing by the bit allocation means is performed using processing based on a high-efficiency bit allocation method, which allocates bits with high efficiency so that high-quality coded data can be obtained, and processing based on a low-load bit allocation method, which allocates bits with a lower processing load than the processing based on the high-efficiency bit allocation method.

4. The speech coding method according to claim 1, wherein the switching of the bit allocation means used in the coding is performed in units of frames, a frame being the minimum unit that can be decoded into an audio signal.

5. The speech coding method according to claim 1, wherein the subband signals of the respective bands obtained by the division into the plurality of frequency bands are divided into groups each consisting of a predetermined number of subband signals, and independent bit allocation processing is performed for each group to generate the bit allocation information for each band.

6. The speech coding method according to claim 5, wherein, in the grouping, the number of groups, or the number of subband signals contiguous in the frequency axis direction within a group, is specified, or is variable so as to become a number specified on the basis of the processing amount control information from the monitoring means.

7. The speech coding method according to claim 6, wherein the change in the number of subband signals is performed in units of frames, a frame being the minimum unit that can be decoded into an audio signal.

8. The speech coding method according to claim 7, wherein, at the time of grouping, at least one group to which no bits are allocated is provided.

9. The speech coding method according to claim 5, wherein, for subband signals grouped by the grouping into a group belonging to the low band, bits are allocated to the subband signals in the group by the processing using the high-efficiency bit allocation method, while for subband signals grouped into a group belonging to the high band, bits are allocated to the subband signals in the group by the processing using the low-load bit allocation method.

10. The speech coding method according to claim 5, wherein allocatable bit calculation means is provided that determines the number of allocatable bits to be apportioned to the bit allocation means that is independent for each group, and the number of allocatable bits is apportioned to the bit allocation means independent for each group using the ratio of each group to all of the groups, weighted on the basis of the characteristics of each band belonging to each group.

11. The speech coding method according to claim 10, wherein the weighting based on the characteristics of each band belonging to each group is weighting based on a predetermined minimum audible limit value for each band.

12. The speech coding method according to claim 10, wherein the weighting based on the characteristics of each band belonging to each group is weighting based on the subband signal level of each band belonging to each group, obtained by applying subband analysis to the input digital audio signal.

13. The speech coding method according to claim 10, wherein the weighting based on the characteristics of each band belonging to each group is weighting based on the spectrum signal level belonging to each group, obtained by applying a linear transform to the input digital audio signal.

14. The speech coding method according to claim 5, wherein, where the signal level belonging to a group is equal to or higher than a predetermined threshold, bits are allocated by the processing using the high-efficiency bit allocation method, and where the signal level belonging to a group is lower than the predetermined threshold, bits are allocated by the processing using the low-load bit allocation method.

15. The speech coding method according to claim 14, wherein the signal level belonging to each group is the subband signal level obtained by applying subband analysis to the input digital audio signal.

16. The speech coding method according to claim 14, wherein the signal level belonging to each group is the spectrum signal level obtained by applying a linear transform to the input digital audio signal.

17. The speech coding method according to claim 14, wherein the signal level belonging to each group is a predetermined minimum audible limit value for each band.

18. The speech coding method according to claim 3, wherein the processing using the high-efficiency bit allocation method is performed using the relationship of signal-to-mask ratio values based on a predetermined psychoacoustic model, and the processing using the low-load bit allocation method is performed by taking a predetermined minimum audible limit value for each band into account in the signal level of the signal divided into frequency bands.

19. The speech coding method according to claim 18, wherein the psychoacoustic model is a psychoacoustic model specified by MPEG (Motion Picture Experts Group).

20. The speech coding method according to claim 4 or 7, wherein the frame that is the minimum unit decodable into an audio signal is a frame specified by MPEG (Motion Picture Experts Group).

21. The speech coding method according to claim 1, wherein the bit allocation means generates the bit allocation information for each divided band on the basis of information output from a predetermined psychoacoustic model; bit allocation information based on the information output from the predetermined psychoacoustic model is generated once every N frames (N = 1, 2, 3, ...); and, for each frame in which that bit allocation information was not generated, bit allocation information is generated on the basis of the information output from the psychoacoustic model and the signal information of each divided band, and coding is performed.

22. The speech coding method according to claim 1, wherein a psychoacoustic model whose processing amount can be controlled in stages is provided, and the processing amount of the psychoacoustic model is controlled on the basis of external control information so that the psychoacoustic model operates with a prescribed processing amount, whereby the bit allocation information for each band is generated.

23. The speech coding method according to claim 1, wherein a plurality of psychoacoustic models each differing in processing amount are provided, and the psychoacoustic model to be used is switched, on the basis of external control information, so that processing is performed using a prescribed one of the plurality of psychoacoustic models, whereby the bit allocation information for each band is generated.

24. The speech coding method according to claim 1, wherein, in accordance with the performance of the central processing unit on which the coding processing is executed, processing load value information for each of the plurality of bit allocation means or the plurality of psychoacoustic models is output to the outside at initialization before the coding operation.

25. The speech coding method according to claim 24, wherein the processing load value information of the plurality of bit allocation means or the plurality of psychoacoustic models that is output to the outside is output in descending order or in ascending order.

26. A speech coding apparatus that performs speech coding using the speech coding method according to any one of claims 1 to 25.

27. A data recording medium on which the steps of the speech coding method according to any one of claims 1 to 25 are recorded.
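
As an illustration only, the table-based selection recited in claim 1 and the weighted apportioning of allocatable bits recited in claim 10 might be sketched as follows. The load figures, the function and class names, the fallback to the cheapest allocator, and the handling of the rounding remainder are all assumptions made for this sketch; the claims do not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocator:
    """One bit allocation means and its stored CPU cost (hypothetical units)."""
    name: str
    load: int  # processing amount on the CPU when coding with this allocator


# Hypothetical data table, stored in advance. The load values are invented.
LOAD_TABLE = [
    Allocator("psychoacoustic_model", 80),
    Allocator("band_output_adaptive", 30),
]


def select_allocator(occupiable_load, table=LOAD_TABLE):
    """Pick the costliest allocator whose stored load fits the occupiable load."""
    candidates = [a for a in table if a.load <= occupiable_load]
    if not candidates:
        # Assumption: fall back to the cheapest allocator instead of failing.
        return min(table, key=lambda a: a.load)
    return max(candidates, key=lambda a: a.load)


def split_bits(total_bits, group_sizes, weights):
    """Apportion total_bits to groups by (group size x per-group weight)."""
    shares = [n * w for n, w in zip(group_sizes, weights)]
    scale = sum(shares)
    bits = [int(total_bits * s / scale) for s in shares]
    # Assumption: bits lost to rounding down go to the first group.
    bits[0] += total_bits - sum(bits)
    return bits
```

Here `select_allocator` would be called once per frame with the current load value, and `split_bits` would then divide the frame's allocatable bits among the groups, for example 8 low-band and 24 high-band subbands with the low band weighted more heavily.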