JP2010136037A

JP2010136037A - Video code amount control method, video-encoding device, video code amount control program, and recording medium for the program

Info

Publication number: JP2010136037A
Application number: JP2008309277A
Authority: JP
Inventors: Takeshi Nakamura; 健中村; Atsushi Shimizu; 淳清水; Ryuichi Tanida; 隆一谷田; Noboru Harada; 登原田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-12-04
Filing date: 2008-12-04
Publication date: 2010-06-17
Anticipated expiration: 2028-12-04
Also published as: JP4755239B2

Abstract

PROBLEM TO BE SOLVED: To enhance subjective picture quality while keeping picture quality variation inconspicuous as a whole, by using the surplus audio code amount in video encoding.

SOLUTION: In a video encoding device, when determining the target code amount of a GOP to be encoded, a surplus code amount calculation unit 131 calculates, from the generated audio code amount, the surplus audio code amount that may be used for encoding that GOP. A condition determination unit 132 evaluates a predetermined condition for using the surplus code amount; when the condition is satisfied, a GOP target code amount calculation unit 133 adds all or part of the surplus audio code amount to the original GOP target code amount.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a video code amount control method, a video encoding device, a video code amount control program, and a recording medium for the program, which control the video code amount according to the generated audio code amount so that the sum of the bit rate of the audio stream and the bit rate of the video stream falls within a predetermined bit rate range.

In general, in a transmission system that multiplexes and transmits a video stream together with other streams such as audio and data, the bit rate of the video has been controlled independently of the other streams (see Non-Patent Documents 1 and 2).

In general, in video and audio coding, the code amount increases for input signals that are complex and change frequently, and decreases for input signals that are monotonous and change little.

A lossy encoded stream is transmitted at a constant bit rate (CBR), so the generated code amount is kept within a predetermined range by varying the degree of signal degradation. A lossless encoded stream, on the other hand, offers no control over changes in the generated code amount, so it is generally transmitted at a variable bit rate (VBR).
ISO/IEC 13818-2 Annex C; ITU-T Recommendation H.264 Annex C
The Institute of Image Information and Television Engineers (ed.), "総合マルチメディア選書ＭＰＥＧ", The Institute of Image Information and Television Engineers, published by Ohmsha, Ltd., 1996.4.20

When a lossless audio encoded stream such as MPEG-4 ALS is multiplexed with a video encoded stream (hereinafter, an encoded stream is also simply called a "stream"), the fluctuation of the audio transmission bit rate is often too large to ignore relative to the video transmission bit rate.

Therefore, in a multiplexed video and audio stream, if the portion of the fixedly allocated audio bit rate that is left unused when the generated code amount of the audio encoded data is small (the surplus code amount) can be used for video encoding, no bit rate is wasted, which should contribute to improved image quality.

However, when the fluctuation in the generated code amount of the audio encoded data is put to use for video encoding in this way, the overall image quality of the video improves, but image quality may degrade heavily in scenes where the audio generates a large code amount and become excessively sharp in scenes where it generates a small one. This creates a new problem: image quality fluctuations that track the audio encoded data may become conspicuous.

FIG. 9 illustrates the problem addressed by the present invention. Conventionally, when an audio encoded stream with a fixed encoding bit rate and a video encoded stream are multiplexed, the bit rate of the video is controlled independently of the audio stream, as shown in FIG. 9(A). A GOP (Group Of Pictures) is a unit of video encoded data that generally consists of a group of pictures ranging from a dozen or so to several tens of frames; here, GOP(n) denotes the n-th GOP.

With a lossless audio encoded stream, the generated audio code amount fluctuates, and the amount of audio encoded data can be considerably smaller than the amount that could be transmitted at the given audio bit rate. This margin in the bit rate is referred to here as the surplus code amount. The hatched portion in FIG. 9(B) is the surplus code amount.

FIG. 9(C) shows the relationship between the audio bit rate and the video bit rate when the surplus audio code amount shown in FIG. 9(B) is used to transmit video encoded data. Using the surplus audio code amount in this way allows the GOP-level target code amount in video encoding to exceed the target code amount given by the original video bit rate.

As shown in FIG. 9(C), using the surplus audio code amount to transmit video encoded data increases the code amount available to the video, so the image quality should improve. However, the video bit rate then fluctuates with the audio bit rate, which raises the new problem that image quality fluctuations may become noticeable.

An object of the present invention is to solve the above problem by using the surplus audio code amount effectively for video encoding, improving image quality while keeping image quality fluctuations inconspicuous as a whole.

To solve the above problem, when a stream whose encoding bit rate cannot be controlled, such as a lossless audio encoded stream or a data stream, is multiplexed with a lossy video encoded stream into a CBR stream, the present invention controls the video code amount according to the generated audio code amount as follows.

(1) The surplus audio code amount is used for video encoding only when parameters such as the difference between the target code amount and the generated code amount in video encoding, the quantization step size, and the activity satisfy a predetermined condition.

(2) The surplus audio code amount available for video encoding is obtained from the surplus code amount and the transmission timing of the audio encoded data that has already been encoded at that point. The decision on whether to use the surplus, and how much of it to use, is made at the head of a GOP or on a per-frame basis.

(3) At the start of GOP encoding of the video, if the audio has been encoded one GOP period ahead, the surplus audio code amount of the same GOP period is obtained; otherwise, that of the immediately preceding GOP period is obtained. When use of the surplus is decided, that surplus code amount is the upper limit. The code amount needed to improve image quality is calculated from the quantization step size and the activity.

Based on the above points, the present invention controls the video code amount as follows. First, the surplus audio code amount over the period of a video encoding control unit (for example, a GOP or a frame) is calculated from the generated code amount of the already-encoded audio. Next, a predetermined condition on a video feature or an encoding state that affects the generated video code amount is evaluated to decide whether to add that surplus to the target code amount of the video encoding control unit. If the condition is satisfied, part or all of the surplus is added to the target code amount of the video encoding control unit; if it is not satisfied, the surplus is not added. The target code amount of the video encoding control unit to be encoded next is determined in this way, and the video signal in that unit is encoded according to the determined target code amount.

The condition is preferably, for example, one of the following:
- the average quantization parameter of a predetermined number of past video frames is equal to or greater than a predetermined threshold;
- the average quantization step of a predetermined number of past video frames is equal to or greater than a predetermined threshold;
- the average activity of the video encoding control unit about to be encoded is equal to or greater than a predetermined threshold;
- the video encoding control unit about to be encoded contains a scene change;
- the sum of the differences between the target code amount and the generated code amount over a predetermined number of past video frames is equal to or greater than a predetermined threshold.
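As a non-limiting illustration, the five example conditions can be sketched in Python as a single predicate. The class `VideoStats`, the function `surplus_use_allowed`, the field names, and the string keys are all hypothetical names introduced for this sketch; the sign convention of the target/generated difference is left open, as in the text.

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    """Hypothetical container for the statistics named in the conditions."""
    avg_qp: float            # average quantization parameter of past frames
    avg_qstep: float         # average quantization step of past frames
    avg_activity: float      # average activity of the unit about to be encoded
    has_scene_change: bool   # scene change inside the unit about to be encoded
    gap_sum: float           # sum of target/generated differences of past frames


def surplus_use_allowed(stats, kind, threshold=0.0):
    """Return True when the selected condition for using the surplus holds."""
    if kind == "scene_change":
        return stats.has_scene_change
    value = {
        "qp": stats.avg_qp,
        "qstep": stats.avg_qstep,
        "activity": stats.avg_activity,
        "gap": stats.gap_sum,
    }[kind]
    # Every threshold condition in the list has the form "value >= threshold".
    return value >= threshold
```

Only one condition is evaluated per call here; combining several conditions is not described in the text and is not attempted.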

The surplus audio code amount used for video encoding can be calculated by either of the following methods:
- from the sum, over the group of audio encoding control units in the same display period as the video encoding control unit, of the differences between each audio encoding control unit's predetermined maximum generated code amount and its actual generated code amount;
- from the same sum taken over the group of audio encoding control units in the same display period as a group of video frames that has the same length as the video encoding control unit and precedes it by a predetermined number of video frames.

When video encoded data is multiplexed and transmitted with encoded data other than video, the present invention can also determine the additional target code amount from the surplus code amount of that non-video encoded data, in the same way as for the surplus code amount of audio encoded data described above, and control the video code amount accordingly.

According to the present invention, when video is encoded so that the encoded stream multiplexing video and audio stays within a predetermined bit rate range, image quality is improved in scenes that are difficult to encode while image quality fluctuations are kept inconspicuous as a whole, so that subjective image quality can be improved.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 shows a configuration example of an apparatus according to an example of the present invention. The video encoding device 10 receives the video signal of a superimposed video and audio signal, encodes it, and outputs a video stream. The audio encoding device 20 receives the audio signal of the superimposed video and audio signal and outputs an audio stream. The multiplexing unit 30 multiplexes the audio stream and the video stream so that the output has a predetermined bit rate.

In this example, the audio encoding device 20 can be configured as in the prior art, except that it has a function for reporting the generated audio code amount to the video encoding device 10 either frame by frame or successively; a detailed description of its internal configuration is therefore omitted.

The video encoding device 10 comprises a video input unit 11 that receives the video signal, a video input buffer 12 that stores the input video signal, a video GOP target code amount determination unit 13 that determines the video target code amount for each GOP, a video frame target code amount determination unit 14 that determines the target code amount for each video frame, a video frame encoding processing unit 15 that encodes the video signal according to the frame target code amount, a video stream output unit 16 that outputs the video stream, and a video stream output buffer 17 that temporarily stores the video stream.

The video GOP target code amount determination unit 13 comprises a surplus code amount calculation unit 131 that calculates, for each GOP, the surplus code amount available for video encoding from the generated audio code amount; a condition determination unit 132 that determines whether the calculated surplus code amount is to be used for video encoding; and a GOP target code amount calculation unit 133 that calculates the GOP target code amount according to the result of that determination.

The activity/scene change analysis unit 18 calculates activity values or detects the presence or absence of scene changes in the look-ahead video data, that is, the portion of the input video signal that is analyzed ahead of encoding. It is used in Examples 3 and 4, described later, and can be omitted in the other examples.

[Example 1]
FIG. 2 is a flowchart of the video encoding process of Example 1 of the present invention. In Example 1, the surplus audio code amount is used for video encoding when the average quantization parameter QP_ave(n) of a certain number of past video frames is equal to or greater than a threshold TH_qp. The reason is that when a shortage of code amount available for video encoding pushes QP_ave(n) to a certain value or above, the image becomes markedly coarser, so the need to improve image quality grows, and at the same time the change in image quality caused by adding the surplus audio code amount becomes less noticeable.

Let GOP(n) denote the n-th GOP (Group Of Pictures). Video and audio are encoded in parallel. In this example, however, encoding of the audio frames in the same display period as the video frames of GOP(n) is assumed to be complete before encoding of the video frames of GOP(n) starts.

First, the video GOP target code amount determination unit 13 executes steps S101 to S105 below. It receives the generated audio code amount from the audio encoding device 20 and, for the audio frames j to j+M-1 in the same display period as the video frames of GOP(n), obtains the surplus code amount G_r(n) of the GOP(n) period by the following equation (step S101).

$$G_r(n) = \sum_{x=j}^{j+M-1} \left\{ S_{amax} - S_a(x) \right\}$$
Here, M is the number of audio frames in the GOP(n) period, S_amax is the maximum generated code amount of each audio frame, and S_a(x) is the generated code amount of the x-th audio frame. A predetermined value may be used for S_amax.
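As a non-limiting illustration, the calculation of step S101 can be sketched in Python as follows. The function name `surplus_code_amount` and the use of bits as the unit are assumptions made for this sketch; only the summation itself is taken from the equation for G_r(n).

```python
def surplus_code_amount(audio_frame_bits, s_amax):
    """G_r(n): sum of (S_amax - S_a(x)) over the M audio frames of one GOP period.

    audio_frame_bits holds S_a(x) for x = j .. j+M-1; s_amax is the
    predetermined maximum generated code amount per audio frame.
    """
    return sum(s_amax - s_a for s_a in audio_frame_bits)


# Three audio frames with a 1000-bit ceiling leave 100 + 0 + 300 bits unused.
g_r = surplus_code_amount([900, 1000, 700], s_amax=1000)
```

With these inputs `g_r` is 400, the amount that becomes available to the video side for that GOP.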

Next, it is determined whether the average quantization parameter QP_ave(n) of the past several video frames is equal to or greater than a threshold TH_qp (step S102). The number of video frames averaged may be any number and is set in advance. If QP_ave(n) is equal to or greater than TH_qp, the additional GOP target code amount G_d(n) is determined in proportion to the amount by which QP_ave(n) exceeds TH_qp, within a range not exceeding the surplus code amount G_r(n) (step S103). In this case, G_d(n) is given by the following equation.

$$G_d(n) = \min\left( G_r(n),\ a \cdot \left( QP_{ave}(n) - TH_{qp} \right) \right)$$
Here, min() is a function that returns the minimum value, and a is a predetermined constant.

If, on the other hand, the average quantization parameter QP_ave(n) is less than the threshold TH_qp, the additional GOP target code amount G_d(n) is set to 0 (step S104).
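As a non-limiting illustration, steps S102 to S104 can be sketched in Python as follows. The function name and the numeric values in the usage lines are assumptions chosen for this sketch; the text only states that a is a predetermined constant.

```python
def additional_gop_target_qp(g_r, qp_ave, th_qp, a):
    """Additional GOP target code amount G_d(n) for Example 1."""
    if qp_ave < th_qp:
        # Step S104: the condition of step S102 is not met, nothing is added.
        return 0
    # Step S103: proportional to the excess over the threshold, capped by G_r(n).
    return min(g_r, a * (qp_ave - th_qp))


capped_by_formula = additional_gop_target_qp(50_000, qp_ave=36, th_qp=32, a=10_000)
capped_by_surplus = additional_gop_target_qp(30_000, qp_ave=36, th_qp=32, a=10_000)
```

Here `capped_by_formula` is 40000 (the proportional term is below the surplus) and `capped_by_surplus` is 30000 (the surplus G_r(n) is the binding limit).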

Next, the GOP target code amount G(n) is obtained as the sum of three terms: the number of frames in the GOP N × the reference video bit rate R / the video frame rate P, the additional GOP target code amount G_d(n), and the code amount carried over from the previous GOP (step S105). The GOP target code amount G(n) is therefore as follows.

$$G(n) = \frac{N \cdot R}{P} + G_d(n) + (\text{code amount carried over from the previous GOP})$$
The reference video bit rate R is the video bit rate that can be secured when the generated code amount of the audio frames is always at its maximum.
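As a non-limiting illustration, step S105 can be sketched in Python as follows. The function and parameter names, and the numbers in the usage line, are assumptions for this sketch.

```python
def gop_target_code_amount(n_frames, ref_bitrate, frame_rate, g_d, carry_over):
    """G(n) = N * R / P + G_d(n) + carry-over from the previous GOP."""
    base = n_frames * ref_bitrate / frame_rate  # budget at the reference rate R
    return base + g_d + carry_over


# 15 frames at 30 frames/s and R = 6 Mbit/s give a 3,000,000-bit base budget.
g_n = gop_target_code_amount(15, 6_000_000, 30, g_d=100_000, carry_over=-20_000)
```

In this example `g_n` is 3080000.0: the base budget, plus 100000 bits borrowed from the audio surplus, minus a 20000-bit overshoot inherited from the previous GOP.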

Next, the video frame target code amount determination unit 14 and the video frame encoding processing unit 15 repeat steps S106 to S109, described below, until all frames in GOP(n) have been encoded (step S110).

The video frame target code amount determination unit 14 distributes the target code amount of GOP(n) among the frames in the GOP and obtains the target code amount T(i) of the next frame (step S106). Methods for calculating T(i) from the target code amount of GOP(n) include, for example, the method disclosed in Reference 1 below; since this is a well-known technique, a detailed description is omitted here.
[Reference 1] International Organisation for Standardisation, Test Model Editing Committee, 1993, Test Model 5, April, ISO-IEC/JTC1/SC29/WG11/N0400
Next, the video frame target code amount determination unit 14 obtains the largest code amount that will not cause the decoder reception buffer of the destination video decoding device to fail, based on the buffer occupancy calculated from that buffer's size and the amount of data already transferred, and clips the target code amount so that it does not exceed this limit (step S107). If the decoder reception buffer is sufficiently large, or if a longer delay before decoding is acceptable, this clipping can be omitted.

The video frame encoding processing unit 15 encodes frame i according to the target code amount calculated by the video frame target code amount determination unit 14 (step S108). If interlacing is specified, encoding is performed field by field; for simplicity, field-based encoding is also described here simply as frame encoding.

When encoding of the frame is finished, the video frame target code amount determination unit 14 subtracts the generated code amount S(i) from the target code amount of GOP(n) (step S109). It then determines whether all frames in GOP(n) have been encoded; if not, the process returns to step S106 and repeats until all frames have been encoded (step S110).

The video GOP target code amount determination unit 13 determines the code amount to carry over to the next GOP from the difference between the GOP target code amount G(n) at the start of the GOP and the code amount actually generated for the GOP (step S111). The process then returns to step S101, and the next GOP is encoded in the same way.
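As a non-limiting illustration, the per-frame loop of steps S106 to S111 can be sketched in Python as follows. Several parts are assumptions made for this sketch: the even split of the remaining budget is a simple stand-in for the frame allocation of Reference 1 and is not that method; `encode_frame` is a hypothetical callback standing in for the video frame encoding processing unit 15; `clip_limit` is a fixed hypothetical ceiling rather than a value derived from a decoder buffer model; and the carry-over is taken to be exactly the difference between G(n) and the generated code amount.

```python
def encode_gop(g_n, n_frames, encode_frame, clip_limit=None):
    """Encode one GOP and return (generated code amount, carry-over)."""
    remaining = g_n          # GOP budget still unspent
    generated_total = 0
    for i in range(n_frames):
        # S106: hypothetical allocation, an even share of what is left.
        target = remaining / (n_frames - i)
        # S107: optional clipping against a decoder-buffer limit.
        if clip_limit is not None:
            target = min(target, clip_limit)
        # S108: encode frame i; the callback returns the generated code amount S(i).
        s_i = encode_frame(i, target)
        # S109: subtract S(i) from the GOP budget.
        remaining -= s_i
        generated_total += s_i
    # S111: the unspent (or overspent) amount carries over to the next GOP.
    carry_over = g_n - generated_total
    return generated_total, carry_over


# A toy encoder that always emits 300 bits overshoots a 1000-bit, 4-frame GOP.
total, carry = encode_gop(1000, 4, lambda i, target: 300)
```

With the toy encoder above, `total` is 1200 and `carry` is -200, so the next GOP starts with a 200-bit deficit.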

[Example 2]
FIG. 3 is a flowchart of the video encoding process of Example 2 of the present invention. In Example 2, the surplus audio code amount is used for video encoding when the average quantization step Q_ave(n) of a certain number of past video frames is equal to or greater than a threshold TH_q. The reason is that when a shortage of code amount available for video encoding pushes Q_ave(n) to a certain value or above, the image becomes markedly coarser, so the need to improve image quality grows, and at the same time the change in image quality caused by adding the surplus audio code amount becomes less noticeable.

Example 2 differs from Example 1 in whether the condition for using the surplus audio code amount for video encoding is evaluated on the average quantization parameter QP_ave(n) or on the average quantization step Q_ave(n). The conversion between quantization parameter and quantization step depends on the coding standard, but the code amount can generally be approximated by "(code amount) = a / (quantization step) + b" (a and b are constants), so Example 2 calculates the additional target code amount more accurately than Example 1.

In Example 2, steps S102 to S104 of the flowchart shown in FIG. 2 are replaced with steps S202 to S204 of FIG. 3, and the rest is the same; only steps S202 to S204 are therefore described below.

After step S101 shown in FIG. 2 is executed, the condition determination unit 132 of the video GOP target code amount determination unit 13 determines whether the average quantization step Q_ave(n) of the past several video frames is equal to or greater than a threshold TH_q (step S202). If the average quantization step Q_ave(n) is equal to or greater than TH_q, the GOP target code amount calculation unit 133 determines the additional GOP target code amount G_d(n) according to the amount by which Q_ave(n) exceeds TH_q, within a range not exceeding the surplus code amount G_r(n) (step S203). In this case, G_d(n) is given by the following equation.

$$G_d(n) = \min\left( G_r(n),\ a \cdot \left( \frac{1}{TH_q} - \frac{1}{Q_{ave}(n)} \right) \right)$$
Here, min() is a function that returns the minimum value, and a is a predetermined constant.

If, on the other hand, the average quantization step Q_ave(n) is less than the threshold TH_q, the additional GOP target code amount G_d(n) is set to 0 (step S204).
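As a non-limiting illustration, steps S202 to S204 can be sketched in Python as follows. The function name and the values in the usage line are assumptions for this sketch. The reciprocal form mirrors the approximation "(code amount) = a / (quantization step) + b": the added amount tracks the code needed to bring the step back down to the threshold.

```python
def additional_gop_target_qstep(g_r, q_ave, th_q, a):
    """Additional GOP target code amount G_d(n) for Example 2."""
    if q_ave < th_q:
        # Step S204: the quantization step is below the threshold, nothing is added.
        return 0
    # Step S203: a * (1/TH_q - 1/Q_ave(n)) is non-negative here, capped by G_r(n).
    return min(g_r, a * (1 / th_q - 1 / q_ave))


g_d = additional_gop_target_qstep(1_000, q_ave=16, th_q=8, a=1_600)
```

With these values `g_d` is 100.0, since 1/8 - 1/16 = 0.0625 and 1600 × 0.0625 = 100.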

The subsequent processing is the same as steps S105 to S111 of FIG. 2 in Example 1.

[Example 3]
FIG. 4 is a flowchart of the video encoding process of Example 3 of the present invention. In Example 3, the surplus audio code amount is used for video encoding when the average activity act(n) of the GOP about to be encoded is equal to or greater than a threshold TH_act. The reason is that a large act(n) increases the generated code amount and makes the image coarser, so the need to improve image quality grows, and at the same time the change in image quality caused by adding the surplus audio code amount becomes less noticeable.

In Example 3, the video encoding device 10 shown in FIG. 1 includes the activity/scene change analysis unit 18, which analyzes the activity of the video one GOP ahead of video encoding and calculates the average activity act(n) in advance. Activity characterizes the distribution of pixel values in the region to be encoded and expresses how much the pixel values vary. For example, activity is defined as the minimum of the variance values of the small blocks obtained by further dividing a macroblock. Here, the variance value of a small block is the sum of the absolute differences between the mean luminance of the small block and the luminance of each pixel in it.
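As a non-limiting illustration, the example definition of activity can be sketched in Python as follows. The function names are hypothetical, and the default small-block size of 8 pixels is an assumption; the text does not fix the macroblock or small-block dimensions. Luminance is passed as a list of rows.

```python
def block_deviation(block):
    """Sum of absolute differences between each pixel and the block mean."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)


def macroblock_activity(mb, sub=8):
    """Activity of a square macroblock: minimum deviation over its small blocks."""
    size = len(mb)
    values = []
    for top in range(0, size, sub):
        for left in range(0, size, sub):
            block = [row[left:left + sub] for row in mb[top:top + sub]]
            values.append(block_deviation(block))
    return min(values)


# A 4x4 toy macroblock split into 2x2 small blocks (deviations 4, 4, 8 and 3).
toy = [[0, 2, 5, 7],
       [2, 0, 5, 7],
       [0, 4, 1, 1],
       [4, 0, 1, 3]]
activity = macroblock_activity(toy, sub=2)
```

For the toy macroblock, `activity` is 3.0, the deviation of the flattest small block. Taking the minimum means a macroblock is rated as busy only when all of its small blocks are busy.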

In Example 3, steps S102 to S104 of the flowchart shown in FIG. 2 are replaced with steps S302 to S304 of FIG. 4, and the rest is the same; only steps S302 to S304 are therefore described below.

After step S101 shown in FIG. 2 is executed, the condition determination unit 132 of the video GOP target code amount determination unit 13 determines whether the average activity act(n) of the GOP about to be encoded is equal to or greater than a threshold TH_act (step S302). If act(n) is equal to or greater than TH_act, the GOP target code amount calculation unit 133 determines the additional GOP target code amount G_d(n) in proportion to the amount by which act(n) exceeds TH_act, within a range not exceeding the surplus code amount G_r(n) (step S303). In this case, G_d(n) is given by the following equation.

G_d(n) = min(G_r(n), a · (act(n) − TH_act))
Here, min() is a function that returns the minimum value, and a is a predetermined constant.

On the other hand, when the activity average value act(n) is less than the threshold TH_act, the additional GOP target code amount G_d(n) is set to 0 (step S304).
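Steps S302 to S304 amount to a thresholded, clamped linear rule. A minimal Python sketch follows; the function and parameter names are illustrative and not taken from the patent, and the constant a and threshold TH_act are passed in because the text gives no values for them.

```python
def additional_gop_target_activity(g_r, act_n, th_act, a):
    """Additional GOP target code amount G_d(n) for the activity condition.

    g_r    : surplus code amount G_r(n) available for this GOP
    act_n  : activity average value act(n) of the GOP to be encoded
    th_act : threshold TH_act
    a      : predetermined proportionality constant
    """
    if act_n >= th_act:                          # step S302
        return min(g_r, a * (act_n - th_act))    # step S303
    return 0                                     # step S304
```

For example, with g_r = 1000, th_act = 30, and a = 20, an activity of 50 yields 400, while an activity of 100 is clamped to the available surplus of 1000.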

[Example 4]
FIG. 5 is a flowchart of the video encoding process according to the fourth embodiment of the present invention. In the fourth embodiment, when the GOP to be encoded contains a scene change, the surplus code amount of the audio is used for video encoding. This is because, when the GOP contains a scene change, motion compensation is ineffective for the frame immediately after the scene change, and a larger code amount than usual is needed to maintain the same image quality.

In the fourth embodiment, the video encoding device 10 shown in FIG. 1 is provided with an activity/scene change analysis unit 18. This unit runs one GOP ahead of video encoding, performs activity analysis and scene change analysis of the video, and determines in advance whether a scene change is present and, if so, the activity act_sc of the scene change frame.

Various methods of detecting a scene change are already known, such as the method described in Reference 2 below, so a detailed description is omitted here.
[Reference 2] Otsuji and Tonomura, "Examination of an automatic video cut detection method", The Institute of Image Information and Television Engineers, ITEJ Technical Report, Vol. 16, No. 43 (1992-07-10), pp. 7-12

In the fourth embodiment, steps S102 to S104 of the flowchart shown in FIG. 2 are replaced with steps S402 to S404 of FIG. 5. The other steps are unchanged, so only steps S402 to S404 are described below.

After step S101 shown in FIG. 2 is executed, the condition determination unit 132 of the video GOP target code amount determination unit 13 determines, from the analysis result of the activity/scene change analysis unit 18, whether the GOP to be encoded contains a scene change (step S402). When there is a scene change, the GOP target code amount calculation unit 133 determines the additional GOP target code amount G_d(n) in proportion to the activity act_sc of the scene change frame, within a range not exceeding the surplus code amount G_r(n) (step S403). In this case, G_d(n) is obtained by the following equation.

G_d(n) = min(G_r(n), a · act_sc + b)
Here, min() is a function that returns the minimum value, and a and b are predetermined constants.

On the other hand, when the GOP to be encoded contains no scene change, the additional GOP target code amount G_d(n) is set to 0 (step S404).
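As a rough illustration of steps S402 to S404, the scene change rule can be written as below. The names are hypothetical, and the scene change flag and act_sc are assumed to come from an analysis stage that is not shown.

```python
def additional_gop_target_scene_change(g_r, has_scene_change, act_sc, a, b):
    """Additional GOP target code amount G_d(n) for the scene change condition.

    g_r              : surplus code amount G_r(n) available for this GOP
    has_scene_change : True if the GOP to be encoded contains a scene change
    act_sc           : activity of the scene change frame
    a, b             : predetermined constants
    """
    if has_scene_change:                  # step S402
        return min(g_r, a * act_sc + b)   # step S403
    return 0                              # step S404
```

Unlike the activity rule of the third embodiment, the offset b gives a scene change a nonzero additional amount even when act_sc is small, provided b is positive.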

[Example 5]
FIG. 6 is a flowchart of the video encoding process according to the fifth embodiment of the present invention. In the fifth embodiment, when the sum ΔB of the differences between the target code amount and the generated code amount over a number of past video frames is equal to or greater than a threshold TH_B, the surplus code amount of the audio is used for video encoding. This is because a persistently large difference between the target code amount and the generated code amount suggests that the code amount is insufficient.

In the fifth embodiment, steps S102 to S104 of the flowchart shown in FIG. 2 are replaced with steps S502 to S504 of FIG. 6. The other steps are unchanged, so only steps S502 to S504 are described below.

After step S101 shown in FIG. 2 is executed, the condition determination unit 132 of the video GOP target code amount determination unit 13 determines whether the sum ΔB of the differences between the target code amount and the actual generated code amount over the past several video frames is equal to or greater than a threshold TH_B (step S502). When ΔB is equal to or greater than TH_B, the GOP target code amount calculation unit 133 determines the additional GOP target code amount G_d(n) in proportion to the amount by which ΔB exceeds TH_B, within a range not exceeding the surplus code amount G_r(n) (step S503). In this case, G_d(n) is obtained by the following equation.

G_d(n) = min(G_r(n), a · (ΔB − TH_B))
Here, min() is a function that returns the minimum value, and a is a predetermined constant.

On the other hand, when the sum of differences ΔB is less than the threshold TH_B, the additional GOP target code amount G_d(n) is set to 0 (step S504).
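The following Python sketch shows one way ΔB and the resulting G_d(n) could be computed. It rests on assumptions that the text does not settle: the difference is taken as generated minus target (so that a positive ΔB indicates a shortage), and the number of past frames is a free parameter named `window`. All identifiers are illustrative.

```python
def delta_b(targets, generated, window):
    """Sum of (generated - target) over the last `window` video frames.

    The sign convention is an assumption; the text only speaks of the
    difference between target and generated code amounts.
    """
    recent = list(zip(targets, generated))[-window:]
    return sum(g - t for t, g in recent)

def additional_gop_target_deficit(g_r, d_b, th_b, a):
    """Additional GOP target code amount G_d(n) for the code amount deficit condition."""
    if d_b >= th_b:                         # step S502
        return min(g_r, a * (d_b - th_b))   # step S503
    return 0                                # step S504
```

With targets [100, 100, 100], generated amounts [120, 130, 90], and a window of 2, ΔB is (130 − 100) + (90 − 100) = 20; with TH_B = 5, a = 10, and G_r(n) = 500, the additional amount is min(500, 150) = 150.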

[Example 6]
FIG. 7 is a flowchart of the video encoding process according to the sixth embodiment of the present invention. The sixth embodiment differs from the first to fifth embodiments described above in how the surplus code amount calculation unit 131 calculates the surplus code amount G_r(n) for the GOP(n) period.

In the sixth embodiment as well, video and audio are encoded in parallel. In the first to fifth embodiments, the surplus code amount G_r(n) for the GOP(n) period is calculated from the generated code amounts of the audio frames in the same display period as the video frames of GOP(n). In the sixth embodiment, by contrast, G_r(n) is calculated from the generated code amounts of the audio frames whose display period coincides with a period of the same length as GOP(n) that ends A video frames before the last video frame of GOP(n). Here, A is a predetermined non-negative integer.

Therefore, in this embodiment, it is assumed that, before encoding of the video frames of GOP(n) starts, the audio encoding device 20 has completed encoding the audio frames whose display period coincides with the period ending A video frames before the last video frame of GOP(n) (the length of this period is the same as the GOP(n) period).

In the sixth embodiment, the time offset between the video encoding period of GOP(n) and the period over which the generated audio code amount is measured may make the calculation of the surplus code amount G_r(n) somewhat less accurate than in the preceding embodiments. However, the period by which audio must be encoded in advance can be shortened, so the encoding delay can be reduced compared with the preceding embodiments.

In the sixth embodiment, step S101 of the flowchart shown in FIG. 2 is replaced with step S601 of FIG. 7.

In step S601, the surplus code amount calculation unit 131 of the video GOP target code amount determination unit 13 uses the generated audio code amount information reported by the audio encoding device 20 to obtain the surplus code amount G_r(n) for the GOP(n) period by the following equation, over the audio frames j to j+M−1 whose display period coincides with the period of the same length that precedes the video frame period of GOP(n) by A video frames (M is the number of audio frames in that display period).

G_r(n) = Σ_{x=j}^{j+M−1} {S_amax − S_a(x)}
Here, S_amax is the maximum generated code amount of each audio frame, and S_a(x) is the generated code amount of the x-th audio frame.
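The summation in step S601 can be sketched in Python as follows. The function name and the list-based representation of per-frame code amounts are assumptions made for illustration; how the start index j is derived from A and the frame timing is left to the caller, since it depends on frame durations that are not given here.

```python
def surplus_code_amount(frame_bits, j, m, s_max):
    """Surplus code amount G_r(n) over frames j .. j+m-1.

    frame_bits : generated code amount of each already-encoded audio frame
    j          : index of the first frame in the measurement window
    m          : number of frames in the window (M in the text)
    s_max      : maximum generated code amount per frame (S_amax in the text)
    """
    window = frame_bits[j:j + m]
    if j < 0 or len(window) != m:
        raise ValueError("measurement window is not fully encoded yet")
    return sum(s_max - s for s in window)
```

For frame sizes [800, 600, 1000, 700] with j = 1, M = 3, and a per-frame maximum of 1000, the surplus is 400 + 0 + 300 = 700. Moving the window earlier by choosing a smaller j trades accuracy for a shorter audio look-ahead, which is the trade-off this embodiment describes.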

The subsequent processing is the same as steps S102 to S111 of FIG. 2 in the first embodiment. Alternatively, the surplus code amount use condition may be determined by the method of any of the second to fifth embodiments described above. In that case, steps S102 to S104 of FIG. 2 are further replaced with steps S202 to S204 of FIG. 3 for the second embodiment, steps S302 to S304 of FIG. 4 for the third embodiment, steps S402 to S404 of FIG. 5 for the fourth embodiment, or steps S502 to S504 of FIG. 6 for the fifth embodiment.

[Example 7]
FIG. 8 is a flowchart of the video encoding process according to the seventh embodiment of the present invention. Whereas the first to sixth embodiments use the surplus code amount of the audio for video encoding, in the seventh embodiment the surplus code amount calculation unit 131 calculates, as the surplus code amount G_r(n) for the GOP(n) period, the surplus code amount of a data stream other than the video stream that is multiplexed into the encoded stream.

In this embodiment, it is assumed that, before encoding of the video frames of GOP(n) starts, the bit amount of the data to be multiplexed into the encoded stream at the same decode time as the video frames of GOP(n) is known. The reference video bit rate in this case is the video bit rate that can be secured when the bit amount of every data frame is at its maximum.

In the seventh embodiment, step S101 of the flowchart shown in FIG. 2 is replaced with step S701 of FIG. 8.

In step S701, the surplus code amount calculation unit 131 of the video GOP target code amount determination unit 13 obtains the surplus code amount G_r(n) for the GOP(n) period by the following equation, over the data frames j to j+M−1 that the receiving side needs at the same time as the decode time of the video frames of GOP(n) (M is the number of data frames in the same period as GOP(n)).

G_r(n) = Σ_{x=j}^{j+M−1} {S_dmax − S_d(x)}
Here, S_dmax is the maximum generated code amount of each data frame, and S_d(x) is the generated code amount of the x-th data frame.

The subsequent processing is the same as steps S102 to S111 of FIG. 2 in the first embodiment. As in the sixth embodiment, the surplus code amount use condition may instead be determined by the method of any of the second to fifth embodiments.

The video encoding processing described above can be implemented by a computer and a software program, and the program can be recorded on a computer-readable recording medium or provided over a network.

A diagram showing a configuration example of the device according to the embodiments of the present invention.
A flowchart of the video encoding process of Example 1 of the present invention.
A flowchart of the video encoding process of Example 2 of the present invention.
A flowchart of the video encoding process of Example 3 of the present invention.
A flowchart of the video encoding process of Example 4 of the present invention.
A flowchart of the video encoding process of Example 5 of the present invention.
A flowchart of the video encoding process of Example 6 of the present invention.
A flowchart of the video encoding process of Example 7 of the present invention.
A diagram explaining the problem addressed by the present invention.

Explanation of symbols

10 Video encoding device
11 Video input unit
12 Video input buffer
13 Video GOP target code amount determination unit
131 Surplus code amount calculation unit
132 Condition determination unit
133 GOP target code amount calculation unit
14 Video frame target code amount determination unit
15 Video frame encoding processing unit
16 Video stream output unit
17 Video stream output buffer
18 Activity/scene change analysis unit
20 Audio encoding device
30 Multiplexing unit

Claims

A video code amount control method for controlling a video code amount so that an encoded stream obtained by multiplexing encoded video data and encoded audio data falls within a predetermined bit rate range, the method comprising:
a step of calculating a surplus code amount of the audio in the period of a video encoding control unit from the generated code amount of audio that has already been encoded;
a step of determining whether a predetermined condition is satisfied, the condition concerning a video feature or an encoding state that affects the generated code amount of the video and being used to decide whether to add the surplus code amount to the target code amount of the video encoding control unit;
a step of determining the target code amount of the video encoding control unit to be encoded next, by adding part or all of the surplus code amount to the target code amount of the video encoding control unit when the condition is determined to be satisfied, and by not adding the surplus code amount to the target code amount of the video encoding control unit when the condition is determined not to be satisfied; and
a step of encoding the video signal of the video encoding control unit in accordance with the determined target code amount.

The video code amount control method according to claim 1,
wherein the condition is one of the following: that the average quantization parameter of a predetermined number of past video frames is equal to or greater than a predetermined threshold; that the average quantization step of a predetermined number of past video frames is equal to or greater than a predetermined threshold; that the activity average value of the video encoding control unit to be encoded is equal to or greater than a predetermined threshold; that there is a scene change in the video encoding control unit to be encoded; or that the sum of the differences between the target code amount and the generated code amount over a predetermined number of past video frames is equal to or greater than a predetermined threshold.

The video code amount control method according to claim 1 or 2,
wherein the surplus code amount of the audio is calculated as the sum, over the group of audio encoding control units in the same display period as the video encoding control unit, of the difference between a predetermined maximum generated code amount of each audio encoding control unit and its actual generated code amount.

The video code amount control method according to claim 1 or 2,
wherein the surplus code amount of the audio is calculated as the sum, over the group of audio encoding control units in the same display period as a group of video frames that precedes the video encoding control unit by a predetermined number of video frames and has the same length as the video encoding control unit, of the difference between a predetermined maximum generated code amount of each audio encoding control unit and its actual generated code amount.

A video code amount control method for controlling a video code amount so that an encoded stream obtained by multiplexing encoded video data and other encoded data falls within a predetermined bit rate range, the method comprising:
a step of calculating a surplus code amount of the other encoded data in the period of a video encoding control unit from the generated code amount of the other encoded data that has already been encoded;
a step of determining whether a predetermined condition is satisfied, the condition concerning a video feature or an encoding state that affects the generated code amount of the video and being used to decide whether to add the surplus code amount to the target code amount of the video encoding control unit;
a step of determining the target code amount of the video encoding control unit to be encoded next, by adding part or all of the surplus code amount to the target code amount of the video encoding control unit when the condition is determined to be satisfied, and by not adding the surplus code amount to the target code amount of the video encoding control unit when the condition is determined not to be satisfied; and
a step of encoding the video signal of the video encoding control unit in accordance with the determined target code amount.

A video encoding device that encodes a video signal while controlling the video code amount so that an encoded stream obtained by multiplexing encoded video data and encoded audio data falls within a predetermined bit rate range, the device comprising:
means for acquiring the generated code amount of audio that has already been encoded and calculating a surplus code amount of the audio in the period of a video encoding control unit;
means for determining whether a predetermined condition is satisfied, the condition concerning a video feature or an encoding state that affects the generated code amount of the video and being used to decide whether to add the surplus code amount to the target code amount of the video encoding control unit;
means for determining the target code amount of the video encoding control unit to be encoded next, by adding part or all of the surplus code amount to the target code amount of the video encoding control unit when the condition is determined to be satisfied, and by not adding the surplus code amount to the target code amount of the video encoding control unit when the condition is determined not to be satisfied; and
means for encoding the video signal of the video encoding control unit in accordance with the determined target code amount.

A video code amount control program for causing a computer to execute the video code amount control method according to any one of claims 1 to 5.

A computer-readable recording medium having recorded thereon a video code amount control program for causing a computer to execute the video code amount control method according to any one of claims 1 to 5.