JP4844456B2

JP4844456B2 - Video signal hierarchical encoding apparatus, video signal hierarchical encoding method, and video signal hierarchical encoding program

Info

Publication number: JP4844456B2
Application number: JP2007108016A
Authority: JP
Inventors: 和博嶋内; 智坂爪; 徹熊倉; 基晴上田
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2006-06-15
Filing date: 2007-04-17
Publication date: 2011-12-28
Anticipated expiration: 2027-04-17
Also published as: JP2008022531A

Description

The present invention relates to a video signal hierarchical encoding apparatus, a video signal hierarchical encoding method, and a video signal hierarchical encoding program for performing hierarchical encoding of a video signal.

Many coding schemes that provide scalability in spatial resolution, temporal resolution, and SNR have been proposed for video coding, and they have been put to practical use in various fields. Spatial resolution scalability in particular has a wide range of application, including still image coding.

As a conventional technique for realizing spatial resolution scalability of video, there is, for example, a two-layer hierarchical encoding apparatus with a base layer and an enhancement layer that operates as follows. An input video signal having the same spatial resolution as the enhancement layer is reduced (decimated) to the spatial resolution of the base layer and then encoded in the base layer. The decoded signal obtained during base layer encoding is spatially interpolated to the same spatial resolution as the enhancement layer, and prediction exploiting the correlation between this interpolated signal and the input video signal is performed. The prediction error signal is encoded, and the resulting encoded bitstream is multiplexed with the bitstream obtained by base layer encoding and transmitted to a decoding apparatus, which decodes the multiplexed bitstream by the reverse procedure (see, for example, Patent Document 1).
Patent Document 1: Japanese Unexamined Patent Publication No. 7-162870
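The conventional two-layer scheme described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the signal is one-dimensional, base layer encoding and decoding are replaced by uniform quantization, interpolation is plain sample repetition, and the prediction error is kept lossless. The function names are hypothetical and do not come from Patent Document 1.

```python
def decimate(signal, factor=2):
    """Reduce resolution by averaging each group of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]

def quantize(signal, step):
    """Stand-in for base layer encode + decode: uniform quantization."""
    return [round(v / step) * step for v in signal]

def interpolate(signal, factor=2):
    """Enlarge by sample repetition (the simplest possible interpolation)."""
    return [v for v in signal for _ in range(factor)]

def conventional_two_layer(original, step):
    base = decimate(original)                 # reduce to base layer resolution
    base_decoded = quantize(base, step)       # base layer decoded signal
    prediction = interpolate(base_decoded)    # inter-layer prediction signal
    residual = [o - p for o, p in zip(original, prediction)]  # prediction error
    return base_decoded, prediction, residual

original = [10, 12, 20, 22, 30, 28, 16, 14]
base_decoded, prediction, residual = conventional_two_layer(original, step=4)

# The decoder side repeats the interpolation and adds the prediction error.
reconstructed = [p + r for p, r in zip(prediction, residual)]
```

The smaller the entries of `residual`, the fewer bits the enhancement layer needs, which is why the quality of the prediction signal matters.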

In the background art described in Patent Document 1, the base layer decoded signal is interpolated and used as the prediction signal in enhancement layer encoding. This exploits the fact that there is a certain degree of correlation between the original video signal input to the enhancement layer and the base layer signal, that is, the base layer signal carries part of the frequency components of the original video signal. Accordingly, the higher the correlation between the base layer decoded signal and the original video signal input to the enhancement layer, the higher the coding efficiency.

However, the base layer decoded signal is a degraded signal obtained by reducing (decimating) the input video signal, so it lacks the original high-frequency components, and when quantization is coarse its correlation with the original video signal may be lower still. To achieve more efficient encoding, it is therefore considered necessary to obtain the prediction signal not by simply interpolating the base layer decoded signal but by performing estimation processing (resolution-enhancement processing) that brings the signal closer to the original video signal.

Accordingly, an object of the present invention is to realize more efficient hierarchical video encoding by performing appropriate resolution-enhancement processing on the prediction signal.

To solve the above problem, the present invention provides the following apparatus, method, and program.
(1) A video signal hierarchical encoding apparatus that encodes a video signal having a lower resolution than an input video signal, obtained by decomposing the input video signal into layers of different resolutions, generates a prediction signal from the lower-resolution video signal, encodes the higher-resolution input video signal by inter-spatial-resolution prediction using the prediction signal, and thereby obtains encoded data of video signals of different resolutions, the apparatus comprising:
spatial reduction means for spatially reducing the input video signal to obtain a first video signal having a lower resolution than the input video signal;
first encoding means for obtaining first encoded data by encoding the first video signal using encoding processing that includes decoding processing;
high-frequency estimation means for estimating, from the decoded signal obtained by the decoding processing, high-frequency components that can be represented at a spatial resolution equal to or higher than that of the decoded signal, and generating a high-frequency component estimation signal;
over-emphasis suppression means for suppressing over-emphasis of the high-frequency component estimation signal in the process of generating the high-frequency component estimation signal;
spatial enlargement means for performing resolution-enhancement processing in which at least one of the degree of high-frequency component estimation in the high-frequency estimation means and the degree of over-emphasis suppression in the over-emphasis suppression means is controlled according to the quantization parameter used in the first encoding means, thereby obtaining a second video signal that is a resolution-enhanced enlarged video signal in which the decoded signal is spatially enlarged based on the high-frequency component estimation signal;
second encoding means for obtaining second encoded data, which is encoded data of the higher-resolution video signal, by encoding the input video signal by inter-spatial-resolution prediction using a prediction signal;
prediction signal selection means for selecting, as the prediction signal used in the second encoding means, one of a predetermined prediction signal obtained in the layer having the spatial resolution being encoded by the second encoding means and the second video signal, which is a prediction signal obtained from the lower-resolution layer based on the high-frequency component estimation signal; and
multiplexing means for multiplexing the first and second encoded data and the data of the quantization parameter.
(2) A video signal hierarchical encoding method that encodes a video signal having a lower resolution than an input video signal, obtained by decomposing the input video signal into layers of different resolutions, generates a prediction signal from the lower-resolution video signal, encodes the higher-resolution input video signal by inter-spatial-resolution prediction using the prediction signal, and thereby obtains encoded data of video signals of different resolutions, the method comprising:
a spatial reduction step of spatially reducing the input video signal to obtain a first video signal having a lower resolution than the input video signal;
a first encoding step of obtaining first encoded data by encoding the first video signal using encoding processing that includes decoding processing;
a high-frequency estimation step of estimating, from the decoded signal obtained by the decoding processing, high-frequency components that can be represented at a spatial resolution equal to or higher than that of the decoded signal, and generating a high-frequency component estimation signal;
an over-emphasis suppression step of suppressing over-emphasis of the high-frequency component estimation signal in the process of generating the high-frequency component estimation signal;
a spatial enlargement step of performing resolution-enhancement processing in which at least one of the degree of high-frequency component estimation in the high-frequency estimation step and the degree of over-emphasis suppression in the over-emphasis suppression step is controlled according to the quantization parameter used in the first encoding step, thereby obtaining a second video signal that is a resolution-enhanced enlarged video signal in which the decoded signal is spatially enlarged based on the high-frequency component estimation signal;
a second encoding step of obtaining second encoded data, which is encoded data of the higher-resolution video signal, by encoding the input video signal by inter-spatial-resolution prediction using a prediction signal;
a prediction signal selection step of selecting, as the prediction signal used in the second encoding step, one of a predetermined prediction signal obtained in the layer having the spatial resolution being encoded in the second encoding step and the second video signal, which is a prediction signal obtained from the lower-resolution layer based on the high-frequency component estimation signal; and
a multiplexing step of multiplexing the first and second encoded data and the data of the quantization parameter.
(3) A video signal hierarchical encoding program for causing a computer to execute an operation of encoding a video signal having a lower resolution than an input video signal, obtained by decomposing the input video signal into layers of different resolutions, generating a prediction signal from the lower-resolution video signal, encoding the higher-resolution input video signal by inter-spatial-resolution prediction using the prediction signal, and thereby obtaining encoded data of video signals of different resolutions, the program causing the computer to function as:
spatial reduction means for spatially reducing the input video signal to obtain a first video signal having a lower resolution than the input video signal;
first encoding means for obtaining first encoded data by encoding the first video signal using encoding processing that includes decoding processing;
high-frequency estimation means for estimating, from the decoded signal obtained by the decoding processing, high-frequency components that can be represented at a spatial resolution equal to or higher than that of the decoded signal, and generating a high-frequency component estimation signal;
over-emphasis suppression means for suppressing over-emphasis of the high-frequency component estimation signal in the process of generating the high-frequency component estimation signal;
spatial enlargement means for performing resolution-enhancement processing in which at least one of the degree of high-frequency component estimation in the high-frequency estimation means and the degree of over-emphasis suppression in the over-emphasis suppression means is controlled according to the quantization parameter used in the first encoding means, thereby obtaining a second video signal that is a resolution-enhanced enlarged video signal in which the decoded signal is spatially enlarged based on the high-frequency component estimation signal;
second encoding means for obtaining second encoded data, which is encoded data of the higher-resolution video signal, by encoding the input video signal by inter-spatial-resolution prediction using a prediction signal;
prediction signal selection means for selecting, as the prediction signal used in the second encoding means, one of a predetermined prediction signal obtained in the layer having the spatial resolution being encoded by the second encoding means and the second video signal, which is a prediction signal obtained from the lower-resolution layer based on the high-frequency component estimation signal; and
multiplexing means for multiplexing the first and second encoded data and the data of the quantization parameter.

According to the present invention, when the low-resolution decoded signal from the first encoding is spatially enlarged, resolution-enhancement processing in which the degree of high-frequency component estimation is controlled according to the quantization parameter used in the first encoding is performed to obtain the second video signal, and the input video signal is encoded by inter-spatial-resolution prediction using the second video signal as the prediction signal. Unlike the simple interpolation (spatial enlargement) used for inter-layer prediction in conventional hierarchical video encoding, this allows appropriate resolution-enhancement processing matched to the quantization parameter of the first encoding and reduces the inter-layer prediction error, so that efficient, higher-quality hierarchical video signal encoding can be realized.
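One way to picture resolution-enhancement processing whose strength follows the quantization parameter is the sketch below. Everything specific in it is an assumption made for illustration: the one-dimensional signal, the linear-interpolation enlargement, the [-1, 2, -1]/4 high-pass filter, the 0 to 51 parameter range, and in particular the direction and shape of the mapping in `control_from_qp` (coarser quantization gives weaker estimation and a tighter limit). This section does not specify any of these, so the code is not the method of the invention itself.

```python
def enlarge(signal):
    """2x enlargement by linear interpolation (last sample repeated at the edge)."""
    out = []
    for i, v in enumerate(signal):
        nxt = signal[i + 1] if i + 1 < len(signal) else v
        out.extend([v, (v + nxt) / 2])
    return out

def high_pass(signal):
    """[-1, 2, -1]/4 high-pass filter with edge replication."""
    n = len(signal)
    return [(2 * signal[i] - signal[max(i - 1, 0)] - signal[min(i + 1, n - 1)]) / 4
            for i in range(n)]

def control_from_qp(qp, qp_max=51):
    """Hypothetical mapping from quantization parameter to (gain, limit)."""
    reliability = 1.0 - min(max(qp, 0), qp_max) / qp_max
    gain = 2.0 * reliability          # degree of high-frequency estimation
    limit = 1.0 + 7.0 * reliability   # bound used for over-emphasis suppression
    return gain, limit

def high_resolution_estimate(base_decoded, qp):
    gain, limit = control_from_qp(qp)
    enlarged = enlarge(base_decoded)
    detail = [gain * h for h in high_pass(enlarged)]          # estimated high band
    detail = [max(-limit, min(limit, d)) for d in detail]     # suppress over-emphasis
    return [e + d for e, d in zip(enlarged, detail)]

sharp = high_resolution_estimate([0, 100, 0], qp=0)
plain = high_resolution_estimate([10, 20, 40], qp=51)
```

Because the gain and limit are derived only from the quantization parameter, a decoder that knows the same parameter can repeat the computation without receiving any extra control data.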

Furthermore, because the video signal hierarchical encoding apparatus can be configured to generate, from the low-resolution signal, a prediction signal closer to the input video signal (high-resolution signal) while taking the encoding characteristics of the low-resolution signal into account, efficient hierarchical video encoding with strengthened resolution-enhancement processing of the prediction signal can be realized.

In addition, in the present invention the encoding characteristics of the low-resolution signal are judged from the quantization parameter, so the resolution-enhancement processing can be controlled with little computation and no new parameter needs to be added for transmission. This is advantageous in terms of both circuit scale and coding efficiency.

The present invention involves two new concepts. The first is introducing, into conventional hierarchical encoding, estimation processing that raises the prediction efficiency between layers. The second is bringing the decoded signal obtained in the course of encoding the lower-resolution video signal (the base layer decoded signal) closer to the input video signal on the basis of the encoding characteristics of the base layer decoded signal. Embodiments of a configuration, a method, and a program for realizing these are described below. For simplicity, the embodiments below take two-layer hierarchical encoding and decoding with a base layer and an enhancement layer as an example, but the invention may also be implemented with three or more layers.

[Embodiment 1]
FIG. 1 shows a configuration example of a video signal hierarchical encoding and decoding apparatus that realizes spatial resolution scalability and to which Embodiment 1 of the present invention is applied.

In FIG. 1, an original video signal is input to a video signal hierarchical encoding apparatus 101, and the bitstream generated by the video signal hierarchical encoding apparatus 101 is transmitted to a video signal hierarchical decoding apparatus 103 via a network 102 such as a telephone line or a communication line. The video signal hierarchical decoding apparatus 103 extracts the necessary information from the supplied bitstream and outputs a decoded video signal with a spatial resolution suited to the performance of the display or other output device. The network 102 may be wired or wireless. Instead of the network 102, the video signal hierarchical encoding apparatus 101 and the video signal hierarchical decoding apparatus 103 may of course exchange the bitstream via a recording medium such as a DVD or a memory.

The video signal hierarchical encoding apparatus 101 has at least a spatial decimation unit (spatial reduction means) 104, a base layer encoding unit (first encoding means) 105, a high-resolution estimated signal generation unit (spatial enlargement means, third encoding means) 106, an enhancement layer encoding unit (second encoding means) 107, and a multiplexing unit 108.

The spatial decimation unit 104 receives the original video signal as input and spatially decimates it to a desired spatial resolution (that is, lowers its resolution). Several spatial decimation methods are possible, but in order to exploit the same relationship as in a Laplacian pyramid, it is desirable to use a method that corresponds to the filter used in the high-resolution estimated signal generation unit 106 described later. It is also desirable that the method support arbitrary reduction ratios. The spatial decimation unit 104 outputs the signal decimated to the desired spatial resolution to the base layer encoding unit 105.
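The Laplacian pyramid relationship mentioned above pairs a reduction filter with a matching enlargement so that the original equals the enlarged low-resolution signal plus a detail signal. The sketch below illustrates that relationship on a one-dimensional signal. The [1, 2, 1]/4 kernel, the fixed 2:1 ratio, and the function names are assumptions chosen for brevity, not the filters of the embodiment.

```python
def reduce_2x(signal):
    """Low-pass with a [1, 2, 1]/4 kernel (edges replicated), then keep every 2nd sample."""
    n = len(signal)
    smoothed = [(signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
                for i in range(n)]
    return smoothed[::2]

def expand_2x(signal, length):
    """Enlarge by linear interpolation and trim to `length` samples."""
    out = []
    for i, v in enumerate(signal):
        nxt = signal[i + 1] if i + 1 < len(signal) else v
        out.extend([v, (v + nxt) / 2])
    return out[:length]

original = [8, 8, 16, 32, 32, 16, 8, 8]
low = reduce_2x(original)                          # base layer resolution
expanded = expand_2x(low, len(original))           # back at the original resolution
laplacian = [o - e for o, e in zip(original, expanded)]   # detail lost by reduction
restored = [e + l for e, l in zip(expanded, laplacian)]
```

Here `laplacian` plays the role of the inter-layer prediction error: the better the enlargement matches the reduction, the smaller it becomes.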

The base layer encoding unit 105 receives the output of the spatial decimation unit 104 as input, encodes the input signal to generate a bitstream, and outputs the bitstream to the multiplexing unit 108. Several encoding methods are possible; for example, a closed-loop encoder such as MPEG-2 or H.264 is used. The encoder may also include functions such as temporal scalability and SNR scalability. When an open-loop encoder is used, the encoder is assumed to include a decoding (reconstruction) function. The base layer encoding unit 105 also outputs the signal decoded within the unit and the quantization parameter used for encoding to the high-resolution estimated signal generation unit 106, which has a spatial interpolation (spatial enlargement) function. A detailed configuration diagram of the base layer encoding unit 105 is not shown. When it is configured as a closed-loop encoder, for example, its configuration is almost the same as that of the enhancement layer encoding unit 107 shown in detail in FIG. 4, with two differences. The frame memory of the base layer encoding unit 105 receives the base layer video signal obtained by decimating the input video signal in the spatial decimation unit 104. The prediction signal selection unit of the base layer encoding unit 105 does not receive the high-resolution estimation signal from the high-resolution estimated signal generation unit, so it selects either the prediction signal from the intra prediction unit or the prediction signal from the motion compensation unit.

The high-resolution estimated signal generation unit 106 receives the decoded signal and the quantization parameter output from the base layer encoding unit 105 as input and estimates a video signal of the original resolution from the base layer decoded signal. Details are described later. It also outputs the signal obtained by estimating the original high-resolution video signal from the base layer decoded signal to the enhancement layer encoding unit 107.

The enhancement layer encoding unit 107 receives the original video signal and the signal output from the high-resolution estimated signal generation unit 106 as input. Using these input signals, it performs prediction that exploits correlation between spatial resolutions and correlation in time, and encodes the resulting prediction error signal. Details are described later. It also outputs the bitstream generated by the encoding to the multiplexing unit 108.
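Choosing between prediction within the enhancement layer and prediction from the lower-resolution layer can be illustrated as picking, for each block, the candidate that leaves the smallest prediction error. The selection criterion used here (sum of absolute differences) and the candidate names are assumptions for illustration; this section does not state the criterion actually used.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and a candidate prediction."""
    return sum(abs(b - p) for b, p in zip(block, prediction))

def select_prediction(block, candidates):
    """Return (name, prediction, residual) for the candidate with the lowest SAD."""
    name = min(candidates, key=lambda k: sad(block, candidates[k]))
    prediction = candidates[name]
    residual = [b - p for b, p in zip(block, prediction)]
    return name, prediction, residual

block = [50, 52, 60, 61]                      # samples of the original video signal
candidates = {
    "intra": [48, 48, 48, 48],                # prediction within the same picture
    "motion_compensated": [50, 53, 58, 60],   # temporal prediction
    "inter_layer": [51, 52, 60, 62],          # high-resolution estimation signal
}
name, prediction, residual = select_prediction(block, candidates)
```

In this toy example the inter-layer candidate wins, so only its small residual would need to be encoded.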

The multiplexing unit 108 receives the base layer and enhancement layer bitstreams output from the base layer encoding unit 105 and the enhancement layer encoding unit 107, multiplexes them into a single multiplexed bitstream structured, for example, as shown in FIG. 11 described later, and outputs it outside the video signal hierarchical encoding apparatus 101, for example to the network 102 such as a communication line or media.

The video signal hierarchical decoding apparatus 103 has at least an extract unit (separation means) 109, a base layer decoding unit (first decoding means) 110, a high-resolution estimated signal restoration unit (restoration means) 111, and an enhancement layer decoding unit (second decoding means) 112.

The extract unit 109 receives as input a multiplexed bitstream that has been hierarchically encoded and multiplexed by the video signal hierarchical encoding apparatus 101 or the like and that is structured, for example, as shown in FIG. 11 described later. In accordance with the performance of the video signal hierarchical decoding apparatus 103 or the display, it cuts out from the whole bitstream the parts needed for decoding, divides them, and outputs them to the base layer decoding unit 110, the high-resolution estimated signal restoration unit 111, and the enhancement layer decoding unit 112.
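The interplay between multiplexing and extraction can be sketched with a toy container format. The chunk layout used here (a 1-byte layer identifier followed by a 4-byte big-endian length and the payload) is purely hypothetical and is not the structure of FIG. 11, which is not reproduced in this section.

```python
import struct

BASE, ENHANCEMENT = 0, 1   # hypothetical layer identifiers

def multiplex(layers):
    """Concatenate (layer_id, payload) pairs as [id:1][length:4][payload] chunks."""
    return b"".join(struct.pack(">BI", layer_id, len(payload)) + payload
                    for layer_id, payload in layers)

def extract(stream, max_layer):
    """Return {layer_id: payload} for the layers up to `max_layer`, skipping the rest."""
    selected, pos = {}, 0
    while pos < len(stream):
        layer_id, length = struct.unpack_from(">BI", stream, pos)
        pos += struct.calcsize(">BI")
        if layer_id <= max_layer:
            selected[layer_id] = stream[pos:pos + length]
        pos += length
    return selected

stream = multiplex([(BASE, b"base-bits"), (ENHANCEMENT, b"enh-bits")])
low_res_only = extract(stream, max_layer=BASE)         # e.g. a small display
full_res = extract(stream, max_layer=ENHANCEMENT)      # e.g. a full-resolution display
```

A receiver that only needs the base layer resolution can drop the enhancement chunk without parsing its contents.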

The base layer decoding unit 110 receives as input the base layer bitstream cut out by the extract unit 109. It decodes the input bitstream and outputs the decoded video signal to the high-resolution estimated signal restoration unit 111 and, as necessary, to a display or the like. It also outputs the quantization parameter used for decoding to the high-resolution estimated signal restoration unit 111. Decoding uses, for example, MPEG-2 or H.264. The decoder may also include functions such as temporal scalability and SNR scalability.

The high-resolution estimated signal restoration unit 111 receives as input the base layer decoded signal and the quantization parameter output from the base layer decoding unit 110. Using the quantization parameter, it restores the high-resolution estimation signal from the base layer decoded signal and outputs that signal to the enhancement layer decoding unit 112. Details are described later.

The enhancement layer decoding unit 112 receives as input the bitstream obtained from the extract unit 109 and the high-resolution estimation signal output from the high-resolution estimated signal restoration unit 111. It decodes the bitstream and, using the resulting signal together with the high-resolution estimation signal, decodes a signal at the spatial resolution of the original video signal. The decoded video signal is output to a display or the like.

FIG. 2 shows the procedure for spatially scalable encoding of a video signal using the configuration example of the video signal hierarchical encoding apparatus 101 shown in FIG. 1.

First, the spatial decimation unit 104 decimates the spatial resolution of the original video signal [step S201]. The base layer encoding unit 105 encodes the spatially decimated signal to generate a base layer bitstream [step S202]. The generated bitstream is sent to the multiplexing unit 108, and the base layer decoded signal and the quantization parameter are sent to the high-resolution estimated signal generation unit 106. The high-resolution estimated signal generation unit 106 estimates a high-resolution video signal using the base layer decoded signal and the quantization parameter [step S203]. Details are described later. The high-resolution estimation signal generated here is sent to the enhancement layer encoding unit 107. The enhancement layer encoding unit 107 uses the original video signal and the high-resolution estimation signal from the high-resolution estimated signal generation unit 106 to perform prediction that exploits correlation between spatial resolutions and correlation in time, and encodes the resulting prediction error signal [step S204]. The enhancement layer bitstream generated by this encoding is sent to the multiplexing unit 108. The multiplexing unit 108 multiplexes the bitstreams of the respective layers obtained from the base layer encoding unit 105 and the enhancement layer encoding unit 107 to generate a single bitstream [step S205].

図1に示した映像信号階層復号化装置103の構成例を用いて空間スケーラブル構成のビットストリームを復号してデコード映像信号を得る手順を図3に示す。 FIG. 3 shows the procedure for decoding a spatially scalable bitstream to obtain a decoded video signal using the configuration example of the video signal hierarchical decoding device 103 shown in FIG. 1.

通信回線やメディア等を含むネットワーク102からビットストリームをエクストラクト部109を用いて受信する。ビットストリームを解析し、映像信号階層復号化装置103およびディスプレイ等の性能に合わせて必要な符号データを抽出する。そして、ベースレイヤデコード部110及びエンハンスメントレイヤデコード部112それぞれのレイヤに対応したデータに分割して出力する[ステップS301]。 A bit stream is received from the network 102 including a communication line and media using the extract unit 109. The bit stream is analyzed, and necessary code data is extracted in accordance with the performance of the video signal hierarchical decoding device 103 and the display. Then, the data is divided into data corresponding to the respective layers of the base layer decoding unit 110 and the enhancement layer decoding unit 112 and output [step S301].

エクストラクト部109で分割したベースレイヤに対応するデータをベースレイヤデコード部110で復号する[ステップS302]。復号したベースレイヤデコード映像信号及び量子化パラメータを高解像度推定信号復元部111に出力し、必要があればベースレイヤデコード映像信号をディスプレイ等にも出力する。高解像度推定信号復元部111では、ベースレイヤデコード部110より得られるベースレイヤのデコード映像信号と量子化パラメータを用いて高解像度推定信号を復元する[ステップS303]。そして、復元した高解像度推定信号をエンハンスメントレイヤデコード部112に送る。エンハンスメントレイヤデコード部112では、エクストラクト部109から得られるエンハンスメントレイヤに対応するデータを復号し、そこで得られる信号と高解像度推定信号復元部111からの高解像度推定信号とを用いてオリジナルの映像信号の解像度の再生映像をデコードする[ステップS304]。そして、復号したデコード映像信号をディスプレイ等へ出力する。 The base layer decoding unit 110 decodes the data corresponding to the base layer separated by the extract unit 109 [step S302]. The decoded base layer video signal and the quantization parameter are output to the high-resolution estimated signal restoration unit 111 and, if necessary, the decoded base layer video signal is also output to a display or the like. The high-resolution estimated signal restoration unit 111 restores the high-resolution estimated signal using the base layer decoded video signal obtained from the base layer decoding unit 110 and the quantization parameter [step S303], and sends the restored signal to the enhancement layer decoding unit 112. The enhancement layer decoding unit 112 decodes the data corresponding to the enhancement layer obtained from the extract unit 109 and, using the resulting signal together with the high-resolution estimated signal from the high-resolution estimated signal restoration unit 111, decodes the playback video at the resolution of the original video signal [step S304]. The decoded video signal is then output to a display or the like.
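The encoding flow of FIG. 2 and the decoding flow of FIG. 3 can be illustrated with a deliberately simplified one-dimensional round trip. Everything in this sketch is an assumption made for illustration: pair averaging stands in for the spatial decimation unit 104, uniform quantization stands in for the base layer and enhancement layer codecs, sample repetition stands in for the high-resolution estimation of units 106 and 111, and a Python dict stands in for the multiplexed bitstream. None of these are the methods of the embodiment.

```python
from typing import Dict, List


def decimate(signal: List[float], factor: int = 2) -> List[float]:
    """S201 stand-in: average each group of `factor` samples."""
    groups = [signal[i:i + factor] for i in range(0, len(signal), factor)]
    return [sum(g) / len(g) for g in groups]


def quantize(signal: List[float], step: float) -> List[int]:
    """Uniform quantizer used as a toy codec for both layers."""
    return [round(v / step) for v in signal]


def dequantize(levels: List[int], step: float) -> List[float]:
    return [level * step for level in levels]


def upsample(signal: List[float], factor: int = 2) -> List[float]:
    """S203/S303 stand-in: sample repetition instead of the real estimator."""
    return [v for v in signal for _ in range(factor)]


def encode(original: List[float], base_step: float = 4.0,
           enh_step: float = 1.0) -> Dict[str, object]:
    low = decimate(original)                                  # S201
    base_levels = quantize(low, base_step)                    # S202
    base_decoded = dequantize(base_levels, base_step)         # local decode
    estimate = upsample(base_decoded)                         # S203
    residual = [o - e for o, e in zip(original, estimate)]    # S204
    enh_levels = quantize(residual, enh_step)
    return {"base": base_levels, "base_step": base_step,      # S205
            "enh": enh_levels, "enh_step": enh_step}


def decode(stream: Dict[str, object], enhancement: bool = True) -> List[float]:
    base_decoded = dequantize(stream["base"], stream["base_step"])   # S302
    if not enhancement:
        return base_decoded          # low-resolution output only
    estimate = upsample(base_decoded)                                # S303
    residual = dequantize(stream["enh"], stream["enh_step"])
    return [e + r for e, r in zip(estimate, residual)]               # S304
```

A decoder that only needs the low resolution can call `decode(stream, enhancement=False)` and ignore the enhancement data, which mirrors the extract unit 109 selecting code data according to the capability of the decoder and display.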

高解像度推定信号生成部106及びエンハンスメントレイヤエンコード部107の詳細な構成例を示したものが、図4である。 FIG. 4 shows a detailed configuration example of the high-resolution estimated signal generation unit 106 and the enhancement layer encoding unit 107.

高解像度推定信号生成部106は、第1のハイパスフィルタリング部403、第1のインターポレーション部404、振幅制限・定数倍処理部405、第2のハイパスフィルタリング部406、第2のインターポレーション部407、加算器408、推定度判断部409を少なくとも有している。 The high-resolution estimated signal generation unit 106 includes at least a first high-pass filtering unit 403, a first interpolation unit 404, an amplitude limiting / constant multiplication processing unit 405, a second high-pass filtering unit 406, a second interpolation unit 407, an adder 408, and an estimation degree determination unit 409.

第1のハイパスフィルタリング部403は、ベースレイヤのデコード信号を入力として受け付け、入力信号から高周波数成分としてラプラシアン成分を抽出する機能を有する。高周波数成分は次の式(1)、(2)によって求める。なお、下記に説明する式(1)〜(8)を利用した高周波数成分推定を伴う画像拡大処理は、例えば、「高周波成分推定を伴う任意倍率可能な画像拡大法」（信学論(A), vol. J84-A, no. 9, pp1192-1201, Sep. 2001.；高橋靖正, 田口亮著）を参考にしている。 The first high-pass filtering unit 403 receives the base layer decoded signal as an input and extracts a Laplacian component from it as the high-frequency component. The high-frequency component is obtained by the following equations (1) and (2). The image enlargement process with high-frequency component estimation using equations (1) to (8) described below is based on, for example, "An image enlargement method allowing arbitrary magnification with high-frequency component estimation" (IEICE Trans. (A), vol. J84-A, no. 9, pp. 1192-1201, Sep. 2001, by Yasumasa Takahashi and Akira Taguchi).

入力信号のラプラシアン成分の抽出は次のように行う。ここで、説明を簡単にするために、1次元の信号モデルを例にして、入力信号をG₀(x)、入力信号から抽出されるラプラシアン成分をL₀(x)とする。 The Laplacian component of the input signal is extracted as follows. For simplicity, a one-dimensional signal model is used as an example, with the input signal denoted G₀(x) and the Laplacian component extracted from it denoted L₀(x).

ここで、ρは、ガウシアンフィルタの帯域を調整するためのパラメータである。 Here, ρ is a parameter for adjusting the band of the Gaussian filter.

つまり、式(1)、(2)では、入力信号からガウシアン関数を用いて高周波数成分としてラプラシアン成分の信号を抽出しているが、これを他の方法に置き換えても良い。ただし、ここで用いるフィルタや補間関数等と、空間デシメーション部104、第1のインターポレーション部404、第2のハイパスフィルタリング部406及び第2のインターポレーション部407に用いるフィルタや補間関数等の関係は、ピラミッド構成を満たすものとなっていることが望ましい。例えば、空間デシメーション部にsinc関数を用いた場合、第1のインターポレーション部404、第2のハイパスフィルタリング部406及び第2のインターポレーション部407にもsinc関数を用いることでsinc関数によるピラミッド構成の関係が構築できる。また、第1のハイパスフィルタリング部403は、ここで得た高周波数成分を第1のインターポレーション部404へ出力する。 That is, equations (1) and (2) use a Gaussian function to extract the Laplacian component signal from the input signal as the high-frequency component, but another method may be substituted. It is desirable, however, that the filters, interpolation functions, and so on used here and those used in the spatial decimation unit 104, the first interpolation unit 404, the second high-pass filtering unit 406, and the second interpolation unit 407 together satisfy a pyramid structure. For example, if a sinc function is used in the spatial decimation unit, using a sinc function in the first interpolation unit 404, the second high-pass filtering unit 406, and the second interpolation unit 407 as well establishes a sinc-based pyramid relationship. The first high-pass filtering unit 403 outputs the high-frequency component obtained here to the first interpolation unit 404.
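The Laplacian extraction described above, in which a Gaussian-smoothed copy of the input is subtracted from the input itself, can be sketched as follows. Equations (1) and (2) are not reproduced in this text, so the kernel shape (a normalized Gaussian with ρ treated as a variance-like width), the kernel radius, and the edge-replication boundary handling are all assumptions of this sketch rather than the definitions used in the embodiment.

```python
import math
from typing import List


def gaussian_kernel(rho: float, radius: int) -> List[float]:
    """Assumed W(i): Gaussian weights normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * rho)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal: List[float], kernel: List[float]) -> List[float]:
    """Low-pass filter with edge replication at the boundaries (assumption)."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for x in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(x + k - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def laplacian(signal: List[float], rho: float = 1.0, radius: int = 3) -> List[float]:
    """L0(x) = G0(x) minus the Gaussian-smoothed G0(x)."""
    low = smooth(signal, gaussian_kernel(rho, radius))
    return [g - l for g, l in zip(signal, low)]
```

On a flat signal the result is zero everywhere; on a step edge it is negative just before the edge and positive just after it, which is the detail the later stages try to extend to higher frequencies.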

第1のインターポレーション部404は、第1のハイパスフィルタリング部403より出力される高周波数成分であるラプラシアン成分の信号を入力として受け付け、その信号をエンハンスメントレイヤに入力されるオリジナルの映像信号の解像度となるように、空間デシメーション部１０４における縮小率の逆数となる拡大率r、すなわち（エンハンスメントレイヤの解像度／ベースレイヤの解像度）にインターポレーションを行う。インターポレーションは、次の式(3)、(4)、(5)で実現可能である。 The first interpolation unit 404 receives the Laplacian component signal, which is the high-frequency component output from the first high-pass filtering unit 403, and interpolates it at the enlargement ratio r, the reciprocal of the reduction ratio used in the spatial decimation unit 104, that is, (enhancement layer resolution / base layer resolution), so that it has the resolution of the original video signal input to the enhancement layer. The interpolation can be realized by the following equations (3), (4), and (5).

つまり、拡大率rにインターポレーションされた信号(EXPAND)_rL₀(x)は、入力ラプラシアン成分信号をL₀(x)とすると、 That is, when the input Laplacian component signal is L₀(x), the signal (EXPAND)_r L₀(x) interpolated at the enlargement ratio r is given by equations (3), (4), and (5).

で与えられる。ここでint(・)は整数部分を取り出す操作を示す。 Here, int(·) denotes the operation of taking the integer part.

なお、ここでも、インターポレーションの方法(用いるフィルタ係数や補間関数など)は、式(3)、(4)、(5)以外のものを用いても良い。 Here again, interpolation methods (filter coefficients, interpolation functions, etc.) may be used other than equations (3), (4), and (5).

そして、第1のインターポレーション部404は、インターポレーションした信号を振幅制限・定数倍処理部405へ出力する。 Then, the first interpolation unit 404 outputs the interpolated signal to the amplitude limit / constant multiplication processing unit 405.
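Interpolation at an enlargement ratio r can be sketched as below. Since the text allows interpolation methods other than equations (3) to (5), this sketch swaps in plain linear interpolation; only the use of the integer part of x / r to locate the source sample follows the description. The function name `expand` and the edge handling are assumptions.

```python
from typing import List


def expand(signal: List[float], r: float) -> List[float]:
    """Resample `signal` to int(len(signal) * r) samples by linear interpolation."""
    n = len(signal)
    out = []
    for x in range(int(n * r)):
        pos = x / r                       # position in the low-resolution grid
        i = min(int(pos), n - 1)          # integer part, as int(.) in the text
        frac = pos - i                    # fractional offset from sample i
        nxt = min(i + 1, n - 1)           # replicate the last sample at the edge
        out.append((1.0 - frac) * signal[i] + frac * signal[nxt])
    return out
```

With r = 2 every second output sample coincides with an input sample and the samples in between are midpoints, so the output has the enhancement layer length for a base layer of half that length.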

振幅制限・定数倍処理部405は、パラメータ及び第1のインターポレーション部404より出力される信号入力として受け付け、未知の高周波数成分を推定するための第1工程を実施する。未知の高周波数成分を推定するための第1工程は式(6)で与えられる。 The amplitude limiting / constant multiplication processing unit 405 receives the parameters and the signal output from the first interpolation unit 404 as inputs, and performs the first step of estimating the unknown high-frequency component. This first step is given by equation (6).

つまり、入力される信号に対して、振幅制限と定数倍処理を行うことで実現される。生成される信号Ｌ_rバー(x)は、入力される信号を(EXPAND)_rL₀(x)とすると、 That is, this step is realized by applying amplitude limiting and constant multiplication to the input signal. When the input signal is (EXPAND)_r L₀(x), the generated signal L̄_r(x) is given by equation (6).

で与えられる。

ここで、振幅制限のためのパラメータT及び定数倍処理のためのパラメータα_rは、拡大率rだけではなくベースレイヤの量子化の程度にも推定精度が影響を受けるため、本実施の形態では、適切なパラメータT，α_rが得られるように、そのパラメータT，α_rの決定を行う推定度判定部409に接続されている。 Here, the estimation accuracy obtained with the amplitude-limiting parameter T and the constant-multiplication parameter α_r depends not only on the enlargement ratio r but also on the degree of quantization in the base layer. In the present embodiment, the amplitude limiting / constant multiplication processing unit 405 is therefore connected to the estimation degree determination unit 409, which determines T and α_r so that appropriate values are obtained.

そのため、本実施の形態１の振幅制限・定数倍処理部405は、推定度判定部409より出力されるパラメータを用いて未知の高周波数成分を推定するための第1工程を実施する。また、振幅制限・定数倍処理部405は、振幅制限・定数倍処理した信号を第2のハイパスフィルタリング部406へ出力する。 Therefore, the amplitude limiting / constant multiplication processing unit 405 of the first embodiment performs a first step for estimating an unknown high frequency component using the parameter output from the estimation degree determining unit 409. In addition, the amplitude limit / constant multiplication processing unit 405 outputs the signal subjected to the amplitude limit / constant multiplication processing to the second high-pass filtering unit 406.
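The first estimation step, amplitude limiting followed by constant multiplication, can be sketched in a few lines. Equation (6) is not reproduced in this text, so the exact form is an assumption here: the sketch clips each sample to the range [-T, T] and then multiplies by α_r. The order of the two operations in the embodiment may differ.

```python
from typing import List


def limit_and_scale(signal: List[float], T: float, alpha_r: float) -> List[float]:
    """Clip each sample to [-T, T], then scale by alpha_r (assumed form of step 1)."""
    return [alpha_r * max(-T, min(T, v)) for v in signal]
```

Clipping a smooth Laplacian lobe flattens its peak, which introduces harmonics above the band of the base layer; the second high-pass filter then keeps those new components. Smaller T and α_r make the added detail weaker, which is why they are tied to the base layer quantization.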

第2のハイパスフィルタリング部406は、振幅制限・定数倍処理部405より出力される信号を入力として受け付け、未知の高周波数成分を推定するための第2工程を実施する。未知の高周波数成分を推定するための第2工程は、次の式(7)で与えられる。 The second high-pass filtering unit 406 receives the signal output from the amplitude limiting / constant multiplication processing unit 405 as an input, and performs a second step for estimating an unknown high-frequency component. The second step for estimating the unknown high frequency component is given by the following equation (7).

つまり、未知の高周波数成分を推定するための第2工程は、振幅制限・定数倍処理部405によって振幅制限・定数倍処理された信号から低域成分を取り除き、本来求めようとしている高周波数成分のみを得るものである。これは、入力される信号に対してハイパスフィルタリングを行うことで実現される。ハイパスフィルタリングされた信号、すなわち、推定された未知の高周波数成分Ｌ_rハット(x)は、入力される信号をＬ_rバー(x)とすると、 In other words, the second step removes the low-frequency component from the signal processed by the amplitude limiting / constant multiplication processing unit 405, leaving only the high-frequency component that is actually sought. This is realized by high-pass filtering the input signal. When the input signal is L̄_r(x), the high-pass-filtered signal, that is, the estimated unknown high-frequency component L̂_r(x), is given by equation (7).

で与えられる。ここで、W(i)は式(2)に示したものである。 Here, W(i) is as defined in equation (2).

なお、ここでも、高周波数成分の抽出方法は式(7)以外のものを用いても良い。また、第2のハイパスフィルタリング部406は、推定された高周波数成分を加算器408へ出力する。 Here too, a method other than equation (7) may be used to extract the high-frequency component. The second high-pass filtering unit 406 outputs the estimated high-frequency component to the adder 408.

第2のインターポレーション部407は、ベースレイヤのデコード信号を入力として受け付け、その信号をエンハンスメントレイヤに入力されるオリジナルの映像信号の解像度となるように、拡大率r（エンハンスメントレイヤの解像度／ベースレイヤの解像度）にてインターポレーションを行う。インターポレーションは、次の式(8)で実現可能である。 The second interpolation unit 407 receives the base layer decoded signal as an input and interpolates it at the enlargement ratio r (enhancement layer resolution / base layer resolution) so that it has the resolution of the original video signal input to the enhancement layer. The interpolation can be realized by the following equation (8).

つまり、拡大率rにインターポレーションされた信号(EXPAND)_rG₀(x)は、入力信号をG₀(x)とすると、 That is, when the input signal is G₀(x), the signal (EXPAND)_r G₀(x) interpolated at the enlargement ratio r is given by equation (8).

で与えられる。ここで、W_r(i)は式(4)と式(5)で示したものである。なお、ここでも、インターポレーションの方法(用いるフィルタ係数や補間関数など)は、式(8)以外のものを用いても良い。 Here, W_r(i) is as defined in equations (4) and (5). Here too, an interpolation method (filter coefficients, interpolation function, and so on) other than equation (8) may be used.

また、第2のインターポレーション部407は、インターポレーションした信号を加算器408へ出力する。 The second interpolation unit 407 outputs the interpolated signal to the adder 408.

加算器408は、第2のハイパスフィルタリング部406より出力される信号と、第2のインターポレーション部407より出力される信号を入力として受け付け、それぞれの信号を足し合わせて出力する。 The adder 408 receives as input the signal output from the second high-pass filtering unit 406 and the signal output from the second interpolation unit 407, and outputs the sum of the signals.

推定度判断部409は、ベースレイヤエンコード部105から出力されるベースレイヤにおける量子化ステップまたは量子化の幅を制御する量子化パラメータを入力として受け付ける。そして、入力された量子化パラメータから適切な高周波数成分推定のためのパラメータα_rとTを決定する。前述のように、本発明による高周波数成分の推定は、ベースレイヤエンコード部105における量子化の程度によってその精度が異なる。つまり、量子化パラメータが大きくなると、それに伴ってベースレイヤデコード信号の劣化が大きくなるため、高周波数成分の推定精度が悪くなり、かえって符号化効率の低下を招くことになる。そこで、量子化パラメータと、推定のためのパラメータ、すなわち定数倍処理のためのパラメータα_rおよび振幅制限のためのパラメータTとの適切な関係をあらかじめ推定度判定部409に与えておき、これをもとにして、入力された量子化パラメータを適切な推定のためのパラメータα_rとTに変換する。 The estimation degree determination unit 409 receives, as an input, a quantization parameter for controlling a quantization step or a quantization width in the base layer output from the base layer encoding unit 105. Then, parameters α _r and T for appropriate high frequency component estimation are determined from the input quantization parameter. As described above, the accuracy of high frequency component estimation according to the present invention varies depending on the degree of quantization in the base layer encoding unit 105. That is, when the quantization parameter is increased, the degradation of the base layer decoded signal is increased accordingly, so that the estimation accuracy of the high frequency component is deteriorated and the encoding efficiency is lowered. Therefore, an appropriate relationship between the quantization parameter and the parameter for estimation, that is, the parameter α _r for constant multiplication processing and the parameter T for amplitude limitation is given to the estimation degree determination unit 409 in advance, Based on the input, the input quantization parameter is converted into parameters α _r and T for appropriate estimation.

例えば、本実施の形態１では、ベースレイヤエンコード部105における量子化パラメータと、定数倍処理のためのパラメータα_rと振幅制限のためのパラメータTの関係を図10（ａ）〜（ｄ）に示すように定義しておく。図10（ａ）〜（ｄ）に示すように、基本的にはベースレイヤエンコード部105における量子化パラメータが大きくなるにつれ、パラメータα_rとTの値を小さくなるようにする。パラメータα_rとTの値を小さくすることにより、振幅制限・定数倍処理部405において、符号化劣化を含んだ高周波数成分信号が増幅されるのを防ぎ、符号化劣化による誤推定が引き起こす符号化効率低下を防ぐことができる。量子化パラメータと、パラメータα_rとTの関係は、図10（ａ）〜（ｄ）の例のように、図10（ａ），（ｂ）に示すように2次曲線であったり、図10（ｃ）に示すように線形であったり、または、図10（ｄ）に示すようにステップ的な関係でもよく、要は、量子化パラメータが大きくなるにつれパラメータα_rとTの値が小さくなっていれば良い。なお、パラメータTは振幅制限のための閾値のパラメータであるため、量子化パラメータによらず一定値にしても良い。 For example, in the first embodiment, the relationship between the quantization parameter in the base layer encoding unit 105 and the constant-multiplication parameter α_r and amplitude-limiting parameter T is defined as shown in FIGS. 10(a) to 10(d). As these figures show, the values of α_r and T are basically made smaller as the quantization parameter in the base layer encoding unit 105 becomes larger. Reducing α_r and T prevents the amplitude limiting / constant multiplication processing unit 405 from amplifying a high-frequency component signal that contains coding degradation, and so prevents the loss of coding efficiency caused by erroneous estimation due to that degradation. The relationship between the quantization parameter and the parameters α_r and T may be a quadratic curve as in FIGS. 10(a) and 10(b), linear as in FIG. 10(c), or stepwise as in FIG. 10(d); what matters is that α_r and T decrease as the quantization parameter increases. Since T is a threshold parameter for amplitude limiting, it may also be held constant regardless of the quantization parameter.
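The kinds of mapping described for FIGS. 10(a) to 10(d) can be sketched as monotonically non-increasing functions of the quantization parameter. The numeric values here are hypothetical: the QP range 0 to 51 is the H.264 range, but `alpha_max`, the step thresholds, and the fixed value of T are illustrative choices, not values taken from the figures.

```python
from typing import Callable, Tuple

QP_MAX = 51  # H.264 quantization parameter range is 0..51


def _normalized(qp: int) -> float:
    return min(max(qp, 0), QP_MAX) / QP_MAX


def alpha_quadratic(qp: int, alpha_max: float = 2.0) -> float:
    """Quadratic decrease with QP (one possible curve of the FIG. 10(a)/(b) kind)."""
    return alpha_max * (1.0 - _normalized(qp)) ** 2


def alpha_linear(qp: int, alpha_max: float = 2.0) -> float:
    """Linear decrease with QP (FIG. 10(c) kind)."""
    return alpha_max * (1.0 - _normalized(qp))


def alpha_step(qp: int) -> float:
    """Stepwise decrease with QP (FIG. 10(d) kind); thresholds are hypothetical."""
    if qp < 20:
        return 2.0
    if qp < 35:
        return 1.0
    return 0.5


def estimation_parameters(qp: int,
                          curve: Callable[[int], float] = alpha_linear,
                          t_fixed: float = 4.0) -> Tuple[float, float]:
    """Return (alpha_r, T); T is held constant here, which the text permits."""
    return curve(qp), t_fixed
```

Because the decoder must derive the same α_r and T from the same base layer quantization parameter, encoder and decoder would need to share the choice of `curve`, either by prior agreement or by signalling it in the stream.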

以上のようにして、推定度判断部409は、ベースレイヤエンコード部105からの量子化パラメータに基づいて大きさを変えた定数倍処理のためのパラメータα_rと振幅制限のためのパラメータTとを振幅制限・定数倍処理部405へ出力する。なお、推定度判断部409は、映像信号階層符号化装置101と映像信号階層復号化装置103とで同一の量子化パラメータとパラメータα_rとTの関係を保つため、映像信号階層符号化装置101と映像信号階層復号化装置103との間で図10（ａ）〜（ｄ）に示すどの対応関係を使用するか決めておくか、あるいは図10（ａ）〜（ｄ）に示す対応関係自体や、さらには図10（ａ）〜（ｄ）に示す複数の対応関係を両装置が記憶していてどの対応関係を使用したかを示す情報をベースレイヤの量子化パラメータと共に映像信号階層符号化装置101から映像信号階層復号化装置103へ送信するようにしても良い。 In this way, the estimation degree determination unit 409 outputs to the amplitude limiting / constant multiplication processing unit 405 the constant-multiplication parameter α_r and the amplitude-limiting parameter T, whose magnitudes are varied according to the quantization parameter from the base layer encoding unit 105. To keep the relationship between the quantization parameter and the parameters α_r and T identical in the video signal hierarchical encoding device 101 and the video signal hierarchical decoding device 103, the two devices may agree in advance on which of the correspondences shown in FIGS. 10(a) to 10(d) to use. Alternatively, the video signal hierarchical encoding device 101 may transmit to the video signal hierarchical decoding device 103, together with the base layer quantization parameter, either the correspondence itself or, when both devices store the plural correspondences shown in FIGS. 10(a) to 10(d), information indicating which correspondence was used.

一方、エンハンスメントレイヤエンコード部107は、フレームメモリ1・411、フレームメモリ2・412、動き推定部413、動き補償部414、イントラ予測部415、予測信号選択部416、予測誤差信号生成部417、直交変換・量子化部418、エントロピー符号化部419、逆量子化・逆直交変換部420、加算器421及びデブロッキングフィルタ部422を少なくとも有している。この構成例は、H.264エンコーダの一部を変更したものであり、各部分は従来技術でほぼ実現可能であり、この点ではベースレイヤエンコード部105も図示しないが同様である。 The enhancement layer encoding unit 107, on the other hand, includes at least a frame memory 1 (411), a frame memory 2 (412), a motion estimation unit 413, a motion compensation unit 414, an intra prediction unit 415, a prediction signal selection unit 416, a prediction error signal generation unit 417, an orthogonal transform / quantization unit 418, an entropy coding unit 419, an inverse quantization / inverse orthogonal transform unit 420, an adder 421, and a deblocking filter unit 422. This configuration is a partially modified H.264 encoder, and each part can largely be realized with conventional techniques; the same applies to the base layer encoding unit 105, whose internal configuration is not shown.

フレームメモリ1・411は、オリジナルの映像信号を入力として受け付け、信号を格納できる機能を有する。また、格納した信号を予測誤差信号生成部417、動き推定部413へ、エンハンスメントレイヤエンコード部107と高解像度推定信号生成部106の処理の同期が取れるように対応するフレームの信号を出力する。 The frame memory 1 (411) receives the original video signal as an input and stores it. It outputs the stored signal of the corresponding frame to the prediction error signal generation unit 417 and the motion estimation unit 413, timed so that the processing of the enhancement layer encoding unit 107 and that of the high-resolution estimated signal generation unit 106 remain synchronized.

フレームメモリ2・412は、デブロッキングフィルタ部422より出力される信号を入力として受け付け、格納する機能を有する。そして、動き推定に必要なフレームの信号を動き推定部413へ、動き補償に必要なフレームの信号を動き補償部414へ出力する。 The frame memory 2 (412) receives and stores the signal output from the deblocking filter unit 422. It outputs the frame signal required for motion estimation to the motion estimation unit 413 and the frame signal required for motion compensation to the motion compensation unit 414.

動き推定部413は、フレームメモリ1・411及びフレームメモリ2・412より出力される信号を入力として受け付け、例えばH.264のような動き推定を行う。動き推定によって得られた動き情報を動き補償部414及びエントロピー符号化部419へ出力する。 The motion estimation unit 413 receives the signals output from the frame memory 1 (411) and the frame memory 2 (412) and performs motion estimation, for example as in H.264. The motion information obtained is output to the motion compensation unit 414 and the entropy coding unit 419.

動き補償部414は、フレームメモリ2・412より出力される信号及び動き情報を入力として受け付け、例えばH.264のような動き補償を行う。また、動き補償によって得られた信号を予測信号選択部416へ出力する。 The motion compensation unit 414 receives the signal output from the frame memory 2 (412) and the motion information, and performs motion compensation, for example as in H.264. The motion-compensated signal is output to the prediction signal selection unit 416.

イントラ予測部415は、加算器421より出力される信号を入力として受け付け、例えばH.264のようなイントラ予測を行う。また、イントラ予測して得られた信号を予測信号選択部416へ出力する。 The intra prediction unit 415 receives the signal output from the adder 421 as input, and performs intra prediction such as H.264. In addition, a signal obtained by intra prediction is output to prediction signal selection section 416.

予測信号選択部416は、動き補償部414、イントラ予測部415よりそれぞれから出力される信号及び高解像度推定信号を受け付け、入力される信号のうち、いずれかひとつを選択する、または、それぞれの信号に重みを与えて合成する。信号の選択、合成の判断基準は任意である。例えば、符号化効率を重視する場合は、予測誤差信号の二乗平均が小さくなるように、信号を選択、合成する。また、予測信号選択部416は、選択または合成した信号を予測誤差信号生成部417及び加算器421へ出力する。 The prediction signal selection unit 416 receives the signals output from the motion compensation unit 414 and the intra prediction unit 415 as well as the high-resolution estimated signal, and either selects one of them or combines them with a weight applied to each. The criterion for selection or combination is arbitrary. For example, when coding efficiency is the priority, the signals are selected or combined so that the mean square of the prediction error signal becomes small. The prediction signal selection unit 416 outputs the selected or combined signal to the prediction error signal generation unit 417 and the adder 421.

予測誤差信号生成部417は、フレームメモリ1・411より出力される信号及び予測信号選択部416より出力される予測信号を入力として受け付ける機能を有する。また、フレームメモリ1・411より出力される信号から予測信号を差し引いて予測誤差信号を生成し、それを直交変換・量子化部418へ出力する。 The prediction error signal generation unit 417 receives the signal output from the frame memory 1 (411) and the prediction signal output from the prediction signal selection unit 416. It subtracts the prediction signal from the signal output from the frame memory 1 (411) to generate a prediction error signal, which it outputs to the orthogonal transform / quantization unit 418.

直交変換・量子化部418は、予測誤差信号生成部417より出力される信号を入力として受け付け、その信号を直交変換及び量子化する。直交変換には、DCTやアダマール変換、ウェーブレットなどが用いられる。H.264のように、直交変換と量子化を合成した手段を採用しても良い。また、直交変換及び量子化した信号をエントロピー符号化部419及び逆量子化・逆直交変換部420へ出力する。また、直交変換・量子化部418は、量子化の際の量子化ステップまたは量子化幅を制御する量子化パラメータをエントロピー符号化部419へ出力する。 The orthogonal transform / quantization unit 418 receives the signal output from the prediction error signal generation unit 417 as an input, and performs orthogonal transform and quantization on the signal. For the orthogonal transform, DCT, Hadamard transform, wavelet, or the like is used. As in H.264, a method that combines orthogonal transformation and quantization may be employed. Further, the orthogonally transformed and quantized signal is output to the entropy coding unit 419 and the inverse quantization / inverse orthogonal transform unit 420. Further, the orthogonal transform / quantization unit 418 outputs a quantization parameter for controlling the quantization step or quantization width in the quantization to the entropy coding unit 419.

エントロピー符号化部419は、直交変換・量子化部418から出力されるエンハンスメントレイヤの符号化信号と、動き推定部413より出力される動き情報や、直交変換・量子化部418からのエンハンスメントレイヤの量子化パラメータ、予測信号選択部416がどの信号を予測信号として選択したかを示す予測信号選択情報等の符号化パラメータを入力として受け付け、それらをエントロピー符号化してエンハンスメントレイヤの符号化ビットストリームとして出力する。また、エントロピー符号化の結果生成されるビットストリームをエンハンスメントレイヤエンコード部107の外部へ出力する。なお、図示はしていないが、ベースレイヤエンコード部105のエントロピー符号化部でも、エンハンスメントレイヤエンコード部107のエントロピー符号化部419と同様に、ベースレイヤの符号化信号と、ベースレイヤの動き情報、ベースレイヤの量子化パラメータ、ベースレイヤの予測信号選択情報等の符号化パラメータを入力として受け付け、それらをエントロピー符号化してベースレイヤの符号化ビットストリームとして出力している。なお、エンハンスメントレイヤとベースレイヤとで符号化パラメータが等しい場合は、一方のレイヤの符号化パラメータは省略しても良い。 The entropy coding unit 419 receives the enhancement layer encoded signal output from the orthogonal transform / quantization unit 418 together with coding parameters such as the motion information output from the motion estimation unit 413, the enhancement layer quantization parameter from the orthogonal transform / quantization unit 418, and prediction signal selection information indicating which signal the prediction signal selection unit 416 selected as the prediction signal. It entropy-codes these inputs and outputs the resulting enhancement layer encoded bitstream to the outside of the enhancement layer encoding unit 107. Although not shown, the entropy coding unit of the base layer encoding unit 105 likewise receives the base layer encoded signal and coding parameters such as the base layer motion information, quantization parameter, and prediction signal selection information, entropy-codes them, and outputs them as the base layer encoded bitstream. When a coding parameter is the same in the enhancement layer and the base layer, it may be omitted from one of the layers.

そして、多重化部108は、エンハンスメントレイヤエンコード部107からのエンハンスメントレイヤの符号化ビットストリームと、ベースレイヤエンコード部105からのベースレイヤの符号化ビットストリームとを多重化して多重化ビットストリームとして、ネットワーク102を介し映像信号階層復号化装置103へ出力する。 The multiplexing unit 108 then multiplexes the enhancement layer encoded bitstream from the enhancement layer encoding unit 107 with the base layer encoded bitstream from the base layer encoding unit 105 and outputs the resulting multiplexed bitstream to the video signal hierarchical decoding device 103 via the network 102.

図11は、本実施の形態１による映像信号階層符号化装置、映像信号階層符号化方法から出力される多重化ビットストリームの構成例を示している。ベースレイヤエンコード部105及びエンハンスメントレイヤエンコード部107で生成された映像情報のビットストリームはそれぞれ、図11(a)中のベースレイヤビットストリームとエンハンスメントレイヤビットストリームに相当する。映像情報以外の符号化に必要なベースレイヤエンコード部105で用いたパラメータ、すなわちベースレイヤエンコードの際の動き情報や、量子化パラメータ、予測信号選択情報等はベースレイヤ用ヘッダ部に格納され、エンハンスメントレイヤエンコード部107で用いたパラメータ、すなわちエンハンスメントレイヤエンコードの際の動き情報や、量子化パラメータ、予測信号選択情報等は、エンハンスメントレイヤ用ヘッダ部に格納される。なお、それらをまとめて先頭のヘッダ部に格納しても良い。また、ビットストリームの構成順序は、図11(b)のようにしてもよい。先頭のヘッダ部にまとめてパラメータを格納した場合は、図11(c)のようにしてもよい。さらに、図示はしていないが、パラメータの格納場所については、ベースレイヤビットストリーム及びエンハンスメントレイヤビットストリーム内のフレーム(ピクチャ)ごとのヘッダ、スライスやマクロブロック等のヘッダに格納しても良い。なお、多重化ビットストリームの構成例は、後述する他の実施の形態２,３でも同様である。 FIG. 11 shows configuration examples of the multiplexed bitstream output by the video signal hierarchical encoding device and video signal hierarchical encoding method according to the first embodiment. The video information bitstreams generated by the base layer encoding unit 105 and the enhancement layer encoding unit 107 correspond to the base layer bitstream and the enhancement layer bitstream in FIG. 11(a), respectively. The parameters other than video information that the base layer encoding unit 105 used and that are needed for coding, namely the motion information, quantization parameter, prediction signal selection information, and so on from base layer encoding, are stored in the base layer header part. The corresponding parameters used by the enhancement layer encoding unit 107 are stored in the enhancement layer header part. These parameters may instead be stored together in a header part at the head of the stream. The bitstream may also be ordered as shown in FIG. 11(b), and, when the parameters are stored together in the leading header part, as shown in FIG. 11(c). Although not shown, the parameters may also be stored in per-frame (picture) headers, or in slice or macroblock headers, within the base layer and enhancement layer bitstreams. The same multiplexed bitstream configurations apply to the second and third embodiments described later.
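One way to picture a multiplexed stream made of per-layer header parts and per-layer bitstreams is a sequence of tagged, length-prefixed sections. The byte layout below (a 1-byte tag and a 4-byte big-endian length before each payload), the tag values, and the section names are all hypothetical; the text does not specify a syntax, and FIG. 11 is not reproduced here.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical section tags for this sketch only.
TAGS = {"base_header": 0, "base_data": 1, "enh_header": 2, "enh_data": 3}
NAMES = {tag: name for name, tag in TAGS.items()}
PREFIX = ">BI"                       # 1-byte tag, 4-byte big-endian length
PREFIX_SIZE = struct.calcsize(PREFIX)


def multiplex(sections: List[Tuple[str, bytes]]) -> bytes:
    """Concatenate sections in the given order, each with a tag and length."""
    out = bytearray()
    for name, payload in sections:
        out += struct.pack(PREFIX, TAGS[name], len(payload))
        out += payload
    return bytes(out)


def extract(stream: bytes, want_enhancement: bool = True) -> Dict[str, bytes]:
    """Split the stream per layer, optionally dropping enhancement sections."""
    result: Dict[str, bytes] = {}
    pos = 0
    while pos < len(stream):
        tag, length = struct.unpack_from(PREFIX, stream, pos)
        pos += PREFIX_SIZE
        name = NAMES[tag]
        if want_enhancement or not name.startswith("enh"):
            result[name] = stream[pos:pos + length]
        pos += length
    return result
```

Because each section carries its own tag, the order passed to `multiplex` can be changed, or the two header sections can be moved to the front, without changing `extract`. Dropping the enhancement sections corresponds to the extract unit 109 serving a decoder that only needs the base layer.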

一方、逆量子化・逆直交変換部420は、直交変換・量子化された状態の信号を入力として受け付け、その信号を逆量子化・逆直交変換する。また、逆量子化・逆直交変換した信号を加算器421へ出力する。 On the other hand, the inverse quantization / inverse orthogonal transform unit 420 receives an orthogonally transformed / quantized signal as an input and performs inverse quantization / inverse orthogonal transform on the signal. In addition, a signal obtained by inverse quantization and inverse orthogonal transform is output to adder 421.

加算器421は、予測信号選択部416より出力される信号及び逆量子化・逆直交変換部420より出力される信号を入力として受け付け、2つの信号を合成する。また、合成した信号をイントラ予測部415及びデブロッキングフィルタ部422へ出力する。 The adder 421 receives the signal output from the prediction signal selection unit 416 and the signal output from the inverse quantization / inverse orthogonal transform unit 420 as inputs, and synthesizes the two signals. Also, the synthesized signal is output to the intra prediction unit 415 and the deblocking filter unit 422.

デブロッキングフィルタ部422は、加算器421より出力される信号を入力として受け付け、入力された信号に対してデブロッキングフィルタ処理を行う機能を有する。ここで、デブロッキングフィルタは、例えばH.264で用いられているものなどがある。また、デブロッキングフィルタ処理した信号をフレームメモリ2・412へ出力する。 The deblocking filter unit 422 receives the signal output from the adder 421 and applies deblocking filtering to it, for example the deblocking filter used in H.264. The filtered signal is output to the frame memory 2 (412).

図4に示した高解像度推定信号生成部106の構成例を用いて高解像度推定信号を生成する手順を図5に示す。 FIG. 5 shows the procedure for generating the high-resolution estimated signal using the configuration example of the high-resolution estimated signal generation unit 106 shown in FIG. 4.

まず、第2のインターポレーション部407は、入力信号をインターポレーションする[ステップS501]。 First, the second interpolation unit 407 interpolates the input signal [step S501].

次に、推定度判定部409は、量子化パラメータを推定パラメータα_rとTに変換する[ステップS507]。 Next, the estimation degree determination unit 409 converts the quantization parameter into the estimation parameters α _r and T [Step S507].

一方、第1のハイパスフィルタリング部403は、ベースレイヤデコード信号から高周波数成分信号を抽出する[ステップS502]。そして、抽出した高周波数成分信号を第1のインターポレーション部404がインターポレーションする[ステップS503]。インターポレーションされた信号に対して振幅制限・定数倍処理部405が振幅制限及び定数倍処理を行う[ステップS504]。ここで、振幅制限及び定数倍処理に伴うパラメータは、推定度判断部409から与えられたものを用いる。次に、第2のハイパスフィルタリング部406は、振幅制限及び定数倍処理した信号から推定した高周波数成分を抽出する[ステップS505]。そして、加算器408は、第2のインターポレーション部407によってインターポレーションされた信号と、第2のハイパスフィルタリング部406を介した推定された高周波数成分とを足し合わせ、高解像度推定信号を得る[ステップS506]。 Meanwhile, the first high-pass filtering unit 403 extracts a high-frequency component signal from the base layer decoded signal [step S502], and the first interpolation unit 404 interpolates the extracted signal [step S503]. The amplitude limiting / constant multiplication processing unit 405 applies amplitude limiting and constant multiplication to the interpolated signal [step S504], using the parameters supplied by the estimation degree determination unit 409. Next, the second high-pass filtering unit 406 extracts the estimated high-frequency component from the amplitude-limited, constant-multiplied signal [step S505]. Finally, the adder 408 adds the signal interpolated by the second interpolation unit 407 to the estimated high-frequency component from the second high-pass filtering unit 406 to obtain the high-resolution estimated signal [step S506].
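Steps S501 to S506 can be strung together in a compact one-dimensional sketch. The building blocks are simplified stand-ins and therefore assumptions: a 3-tap [1/4, 1/2, 1/4] low-pass filter replaces the Gaussian filter, linear interpolation with an integer ratio replaces the interpolation equations, and clipping followed by scaling is the assumed form of the amplitude limiting and constant multiplication. Step S507 (deriving α_r and T from the quantization parameter) is left to the caller.

```python
from typing import List


def low_pass(signal: List[float]) -> List[float]:
    """3-tap [1/4, 1/2, 1/4] smoothing with edge replication (stand-in filter)."""
    n = len(signal)
    return [0.25 * signal[max(x - 1, 0)] + 0.5 * signal[x]
            + 0.25 * signal[min(x + 1, n - 1)] for x in range(n)]


def high_pass(signal: List[float]) -> List[float]:
    return [s - l for s, l in zip(signal, low_pass(signal))]


def expand(signal: List[float], r: int) -> List[float]:
    """Linear interpolation by an integer ratio r (stand-in interpolator)."""
    n = len(signal)
    out = []
    for x in range(n * r):
        i = x // r
        frac = (x % r) / r
        out.append((1.0 - frac) * signal[i] + frac * signal[min(i + 1, n - 1)])
    return out


def estimate_high_resolution(base_decoded: List[float], r: int,
                             T: float, alpha_r: float) -> List[float]:
    upsampled = expand(base_decoded, r)                            # S501
    laplacian = high_pass(base_decoded)                            # S502
    expanded = expand(laplacian, r)                                # S503
    limited = [alpha_r * max(-T, min(T, v)) for v in expanded]     # S504
    detail = high_pass(limited)                                    # S505
    return [u + d for u, d in zip(upsampled, detail)]              # S506
```

With `alpha_r = 0` the result reduces to plain interpolation of the base layer, which matches the intent that a heavily quantized base layer should contribute little or no estimated detail.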

図4に示したエンハンスメントレイヤエンコード部107の構成例を用いてオリジナルの映像信号の解像度の信号(エンハンスメントレイヤ)を符号化する手順を図6に示す。 FIG. 6 shows the procedure for encoding a signal at the resolution of the original video signal (the enhancement layer) using the configuration example of the enhancement layer encoding unit 107 shown in FIG. 4.

イントラ予測部415は、加算器421に復元された信号をイントラ予測を行う[ステップS601]。イントラ予測した信号を予測信号選択部416へ送る。 The intra prediction unit 415 performs intra prediction on the signal reconstructed by the adder 421 [step S601] and sends the intra-predicted signal to the prediction signal selection unit 416.

一方、動き推定部413及び動き補償部414は、フレームメモリ1・411からの入力信号と、フレームメモリ2・412からの参照信号とを基に動き推定及び動き補償(動き補償予測)を行う[ステップS602]。動き補償予測した信号を予測信号選択部416へ送る。 Meanwhile, the motion estimation unit 413 and the motion compensation unit 414 perform motion estimation and motion compensation (motion-compensated prediction) based on the input signal from the frame memory 1 (411) and the reference signal from the frame memory 2 (412) [step S602]. The motion-compensated prediction signal is sent to the prediction signal selection unit 416.

In addition, the high-resolution estimation signal generation unit 106 generates a high-resolution estimation signal by the procedure shown in FIG. 5 [step S603] and sends it to the prediction signal selection unit 416.

The prediction signal selection unit 416 either selects one of the intra-predicted signal, the motion-compensated prediction signal, and the high-resolution estimation signal, or combines them with a weight applied to each [step S604]. The selection or combination of the three signals is made, for example, so that coding efficiency becomes high, and conventional techniques may be used for this. For example, the unit may select whichever signal minimizes the in-block sum of the absolute values of the prediction error signal output by the prediction error signal generation unit 417, or it may combine the signals output by the motion compensation unit 414 and the intra prediction unit 415 with the high-resolution estimation signal from the high-resolution estimation signal generation unit 106 at a ratio that makes that in-block sum small. Alternatively, the decision may be made so that the in-block sum of the absolute values of the orthogonally transformed prediction error signal (the orthogonal transform coefficients) becomes small, or so that the amount of code output by the entropy encoding unit 419 becomes small.
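The first criterion mentioned above, choosing the candidate with the smallest in-block sum of absolute prediction errors, can be sketched as follows. The function names and the dictionary keys `intra`, `motion`, and `high_res` are hypothetical labels introduced only for this illustration; the weighted combination and the transform-domain and code-amount criteria are not shown.

```python
def sum_abs_error(block, prediction):
    """In-block sum of absolute prediction errors for one candidate."""
    return sum(abs(b - p) for b, p in zip(block, prediction))


def select_prediction(block, candidates):
    """Return (name, prediction) of the candidate with the smallest error sum.

    `candidates` maps a hypothetical label to a predicted block of the same
    length as `block`. On a tie, the first candidate in dictionary order wins.
    """
    name = min(candidates, key=lambda k: sum_abs_error(block, candidates[k]))
    return name, candidates[name]


block = [10, 12, 14, 16]
candidates = {
    "intra": [10, 10, 10, 10],     # error sum 12
    "motion": [9, 12, 15, 16],     # error sum 2
    "high_res": [10, 12, 13, 16],  # error sum 1
}
chosen_name, chosen_prediction = select_prediction(block, candidates)
```

The chosen label plays the role of the prediction signal selection information that the decoder later needs in order to reproduce the same prediction.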

The prediction error signal generation unit 417 generates a prediction error signal by subtracting the prediction signal selected or combined by the prediction signal selection unit 416 from the signal output from frame memory 1 (411) [step S605]. The orthogonal transform / quantization unit 418 orthogonally transforms and quantizes the prediction error signal [step S606]. The entropy encoding unit 419 entropy-encodes the orthogonally transformed and quantized signal together with encoding parameters such as the enhancement layer motion information from the motion estimation unit 413, the enhancement layer quantization parameter from the orthogonal transform / quantization unit 418, and the prediction signal selection information indicating which signal the prediction signal selection unit 416 selected as the prediction signal, and outputs the result as the enhancement layer encoded bitstream [step S607]. As described above, the entropy encoding unit of the base layer encoding unit 105 likewise entropy-encodes the base layer encoded signal together with encoding parameters such as the base layer motion information, the base layer quantization parameter, and the base layer prediction signal selection information, and outputs the result as the base layer encoded bitstream.

If the enhancement layer encoding unit 107 has encoded all of the signals to be encoded, processing ends here. Otherwise, the signal currently being encoded is decoded and deblocked by the following procedure so that it can be referenced when other signals are encoded [step S608].

Specifically, the inverse quantization / inverse orthogonal transform unit 420 inverse-quantizes and inverse-orthogonally transforms the signal that was orthogonally transformed and quantized in step S606 [step S609]. The adder 421 adds the resulting signal to the prediction signal selected by the prediction signal selection unit 416 to obtain a decoded signal [step S610], and sends it to the intra prediction unit 415 and the deblocking filter unit 422. The deblocking filter unit 422 then applies deblocking filtering to the decoded signal [step S611] and stores the filtered signal in frame memory 2 (412) [step S612].
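The reason the encoder decodes its own output (steps S609 and S610) is that its reference signal must match what a decoder will reconstruct. The following sketch illustrates this with a deliberately simplified model: the orthogonal transform, entropy coding, and deblocking are omitted, and a uniform scalar quantizer with a hypothetical step size stands in for the quantization of unit 418.

```python
def quantize(residual, step):
    """Uniform scalar quantization (a stand-in for unit 418, transform omitted)."""
    return [round(r / step) for r in residual]


def dequantize(levels, step):
    """Inverse of `quantize` up to quantization error (a stand-in for unit 420)."""
    return [level * step for level in levels]


def encode_block(block, prediction, step):
    """Return the quantized levels and the encoder's local reconstruction."""
    residual = [b - p for b, p in zip(block, prediction)]      # step S605
    levels = quantize(residual, step)                          # step S606
    restored = dequantize(levels, step)                        # step S609
    local = [p + r for p, r in zip(prediction, restored)]      # step S610
    return levels, local


def decode_block(levels, prediction, step):
    """Decoder-side reconstruction from the same levels and prediction."""
    return [p + r for p, r in zip(prediction, dequantize(levels, step))]


levels, local_reconstruction = encode_block([10, 14, 20, 6], [8, 12, 16, 8], 3)
decoder_reconstruction = decode_block(levels, [8, 12, 16, 8], 3)
```

Because both sides start from the same prediction and the same quantized levels, the local reconstruction and the decoder's reconstruction agree even though neither equals the original block.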

As described above, in the video signal hierarchical encoding apparatus 101 of Embodiment 1, when the decoded signal of the low-resolution base layer encoding unit 105 is spatially enlarged, resolution enhancement is performed with the degree of high-frequency component estimation controlled according to the quantization parameter of the base layer encoding unit 105, yielding a high-resolution estimation signal. The input video signal is then encoded by inter-resolution prediction using that high-resolution estimation signal as the prediction signal. Unlike the simple interpolation (spatial enlargement) used for inter-layer prediction in conventional hierarchical video coding, this allows resolution enhancement suited to the quantization parameter of the base layer encoding unit 105 and reduces the inter-layer prediction error, making efficient, higher-quality hierarchical coding of video signals possible.

In particular, the video signal hierarchical encoding apparatus 101 of Embodiment 1 can be configured to generate, from the low-resolution signal, a prediction signal closer to the input video signal (the high-resolution signal) by taking the coding characteristics of the base layer low-resolution signal into account. This makes it possible to realize efficient hierarchical video coding with stronger resolution enhancement of the prediction signal.

Furthermore, because the video signal hierarchical encoding apparatus 101 of Embodiment 1 judges the coding characteristics of the base layer low-resolution signal from the quantization parameter alone, the resolution enhancement can be controlled with little computation and no new parameter needs to be added for transmission. This is beneficial in terms of both circuit scale and coding efficiency.

Next, the video signal hierarchical decoding apparatus 103 is described.

FIG. 7 shows a detailed configuration example of the high-resolution estimation signal restoration unit 111 and the enhancement layer decoding unit 112 of the video signal hierarchical decoding apparatus 103.

The high-resolution estimation signal restoration unit 111 has at least a first high-pass filtering unit 403, a first interpolation unit 404, an amplitude limiting / constant multiplication processing unit 405, a second high-pass filtering unit 406, a second interpolation unit 407, an adder 408, and an estimation degree determination unit 409. In other words, it can be realized with the same components as the high-resolution estimation signal generation unit 106 on the encoding side, so each part of the high-resolution estimation signal restoration unit 111 in FIG. 7 carries the same reference number as in FIG. 4. FIG. 9 shows the procedure for restoring the high-resolution estimation signal using the configuration example of FIG. 7; this procedure is also the same as the procedure for generating the high-resolution estimation signal on the encoding side (FIG. 5).

The enhancement layer decoding unit 112 has at least an entropy decoding unit 710, frame memory 2 (412), a motion compensation unit 414, an intra prediction unit 415, a prediction signal selection unit 416', an inverse quantization / inverse orthogonal transform unit 420, an adder 421, and a deblocking filter unit 422. The functions of every part other than the entropy decoding unit 710 can be realized by the same components as in FIG. 4, so they carry the same reference numbers.

The entropy decoding unit 710 receives and decodes the enhancement layer encoded bitstream separated by the extraction unit 109, and outputs the decoded enhancement layer signal, together with the enhancement layer quantization parameter that originated from the orthogonal transform / quantization unit 418 of the video signal hierarchical encoding apparatus 101, to the inverse quantization / inverse orthogonal transform unit 420. It also outputs the decoded enhancement layer motion information to the motion compensation unit 414, and outputs the prediction signal selection information, which indicates which signal the prediction signal selection unit 416 selected as the prediction signal, to the prediction signal selection unit 416'. Although not illustrated, the entropy decoding unit of the base layer decoding unit 110 similarly entropy-decodes, from the base layer encoded bitstream separated by the extraction unit 109, the base layer encoded signal and encoding parameters such as the base layer motion information, the base layer quantization parameter, and the base layer prediction signal selection information, and outputs them to the motion compensation unit, the inverse quantization / inverse orthogonal transform unit, the prediction signal selection unit, and so on within the base layer decoding unit 110. The base layer quantization parameter is also output to the high-resolution estimation signal restoration unit 111, either via the inverse quantization / inverse orthogonal transform unit or bypassing it.

FIG. 8 shows the procedure for decoding the signal at the resolution of the original video signal (the enhancement layer) using the configuration example of the enhancement layer decoding unit 112 shown in FIG. 7.

The entropy decoding unit 710 decodes the bitstream corresponding to the enhancement layer obtained from the extraction unit 109 [step S801]. The inverse quantization / inverse orthogonal transform unit 420 inverse-quantizes and inverse-orthogonally transforms the decoded signal to restore the prediction error signal, and outputs it to the adder 421 [step S802].

Meanwhile, the prediction signal selection unit 416' determines, from the prediction signal selection information included as an encoding parameter in the encoded bitstream sent from the video signal hierarchical encoding apparatus 101, whether intra prediction, motion-compensated prediction, or prediction by the high-resolution estimation signal was selected for the block of interest when the enhancement layer was encoded, or whether the signals were combined, and performs the corresponding processing [step S803]. That is, based on the prediction signal selection information in the enhancement layer encoded bitstream, the prediction signal selection unit 416' in the enhancement layer decoding unit 112 selects or combines these three signals in the same way as the prediction signal selection unit 416 in the enhancement layer encoding unit 107 of the video signal hierarchical encoding apparatus 101.

If the prediction signal selection information indicates that intra prediction was selected in the enhancement layer encoding unit 107, the prediction signal selection unit 416' connects to the intra prediction unit 415 and performs intra prediction with it [step S804]. If it indicates that motion-compensated prediction was selected, the prediction signal selection unit 416' connects to the motion compensation unit 414 and performs motion compensation with it [step S805]. If it indicates that prediction by the high-resolution estimation signal was selected, the prediction signal selection unit 416' connects to the high-resolution estimation signal restoration unit 111 and restores the high-resolution estimation signal with it [step S806]. If it indicates that the signals were combined, the prediction signal selection unit 416' switches its connection in turn, executes all of steps S804, S805, and S806, and combines the results with the same weights that were used in the enhancement layer encoding unit 107.
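The decoder-side handling of steps S803 to S806 amounts to running only the prediction sources named in the prediction signal selection information and, when several are named, blending them with the transmitted weights. The sketch below models this under stated assumptions: the selection information is represented as a hypothetical mapping from source label to weight, and each source is a hypothetical zero-argument callable, since this section does not describe how the information is actually coded in the bitstream.

```python
def build_prediction(selection_info, sources):
    """Build the decoder's prediction signal for one block.

    `selection_info` maps a hypothetical source label to its weight; a single
    entry with weight 1.0 models plain selection. `sources` maps each label to
    a callable that produces that prediction, so unused sources never run.
    """
    prediction = None
    for name, weight in selection_info.items():
        signal = sources[name]()  # corresponds to one of steps S804, S805, S806
        weighted = [weight * s for s in signal]
        if prediction is None:
            prediction = weighted
        else:
            prediction = [a + b for a, b in zip(prediction, weighted)]
    return prediction


calls = []


def make_source(name, values):
    """Wrap fixed values in a callable that records when it is invoked."""
    def source():
        calls.append(name)
        return values
    return source


sources = {
    "intra": make_source("intra", [8.0, 8.0]),
    "motion": make_source("motion", [4.0, 12.0]),
    "high_res": make_source("high_res", [16.0, 0.0]),
}

selected = build_prediction({"motion": 1.0}, sources)
blended = build_prediction({"intra": 0.5, "high_res": 0.5}, sources)
```

Modeling the sources as callables mirrors the description of unit 416' connecting only to the unit it needs, so the high-resolution estimation signal is restored only for blocks that use it.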

The adder 421 then adds the signal obtained by one of steps S804, S805, and S806, or by their combination, to the prediction error signal [step S807], and the deblocking filter unit 422 applies deblocking filtering to the sum [step S808]. The deblocked signal is output to a display or the like as the decoded video signal. If bitstream data remains to be decoded, the decoded video signal is stored in frame memory 2 (412) as a reference frame [step S810], and the processing of steps S801 through S810 is repeated [step S809].

As described above, in the video signal hierarchical decoding apparatus 103 of Embodiment 1, the extraction unit 109 separates the multiplexed stream produced by the video signal hierarchical encoding apparatus 101 into the enhancement layer encoded bitstream and the base layer encoded bitstream. The base layer decoding unit 110 restores the base layer decoded signal and the quantization parameter from the base layer encoded bitstream. The high-resolution estimation signal restoration unit 111 restores the high-resolution estimation signal, an estimate of the input signal, from the decoded signal supplied by the base layer decoding unit 110, controlling the degree of high-frequency component estimation according to the quantization parameter. The enhancement layer decoding unit 112 decodes the differentially encoded signal in the enhancement layer encoded bitstream using the high-resolution estimation signal from the high-resolution estimation signal restoration unit 111 as the prediction signal. As a result, even an encoded difference signal produced by an enhancement layer whose inter-layer prediction error was reduced by resolution enhancement suited to the quantization parameter used in base layer encoding can be decoded correctly.

[Embodiment 2]
A hierarchical encoding / decoding apparatus that realizes spatial resolution scalability according to Embodiment 2 of the present invention is described next. This apparatus partially modifies the high-resolution estimation signal generation unit 106 (FIG. 4) and the high-resolution estimation signal restoration unit 111 (FIG. 7) of Embodiment 1. By changing the order of interpolation and high-frequency component extraction used in Embodiment 1, it obtains the same effects as Embodiment 1 while somewhat reducing the processing load and the use of resources such as memory.

That is, in the high-resolution estimation signal generation unit 106 of Embodiment 1, as shown in FIG. 4, the first high-pass filtering unit 403 extracts high-frequency components from the base layer decoded signal and the first interpolation unit 404 then interpolates the extracted components, while the second interpolation unit 407 separately interpolates the base layer decoded signal. In Embodiment 2, by contrast, as shown in FIG. 11, the base layer decoded signal is interpolated first and the high-frequency components are then extracted from the interpolated signal, which somewhat reduces the processing load and the use of resources such as memory. If interpolation and high-frequency component extraction are both linear operations, the result is the same even when their order is changed. However, because Embodiment 2 extracts high-frequency components after interpolation, that is, filters a signal whose sampling frequency has changed, the filter used should be adapted accordingly. Embodiment 2 is described in detail below.
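The remark that the two orders give the same result for linear operations, provided the filter is adapted to the new sampling frequency, can be checked numerically for one simple model. The sketch assumes interpolation is zero insertion followed by an FIR kernel and that the high-pass filter is an FIR filter; under those assumptions, filtering before upsampling by r equals upsampling first and then filtering with the same taps spread r samples apart (the noble identity of multirate signal processing). The kernels below are hypothetical integer taps chosen so that the comparison is exact, not the filters of equations (1), (2), and (8).

```python
def convolve(x, h):
    """Full linear convolution of two integer sequences."""
    out = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            out[i + k] += xi * hk
    return out


def zero_stuff(x, r):
    """Insert r - 1 zeros after every sample (raises the sampling rate by r)."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0] * (r - 1))
    return out


def stretch_taps(h, r):
    """Spread filter taps r samples apart, with no trailing zeros."""
    return zero_stuff(h, r)[: r * (len(h) - 1) + 1]


R = 2                    # enlargement ratio r
INTERP = [1, 2, 1]       # linear-interpolation kernel, scaled by 2 (hypothetical)
HIGH_PASS = [-1, 2, -1]  # hypothetical high-pass taps


def embodiment1_order(x):
    """High-pass at low resolution, then interpolate."""
    return convolve(zero_stuff(convolve(x, HIGH_PASS), R), INTERP)


def embodiment2_order(x):
    """Interpolate first, then high-pass with taps adapted to the new rate."""
    interpolated = convolve(zero_stuff(x, R), INTERP)
    return convolve(interpolated, stretch_taps(HIGH_PASS, R))


sample = [3, 1, 4, 1, 5, 9]
same_result = embodiment1_order(sample) == embodiment2_order(sample)
```

Spreading the taps by r compresses the filter's frequency response by the same factor in normalized frequency, which is consistent with adapting the band to the higher sampling frequency. The amplitude limiting stage is nonlinear, but it follows both operations in either order, so it does not affect this equivalence.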

FIG. 12 shows a configuration example of the high-resolution estimation signal generation unit 1601 in Embodiment 2. The unit has at least a first interpolation unit 1602, a first high-pass filtering unit 1603, an amplitude limiting / constant multiplication processing unit 405, a second high-pass filtering unit 406, an adder 408, and an estimation degree determination unit 409. The functions of every part other than the first interpolation unit 1602 and the first high-pass filtering unit 1603 can be realized by the same components as in FIG. 4, so they carry the same reference numbers.

The first interpolation unit 1602 receives the base layer decoded signal as input and interpolates it to the resolution of the original video signal input to the enhancement layer. Interpolation can be realized by equation (8) given earlier. Here too, an interpolation method (filter coefficients, interpolation function, and so on) other than equation (8) may be used. The first interpolation unit 1602 outputs the interpolated signal to the first high-pass filtering unit 1603 and the adder 408.

The first high-pass filtering unit 1603 receives the signal output by the first interpolation unit 1602 and extracts its high-frequency components, which are obtained by equations (1) and (2) given earlier. Because the signal input to the first high-pass filtering unit 1603 in Embodiment 2 has a higher sampling frequency (resolution) as a result of interpolation, the band in equation (2) should be set accordingly. For example, when the enlargement ratio r (enhancement layer resolution / base layer resolution) is 2, the band in equation (2) is set to half of that used in Embodiment 1. Equations (1) and (2) may also be replaced by other methods. However, the filters and interpolation functions used here and those used in the spatial decimation unit 104, the first interpolation unit 1602, the second high-pass filtering unit 406, and the second interpolation unit 407 should together satisfy a pyramid structure. The first high-pass filtering unit 1603 outputs the extracted high-frequency components to the amplitude limiting / constant multiplication processing unit 405.

FIG. 13 shows the procedure for generating a high-resolution estimation signal using the configuration example of the high-resolution estimation signal generation unit 1601 shown in FIG. 12. Steps S504 through S507 are the same as in FIG. 5 (Embodiment 1) and therefore carry the same step numbers.

First, the first interpolation unit 1602 interpolates the input signal [step S1701] and sends the interpolated signal to the first high-pass filtering unit 1603 and the adder 408.

Next, the first high-pass filtering unit 1603 extracts a high-frequency component signal from the interpolated signal [step S1702]. The amplitude limiting / constant multiplication processing unit 405 applies amplitude limiting and constant multiplication to the extracted high-frequency component signal [step S504]. From there on, the high-resolution estimation signal is generated by the same procedure as [steps S505 to S507] of Embodiment 1.
Note that the decoding-side high-resolution estimation signal restoration unit in Embodiment 2 can be realized with the same configuration as the high-resolution estimation signal generation unit 1601 of Embodiment 2 shown in FIG. 12, and the procedure for restoring the high-resolution estimation signal is the same as in FIG. 13.

Accordingly, the video signal hierarchical encoding apparatus and the video signal hierarchical decoding apparatus of Embodiment 2, like those of Embodiment 1, differ from the simple interpolation (spatial enlargement) used for inter-layer prediction in conventional hierarchical video coding. They can perform resolution enhancement suited to the quantization parameter used in base layer encoding and thereby reduce the inter-layer prediction error, which makes efficient, higher-quality hierarchical coding of video signals possible. The encoded difference signal produced by such an enhancement layer can also be decoded correctly.

In particular, unlike the high-resolution estimation signal generation unit 106 of Embodiment 1 shown in FIG. 4, the high-resolution estimation signal generation unit 1601 of Embodiment 2 shown in FIG. 12 places the first interpolation unit 1602 ahead of the first high-pass filtering unit 1603 and branches its output into a path that enters the first high-pass filtering unit 1603 and a path that bypasses it. The second interpolation unit 407, which was required in the high-resolution estimation signal generation unit 106 of Embodiment 1, can therefore be omitted, reducing the number of components.

[Embodiment 3]
FIG. 14 is a block diagram of an example of an information processing apparatus 1001 that provides the encoding and decoding functions of the video signal hierarchical encoding apparatus 101 and the video signal hierarchical decoding apparatus 103 of Embodiments 1 and 2 described above. The information processing apparatus 1001 consists of an external storage device 1002, a temporary storage device 1003, a communication device 1004, an input device 1005, a central processing control device 1006, and an output device 1007. The central processing control device 1006, which is a computer, realizes the functions of the encoding and decoding apparatuses of Embodiment 1 by means of a program. The program may be read from a recording medium and loaded into the central processing control device 1006, or it may be received by the communication device 1004 over a network and loaded into the central processing control device 1006.

By executing the program, the central processing control device 1006 realizes in software each of the means shown inside the central processing control device in FIG. 14, and thereby achieves the encoding and decoding functions of the video signal hierarchical encoding apparatus 101 and the video signal hierarchical decoding apparatus 103 of Embodiments 1 and 2. In the example of FIG. 14, the encoding means and the decoding means that provide the encoding and decoding functions of the video signal hierarchical encoding apparatus 101 and the video signal hierarchical decoding apparatus 103 of Embodiment 1 are placed in a single information processing apparatus 1001. The present invention is not limited to this arrangement; the encoding means and the decoding means may of course be placed in separate information processing apparatuses connected via a network.

Therefore, the video signal hierarchical encoding apparatus and the video signal hierarchical decoding apparatus according to the third embodiment, which achieve the functions of the first and second embodiments in software by executing a program, provide the same effects as the first and second embodiments. Unlike the simple interpolation (spatial enlargement) used for inter-layer prediction in conventional hierarchical video coding, they can perform high-resolution processing that is appropriately matched to the quantization parameter used in base layer encoding, and can thereby further reduce the inter-layer prediction error. This makes efficient, higher-quality hierarchical encoding of video signals possible. In addition, an encoded difference signal that was encoded in the enhancement layer after such quantization-parameter-dependent high-resolution processing reduced the inter-layer prediction error can still be decoded correctly.
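The difference between simple interpolation and quantization-parameter-dependent high-resolution processing can be illustrated with a one-dimensional Python sketch. The stage order below (high-pass filtering, interpolation, amplitude limiting and constant multiplication, a second high-pass filtering, and addition to the interpolated decoded signal) is an assumption inferred from the component names 403 to 409 in the explanation of symbols. The filter taps, the function names, and the mapping in `estimation_params` are hypothetical; the relationships actually used are the ones shown in FIG. 10. The default `qp_max=51` is likewise only an illustrative choice.

```python
def high_pass(x):
    """Subtract a [1, 2, 1] / 4 low-pass from the signal (edge samples are replicated)."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[max(i - 1, 0)]
        right = x[min(i + 1, n - 1)]
        out.append(x[i] - (left + 2 * x[i] + right) / 4)
    return out


def interpolate_2x(x):
    """Double the sample count by linear interpolation (the last sample is repeated)."""
    out = []
    for i, v in enumerate(x):
        nxt = x[min(i + 1, len(x) - 1)]
        out.extend([v, (v + nxt) / 2])
    return out


def estimation_params(qp, qp_max=51):
    """Hypothetical mapping: a coarser quantizer gives a weaker, more tightly limited estimate."""
    reliability = 1.0 - min(max(qp, 0), qp_max) / qp_max
    gain = 2.0 * reliability    # constant multiplier
    limit = 16.0 * reliability  # amplitude limit
    return gain, limit


def estimate_high_res(decoded, qp):
    """Enlarge a decoded base layer signal and add an estimated high frequency component."""
    gain, limit = estimation_params(qp)
    detail = interpolate_2x(high_pass(decoded))
    # Amplitude limiting suppresses overemphasis before the constant multiplication.
    detail = [gain * max(-limit, min(limit, d)) for d in detail]
    detail = high_pass(detail)
    base = interpolate_2x(decoded)
    return [b + d for b, d in zip(base, detail)]
```

In this sketch a flat signal is enlarged unchanged, while an edge gains additional high frequency content whose strength shrinks as the quantization parameter grows; at the largest quantization parameter the result falls back to plain interpolation.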

FIG. 1 is a block diagram showing an example of a video signal hierarchical encoding/decoding apparatus to which the first embodiment of the present invention is applied.
FIG. 2 is a flowchart showing the operation of the video signal hierarchical encoding apparatus of the apparatus shown in FIG. 1.
FIG. 3 is a flowchart showing the operation of the video signal hierarchical decoding apparatus of the apparatus shown in FIG. 1.
FIG. 4 is a block diagram showing the high-resolution estimation signal generation unit and the enhancement layer encoding unit in the video signal hierarchical encoding apparatus of the apparatus shown in FIG. 1.
FIG. 5 is a flowchart showing the operation of the high-resolution estimation signal generation unit shown in FIG. 4.
FIG. 6 is a flowchart showing the operation of the enhancement layer encoding unit shown in FIG. 4.
FIG. 7 is a block diagram showing the high-resolution estimation signal restoration unit and the enhancement layer decoding unit in the video signal hierarchical decoding apparatus of the apparatus shown in FIG. 1.
FIG. 8 is a flowchart showing the operation of the enhancement layer decoding unit shown in FIG. 7.
FIG. 9 is a flowchart showing the operation of the high-resolution estimation signal restoration unit shown in FIG. 7.
FIGS. 10(a) to 10(d) are diagrams each showing an example of the relationship between the quantization parameter and the estimation parameters used by the estimation degree determination unit in the high-resolution estimation signal generation/restoration unit of the hierarchical encoding/decoding apparatus to which the first and second embodiments of the present invention are applied.
FIG. 11 is a diagram showing a configuration example of the multiplexed bitstream output by the video signal hierarchical encoding apparatus and the video signal hierarchical encoding method according to the first embodiment.
FIG. 12 is a block diagram showing the high-resolution estimation signal generation unit in a video signal hierarchical encoding/decoding apparatus to which the second embodiment of the present invention is applied.
FIG. 13 is a flowchart showing the operation of the high-resolution estimation signal generation unit shown in FIG. 12.
FIG. 14 is a block diagram showing an example of an information processing apparatus that executes an encoding and decoding program to which the first and second embodiments of the present invention are applied.

Explanation of symbols

101 Video signal hierarchical encoding apparatus
102 Network
103 Video signal hierarchical decoding apparatus
104 Spatial decimation unit
105 Base layer encoding unit
106 High-resolution estimation signal generation unit
107 Enhancement layer encoding unit
108 Multiplexing unit
109 Extraction unit
110 Base layer decoding unit
111 High-resolution estimation signal restoration unit
112 Enhancement layer decoding unit
403 First high-pass filtering unit
404 First interpolation unit
405 Amplitude limiting and constant multiplication unit
406 Second high-pass filtering unit
407 Second interpolation unit
408 Adder
409 Estimation degree determination unit
411 Frame memory 1
412 Frame memory 2
413 Motion estimation unit
414 Motion compensation unit
415 Intra prediction unit
416 Prediction signal selection unit
417 Prediction error signal generation unit
418 Orthogonal transform and quantization unit
419 Entropy encoding unit
420 Inverse quantization and inverse orthogonal transform unit
421 Adder
422 Deblocking filter unit
701 High-resolution estimation signal restoration unit
702 Enhancement layer decoding unit
710 Entropy decoding unit
1001 Information processing apparatus
1002 External storage device
1003 Temporary storage device
1004 Communication device
1005 Input device
1006 Central processing control device
1007 Output device
1601 High-resolution estimation signal generation unit
1602 First interpolation unit
1603 First high-pass filtering unit

Claims

A video signal hierarchical encoding apparatus that encodes a video signal having a resolution lower than that of an input video signal, the lower-resolution video signal being obtained by decomposing the input video signal into layers of different resolutions, generates a prediction signal from the lower-resolution video signal, encodes the input video signal on the higher-resolution side by inter-spatial-resolution prediction using the prediction signal, and obtains encoded data of video signals of different resolutions, the apparatus comprising:
spatial reduction means for spatially reducing the input video signal to obtain a first video signal having a resolution lower than that of the input video signal;
first encoding means for encoding the first video signal using an encoding process that includes a decoding process, to obtain first encoded data;
high frequency estimation means for estimating, from the decoded signal obtained by the decoding process, a high frequency component that can be expressed at a spatial resolution equal to or higher than the spatial resolution of the decoded signal, and for generating a high frequency component estimation signal;
overemphasis suppression means for suppressing overemphasis of the high frequency component estimation signal in the process of generating the high frequency component estimation signal;
spatial enlargement means for performing high-resolution processing in which at least one of the degree of high frequency component estimation in the high frequency estimation means and the degree of overemphasis suppression in the overemphasis suppression means is controlled according to the quantization parameter used in the first encoding means, and for spatially enlarging the decoded signal based on the high frequency component estimation signal to obtain a second video signal that is a high-resolution enlarged video signal;
second encoding means for encoding the input video signal by inter-spatial-resolution prediction using a prediction signal, to obtain second encoded data that is encoded data of the video signal on the higher-resolution side;
prediction signal selection means for selecting, as the prediction signal used in the second encoding means, either a predetermined prediction signal obtained in the layer of the spatial resolution to be encoded by the second encoding means, or the second video signal, which is a prediction signal obtained from the layer on the lower-resolution side based on the high frequency component estimation signal; and
multiplexing means for multiplexing the first encoded data, the second encoded data, and data of the quantization parameter.

A video signal hierarchical encoding method that encodes a video signal having a resolution lower than that of an input video signal, the lower-resolution video signal being obtained by decomposing the input video signal into layers of different resolutions, generates a prediction signal from the lower-resolution video signal, encodes the input video signal on the higher-resolution side by inter-spatial-resolution prediction using the prediction signal, and obtains encoded data of video signals of different resolutions, the method comprising:
a spatial reduction step of spatially reducing the input video signal to obtain a first video signal having a resolution lower than that of the input video signal;
a first encoding step of encoding the first video signal using an encoding process that includes a decoding process, to obtain first encoded data;
a high frequency estimation step of estimating, from the decoded signal obtained by the decoding process, a high frequency component that can be expressed at a spatial resolution equal to or higher than the spatial resolution of the decoded signal, and of generating a high frequency component estimation signal;
an overemphasis suppression step of suppressing overemphasis of the high frequency component estimation signal in the process of generating the high frequency component estimation signal;
a spatial enlargement step of performing high-resolution processing in which at least one of the degree of high frequency component estimation in the high frequency estimation step and the degree of overemphasis suppression in the overemphasis suppression step is controlled according to the quantization parameter used in the first encoding step, and of spatially enlarging the decoded signal based on the high frequency component estimation signal to obtain a second video signal that is a high-resolution enlarged video signal;
a second encoding step of encoding the input video signal by inter-spatial-resolution prediction using a prediction signal, to obtain second encoded data that is encoded data of the video signal on the higher-resolution side;
a prediction signal selection step of selecting, as the prediction signal used in the second encoding step, either a predetermined prediction signal obtained in the layer of the spatial resolution to be encoded in the second encoding step, or the second video signal, which is a prediction signal obtained from the layer on the lower-resolution side based on the high frequency component estimation signal; and
a multiplexing step of multiplexing the first encoded data, the second encoded data, and data of the quantization parameter.

A video signal hierarchical encoding program that causes a computer to execute operations of encoding a video signal having a resolution lower than that of an input video signal, the lower-resolution video signal being obtained by decomposing the input video signal into layers of different resolutions, generating a prediction signal from the lower-resolution video signal, encoding the input video signal on the higher-resolution side by inter-spatial-resolution prediction using the prediction signal, and obtaining encoded data of video signals of different resolutions, the program causing the computer to function as:
spatial reduction means for spatially reducing the input video signal to obtain a first video signal having a resolution lower than that of the input video signal;
first encoding means for encoding the first video signal using an encoding process that includes a decoding process, to obtain first encoded data;
high frequency estimation means for estimating, from the decoded signal obtained by the decoding process, a high frequency component that can be expressed at a spatial resolution equal to or higher than the spatial resolution of the decoded signal, and for generating a high frequency component estimation signal;
overemphasis suppression means for suppressing overemphasis of the high frequency component estimation signal in the process of generating the high frequency component estimation signal;
spatial enlargement means for performing high-resolution processing in which at least one of the degree of high frequency component estimation in the high frequency estimation means and the degree of overemphasis suppression in the overemphasis suppression means is controlled according to the quantization parameter used in the first encoding means, and for spatially enlarging the decoded signal based on the high frequency component estimation signal to obtain a second video signal that is a high-resolution enlarged video signal;
second encoding means for encoding the input video signal by inter-spatial-resolution prediction using a prediction signal, to obtain second encoded data that is encoded data of the video signal on the higher-resolution side;
prediction signal selection means for selecting, as the prediction signal used in the second encoding means, either a predetermined prediction signal obtained in the layer of the spatial resolution to be encoded by the second encoding means, or the second video signal, which is a prediction signal obtained from the layer on the lower-resolution side based on the high frequency component estimation signal; and
multiplexing means for multiplexing the first encoded data, the second encoded data, and data of the quantization parameter.