KR20130078569A

KR20130078569A - Region of interest based screen contents quality improving video encoding/decoding method and apparatus thereof

Info

Publication number: KR20130078569A
Application number: KR1020110147588A
Authority: KR
Inventors: 임웅; 남정학; 유종훈; 오승준; 심동규
Original assignee: 광운대학교 산학협력단
Priority date: 2011-12-30
Filing date: 2011-12-30
Publication date: 2013-07-10

Abstract

PURPOSE: A method and a device for encoding and decoding screen content video with region of interest (ROI) based quality improvement are provided to improve the subjective quality of screen content video by determining an ROI in consideration of screen content properties, assigning more bits to the main ROI, and reducing the amount of information assigned to the other regions. CONSTITUTION: An ROI extraction unit (201) extracts ROI information with reference to an input original image. An image encoder (202) applies encoding methods according to the degree of interest of each region with reference to the extracted ROI information. A rate-distortion adjustment unit (203) adjusts the quality and generated bit rate of the whole image, and adjusts the subjective quality and generated bit rate of each region according to its degree of interest. The ROI information of the corresponding region is added to an output bitstream (204) when encoding is performed. [Reference numerals] (200) Input image; (201) ROI extraction unit; (202) Image encoder; (203) Rate-distortion adjustment unit; (204) Output bitstream

Description

Region of Interest Based Video Encoding/Decoding Method and Apparatus for Improving Screen Content Quality {Region of Interest based Screen Contents Quality Improving Video Encoding / Decoding Method and Apparatus Thereof}

The present invention proposes a method and apparatus that identify the main region of interest of a screen content video and improve the image quality of that region.

Screen content video may be an artificially produced image, or may be mixed with a general natural image. Unlike general natural video, screen content video has characteristics such as a limited range of color-difference signals, relatively low noise, and high saturation. Because of these characteristics, the ROI of a screen content video may differ from the result of ROI extraction designed for natural images. Therefore, to improve image quality based on the region of interest of screen content video, the characteristics that distinguish it from natural images must be taken into account. Compression of such screen content video is also being considered in High Efficiency Video Coding (HEVC), which is currently being standardized.

Existing video encoding/decoding techniques have been developed for the compression of natural video, which is generally acquired through a camera. However, in HEVC, the next-generation video compression standard being standardized by the Joint Collaborative Team on Video Coding (JCT-VC), which was jointly established by the Moving Picture Experts Group (MPEG) of ISO/IEC and the Video Coding Experts Group (VCEG) of ITU-T, computer-generated (CG) images and mixed images of natural and CG content are used as standard test images so that high-efficiency compression of screen content video is taken into account. This can be seen as an effort to apply the next-generation video codec to screen content fields such as animation and games, and to apply video compression technology to images of various characteristics, including natural video. In line with this technical trend, the present invention proposes a method and apparatus for analyzing a region of interest based on the characteristics of screen content video and for improving the image quality of the region of interest.

The present invention proposes a method and apparatus for efficiently compressing screen content video in an existing codec designed to compress general natural video efficiently.

Screen content video has characteristics such as sharp edge changes, as in text areas, the absence of noise in specific areas, and monotonically increasing pixel values. Consequently, conventional encoding/decoding methods may produce blur noise around edges. In addition, a step artifact may occur in a monotonically increasing region, or a ringing artifact may occur at the boundary of a region. For an image in which natural content and screen content are mixed, a method of minimizing noise in the screen content region may be applied after detecting that region. However, existing ROI extraction methods are based on characteristics of natural images, such as contrast and the geometric shape of edges, so it is difficult to use them to classify the screen content region as a region of interest. Accordingly, the present invention proposes a method of determining the screen content region and improving its image quality in consideration of the characteristics of screen content.

In compressing screen content video, the proposed method determines the region of interest in consideration of the characteristics of the screen content, allocates more bits to the main region of interest from a rate-distortion optimization standpoint, and reduces the amount of information allocated to the other regions. It can thereby improve the subjective picture quality of screen content video while keeping the amount of information similar to that of the existing compression process. In the proposed method, the main region of interest may be determined by referring to the input screen content video. In encoding the determined main ROI, its image quality may be improved by adaptively changing its encoding parameters, and the result may be decoded without transmitting additional information. Alternatively, additional information on the ROI of the input screen content video may be transmitted and referenced in the decoding process, so that a reconstructed image identical to that of the encoding process is generated.

FIG. 1 is a top-level block diagram of a decoding apparatus according to the present invention.
FIG. 2 is a top-level block diagram of an encoding apparatus according to the present invention.
FIG. 3 is a simplified block diagram of a decoding apparatus according to the present invention.
FIG. 4 is a simplified block diagram of an encoder according to the present invention.
FIG. 5 is a simplified block diagram of the ROI extractor according to the present invention.
FIG. 6 is a simplified block diagram of a method and apparatus for performing intra prediction in consideration of characteristics of an input image.
FIG. 7 is a simplified block diagram of an inter prediction method and apparatus based on a region of interest considering characteristics of screen content, together with an adaptive interpolation method.
FIG. 8 is an embodiment of a block-level interest map extracted from an input image in consideration of characteristics of screen content video, and of a quantization parameter table that takes the interest map into account.
FIG. 9 is a diagram illustrating an example of a reference region and a filtering method of an in-loop post-processing filter considering characteristics of a region of interest and screen content video.

The present invention extracts the main region of interest of a screen content video, or of a mixed image in which screen content and natural image content exist together, and improves the quality of the main region of interest by adjusting the bit amounts of the region of interest and the other regions. It is a method and apparatus for improving the overall subjective picture quality of the input screen content while keeping the overall bit amount similar to that of the existing method. For decoding the compressed bitstream, the proposed method may include a parsing module for the main region of interest information, a parameter controller that adaptively adjusts parameters by referring to the region of interest information produced by that module, and an in-loop filter that filters the regions of interest of a decoded screen content video slice so that the result is closer to the original image. In the proposed method and apparatus below, and in all processes describing them, 'screen content' means an image that consists partly or entirely of screen content.

The above objects, features, and methods will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, whereby those skilled in the art to which the present invention pertains may easily implement the technical idea of the present invention. Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows an embodiment of the present invention.

FIG. 1 is a top-level block diagram of a decoding apparatus according to the present invention. In the proposed method, when the input bitstream 100 is decoded, the ROI decoder 102 may decode the ROI information, which reflects the characteristics of the screen content and is transferred from the encoder. Alternatively, for applications such as fast decoding, the bitstream may be decoded in the same manner as by a conventional video decoder, without decoding the ROI information. The decoded ROI information may be referenced by every module in the image decoder 101 that takes ROI information into account, and the reconstructed image 103 is output through the decoding process.

FIG. 2 shows an embodiment of the present invention.

FIG. 2 is a top-level block diagram of an encoding apparatus according to the present invention. In the proposed method, when the input image 200 is encoded, the ROI extractor 201 extracts the ROI information of the image by referring to the input original image, and the image encoder 202 and the rate-distortion adjustment unit 203 may refer to this information to encode the input image 200 efficiently. Alternatively, for applications such as high-speed encoding, the input image 200 may be encoded by an existing method, without ROI extraction or ROI-based rate-distortion control. When encoding with reference to the ROI information, the image encoder 202 may apply an encoding method suited to the degree of interest of each region by referring to the ROI information extracted by the ROI extractor 201. In doing so, the rate-distortion adjustment unit 203 may adjust the image quality and generated bit amount of the entire image, and may adjust the subjective quality and generated bit amount of each region according to its degree of interest. When encoding is performed with reference to the ROI information generated by the ROI extractor 201, the ROI information of the image may be added to the output bitstream 204.
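The trade-off handled by the rate-distortion adjustment unit 203 can be pictured with a minimal sketch: a fixed bit budget is split across regions so that regions of higher interest receive more bits while the total stays unchanged. The function name `allocate_bits`, the integer `boost` knob, and the linear weighting are illustrative assumptions, not the control method of the invention.

```python
def allocate_bits(total_bits, interest, boost=1):
    """Split a fixed bit budget across regions, favoring high interest.

    interest: per-region interest levels (0 = lowest); hypothetical scale.
    boost: hypothetical integer knob; a region's weight is 1 + boost * level.
    """
    weights = [1 + boost * level for level in interest]
    weight_sum = sum(weights)
    # Integer floor of each proportional share keeps the arithmetic exact.
    shares = [total_bits * w // weight_sum for w in weights]
    # Hand the few leftover bits to the highest-interest regions first.
    leftover = total_bits - sum(shares)
    by_interest = sorted(range(len(interest)), key=lambda i: -interest[i])
    for i in by_interest[:leftover]:
        shares[i] += 1
    return shares


shares = allocate_bits(1000, [0, 0, 2, 1])
print(shares, sum(shares))
```

The total is preserved by construction, which mirrors the stated goal of keeping the overall bit amount similar to that of the existing method.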

FIG. 3 shows an embodiment of the present invention.

FIG. 3 is a simplified block diagram of a decoding apparatus according to the present invention. In decoding the compressed bitstream of an input screen content video, the ROI information generated in the encoding process may be decoded by the ROI information decoder 301. This information may be referenced by later decoding modules and used in the decoding process 300 to improve the subjective quality of the ROI. The entropy decoder 302 then decodes the information necessary to reconstruct the image. The transform coefficients decoded by the entropy decoder 302 are converted into a spatial-domain difference signal by the inverse quantization unit 303 and the inverse transform unit 304, and the image is reconstructed using the intra prediction unit 306 and the motion compensator 307. The reconstructed image may be brought closer to the original image by the filter unit 305 and is stored in the reference image buffer 308 for reference by later images. Based on the ROI information reconstructed by the ROI information decoder 301, the received quantization parameter may be adjusted according to the degree of interest of each region, and the adjusted value is referenced during inverse quantization in the inverse quantization unit 303. If the region to be dequantized is of high interest according to the decoded ROI information, image quality is improved by lowering the quantization parameter below the received value; for a region of low interest, inverse quantization may be performed with an increased quantization parameter. In performing intra prediction in the intra prediction unit 306, the number of prediction modes may be applied adaptively according to the degree of interest of the region.
In other words, for a region of high interest, prediction may be performed in directions subdivided more finely than the existing intra prediction directions, and for a region of low interest, prediction may be performed in fewer directions than the existing ones. In performing inter prediction, the motion compensator 307 may adapt the interpolation resolution and the motion vector resolution to the degree of interest of the region. An adaptive in-loop filter that takes the characteristics of screen content into account may be applied to the image reconstructed by the intra prediction unit 306 and the motion compensator 307. In addition, the motion vectors of the currently decoded slice may be referenced by the ROI extractor 401 to extract the degree of interest of a slice to be decoded subsequently.

FIG. 4 shows an embodiment of the present invention.

FIG. 4 is a simplified block diagram of an encoder according to the present invention. According to the present invention, the ROI information extractor 401 extracts ROI information in consideration of the characteristics of the screen content. The extracted ROI information may be used later in the encoding process 400 to increase the subjective quality of the encoded image. The ROI information measured by the ROI information extractor 401 may be used in the encoding process 400 and may be stored in the bitstream output by that process. Depending on the characteristics of the input image, the operation of the ROI information extractor 401 may also be omitted. The intra prediction unit 405 removes spatial redundancy by referring to the reconstructed neighboring region. Inter prediction is performed by the motion predictor 401, which removes temporal redundancy by referring to previously reconstructed images stored in the reference image buffer 403, and by the motion compensator 402, which compensates for the motion. The difference signal output by the prediction process is transformed to the frequency domain by the transform unit 406, and the transform coefficients are quantized by the quantization unit 407. The quantized coefficients are then compressed by the entropy encoder 410 and output as a bitstream. So that they can be referenced in subsequent encoding, the quantized coefficients output by the quantization unit 407 are also restored to a difference signal by the inverse quantization unit 409 and the inverse transform unit 408, and the reconstructed image is generated by adding the same prediction value. The filter unit 404 additionally compensates for the loss introduced during encoding and removes degradation from the reconstructed image, which is then stored in the reference image buffer 403.
To improve the subjective quality of the screen content video, the ROI information extracted from the input image may be taken into account in the motion predictor 401, the intra prediction unit 405, the quantization unit 407, and the filter unit 404 of the encoding process 400. To improve the image quality of the ROI, which strongly affects subjective image quality, the motion predictor 401 may adaptively increase or decrease the interpolation resolution and the motion vector resolution according to the degree of interest of the region. The intra prediction unit 405 may adaptively determine the number of intra prediction modes according to the degree of interest of the region by referring to the ROI information extracted from the input image. The quantization unit 407 may adaptively adjust the value of the quantization parameter with reference to the ROI information. The filter unit 404, which improves image quality and removes degradation in the reconstructed image, may adaptively derive filter coefficients in consideration of the region of interest and may generate information on the region to which the filter is to be applied. In addition, the motion vector determined by the motion predictor 401 may be referenced to extract region of interest information for a slice to be encoded later.

FIG. 5 shows an embodiment of the present invention.

FIG. 5 is a simplified block diagram of a region of interest extractor 500 according to the present invention. To extract a region of interest for the input image 501, the input image 501 may be supplied to the screen content feature extraction unit 503. In analyzing the input screen content video, the screen content feature extraction unit 503 may analyze and extract the characteristics of the image using brightness, saturation, color range, directionality, motion, histogram, and smoothness. Unlike natural video, screen content video has high contrast and high saturation in text areas. In addition, the range and number of colors used locally are limited. Based on these properties, the characteristics of the screen content may be extracted and referenced in the process of determining the ROI.

Also, unlike in still images, the region of interest in video may be affected by motion. Accordingly, the motion vector 502 of a previously encoded or decoded image may be input to the screen content feature extraction unit 503. The region of interest of the input image 501 may then be determined by the region of interest extractor 504 from the screen content characteristics analyzed in consideration of the input motion vector 502.
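As a rough illustration of how block-level features might feed an interest decision, the following sketch scores a block as more interesting when it looks text-like (high contrast, few distinct sample values) and when it moves. The function names, the thresholds, and the rule that motion raises interest are all assumptions made for illustration; the text above lists candidate features but does not specify a scoring rule.

```python
def block_features(block):
    """Return (contrast, distinct_levels) for a 2-D list of sample values."""
    samples = [value for row in block for value in row]
    return max(samples) - min(samples), len(set(samples))


def block_interest(block, motion_vector=(0, 0),
                   contrast_thr=128, max_levels=4, motion_thr=2):
    """Hypothetical interest level in {0, 1, 2} for one block.

    All thresholds are illustrative assumptions, not values from the source.
    """
    contrast, levels = block_features(block)
    interest = 0
    # Text-like content: high contrast and only a few distinct values.
    if contrast >= contrast_thr and levels <= max_levels:
        interest += 1
    # Motion from a previously coded picture (assumed to raise interest).
    if abs(motion_vector[0]) + abs(motion_vector[1]) >= motion_thr:
        interest += 1
    return interest


text_block = [[0, 255, 0, 255], [255, 0, 255, 0]]
smooth_block = [[10, 20, 30, 40], [50, 60, 70, 80]]
print(block_interest(text_block), block_interest(smooth_block))
```

A real extractor would combine more of the listed features, such as saturation, directionality, and smoothness; this sketch only shows the shape of the computation.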

FIG. 6 shows an embodiment of the present invention.

FIG. 6 is a simplified block diagram of a method and apparatus for performing intra prediction in consideration of characteristics of an input image. With the proposed method and apparatus, intra prediction may be performed on the input image 600 in consideration of the degree of interest of the region and the characteristics of the screen content video. In performing intra prediction of screen content video, the intra prediction mode determiner 602, which is based on screen content characteristics, receives the input image 600 and the interest level 601 of the corresponding region, and may perform intra prediction by referring to the characteristics and the degree of interest of the screen content. In the proposed intra prediction, a model relating the input interest level to the screen content characteristics of the region may be used to decide which set of directional prediction modes to use among the various intra prediction modes. Depending on the input interest level 601, prediction may be performed with a small number of intra prediction modes or with detailed prediction modes in many directions. For example, when the interest level of the current prediction unit block is high, prediction is performed using the detailed intra prediction mode set 604 in addition to the existing intra prediction mode set 603; for a unit block of low interest, only the existing prediction mode set 603 is used, so that less information is generated for signaling the prediction mode. In performing intra prediction using the input image and the reconstructed neighboring region, the screen-content-based intra prediction unit 602 can take into account the sharp edges and complex shapes of screen content and improve prediction accuracy by generating prediction signals in various directions that model edge shapes, including curved shapes rather than only straight-line prediction.
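The choice between mode sets 603 and 604 can be sketched as follows. The angle values, the threshold `high_thr`, and the fixed-length signaling cost are hypothetical; the text does not specify the actual mode counts or how mode indices are coded.

```python
import math

# Hypothetical coarse direction set, standing in for mode set 603.
BASE_ANGLES = [i * 22.5 for i in range(8)]
# Hypothetical finer directions between the coarse ones, standing in for 604.
DETAIL_ANGLES = [angle + 11.25 for angle in BASE_ANGLES]


def intra_mode_set(interest, high_thr=2):
    """Return candidate prediction angles for a block of the given interest."""
    if interest >= high_thr:
        # High interest: existing set plus the detailed set.
        return sorted(BASE_ANGLES + DETAIL_ANGLES)
    # Low interest: existing set only, so the mode index costs fewer bits.
    return list(BASE_ANGLES)


def mode_index_bits(mode_set):
    """Bits for a fixed-length mode index (an assumed signaling scheme)."""
    return math.ceil(math.log2(len(mode_set)))


for level in (0, 2):
    modes = intra_mode_set(level)
    print(level, len(modes), mode_index_bits(modes))
```

Under these assumptions a low-interest block signals its mode with one bit fewer than a high-interest block, which is the saving the paragraph above describes.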

FIG. 7 shows an embodiment of the present invention.

FIG. 7 is a simplified block diagram of a method and apparatus for inter prediction based on a region of interest considering characteristics of screen content. In performing such inter prediction, adaptive reference image interpolation may be applied according to the interest 701 of the corresponding region, which is input together with the input image 700. To find the optimal prediction signal for the input image in the reference image stored in the reconstructed image buffer 702, the adaptive image interpolator 703 may perform adaptive interpolation at various resolutions in consideration of the interest of the region. When the interest of the region is high, the adaptive image interpolator 703 may adaptively interpolate the reference image to a resolution of 1/8 pixel or finer by referring to the integer pixels 707. If the interest is low for the reference region of the block on which inter prediction is currently performed, interpolation may be performed only up to the resolution of the 1/2 pixel 708 and the 1/4 pixel 709 with reference to the integer pixels 707 of the reference image. If the interest of the region is high, interpolation may be performed up to the 1/2 pixel 708, the 1/4 pixel 709, and the 1/8 pixel 710 with reference to the integer pixels 707 of the reference image. The adaptive inter prediction unit 704 may determine the difference signal 705 and the motion vector 706 from the prediction signal obtained with reference to the interpolated image. The resolution of the motion vector 706 determined by the adaptive inter prediction unit 704 may follow the interpolation resolution chosen according to the interest of the region.
If the interest 701 of the region is high, the adaptive image interpolator 703 may interpolate the reference image stored in the reconstructed image buffer 702 to a resolution of 1/8 pixel. In this case, the resolution of the motion vector 706 determined by the adaptive inter prediction unit 704 is 1/8 pixel, the same as the interpolation resolution of the reference image.
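The link between interest, interpolation resolution, and motion vector resolution can be sketched as below. The threshold `high_thr` and the function names are assumptions, and linear interpolation is used only as a simple stand-in; the text does not specify the interpolation filter.

```python
from fractions import Fraction


def mv_denominator(interest, high_thr=2):
    """1/8-pel precision for high interest, 1/4-pel otherwise (assumed rule)."""
    return 8 if interest >= high_thr else 4


def quantize_mv(mv, denominator):
    """Round each motion vector component (in pixels) to 1/denominator."""
    return tuple(Fraction(round(c * denominator), denominator) for c in mv)


def interpolate(samples, position):
    """Linearly interpolate a 1-D row of integer pixels at a fractional,
    non-negative position (a stand-in for the codec's interpolation filter)."""
    left = int(position)
    frac = position - left
    if frac == 0:
        return Fraction(samples[left])
    return (1 - frac) * samples[left] + frac * samples[left + 1]


row = [10, 20, 30]
for level in (0, 2):
    denom = mv_denominator(level)
    mv_x = quantize_mv((0.4,), denom)[0]
    print(level, mv_x, interpolate(row, mv_x))
```

The same real-valued displacement is represented more finely for the high-interest block, at the cost of a larger set of candidate positions to interpolate and signal.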

FIG. 8 shows an embodiment of the present invention.

FIG. 8 shows a block-level interest map 800 extracted from an input image in consideration of the characteristics of screen content video. It is the result of extracting the screen content characteristics of each block and measuring the interest of the block on that basis. With reference to the block-level interest map 800 extracted from the original image, encoding may be performed in consideration of the subjective quality of the video. In encoding the screen content video, quantization parameters with different values may be applied adaptively according to the interest of each region with reference to the interest map 800. Using the quantization parameter offset table 801, an offset corresponding to the interest of each region may be added to the base quantization parameter of the current slice and applied in the encoding process. The extracted interest map 800 and the quantization parameter offset table 801 yield the quantization parameter value 802 applied to each block during encoding, which improves the image quality of the ROI. The interest map 800 and the quantization parameter offset table 801 may be added to the bitstream of the video and transmitted. In adding the interest map 800 to the bitstream, the interest of each block may be predicted from the interest of neighboring blocks, and the difference signal may be encoded and transmitted. The prediction of interest may follow the coding order of the blocks. That is, the difference between the interest of the current block and the interest of a previously encoded block is obtained and encoded using variable length coding (VLC), context-adaptive variable length coding (CAVLC) that depends on the interest of neighboring regions, or context-adaptive binary arithmetic coding (CABAC).
The quantization parameter offset table 801 may be determined linearly or nonlinearly with reference to the interest map 800, and may be encoded and transmitted in the same way as the interest map 800. The interest map 800 and the quantization parameter offset table 801 of each slice may be added to the slice header and transmitted. Alternatively, a predefined table may be used in the encoding process, and transmission of this information may be omitted.
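A minimal sketch of the two mechanisms described for FIG. 8: deriving the per-block quantization parameter from the interest map and an offset table, and coding the interest map as differences in coding order. The offset values, the 0 to 51 clipping range, and the raster-order predictor starting from 0 are assumptions for illustration; entropy coding of the residuals (VLC, CAVLC, or CABAC) is not shown.

```python
# Hypothetical offset table (801): higher interest gets a lower QP.
QP_OFFSETS = {0: 4, 1: 0, 2: -4}


def block_qps(base_qp, interest_map, offsets=QP_OFFSETS, qp_min=0, qp_max=51):
    """Per-block QP (802) = slice base QP + offset for the block's interest."""
    return [[min(qp_max, max(qp_min, base_qp + offsets[level]))
             for level in row] for row in interest_map]


def encode_interest(interest_map):
    """Difference of each block's interest from the previously coded block."""
    residuals, previous = [], 0
    for row in interest_map:
        for level in row:
            residuals.append(level - previous)
            previous = level
    return residuals


def decode_interest(residuals, width):
    """Invert encode_interest and reshape into rows of the given width."""
    levels, previous = [], 0
    for residual in residuals:
        previous += residual
        levels.append(previous)
    return [levels[i:i + width] for i in range(0, len(levels), width)]


interest_map = [[0, 0, 2], [2, 2, 1]]
print(block_qps(30, interest_map))
print(encode_interest(interest_map))
```

Because neighboring blocks often share the same interest, most residuals are zero, which is what makes the differential representation cheap to entropy code.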

FIG. 9 shows an embodiment of the present invention.

An in-loop post-processing filter that considers the region of interest and the characteristics of screen content video may be applied. When a filter is applied so that the decoded screen content video becomes as similar as possible to the original image, the degree of interest and the screen content characteristics of each region may be taken into account with reference to the ROI information 900 output by the ROI information extractor of the encoder and the ROI information decoder of the decoder. In applying a filter that considers interest, the regions referenced by the filter may be prioritized from high interest to low interest within the image. That is, the optimal filter may be derived by referring only to the region 901 of high interest according to the ROI information 900, and then applied to the reconstructed image. In addition, a filter 903 for the natural video region and a filter 902 for the screen content video region of the reference region may be derived, so that different filters are applied according to the characteristics of the screen content region 904 and the natural image region 905 inside the reference region.
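The idea of deriving a filter from the high-interest region only can be illustrated with a deliberately simplified stand-in: a least-squares gain and offset fitted over high-interest samples, rather than a multi-tap optimal filter. The function names, the threshold `high_thr`, and the choice to apply the correction only to high-interest samples are assumptions made for illustration.

```python
def fit_gain_offset(original, reconstructed, interest, high_thr=2):
    """Least-squares (gain, offset) mapping reconstructed to original,
    fitted only on samples whose interest is at least high_thr."""
    pairs = [(o, r) for o, r, level in zip(original, reconstructed, interest)
             if level >= high_thr]
    n = len(pairs)
    if n == 0:
        return 1.0, 0.0  # nothing to fit: identity filter
    mean_o = sum(o for o, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    var_r = sum((r - mean_r) ** 2 for _, r in pairs)
    if var_r == 0:
        return 1.0, mean_o - mean_r  # flat region: offset correction only
    cov = sum((o - mean_o) * (r - mean_r) for o, r in pairs)
    gain = cov / var_r
    return gain, mean_o - gain * mean_r


def apply_filter(reconstructed, interest, gain, offset, high_thr=2):
    """Correct high-interest samples; leave the others untouched."""
    return [gain * r + offset if level >= high_thr else r
            for r, level in zip(reconstructed, interest)]


original = [10, 20, 30, 40, 50]
reconstructed = [12, 22, 32, 42, 55]
interest = [2, 2, 2, 2, 0]
gain, offset = fit_gain_offset(original, reconstructed, interest)
print(gain, offset, apply_filter(reconstructed, interest, gain, offset))
```

The low-interest sample does not influence the fit, so its larger error cannot pull the correction away from the region that matters most for subjective quality.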

Claims

An image encoding/decoding apparatus that, in transmitting, compressing, and processing an image, extracts a region of interest based on characteristics of an input screen content image or of a mixed image containing screen content and natural image content, and compresses the image with reference to the region of interest, the apparatus comprising:
a region of interest extraction unit that extracts the characteristics and degree of interest of the screen content with reference to the input image; a compression unit that adaptively adjusts parameters and codewords during image compression using the extracted region of interest information; an optional transmission unit for the extracted region of interest information; and a decoding unit for the region of interest information.

The apparatus of claim 1, comprising a region of interest extraction unit that, in analyzing the characteristics of the input image, uses spatial characteristics such as brightness, saturation, histogram-based color frequency, directionality, smoothness, and transform coefficients, and also uses temporal characteristics.

The image encoding/decoding apparatus of claim 1, comprising an intra prediction unit that, in performing intra prediction on the input image, adaptively determines the number of intra prediction modes according to the degree of interest of each region.

The image encoding/decoding apparatus of claim 1, comprising an inter prediction unit that adaptively determines the interpolation resolution of the reference image and the resolution of the motion vector based on the input image and the degree of interest of the corresponding region.

The image encoding/decoding apparatus of claim 1, comprising a quantization parameter adjustment unit that uses the input image and the degree of interest of the corresponding region to adaptively adjust the value of the quantization parameter of each region, to refer to a fixed quantization parameter offset table, or to transmit optimal quantization parameter offset information.

The image encoding/decoding apparatus of claim 1, comprising a filter unit that, in applying an adaptive filter to the reconstructed image, applies the filter separately to the screen content region and the natural image region with reference to the extracted degree of interest, or applies a filter optimized for the region of high interest based on the extracted degree of interest.