KR101019010B1 - Preprocessor method and apparatus

Info

Publication number: KR101019010B1
Application number: KR1020087026885A
Authority: KR
Inventors: Tao Tian; Fang Liu; Fang Shi; Vijayalakshmi R. Raveendran
Original assignee: Qualcomm Incorporated
Priority date: 2006-04-03
Filing date: 2007-03-13
Publication date: 2011-03-04
Also published as: JP2015109662A; JP2013031171A; CN104159060B; EP2002650A1; JP2009532741A; KR20090006159A; JP5897419B2; KR101377370B1; TW200803504A; KR101373896B1; AR060254A1; KR20120091423A; KR20100126506A; CN104159060A; WO2007114995A1; KR20110128366A; KR20140010190A; JP6352173B2; KR101127432B1

Abstract

The present invention relates generally to multimedia data processing and, more particularly, to processing operations performed before or in conjunction with data compression. A method of processing multimedia data includes receiving interlaced video frames, obtaining metadata for the interlaced video frames, converting the interlaced video frames to progressive (non-interlaced) video using at least a portion of the metadata, and providing the progressive video and at least a portion of the metadata to an encoder used to encode the progressive video. The method also includes generating spatial information and bidirectional motion information for the interlaced video frames, and generating the progressive video based on the interlaced video frames using the spatial and bidirectional motion information.

Multimedia, Data Processing, Data Compression, Metadata, Progressive Video

Description

Preprocessor method and apparatus

Claim of Priority under 35 U.S.C. §119

This patent application claims priority to U.S. Provisional Application No. 60/789,048, filed April 3, 2006, U.S. Provisional Application No. 60/789,266, filed April 4, 2006, and U.S. Provisional Application No. 60/789,377, filed April 4, 2006, all of which are assigned to the assignee hereof and are expressly incorporated herein by reference.

Background

Field

The present application relates generally to multimedia data processing and, more particularly, to processing operations performed before or in conjunction with data compression processing.

Background

Summary

Each of the inventive apparatuses and methods described herein has several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of the invention, its more prominent features are briefly described below. After considering this description, and particularly after reading the section entitled "Detailed Description," one skilled in the art will understand how the features of the invention provide improvements to multimedia data processing apparatuses and methods.

In one aspect, a method of processing multimedia data includes receiving interlaced video frames, converting the interlaced video frames to progressive video, generating metadata associated with the progressive video, and providing the progressive video and at least a portion of the metadata to an encoder used to encode the progressive video. The method may further include encoding the progressive video using the metadata. In some aspects, the interlaced video frames comprise NTSC video. Converting the video frames may include deinterlacing the interlaced video frames.

In some aspects, the metadata may include bandwidth information, bidirectional motion information, a bandwidth ratio, a complexity value such as a temporal and/or spatial complexity, and luminance information, and the spatial information may include luminance and/or chrominance information. The method may also include generating spatial information and bidirectional motion information for the interlaced video frames, and generating the progressive video based on the interlaced video frames using the spatial and bidirectional motion information. In some aspects, converting the interlaced video frames includes inverse telecining 3/2 pulldown video frames and/or resizing the progressive video. The method may further include partitioning the progressive video to determine group of pictures information, and the partitioning may include shot detection of the progressive video. In some aspects, the method also includes filtering the progressive video with a denoising filter.

In another aspect, an apparatus for processing multimedia data may include a receiver configured to receive interlaced video frames, a deinterlacer configured to convert the interlaced video frames to progressive video, and a partitioner configured to generate metadata associated with the progressive video and to provide the progressive video and the metadata to an encoder used to encode the progressive video. In some aspects, the apparatus may further include an encoder configured to receive the progressive video from a communication module and to encode the progressive video using the provided metadata. The deinterlacer may be configured to perform spatio-temporal deinterlacing and/or inverse telecine. The partitioner may be configured to perform shot detection and to generate compression information based on the shot detection. In some aspects, the partitioner may be configured to generate bandwidth information. The apparatus may also include a resampler configured to resize progressive frames. The metadata may include bandwidth information, bidirectional motion information, a bandwidth ratio, luminance information, a content-related spatial complexity value, and/or a content-related temporal complexity value. In some aspects, the deinterlacer is configured to generate spatial information and bidirectional motion information for the interlaced video frames, and to generate the progressive video based on the interlaced video frames using the spatial and bidirectional motion information.

Another aspect includes an apparatus for processing multimedia data, the apparatus comprising means for receiving interlaced video frames, means for converting the interlaced video frames to progressive video, means for generating metadata associated with the progressive video, and means for providing the progressive video and at least a portion of the metadata to an encoder used to encode the progressive video. In some aspects, the converting means comprises an inverse telecine device and/or a spatio-temporal deinterlacer. In some aspects, the generating means is configured to perform shot detection and to generate compression information based on the shot detection. In some aspects, the generating means is configured to generate bandwidth information. In some aspects, the generating means includes means for resampling to resize progressive frames.

Another aspect includes a machine-readable medium comprising instructions for processing multimedia data that, upon execution, cause a machine to receive interlaced video frames, convert the interlaced video frames to progressive video, generate metadata associated with the progressive video, and provide the progressive video and at least a portion of the metadata to an encoder used to encode the progressive video.

Another aspect includes a processor comprising a configuration to receive interlaced video, convert the interlaced video to progressive video, generate metadata associated with the progressive video, and provide the progressive video and at least a portion of the metadata to an encoder used to encode the progressive video. The conversion of the interlaced video may include performing spatio-temporal deinterlacing. In some aspects, the conversion of the interlaced video includes performing inverse telecine. In some aspects, the generation of the metadata includes generating compression information based on detecting shot changes. In some aspects, the generation of the metadata includes determining compression information of the progressive video. In some aspects, the configuration includes a configuration to resample the video to produce resized progressive frames. In some aspects, the metadata may include bandwidth information, bidirectional motion information, complexity information such as content-based temporal or spatial complexity information, and/or compression information.

Brief description of the drawings

FIG. 1 is a block diagram of a communication system for delivering streaming multimedia data.

FIG. 2 is a block diagram of a digital transmission facility that includes a preprocessor.

FIG. 3A is a block diagram of an exemplary aspect of a preprocessor.

FIG. 3B is a flowchart illustrating a process for processing multimedia data.

FIG. 3C is a block diagram illustrating means for processing multimedia data.

FIG. 4 is a block diagram illustrating the operation of an exemplary preprocessor.

FIG. 5 is a diagram of phase decisions in an inverse telecine process.

FIG. 6 is a flowchart illustrating a process of inverse telecining telecined video.

FIG. 7 is a trellis diagram illustrating phase transitions.

FIG. 8 is a guide for identifying the respective frames used to generate a plurality of metrics.

FIG. 9 is a flowchart illustrating how the metrics of FIG. 8 are generated.

FIG. 10 is a flowchart illustrating the processing of the metrics to obtain an estimated phase.

FIG. 11 is a data flow diagram illustrating a system for generating decision variables.

FIG. 12 is a block diagram illustrating variables used to evaluate branch information.

FIGS. 13A, 13B, and 13C are flowcharts illustrating how the lower envelope is computed.

FIG. 14 is a flowchart illustrating the operation of a consistency detector.

FIG. 15 is a flowchart illustrating a process of computing an offset to a decision variable that is used to compensate for inconsistency in the phase decisions.

FIG. 16 represents the inverse telecine operation after the pull-down phase has been estimated.

FIG. 17 is a block diagram of a deinterlacer device.

FIG. 18 is a block diagram of another deinterlacer device.

FIG. 19 is a diagram of a subsampling pattern of an interlaced picture.

FIG. 20 is a block diagram of a deinterlacer device that uses Wmed filtering motion estimation to generate a deinterlaced frame.

FIG. 21 illustrates one aspect of an aperture for determining static areas of multimedia data.

FIG. 22 is a diagram illustrating one aspect of an aperture for determining slow-motion areas of multimedia data.

FIG. 23 is a diagram illustrating one aspect of motion estimation.

FIG. 24 illustrates two motion vector maps used in determining motion compensation.

FIG. 25 is a flowchart illustrating a method of deinterlacing multimedia data.

FIG. 26 is a flowchart illustrating a method of generating a deinterlaced frame using spatio-temporal information.

FIG. 27 is a flowchart illustrating a method of performing motion compensation for deinterlacing.

FIG. 28 is a block diagram of a preprocessor that includes a processor configured for shot detection and other preprocessing operations according to some aspects.

FIG. 29 illustrates the relationship between encoding complexity C and allocated bits B.

FIG. 30 is a flowchart illustrating a process that operates on a group of pictures and that can be used in some aspects to encode video based on shot detection in video frames.

FIG. 31 is a flowchart illustrating a process for shot detection.

FIG. 32 is a flowchart illustrating a process for determining different classifications of shots in video.

FIG. 33 is a flowchart illustrating a process for assigning frame compression schemes to video frames based on shot detection results.

FIG. 34 is a flowchart illustrating a process for determining abrupt scene changes.

FIG. 35 is a flowchart illustrating a process for determining slowly changing scenes.

FIG. 36 is a flowchart illustrating a process for determining scenes that contain camera flashes.

FIG. 37 illustrates motion compensation vectors between the current frame and the previous frame (MV_P) and between the current frame and the next frame (MV_N).

FIG. 38 is a graph illustrating a relationship for the variables used in determining a frame difference metric.

FIG. 39 is a block diagram illustrating data encoding and residual computation.

FIG. 40 is a block diagram illustrating the determination of a frame difference metric.

FIG. 41 is a flowchart illustrating a procedure for assigning compression types to frames.

FIG. 42 illustrates an example of 1-D poly-phase resampling.

FIG. 43 is a graphic illustrating a safe action area and a safe title area of a frame of data.

FIG. 44 is a graphic illustrating a safe action area of a frame of data.

Detailed description

The following description provides specific details for a thorough understanding of the examples. However, one of ordinary skill in the art will understand that the examples may be practiced even if every detail of a process or device in an example or aspect is not described or illustrated herein. For example, electrical components may be shown in block diagrams that do not illustrate every electrical connection or every electrical element of the component, so as not to obscure the examples in unnecessary detail. In other instances, such components, other structures, and techniques may be shown in detail to further explain the examples.

Described herein are certain inventive aspects of preprocessors and methods of operating preprocessors that improve the performance of existing preprocessing and encoding systems. Such preprocessors can process metadata and video in preparation for encoding, including performing deinterlacing, inverse telecine, filtering, identification of shot types, processing and generation of metadata, and generation of bandwidth information. References herein to "one aspect," "an aspect," "some aspects," or "certain aspects" mean that one or more of the particular features, structures, or characteristics described in connection with the aspect can be included in at least one aspect of a preprocessor system. The appearances of such phrases in various places in the specification do not necessarily all refer to the same aspect, nor do they refer to separate or alternative aspects that are mutually exclusive of other aspects. Moreover, various features are described that may appear in some aspects and not in others. Similarly, various steps are described that may be steps for some aspects but not for others.

"Multimedia data" or "multimedia" as used herein is a broad term that includes video data (which can include audio data), audio data, or both video data and audio data. "Video data" or "video" as used herein is a broad term that refers to a single image or to one or more series or sequences of images containing text, image, and/or audio data. Unless otherwise specified, it can be used to refer to multimedia data, and the terms may be used interchangeably.

FIG. 1 is a block diagram of a communication system 100 for delivering streaming multimedia. Such a system finds application in the transmission of digitally compressed video to a multiplicity of terminals, as shown in FIG. 1. A digital video source can be, for example, a digital cable or satellite feed, or an analog source that is digitized. The video source is processed in a transmission facility 120, where it is encoded and modulated onto a carrier for transmission through a network 140 to one or more terminals 160. The terminals 160 decode the received video and typically display at least a portion of it. The network 140 refers to any type of communication network, wired or wireless, suitable for transmitting encoded data. For example, the network 140 can be a cellular telephone network, a wired or wireless local area network (LAN) or wide area network (WAN), or the Internet. The terminals 160 can be any type of communication device capable of receiving and displaying data, including, but not limited to, cellular telephones, PDAs, home or commercial video display equipment, computers (portable, laptop, handheld, PC, and larger server-based computer systems), and personal entertainment devices capable of using multimedia data.

FIGS. 2 and 3A illustrate exemplary aspects of a preprocessor 202. In FIG. 2, the preprocessor 202 is in the digital transmission facility 120. A decoder 201 decodes encoded data from a digital video source and provides metadata 204 and video 205 to the preprocessor 202. The preprocessor 202 performs certain types of processing on the video 205 and the metadata 204, and provides processed metadata 206 (e.g., base layer reference frames, enhancement layer reference frames, bandwidth information, content information) and video 207 to an encoder 203. Such preprocessing of multimedia data can improve the visual clarity, anti-aliasing, and compression efficiency of the data. In general, the preprocessor 202 receives the video sequence provided by the decoder 201 and converts it into a progressive video sequence for further processing (e.g., encoding) by the encoder. In some aspects, the preprocessor 202 can be configured for numerous operations, including inverse telecine, deinterlacing, filtering (e.g., artifact removal, de-ringing, de-blocking, and de-noising), resizing (e.g., spatial resolution down-sampling from standard definition (SD) to Quarter Video Graphics Array (QVGA)), and group of pictures (GOP) structure generation (e.g., complexity map generation, scene change detection, and fade/flash detection).
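As a small worked example of the resizing operation mentioned above, the following sketch computes the rational scale factors for down-sampling SD to QVGA. The 720x480 SD raster and the helper name `resample_ratio` are assumptions made for illustration only (the text does not specify an SD frame size); 320x240 is the standard QVGA size.

```python
from fractions import Fraction
from typing import Tuple


def resample_ratio(src: int, dst: int) -> Tuple[int, int]:
    """Return (up, down) such that dst / src == up / down in lowest terms."""
    ratio = Fraction(dst, src)  # Fraction reduces to lowest terms automatically
    return ratio.numerator, ratio.denominator


# Assumed SD raster of 720x480; QVGA is 320x240.
horizontal = resample_ratio(720, 320)  # width:  720 -> 320
vertical = resample_ratio(480, 240)    # height: 480 -> 240
print(horizontal, vertical)
```

Expressing each factor as up/down (here 4/9 horizontally and 1/2 vertically) is the usual starting point for a rational resampler, which interpolates by `up` and decimates by `down`.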

FIG. 3A illustrates a preprocessor 202 that is configured with modules or components (collectively referred to herein as "modules") to perform preprocessing operations on received metadata 204 and video 205, and then to provide processed metadata 206 and progressive video 207 for further processing (e.g., to an encoder). The modules can be implemented in hardware, software, firmware, or a combination thereof. The preprocessor 202 can include one or more of the illustrated modules, namely an inverse telecine 301, a deinterlacer 302, a denoiser 303, an alias remover 304, a resampler 305, a deblocker/deringer 306, and a GOP partitioner 307, all of which are described further below. The preprocessor 202 can also include other suitable modules that may be used to process the video and metadata, including a memory 308 and a communication module 309. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

FIG. 3B is a flowchart illustrating a process 300 for processing multimedia data. The process 300 starts and proceeds to block 320, where interlaced video is received. The preprocessor 202 illustrated in FIGS. 2 and 3A can perform this step. In some aspects, a decoder (e.g., the decoder 201 of FIG. 2) can receive the interlaced data and then provide it to the preprocessor 202. In some aspects, a data receiving module 330, shown in FIG. 3C as part of the preprocessor 202, can perform this step. The process 300 then proceeds to block 322, where the interlaced video is converted to progressive video. The preprocessor 202 in FIGS. 2 and 3A, and the module 332 of FIG. 3C, can perform this step. If the interlaced video has been telecined, the processing of block 322 can include performing inverse telecine to generate the progressive video. The process 300 then proceeds to block 324 to generate metadata associated with the progressive video. The GOP partitioner 307 of FIG. 3A and the module 334 of FIG. 3C can perform such processing. The process 300 then proceeds to block 326, where the progressive video and at least a portion of the metadata are provided to an encoder for encoding (e.g., compression). The preprocessor 202 shown in FIGS. 2 and 3A, and the module 336 of FIG. 3C, can perform this step. After the progressive video and the associated metadata have been provided to another component for encoding, the process 300 can end.
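The four blocks of process 300 can be sketched as a short pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the dictionary-of-fields frame layout, the simple "weave" conversion (interleaving field rows) and the mean-luminance metadata are all hypothetical stand-ins for the deinterlacing and metadata generation described in the text, and "progressive" here denotes the progressive (sequential) video of block 322.

```python
from typing import Any, Callable, Dict, List

Field = List[List[int]]              # rows of luma samples in one field
InterlacedFrame = Dict[str, Field]   # {"top": ..., "bottom": ...} (assumed layout)
ProgressiveFrame = List[List[int]]


def weave(frame: InterlacedFrame) -> ProgressiveFrame:
    """Toy stand-in for block 322: interleave top and bottom field rows."""
    rows: ProgressiveFrame = []
    for top_row, bottom_row in zip(frame["top"], frame["bottom"]):
        rows.append(top_row)
        rows.append(bottom_row)
    return rows


def mean_luma(frames: List[ProgressiveFrame]) -> Dict[str, Any]:
    """Toy stand-in for block 324: one mean luminance value per frame."""
    means = [sum(map(sum, f)) / sum(len(r) for r in f) for f in frames]
    return {"mean_luma": means}


def process_300(interlaced, convert: Callable, make_metadata: Callable,
                encoder_input: list):
    received = list(interlaced)                    # block 320: receive
    progressive = [convert(f) for f in received]   # block 322: convert
    metadata = make_metadata(progressive)          # block 324: generate metadata
    encoder_input.append((progressive, metadata))  # block 326: provide to encoder
    return progressive, metadata


frames = [{"top": [[10, 10], [30, 30]], "bottom": [[20, 20], [40, 40]]}]
to_encoder: list = []
video, meta = process_300(frames, weave, mean_luma, to_encoder)
print(video, meta)
```

Passing the conversion and metadata steps in as callables mirrors the text's point that different modules (for example, an inverse telecine or a deinterlacer) can carry out block 322.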

FIG. 3C is a block diagram illustrating means for processing multimedia data. Such means are shown here incorporated in the preprocessor 202. The preprocessor 202 includes means for receiving video, such as module 330. The preprocessor 202 also includes means for converting interlaced data to progressive video, such as module 332. Such means can include, for example, a spatio-temporal deinterlacer and/or an inverse telecine device. The preprocessor 202 also includes means for generating metadata associated with the progressive video, such as module 334. Such means can include the GOP partitioner 307 (FIG. 3A), which can generate various types of metadata as described herein. The preprocessor 202 can also include means for providing the progressive video and metadata to an encoder for encoding, as illustrated by module 336. In some aspects, such means can include the communication module 309 illustrated in FIG. 3A. As will be appreciated by one skilled in the art, such means can be implemented in a number of standard ways.

The preprocessor 202 can use obtained metadata (e.g., obtained from the decoder 201 or from another source) for one or more of the preprocessing operations. Metadata can include information relating to, describing, or classifying the content of the multimedia data ("content information"). In particular, the metadata can include a content classification. In some aspects, the metadata does not include the content information desired for encoding operations. In such cases, the preprocessor 202 can be configured to determine the content information and to use it for preprocessing operations and/or to provide it to other components, e.g., the encoder 203. In some aspects, the preprocessor 202 can use such content information to influence GOP partitioning, to determine an appropriate type of filtering, and/or to determine encoding parameters that are communicated to an encoder.
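One way to picture the metadata handed from the preprocessor to the encoder is as a small record from which a portion is selected. The sketch below is illustrative only: the class name, field names, value types, and the `portion_for_encoder` helper are hypothetical, and only the categories of metadata (bandwidth information, bandwidth ratio, spatial and temporal complexity, luminance information, content classification) come from the text.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class ContentMetadata:
    # Field names and numeric types are assumptions for illustration.
    bandwidth_info: Optional[float] = None
    bandwidth_ratio: Optional[float] = None
    spatial_complexity: Optional[float] = None
    temporal_complexity: Optional[float] = None
    luminance_info: Optional[float] = None
    content_classification: Optional[int] = None


def portion_for_encoder(meta: ContentMetadata,
                        wanted: Iterable[str]) -> Dict[str, Any]:
    """Select the requested fields that are actually known."""
    known = {k: v for k, v in asdict(meta).items() if v is not None}
    return {k: known[k] for k in wanted if k in known}


meta = ContentMetadata(spatial_complexity=0.7, temporal_complexity=0.2,
                       content_classification=3)
subset = portion_for_encoder(
    meta, ["temporal_complexity", "content_classification", "bandwidth_ratio"])
print(subset)
```

Fields left as `None` model content information that was not supplied with the incoming metadata and that the preprocessor would have to determine itself.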

FIG. 4 shows an illustrative example of process blocks that can be included in the preprocessor and illustrates processing that can be performed by the preprocessor 202. In this example, the preprocessor 202 receives metadata and video 204, 205 and provides output data 206, 207, comprising (processed) metadata and video, to the encoder 203. Typically, three types of video may be received by the preprocessor. First, the received video can be progressive video, for which no deinterlacing is required. Second, the video data can be telecined video, that is, interlaced video converted from a 24 fps film sequence. Third, the video can be non-telecined interlaced video. The preprocessor 202 can process these types of video as described below.

At block 401, the preprocessor 202 determines whether the received video 204, 205 is progressive video. In some cases, this can be determined from the metadata, if the metadata contains such information, or by processing the video itself. For example, the inverse telecine process described below can determine whether the received video 205 is progressive video. If it is, the process proceeds to block 407, where a filtering operation is performed on the video to reduce noise such as white Gaussian noise. If the video is not progressive, the process proceeds from block 401 to the phase detector of block 404.

The phase detector 404 distinguishes between video that originated in a telecine and video that began in a standard broadcast format. If the decision is made that the video was telecined (the "YES" decision path exiting the phase detector 404), the telecined video is returned to its original format in the inverse telecine 406. Redundant fields are identified and eliminated, and fields derived from the same video frame are rewoven into a complete image. Because the sequence of reconstructed film images was photographically recorded at regular intervals of 1/24 of a second, the motion estimation process performed in the GOP partitioner 412 or in the decoder is more accurate using the inverse-telecined images than using the telecined data, which has an irregular time base.

In one aspect, the phase detector 404 makes certain decisions after receipt of a video frame. These decisions include (i) whether the present video is from a telecine output and the 3:2 pull-down phase is one of the five phases P₀, P₁, P₂, P₃, and P₄ shown in FIG. 5, and (ii) whether the video was generated as conventional NTSC; the latter decision is denoted as phase P₅. These decisions appear as outputs of the phase detector 404 shown in FIG. 4. The path from the phase detector 404 labeled "YES" actuates the inverse telecine 406, indicating that it has been provided with the correct pull-down phase so that it can sort out the fields formed from the same photographic image and combine them. The path from the phase detector 404 labeled "NO" similarly actuates the deinterlacer 405 to separate an apparent NTSC frame into fields for optimal processing. Inverse telecine is further described in the co-pending U.S. patent application entitled "INVERSE TELECINE ALGORITHM BASED ON STATE MACHINE" [Attorney Docket No. QFDM.021A (050943)], which is owned by the assignee hereof and incorporated herein by reference in its entirety.

The phase detector 404 can analyze video frames continuously, because different types of video may be received at any time. As an example, video conforming to the NTSC standard may be inserted into the video as a commercial. After inverse telecine, the resulting progressive video is sent to a denoiser (filter) 407, which can be used to reduce white Gaussian noise.

When conventional NTSC video is recognized (the "NO" path from the phase detector 404), it is sent to the deinterlacer 405 for compression. The deinterlacer 405 transforms the interlaced fields into progressive video, and denoising operations can then be performed on the progressive video.

After the appropriate inverse telecine or deinterlacing processing, at block 408 the progressive video is processed for alias suppression and resampling (e.g., resizing).
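The routing through blocks 401 to 408 described above can be summarized in a short sketch. This is only an illustration of the decision paths, not the implementation of FIG. 4; the function name `front_end_stages`, the boolean inputs, and the stage labels are assumptions introduced for the example.

```python
def front_end_stages(is_progressive, is_telecined):
    """Return the ordered stage labels a frame passes through (illustrative only).

    is_progressive models the decision at block 401; is_telecined models the
    "YES"/"NO" output of the phase detector 404. Both flags are hypothetical
    inputs standing in for the real detection logic.
    """
    stages = []
    if not is_progressive:
        # Phase detector 404: "YES" path -> inverse telecine 406,
        # "NO" path -> deinterlacer 405.
        stages.append("inverse_telecine_406" if is_telecined else "deinterlacer_405")
    # Every path then reaches the denoiser 407 and the resampling block 408.
    stages.append("denoiser_407")
    stages.append("resampler_408")
    return stages


print(front_end_stages(is_progressive=False, is_telecined=True))
```

Progressive input skips both conversion stages, which mirrors the statement that no deinterlacing is needed for that type of video.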

After resampling, the progressive video proceeds to block 410, where deblocker and deringing operations are performed. Two types of artifacts, "blocking" and "ringing," commonly occur in video compression applications. Blocking artifacts occur because compression algorithms divide each frame into blocks (e.g., 8 × 8 blocks). Each block is reconstructed with some small errors, and the errors at the edges of a block often contrast with the errors at the edges of neighboring blocks, making the block boundaries visible. Ringing artifacts, in contrast, appear as distortions around the edges of image features. They occur because the encoder discards too much information in quantizing the high-frequency DCT coefficients. In some illustrative examples, both deblocking and deringing can use low-pass FIR (finite impulse response) filters to hide these visible artifacts.

After deblocking and deringing, the progressive video is processed by the GOP partitioner 412. GOP positioning can include detecting shot changes, generating complexity maps (e.g., temporal and spatial bandwidth maps), and adaptive GOP partitioning. Shot detection relates to determining when a frame in a group of pictures (GOP) exhibits data indicating that a scene change has occurred. Scene change detection can be used by a video encoder to determine a proper GOP length and to insert I-frames based on the GOP length, instead of inserting an I-frame at a fixed interval. The preprocessor 202 can also be configured to generate a bandwidth map that can be used for encoding the multimedia data. In some aspects, a content classification module located external to the preprocessor generates the bandwidth map instead. Adaptive GOP partitioning can adaptively change the composition of a group of pictures coded together. Illustrative examples of the operations shown in FIG. 4 are described below.
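The idea of placing I-frames at scene changes rather than at a fixed interval can be illustrated with a minimal sketch. It assumes the shot detector has already produced one boolean flag per frame; the function name `place_i_frames` and the `max_gop_length` cap are hypothetical and are not taken from the description.

```python
def place_i_frames(scene_change_flags, max_gop_length=30):
    """Return the frame indices at which an I-frame would be inserted.

    scene_change_flags: one boolean per frame, True where a scene change
    was detected (assumed to come from a shot detector).
    max_gop_length: hypothetical upper bound on GOP length, so that a long
    shot without scene changes still receives periodic I-frames.
    """
    i_frames = []
    for index, is_scene_change in enumerate(scene_change_flags):
        starts_sequence = index == 0
        gop_too_long = bool(i_frames) and index - i_frames[-1] >= max_gop_length
        if starts_sequence or is_scene_change or gop_too_long:
            i_frames.append(index)
    return i_frames


flags = [False] * 10
flags[4] = True  # a scene change detected at frame 4
print(place_i_frames(flags, max_gop_length=5))
```

With these inputs a new GOP starts at the first frame, at the detected scene change, and again once the assumed length cap is reached.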

Inverse Telecine

Inverse telecine processing is described below, and an illustrative example of inverse telecine is provided with reference to FIGS. 4 to 16. Video compression gives the best results when the properties of the source are known and used to select the ideally matching form of processing. Off-the-air video, for example, can originate in several ways. Broadcast video that is conventionally generated, in video cameras, broadcast studios, and the like, conforms in the United States to the NTSC standard. According to the standard, each frame is made up of two fields: one consists of the odd lines, the other of the even lines. This may be referred to as an "interlaced" format. While the frames are generated at approximately 30 frames/sec, the fields are records of the television camera's image that are 1/60 sec apart. Film, on the other hand, is shot at 24 frames/sec, each frame consisting of a complete image. This may be referred to as a "progressive" format. For transmission on NTSC equipment, "progressive" video is converted into the "interlaced" video format via a telecine process. In one aspect, further described below, the system advantageously determines when video has been telecined and performs an appropriate transform to regenerate the original progressive frames.

FIG. 5 shows the effect of telecining progressive frames that were converted to interlaced video. F₁, F₂, F₃, and F₄ are progressive images that are the input to a telecine unit. The numbers "1" and "2" below the respective frames indicate either odd or even fields. Note that some fields are repeated to account for the disparity between the frame rates. FIG. 5 also shows the pull-down phases P₀, P₁, P₂, P₃, and P₄. Phase P₀ is marked by the first of two NTSC-compatible frames that have identical first fields. The following four frames correspond to phases P₁, P₂, P₃, and P₄. Note that the frames marked by P₂ and P₃ have identical second fields. Because film frame F₁ is scanned three times, two identical successive output NTSC-compatible first fields are formed. All NTSC fields derived from film frame F₁ are taken from the same film image and therefore are taken at the same instant of time. Other NTSC frames derived from the film may have adjacent fields that are 1/24 sec apart.
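The 3:2 pull-down pattern described above can be reproduced with a small sketch. It is an illustration only, not the telecine unit of FIG. 5: the function name `telecine_3_2` and the representation of a field as a (film frame label, field number) pair are assumptions introduced for the example.

```python
def telecine_3_2(film_frames):
    """Convert film frames to NTSC-style frames using 3:2 pull-down.

    Film frames are scanned alternately three times and twice; each scan
    yields one field, and field numbers alternate 1, 2, 1, 2, ...
    Consecutive field pairs are grouped into output frames.
    """
    scans_per_frame = (3, 2)
    fields = []
    field_number = 1
    for position, frame in enumerate(film_frames):
        for _ in range(scans_per_frame[position % 2]):
            fields.append((frame, field_number))
            field_number = 3 - field_number  # toggle between 1 and 2
    # Pair up (first field, second field) to form the interlaced frames.
    return [(fields[k], fields[k + 1]) for k in range(0, len(fields) - 1, 2)]


ntsc = telecine_3_2(["F1", "F2", "F3", "F4"])
for phase, (first_field, second_field) in enumerate(ntsc):
    print(f"P{phase}: first={first_field} second={second_field}")
```

Four film frames yield five output frames. The first two output frames share the same first field, and the third and fourth share the same second field, matching the description of phases P₀ through P₄.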

The phase detector 404 illustrated in FIG. 4 makes certain decisions after receipt of a video frame. These decisions include (i) whether the present video is from a telecine output and the 3:2 pull-down phase is one of the five phases P₀, P₁, P₂, P₃, and P₄ shown in definition 512 of FIG. 5; and (ii) whether the video was generated as conventional NTSC, a decision that is denoted as phase P₅.

These decisions appear as outputs of the phase detector 404 shown in FIG. 4. The path from the phase detector 404 labeled "YES" actuates the inverse telecine 406, indicating that it has been provided with the correct pull-down phase so that it can sort out the fields formed from the same photographic image and combine them. The path from the phase detector 404 labeled "NO" similarly actuates the deinterlacer block 405 to separate an apparent NTSC frame into fields for optimal processing.

FIG. 6 is a flowchart illustrating a process 600 of inverse telecining a video stream. In one aspect, the process 600 is performed by the inverse telecine 301 of FIG. 3. Starting at step 651, the inverse telecine 301 determines a plurality of metrics based on the received video. In this aspect, four metrics are formed, each of which is a sum of differences between fields drawn from the same frame or from adjacent frames. The four metrics are further assembled into a Euclidean measure of the distance between the four metrics derived from the received data and the most likely values of these metrics for each of six hypothesized phases. The Euclidean sums are called branch information; for each received frame there are six such quantities. Each hypothesized phase has a successor phase which, in the case of the possible pull-down phases, changes with each received frame.

The possible paths of transitions are shown in FIG. 7 and denoted by 767. There are six such paths. The decision process maintains four measures equivalent to the sum of Euclidean distances for each path of hypothesized phases. To make the procedure responsive to changed conditions, each Euclidean distance in the sum is diminished as it gets older. The phase track for which the sum of Euclidean distances is a minimum is deemed to be the operative one. The current phase of this track is called the "applicable phase." Inverse telecining based on the selected phase can now take place, so long as it is not P₅. If P₅ is selected, the current frame is deinterlaced using the deinterlacer at block 405 (FIG. 4). In summary, the applicable phase is utilized either as the current pull-down phase or as an indicator to command the deinterlacing of a frame that has been estimated to have a valid NTSC format.

For every frame received from the video input, a new value for each of the four metrics is computed. These are defined as follows.

The term SAD is an abbreviation of "summed absolute differences." The fields that are differenced to form the metrics are shown graphically in FIG. 8. The subscript refers to the field number; the letter denotes either previous (= P) or current (= C). The brackets in FIG. 8 refer to the pair-wise differencing of the fields. SAD_FS refers to differences between field one of the current frame, labeled C₁, and field one of the previous frame, labeled P₁, which are spanned by the bracket labeled FS in the definition provided in FIG. 8; SAD_SS refers to differences between field two of the current frame, labeled C₂, and field two of the previous frame, labeled P₂, both spanned by the bracket labeled SS; SAD_CO refers to differences between field two of the current frame, labeled C₂, and field one of the current frame, labeled C₁, spanned by the bracket labeled CO; and SAD_PO refers to differences between field one of the current frame and field two of the previous frame, both spanned by the bracket labeled PO.
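The four field pairings described above can be written out as a minimal sketch. It assumes each field is given as a list of rows of luminance values with identical dimensions; the function names `sad` and `sad_metrics` and the dictionary keys are hypothetical and chosen only to mirror the metric names.

```python
def sad(field_a, field_b):
    """Summed absolute differences between two equally sized fields.

    Each field is a list of rows, and each row is a list of luminance values.
    """
    return sum(
        abs(a - b)
        for row_a, row_b in zip(field_a, field_b)
        for a, b in zip(row_a, row_b)
    )


def sad_metrics(prev_field1, prev_field2, cur_field1, cur_field2):
    """Compute the four metrics from the field pairings shown in FIG. 8."""
    return {
        "SAD_FS": sad(cur_field1, prev_field1),  # C1 vs P1
        "SAD_SS": sad(cur_field2, prev_field2),  # C2 vs P2
        "SAD_CO": sad(cur_field2, cur_field1),   # C2 vs C1
        "SAD_PO": sad(cur_field1, prev_field2),  # C1 vs P2
    }


p1 = [[10, 10], [10, 10]]
p2 = [[12, 12], [12, 12]]
c1 = [[10, 10], [10, 10]]  # identical to p1, as for a repeated field
c2 = [[20, 20], [20, 20]]
print(sad_metrics(p1, p2, c1, c2))
```

A repeated first field drives SAD_FS to zero, which is the kind of signature the phase detector relies on to recognize a pull-down phase.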

The computational load to evaluate each SAD is described below. There are approximately 480 active horizontal lines in conventional NTSC. For the resolution to be the same in the horizontal direction, with a 4:3 aspect ratio, there should be 480 × 4/3 = 640 equivalent vertical lines, or degrees of freedom. The video format of 640 × 480 pixels is one of the formats accepted by the Advanced Television Systems Committee (ATSC). Thus, every 1/30 of a second, the duration of a frame, 640 × 480 = 307,200 new pixels are generated. New data is generated at a rate of 9.2 × 10⁶ pixels/sec, implying that the hardware or software running this system processes data at a rate of approximately 10 MB/sec or more. This is one of the high-speed portions of the system. It can be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof.

The SAD calculator may be a standalone component, may be incorporated as hardware, firmware, or middleware in a component of another device, may be implemented in microcode or software executed on a processor, or may be a combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the calculation may be stored in a machine-readable medium such as a storage medium. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.
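The pixel-rate arithmetic in the computational-load discussion can be checked with a few lines. The variable names are illustrative, and the closing comment assumes one byte per luminance sample, which the description does not state.

```python
active_lines = 480                              # active horizontal lines in NTSC
pixels_per_line = active_lines * 4 // 3         # 4:3 aspect ratio
pixels_per_frame = pixels_per_line * active_lines
frames_per_second = 30                          # frame duration of 1/30 sec
pixels_per_second = pixels_per_frame * frames_per_second

print(pixels_per_line, pixels_per_frame, pixels_per_second)
# Assuming one byte per luminance sample, this corresponds to roughly
# 9.2 million bytes per second, in line with the "approximately 10 MB" figure.
```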

Flowchart 900 of FIG. 9 makes explicit the relationships in FIG. 8 and is a graphical representation of Equations (1) to (4). It shows storage locations 941, 942, 943, and 944, in which the most recent values of SAD_FS, SAD_CO, SAD_SS, and SAD_PO, respectively, are kept. Each is generated by one of four sum-of-absolute-differences calculators 940, which process the luminance values of the previous first field data 931, the current first field data 932, the current second field data 933, and the previous second field data 934. In the summations that define the metrics, the term "value(i,j)" denotes the luminance value at position i, j; the summation is over all active pixels, although summing over a meaningful subset of the active pixels is not excluded.

Flowchart 1000 of FIG. 10 is a detailed flowchart illustrating the process for detecting telecined video and inverting it to recover the original scanned film images. In step 1030, the metrics defined in FIG. 9 are evaluated. Continuing to step 1083, lower envelope values of the four metrics are found. The lower envelope of a SAD metric is a dynamically determined quantity that is the highest numerical floor below which the SAD does not penetrate. Continuing to step 1085, branch information quantities, defined below in Equations (5) to (10), are determined; these can use the previously determined metrics, the lower envelope values, and an experimentally determined constant A. Because successive values of the phase may be inconsistent, a quantity Δ is determined in step 1087 to reduce this apparent instability. The phase is deemed consistent when the sequence of phase decisions is consistent with the model shown in FIG. 7. Following that step, the process proceeds to step 1089 to calculate the decision variables using the current value of Δ. The decision variables calculator 1089 evaluates the decision variables using all the information generated in the blocks 1080 that led to it. Steps 1030, 1083, 1085, 1087, and 1089 are an expansion of the metric determination 651 of FIG. 6. From these variables, the applicable phase is found by the phase selector 1090. As shown, decision step 1091 uses the applicable phase to either invert the telecined video or deinterlace it. This is a more explicit statement of the operation of the phase detector 404 of FIG. 4. In one aspect, the processing of FIG. 10 is performed by the phase detector 404 of FIG. 4. Starting at step 1030, the phase detector 404 determines a plurality of metrics by the process described above with reference to FIG. 8, and continues through steps 1083, 1085, 1087, 1089, 1090, and 1091.

Flowchart 1000 illustrates a process for estimating the current phase. At step 1083, the flowchart describes the use of the determined metrics and lower envelope values to compute branch information. The branch information may be recognized as the Euclidean distances discussed earlier. Exemplary equations that may be used to generate the branch information are Equations (5) to (10) below. The branch information quantities are computed in block 1209 of FIG. 12.

The processed video data can be stored in a storage medium, which can include, for example, a chip-configured storage medium (e.g., ROM, RAM) or a disk-type storage medium (e.g., magnetic or optical) connected to a processor. In some aspects, the inverse telecine 406 and the deinterlacer 405 can each contain part or all of the storage medium. The branch information quantities are defined by the following equations.

The fine detail of the branch computation is shown in the branch information calculator 1209 of FIG. 12. As shown in the calculator 1209, developing the branch information uses the quantities L_S, the lower envelope value of SAD_FS and SAD_SS; L_P, the lower envelope value of SAD_PO; and L_C, the lower envelope value of SAD_CO. The lower envelopes are used as distance offsets in the branch information calculations, either alone or in conjunction with a predetermined constant A to create H_S, H_P, and H_C. Their values are kept up to date in the lower envelope trackers described below. The H offsets are defined as follows.

The process of tracking the values of L_S, L_P, and L_C is presented in FIGS. 13A, 13B, and 13C. Consider, for example, the tracking algorithm 1300 for L_P shown at the top of FIG. 13A. The metric SAD_PO is compared in comparator 1305 with the current value of L_P plus a threshold T_P. If SAD_PO exceeds that sum, the current value of L_P is unchanged, as shown in block 1315. If it does not, the new value of L_P becomes a linear combination of SAD_PO and L_P, as shown in block 1313. In another aspect for block 1315, the new value of L_P is L_P + T_P.
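One step of the lower-envelope tracking just described can be sketched as follows. The description does not give the coefficients of the linear combination, so the `weight` parameter here is a hypothetical choice; the function name `track_lower_envelope` and the `raise_on_exceed` flag (selecting the alternative behavior for block 1315) are likewise assumptions.

```python
def track_lower_envelope(envelope, metric, threshold, weight=0.5,
                         raise_on_exceed=False):
    """Update a lower-envelope value (e.g. L_P) from a new metric (e.g. SAD_PO).

    envelope:  current lower-envelope value
    metric:    newly computed SAD metric
    threshold: threshold added to the envelope before comparison (e.g. T_P)
    weight:    hypothetical mixing coefficient for the linear combination
    raise_on_exceed: if True, use the alternative aspect in which the
                     envelope becomes envelope + threshold when exceeded
    """
    if metric > envelope + threshold:
        # Block 1315: envelope unchanged, or raised by the threshold
        # in the alternative aspect.
        return envelope + threshold if raise_on_exceed else envelope
    # Block 1313: linear combination of the metric and the old envelope.
    return weight * metric + (1.0 - weight) * envelope


print(track_lower_envelope(10, 100, 5))   # metric far above the envelope
print(track_lower_envelope(10, 12, 5))    # metric close to the envelope
```

The same step would apply to L_C with SAD_CO, and to L_S with SAD_FS and SAD_SS fed in alternately.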

The quantities L_S and L_C in FIGS. 13B and 13C are similarly computed. Processing blocks in FIGS. 13A, 13B, and 13C that have the same function are numbered identically but given primes (' or ") to show that they operate on a different set of variables. For example, when a linear combination of SAD_CO and L_C is formed, that operation is shown in block 1313'. As is the case for L_P, another aspect for 1315' would replace L_C by L_C + T_C.

In the case of L_S, however, the algorithm in FIG. 13B processes SAD_FS and SAD_SS in alternation, in turn labeling each as X, because this lower envelope applies to both variables. The alternation of SAD_FS and SAD_SS values takes place when the current value of SAD_FS in block 1308 is read into the location for X in block 1303, followed by the current value of SAD_SS in block 1307 being read into the location for X in block 1302. As is the case for L_P, another aspect for 1315" would replace L_S by L_S + T_S. The quantity A and the threshold values used in testing the current lower envelope values are predetermined by experiment.

FIG. 11 is a flowchart illustrating an exemplary process for performing step 1089 of FIG. 10. FIG. 11 generally shows a process for updating the decision variables. The six decision variables (corresponding to the six possible decisions) are updated with new information derived from the metrics. The decision variables are found as follows.

The quantity α is less than unity and limits the dependence of the decision variables on their past values; use of α is equivalent to diminishing the effect of each Euclidean distance as its data ages. In flowchart 1162, the decision variables to be updated are listed on the left as available on lines 1101, 1102, 1103, 1104, 1105, and 1106. Each of the decision variables on one of the phase transition paths is then multiplied by α, a number less than one, in one of the blocks 1100; the attenuated value of the old decision variable is then added to the current value of the branch information variable indexed by the next phase on the phase transition path that the attenuated decision variable was on. This takes place in block 1110. Variable D₅ is offset by a quantity Δ in block 1193; Δ is computed in block 1112. As described below, this quantity is chosen to reduce inconsistency in the sequence of phases determined by this system. The smallest decision variable is found in block 1120.
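The update just described can be sketched in a few lines. Several details are assumptions made for the illustration and are labeled as such in the code: the successor mapping (pull-down phases cycling 0→1→2→3→4→0 and phase 5 following itself) is inferred from the description of successor phases because FIG. 7 is not reproduced here, the offset Δ is taken to be added to D₅, and the value of α is arbitrary. The function name `update_decision_variables` is hypothetical.

```python
def update_decision_variables(decisions, branch_info, alpha=0.9, delta=0.0):
    """Update the six decision variables and pick the applicable phase.

    decisions:   previous decision variables D0..D5
    branch_info: current branch information, indexed by phase 0..5
    alpha:       attenuation factor, less than one (value assumed)
    delta:       offset applied to D5 (assumed here to be additive)
    """
    # Assumed transition model: pull-down phases advance cyclically,
    # while the conventional-NTSC phase 5 is followed by itself.
    successor = {0: 1, 1: 2, 2: 3, 3: 4, 4: 0, 5: 5}
    updated = [0.0] * 6
    for phase, old_value in enumerate(decisions):
        next_phase = successor[phase]
        # Attenuate the old value, then add the branch information
        # indexed by the next phase on the transition path.
        updated[next_phase] = alpha * old_value + branch_info[next_phase]
    updated[5] += delta
    # The applicable phase is the index of the smallest decision variable.
    applicable_phase = min(range(6), key=lambda phase: updated[phase])
    return updated, applicable_phase


updated, phase = update_decision_variables(
    [2, 4, 6, 8, 10, 12], [1, 1, 1, 1, 1, 1], alpha=0.5)
print(updated, phase)
```

An applicable phase of 0 to 4 would lead to inverse telecine, while 5 would lead to deinterlacing, as stated for the decision step.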

In summary, new information specific to each decision is added to the previous value of the appropriate decision variable, multiplied by α, to obtain the current value of that decision variable. A new decision can be made whenever new metrics are available; this technique can therefore make a new decision upon receipt of fields 1 and 2 of every frame. These decision variables are the sums of the Euclidean distances referred to above.
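The leaky update summarized above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the default value of α, and the phase transition table (P0 to P1 to P2 to P3 to P4 and back to P0, with P5 leading to itself) are assumptions made for this sketch, and the offset Δ is taken as an input rather than computed.

```python
# Hypothetical sketch of the leaky decision-variable update (not the patented implementation).

# Assumed phase transition paths: P0 -> P1 -> P2 -> P3 -> P4 -> P0, and P5 -> P5.
NEXT_PHASE = {0: 1, 1: 2, 2: 3, 3: 4, 4: 0, 5: 5}


def update_decision_variables(prev, branch_info, alpha=0.9, delta=0.0):
    """Return the new decision variables D_0..D_5.

    prev        -- previous decision variables, indexed by phase
    branch_info -- current branch information variables, indexed by phase
    alpha       -- attenuation factor, less than 1 (0.9 is an assumed value)
    delta       -- offset applied to D_5 (computed elsewhere)
    """
    new = [0.0] * 6
    for phase, nxt in NEXT_PHASE.items():
        # Attenuate the old value, then add the branch information
        # indexed by the next phase on the same transition path.
        new[nxt] = alpha * prev[phase] + branch_info[nxt]
    new[5] += delta  # D_5 alone is offset by delta
    return new


def applicable_phase(decision):
    """Return the subscript of the smallest decision variable."""
    return min(range(len(decision)), key=decision.__getitem__)
```

Calling `applicable_phase(update_decision_variables(prev, branch_info))` once per frame would mirror the per-frame decision described above.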

The applicable phase is selected to be the one having the subscript of the smallest decision variable. A decision based on the decision variables is made explicitly in block 1090 of FIG. 10. Certain decisions are allowed in the decision space. As described in block 1091, these decisions are: (i) the applicable phase is not P₅, so the video is inverse telecined; and (ii) the applicable phase is P₅, so the video is deinterlaced.

Because the metrics are drawn from video, which is inherently variable, there may be occasional errors in a coherent string of decisions. This technique detects phase sequences that are inconsistent with FIG. 7. Its operation is outlined in FIG. 14. Algorithm 1400 stores the subscript of the current phase decision (= x) in block 1405 and the subscript of the previous phase decision (= y) in block 1406. Block 1410 tests whether x = y = 5; block 1411 tests whether the values are

x = 1, y = 0; or

x = 2, y = 1; or

x = 3, y = 2; or

x = 4, y = 3; or

x = 0, y = 4. If either of the two tests is affirmative, the decisions are declared consistent in block 1420. If neither is affirmative, the offset shown in block 1193 of FIG. 11 is computed as shown in FIG. 15 and added to D₅, the decision variable associated with P₅.
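The two consistency tests of blocks 1410 and 1411 can be expressed compactly. The following Python sketch is illustrative only; the function names are hypothetical, and the modular form `(y + 1) % 5` is simply a shorthand for the five listed (x, y) pairs.

```python
# Hypothetical sketch of the phase-sequence consistency tests.

def is_consistent(x, y):
    """Return True if current phase x may follow previous phase y.

    x -- subscript of the current phase decision
    y -- subscript of the previous phase decision
    """
    if x == 5 and y == 5:        # first test: x = y = 5
        return True
    # second test: (x, y) in {(1,0), (2,1), (3,2), (4,3), (0,4)}
    return y in range(5) and x == (y + 1) % 5


def inconsistent_positions(phases):
    """Return the indices at which a phase decision is inconsistent with its predecessor."""
    return [i for i in range(1, len(phases))
            if not is_consistent(phases[i], phases[i - 1])]
```

For example, `inconsistent_positions([0, 1, 2, 3, 4, 0, 2])` flags only the last decision, because phase 2 cannot directly follow phase 0.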

The modification to D₅ also appears in FIG. 15 as part of process 1500, which provides corrective action for inconsistencies in the phase sequence. Suppose the consistency test in block 1510 of flowchart 1500 has failed. Proceeding along the "No" branch from block 1510, the next test, in block 1514, is whether D₅ > D_i for all i < 5, or alternatively whether at least one of the variables D_i (i < 5) is greater than D₅. If the first case holds, the parameter δ, whose initial value is δ₀, is changed to 3δ₀ in block 1516. If the second case holds, δ is changed to 4δ₀ in block 1517. In block 152B, the value of Δ is updated to Δ_B, where

Returning to block 1510, suppose instead that the string of decisions has been found to be consistent. In block 1515, the parameter δ is changed to δ₊, defined as follows.

In block 152A, the new value of δ is inserted into Δ_A, the update relation for Δ. This is written as follows.

Then, in block 1593, the updated value of Δ is added to the decision variable D₅.

FIG. 16 shows how the inverse telecine process proceeds once the pull-down phase has been determined. With this information, fields 1605 and 1605' are identified as representing the same field of video. The two fields are averaged together and combined with field 1606 to reconstruct frame 1620. The reconstructed frame is denoted 1620'. A similar process reconstructs frame 1622. The fields derived from frames 1621 and 1623 are not duplicated. These frames are reconstructed by weaving their first and second fields together.
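The reconstruction step described above (average the two copies of a duplicated field, then combine the result with the remaining field) might look as follows in Python. This is a minimal sketch: fields are modeled as lists of pixel rows, the function names are hypothetical, and which field supplies the top lines is an assumption controlled by a parameter.

```python
# Hypothetical sketch of frame reconstruction during inverse telecine.

def average_fields(field_a, field_b):
    """Average two fields that represent the same field of video, pixel by pixel."""
    return [[(p + q) / 2 for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(field_a, field_b)]


def weave(top_field, bottom_field):
    """Interleave the rows of two fields into one frame (top field first)."""
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame


def reconstruct_frame(dup_a, dup_b, other, dup_is_top=True):
    """Rebuild a frame from a duplicated field pair and the remaining field.

    dup_is_top is an assumption of this sketch: it states whether the
    duplicated field carries the top (even) lines of the frame.
    """
    averaged = average_fields(dup_a, dup_b)
    return weave(averaged, other) if dup_is_top else weave(other, averaged)
```

Frames whose fields were not duplicated need only `weave`, applied to their own two fields.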

In the aspect described above, each time a new frame is received, four new metric values are found and a six-fold set of hypotheses is tested using newly computed decision variables. Other processing architectures can be adapted to compute the decision variables. A Viterbi decoder adds the metrics of the branches that make up a path to form the path metric. The decision variables defined here are formed by a similar rule: each is a "leaky" sum of new information variables. (In a leaky sum, the previous value of the decision variable is multiplied by a number less than 1 before the new information data is added to it.) A Viterbi decoder structure could be modified to support the operation of this procedure.

Although this aspect is described in terms of processing conventional video, in which a new frame appears every 1/30 second, the process may also be applied to frames that are recorded and processed backward in time. The decision space remains the same, but there are minor changes that reflect the time reversal of the sequence of input frames. For example, a string of coherent telecine decisions from the time-reversed mode (shown here),

P₄ P₃ P₂ P₁ P₀

would also be reversed in time.

Using this variation on the first aspect gives the decision process two attempts at making a successful decision, one going forward in time and the other going backward. Although the two attempts are not independent, they differ in that each processes the metrics in a different order.

This idea can be applied in conjunction with a buffer maintained to store future video frames that may require additional processing. If a video segment is found to give unacceptably inconsistent results in the forward direction of processing, the procedure draws future frames from the buffer and attempts to get past the difficult stretch of video by processing the frames in the reverse direction.

The video processing described in this application can also be applied to video in the PAL format.

Deinterlacer

"Deinterlacer," as used herein, is a broad term that can be used to describe a deinterlacing system, device, or process (including, for example, software, firmware, or hardware configured to perform a process) that processes, in whole or in significant part, interlaced multimedia data to form progressive multimedia data.

Broadcast video that is conventionally generated in video cameras, broadcast studios, and the like conforms in the United States to the NTSC standard. A common way to compress video is to interlace it. In interlaced data, each frame is made up of two fields. One field consists of the odd lines of the frame, and the other consists of the even lines. While frames are generated at approximately 30 frames/second, the fields are records of the television camera's image taken 1/60 second apart. Each field of an interlaced video signal shows every other horizontal line of the image. As the frames are projected on the screen, the video signal alternates between showing even and odd lines. When this is done fast enough, e.g., at about 60 fields per second, the video image looks smooth to the human eye.

Interlacing has been used for decades in analog television broadcasts based on the NTSC (U.S.) and PAL (Europe) formats. Because only half the image has to be sent with each field, interlaced video uses roughly half the bandwidth that sending the entire picture would require. The display format of the resulting video inside terminal 160 is not necessarily NTSC compatible and cannot readily display interlaced data. Instead, modern pixel-based displays (e.g., LCD, DLP, LCOS, plasma, etc.) are progressive scan and display progressively scanned video sources (whereas many older video devices use the older interlaced scan technology). Examples of some commonly used deinterlacing algorithms are described in "Scan rate up-conversion using adaptive weighted median filtering," P. Haavisto, J. Juhola, and Y. Neuvo, Signal Processing of HDTV II, pp. 703-710, 1990, and "Deinterlacing of HDTV Images for Multimedia Applications," R. Simonetti, S. Carrato, G. Ramponi, and A. Polo Filisan, in Signal Processing of HDTV IV, pp. 765-772, 1993.

The following describes examples of deinterlacing aspects of systems and methods that can be used, alone or in combination, to improve deinterlacing performance and that can be used in deinterlacer 405 (FIG. 4). Such aspects can include determining a first provisional deinterlaced frame by deinterlacing a selected frame using spatio-temporal filtering, determining a second provisional deinterlaced frame from the selected frame using bidirectional motion estimation and motion compensation, and then combining the first and second provisional frames to form a final progressive frame. The spatio-temporal filtering can use a weighted median (Wmed) filter that can include a horizontal edge detector that prevents blurring of horizontal or near-horizontal edges. Spatio-temporal filtering of the previous and subsequent fields neighboring a "current" field produces an intensity motion-level map that classifies portions of the selected frame into different motion levels, for example static, slow-motion, and fast-motion.

In some aspects, the intensity map is produced by Wmed filtering using a filtering aperture that includes pixels from five neighboring fields (two previous fields, the current field, and two next fields). The Wmed filtering can determine forward, backward, and bidirectional static area detection, which can effectively handle scene changes and objects that appear and disappear. In various aspects, a Wmed filter can be used across one or more fields of the same parity in an inter-field filtering mode, and can be switched to an intra-field filtering mode by tweaking threshold criteria. In some aspects, motion estimation and compensation uses luma (the intensity or brightness of the pixels) and chroma data (the color information of the pixels) to improve deinterlacing of regions of the selected frame where the brightness level is almost uniform but the color differs. A denoising filter can be used to increase the accuracy of motion estimation. The denoising filter can be applied to the Wmed deinterlaced provisional frame to remove alias artifacts generated by Wmed filtering. The deinterlacing methods and systems described below produce good deinterlacing results and have relatively low computational complexity, which allows fast-running deinterlacing implementations and makes such implementations suitable for a wide variety of deinterlacing applications, including systems used to provide data to cell phones, computers, and other types of electronic or communication devices that use a display.
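For readers unfamiliar with weighted median filtering, the following Python sketch shows the generic operation on a list of samples. It is not the Wmed filter of this description: the aperture, the weights, and the switching criteria are not specified in this passage, so the function name, the sample values, and the weights in the usage note are illustrative assumptions only.

```python
# Generic weighted median, shown as background for Wmed filtering (illustrative only).

def weighted_median(samples, weights):
    """Return the lower weighted median of the samples.

    Each sample counts as many times as its non-negative weight. The result
    is the smallest sample at which the cumulative weight reaches half of
    the total weight.
    """
    pairs = sorted(zip(samples, weights))
    half = sum(weights) / 2
    running = 0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("samples must be non-empty")
```

For instance, `weighted_median([10, 200, 12, 11, 13], [1, 1, 3, 1, 1])` returns 12: the outlier 200 is rejected, and the heavily weighted sample dominates, which is the property that makes median-type filters attractive for interpolating missing lines.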

Aspects of a deinterlacer and of deinterlacing methods are described herein with reference to various components, modules, and/or steps that are used to deinterlace multimedia data.

FIG. 17 is a block diagram illustrating one aspect of a deinterlacer 1700 that can be used as deinterlacer 405 of FIG. 4. Deinterlacer 1700 includes a spatial filter 1730 that spatially and temporally ("spatio-temporally") filters at least a portion of the interlaced data and generates spatio-temporal information. For example, Wmed can be used in spatial filter 1730. In some aspects, deinterlacer 1700 also includes a denoising filter (not shown), for example a Wiener filter or a wavelet shrinkage filter. Deinterlacer 1700 also includes a motion estimator 1732 that provides motion estimation and compensation of a selected frame of interlaced data and generates motion information. A combiner 1734 receives and combines the spatio-temporal information and the motion information to form a progressive frame.

FIG. 18 is another block diagram of deinterlacer 1700. A processor 1836 in deinterlacer 1700 includes a spatial filter module 1838, a motion estimation module 1840, and a combiner module 1842. Interlaced multimedia data from an external source 1848 can be provided to a communication module 1844 in deinterlacer 1700. The deinterlacer, and its components or steps, can be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. For example, a deinterlacer may be a standalone component, incorporated as hardware, firmware, or middleware in a component of another device, implemented in microcode or software that is executed on a processor, or a combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the deinterlacer tasks may be stored in a machine-readable medium such as a storage medium. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.

The received interlaced data can be stored in deinterlacer 1700 in a storage medium 1846, which can include, for example, a chip-configured storage medium (e.g., ROM, RAM) or a disk-type storage medium (e.g., magnetic or optical) coupled to processor 1836. In some aspects, processor 1836 can contain part or all of the storage medium. Processor 1836 is configured to process the interlaced multimedia data to form progressive frames, which are then provided to another device or process.

Conventional analog video devices such as televisions render video in an interlaced manner; that is, such devices transmit even-numbered scan lines (the even field) and odd-numbered scan lines (the odd field). From the signal sampling point of view, this is equivalent to spatio-temporal subsampling in the pattern described as follows.

Here, Θ denotes the original frame picture, F denotes the interlaced field, and (x, y, n) denote the horizontal, vertical, and temporal position of a pixel, respectively.

Without loss of generality, n = 0 is assumed to be an even field throughout this disclosure, so that equation (23) can be simplified as follows.

Because no decimation is performed in the horizontal dimension, the subsampling pattern can be depicted in n-y coordinates. In FIG. 19, both the circles and the stars represent positions where the original full-frame picture has a sample pixel. The interlacing process decimates the star pixels while leaving the circle pixels intact. Note that because vertical positions are indexed from zero, the even field is the top field and the odd field is the bottom field.
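The subsampling pattern can be illustrated with a short Python sketch. It assumes, consistently with the conventions stated above (n = 0 is an even field, vertical positions indexed from zero), that field n keeps exactly those rows y with y mod 2 = n mod 2; this rule and the function names are assumptions of the sketch, since the equations themselves are not reproduced here.

```python
# Hypothetical sketch of interlaced subsampling in the n-y plane.

def extract_field(frame, n):
    """Return the rows of a full frame that survive in field n.

    frame -- list of pixel rows, row index y starting at 0
    n     -- temporal index of the field (even n: top field, odd n: bottom field)
    """
    parity = n % 2
    return [row for y, row in enumerate(frame) if y % 2 == parity]


def sampling_pattern(height, num_fields):
    """Return one string per row y; 'o' marks a retained sample, '*' a decimated one."""
    return ["".join("o" if y % 2 == n % 2 else "*" for n in range(num_fields))
            for y in range(height)]
```

Printing `sampling_pattern(4, 3)` row by row gives `o*o`, `*o*`, `o*o`, `*o*`, a checkerboard in which the retained (circle) and decimated (star) positions alternate in both the vertical and the temporal direction.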

The goal of a deinterlacer is to transform interlaced video (a sequence of fields) into non-interlaced progressive frames (a sequence of frames). In other words, the even and odd fields are interpolated to "recover" or generate full-frame pictures. This can be represented by equation (25).

Here, F_i denotes the deinterlacing result for the missing pixels.

FIG. 20 is a block diagram illustrating certain aspects of one aspect of a deinterlacer that uses Wmed filtering and motion estimation to generate progressive frames from interlaced multimedia data. The upper part of FIG. 20 shows a motion intensity map 2052 that can be generated using information from the current field, two previous fields (the PP field and the P field), and two subsequent fields (the Next field and the Next Next field). As described in more detail below, motion intensity map 2052 categorizes, or partitions, the current frame into two or more different motion levels and can be generated by spatio-temporal filtering. As described below with reference to equations (4) to (8), in some aspects motion intensity map 2052 is generated to identify static areas, slow-motion areas, and fast-motion areas. A spatio-temporal filter, e.g., Wmed filter 2054, filters the interlaced multimedia data using criteria based on the motion intensity map and produces a spatio-temporal provisional deinterlaced frame. In some aspects, the Wmed filtering process involves a horizontal neighborhood of [-1, 1], a vertical neighborhood of [-3, 3], and a temporal neighborhood of five adjacent fields, which are represented by the five fields shown in FIG. 20 (PP field, P field, current field, Next field, Next Next field), with Z⁻¹ representing a delay of one field. Relative to the current field, the Next field and the P field are non-parity fields, and the PP field and the Next Next field are parity fields. The "neighborhood" used for spatio-temporal filtering refers to the spatial and temporal locations of the fields and pixels actually used during the filtering operation, and can be illustrated as an "aperture" such as those shown in FIGS. 21 and 22.

The deinterlacer can also include a denoiser (denoising filter) 2056. Denoiser 2056 is configured to filter the spatio-temporal provisional deinterlaced frame generated by Wmed filter 2054. Denoising the spatio-temporal provisional deinterlaced frame makes the subsequent motion search process more accurate, especially when the source interlaced multimedia data sequence is contaminated by white noise. It can also at least partly remove aliasing between even and odd rows in a Wmed picture. Denoiser 2056 can be implemented as a variety of filters, including the wavelet shrinkage and wavelet Wiener filters described further below.

The bottom part of FIG. 20 illustrates an aspect for determining motion information (e.g., motion vector candidates, motion estimation, motion compensation) of interlaced multimedia data. In particular, FIG. 20 illustrates a motion estimation and motion compensation scheme that is used to generate a motion-compensated provisional progressive frame of the selected frame, which is then combined with the Wmed provisional frame to form the resulting "final" progressive frame, shown as deinterlaced current frame 2064. In some aspects, motion vector (MV) candidates (or estimates) of the interlaced multimedia data are provided to the deinterlacer from an external motion estimator and are used to provide a starting point for the bidirectional motion estimator and compensator (ME/MC) 2068. In some aspects, an MV candidate selector 2072 uses previously determined MVs of neighboring blocks as MV candidates for the block being processed, such as the MVs of previously processed blocks, for example blocks in deinterlaced previous frame 2070. Motion compensation can be done bidirectionally, based on the previous deinterlaced frame 2070 and a next (e.g., future) Wmed frame 2058. A current Wmed frame 2060 and a motion-compensated (MC) current frame 2066 are merged, or combined, by combiner 2062. The resulting deinterlaced current frame 2064, now a progressive frame, is provided back to ME/MC 2068 to be used as deinterlaced previous frame 2070, and is also communicated outside the deinterlacer for subsequent processing, e.g., compression and transmission to a display terminal. The various aspects shown in FIG. 20 are described in more detail below.

FIG. 25 illustrates a process 2500 for processing multimedia data to produce a sequence of progressive frames from a sequence of interlaced frames. In one aspect, the progressive frames are produced by deinterlacer 405 shown in FIG. 4. At block 2502, process 2500 (process "A") generates spatio-temporal information for a selected frame. The spatio-temporal information can include information used to categorize the motion levels of the multimedia data and generate a motion intensity map, and includes the Wmed provisional deinterlaced frame and the information used to generate that frame (e.g., the information used in equations (26) to (33)). This process can be performed by Wmed filter 2054, as shown in the upper part of FIG. 20, and its associated processing, which is described in further detail below. In process A, shown in FIG. 26, regions are classified into fields of different motion levels at block 2602, as further described below.

Next, at block 2504 (process "B"), process 2500 generates motion compensation information for the selected frame. In one aspect, the bidirectional motion estimator/motion compensator 2068 shown in the lower part of FIG. 20 can perform this process. Process 2500 then proceeds to block 2506, where it deinterlaces the fields of the selected frame based on the spatio-temporal information and the motion compensation information to form a progressive frame associated with the selected frame. This can be performed by combiner 2062 shown in the lower part of FIG. 20.

Motion Intensity Map

For each frame, motion intensity map 2052 can be determined by processing pixels in the current field to determine areas of different "motion." An illustrative aspect of determining a three-category motion intensity map is described below with reference to FIGS. 21 to 24. The motion intensity map designates areas of each frame as static areas, slow-motion areas, and fast-motion areas based on comparing pixels in same-parity fields and different-parity fields.

Static Area

Determining static areas of the motion map can comprise processing pixels in a neighborhood of adjacent fields to determine whether the luminance differences of certain pixel(s) meet certain criteria. In some aspects, determining static areas of the motion map comprises processing pixels in a neighborhood of five adjacent fields (the current field (C), two fields temporally before the current field, and two fields temporally after the current field) to determine whether the luminance differences of certain pixel(s) meet certain thresholds. These five fields are illustrated in FIG. 20, where Z⁻¹ represents a delay of one field. In other words, the five adjacent fields would typically be displayed in such a sequence with a time delay of Z⁻¹.

FIG. 21 illustrates an aperture identifying certain pixels of each of the five fields that can be used for spatio-temporal filtering, according to some aspects. The aperture includes, from left to right, 3×3 pixel groups of the previous-previous field (PP), the previous field (P), the current field (C), the next field (N), and the next-next field (NN). In some aspects, an area of the current field is considered static in the motion map if it meets the criteria described in equations (26) to (28), with the pixel locations and corresponding fields shown in FIG. 21.

[Equation (26)]

and

[Equation (27)]

or

[Equation (28)]

where T₁ is a threshold,

L_P is the luminance of pixel P located in the P field,

L_N is the luminance of pixel N located in the N field,

L_B is the luminance of pixel B located in the current field,

L_E is the luminance of pixel E located in the current field,

L_BPP is the luminance of pixel B_PP located in the PP field,

L_EPP is the luminance of pixel E_PP located in the PP field,

L_BNN is the luminance of pixel B_NN located in the NN field, and

L_ENN is the luminance of pixel E_NN located in the NN field.

The threshold T₁ can be predetermined and set to a particular value, determined by a process other than deinterlacing and provided (for example, as metadata for the video being deinterlaced), or determined dynamically during deinterlacing.

The static region criteria described above in equations (26), (27), and (28) use more fields than conventional deinterlacing techniques for at least two reasons. First, a comparison between same-parity fields has lower aliasing and phase mismatch than a comparison between different-parity fields. However, the least time difference (and hence the correlation) between the field being processed and its most adjacent same-parity field neighbors is two fields, which is larger than that from its different-parity field neighbors. A combination of the more reliable different-parity fields and the lower-aliasing same-parity fields can improve the accuracy of static region detection.

In addition, as shown in FIG. 21, the five fields can be distributed symmetrically in the past and in the future relative to a pixel X in the current frame C. The static region can be subdivided into three categories: forward static (static relative to the previous frame), backward static (static relative to the next frame), or bidirectional static (if both the forward and the backward criteria are satisfied). This finer categorization of static regions can improve performance, especially at scene changes and when objects appear or disappear.
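By way of a non-limiting illustration, the forward, backward, and bidirectional static categories can be sketched as follows. The exact inequalities of equations (26) to (28) are not reproduced in this text, so the sketch assumes that each test is a simple absolute luminance difference compared against T₁. The function name `classify_static` and the dictionary keys are hypothetical.

```python
def classify_static(lum, t1):
    """Classify one pixel neighborhood as a kind of static region, or not static.

    `lum` maps hypothetical keys ('P', 'N', 'B', 'E', 'B_PP', 'E_PP',
    'B_NN', 'E_NN') to luminance values, following the symbol list above.
    ASSUMPTION: every test is |difference| < t1; the source equations
    (26) to (28) may combine the terms differently.
    """
    # Different-parity test: previous field versus next field.
    if abs(lum["P"] - lum["N"]) >= t1:
        return "not static"
    # Same-parity test against the previous-previous field (forward).
    forward = (abs(lum["B_PP"] - lum["B"]) < t1
               and abs(lum["E_PP"] - lum["E"]) < t1)
    # Same-parity test against the next-next field (backward).
    backward = (abs(lum["B_NN"] - lum["B"]) < t1
                and abs(lum["E_NN"] - lum["E"]) < t1)
    if forward and backward:
        return "bidirectional static"
    if forward:
        return "forward static"
    if backward:
        return "backward static"
    return "not static"
```

Because the comparisons are strict, a threshold of zero never yields a static classification under these assumptions.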

Slow-motion regions

An area of the motion map can be considered a slow-motion region if the luminance values of certain pixels do not meet the criteria for being designated a static region but do meet the criteria for being designated a slow-motion region. Equation (29) below defines a criterion that can be used to determine a slow-motion region. Referring to FIG. 22, the locations of pixels Ia, Ic, Ja, Jc, Ka, Kc, La, Lc, P, and N identified in equation (29) are shown in an aperture centered on pixel X. The aperture includes a 3×7 pixel neighborhood of the current field (C) and 3×5 neighborhoods of the next field (N) and the previous field (P). Pixel X is considered part of a slow-motion region if it does not meet the static region criteria described above and the pixels in the aperture meet the criterion shown in equation (29).

where T₂ is a threshold, and L_Ia, L_Ic, L_Ja, L_Jc, L_Ka, L_Kc, L_La, L_Lc, L_P, and L_N are the luminance values of pixels Ia, Ic, Ja, Jc, Ka, Kc, La, Lc, P, and N, respectively.

Likewise, the threshold T₂ can be predetermined and set to a particular value, determined by a process other than deinterlacing and provided (for example, as metadata for the video being deinterlaced), or determined dynamically during deinterlacing.

It should be noted that a filter can blur edges that are horizontal (for example, more than 45° from vertically aligned) because of the angle of its edge detection capability. For example, the edge detection capability of the aperture (filter) shown in FIG. 22 is affected by the angle formed by pixels A and F, or C and D. Any edge more horizontal than such an angle will not be interpolated optimally, and hence staircase artifacts may appear at that edge. In some aspects, the slow-motion category can be divided into two subcategories, "horizontal edge" and "otherwise", to account for this edge detection effect. A slow-motion pixel can be categorized as a horizontal edge if it satisfies the criterion in equation (30), shown below, and into the so-called "otherwise" category if it does not.

where T₃ is a threshold, and L_A, L_B, L_C, L_D, L_E, and L_F are the luminance values of pixels A, B, C, D, E, and F.

Different interpolation methods can be used for each of the horizontal edge and otherwise categories.

Fast-motion regions

If neither the criteria for a static region nor the criteria for a slow-motion region are met, the pixel can be considered to be in a fast-motion region.

Having categorized the pixels in the selected frame, process A (FIG. 26) proceeds to block 2604 and generates a provisional deinterlaced frame based on the motion intensity map. In this aspect, the Wmed filter 2054 (FIG. 20) filters the selected field and the necessary adjacent field(s) to provide a candidate full-frame image F₀, which can be defined as follows:

where α_i (i = 0, 1, 2, 3) are integer weights calculated as follows:

As illustrated in the lower portion of FIG. 20, the Wmed-filtered provisional deinterlaced frame is provided for further processing in conjunction with the motion estimation and motion compensation processing.

As described above and shown in equation (31), static interpolation comprises inter-field interpolation, and slow-motion and fast-motion interpolation comprise intra-field interpolation. In certain aspects where temporal (for example, inter-field) interpolation of same-parity fields is not desired, temporal interpolation can be "disabled" by setting the threshold T₁ (equations (26) to (28)) to zero (T₁ = 0). Processing the current field with temporal interpolation disabled results in no area of the motion-level map being categorized as static, and the Wmed filter 2054 (FIG. 20) uses the three fields shown in the aperture of FIG. 22, operating on the current field and the two adjacent non-parity fields.

Denoising

In certain aspects, a denoiser can be used to remove noise from the candidate Wmed frame before it is further processed using the motion compensation information. The denoiser can remove noise that is present in the Wmed-filtered frame and retain the signal that is present regardless of the signal's frequency content. Various types of denoising filters can be used, including wavelet filters. Wavelets are a class of functions used to localize a given signal in both the space and scaling domains. The fundamental idea behind wavelets is to analyze the signal at different scales or resolutions such that a small change in the wavelet representation produces a correspondingly small change in the original signal.

In some aspects, the denoising filter is based on an aspect of a (4, 2) bi-orthogonal cubic B-spline wavelet filter. One such filter can be defined by the following forward and inverse transforms:

[forward transform]

and

[inverse transform]

Applying a denoising filter can increase the accuracy of motion compensation in a noisy environment. The noise in the video sequence is assumed to be additive white Gaussian noise. The estimated variance of the noise is denoted by σ. It can be estimated as the median absolute deviation of the highest-frequency subband coefficients divided by 0.6745. Implementations of such filters are further described in D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994, which is incorporated herein by reference in its entirety.
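As an illustrative sketch only, the noise estimate described above can be computed as follows. The sketch assumes that the deviation is taken about zero, which is the usual convention for zero-mean detail coefficients. The function name `estimate_noise_sigma` is hypothetical.

```python
import statistics


def estimate_noise_sigma(highest_subband_coeffs):
    """Robust noise estimate from the highest-frequency subband.

    ASSUMPTION: the 'median absolute deviation' is the median of the
    absolute coefficient values (deviation about zero), divided by 0.6745.
    """
    magnitudes = [abs(c) for c in highest_subband_coeffs]
    return statistics.median(magnitudes) / 0.6745
```

The median makes the estimate insensitive to the few large coefficients that come from real image edges rather than from noise.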

A wavelet shrinkage filter or a wavelet Wiener filter can also be applied as the denoiser. Wavelet shrinkage denoising can involve shrinking in the wavelet transform domain and typically comprises three steps: a linear forward wavelet transform, a nonlinear shrinkage denoising, and a linear inverse wavelet transform. The Wiener filter is an MSE-optimal linear filter that can be used to improve images degraded by additive noise and blurring. Such filters are generally known in the art and are described, for example, in "Ideal spatial adaptation by wavelet shrinkage," referenced above, and in S. P. Ghael, A. M. Sayeed, and R. G. Baraniuk, "Improved wavelet denoising via empirical Wiener filtering," Proceedings of SPIE, vol. 3169, pp. 389-399, San Diego, July 1997.
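The three steps of wavelet shrinkage can be sketched as follows. This is a minimal one-dimensional illustration that substitutes a one-level Haar transform for the (4, 2) bi-orthogonal cubic B-spline wavelet named above, and assumes soft thresholding as the nonlinear shrinkage step. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Step 1: linear forward transform (one-level Haar, even-length input)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def soft_threshold(value, threshold):
    """Step 2: nonlinear shrinkage, pulling each coefficient toward zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def haar_inverse(approx, detail):
    """Step 3: linear inverse transform."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def shrink_denoise(signal, threshold):
    """Forward transform, shrink the detail coefficients, inverse transform."""
    approx, detail = haar_forward(signal)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)
```

A threshold of zero returns the input unchanged, while a very large threshold replaces each pixel pair by its average.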

Motion compensation

Referring to FIG. 27, at block 2702 process B performs bidirectional motion estimation and then, at block 104, uses the motion estimates to perform motion compensation, which is further illustrated in FIG. 20 and described in an illustrative aspect below. There is a one-field "lag" between the Wmed filter and the motion-compensation-based deinterlacer. As shown in FIG. 23, motion compensation information for the "missing" data (the non-original rows of pixel data) of the current field "C" is predicted from information in both the previous frame "P" and the next frame "N". In the current field (FIG. 23), solid lines represent rows where original pixel data exist, and dashed lines represent rows where Wmed-interpolated pixel data exist. In some aspects, motion compensation is performed in a 4-row by 8-column pixel neighborhood. However, this pixel neighborhood is an example for purposes of explanation, and it will be apparent to those skilled in the art that, in other aspects, motion compensation may be performed based on a pixel neighborhood comprising a different number of rows and a different number of columns, the choice of which can be based on many factors including, for example, computational speed, available processing power, or characteristics of the multimedia data being deinterlaced. Because the current field has only half of the rows, the four rows that are actually matched correspond to an 8-pixel by 8-pixel area.

Referring to FIG. 20, the bidirectional ME/MC 2068 can use the sum of squared errors (SSE) to measure the similarity between a predicting block and a predicted block for the Wmed current frame 2060 relative to the Wmed next frame 2058 and the deinterlaced current frame 2070. The generation of the motion-compensated current frame 2066 then uses pixel information from the most similar matching blocks to fill in the missing data between the original pixel lines. In some aspects, the bidirectional ME/MC 2068 biases or gives more weight to the pixel information from the deinterlaced previous frame 2070, because that frame was generated using motion compensation information and Wmed information, whereas the Wmed next frame 2058 is only deinterlaced by spatio-temporal filtering.

In some aspects, to improve matching performance in regions of fields that have similar luma regions but different chroma regions, a metric can be used that includes the contribution of pixel values of one or more luma groups of pixels (for example, one 4-row by 8-column luma block) and one or more chroma groups of pixels (for example, two 2-row by 4-column chroma blocks U and V). Such approaches effectively reduce mismatch in color-sensitive regions.
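A matching metric that combines luma and chroma contributions can be sketched as follows. The sketch assumes that the chroma SSE terms are simply added to the luma SSE; the `chroma_weight` parameter and the function names are hypothetical and are not taken from the source.

```python
def sse(block_a, block_b):
    """Sum of squared errors between two equally sized 2-D blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def match_cost(cur, ref, chroma_weight=1.0):
    """Matching cost from one luma block and two chroma blocks.

    `cur` and `ref` are dicts with keys 'Y' (for example 4 rows x 8 columns)
    and 'U', 'V' (for example 2 rows x 4 columns each).
    ASSUMPTION: chroma SSE is added to luma SSE with a scalar weight.
    """
    luma = sse(cur["Y"], ref["Y"])
    chroma = sse(cur["U"], ref["U"]) + sse(cur["V"], ref["V"])
    return luma + chroma_weight * chroma
```

With this metric, two candidate blocks whose luma is identical but whose chroma differs no longer tie, which is the mismatch the paragraph above describes.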

Motion vectors (MVs) have a granularity of 1/2 pixel in the vertical dimension and either 1/2 or 1/4 pixel in the horizontal dimension. To obtain fractional-pixel samples, interpolation filters can be used. For example, some filters that can be used to obtain half-pixel samples include a bilinear filter (1, 1), the interpolation filter recommended by H.263/AVC (1, -5, 20, 20, -5, 1), and a six-tap Hamming-windowed sinc function filter (3, -21, 147, 147, -21, 3). Quarter-pixel samples can be generated from full-pixel and half-pixel samples by applying a bilinear filter.
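The fractional-pixel interpolation described above can be sketched in one dimension as follows. The sketch assumes 8-bit samples, normalization by the sum of the taps with rounding, and edge clamping of sample indices; none of these details is stated in the source, and the function names are hypothetical.

```python
BILINEAR_TAPS = (1, 1)
SIX_TAP = (1, -5, 20, 20, -5, 1)             # taps sum to 32
HAMMING_SINC_TAPS = (3, -21, 147, 147, -21, 3)  # taps sum to 258


def half_pel(row, i, taps=SIX_TAP):
    """Interpolate the sample halfway between row[i] and row[i + 1]."""
    n = len(taps)
    total = 0
    for k, tap in enumerate(taps):
        idx = i - (n // 2 - 1) + k
        idx = min(max(idx, 0), len(row) - 1)   # ASSUMPTION: clamp at the edges
        total += tap * row[idx]
    norm = sum(taps)
    value = (total + norm // 2) // norm        # ASSUMPTION: round to nearest
    return min(max(value, 0), 255)             # ASSUMPTION: 8-bit samples


def quarter_pel(row, i, taps=SIX_TAP):
    """Bilinear average of the full-pixel sample and the half-pixel sample."""
    return (row[i] + half_pel(row, i, taps) + 1) // 2
```

For example, on the ramp `[0, 10, 20, 30, 40, 50, 60, 70]` the half-pixel sample between indices 3 and 4 evaluates to 35 with any of the three tap sets.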

In some aspects, motion compensation can use various types of searching processes to match data (for example, data depicting an object) at a certain location of the current frame to corresponding data at a different location in another frame (for example, a next frame or a previous frame), the difference in location within the respective frames indicating the object's motion. For example, the searching process can use a full motion search, which may cover a larger search area, or a fast motion search, which can use fewer pixels, and/or the selected pixels used in the search pattern can have a particular shape, for example a diamond shape. For fast motion searches, the search areas can be centered around motion estimates, or motion candidates, which can be used as a starting point for searching the adjacent frames. In some aspects, MV candidates can be generated by an external motion estimator and provided to the deinterlacer. Motion vectors of a macroblock from a corresponding neighborhood in a previously motion-compensated adjacent frame can also be used as a motion estimate. In some aspects, MV candidates can be generated by searching a neighborhood of macroblocks (for example, 3 macroblocks by 3 macroblocks) of the corresponding previous and next frames.

FIG. 24 illustrates an example of two MV maps, MV_P and MV_N, that can be generated during motion estimation/compensation by searching a neighborhood of the previous frame and the next frame, as shown in FIG. 23. In both MV_P and MV_N, the block to be processed to determine motion information is the center block denoted by "X". In both MV_P and MV_N, there are nine MV candidates that can be used during motion estimation of the current block X being processed. In this example, four of the MV candidates exist in the same field from earlier-performed motion searches and are depicted by the lighter-colored blocks in MV_P and MV_N (FIG. 24). Five other MV candidates, depicted by the darker-colored blocks, were copied from the motion information (or maps) of the previously processed frame.
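Choosing among a set of MV candidates can be sketched as follows. The sketch assumes integer-pixel motion vectors, SSE as the matching cost, and that candidates pointing outside the reference frame are skipped; the function names and the `(dy, dx)` vector layout are hypothetical.

```python
def block_sse(cur, ref, top, left, mv):
    """SSE between block `cur` and the block of `ref` displaced by mv = (dy, dx)."""
    dy, dx = mv
    total = 0
    for r, row in enumerate(cur):
        for c, value in enumerate(row):
            total += (value - ref[top + dy + r][left + dx + c]) ** 2
    return total


def best_candidate(cur, ref, top, left, candidates):
    """Return the candidate MV with the lowest SSE.

    `top`, `left` locate the current block; candidates whose displaced block
    would fall outside `ref` are ignored (an assumption of this sketch).
    Raises ValueError if no candidate is valid.
    """
    height, width = len(cur), len(cur[0])
    valid = [mv for mv in candidates
             if 0 <= top + mv[0] <= len(ref) - height
             and 0 <= left + mv[1] <= len(ref[0]) - width]
    return min(valid, key=lambda mv: block_sse(cur, ref, top, left, mv))
```

In the arrangement of FIG. 24, `candidates` would hold the nine vectors gathered from the current and previously processed motion maps, and the winner could seed a finer local search.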

After motion estimation/compensation is completed, two interpolation results may exist for the missing rows (denoted by the dashed lines in FIG. 23): one interpolation result generated by the Wmed filter (Wmed current frame 2060, FIG. 20) and one interpolation result generated by the motion estimation processing of the motion compensator (MC current frame 2066). The combiner 2062 typically merges the Wmed current frame 2060 and the MC current frame 2066, using at least a portion of each, to generate a current deinterlaced frame 2064. However, under certain conditions, the combiner 2062 may generate the current deinterlaced frame using only one of the Wmed current frame 2060 or the MC current frame 2066. In one example, the combiner 2062 merges the Wmed current frame 2060 and the MC current frame 2066 to generate a deinterlaced output signal as shown in equation (36):

where $F(\vec{x}, n)$ is used for the luminance value in field $n$ at position $\vec{x} = (x, y)^t$, with $t$ denoting transpose. Using a clip function defined as

[clip function definition]

k₁ can be calculated as follows:

where C₁ is a robustness parameter and Diff is the luma difference between the predicting frame pixel and the available pixel in the predicted frame (taken from the existing field). By appropriately choosing C₁, it is possible to tune the relative importance of the mean square error. k₂ can be calculated as shown in equation (39):

where the vector quantity in equation (39) is the motion vector and δ is a small constant that prevents division by zero. Deinterlacing using clipping functions for filtering is further described in G. de Haan and E. B. Bellers, "De-interlacing of video data," IEEE Transactions on Consumer Electronics, vol. 43, no. 3, pp. 819-825, 1997, which is incorporated herein by reference in its entirety.

In some aspects, the combiner 2062 can be configured to try to maintain the following equation in order to achieve a high PSNR and robust results:

With the Wmed + MC deinterlacing scheme, it is possible to decouple deinterlacing prediction schemes that involve inter-field interpolation from intra-field interpolation. In other words, the spatio-temporal Wmed filtering can be used mainly for intra-field interpolation, while inter-field interpolation can be performed during motion compensation. This reduces the peak signal-to-noise ratio of the Wmed result, but the visual quality after motion compensation is more pleasing, because bad pixels resulting from inaccurate inter-field prediction mode decisions are removed from the Wmed filtering process.

Chroma handling can be consistent with the associated luma handling. In terms of motion map generation, the motion level of a chroma pixel is obtained by observing the motion levels of its four associated luma pixels. The operation can be based on voting (the chroma motion level borrows the dominant luma motion level). However, the following conservative approach is proposed. If any one of the four luma pixels has a fast motion level, the chroma motion level is fast-motion; otherwise, if any one of the four luma pixels has a slow motion level, the chroma motion level is slow-motion; otherwise, the chroma motion level is static. The conservative approach may not achieve the highest PSNR, but it avoids the risk of using INTER prediction wherever there is ambiguity in the chroma motion level.
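The two ways of deriving a chroma motion level can be sketched as follows. The integer encoding of the levels and the tie-breaking rule in the voting variant are assumptions made for illustration; the function names are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: a larger value means more motion.
STATIC, SLOW, FAST = 0, 1, 2


def chroma_level_conservative(luma_levels):
    """Any fast luma pixel gives fast; else any slow gives slow; else static."""
    return max(luma_levels)


def chroma_level_voting(luma_levels):
    """Borrow the dominant luma level.

    ASSUMPTION: ties are broken toward the higher motion level.
    """
    counts = Counter(luma_levels)
    top = max(counts.values())
    return max(level for level, n in counts.items() if n == top)
```

For the four luma levels `[STATIC, STATIC, STATIC, FAST]`, voting returns `STATIC` while the conservative rule returns `FAST`, so the conservative rule never applies inter-field interpolation to a chroma pixel whose luma neighbors disagree.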

Multimedia data sequences were deinterlaced using the Wmed algorithm described herein alone, and using the Wmed algorithm together with the motion-compensated algorithm. The same multimedia data sequences were also deinterlaced using a pixel blending (or averaging) algorithm and a "no deinterlacing" case, in which the fields were merely combined without any interpolation or blending. The resulting frames were analyzed to determine the PSNR, which is shown in the following table.

PSNR (dB)

Sequence    No deinterlacing    Blending    Wmed        Wmed + MC
soccer      8.955194            11.38215    19.26221    19.50528
city        11.64183            12.93981    15.03303    15.09859
crew        13.32435            15.66387    22.36501    22.58777

Although deinterlacing using MC in addition to Wmed yields only a marginal improvement in PSNR, the visual quality of the deinterlaced image produced by combining the Wmed and MC interpolation results is more pleasing because, as mentioned above, combining the Wmed results and the MC results suppresses aliasing and noise between even and odd fields.

In some resampling aspects, a polyphase resampler is implemented for picture size resizing. In one example of downsampling, the ratio between the resized picture and the original picture can be p/q, where p and q are relatively prime integers. The total number of phases is p. In some aspects, the cutoff frequency of the polyphase filter is 0.6 for resizing factors of around 0.5. The cutoff frequency does not exactly match the resizing ratio, in order to boost the high-frequency response of the resized sequence. This inevitably allows some aliasing. However, it is well known that human eyes prefer sharp but slightly aliased pictures to blurry, alias-free pictures.

FIG. 42 illustrates an example of polyphase resampling, showing the phases when the resizing ratio is 3/4. The cutoff frequency illustrated in FIG. 42 is also 3/4. The original pixels are illustrated in FIG. 42 by vertical axes. A sinc function is also drawn centered on the axes to represent the filter waveform. Because the cutoff frequency is chosen to be exactly the same as the resampling ratio, the zeros of the sinc function overlap the positions of the pixels after resizing, which are illustrated in FIG. 42 with crosses. To find a pixel value after resizing, the contributions from the original pixels can be summed as shown in the following equation:

where f_c is the cutoff frequency. The 1-D polyphase filter above can be applied to both the horizontal dimension and the vertical dimension.
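A one-dimensional sinc-weighted resampler can be sketched as follows. The resampling equation is not reproduced in this text, so the sketch assumes that each output pixel is a sum of the original pixels weighted by sinc(π · f_c · (i − x)), truncated to a finite window and normalized to unit gain. The window half-width, the normalization, and the function names are assumptions.

```python
import math
from fractions import Fraction


def sinc(t):
    """Unnormalized sinc, sin(t) / t, with sinc(0) = 1."""
    return 1.0 if t == 0 else math.sin(t) / t


def resample_1d(pixels, ratio, cutoff, half_width=8):
    """Resize a row of pixels by `ratio` (output length / input length).

    `ratio` is a Fraction p/q, which Python reduces to lowest terms, so
    p and q are relatively prime and the output positions cycle through
    p distinct phases.
    """
    out_len = len(pixels) * ratio.numerator // ratio.denominator
    out = []
    for j in range(out_len):
        x = j / float(ratio)                       # position in input coordinates
        lo = max(0, math.floor(x) - half_width)    # ASSUMPTION: truncated window
        hi = min(len(pixels) - 1, math.floor(x) + half_width)
        idx = range(lo, hi + 1)
        weights = [sinc(math.pi * cutoff * (i - x)) for i in idx]
        total = sum(weights)                       # ASSUMPTION: unit-gain normalization
        out.append(sum(w * pixels[i] for w, i in zip(weights, idx)) / total)
    return out
```

Calling `resample_1d(row, Fraction(3, 4), 0.75)` mirrors the 3/4 example of FIG. 42; applying the same function to rows and then to columns gives the two-dimensional resize.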

Another aspect of resampling (resizing) accounts for overscan. In an NTSC television signal, an image has 486 scan lines, and in digital video each scan line can have 720 pixels. However, not all of the image is visible on a television because of mismatches between the size and the screen format. The part of the image that is not visible is called overscan.

To help broadcasters put useful information in the area visible to as many televisions as possible, the Society of Motion Picture & Television Engineers (SMPTE) defined specific sizes of the action frame called the safe action area and the safe title area. See SMPTE Recommended Practice RP 27.3-1989, Specifications for Safe Action and Safe Title Areas Test Pattern for Television Systems. SMPTE defines the safe action area as the area in which "all significant action must take place." The safe title area is defined as the area where "all the useful information can be confined to ensure visibility on the majority of home television receivers." For example, as illustrated in FIG. 43, the safe action area 4310 occupies the center 90% of the screen, giving a 5% border all around. The safe title area 4305 occupies the center 80% of the screen, giving a 10% border.
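The safe areas above can be computed for a given frame size as sketched below. The sketch assumes integer pixel coordinates with borders rounded down; the function name `centered_area` is hypothetical.

```python
def centered_area(width, height, border_percent):
    """Return (left, top, area_width, area_height) of a centered rectangle
    that leaves `border_percent` of each dimension as a border on every side.

    ASSUMPTION: border sizes are rounded down to whole pixels.
    """
    left = width * border_percent // 100
    top = height * border_percent // 100
    return left, top, width - 2 * left, height - 2 * top


# 720 x 486 frame, as in the NTSC example above.
safe_action = centered_area(720, 486, 5)    # (36, 24, 648, 438)
safe_title = centered_area(720, 486, 10)    # (72, 48, 576, 390)
```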

Referring to FIG. 44, because the safe title area is too small to hold much additional content, some broadcasts place text in the safe action area, which lies inside the white rectangular window 4415. Black borders are usually visible in the overscan. For example, in FIG. 44, black borders appear at the top 4420 and bottom 4425 of the image. These black borders can be removed from the overscan, because H.264 video uses boundary extension in motion estimation, and extended black borders can increase the residual. Typically, the border can be cropped by 2% before resizing, and the resizing filters can be generated accordingly. Truncation is performed to remove the overscan before polyphase downsampling.
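The cropping step can be sketched as follows. This is an illustrative example only: `crop_overscan` is a hypothetical name, the frame is a plain list of rows, and the 2% figure is assumed to apply to each side of each dimension, which the text does not state explicitly.

```python
def crop_overscan(frame, border_percent=2):
    """Trim border_percent of the rows and columns from every side of a
    frame given as a list of rows (assumption: the crop is symmetric)."""
    rows, cols = len(frame), len(frame[0])
    dy = rows * border_percent // 100
    dx = cols * border_percent // 100
    return [row[dx:cols - dx] for row in frame[dy:rows - dy]]


# A blank 720 x 486 frame stands in for real luma samples.
frame = [[0] * 720 for _ in range(486)]
cropped = crop_overscan(frame)  # this result would then be fed to the resizer
```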

Deblocking/Deringing

In one example of deblocking processing, a deblocking filter can be applied to all 4 × 4 block edges of a frame, except edges at the frame boundary and any edges for which the deblocking filter process is disabled. This filtering is performed on a macroblock basis after the frame construction process is complete, with all macroblocks in a frame processed in order of increasing macroblock address. For each macroblock, vertical edges are filtered first, from left to right, and then horizontal edges are filtered from top to bottom. As shown in FIG. 39, the luma deblocking filter process is performed on four 16-sample edges, and the deblocking filter process for each chroma component is performed on two 8-sample edges, in both the horizontal and vertical directions. Sample values above and to the left of the current macroblock, which may already have been modified by the deblocking of previous macroblocks, are used as input to the deblocking filter process for the current macroblock and may be modified further during its filtering. Sample values modified during the filtering of vertical edges can be used as input for the filtering of horizontal edges of the same macroblock. The deblocking process can be invoked separately for the luma and chroma components.
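The edge ordering described above can be written down as a small schedule generator. This is a sketch of the traversal order only, not of the filter itself; the function name `edge_schedule`, the edge spacing of four samples, and the way frame-boundary edges are skipped are assumptions made for illustration.

```python
def edge_schedule(mb_col, mb_row, edges_per_direction=4, spacing=4):
    """List (direction, offset) pairs in filtering order for one macroblock.

    mb_col and mb_row are the macroblock's column and row within the frame;
    an edge at offset 0 of a macroblock in column 0 (or row 0) lies on the
    frame boundary and is skipped.
    """
    schedule = []
    # Vertical edges first, left to right.
    for k in range(edges_per_direction):
        if k == 0 and mb_col == 0:
            continue
        schedule.append(("vertical", k * spacing))
    # Then horizontal edges, top to bottom.
    for k in range(edges_per_direction):
        if k == 0 and mb_row == 0:
            continue
        schedule.append(("horizontal", k * spacing))
    return schedule


luma_order = edge_schedule(1, 1)                            # four edges per direction
chroma_order = edge_schedule(1, 1, edges_per_direction=2)   # two edges per direction
```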

In one example of deringing processing, a 2-D filter can be applied adaptively to smooth areas near edges. Edge pixels undergo little or no filtering, to avoid blurring.

GOP Partitioner

Exemplary embodiments of processing that can be included in a GOP partitioner, including adaptive GOP partitioning, shot detection, and bandwidth map generation, are described below.

Bandwidth Map Generation

Human visual quality V can be a function of both encoding complexity C and allocated bits B (also referred to as bandwidth). FIG. 29 is a graph illustrating this relationship. Note that the encoding complexity metric C considers spatial and temporal frequencies from the standpoint of human vision. For distortions to which the human eye is more sensitive, the complexity value is correspondingly higher. It can typically be assumed that V is monotonically decreasing in C and monotonically increasing in B.

To achieve constant visual quality, a bandwidth $B_i$ is assigned to the i-th object (frame or MB) to be encoded such that the criteria expressed in the two equations immediately below are satisfied:

$$B_i = B(C_i, V)$$

$$B = \sum_i B_i$$

In the two equations above, $C_i$ is the encoding complexity of the i-th object, $B$ is the total available bandwidth, and $V$ is the visual quality achieved for an object.

It is difficult to formulate human visual quality as an equation, so the above equation set is not precisely defined. However, if the 3-D model is assumed to be continuous in all variables, the bandwidth ratio $B_i / B$ can be treated as unchanged within the neighborhood of a (C, V) pair. The bandwidth ratio $\beta_i$ is defined in the equation below:

$$\beta_i = \frac{B_i}{B}$$

Bit allocation can then be defined as expressed in the following equations:

$$\beta_i = \beta(C_i)$$

$$1 = \sum_i \beta_i \quad \text{for } (C_i, V) \in \delta(C_0, V_0)$$

Here, $\delta$ denotes the "neighborhood."
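Once a ratio is available for every object, splitting a bit budget is a proportional division. The sketch below assumes the bandwidth ratio is each object's share of the total bandwidth and renormalizes the ratios so that they sum to 1; the name `allocate_bits` and the sample numbers are hypothetical.

```python
def allocate_bits(ratios, total_bits):
    """Split total_bits among objects in proportion to their bandwidth
    ratios; the ratios are renormalized so they sum to 1."""
    total = sum(ratios)
    return [total_bits * r / total for r in ratios]


# Three objects, the third twice as demanding as the other two.
budget = allocate_bits([1, 1, 2], 400)
```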

Encoding complexity is affected by human visual sensitivity, both spatial and temporal. Girod's human vision model is one example of a model that can be used to define spatial complexity. This model considers the local spatial frequency and ambient lighting. The resulting metric is called $D_{csat}$. At the preprocessing point in the process, whether a picture will be intra-coded or inter-coded is not known, so bandwidth ratios are generated for both. Bits are allocated according to the ratio between the $\beta_{INTRA}$ values of different video objects. For intra-coded pictures, the bandwidth ratio is expressed in the following equation:

$$\beta_{INTRA} = \beta_{0INTRA} \log_{10}\left(1 + \alpha_{INTRA} Y^2 D_{csat}\right)$$

In the above equation, $Y$ is the average luminance component of a macroblock, $\alpha_{INTRA}$ is a weighting factor for the luminance-squared term and the $D_{csat}$ term that follows it, and $\beta_{0INTRA}$ is a normalization factor that guarantees $1 = \sum_i \beta_i$. For example, a value of $\alpha_{INTRA} = 4$ achieves good visual quality. Content information (e.g., a content classification) can be used to set $\alpha_{INTRA}$ to a value that corresponds to the desired visual quality level for the particular video content. In one example, if the video content comprises a "talking head" news broadcast, the visual quality level may be set lower, because the image or displayable portion of the video may be deemed less important than the audio portion, and fewer bits can be allocated to encode the data. In another example, if the video content comprises a sporting event, content information may be used to set $\alpha_{INTRA}$ to a value that corresponds to a higher visual quality level, because the displayed images may be more important to a viewer, and accordingly more bits can be allocated to encode the data.

To understand this relationship, note that bandwidth is allocated logarithmically with encoding complexity. The luminance-squared term $Y^2$ reflects the fact that coefficients with larger magnitude use more bits to encode. To prevent the logarithm from taking negative values, 1 is added to the term in parentheses. Logarithms with other bases can also be used.
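A minimal numeric sketch of the intra-coded bandwidth ratio follows. It assumes the ratio has the form of a normalization factor times log10(1 + α_INTRA · Y² · D_csat), which matches the description of the squared luminance, the added 1, and the logarithm. The function names and the sample values of Y and D_csat are hypothetical, and D_csat is passed in as a plain number because the Girod model is not implemented here.

```python
import math


def intra_weight(mean_luma, d_csat, alpha_intra=4.0):
    """Unnormalized intra ratio: log10(1 + alpha * Y^2 * D_csat).
    Adding 1 inside the logarithm keeps the result non-negative."""
    return math.log10(1.0 + alpha_intra * mean_luma ** 2 * d_csat)


def normalize(weights):
    """Scale the weights so they sum to 1 (the role of the normalization factor)."""
    total = sum(weights)
    return [w / total for w in weights]


# Three macroblocks with increasing mean luminance and the same D_csat.
weights = [intra_weight(y, 0.5) for y in (16, 64, 200)]
ratios = normalize(weights)
```

Because of the logarithm, a macroblock that is many times brighter receives only a moderately larger share of the bits.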

Temporal complexity is determined by a frame difference metric, which measures the difference between two consecutive frames by taking into account the amount of motion (e.g., motion vectors) along with a frame difference measure such as the sum of absolute differences (SAD).
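For reference, SAD and the related sum of squared differences (SSD) can be computed as in the sketch below. Blocks are plain lists of rows; the function names are illustrative.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


current = [[1, 2], [3, 4]]
reference = [[1, 0], [5, 4]]
```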

Bit allocation for inter-coded pictures can consider spatial as well as temporal complexity. This is expressed as follows:

$$\beta_{INTER} = \beta_{0INTER} \log_{10}\left(1 + \alpha_{INTER} \cdot SSD \cdot D_{csat} \exp\left(-\gamma \lVert MV_P + MV_N \rVert^2\right)\right)$$

In the above equation, $MV_P$ and $MV_N$ are the forward and backward motion vectors for the current MB. Note that $Y^2$ in the intra-coded bandwidth formula is replaced by the sum of squared differences (SSD). To understand the role of $\lVert MV_P + MV_N \rVert^2$ in the above equation, note the following characteristics of the human visual system: areas undergoing smooth, predictable motion (small $\lVert MV_P + MV_N \rVert^2$) attract attention, can be tracked by the eye, and typically cannot tolerate any more distortion than stationary regions. However, areas undergoing fast or unpredictable motion (large $\lVert MV_P + MV_N \rVert^2$) cannot be tracked and can tolerate significant quantization. Experiments show that $\alpha_{INTER} = 1$, $\gamma = 0.001$ achieves good visual quality.
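The effect of the motion term can be illustrated numerically. The sketch below assumes an inter-coded ratio of the form log10(1 + α · SSD · D_csat · exp(−γ‖MV_P + MV_N‖²)), with α = 1 and γ = 0.001 as assumed defaults; the functional form, the parameter names, and the sample values are assumptions for illustration, not a definitive statement of the formula.

```python
import math


def inter_weight(ssd_value, d_csat, mv_p, mv_n, alpha=1.0, gamma=0.001):
    """Unnormalized inter ratio under the assumed form described above.
    mv_p and mv_n are (x, y) forward and backward motion vectors."""
    sx = mv_p[0] + mv_n[0]
    sy = mv_p[1] + mv_n[1]
    motion_term = math.exp(-gamma * (sx * sx + sy * sy))
    return math.log10(1.0 + alpha * ssd_value * d_csat * motion_term)


# Smooth motion: forward and backward vectors cancel, so the sum is zero.
smooth = inter_weight(5000, 0.5, (3, 0), (-3, 0))
# Erratic motion: the vectors do not cancel, so the weight (and bit share) drops.
erratic = inter_weight(5000, 0.5, (20, 0), (20, 0))
```

Under these assumptions a block in smooth, trackable motion keeps its full weight, while an erratically moving block with the same SSD is given fewer bits.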

Shot Detection

Exemplary embodiments of shot detection are described below. Such components and processes can be included in the GOP partitioner 412 (FIG. 4).

A motion compensator 23 can be configured to determine bidirectional motion information about frames in the video. The motion compensator 23 can also be configured to determine one or more difference metrics, for example the sum of absolute differences (SAD) or the sum of squared differences (SSD), and to calculate other information, including luminance information for one or more frames (e.g., macroblock (MB) luminance averages or differences), a luminance histogram difference, and a frame difference metric, examples of which are described with reference to Equations (1) through (3). A shot classifier can be configured to classify frames in the video into two or more "shot" categories using information determined by the motion compensator. An encoder is configured to adaptively encode the plurality of frames based on the shot classifications. The motion compensator, shot classifier, and encoder are described below with reference to Equations (1) through (10).

FIG. 28 is a block diagram of a preprocessor 202 comprising a processor 2831 configured for shot detection and other preprocessing operations according to some aspects. A digital video source can be provided by a source external to the preprocessor 202, as shown in FIG. 4, and communicated to a communications module 2836 in the preprocessor 202. The preprocessor 202 contains a storage medium 2835 that communicates with the processor 2831, and both the storage medium 2835 and the processor 2831 communicate with the communications module 2836. The processor 2831 includes a motion compensator 2832, a shot classifier 2833, and other preprocessing modules 2834, which can operate to generate motion information, classify shots in frames of the video data, and perform other preprocessing tasks as described herein. The motion compensator, shot classifier, and other modules can contain processes similar to the corresponding modules in FIG. 4, and can process video to determine the information described below. In particular, the processor 2831 can be configured to obtain metrics indicative of a difference between adjacent frames of a plurality of video frames, the metrics comprising bidirectional motion information and luminance information, to determine shot changes in the plurality of video frames based on the metrics, and to adaptively encode the plurality of frames based on the shot changes. In some aspects, the metrics can be calculated by a device or process external to the processor 2831, which can also be external to the preprocessor 202, and communicated to the processor 2831 either directly or indirectly via another device or memory. The metrics can also be calculated by the processor 2831, for example by the motion compensator 2832.

The preprocessor 202 provides video and metadata for further processing, encoding, and transmission, for example to the terminal 160 (FIG. 1). In some aspects, the encoded data can be scalable multi-layered encoded video, which can comprise a base layer and an enhancement layer. Scalable layer encoding is further described in co-pending U.S. patent application No. [Attorney docket no. 050078] entitled "SCALABLE VIDEO CODING WITH TWO LAYER ENCODING AND SINGLE LAYER DECODING," which is owned by the assignee hereof and incorporated by reference herein in its entirety.

In some aspects, the various illustrative logical blocks, components, modules, and circuits described in connection with FIG. 28, and other examples and figures disclosed herein, may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor such as the one shown in FIG. 28 may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

Video encoding usually operates on a structured group of pictures (GOP). A GOP normally starts with an intra-coded frame (I-frame), followed by a series of P (predictive) or B (bidirectional) frames. Typically, an I-frame can store all the data required to display the frame, a B-frame relies on data in the preceding and following frames (e.g., containing only data changed from the preceding frame or different from data in the next frame), and a P-frame contains data that has changed from the preceding frame.

In common usage, I-frames are interspersed with P-frames and B-frames in encoded video. In terms of size (e.g., the number of bits used to encode the frame), I-frames are typically much larger than P-frames, which in turn are larger than B-frames. For efficient encoding, transmission, and decoding, the GOP should be long enough to reduce the efficiency loss from large I-frames, and short enough to counter mismatch between the encoder and decoder or channel impairment. In addition, macroblocks (MBs) in P-frames can be intra-coded for the same reason.

Scene change detection can be used by a video encoder to determine a proper GOP length and to insert I-frames based on the GOP length, instead of inserting I-frames at a fixed interval. In a practical streaming video system, the communication channel is usually impaired by bit errors or packet losses. Where I-frames or I-macroblocks are placed can significantly affect decoded video quality and the viewing experience. One encoding scheme is to use intra-coded frames for pictures, or portions of pictures, that change significantly from the collocated previous pictures or picture portions. These regions normally cannot be predicted effectively and efficiently with motion estimation, and encoding can be done more efficiently if such regions are exempted from inter-frame coding techniques (e.g., encoding using B-frames and P-frames). In the context of channel impairment, those regions are likely to suffer from error propagation, which can be reduced or eliminated (or nearly so) by intra-frame encoding.

Portions of the GOP video can be classified into two or more categories, where each region can have different intra-frame encoding criteria that may depend on the particular implementation. As an example, the video can be classified into three categories: abrupt scene changes, cross-fading and other slow scene changes, and camera flashlights. Abrupt scene changes include frames that are significantly different from the previous frame, usually caused by a camera operation. Because the content of these frames differs from that of the previous frame, abrupt scene change frames should be encoded as I-frames. Cross-fading and other slow scene changes include slow switching of scenes, usually caused by computer processing of camera shots. Gradual blending of two different scenes may look more pleasing to the human eye, but it poses a challenge to video coding. Motion compensation cannot reduce the bitrate of those frames effectively, and more intra MBs can be updated for these frames.

Camera flashlights, or camera flash events, occur when the content of a frame includes camera flashes. Such flashes are relatively short in duration (e.g., one frame) and so bright that the pixels in a frame portraying the flashes exhibit unusually high luminance relative to the corresponding area of an adjacent frame. Camera flashlights shift the luminance of a picture suddenly and swiftly. The duration of a camera flashlight is usually shorter than the temporal masking duration of the human vision system (HVS), which is typically defined to be 44 ms. Human eyes are not sensitive to the quality of these short bursts of brightness, so they can be encoded coarsely. Because motion compensation cannot handle flashlight frames effectively and they are poor prediction candidates for future frames, coarse encoding of these frames does not reduce the encoding efficiency of future frames. Scenes classified as flashlights should not be used to predict other frames because of their "artificial" high luminance, and other frames cannot effectively be used to predict these frames for the same reason. Once identified, these frames can be dropped, because they can require a relatively large amount of processing. One option is to remove the camera flashlight frames and encode a DC coefficient in their place; such a solution is simple, computationally fast, and saves many bits.
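A quick arithmetic check of the timing argument: the sketch below compares the duration of a flash, expressed in frames, against the 44 ms masking window. The function name and the frame rates are illustrative assumptions.

```python
def flash_is_masked(flash_frames, fps, masking_ms=44.0):
    """True when a flash lasting flash_frames frames at the given frame
    rate is shorter than the temporal masking duration."""
    duration_ms = flash_frames * 1000.0 / fps
    return duration_ms < masking_ms


one_frame_at_30 = flash_is_masked(1, 30)   # about 33.3 ms
two_frames_at_30 = flash_is_masked(2, 30)  # about 66.7 ms
```

At 30 frames per second, a single-frame flash falls inside the masking window, while a two-frame flash does not.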

When any of the above types of frames is detected, a shot event is declared. Shot detection is useful not only for improving encoding quality but also for video content search and indexing. One aspect of a shot detection process is described below.

FIG. 30 illustrates a process 3000 that operates on a GOP and can be used in some aspects to encode video based on shot detection in video frames, where portions of the process 3000 (or sub-processes) are described and illustrated with reference to FIGS. 30 through 40. The processor 2831 can be configured to incorporate the process 3000. After the process 3000 starts, it proceeds to block 3042, where metrics (information) are obtained for the video frames, the metrics including information indicative of a difference between adjacent frames. The metrics include bidirectional motion information and luminance-based information that is subsequently used to determine changes that occurred between adjacent frames, which can be used for shot classification. Such metrics can be obtained from another device or process, or calculated by, for example, the processor 2831. Illustrative examples of metrics generation are described below with reference to process A in FIG. 31.

The process 3000 then proceeds to block 3044, where shot changes in the video are determined based on the metrics. A video frame can be classified into two or more categories according to the type of shot contained in the frame, for example an abrupt scene change, a slow scene change, or a scene containing high luminance values (camera flashes). Certain encoding implementations may require other categories. Illustrative examples of shot classification are described with reference to process B in FIG. 32, and in more detail with reference to processes D, E, and F in FIGS. 34 through 36, respectively.

Once a frame is classified, the process 3000 proceeds to block 3046, where the frame can be encoded, or designated for encoding, using the shot classification results. Such results can influence whether to encode the frame as an intra-coded frame or as a predictive frame (e.g., a P-frame or B-frame). Process C in FIG. 33 shows an example of an encoding scheme that uses the shot results.
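The way a shot class can steer the encoding choice may be sketched as a simple lookup. The class labels and the function name below are hypothetical, and the decisions only paraphrase the categories discussed in this section; the actual thresholds and rules of processes B through F are not reproduced here.

```python
def encoding_decision(shot_class):
    """Map a shot class label (hypothetical names) to an encoding choice."""
    if shot_class == "abrupt_scene_change":
        # Content differs from the previous frame, so prediction is of little use.
        return "intra-coded frame (I-frame)"
    if shot_class == "cross_fade_or_slow_change":
        # Motion compensation is ineffective; refresh more intra macroblocks.
        return "predicted frame with additional intra-coded macroblocks"
    if shot_class == "camera_flash":
        # The eye is insensitive to the burst; encode coarsely, never as a reference.
        return "coarse encoding, excluded from prediction"
    return "predicted frame (P-frame or B-frame)"


decisions = [encoding_decision(c) for c in
             ("abrupt_scene_change", "camera_flash", "no_shot_event")]
```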

FIG. 31 illustrates an example of a process for obtaining metrics of the video. FIG. 31 shows certain steps that occur in block 3042 of FIG. 30. Still referring to FIG. 31, in block 3152, process A obtains or determines bidirectional motion estimation and compensation information for the video. The motion compensator 2832 of FIG. 28 can be configured to perform bidirectional motion estimation on the frames and to determine motion compensation information that can be used for subsequent shot classification. Process A then proceeds to block 3154, where luminance information, including a luminance difference histogram, is generated for the current or selected frame and one or more adjacent frames. Finally, process A continues to block 3156, where a metric is calculated that indicates the shot contained in the frame. One such metric is a frame difference metric, two examples of which are shown in Equations (4) and (10). Illustrative examples of determining motion information, luminance information, and a frame difference metric are described below.

Motion Compensation

To perform bidirectional motion estimation/compensation, a video sequence can be preprocessed with a bidirectional motion compensator that matches every 8 × 8 block of the current frame with blocks in two of the frames, namely the most adjacent neighboring frames, one in the past and one in the future. The motion compensator produces motion vectors and difference metrics for every block. FIG. 37 illustrates this concept, showing an example of matching pixels of a current frame C to a past frame P and a future (or next) frame N, and depicts the motion vectors to the matched pixels (past motion vector $MV_P$ and future motion vector $MV_N$). A brief description of illustrative aspects of bidirectional motion vector generation and related encoding follows.
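A minimal sketch of bidirectional block matching follows. It uses an exhaustive search over a small window with SAD as the difference metric; the search strategy, the window size, the function names, and the convention that a motion vector is the displacement from the current block to its match in the reference frame are all assumptions made for illustration.

```python
def block_sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the block at (bx, by) in cur and the block displaced
    by (dx, dy) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_match(cur, ref, bx, by, size=8, search=2):
    """Exhaustive search; returns ((dx, dy), sad) of the best displacement."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def bidirectional_match(prev, cur, nxt, bx, by, size=8, search=2):
    """Match one block of the current frame against the previous and next
    frames; returns (mv_p, sad_p, mv_n, sad_n)."""
    mv_p, sad_p = best_match(cur, prev, bx, by, size, search)
    mv_n, sad_n = best_match(cur, nxt, bx, by, size, search)
    return mv_p, sad_p, mv_n, sad_n
```

For content that moves steadily one pixel to the right per frame, this sketch yields opposite vectors for the past and future frames, so their sum is zero.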

FIG. 40 illustrates an example of a motion vector determination process and predictive frame encoding in, for example, MPEG-4. The process described in FIG. 40 is a more detailed illustration of an example process that can take place in block 3152 of FIG. 31. In FIG. 40, the current picture 4034 is made up of 5 × 5 macroblocks, where the number of macroblocks in this example is arbitrary. A macroblock is made up of 16 × 16 pixels. Pixels can be defined by an 8-bit luminance value (Y) and two 8-bit chrominance values (Cr and Cb).

In MPEG, the Y, Cr and Cb components can be stored in a 4:2:0 format, where the Cr and Cb components are down-sampled by 2 in the X and the Y directions. Hence, each macroblock would consist of 256 Y components, 64 Cr components and 64 Cb components. Macroblock 4036 of current picture 4034 is predicted from reference picture 4032, which is at a different time point than current picture 4034. A search is made in reference picture 4032 to locate the best matching macroblock 4038 that is closest, in terms of Y, Cr and Cb values, to current macroblock 4036 being encoded. The location of best matching macroblock 4038 in reference picture 4032 is encoded in motion vector 4040. Reference picture 4032 can be an I-frame or P-frame that a decoder will have reconstructed prior to the construction of current picture 4034. Best matching macroblock 4038 is subtracted from current macroblock 4036 (a difference is calculated for each of the Y, Cr and Cb components), resulting in residual error 4042. Residual error 4042 is encoded with a 2D Discrete Cosine Transform (DCT) and then quantized 4046. Quantization 4046 can be performed to provide spatial compression by, for example, allotting fewer bits to the high-frequency coefficients while allotting more bits to the low-frequency coefficients. The quantized coefficients of residual error 4042, along with motion vector 4040 and information identifying reference picture 4032, are the encoded information representing current macroblock 4036. The encoded information can be stored in memory for future use, operated on for purposes of, for example, error correction or image enhancement, or transmitted over network 140.
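As an illustration of the search described above, the following minimal Python sketch performs an exhaustive block search on luminance only and then forms the residual. The function names, the small block size used in the usage note, the search range, and the choice of SAD as the matching cost are assumptions made for brevity. The text does not prescribe a particular search strategy, and a real macroblock is 16 × 16 with the Cr and Cb components compared as well.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` whose top-left
    corner is (cx, cy) and the block of `ref` whose top-left corner is (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def best_match(cur, ref, cx, cy, size, search):
    """Exhaustively test every displacement within +/- `search` pixels.

    Returns ((dx, dy), cost) for the candidate with the lowest SAD; the pair
    (dx, dy) plays the role of the motion vector."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that would fall outside the reference picture.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if best is None or cost < best[1]:
                    best = ((dx, dy), cost)
    return best


def residual(cur, ref, cx, cy, mv, size):
    """Current block minus the motion-compensated reference block."""
    dx, dy = mv
    return [
        [cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i] for i in range(size)]
        for j in range(size)
    ]
```

For a current picture that is simply the reference picture shifted by one pixel, `best_match` returns a one-pixel motion vector with zero cost, and `residual` is then all zeros, which is the case where only the motion vector needs to be transmitted.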

The encoded quantized coefficients of residual error 4042, along with encoded motion vector 4040, can be used to reconstruct current macroblock 4036 in the encoder for use as part of a reference frame for subsequent motion estimation and compensation. The encoder can emulate the procedures of a decoder for this P-frame reconstruction. Emulating the decoder results in both the encoder and the decoder working with the same reference picture. The reconstruction process, whether done in an encoder for further inter-coding or in a decoder, is presented here. Reconstruction of a P-frame can start after the reference frame (or the portion of a picture or frame that is being referenced) is reconstructed. The encoded quantized coefficients are dequantized 4050, and then a 2D inverse DCT, or IDCT, 4052 is performed, resulting in decoded or reconstructed residual error 4054. Encoded motion vector 4040 is decoded and used to locate the already reconstructed best matching macroblock 4056 in the already reconstructed reference picture 4032. Reconstructed residual error 4054 is then added to reconstructed best matching macroblock 4056 to form reconstructed macroblock 4058. Reconstructed macroblock 4058 can be stored in memory, displayed independently or in a picture with other reconstructed macroblocks, or processed further for image enhancement.
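The reconstruction path just described (dequantize, inverse transform, add the prediction) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the transform is a naive orthonormal DCT-II written with the standard library, and a single uniform quantization step is used, whereas the text describes frequency-dependent bit allocation.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)]
        )
    return rows


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Forward 2D DCT: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT: C^T * Y * C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def encode_residual(residual, step):
    """Transform the residual and quantize with one uniform step (assumption)."""
    return [[round(v / step) for v in row] for row in dct2(residual)]


def reconstruct(levels, prediction, step):
    """Dequantize, inverse-transform, and add the best matching prediction."""
    dequantized = [[level * step for level in row] for row in levels]
    rec_residual = idct2(dequantized)
    return [
        [p + r for p, r in zip(prow, rrow)]
        for prow, rrow in zip(prediction, rec_residual)
    ]
```

Because the encoder runs the same `reconstruct` step as the decoder, both sides end up with the same (lossy) reference data, which is the point of the emulation described above.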

Encoding using B-frames (or any section coded with bi-directional prediction) can exploit temporal redundancy between a region in a current picture, a best matching prediction region in a previous picture and a best matching prediction region in a subsequent picture. The subsequent best matching prediction region and the previous best matching prediction region are combined to form a combined bi-directional predicted region. The difference between the current picture region and the combined bi-directional prediction region is a residual error (or prediction error). The locations of the best matching prediction region in the subsequent reference picture and the best matching prediction region in the previous reference picture can be encoded in two motion vectors.
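A minimal Python sketch of the bi-directional case follows. The text only says that the two prediction regions are "combined"; the equal-weight average used here is an assumption, and the function names are hypothetical.

```python
def combine_bidirectional(prev_region, next_region):
    """Combine the previous and subsequent best matching regions.

    An equal-weight average is assumed; other weightings are possible."""
    return [
        [(p + n) / 2 for p, n in zip(prow, nrow)]
        for prow, nrow in zip(prev_region, next_region)
    ]


def prediction_error(current_region, predicted_region):
    """Residual (prediction) error: current region minus combined prediction."""
    return [
        [c - p for c, p in zip(crow, prow)]
        for crow, prow in zip(current_region, predicted_region)
    ]
```

When the current region lies between the two references in value, as in a fade, the averaged prediction tends to leave a smaller residual than either reference alone.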

Luminance Histogram Difference

The motion compensator can generate a difference metric for every block. The difference metric can be a sum of squared differences (SSD) or a sum of absolute differences (SAD). Without loss of generality, SAD is used here as an example.

For every frame, a SAD ratio γ is calculated as follows:

$$\gamma = \frac{\varepsilon + SAD_P}{\varepsilon + SAD_N}$$

where SAD_P and SAD_N are the sums of absolute differences of the forward and the backward difference metric, respectively. Note that the denominator contains a small positive number ε to prevent a "divide-by-zero" error. The numerator also contains ε to balance the effect of the unity in the denominator. For example, if the previous frame, the current frame, and the next frame are identical, motion search should yield SAD_P = SAD_N = 0. In this case, the above calculation generates γ = 1 instead of 0 or infinity.
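The behavior of the ratio at the degenerate point can be checked with a short Python sketch. It assumes the ratio has the form (ε + SAD_P) / (ε + SAD_N), which is what the description of the numerator, denominator and the identical-frames case implies; the function name and the default value of ε are hypothetical.

```python
def sad_ratio(sad_p, sad_n, eps=1e-6):
    """SAD ratio with a small positive eps in numerator and denominator.

    eps keeps the denominator non-zero and makes the ratio equal to 1 when
    both SAD values are zero. The default value is an arbitrary assumption."""
    return (eps + sad_p) / (eps + sad_n)
```

With identical neighboring frames (`sad_p == sad_n == 0`) the function returns 1.0 rather than raising a division error, and the ratio grows as SAD_P becomes large relative to SAD_N.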

A luminance histogram can be calculated for every frame. Typically, multimedia images have a luminance depth (e.g., a number of "bins") of eight bits. In some aspects, the luminance depth used for calculating the luminance histogram can be set to 16 to obtain the histogram. In other aspects, the luminance depth can be set to an appropriate number that may depend on the type of data being processed, the computational power available, or other predetermined criteria. In some aspects, the luminance depth can be set dynamically based on a calculated or received metric, such as the content of the data.

Equation (49) illustrates one example of calculating a luminance histogram difference (lambda):

$$\lambda = \frac{\sum_{i=1}^{16} \left| N_{Pi} - N_{Ci} \right|}{N} \qquad (49)$$

where N_Pi is the number of blocks in the i-th bin for the previous frame, N_Ci is the number of blocks in the i-th bin for the current frame, and N is the total number of blocks in a frame. If the luminance histograms of the previous and the current frame are completely dissimilar (disjoint), then λ = 2.
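The histogram difference can be sketched in Python as follows. The sketch assumes that each block is represented by a single 8-bit luminance value (for example, its mean) and that the 16 bins are of equal width; both are assumptions, as are the function names.

```python
def luminance_histogram(block_lumas, bins=16, levels=256):
    """Count blocks per luminance bin.

    `block_lumas` holds one luminance value per block (assumed to be the
    block mean, in the range 0..levels-1)."""
    hist = [0] * bins
    width = levels // bins
    for y in block_lumas:
        hist[min(int(y) // width, bins - 1)] += 1
    return hist


def histogram_difference(prev_hist, cur_hist):
    """Lambda: sum over bins of |N_Pi - N_Ci|, divided by the block count N."""
    n = sum(cur_hist)
    return sum(abs(p - c) for p, c in zip(prev_hist, cur_hist)) / n
```

Two frames whose blocks fall into entirely different bins give λ = 2, and two frames with identical histograms give λ = 0, which matches the range stated above.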

The frame difference metric D, discussed in reference to block 56 of FIG. 5, can be calculated as shown in equation (50):

$$D = \frac{\gamma_C}{\gamma_P} + A\lambda(2\lambda + 1) \qquad (50)$$

where A is a constant chosen by the application, γ_C = (ε + SAD_P) / (ε + SAD_N) is the SAD ratio of the current frame, and γ_P is the corresponding SAD ratio of the previous frame.
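A minimal Python sketch of the frame difference metric is given below. It assumes the metric takes the form D = γ_C/γ_P + A·λ(2λ + 1), which is consistent with the ratio and the λ(2λ + 1) term discussed in the surrounding text; the function and parameter names are hypothetical.

```python
def frame_difference(gamma_c, gamma_p, lam, a=1.0):
    """Frame difference metric D (assumed form).

    gamma_c: SAD ratio of the current frame
    gamma_p: SAD ratio of the previous frame
    lam:     luminance histogram difference, between 0 and 2
    a:       application-chosen constant A"""
    return gamma_c / gamma_p + a * lam * (2 * lam + 1)
```

For a static scene (both ratios equal to 1 and λ = 0) the metric evaluates to 1; a large γ_C, a small γ_P, or a large λ each push it upward.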

FIG. 32 illustrates an example of a process B of determining three categories of shot (or scene) changes using metrics obtained or determined for the video. FIG. 32 illustrates certain steps occurring in one aspect of block 3044 of FIG. 30. Again referring to FIG. 32, in block 3262, process B first determines whether the frame meets the criteria to be designated an abrupt scene change. Process D in FIG. 34 illustrates an example of this determination. Process B then proceeds to block 3264, where it determines whether the frame is part of a slowly changing scene. Process E in FIG. 35 illustrates an example of determining a slowly changing scene. Finally, at block 3266, process B determines whether the frame contains camera flashes, in other words, large luminance values differing from the previous frame. Process F in FIG. 36 illustrates an example of determining a frame containing camera flashes. Illustrative examples of these processes are described below.

Abrupt Scene Change

FIG. 34 is a flow diagram illustrating a process of determining abrupt scene changes. FIG. 34 further elaborates certain steps that can occur in some aspects of block 3262 of FIG. 32. At block 3482, a check is made as to whether the frame difference metric D meets the criterion shown in equation (51):

$$D = \frac{\gamma_C}{\gamma_P} + A\lambda(2\lambda + 1) \geq T_1 \qquad (51)$$

where A is a constant chosen by the application, T₁ is a threshold, and γ_C and γ_P are the SAD ratios of the current and the previous frame. If the criterion is met, at block 3484, process D designates the frame as an abrupt scene change and, in this example, no further shot classification is necessary.

In one example, simulation shows that setting A = 1 and T₁ = 5 achieves good detection performance. If the current frame is an abrupt scene change frame, then the SAD ratio of the current frame, γ_C, should be large and the SAD ratio of the previous frame, γ_P, should be small. The ratio γ_C/γ_P can be used instead of γ_C alone so that the metric is normalized to the activity level of the context.

Note that the above criterion uses the luminance histogram difference lambda (λ) in a non-linear way. FIG. 39 illustrates that λ(2λ + 1) is a convex function. When λ is small (for example, close to zero), there is barely any pre-emphasis. The larger λ becomes, the more emphasis is conducted by the function. With this pre-emphasis, for any λ larger than 1.4, an abrupt scene change is detected if the threshold T₁ is set to 5.
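The 1.4 figure can be checked numerically. The Python sketch below evaluates the pre-emphasis term A·λ(2λ + 1) with A = 1 and solves 2λ² + λ − T₁ = 0 for the smallest λ at which that term alone reaches T₁ = 5. The function names are hypothetical, and the check considers only the λ term; any non-negative contribution from the SAD ratios can only make detection easier.

```python
import math


def emphasis(lam, a=1.0):
    """Pre-emphasis term A * lambda * (2 * lambda + 1)."""
    return a * lam * (2 * lam + 1)


def min_lambda_for_threshold(t1, a=1.0):
    """Positive root of 2*a*lam**2 + a*lam - t1 = 0."""
    return (-a + math.sqrt(a * a + 8 * a * t1)) / (4 * a)
```

With T₁ = 5 the root is about 1.35, and `emphasis(1.4)` is 5.32, so any λ above 1.4 exceeds the threshold on the strength of the histogram term alone.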

Cross-Fading and Slow Scene Changes

FIG. 35 further illustrates details of some aspects that can occur in block 3264 of FIG. 32. Referring to FIG. 35, at block 3592, process E determines whether the frame is part of a series of frames depicting a slow scene change. Process E determines that the current frame is a cross-fading or other slow scene change if the frame difference metric D is less than the first threshold T₁ and greater than or equal to a second threshold T₂, as illustrated in equation (52):

$$T_2 \leq D < T_1 \qquad (52)$$

for a certain number of continuous frames, where T₁ is the same threshold used above and T₂ is another threshold. Typically, the exact values of T₁ and T₂ are determined by normal experimentation because of the differences that are possible between implementations. If the criterion is met, at block 3594, process E classifies the frame as part of a slowly changing scene, and shot classification for the selected frame ends.
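The "certain number of continuous frames" condition can be sketched in Python as a run-length test on the sequence of D values. The minimum run length, the threshold values in the usage note, and the function name are assumptions; the text leaves all of them to experimentation.

```python
def slow_change_flags(d_values, t1, t2, min_run):
    """Flag frames that belong to a run of at least `min_run` consecutive
    frames whose metric D satisfies t2 <= D < t1."""
    flags = [False] * len(d_values)
    run_start = None
    # A trailing None closes any run that reaches the end of the sequence.
    for i, d in enumerate(list(d_values) + [None]):
        in_band = d is not None and t2 <= d < t1
        if in_band and run_start is None:
            run_start = i
        elif not in_band and run_start is not None:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags
```

For example, with `t1=5`, `t2=2` and `min_run=3`, three consecutive in-band frames are flagged as a slow scene change, while an isolated in-band frame is not.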

Camera Flashlight Events

Process F shown in FIG. 36 is an example of a process that can determine whether the current frame comprises camera flashlights. In this illustrative aspect, luminance histogram statistics are used to determine whether the current frame comprises camera flashlights. Process F determines camera flash events in the selected frame by first determining whether the luminance of the current frame is greater than the luminance of the previous frame and the luminance of the next frame, as shown at block 3602. If it is not, the frame is not a camera flash event; if it is, the frame may be a camera flash event. At block 3604, process F determines whether the backward difference metric is greater than a threshold T₃ and whether the forward difference metric is greater than a threshold T₄; if both of these conditions are satisfied, at block 3606, process F classifies the current frame as having camera flashlights. In one example, at block 3602, process F determines whether the average luminance of the current frame minus the average luminance of the previous frame equals or exceeds a threshold T₃, and whether the average luminance of the current frame minus the average luminance of the next frame equals or exceeds the threshold T₃, as shown in equations (53) and (54):

$$\bar{Y}_C - \bar{Y}_P \geq T_3 \qquad (53)$$

$$\bar{Y}_C - \bar{Y}_N \geq T_3 \qquad (54)$$

If the criterion is not met, the current frame is not classified as comprising camera flashlights and process F returns. If the criterion is met, process F proceeds to block 3604, where it determines whether the backward difference metric SAD_P and the forward difference metric SAD_N are greater than a certain threshold T₄, as illustrated in equations (55) and (56):

$$SAD_P > T_4 \qquad (55)$$

$$SAD_N > T_4 \qquad (56)$$

where $\bar{Y}_C$ is the average luminance of the current frame, $\bar{Y}_P$ is the average luminance of the previous frame, $\bar{Y}_N$ is the average luminance of the next frame, and SAD_P and SAD_N are the forward and backward difference metrics associated with the current frame. If the criterion is not met, process F returns.

Values of T₃ are typically determined by normal experimentation, because implementation of the described processes can result in differences in operating parameters, including threshold values. SAD values are included in the determination because camera flashes typically take only one frame and, due to the luminance difference, this frame cannot be predicted well using motion compensation from either the forward or the backward direction.
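The two-stage test of process F can be sketched in Python as follows. The function and parameter names are hypothetical, the threshold values in the usage note are arbitrary, and the sketch follows the wording above: the luminance test uses "equals or exceeds" T₃ and the SAD test uses "greater than" T₄.

```python
def is_camera_flash(y_cur, y_prev, y_next, sad_p, sad_n, t3, t4):
    """Return True if the current frame looks like a camera flash.

    y_cur, y_prev, y_next: average luminance of current, previous, next frame
    sad_p, sad_n:          the two difference metrics of the current frame
    t3, t4:                thresholds, normally found by experimentation"""
    # Stage 1 (block 3602): the frame must be markedly brighter than both
    # of its neighbors.
    brighter = (y_cur - y_prev >= t3) and (y_cur - y_next >= t3)
    if not brighter:
        return False
    # Stage 2 (block 3604): the frame must also be poorly predicted from
    # both directions.
    return sad_p > t4 and sad_n > t4
```

A frame that is bright but well predicted from one neighbor fails the second stage, which is what separates a one-frame flash from a scene that simply becomes brighter and stays bright.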

In some aspects, one or more of the threshold values T₁, T₂, T₃ and T₄ are predetermined, and such values are incorporated into the shot classifier in the encoding device. Typically, these threshold values are selected through testing of a particular implementation of shot detection. In some aspects, one or more of the threshold values T₁, T₂, T₃ and T₄ can be set during processing (for example, dynamically) based on information (for example, metadata) supplied to the shot classifier or based on information calculated by the shot classifier itself.

Refer now to FIG. 33, which shows a process C for encoding the video, or determining encoding parameters for the video, based on the shot classification of the selected frame. At block 3370, process C determines whether the selected frame was classified as an abrupt scene change. If so, at block 3371 the current frame is classified as an abrupt scene change, the frame can be encoded as an I-frame, and a GOP boundary can be determined. If not, process C proceeds to block 3372; if the current frame is classified as a portion of a slowly changing scene, at block 3373 the current frame, and other frames in the slowly changing scene, can be encoded as predictive frames (for example, P-frames or B-frames). Process C then proceeds to block 3374, where it checks whether the current frame was classified as a flashlight scene comprising camera flashes. If so, at block 3375 the frame can be identified for special processing, for example, removal, replication of a previous frame, or encoding of a particular coefficient for the frame. If not, no classification of the current frame was made, and the selected frame can be encoded in accordance with other criteria, encoded as an I-frame, or dropped. Process C can be implemented in an encoder.
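The decision flow of process C amounts to a mapping from shot classification to encoding action, sketched below in Python. The enum, the function name and the returned strings are hypothetical labels for illustration; the text describes what can be done at each block, not a fixed interface.

```python
from enum import Enum


class Shot(Enum):
    ABRUPT = "abrupt scene change"
    SLOW = "slowly changing scene"
    FLASH = "camera flash"
    UNCLASSIFIED = "unclassified"


def encoding_decision(shot):
    """Map a shot classification to the action described for process C."""
    if shot is Shot.ABRUPT:       # blocks 3370 and 3371
        return "encode as I-frame and set a GOP boundary"
    if shot is Shot.SLOW:         # blocks 3372 and 3373
        return "encode as predictive frame (P or B)"
    if shot is Shot.FLASH:        # blocks 3374 and 3375
        return "identify for special processing"
    return "encode by other criteria, as an I-frame, or drop"
```

Keeping the classification separate from the action in this way lets the thresholds of the shot classifier change without touching the encoder-side mapping.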

In the above-described aspect, the amount of difference between the frame to be compressed and its two adjacent frames is indicated by the frame difference metric D. If a significant amount of one-way luminance change is detected, it signifies a cross-fade effect in the frame. The more prominent the cross-fade is, the more gain may be achieved by using B-frames. In some aspects, a modified frame difference metric is used, as shown in equation (57) below.

where d_P and d_N are the luma difference between the current frame and the previous frame and the luma difference between the current frame and the next frame, respectively; Δ represents a constant that can be determined in normal experimentation, as it can depend on the implementation; and α is a weighting variable having a value between 0 and 1.

The modified frame difference metric D₁ differs from the original frame difference metric D only if a consistent trend of luma shift is observed and the shift strength is large enough. D₁ is equal to or less than D. If the change of luma is steady (d_P = d_N), the modified frame difference metric D₁ is lower than the original frame difference metric D, with the lowest ratio of (1 − α).

Table 1 below shows the performance improvement obtained by adding abrupt scene change detection. The total number of I-frames in the non-scene-change (NSC) and the scene-change (SC) cases is approximately the same. In the NSC case, I-frames are distributed uniformly over the whole sequence, while in the SC case, I-frames are assigned only to abrupt scene change frames.

It can be seen that an improvement of 0.2 to 0.3 dB in PSNR can typically be achieved. Simulation results show that the shot detector is very accurate in determining the shot events mentioned above. Simulation of five clips with a normal cross-fade effect shows that, at Δ = 5.5 and α = 0.4, a PSNR gain of 0.226031 dB is achieved at the same bitrate.

Sequence / Metric     Bitrate (kbps)    Average QP    PSNR (dB)
Animation NSC         226.2403          31.1696       35.6426
Animation SC          232.8023          29.8171       36.4513
Music NSC             246.6394          32.8524       35.9337
Music SC              250.0994          32.3209       36.1202
Headline NSC          216.9493          29.8304       38.9804
Headline News SC      220.2512          28.9011       39.3151
Basketball NSC        256.8726          33.1429       33.5262
Basketball SC         254.9242          32.4341       33.8635

Table 1: Simulation results of abrupt scene change detection

Adaptive GOP Structure

An illustrative example of adaptive GOP structure operations is described below. Such operations can be included in GOP partitioner 412 of FIG. 4. MPEG2, an older video compression standard, does not require that a GOP have a regular structure, though one can be imposed. The MPEG2 sequence always begins with an I-frame, that is, one that has been encoded without reference to previous pictures. The MPEG2 GOP format is usually prearranged at the encoder by fixing the spacing in the GOP of the P (predictive) pictures that follow the I-frame. P-frames are pictures that have been in part predicted from previous I or P pictures. The frames between the starting I-frame and the succeeding P-frames are encoded as B-frames. A "B" frame (B stands for bi-directional) can use the previous and next I or P pictures either individually or simultaneously as reference. The number of bits used to encode an I-frame on average exceeds the number of bits used to encode a P-frame; likewise, the number of bits used to encode a P-frame on average exceeds that of a B-frame. A skipped frame, if used, may use no bits for its representation.

One benefit of using P-frames and B-frames and, in more recent compression algorithms, the skipping of frames, is that video transmission sizes can be reduced. When temporal redundancy is high, for example, when there is little change from picture to picture, the use of P, B, or skipped pictures efficiently represents the video stream, because I or P pictures decoded earlier are used later as references to decode other P or B pictures.

A group of pictures (GOP) partitioner adaptively encodes frames to minimize temporal redundancy. Differences between frames are quantified, and a decision to represent the picture by an I, P, B, or skipped frame is made automatically after suitable tests are performed on the quantified differences. The processing in a GOP partitioner is aided by other operations of the preprocessor 202, which provide filtering for noise removal.

The adaptive encoding process has advantages not available in a "fixed" encoding process. A fixed process ignores the possibility that little change in content has taken place; an adaptive procedure, however, allows far more B-frames to be inserted between each I and P frame, or between two P frames, thereby reducing the number of bits used to adequately represent the sequence of frames. Conversely, in a fixed encoding process, for example, when the change in video content is significant, the efficiency of P-frames is greatly reduced because the difference between the predicted and the reference frames is too large. Under these conditions, matching objects may fall out of the motion search regions, or the similarity between matching objects is reduced due to distortion caused by changes in camera angle. An adaptive encoding process may beneficially be used to optimally determine when P-frames should be encoded.

In the system disclosed herein, the types of conditions described above are automatically sensed. The adaptive encoding process described herein is flexible and is made to adapt to these changes in content. The adaptive encoding process evaluates a frame difference metric, which can be thought of as a measure of distance between frames, with the same additive properties of distance. In concept, given frames F₁, F₂ and F₃ having the inter-frame distances d₁₂ and d₂₃, the distance between F₁ and F₃ is taken as being at least d₁₂ + d₂₃. Frame assignments are made on the basis of this distance-like metric and other measures.

GOP partitioner 412 operates by assigning picture types to frames as they are received. The picture type indicates the method of prediction that may be used to code each block.

I-pictures are coded without reference to other pictures. Since they stand alone, they provide access points in the data stream where decoding can begin. An I encoding type is assigned to a frame if the "distance" to its predecessor frame exceeds a scene change threshold.

P-pictures can use the previous I or P pictures for motion-compensated prediction. They use blocks in the previous fields or frames, which may be displaced from the block being predicted, as a basis for encoding. After the reference block is subtracted from the block being considered, the residual block is encoded, typically using the discrete cosine transform for the elimination of spatial redundancy. A P encoding type is assigned to a frame if the "distance" between it and the last frame assigned to be a P-frame exceeds a second threshold, which is typically less than the first.

B-frame pictures can use the previous and next P-pictures or I-pictures for motion compensation, as described above. A block in a B picture can be forward, backward or bi-directionally predicted, or it can be intra-coded without reference to other frames. In H.264, a reference block can be a linear combination of 32 blocks taken from multiple frames. If the frame cannot be assigned to be an I or P type, it is assigned to be a B type if the "distance" from it to its immediate predecessor is greater than a third threshold, which is typically less than the second threshold. If the frame cannot be assigned to become an encoded B-frame, it is assigned "skip frame" status. Such a frame can be skipped because it is virtually a copy of a previous frame.
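The threshold cascade for I, P, B and skipped frames can be sketched in Python as follows. Several points are assumptions rather than statements from the text: the first frame is taken to be an I-frame, the distance to the last P-frame is approximated by accumulating adjacent-frame distances since the last I- or P-frame (relying on the additive property of the metric), and the function name and threshold values in the usage note are hypothetical.

```python
def assign_frame_types(adjacent_distances, t_scene, t_p, t_b):
    """Assign a picture type to each frame from adjacent-frame distances.

    adjacent_distances[i] is the distance between frame i and frame i + 1.
    Thresholds are expected to satisfy t_scene > t_p > t_b."""
    types = ["I"]            # assumption: the sequence starts with an I-frame
    since_anchor = 0.0       # accumulated distance since the last I or P frame
    for d in adjacent_distances:
        since_anchor += d
        if d > t_scene:
            frame_type = "I"
        elif since_anchor > t_p:
            frame_type = "P"
        elif d > t_b:
            frame_type = "B"
        else:
            frame_type = "skip"
        if frame_type in ("I", "P"):
            since_anchor = 0.0
        types.append(frame_type)
    return types
```

With `t_scene=10`, `t_p=5` and `t_b=1`, a run of moderate distances yields several B-frames before the accumulated distance forces a P-frame, and a single large distance forces an I-frame, so the spacing between anchor frames varies with content.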

Evaluating a metric that quantifies the difference between adjacent frames in display order is the first part of this processing that takes place in GOP partitioner 412. This metric is the distance referred to above; with it, every frame is evaluated for its proper type. Thus, the spacing between the I-frame and adjacent P-frames, or between two successive P-frames, can be variable. Computing the metric begins by processing the video frames with a block-based motion compensator, a block being the basic unit of video compression, usually composed of 16 × 16 pixels, though other block sizes such as 8 × 8, 4 × 4 and 8 × 16 are possible. For frames consisting of two interlaced fields that are present at the output, the motion compensation is done on a field basis, with the search for the reference blocks taking place in fields rather than frames. For a block in the first field of the current frame, a forward reference block is found in the fields of the frame that follows it; likewise, a backward reference block is found in the fields of the frame that immediately precedes the current field. The current blocks are assembled into a compensated field. The process continues with the second field of the frame. The two compensated fields are combined to form forward and backward compensated frames.

For frames created in the inverse telecine 406, the search for reference blocks may be done on a frame basis only, since only reconstructed film frames are generated. Two reference blocks and two differences, forward and backward, are found, which also yields a forward and a backward compensated frame. In summary, the motion compensator produces motion vectors and difference metrics for every block. Note that the differences in the metric are evaluated between a block in the field or frame being considered and the block that best matches it, either in the field or frame that immediately follows it or in the one that precedes it, depending on whether a forward or a backward difference is being evaluated. Only luminance values enter into this calculation.

The motion compensation step thus generates two sets of differences. These are between blocks of current luminance values and the luminance values in the reference blocks taken from the frames immediately ahead of and immediately behind the current one in time. The absolute value of each forward and each backward difference is determined for each pixel in a block, and each is separately summed over the entire frame. When the deinterlaced NTSC fields that make up a frame are processed, both fields are included in the two summations. In this way, SAD_P and SAD_N, the summed absolute values of the forward and backward differences, are found.
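By way of illustration only, the block-matching summation described above might be sketched as follows. The function names `block_sad` and `frame_sad`, the exhaustive search, and the `search` radius are assumptions made for this sketch; the description only states that a best-matching reference block is found and that the absolute luminance differences are summed over the frame.

```python
def block_sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute luminance differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def frame_sad(cur, ref, size=16, search=2):
    """Sum, over every block of `cur`, of the SAD against its best match in `ref`.

    `search` is a hypothetical search radius in pixels (an assumption of this
    sketch, not a value given in the description).
    """
    height, width = len(cur), len(cur[0])
    total = 0
    for cy in range(0, height - size + 1, size):
        for cx in range(0, width - size + 1, size):
            best = None
            # Exhaustive search over candidate positions that stay inside the frame.
            for ry in range(max(0, cy - search), min(height - size, cy + search) + 1):
                for rx in range(max(0, cx - search), min(width - size, cx + search) + 1):
                    sad = block_sad(cur, ref, cy, cx, ry, rx, size)
                    if best is None or sad < best:
                        best = sad
            total += best
    return total
```

Calling `frame_sad` once with each of the two neighboring frames as `ref` would yield the two sums that the description calls SAD_P and SAD_N.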

For every frame, a SAD ratio is calculated using the following relationship:

$$\gamma = \frac{\varepsilon + SAD_P}{\varepsilon + SAD_N}$$

where SAD_P and SAD_N are the summed absolute values of the forward and backward differences, respectively. A small positive number ε is added to the numerator to prevent a "divide-by-zero" error. A similar ε term is added to the denominator, which further reduces the sensitivity of γ when either SAD_P or SAD_N is close to zero.
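As a minimal sketch, and assuming the SAD ratio takes the form (ε + SAD_P) / (ε + SAD_N), the damping effect of ε can be seen numerically. The function name `sad_ratio` and the ε values used here are illustrative assumptions; the description does not give a value for ε.

```python
def sad_ratio(sad_p, sad_n, eps=1.0):
    """SAD ratio with the same small positive eps added to both terms.

    The default eps is a placeholder chosen for this sketch only.
    """
    return (eps + sad_p) / (eps + sad_n)


# With tiny SAD values a raw ratio would swing widely (2 / 1 = 2.0),
# whereas a larger eps keeps the result close to 1.
damped = sad_ratio(2, 1, eps=100.0)

# Both sums equal to zero no longer cause a division by zero.
both_zero = sad_ratio(0, 0)
```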

In another aspect, the difference can be the SSD (sum of squared differences), the SAD (sum of absolute differences), or the SATD, in which the blocks of pixel values are transformed by applying a two-dimensional discrete cosine transform before differences in block elements are taken. The sums are evaluated over the area of active video, although a smaller area may be used in other aspects.

The luminance histogram of every frame as received (not motion compensated) is also computed. The histogram operates on the DC coefficient, i.e., the (0,0) coefficient, in the 16 × 16 array of coefficients that results from applying the two-dimensional discrete cosine transform to a block of luminance values, if it is available. Equivalently, the average value of the 256 luminance values in the 16 × 16 block may be used in the histogram. For images whose luminance depth is eight bits, the number of bins is set to 16. The next metric evaluates the histogram difference.

$$\lambda = \frac{1}{N}\sum_{i=1}^{16}\left|N_{Pi} - N_{Ci}\right| \qquad (59)$$

In the above, N_Pi is the number of blocks from the previous frame in the i-th bin, N_Ci is the number of blocks from the current frame that belong in the i-th bin, and N is the total number of blocks in a frame.
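The block-mean histogram and its difference might be sketched as follows. This assumes the histogram difference is the sum over bins of the absolute change in block count, divided by the total number of blocks N; the function names and the integer rounding of the block mean are choices made for this sketch.

```python
def block_mean_histogram(luma, block=16, bins=16, depth_bits=8):
    """Count blocks per bin, binning each block by its mean luminance."""
    height, width = len(luma), len(luma[0])
    bin_width = (1 << depth_bits) // bins  # 16 gray levels per bin for 8-bit luma
    counts = [0] * bins
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            total = sum(luma[by + dy][bx + dx]
                        for dy in range(block) for dx in range(block))
            mean = total // (block * block)  # integer mean (sketch assumption)
            counts[min(mean // bin_width, bins - 1)] += 1
    return counts


def histogram_difference(prev_counts, cur_counts):
    """Sum of absolute per-bin differences, normalized by the block count N."""
    n = sum(cur_counts)
    if n == 0:
        raise ValueError("frame contains no complete blocks")
    return sum(abs(p - c) for p, c in zip(prev_counts, cur_counts)) / n
```

Under this form the value is 0 when the two histograms are identical and 2 when every block moves to a different bin.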

These intermediate results are assembled to form the current frame difference metric as

$$M = \frac{\gamma_C}{\gamma_P} + \lambda(2\lambda + 1) \qquad (60)$$

where γ_C is the SAD ratio based on the current frame and γ_P is the SAD ratio based on the previous frame. If a scene has smooth motion and its luma histogram barely changes, then M ≈ 1. If the current frame displays an abrupt scene change, then γ_C will be large and γ_P should be small. The ratio γ_C/γ_P is used instead of γ_C alone so that the metric is normalized to the activity level of the context.
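A short worked example may help show how the metric behaves. It assumes the frame difference has the form M = γ_C/γ_P + λ(2λ + 1), where γ_C and γ_P are the SAD ratios of the current and previous frames and λ is the histogram difference. The numerical values are invented for illustration and do not come from the description.

```latex
% Illustrative values only; they are not taken from the source.
% Smooth motion: similar SAD ratios, nearly unchanged histogram.
\gamma_C = 1.05,\quad \gamma_P = 1.00,\quad \lambda = 0.02
\;\Rightarrow\;
M = \frac{1.05}{1.00} + 0.02\,(2 \cdot 0.02 + 1) = 1.05 + 0.0208 = 1.0708

% Abrupt scene change: large current ratio, small previous ratio, large histogram shift.
\gamma_C = 8.0,\quad \gamma_P = 0.5,\quad \lambda = 1.5
\;\Rightarrow\;
M = \frac{8.0}{0.5} + 1.5\,(2 \cdot 1.5 + 1) = 16 + 6 = 22
```

In the first case M stays near 1, while in the second both terms grow, so the metric separates the two situations by more than an order of magnitude.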

Dataflow 4100 of FIG. 40 illustrates certain components that may be used to compute the frame difference metric. Preprocessor 4125 delivers interlaced fields in the case of video having an NTSC source, and frames of film images when the source of the video is the result of inverse telecine, to bidirectional motion compensator 4133. The bidirectional motion compensator 4133 operates on a field (or a frame, in the case of a cinematic source of video) by dividing it into blocks of 16 × 16 pixels and comparing each block to all 16 × 16 blocks in a defined area of a field of the previous frame. The block that provides the best match is selected and subtracted from the current block. The absolute values of the differences are taken and the result is summed over the 256 pixels that make up the current block. When this is done for all current blocks of the field, and then for both fields, the quantity SAD_N, the backward difference metric, has been computed by backward difference module 4137. A similar procedure may be performed by forward difference module 4136. The forward difference module 4136 uses the frame that is immediately ahead of the current one in time as the source of reference blocks to develop SAD_P, the forward difference metric. The same estimation process, albeit done using the reconstructed film frames, takes place when the input frames are formed in the inverse telecine. The histograms that may be used to complete the computation of the frame difference metric may be formed in histogram difference module 4141. Each 16 × 16 block is assigned to a bin based on the average value of its luminance. This information is formed by adding all 256 pixel luminance values in a block together, normalizing the sum by 256 if desired, and incrementing the count of the bin into which the average value falls. The calculation is done once for each pre-motion-compensated frame, and the histogram for the current frame becomes the histogram for the previous frame when a new current frame arrives. The two histograms are differenced and normalized by the number of blocks in histogram difference module 4141 to form λ, defined by equation (59). These results are combined in frame difference combiner 4143, which uses the intermediate results found in histogram difference module 4141 and in forward and backward difference modules 4136 and 4137 to evaluate the current frame difference defined in equation (60).
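The hand-over of state between frames described above (the current histogram becoming the previous histogram when a new frame arrives) could be sketched as a small stateful combiner. The class name `FrameDifferenceCombiner`, its `update` method, and the default ε are hypothetical; the sketch further assumes the SAD ratio (ε + SAD_P)/(ε + SAD_N), a histogram difference normalized by the block count, and a frame difference of the form γ_C/γ_P + λ(2λ + 1).

```python
class FrameDifferenceCombiner:
    """Keeps the previous frame's SAD ratio and histogram between calls."""

    def __init__(self, eps=1.0):
        self.eps = eps          # placeholder value, not taken from the description
        self.prev_gamma = None  # SAD ratio of the previous frame
        self.prev_hist = None   # block-mean histogram of the previous frame

    def update(self, sad_p, sad_n, hist):
        """Return the frame difference for this frame, or None for the first frame."""
        gamma = (self.eps + sad_p) / (self.eps + sad_n)
        if self.prev_gamma is None:
            metric = None  # nothing to compare against yet
        else:
            n = sum(hist)
            lam = sum(abs(p - c) for p, c in zip(self.prev_hist, hist)) / n
            metric = gamma / self.prev_gamma + lam * (2 * lam + 1)
        # The current results become the "previous" results for the next frame.
        self.prev_gamma, self.prev_hist = gamma, list(hist)
        return metric
```

Keeping the state inside one object mirrors the description's point that each histogram is computed once per frame and then reused.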

The system of flowchart 4100 and its components or steps can be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. Each functional component of flowchart 4100, including the preprocessor 4125, the bidirectional motion compensator 4133, the forward and backward difference metric modules 4136 and 4137, the histogram difference module 4141, and the frame difference metric combiner 4143, may be realized as a standalone component, incorporated as hardware, firmware, or middleware in a component of another device, implemented in microcode or software executed on a processor, or a combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the desired tasks may be stored in a machine-readable medium such as a storage medium. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.

The received and processed data can be stored in a storage medium connected to the processor, which can include, for example, a chip-configured storage medium (e.g., ROM, RAM) or a disc-type storage medium (e.g., magnetic or optical). In some aspects, the combiner 4143 can contain some or all of the storage medium.

Flowchart 4200 of FIG. 41 illustrates a process for assigning compression types to frames. In one aspect, the current frame difference defined in equation (60) is the basis for all decisions made with respect to frame assignments. As decision block 4253 indicates, if the frame under consideration is the first in a sequence, the decision path marked "Yes" is followed to block 4255, which declares the frame to be an I frame. The accumulated frame difference is set to zero in block 4257, and the process returns (in block 4258) to start block 4253. If the frame under consideration is not the first frame in a sequence, the path marked "No" is followed from block 4253, where the decision was made, and in test block 4259 the current frame difference is tested against the scene change threshold. If the current frame difference is larger than that threshold, the decision path marked "Yes" is followed to block 4255, again leading to the assignment of an I frame. If the current frame difference is less than the scene change threshold, the "No" path is followed to block 4261, where the current frame difference is added to the accumulated frame difference.

Continuing through the flowchart, at decision block 4263 the accumulated frame difference is compared with threshold t, which is in general less than the scene change threshold. If the accumulated frame difference is larger than t, control transfers to block 4265, where the frame is assigned to be a P frame; the accumulated frame difference is then reset in step 4267. If the accumulated frame difference is less than t, control transfers from block 4263 to block 4269. There the current frame difference is compared with τ, which is less than t. If the current frame difference is smaller than τ, the frame is skipped in block 4273; if the current frame difference is larger than τ, the frame is assigned to be a B frame.
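The decision sequence of flowchart 4200 might be sketched as follows. The function name `assign_frame_types`, the string labels, and the handling of exact equality with a threshold (which the description leaves open) are assumptions of this sketch.

```python
def assign_frame_types(diffs, scene_thresh, t, tau):
    """Assign 'I', 'P', 'B', or 'skip' to each frame from its frame difference.

    diffs[0] is ignored because the first frame of a sequence is always an I frame.
    Expected ordering of thresholds: tau < t < scene_thresh.
    """
    types = []
    accumulated = 0.0
    for index, m in enumerate(diffs):
        # First frame, or a difference above the scene change threshold: I frame.
        if index == 0 or m > scene_thresh:
            types.append("I")
            accumulated = 0.0
            continue
        accumulated += m
        if accumulated > t:
            types.append("P")
            accumulated = 0.0  # reset after assigning a P frame
        elif m < tau:
            types.append("skip")  # nearly identical to the previous frame
        else:
            types.append("B")  # equality with tau falls here (sketch assumption)
    return types


# Illustrative thresholds and differences, not values from the description.
example = assign_frame_types([None, 1.0, 1.0, 1.0, 0.1, 9.0],
                             scene_thresh=5.0, t=2.5, tau=0.5)
```

Because the accumulated difference is only reset on I and P frames, a run of small changes eventually forces a P frame even when no single frame crosses t.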

In another aspect, another frame encoding complexity indicator M* is defined as

$$M^{*} = M \times \min\bigl(1,\ \alpha \max(0,\ SAD_P - s) \times \max(0,\ MV_P - m)\bigr)$$

where α is a scaling factor, SAD_P is the SAD with forward motion information, MV_P is the sum of the lengths, measured in pixels, of the motion vectors from the forward motion compensation, and s and m are two threshold numbers that render the frame encoding complexity indicator zero if SAD_P is lower than s or MV_P is lower than m. M* would be used in place of the current frame difference in flowchart 4200 of FIG. 41. As can be seen, M* differs from M only if the forward motion compensation shows a low level of movement. In that case, M* is smaller than M.
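A minimal sketch of the alternative indicator follows, assuming it scales the frame difference M by min(1, α · max(0, SAD_P − s) · max(0, MV_P − m)). The function and parameter names, and all numerical values, are illustrative assumptions.

```python
def complexity_indicator(m_metric, sad_p, mv_p, alpha, s, m_thresh):
    """Scale the frame difference down when forward motion is low.

    The scale is 0 if sad_p <= s or mv_p <= m_thresh, and saturates at 1.
    """
    scale = min(1.0, alpha * max(0.0, sad_p - s) * max(0.0, mv_p - m_thresh))
    return m_metric * scale


# Low SAD: the indicator collapses to zero.
low = complexity_indicator(2.0, sad_p=90, mv_p=30, alpha=0.01, s=100, m_thresh=20)
# Moderate motion: the indicator is a fraction of the frame difference.
mid = complexity_indicator(2.0, sad_p=105, mv_p=30, alpha=0.01, s=100, m_thresh=20)
# Strong motion: the scale saturates and the indicator equals the frame difference.
high = complexity_indicator(2.0, sad_p=500, mv_p=200, alpha=0.01, s=100, m_thresh=20)
```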

The shot detection and encoding aspects described herein may be described as a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although the flowcharts shown in the figures may describe operations as a sequential process, many operations can be performed in parallel or concurrently. In addition, the order of operations may be rearranged. A process typically terminates when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.

Those skilled in the art will also appreciate that one or more elements of a device disclosed herein may be rearranged without affecting the operation of the device. Similarly, one or more elements of a device disclosed herein may be combined without affecting the operation of the device. Those skilled in the art will understand that information and multimedia data may be represented using any of a variety of different technologies and techniques. Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the examples disclosed herein may be implemented as electronic hardware, firmware, computer software, middleware, microcode, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed methods.

For example, the steps of a method or algorithm described in connection with the shot detection and encoding examples and figures disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The methods and algorithms are particularly applicable to communication technology, including wireless transmission of video to cell phones, computers, laptop computers, PDAs, and all types of personal and business communication devices. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). The ASIC may reside in a wireless modem. In the alternative, the processor and the storage medium may reside as discrete components in the wireless modem.

In addition, the various illustrative logical blocks, components, modules, and circuits described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The previous description of the disclosed examples is provided to enable any person of ordinary skill in the art to make or use the disclosed methods and apparatus. Various modifications to these examples will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other examples, and additional elements may be added, without departing from the spirit or scope of the disclosed methods and apparatus. The description of the aspects of the invention is intended to be illustrative and not to limit the scope of the claims.

Claims

1. A method of processing multimedia data, the method comprising:

receiving interlaced video frames;

converting the interlaced video frames into sequential video;

generating metadata associated with the sequential video; and

providing at least a portion of the sequential video and the metadata to an encoder used to encode the sequential video,

wherein generating the metadata associated with the sequential video includes performing shot detection and generating compression information based on the shot detection.

2. The method of claim 1, further comprising encoding the sequential video using the metadata.

3. The method of claim 1, wherein converting the interlaced video frames comprises deinterlacing the interlaced video frames.

4. The method of claim 1, wherein the metadata includes bandwidth information.

5. The method of claim 1, wherein the metadata includes bidirectional motion information.

6. The method of claim 3, wherein the deinterlacing comprises:

generating spatial information and bidirectional motion information for the interlaced video frames; and

generating the sequential video based on the interlaced video frames and using the spatial information and the bidirectional motion information.

7. The method of claim 4, wherein the bandwidth information includes luminance information.

8. The method of claim 1, wherein the metadata includes a spatial complexity value.

9. The method of claim 1, wherein the metadata includes a temporal complexity value.

10. The method of claim 1, wherein converting the interlaced video frames comprises inverse telecining 3/2 pull-down video frames.

11. The method of claim 10, wherein the metadata includes bandwidth ratio information.

12. The method of claim 1, further comprising resizing the sequential video.

13. The method of claim 12, further comprising segmenting the sequential video to determine group of pictures information.

14. (Deleted)

15. The method of claim 1, further comprising filtering the sequential video with a denoising filter.

16. The method of claim 1, wherein the metadata includes luminance and chroma information.

17. An apparatus for processing multimedia data, the apparatus comprising:

a receiver configured to receive interlaced video frames;

a deinterlacer configured to convert the interlaced video frames into sequential video; and

a divider configured to generate metadata associated with the sequential video and to provide the sequential video and the metadata to an encoder used to encode the sequential video,

wherein the divider is configured to perform shot detection and to generate compression information based on the shot detection.

18. The apparatus of claim 17, further comprising an encoder configured to receive the sequential video from a communication module and to encode the sequential video using the provided metadata.

19. The apparatus of claim 17, wherein the deinterlacer is configured to perform spatio-temporal deinterlacing.

20. The apparatus of claim 17, further comprising a denoising filter for denoising the sequential video.

21. The apparatus of claim 17, wherein the deinterlacer comprises an inverse teleciner.

22. (Deleted)

23. The apparatus of claim 17, wherein the metadata includes group of pictures information.

24. The apparatus of claim 17, further comprising a resampler configured to resize the sequential frames.

25. The apparatus of claim 17, wherein the metadata includes bandwidth information.

26. The apparatus of claim 17, wherein the metadata includes bidirectional motion information.

27. The apparatus of claim 17, wherein the deinterlacer is configured to:

generate spatial information and bidirectional motion information for the interlaced video frames; and

generate the sequential video based on the interlaced video frames and using the spatial information and the bidirectional motion information.

28. The apparatus of claim 23, wherein the metadata includes a bandwidth ratio.

29. The apparatus of claim 23, wherein the metadata includes luminance information.

30. The apparatus of claim 17, wherein the metadata includes a spatial complexity value.

31. The apparatus of claim 17, wherein the metadata includes a temporal complexity value.

32. The apparatus of claim 17, wherein the metadata includes luminance and chroma information.

33. An apparatus for processing multimedia data, the apparatus comprising:

means for receiving interlaced video;

means for converting the interlaced video into sequential video;

means for generating metadata associated with the sequential video; and

means for providing at least a portion of the sequential video and the metadata to an encoder used to encode the sequential video,

wherein the generating means is configured to perform shot detection and to generate compression information based on the shot detection.

34. The apparatus of claim 33, wherein the converting means comprises an inverse teleciner.

35. The apparatus of claim 33, wherein the converting means comprises a spatio-temporal deinterlacer.

36. (Deleted)

37. The apparatus of claim 33, wherein the generating means is configured to generate bandwidth information.

38. The apparatus of claim 33, further comprising means for resampling to resize a sequential frame.

39. The apparatus of claim 33, further comprising means for encoding the sequential video using the provided metadata.

40. The apparatus of claim 33, further comprising means for denoising the sequential video.

41. The apparatus of claim 33, wherein the metadata includes group of pictures information.

42. The apparatus of claim 33, wherein the metadata includes bidirectional motion information.

43. The apparatus of claim 33, wherein the converting means,

44. The apparatus of claim 33, wherein the metadata includes a bandwidth ratio.

45. The apparatus of claim 33, wherein the metadata includes luminance information.

46. The apparatus of claim 33, wherein the metadata includes a spatial complexity value.

47. The apparatus of claim 33, wherein the metadata includes a temporal complexity value.

48. The apparatus of claim 33, wherein the metadata includes luminance and chroma information.

49. A machine-readable medium comprising instructions for processing multimedia data,

wherein the instructions, when executed, cause a machine to:

receive interlaced video frames;

convert the interlaced video frames into sequential video;

generate metadata associated with the sequential video; and

provide at least a portion of the sequential video and the metadata to an encoder used to encode the sequential video,

wherein the generation of the metadata associated with the sequential video includes performing shot detection and generating compression information based on the shot detection.

Receive interlaced video,

Convert the interlaced video into sequential video,

Generate metadata associated with the sequential video,