A key component of state-of-the-art video coding is motion-compensated prediction, also called inter prediction. Current standards allow uni- and bi-prediction, i.e., a linear superposition of up to two motion-compensated prediction signals. It is well known that superimposing more than two prediction signals (or hypotheses) can further reduce the energy of the prediction error. In this paper, it is shown that allowing the encoder to choose among different weights for the individual hypotheses is beneficial from a rate-distortion perspective. A practical multi-hypothesis inter prediction scheme based on the Versatile Video Coding Test Model (VTM) is presented. For VTM-1, in the Random Access configuration according to the JVET Common Test Conditions, the average luma BD bit rate ranges from -1.6% to -1.9% for different settings using up to four prediction hypotheses. For VTM-2, the corresponding BD bit rate is -0.95%. For higher bit rates (i.e., QP values 12, 17, 22, 27), the BD bit rates are -2.2% for VTM-1 and -1.4% for VTM-2.
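To illustrate why a selectable weight for an extra hypothesis can help, here is a minimal Python sketch. It assumes that each additional hypothesis is blended into the running prediction as (1 - alpha) * p + alpha * h, with alpha picked by the encoder from a small weight set; the weight set, the sample values, and the function names are hypothetical, and the rate of signalling alpha and the extra motion data is ignored, so this is a distortion-only stand-in for the paper's rate-distortion decision.

```python
# Minimal sketch of weighted multi-hypothesis prediction (illustrative only).
# Assumption: each additional hypothesis h is blended into the running
# prediction p as (1 - alpha) * p + alpha * h, with alpha chosen by the encoder
# from a small, hypothetical weight set. Signalling rate is ignored here.

CANDIDATE_WEIGHTS = (0.25, -0.125)  # hypothetical weight set


def sse(a, b):
    """Sum of squared differences between two equally long sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def blend(p, h, alpha):
    """Superimpose hypothesis h onto prediction p with weight alpha."""
    return [(1 - alpha) * x + alpha * y for x, y in zip(p, h)]


def add_hypothesis(orig, p, h, weights=CANDIDATE_WEIGHTS):
    """Return (alpha, prediction); alpha is None if no weight lowers the SSE."""
    best_alpha, best_pred, best_err = None, p, sse(orig, p)
    for alpha in weights:
        cand = blend(p, h, alpha)
        err = sse(orig, cand)
        if err < best_err:
            best_alpha, best_pred, best_err = alpha, cand, err
    return best_alpha, best_pred


orig = [100, 104, 108, 112]
bi_pred = [96, 100, 104, 108]  # bi-prediction, biased low by 4
third = [112, 116, 120, 124]   # an additional motion-compensated hypothesis
alpha, pred = add_hypothesis(orig, bi_pred, third)
print(alpha, sse(orig, bi_pred), sse(orig, pred))  # prints: 0.25 64 0.0
```

Offering a negative weight as well lets the encoder subtract a hypothesis that is correlated with the error of the existing prediction; whether that pays off in practice depends on the signalling cost.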
In video coding, the adaptive loop filter (ALF) has attracted attention due to its coding performance gains. ALF has recently been extended to the geometry transformation-based adaptive loop filter (GALF), which outperforms existing ALF techniques. The main idea of ALF is to apply a classification that yields multiple classes, i.e., a partition of the set of all pixel locations, and then to apply a Wiener filter to each class. The performance of ALF therefore relies essentially on how its classification behaves. In this paper, we introduce a novel classification method, Multiple feature-based Classifications ALF (MCALF), which extends the classification in GALF, and show that it increases coding efficiency while only marginally raising encoding complexity. The key idea is to apply more than one classifier at the encoder to group all reconstructed samples and then to select the classifier with the best RD performance to carry out the classification process. Simulation results show that a bit rate reduction of around 2% can be achieved on top of GALF for some selected test sequences.
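The classify-then-filter structure, and the encoder-side choice among several classifiers, can be sketched in Python as follows. This is a simplified illustration, not the MCALF implementation: it assumes a 1-D signal, replaces the 2-D ALF filters with a single-tap least-squares gain per class, uses two hypothetical classifiers (local activity and intensity), and selects by SSE only, whereas a real encoder would compare RD costs including filter coefficients and classifier signalling.

```python
# Sketch of classification-based loop filtering with encoder-side classifier
# selection. Simplifications (assumptions): 1-D signal, one least-squares gain
# per class instead of 2-D Wiener filters, selection by SSE instead of RD cost.


def ls_gain(pairs):
    """Least-squares gain w minimising sum (y - w*x)^2 over (x, y) pairs."""
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den if den else 1.0


def filter_with(classifier, rec, orig):
    """Classify samples, fit one gain per class, return (filtered, sse)."""
    classes = {}
    for i, (x, y) in enumerate(zip(rec, orig)):
        classes.setdefault(classifier(rec, i), []).append((x, y))
    gains = {c: ls_gain(p) for c, p in classes.items()}
    out = [gains[classifier(rec, i)] * x for i, x in enumerate(rec)]
    return out, sum((o - y) ** 2 for o, y in zip(out, orig))


def by_activity(s, i):
    """Hypothetical classifier: 1 if the local Laplacian magnitude exceeds 4."""
    left, right = s[max(i - 1, 0)], s[min(i + 1, len(s) - 1)]
    return int(abs(2 * s[i] - left - right) > 4)


def by_intensity(s, i):
    """Hypothetical classifier: 1 for bright samples, 0 for dark ones."""
    return int(s[i] >= 50)


def choose_classifier(rec, orig, classifiers):
    """Encoder side: try every classifier and keep the one with the lowest SSE."""
    return min(classifiers, key=lambda c: filter_with(c, rec, orig)[1])


rec = [10, 20, 30, 60, 70, 80]
orig = [11, 22, 33, 54, 63, 72]  # dark samples need gain 1.1, bright ones 0.9
best = choose_classifier(rec, orig, [by_activity, by_intensity])
print(best.__name__)  # prints: by_intensity
```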
In audio coding, perceptual models are heavily used to guide encoding decisions, and listening tests are used to determine the subjective quality of coding results. In picture and video coding, perceptual models have limited use in guiding encoding decisions (mainly focusing on properties of the human visual system), and viewing tests are used to determine the subjective quality of coding results.
Applications of Digital Image Processing XLIII, 2020
The Intra Subpartition (ISP) mode is one of the intra prediction tools incorporated into the new Versatile Video Coding (VVC) standard. ISP divides a luma intra-predicted block along one dimension into 2 or 4 smaller blocks, called subpartitions, which are predicted using the same intra mode. This paper describes the design of this tool and its encoder search implementation in the VVC Test Model 7.3 (VTM-7.3) software. The main challenge of the ISP encoder search is that the mode pre-selection based on the sum of absolute transformed differences, which is typically used for intra prediction tools, is not feasible for ISP, because it would require knowing the reconstructed samples of the subpartitions beforehand. For this reason, VTM employs a different strategy aimed at overcoming this issue. The tool-off tests carried out for the All Intra configuration show a gain of 0.52% for the 22-37 Quantization Parameter (QP) range with an associated encoder runtime of 85%. The results improve to a 1.06% gain and an 87% encoder runtime for the 32-47 QP range. Analogously, the tool-on tests show a 1.17% gain and a 134% encoder runtime for the 22-37 QP range, improving to a 1.56% gain and a 126% encoder runtime for the 32-47 QP range.
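The split itself is simple to state in code. The Python sketch below assumes the VVC-style rule as we understand it (4×4 blocks cannot use ISP, 4×8 and 8×4 blocks are split into 2 subpartitions, all other sizes into 4; a horizontal split divides the height, a vertical split the width); the function name is hypothetical. Because each subpartition is predicted from the reconstruction of the previous one, the prediction of later subpartitions is unknown before the earlier ones are coded, which is why an SATD-based mode pre-selection over the whole block does not carry over directly.

```python
def isp_subpartitions(width, height, horizontal):
    """Return the list of (w, h) subpartition sizes for an ISP split.

    Assumed rule (VVC-style, for illustration): 4x4 blocks cannot use ISP,
    4x8 and 8x4 blocks are split into 2 subpartitions, all other sizes into 4.
    A horizontal split divides the height, a vertical split divides the width.
    """
    if width == 4 and height == 4:
        raise ValueError("ISP is not applied to 4x4 blocks")
    n = 2 if (width, height) in ((4, 8), (8, 4)) else 4
    if horizontal:
        return [(width, height // n)] * n
    return [(width // n, height)] * n


print(isp_subpartitions(16, 16, horizontal=True))  # prints: [(16, 4), (16, 4), (16, 4), (16, 4)]
print(isp_subpartitions(8, 4, horizontal=False))   # prints: [(4, 4), (4, 4)]
```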
In the development of Versatile Video Coding (VVC), the possibility of bypassing the transform for 4×4 transform blocks was adopted from its predecessor, High Efficiency Video Coding (HEVC). Extending this so-called Transform Skip Mode (TSM) to transform block sizes up to 32×32 increases encoding time, while compression efficiency improves for screen content only. This paper presents the so-called Unified MTS scheme, which makes TSM possible for luma transform block sizes up to 32×32 without an excessive increase in encoding time by incorporating TSM into the existing Multiple Transform Set (MTS) technique. For the screen content sequences of the test set, the Unified MTS scheme achieves BD-rate improvements of about −5.0% in the All-Intra configuration and −5.4% in the Random-Access configuration. Compared to the straightforward extension of TSM to transform block sizes up to 32×32, the encoding time is about 25% lower in the All-Intra configuration and about 21% lower in the Random-Access configuration, whereas the compression efficiency improvements are only 0.04% and 0.09% smaller, respectively. Relative to the anchor using TSM for 4×4 transform blocks only, the encoding time is the same for natural content and 3% higher for screen content.
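The idea of treating transform skip as one more entry in the MTS candidate list can be sketched in Python as below. This is a hypothetical illustration, not the paper's encoder search: the candidate names, the rule that everything except DCT-II is limited to 32×32, and the (distortion, rate) numbers are all assumptions, and the selection uses a plain Lagrangian cost J = D + lam * R.

```python
# Hypothetical sketch: transform skip (TS) handled as one more entry of an
# MTS-style candidate list and chosen by Lagrangian cost J = D + lam * R.
# Candidate names, the 32x32 limit and the (D, R) numbers are illustrative.

MAX_NON_DCT2_SIZE = 32


def transform_candidates(width, height):
    """DCT-II is always allowed; TS and the MTS kernel pairs only up to 32x32."""
    cands = ["DCT2"]
    if max(width, height) <= MAX_NON_DCT2_SIZE:
        cands += ["TS", "DST7_DST7", "DCT8_DST7", "DST7_DCT8", "DCT8_DCT8"]
    return cands


def best_transform(width, height, cost_fn, lam):
    """Pick the candidate minimising D + lam * R, where cost_fn(c) -> (D, R)."""
    def lagrangian(cand):
        dist, rate = cost_fn(cand)
        return dist + lam * rate
    return min(transform_candidates(width, height), key=lagrangian)


# Usage with made-up costs: a screen-content-like block where TS wins.
costs = {"DCT2": (100.0, 10.0), "TS": (40.0, 12.0)}


def cost_fn(cand):
    return costs.get(cand, (150.0, 15.0))


print(best_transform(16, 16, cost_fn, lam=2.0))  # prints: TS
print(best_transform(64, 64, cost_fn, lam=2.0))  # prints: DCT2
```

Keeping TS in the same list means it inherits whatever candidate pruning the MTS search already applies, which is one plausible way to avoid a separate, exhaustive TS pass.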
IEEE Journal of Selected Topics in Signal Processing, 2020
In the past decade, deep neural networks (DNNs) have shown state-of-the-art performance on a wide range of complex machine learning tasks. Many of these results have been achieved with ever larger DNNs, which creates a demand for their efficient compression and transmission. In this work we present DeepCABAC, a universal compression algorithm for DNNs that applies the Context-based Adaptive Binary Arithmetic Coder (CABAC) to the DNN parameters. CABAC was originally designed for the H.264/AVC video coding standard and became the state of the art for the lossless compression part of video compression. DeepCABAC applies a novel quantization scheme that minimizes a rate-distortion function while simultaneously taking into account the impact of quantization on the DNN performance. Experimental results show that DeepCABAC consistently attains higher compression rates than previously proposed coding techniques for DNN compression. For instance, it compresses the VGG16 ImageNet model by a factor of 63.6 with no loss of accuracy, representing the entire network with merely 9 MB. The source code for encoding and decoding can be found at https://github.com/fraunhoferhhi/DeepCABAC.
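A rate-distortion quantizer for a single weight can be sketched in Python as follows. This is not DeepCABAC's code: its rates come from CABAC context models, while this sketch swaps in the signed Exp-Golomb code length as a stand-in rate model, and the per-weight `importance` factor is a hypothetical placeholder for how strongly a weight's quantization error affects network accuracy.

```python
def code_length(level):
    """Stand-in rate model in bits: signed Exp-Golomb (se(v)) code length.

    Mapping: k > 0 -> 2k - 1, k <= 0 -> -2k, then length 2*floor(log2(m+1)) + 1.
    """
    mapped = 2 * abs(level) - (1 if level > 0 else 0)
    return 2 * ((mapped + 1).bit_length() - 1) + 1


def rd_quantize(weight, step, lam, importance=1.0):
    """Choose the integer level k minimising importance*(w - k*step)^2 + lam*R(k).

    Candidates are zero and the levels around the nearest reconstruction point.
    """
    nearest = round(weight / step)
    candidates = sorted({0, nearest - 1, nearest, nearest + 1})

    def cost(k):
        return importance * (weight - k * step) ** 2 + lam * code_length(k)

    return min(candidates, key=cost)


print(rd_quantize(0.06, 0.1, lam=0.0))    # prints: 1  (nearest level)
print(rd_quantize(0.06, 0.1, lam=0.002))  # prints: 0  (rate term favours zero)
```

Raising `lam` pushes more weights to cheap levels such as zero; a larger `importance` for sensitive weights pulls them back towards the nearest level.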
APSIPA Transactions on Signal and Information Processing, 2019
In this paper we combine video compression with modern image processing methods. We construct novel iterative filters for prediction signals based on Partial Differential Equation (PDE) methods. The central idea of the signal-adaptive filters is explained and demonstrated geometrically, and the meaning of particular parameters is discussed in detail. Furthermore, thorough parameter tests are introduced, which improve the overall bit rate savings. It is shown that these filters enhance the rate-distortion performance of state-of-the-art hybrid video codecs. In particular, based on mathematical denoising techniques, two types of diffusion filters are constructed: a uniform diffusion filter using a fixed filter mask and a signal-adaptive diffusion filter that incorporates the structures of the underlying prediction signal. The latter has the advantage of not attenuating existing edges, while the uniform filter is less complex. The filters are embedded into software based on ...
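The contrast between the two filter types can be sketched in one dimension with Python. The uniform variant applies a fixed mask iteratively; for the adaptive variant we swap in a Perona-Malik-type conductivity as a stand-in, which is an assumption on our part (the paper's signal-adaptive filter may be constructed differently, e.g. in 2-D from the structure of the prediction signal). Parameter names and values are illustrative.

```python
def diffuse(signal, steps, tau=0.25, kappa=None):
    """Explicit 1-D diffusion of a prediction signal (illustrative sketch).

    kappa=None: uniform diffusion, i.e. repeated use of the fixed mask
    [tau, 1 - 2*tau, tau] in the interior.
    kappa set: signal-adaptive diffusion with the Perona-Malik-type
    conductivity g(d) = 1 / (1 + (d / kappa)**2), which damps smoothing
    across large sample differences such as edges.
    Zero flux is assumed across the signal borders; tau <= 0.5 keeps the
    explicit scheme stable.
    """
    u = [float(x) for x in signal]

    def g(d):
        return 1.0 if kappa is None else 1.0 / (1.0 + (d / kappa) ** 2)

    for _ in range(steps):
        # flux[i] flows between samples i and i + 1
        flux = [g(u[i + 1] - u[i]) * (u[i + 1] - u[i]) for i in range(len(u) - 1)]
        u = [
            u[i]
            + tau * ((flux[i] if i < len(flux) else 0.0) - (flux[i - 1] if i > 0 else 0.0))
            for i in range(len(u))
        ]
    return u


sig = [10, 12, 10, 12, 50, 52, 50, 52]  # noisy step edge
uniform = diffuse(sig, 5)
adaptive = diffuse(sig, 5, kappa=5.0)
print(round(uniform[4] - uniform[3], 1), round(adaptive[4] - adaptive[3], 1))
```

Both variants conserve the signal mean; the adaptive one keeps most of the edge step while still smoothing the small oscillations on either side.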
ACM Transactions on Multimedia Computing, Communications, and Applications, 2018
For the entropy coding of independent and identically distributed (i.i.d.) binary sources, variable-to-variable length (V2V) codes are an interesting alternative to arithmetic coding. A V2V code translates variable-length words of the source into variable-length code words by employing two prefix-free codes. In this article, several properties of V2V codes are studied, and new concepts are developed. In particular, it is shown that the redundancy of a V2V code cannot be zero for a binary i.i.d. source $\{X\}$ with $0 < p_X(1) < 0.5$. Furthermore, the concept of prime and composite V2V codes is proposed, and it is shown why composite V2V codes can be disregarded in the search for particular classes of minimum redundancy codes. Moreover, a canonical representation for V2V codes is proposed, which identifies V2V codes that have the same average code length function. It is shown how these concepts can be employed to greatly reduce the complexity of a search for minimum redundancy ...
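The redundancy of a V2V code is its average number of code bits per source bit minus the source entropy, where the average rate is the expected code word length divided by the expected source word length. The Python sketch below evaluates this for a small V2V code of our own choosing (source words 00, 01, 1 mapped to 0, 10, 11); the code and the value $p_X(1) = 0.2$ are illustrative and not taken from the article.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary source with P(X = 1) = p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def v2v_rate(code, p1):
    """Average code bits per source bit of a V2V code on an i.i.d. binary source.

    `code` maps each source word (a complete prefix-free set of '0'/'1'
    strings) to its code word; p1 = P(X = 1).
    """
    exp_src = exp_code = 0.0
    for src, cw in code.items():
        prob = p1 ** src.count("1") * (1 - p1) ** src.count("0")
        exp_src += prob * len(src)
        exp_code += prob * len(cw)
    return exp_code / exp_src


code = {"00": "0", "01": "10", "1": "11"}  # illustrative V2V code
p1 = 0.2
rate = v2v_rate(code, p1)                  # 1.36 / 1.8, about 0.7556 bit
redundancy = rate - binary_entropy(p1)     # about 0.034 bit, strictly positive
print(round(rate, 4), round(redundancy, 4))
```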
IEEE Transactions on Circuits and Systems for Video Technology, 2018
The H.265/MPEG-H High Efficiency Video Coding (HEVC) compliant encoding process faces the challenge of high computational complexity. In the case of inter-picture prediction in particular, most of the computational resources are allocated to the motion estimation (ME) process. ME and motion compensation improve coding efficiency by representing blocks of video frames as displacements of one or more reference blocks. These displacements are not necessarily limited to integer sample positions but may have half-sample or quarter-sample accuracy; the fractional positions are identified during fractional sample refinement. In this paper, a context-based scheme for fractional sample refinement is proposed. The scheme takes advantage of information already obtained in prior ME steps and provides significant flexibility in terms of parameterization. In this way, it adaptively achieves a desired tradeoff between computational complexity and coding efficiency. According to the experimental results obtained for an example algorithm utilizing the proposed framework, the number of search points can be decreased significantly. For instance, considering only 6 instead of 16 fractional sample positions results in a Bjøntegaard-Delta-rate loss of only 0.4% for high-definition video sequences compared with the conventional interpolation-and-search method.
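The 16 positions of the conventional method come from a two-stage pattern: 8 half-sample positions around the best integer vector, then 8 quarter-sample positions around the best half-sample result. The Python sketch below assumes that pattern and a hypothetical smooth cost function (a real encoder would interpolate the reference block and use, e.g., SATD plus motion vector rate). Passing shorter offset lists models a reduced search; the paper's context-based selection of those positions is not reproduced here.

```python
# Positions are in quarter-sample units, so integer positions are multiples of 4.
HALF = [(-2, -2), (0, -2), (2, -2), (-2, 0), (2, 0), (-2, 2), (0, 2), (2, 2)]
QUARTER = [(dx // 2, dy // 2) for dx, dy in HALF]


def refine(center, cost, half=HALF, quarter=QUARTER):
    """Two-stage fractional sample refinement around an integer-sample vector.

    Checks the half-sample offsets around `center`, then the quarter-sample
    offsets around the best half-sample result. Returns (best_mv, number of
    fractional positions evaluated).
    """
    best, best_cost = center, cost(center)
    evaluated = 0
    for offsets in (half, quarter):
        start = best
        for dx, dy in offsets:
            cand = (start[0] + dx, start[1] + dy)
            evaluated += 1
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best, evaluated


def cost(mv):
    # Hypothetical smooth cost with its minimum at (5, -3).
    return (mv[0] - 5) ** 2 + (mv[1] + 3) ** 2


print(refine((4, -4), cost))  # prints: ((5, -3), 16)
```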
In video coding, motion-compensated prediction creates inter-frame dependencies. The achievable rate-distortion performance of an inter-coded frame depends on the coding decisions made during the encoding of its reference frames. Typically, when a reference frame is encoded, these dependencies are either not considered at all or considered only via rough heuristics.
The scalable extension of H.264/MPEG4-AVC is a current standardization project of the Joint Video Team of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. This paper gives an overview of the design of the scalable H.264/MPEG4-AVC extension and describes the basic concepts for supporting temporal, spatial, and SNR scalability. The efficiency of the described concepts for providing spatial and SNR scalability is analyzed by means of simulation results and compared to H.264/MPEG4-AVC compliant single-layer coding.
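Temporal scalability in this setting is commonly realized with hierarchical prediction structures (e.g., hierarchical B pictures), in which every picture belongs to a temporal level and dropping the highest levels lowers the frame rate. The Python sketch below assumes a dyadic GOP whose size is a power of two and computes the level from the picture order count (POC); the function names are hypothetical.

```python
def temporal_level(poc, gop_size):
    """Temporal level of picture `poc` in a dyadic hierarchical GOP.

    gop_size must be a power of two (e.g. 8). Pictures whose POC is a multiple
    of gop_size form level 0; each further level halves the picture spacing.
    """
    assert gop_size > 0 and (gop_size & (gop_size - 1)) == 0, "gop_size must be 2**k"
    if poc % gop_size == 0:
        return 0
    k = gop_size.bit_length() - 1
    trailing_zeros = (poc & -poc).bit_length() - 1
    return k - trailing_zeros


def extract(pocs, gop_size, max_level):
    """Sub-bitstream extraction: keep only pictures up to max_level."""
    return [p for p in pocs if temporal_level(p, gop_size) <= max_level]


print([temporal_level(p, 8) for p in range(9)])  # prints: [0, 3, 2, 3, 1, 3, 2, 3, 0]
```

Keeping levels 0 to 2 of a GOP of 8 retains every second picture, i.e., half the frame rate, and the remaining pictures stay decodable because they never reference higher-level pictures.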
▪ Update step: no direct inverse motion compensation
▪ Prediction data used in the update step are derived from prediction data used in the prediction step
▪ Uni-directional/bi-directional modes
▪ Multiple reference pictures, Intra mode (no update)
▪ Motion-...
The extension of H.264/AVC hybrid video coding towards motion-compensated temporal filtering (MCTF) and scalability is presented. By implementing MCTF with the lifting approach, the motion compensation features of H.264/AVC can be reused for the MCTF prediction step and extended in a straightforward way to the MCTF update step. The MCTF extension of H.264/AVC is also incorporated into a video codec that provides SNR, spatial, and (similar to hybrid video coding) temporal scalability. The paper describes these techniques and presents experimental results that validate their efficiency.
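The lifting structure behind MCTF can be illustrated with one level of the Haar transform in Python: a prediction step produces a high-pass (residual) frame, an update step produces a low-pass frame, and synthesis simply undoes both steps in reverse order, so reconstruction is perfect regardless of how the steps are designed. To keep the sketch short, motion compensation is omitted (zero motion is assumed); in actual MCTF the prediction step uses motion-compensated reference samples and the update step uses motion data derived from it.

```python
def mctf_haar_analysis(frames):
    """One level of Haar MCTF via lifting, with motion compensation omitted
    (zero motion assumed). frames: even-length list of equally sized sample lists."""
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        h = [y - x for x, y in zip(a, b)]     # prediction step: H = B - P(A)
        l = [x + d / 2 for x, d in zip(a, h)]  # update step: L = A + U(H)
        low.append(l)
        high.append(h)
    return low, high


def mctf_haar_synthesis(low, high):
    """Invert the lifting steps in reverse order."""
    frames = []
    for l, h in zip(low, high):
        a = [x - d / 2 for x, d in zip(l, h)]  # undo update step
        b = [x + d for x, d in zip(a, h)]      # undo prediction step
        frames += [a, b]
    return frames


frames = [[10, 20], [12, 18], [30, 40], [31, 44]]
low, high = mctf_haar_analysis(frames)
print(low[0], high[0])  # prints: [11.0, 19.0] [2, -2]
```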
Papers by Heiko Schwarz