
Context-Based Fractional Sample Refinement for HEVC Compliant Encoding

2018, IEEE Transactions on Circuits and Systems for Video Technology

The H.265/MPEG-H High Efficiency Video Coding compliant encoding process faces the challenge of high computational complexity. Particularly, in the case of inter-picture prediction, most of the computational resources are allocated to the motion estimation (ME) process. In turn, ME and motion compensation improve coding efficiency by addressing blocks of video frames as displacements from one or more reference blocks. These displacements do not necessarily have to be limited to integer sample positions, but may have an accuracy of half sample or quarter sample positions, which are identified during fractional sample refinement. In this paper, a context-based scheme for fractional sample refinement is proposed. The scheme takes advantage of information already obtained in prior ME steps and provides significant flexibility in terms of parameterization. In this way, it adaptively achieves a desired tradeoff between computational complexity and coding efficiency. According to the experimental results obtained for an example algorithm utilizing the proposed framework, a significant decrease in the number of search points can be achieved. For instance, considering only 6 instead of 16 fractional sample positions results in a tradeoff of only 0.4% Bjøntegaard-Delta-rate loss for high-definition video sequences compared with the conventional interpolation-and-search method.

528 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 28, NO. 2, FEBRUARY 2018

Georg Maier, Benjamin Bross, Student Member, IEEE, Dan Grois, Senior Member, IEEE, Detlev Marpe, Fellow, IEEE, Heiko Schwarz, Remco C. Veltkamp, and Thomas Wiegand, Fellow, IEEE

Index Terms— Fractional sample refinement, H.265/MPEG-H High Efficiency Video Coding (HEVC), motion estimation (ME).
I. INTRODUCTION

THE H.265/MPEG-H High Efficiency Video Coding (HEVC) standard allows bit-rate savings of about 50% for essentially the same subjective quality compared with its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard. Thereby, it efficiently tackles the challenges posed to modern communication networks and storage media by dramatically increasing video bandwidth demands.

Manuscript received April 1, 2016; revised June 28, 2016 and August 15, 2016; accepted September 15, 2016. Date of publication September 27, 2016; date of current version February 13, 2018. This paper was recommended by Associate Editor L. Zhou. G. Maier is with the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (Fraunhofer IOSB), 76131 Karlsruhe, Germany (e-mail: [email protected]). B. Bross, D. Grois, D. Marpe, H. Schwarz, and T. Wiegand are with the Video Coding and Analytics Department, Fraunhofer Institute for Telecommunications–Heinrich Hertz Institute, 10587 Berlin, Germany (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]). R. C. Veltkamp is with the Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, The Netherlands (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at https://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2016.2613910

The H.265/MPEG-H HEVC standard was designed to be applicable to almost all existing H.264/MPEG-4 AVC applications. Also, special emphasis is put on high-definition (HD) and ultra-HD (UHD) video content, since the need for these formats is expected to increase significantly during the next years. However, the above-mentioned coding performance gain comes at the cost of tremendously increased computational complexity, mainly due to the support of a relatively large number of coding modes [1], [2].
During an HEVC encoding process, the mode decision process determines whether a coding unit (CU) should be encoded by using intra- or inter-picture prediction techniques, in addition to determining the quadtree block partitioning [3], [4]. Thereby, either spatial or temporal redundancies are exploited. The decision process is commonly implemented by means of a rate-distortion optimization technique: a cost function, typically denoted as J = D + λR, has to be minimized, where the overall cost J is based on the bit-rate cost R and on the distortion cost D, which are weighted by using a Lagrange multiplier λ [5]. Therefore, in order to determine the best coding mode, the CU is usually encoded in a plurality of coding modes, which leads to a high computational burden at the encoder end.

Inter-picture prediction plays a crucial role in modern video coding applications due to its high potential to significantly improve the coding efficiency. Temporal redundancy is removed by encoding a block in terms of a displacement to one or more reference blocks, which are located in previously encoded reference frames. The displacement information is encoded as a so-called motion vector (MV), which is identified by executing the motion estimation (ME) process. Generally, in recent video encoders, ME is regarded as a three-step process, comprising MV prediction, integer sample accuracy search, and fractional sample refinement. Regarding the latter, the HEVC standard allows motion information to be addressed with quarter sample precision. Such fractional sample positions may be obtained by applying computationally costly interpolation methods, followed by a search around the position of the previously determined integer sample MV. Many fast integer sample accuracy search algorithms have been presented in the past, which significantly reduce the encoding time at the cost of a reasonably small coding efficiency loss.
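The rate-distortion mode decision described above can be illustrated with a minimal sketch. The mode names, the λ value, and all cost numbers below are hypothetical, not values from the paper or HM:

```python
# Illustrative sketch of rate-distortion-optimized mode decision:
# minimize J = D + lambda * R over candidate coding modes.
# All numbers are made-up examples.

LAMBDA = 0.85  # hypothetical Lagrange multiplier

def rd_cost(distortion, rate_bits, lam=LAMBDA):
    """Overall cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def best_mode(candidates):
    """Pick the (name, D, R) tuple minimizing J."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2]))

modes = [("intra", 1200.0, 300), ("inter", 900.0, 520), ("skip", 1500.0, 10)]
name, d, r = best_mode(modes)  # here "inter" wins despite its higher rate
```

The example shows why the weighting matters: "inter" has the highest rate but its distortion saving outweighs the rate cost under this λ.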
However, when the time spent on the integer motion search is reduced, the contribution of the fractional motion search to the overall motion search time becomes more significant. In the HEVC reference software (HM), which already applies a fast integer motion search, interpolation accounts for approximately 19.5% of the encoding time [1]. Hence, fast fractional motion searches are of particular interest when fast integer motion searches are applied.

In this paper, a context-based software framework for fractional sample refinement is proposed. Given a certain context, conditional probabilities are derived. These probabilities indicate the likelihood that specific fractional sample positions yield a coding efficiency gain compared with the previously selected integer sample position. More precisely, a context is defined in terms of a function that evaluates neighboring integer samples to determine the most promising fractional sample positions. To demonstrate the effectiveness of the proposed method, a specific context is presented, for which a good tradeoff between computational complexity and coding efficiency is reported. While the algorithm itself has been implemented in an HEVC software encoder, suitable hardware implementations could be an interesting topic for future work.

This paper is organized as follows. The background along with an overview of related work is provided in Section II. Following that, a detailed description of the proposed, generic context-based fractional sample refinement framework is given in Section III. In Section IV, an example algorithm based on the proposed framework is introduced.
The test methodology and experimental results for this specific context are discussed in detail in Section V. Finally, this paper is concluded in Section VI, which further provides a brief overview of future research perspectives.

II. BACKGROUND AND RELATED WORK

A comprehensive overview of the HEVC standard is provided in [6], and a detailed analysis of the coding efficiency can be found in [7]. In addition, a comparison to other recent video coding schemes is presented in [8], with a particular focus on low-delay applications in [9]. Also, additional studies with regard to the HEVC decoding performance and complexity are presented in [10], with a special emphasis on 4K-resolution videos. Furthermore, with regard to video coding standards that employ inter-picture prediction, intensive research has been conducted on suboptimal strategies in the field of integer sample accuracy ME. Those strategies typically target a tradeoff between computational complexity and coding efficiency. A recent survey on fast block matching algorithms is provided in [11]. Furthermore, the application of pattern-based approaches in the context of HEVC is studied in [12].

In the following, several traditional search techniques regarding fractional sample ME are discussed. Section II-A overviews the conventional interpolation-and-search method, Section II-B discusses recent pattern-based approaches, and Section II-C reviews several error surface approximation techniques. Although this paper and the related work reviewed in the following mainly focus on software implementations of fast fractional sample ME schemes, a lot of research has also been conducted regarding efficient hardware designs. In [13], such an implementation for the interpolation task is presented.

Fig. 1. Example of the conventional interpolation-and-search method. Circles: integer sample positions. Larger squares: half sample positions. Smaller squares: quarter sample positions.
A hardware design including both interpolation and search is presented in [14]; the proposed approach also includes a module responsible for integer sample ME. Also, in [15], a design performing bilinear quarter pixel approximation together with a search pattern based on it is presented.

A. Interpolation-and-Search Method

The traditional so-called interpolation-and-search method for fractional sample refinement is presented in Fig. 1. It relies on the common assumption that an optimal fractional sample position is located adjacent to the optimal integer sample position. This assumption in turn gives rise to 48 possible quarter sample positions. However, the corresponding search is typically divided into two steps as follows. First, the interpolation and search is performed on a half sample accuracy level, as shown in Fig. 1(a). Second, this procedure is repeated on a quarter sample accuracy level, while reducing the search space to the neighborhood of the previously selected half sample position, as further shown in Fig. 1(b). Consequently, this approach is limited to eight half sample positions and eight quarter sample positions. Also, if no better fractional sample position can be determined, the MV stays at an integer sample position.

B. Pattern-Based Approaches

In light of the popularity of approaches that have been developed for the integer sample search, various pattern-based strategies have also been developed for fractional sample refinement. However, when adopting the assumption that the optimal fractional sample position is always adjacent to the selected integer sample position, it is important to note that the search space is rather limited in size. Additional attempts have been made to subsample this search space by first predicting a restricted search space and then conducting a search within the predicted fractional sample space. For example, Zhang et al.
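The two-step interpolation-and-search procedure described above can be sketched as follows. The `cost` function is an assumed distortion callback (in HM this would be the Hadamard-transformed SAD after interpolation); positions are in quarter-sample units, so a half sample step is 2 units:

```python
# Sketch of the conventional interpolation-and-search method: an 8-neighbor
# search at half sample accuracy, then again at quarter sample accuracy
# around the half sample winner. cost() is an assumed distortion function.

NEIGHBORS = [(-1, -1), (0, -1), (1, -1), (-1, 0),
             (1, 0), (-1, 1), (0, 1), (1, 1)]

def refine(center, step, cost):
    """Check the 8 neighbors at the given step size; keep the best MV."""
    best, best_cost = center, cost(center)
    for dx, dy in NEIGHBORS:
        cand = (center[0] + dx * step, center[1] + dy * step)
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best

def interpolation_and_search(int_mv, cost):
    half_best = refine(int_mv, step=2, cost=cost)  # half sample = 2 quarter units
    return refine(half_best, step=1, cost=cost)    # quarter sample refinement
```

With a toy cost such as the distance to a known optimum, the two-step search checks 8 + 8 positions instead of all 48 quarter sample candidates.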
[16] propose to reduce the number of search points from eight to six points for both the half sample and the quarter sample search. This is done by first checking only four "near-located" samples and subsequently two "far-located" samples, which in turn are positioned next to the best "near-located" sample. They treat samples that are adjacently located in the horizontal and vertical directions as "near-located" samples and those adjacently located in the diagonal directions as "far-located" samples. As a result, this approach clearly favors the "near-located" samples over the "far-located" samples by giving them a larger weight during the decision process. Another example is shown in [17], where the authors propose to limit the search to one quadrant; the corresponding quadrant is determined by checking two fixed samples. In addition, in [18], the search space is restricted to only a few points depending on the direction of the integer sample accuracy MV. Furthermore, in [19], a pattern application dependent on the distortion distribution of the surrounding integer sample positions is proposed.

C. Error Surface Approximation Techniques

Another type of technique related to fractional sample refinement attempts to approximate the error surface on a fractional sample level around the selected integer sample position. Suh et al. [20] propose three models for determining a fractional sample accuracy MV. This approach requires the computation of nine integer sample position errors; depending on the applied integer sample search algorithm, some of these may already be known. Interpolation and search on a fractional sample level is omitted. While they present results at a half sample accuracy level, their approach can easily be extended to quarter sample accuracy.
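The six-point pattern of [16], as described above, can be sketched like this (a simplified reading of the scheme: four near-located positions first, then the two diagonals flanking the best near-located one; `cost` is an assumed distortion function):

```python
# Sketch of a 6-point near/far pattern in the spirit of [16], as summarized
# in the text: check the four horizontal/vertical ("near-located") neighbors,
# then the two diagonal ("far-located") neighbors adjacent to the best one.
# Offsets are in the current (half- or quarter-) sample grid.

NEAR = [(0, -1), (-1, 0), (1, 0), (0, 1)]

def six_point_refine(center, cost):
    # Step 1: best of the center and the four near-located positions.
    cands = [center] + [(center[0] + dx, center[1] + dy) for dx, dy in NEAR]
    best = min(cands, key=cost)
    if best == center:
        return center  # no near-located position improves on the center
    # Step 2: the two far-located diagonals flanking the chosen position.
    dx = best[0] - center[0]
    if dx == 0:   # vertical winner -> diagonals to its left and right
        far = [(center[0] - 1, best[1]), (center[0] + 1, best[1])]
    else:         # horizontal winner -> diagonals above and below it
        far = [(best[0], center[1] - 1), (best[0], center[1] + 1)]
    return min([best] + far, key=cost)
```

At most 6 positions are evaluated instead of 8, which is where the reported speedup of this family of patterns comes from.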
In addition, in [21], those models are applied by considering the obtained coding costs for the required integer sample positions rather than the distortion values. The approach also includes an early termination criterion, which is applied whenever, based on the shape of the resulting error surface, no significant improvement through fractional sample refinement is expected.

III. PROPOSED CONTEXT-BASED FRACTIONAL SAMPLE REFINEMENT FRAMEWORK

While context-based approaches have been successfully applied in video compression, e.g., for entropy coding [22], neither applications in integer sample nor in fractional sample ME have been reported. In this section, a general software framework for context-based fractional sample refinement is established. The proposed approach consists of an offline as well as an online phase, as described in detail in Sections III-A and III-B, respectively. Briefly, the goal of the offline phase is to collect data for each possible fractional sample position. Based on these data, an average increase in encoding efficiency over the previously chosen integer sample position given a certain context can be calculated. During the actual encoding process, i.e., the online phase, those average gains are used to steer the fractional sample refinement, independent of the given sequence. In this section, the context is kept generic. A specific context, for which reasonable results in terms of the tradeoff between computational complexity reduction and coding efficiency loss are obtained, is introduced for an example algorithm in Section IV.

A. Offline Phase

During the offline phase, the goal is to order the fractional sample positions according to their average performance gain in terms of coding efficiency. Let m be the MV obtained from an arbitrary integer sample accuracy search algorithm and let S be a set of allowed fractional sample displacements s surrounding m. Furthermore, let f be a cost function calculating the cost of an MV, where, for instance, f(a) > f(b) means that b is to be preferred over a. The performance gain g(s) that s provides over m is then defined according to

  g(s) = f(m) − f(s).   (1)

In addition, an arbitrary context ctx(C) is assumed, with C being the input information and ctx being a function that processes this input information. In order to calculate the average gain, sample data need to be collected during the offline phase. Let N be a set of N samples, where each sample consists of the context input C_n and the gains g_n(s) for each fractional sample displacement s ∈ S, with n = 1 ... N. Then, for each s ∈ S given a specific context ctx(C), the average performance gain ḡ(s | ctx(C)) resulting from selecting s over m is calculated over the subset N_ctx of samples sharing that context:

  N_ctx = {n | ctx(C_n) = ctx(C)},
  ḡ(s | ctx(C)) = (Σ_{n ∈ N_ctx} g_n(s)) / |N_ctx|.   (2)

Typical example sequences can be used to obtain the sample set N, in the remainder referred to as the training set. As a result, the fractional sample positions can be ranked from the one yielding the highest to the one yielding the lowest average performance gain. Let S_rank be the set of the rank − 1 best fractional sample positions. Then, the recursive formula in (3) can be used to generate a lookup table (LUT) containing a fractional position s_{rank,ctx(C)} for a given rank and context ctx(C):

  s_{rank,ctx(C)} = arg max_{s ∈ S \ S_rank} ḡ(s | ctx(C)),
  with S_rank = ∅ for rank = 1,
  and S_rank = {s_{i,ctx(C)} | i = 1 ... rank − 1} for rank > 1.   (3)

(If more than one s ∈ S \ S_rank maximizes ḡ(s | ctx(C)), we let s_{rank,ctx(C)} be the first s amongst them.)

The effectiveness of the selected context model can be determined by applying the results to an alternative set of sequences; also, different frames thereof can be used. Two metrics may be used for evaluation. One is the overall success rate, i.e., the relative frequency of finding the optimal fractional sample positions. The second is the generated encoding cost overhead, i.e., the overhead of the encoding cost compared with the corresponding optimal position. It should be noted that a low overall success rate may still correspond to a well-performing context model in the case of generally selecting a near-optimal position. This is the case whenever the position chosen yields an encoding cost sufficiently close to the optimum.

Obviously, provided a large training set (i.e., a large set N), the described offline phase may require exhaustive simulations. As already mentioned in the introduction, the estimated LUT is not changed anymore and is further used to steer the fractional sample refinement in the final algorithm, as explained in Section III-B. Hence, simulations only need to be performed once to obtain the LUT when designing an algorithm.

B. Online Phase

During the encoding process, the information obtained from the offline phase is applied by using the algorithm skeleton shown in Algorithm 1.

Algorithm 1 Context-Based Fractional Sample Search
  bestMv ← m
  bestCost ← f(m)
  rank ← 1
  while crit holds do
    s_{rank,ctx(C)} ← MV with rank rank given ctx(C) from the LUT calculated during the offline phase based on the highest average gain
    If necessary, perform up-sampling for s_{rank,ctx(C)}
    newCost ← f(s_{rank,ctx(C)})
    if newCost < bestCost then
      bestCost ← newCost
      bestMv ← s_{rank,ctx(C)}
    end if
    rank ← rank + 1
  end while
  return bestMv

Here, the cost of the fractional sample position s_{1,ctx(C)} having the highest average gain (rank = 1) given a context ctx(C) is compared with the cost of the current best sample position, which at this point corresponds to the chosen integer sample position m. Obviously, the calculation of the context may result in an overhead, as additional information needs to be derived. However, compared with the complexity caused by the interpolation methods, this overhead is negligible. According to [1], for a block of size N × N, 8 + (56/N) 8-bit and eight 16-bit multiply-accumulate operations are required per sample only for the luma component in software. In comparison, the context calculation requires only 8 × 3 multiplications of integer distortions with the corresponding weights and two additions, provided that the integer distortions have already been calculated during integer ME. The criterion crit decides whether the online search algorithm should proceed with the next fractional sample position from the ranking LUT determined during the offline phase (rank = 2, 3, ...). While the approach is fairly simple, its advantage lies in its flexibility in designing the termination criterion. An example criterion is presented in Section IV.

IV. EXAMPLE FRACTIONAL SAMPLE REFINEMENT BASED ON THE PROPOSED FRAMEWORK

In this section, a fractional sample MV search algorithm is introduced as an example, which is built upon the generic framework illustrated in Section III. The fractional sample MV search was integrated on top of the fast integer sample MV search in the HEVC test model (HM) reference software encoder [25]. A specific context, which has to be evaluated in both the offline and the online phase, is derived first in Section IV-A. Then, the proposed framework is used to determine the best half sample position, as described in Section IV-B. Around the best half sample MV, the quarter sample MV search is performed as a second implementation of the framework and explained in detail in Section IV-C. This hierarchical two-level approach adopts the assumption that the optimal quarter sample position is adjacent to the optimal half sample position, as has been discussed for the conventional interpolation-and-search method in Section II-A. It is noted that all the training data in this section were obtained for the first ten frames of each of the common test condition [24] sequences presented in Table III. In Section IV-D, an example search step illustrates the combination of half and quarter sample refinement.

Fig. 2. Notation for the positions. (a) Half sample positions. (b) Quarter sample positions.

A. Studied Context

In the course of this paper, it was observed that the optimal fractional sample positions are evenly distributed. Therefore, no assumptions regarding general position priorities can be made. However, several studies, such as those presented in [19] and [20], report a correlation between the distortions obtained at integer sample positions adjacent to the chosen position and the best fractional sample position. In order to define the integer sample positions for the context evaluation input function, let x_1, ..., x_8 be the integer positions surrounding m from the top-left to the bottom-right, as shown in Fig. 2. In HM, the sum of absolute differences (SAD) is used as a distortion metric for integer sample MVs. Hence, the distortion value for position x_i is given by d_SAD(x_i). The context input C is defined in (4) as the vector containing the SAD distortion values for all adjacent integer sample positions:

  C = d_SAD = (d_SAD(x_1), d_SAD(x_2), d_SAD(x_3), ..., d_SAD(x_8))^T.   (4)

Let M_i be the i-th row of an 8 × 8 weighting matrix M; then the calculation of the context ctx(d_SAD) is given by

  ctx(d_SAD) = arg min_{i = 1...8} (M_i · d_SAD).   (5)

(If more than one i = 1 ... 8 minimizes M_i · d_SAD, we let ctx(d_SAD) be the smallest i amongst them.)

Hence, ctx(d_SAD) identifies the minimal weighted linear combination of the directly neighboring integer sample distortions d_SAD as defined in (4). The intention of this approach is to utilize information already obtained on the integer sample accuracy level. This choice is also in line with the results presented in [20] and [21]. From these works, it can be concluded that the distortions obtained from integer sample positions surrounding a candidate fractional sample position provide diagnostically conclusive information regarding the likeliness of it yielding an encoding efficiency gain. Based on this, numerous simulations testing different combinations of surrounding integer sample positions and their associated distortions have been carried out, including evaluations based on different weights for the surrounding integer sample distortions. A weighting for which promising results have been obtained is presented in (6) and used in the following. However, optimization strategies, such as machine learning approaches, have not been applied yet. Therefore, it is expected that even better performing weights can be found.

  M = [ 3 2 0 2 0 0 0 0
        2 3 2 0 0 0 0 0
        0 2 3 0 2 0 0 0
        2 0 0 3 0 2 0 0
        0 0 2 0 3 0 0 2
        0 0 0 2 0 3 2 0
        0 0 0 0 0 2 3 2
        0 0 0 0 2 0 2 3 ].   (6)

Fig. 3. Visual example of the weights and integer sample positions involved in the linear combination when multiplying the first row of the weighting matrix M with the distortion vector d_SAD.

The notation is further schematically shown in Fig. 3 by providing an example for the multiplication of the first row of M with the vector d_SAD. The black square position corresponds to the center of the search, and the gray cells correspond to positions whose distortion values are not considered. Scalars within the white cells represent the distortion weights of each corresponding position, resulting in the overall distortion calculation given in (7) for the example:

  M_1 · d_SAD = 3 d_SAD(x_1) + 2 d_SAD(x_2) + 2 d_SAD(x_4).   (7)
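The context evaluation of (4)-(7) can be sketched directly. The weighting matrix below is the one given in (6); the SAD vector is a made-up illustrative example:

```python
# Sketch of the context evaluation: the context is the (1-based) index of
# the row of the weighting matrix M whose weighted combination of the eight
# neighboring integer SAD distortions is minimal. SAD values are made up.

M = [
    [3, 2, 0, 2, 0, 0, 0, 0],
    [2, 3, 2, 0, 0, 0, 0, 0],
    [0, 2, 3, 0, 2, 0, 0, 0],
    [2, 0, 0, 3, 0, 2, 0, 0],
    [0, 0, 2, 0, 3, 0, 0, 2],
    [0, 0, 0, 2, 0, 3, 2, 0],
    [0, 0, 0, 0, 0, 2, 3, 2],
    [0, 0, 0, 0, 2, 0, 2, 3],
]

def ctx(d_sad):
    """Context per (5): argmin over rows; ties resolve to the smallest i."""
    scores = [sum(w * d for w, d in zip(row, d_sad)) for row in M]
    return scores.index(min(scores)) + 1  # index() returns the first minimum

# Distortions lowest toward the top-left neighbors (x1, x2, x4):
d_sad = [10, 12, 30, 14, 28, 32, 35, 40]
```

For this vector, M_1 · d_SAD = 3·10 + 2·12 + 2·14 = 82 per (7), the smallest of the eight combinations, so the context evaluates to 1. Each row has three nonzero weights, which matches the low overhead argued for in Section III-B.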
B. Half Sample MV Refinement

Given the hierarchical two-level approach, the half sample MV refinement needs to be performed before the subsequent quarter sample refinement. Hence, in the following, the complete half sample MV refinement is described in detail, including the training set, the corresponding offline LUT derivation, and the online search algorithm.

1) Training Set: As already defined in Section III-A, each sample from the training set N includes the context input C_n and the gains g_n(s) for the considered fractional sample positions s. Here, the context input C_n equals the SAD values d_SAD,n of the integer MVs adjacent to the selected integer MV m_n. The fractional sample positions s considered here are all eight half sample positions h_i (i = 1 ... 8) surrounding m. It should be noted that HM uses the sum of Hadamard transformed differences (HAD) as a distortion measure for all fractional sample positions. Consequently, the calculation of the gain is based on HAD distortion values for both the half sample and the integer sample positions, as shown in

  g_n(h_i) = d_HAD(m) − d_HAD(h_i).   (8)

Two examples of such HAD fractional sample distortion maps are provided in Fig. 4.

Fig. 4. Examples of the quarter sample distortion maps, where the (0,0) position corresponds to the selected integer sample MV. (a) Quarter sample distortion map. (b) Quarter sample distortion map.

2) Offline Phase: Using the samples from the training set N, the average performance gains given the context ctx(d_SAD) from (5) are obtained as

  N_ctx = {n | ctx(d_SAD,n) = ctx(d_SAD)},   (9)
  ḡ(h_i | ctx(d_SAD)) = (Σ_{n ∈ N_ctx} g_n(h_i)) / |N_ctx|.   (10)

The result of (10) can be used to generate an LUT as described in (3).

TABLE I: Half sample positions h_{rank,ctx(d_SAD)} ordered by likeliness to yield an encoding performance gain over the integer sample MV (rank), given a context ctx(d_SAD).
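The offline LUT derivation described above can be sketched as follows. The training samples are made-up placeholders (each pairs a context index with per-position gains per (8)); the resulting per-context rankings are illustrative, not the paper's Table I:

```python
# Minimal sketch of the offline phase: average the gains per context over
# the training set ((9), (10)) and rank the eight half sample positions by
# decreasing average gain to form the LUT of (3). Data are made up.

from collections import defaultdict

# Each training sample: (context index, {position index i: gain g_n(h_i)})
training = [
    (1, {1: 5.0, 2: 1.0, 3: 0.0, 4: 2.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0}),
    (1, {1: 3.0, 2: 2.0, 3: 0.0, 4: 1.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0}),
    (2, {1: 0.0, 2: 4.0, 3: 1.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0}),
]

def build_lut(samples):
    """Return {context: [position indices ordered by rank 1, 2, ...]}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for ctx_idx, gains in samples:            # accumulate per context (9)
        counts[ctx_idx] += 1
        for pos, g in gains.items():
            sums[ctx_idx][pos] += g
    lut = {}
    for ctx_idx in sums:                      # average (10) and rank (3)
        avg = {p: s / counts[ctx_idx] for p, s in sums[ctx_idx].items()}
        # Decreasing average gain; ties keep the smaller position index.
        lut[ctx_idx] = sorted(avg, key=lambda p: (-avg[p], p))
    return lut

lut = build_lut(training)
```

Once built, `lut[ctx][rank - 1]` plays the role of h_{rank,ctx(d_SAD)} during the online phase, so the expensive averaging runs only once, at design time.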
Table I shows the resulting LUT containing the half sample positions h_{rank,ctx(d_SAD)}, ordered by rank for each possible context ctx(d_SAD) from (5). Here, again, a rank of 1 denotes the highest and a rank of 8 denotes the lowest average performance gain. One interesting observation is that the best-ranked half sample position h_i for each context always lies in the middle between the current integer sample position m and the adjacent integer sample position x_i that mainly influences the corresponding context. Taking h_{1,1} = h_1 as an example, the context ctx(d_SAD) = 1 means that the linear combination of the distortions M_1 · d_SAD minimizes (5). From (7), it can be seen that this linear combination is centered around the top-left neighboring integer sample position x_1, which also has the highest weight. This illustrates the correlation between the neighboring integer sample distortions and the best fractional sample position. The precalculated LUT serves as the basis for any sequence to be encoded using the algorithm described in the online phase. In order to obtain this information for the quarter sample accuracy level, the online phase needs to be simulated to obtain its result in the proposed algorithm. Hence, before proceeding to the quarter sample accuracy level, the online phase for the half sample accuracy level is described.

3) Online Phase: The online phase is implemented according to Algorithm 1, and the context calculation is performed as illustrated in (5).

Algorithm 2 Online Phase Implementation for Half Sample Search
  bestMv ← m
  bestCost ← d_HAD(m)
  rank ← 1
  while rank ≤ u do
    h_{rank,ctx(d_SAD)} ← MV from Table I
    If necessary, perform up-sampling for h_{rank,ctx(d_SAD)}
    newCost ← d_HAD(h_{rank,ctx(d_SAD)})
    if newCost < bestCost then
      bestCost ← newCost
      bestMv ← h_{rank,ctx(d_SAD)}
    end if
    rank ← rank + 1
  end while
  return bestMv
A detailed description of the algorithm is provided in Algorithm 2. It can be seen that the cost function f is replaced by the HAD distortion measure d_HAD used for fractional sample positions. For the search termination criterion crit discussed in Section III-B, a fairly simple implementation is considered: an upper bound u on the ranks to be searched. It is possible either to use a fixed u to limit the number of fractional sample position checks, or to let the encoder choose a value adaptively. Clearly, a reasonable choice depends on the application of the corresponding encoder. For instance, in times of high computational load and tight real-time requirements, an encoder may favor low values of u over large ones, while in load-wise more relaxed situations the preference would be vice versa. However, such an adaptive mechanism would require a way of monitoring the system's state, which is not part of the work presented here. In Section V-A, results for various fixed values of u are presented.

C. Quarter Sample MV Refinement

After having discussed the implementation of the half sample MV refinement, the following provides a detailed description of the corresponding components, i.e., training set, offline phase, and online phase, for the quarter sample MV refinement. As has been mentioned, the algorithm follows a hierarchical structure; hence, quarter sample refinement is performed dependent on the result of the half sample refinement.

TABLE II: Quarter sample positions q_{rank,h} ordered by likeliness to yield an encoding performance gain over the integer sample MV (rank) when h is selected by prior half sample refinement, given the context ctx(d_SAD) = 1.

1) Training Set: After half sample refinement has been performed, ctx(d_SAD,n) is known for each sample in the training set N.
With respect to quarter sample refinement, each sample is now characterized by four sets of eight quarter sample performance gains gn(q_{i,j}), with i ranging from 1 to 8 and j ranging from 1 to 4. Due to the hierarchical nature of the algorithm, the first three sets include the performance gains for the eight quarter sample positions q_{i,j} adjacent to the three best-ranked half sample positions given the previously determined context ctx(dSAD,n), i.e., h_{rank, ctx(dSAD,n)} with rank ranging from 1 to 3. Using the three best-ranked half sample positions is motivated by the fact that at most the three best half sample positions are checked during the online phase of the half sample MV refinement. This corresponds to limiting u to 3 in Algorithm 2. Independent of ctx(dSAD,n), the fourth set includes the performance gains for the eight quarter sample positions q_{i,4} adjacent to the integer sample position m resulting from the prior integer sample MV search.

2) Offline Phase: The average performance gain for quarter sample positions is obtained by using the information from the prior half sample step described in Section IV-B. More precisely, the average performance gain for each quarter sample position is calculated on the basis of every possible prior-selected half sample position as shown in

  N_ctx = { n | ctx(dSAD,n) = ctx(dSAD) }
  ḡ(q_{i,j} | ctx(dSAD)) = ( Σ_{n ∈ N_ctx} gn(q_{i,j}) ) / |N_ctx|.   (11)

Consequently, for each training sample n ∈ N, half sample MV refinement as described in Section IV-B is simulated dependent on the calculated context ctx(dSAD,n). In comparison to the result of the offline phase for half sample accuracy as illustrated in Table I, this results in one table for each of the eight possible outcomes of ctx(dSAD). As already mentioned in the description of the current training set, the half sample MV refinement may lead to four possible half sample positions for each context.
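The offline averaging of (11) and the subsequent ranking can be sketched as follows; the flat (context, half sample outcome) keying and the training-sample layout are assumptions made for illustration.

```python
# Sketch of the offline phase per (11): for each (context, half sample
# outcome) pair, average the per-sample quarter sample gains over all
# matching training samples, then keep the best-ranked positions.
from collections import defaultdict

def build_quarter_lut(samples, num_positions=8, top=3):
    """samples: iterable of (ctx, half_idx, gains), where gains[i] is the
    performance gain g_n(q_i) of quarter position i for that sample."""
    sums = defaultdict(lambda: [0.0] * num_positions)
    counts = defaultdict(int)
    for ctx, half_idx, gains in samples:
        key = (ctx, half_idx)
        counts[key] += 1
        for i, g in enumerate(gains):
            sums[key][i] += g
    lut = {}
    for key, total in sums.items():
        avg = [t / counts[key] for t in total]      # ḡ(q_i | ctx) from (11)
        order = sorted(range(num_positions), key=lambda i: -avg[i])
        lut[key] = order[:top]                      # best-ranked position indices
    return lut
```

With u limited to 3, keeping only the `top=3` positions per table row mirrors the online phase described above.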
This includes the three best-ranked half sample positions and the integer sample position in the case that no better half sample position is found. Assuming that the context was calculated to be equal to 1, the resulting table for quarter sample accuracy is provided in Table II.

3) Online Phase: The implementation of the refinement on the quarter sample accuracy level is rather similar to the one for half sample accuracy as described in Section IV-B3. However, instead of only checking positions with increasing ranks from the LUT row indicated by the context, the LUT needs to be selected first based on the outcome of the context calculation. Then, the corresponding LUT row is chosen based on the result of the half sample refinement h. The full algorithm is outlined in Algorithm 3.

Algorithm 3 Online Phase Implementation for Quarter Sample Search
  h ← bestMV from Algorithm 2
  bestMv ← h
  bestCost ← dHAD(bestMv)
  u ← 3
  rank ← 1
  while rank − u ≤ 0 do
    q_{rank,h} ← MV from the LUT selected according to ctx(dSAD) (e.g., Table II for ctx(dSAD) = 1)
    If necessary, perform up-sampling for q_{rank,h}
    newCost ← dHAD(q_{rank,h})
    if newCost < bestCost then
      bestCost ← newCost
      bestMV ← q_{rank,h}
    end if
    rank ← rank + 1
  end while
  return bestMv

D. Example of a Search Step

To further illustrate the interaction of half and quarter sample refinement as described in Sections IV-B and IV-C, an example search step of the overall algorithm is shown in Fig. 5. In this example, it is assumed that the context calculation according to (5) yields ctx(dSAD) = 1. The upper bound u of both the half sample and the quarter sample refinement is set to 3. According to Table I, the three highest-ranked half sample positions for that context are h1, h2, and h4. These half sample positions are indicated by green squares in Fig. 5. Hence, during half sample refinement, these three positions in addition to the initially selected integer sample position m are checked. It is further assumed that h1 was found to be the best half sample position, i.e., its distortion dHAD(h1) is lower than the distortion of the other three. Since the context calculation yields ctx(dSAD) = 1, Table II is chosen as the quarter sample refinement LUT. Given h1 as the outcome of the half sample refinement, the corresponding row is selected, which leads to the three highest-ranked quarter sample positions q8, q7, and q5. These quarter sample positions are indicated by red squares in Fig. 5. If the HAD distortion of one of these three positions is lower than dHAD(h1), this position is selected as the final fractional MV; otherwise, it is set to h1.

Fig. 5. Example of a search step performed by the introduced algorithm.

TABLE III: Evaluated Common Test Conditions Sequences [24]

TABLE IV: Additional Sequences Used for Experimental Evaluation (Class HD)

V. TEST METHODOLOGY AND EXPERIMENTAL RESULTS

The experimental results have been obtained using the HEVC reference software HM 12.0 [25]. Although at the time of writing HM is available in version 16.10, this remains a valid baseline since no further changes regarding fractional sample refinement were included between these versions. The random access (RA) and low delay P (LDP) configurations have been considered, according to the HEVC common test conditions, as defined in [24] and specified in Table III. It should be noted that the Class E sequences represent typical video conferencing content, which corresponds to relatively low motion activity. Also, the Class F sequences mainly consist of computer-generated imagery content. Additionally, the sequences listed in Table IV, all of which have a resolution of 1920 × 1080 with 50 frames/s and a bit depth of 8, have been used for experimental verification. They are referred to as Class HD sequences in the remainder.
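The quarter sample stage can be sketched analogously to the half sample search; the nested LUT layout (context first, then half sample outcome) follows the description above, while the concrete names and MV tuples are illustrative.

```python
# Sketch of Algorithm 3 (quarter sample search). The LUT is selected by
# the context first and then indexed by the half sample outcome h, as
# described in the text; data layout is an assumption for illustration.

def quarter_sample_search(h, h_cost, ctx, quarter_luts, d_had, u=3):
    """h, h_cost            -- best MV and cost from the half sample search
    quarter_luts[ctx][h] -- quarter sample MVs, best rank first
    d_had                -- callable returning the HAD distortion of an MV
    """
    best_mv, best_cost = h, h_cost
    for q in quarter_luts[ctx][h][:u]:   # ranks 1..u for this (ctx, h) row
        cost = d_had(q)                  # up-sampling assumed inside d_had
        if cost < best_cost:
            best_mv, best_cost = q, cost
    return best_mv, best_cost
```

If no checked quarter sample position beats the half sample result, the half sample MV is returned unchanged, matching the fallback described in the example above.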
TABLE V: Coding Efficiency Gain (BD-Rate) for Performing Fractional Sample Refinement Compared to Considering Integer Sample Positions Only in HM

TABLE VI: Coding Efficiency Gains (BD-Rate) for the RA Configuration for Sequences From Table III and Table IV Compared With Integer ME in HM

TABLE VII: Coding Efficiency Gains (BD-Rate) for the LDP Configuration for Sequences From Table III and Table IV Compared With Integer ME in HM

TABLE VIII: Coding Efficiency Losses (BD-Rate) for the RA Configuration for Sequences From Table III and Table IV Compared With the Interpolation-and-Search Algorithm in HM

TABLE IX: Coding Efficiency Losses (BD-Rate) for the LDP Configuration for Sequences From Table III and Table IV Compared With the Interpolation-and-Search Algorithm in HM

A. Experimental Results

Table V presents the corresponding coding efficiency gains for both the RA and LDP configurations by enabling fractional sample refinement compared to considering integer sample positions only, when applying the already mentioned interpolation-and-search method presented in Section II-A. The difference in RD performance is described by means of the Bjøntegaard-Delta (BD)-rate as proposed in [23]. It should be noted that the RA configuration allows a weighted combination of two integer sample accuracy prediction signals (B-frames). Although both prediction signals are obtained by integer sample precision ME, their combination may represent a prediction obtained by a fractional sample displacement. In turn, this explains the significantly higher performance gains for the LDP configuration, as shown in Table V, since in that case only one inter-picture prediction signal is allowed (P-frame).
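The BD-rate metric [23] used throughout the result tables can be computed, in simplified form, by fitting cubic polynomials to the rate-distortion curves. The sketch below uses a plain polynomial fit over log-rate; reference implementations typically use piecewise-cubic interpolation, so this is an approximation for illustration.

```python
# Simplified Bjøntegaard-Delta-rate computation [23]: fit a cubic
# polynomial to log10(rate) as a function of PSNR for each codec,
# integrate the gap over the overlapping PSNR range, and convert the
# mean log-rate difference back to a percentage.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    lr_ref, lr_test = np.log10(rates_ref), np.log10(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)      # log-rate as f(PSNR)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))      # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)  # mean log-rate difference
    return (10 ** avg_diff - 1) * 100            # percent; > 0 means test is worse
```

A uniform 10% rate increase at identical PSNR values yields a BD-rate of +10%, which is a handy sanity check for any implementation.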
Regarding computational complexity, this approach incorporates eight search points for the half sample refinement and an additional eight points for the quarter sample accuracy search, including the corresponding up-sampling operations.

In the following, experimental results obtained by applying Algorithms 2 and 3 with u = 1, u = 2, and u = 3 under the previously described test conditions are presented. Tables VI and VII demonstrate the coding efficiency gain achieved by the algorithms compared to applying integer ME only. In comparison with Table V, it can be seen that for u = 1 most of the coding efficiency gain achieved by fractional sample refinement can be preserved. For instance, under the RA configuration, HM achieves −12.4% BD-rate using fractional sample refinement compared with integer ME only for Class D. Comparing the proposed approach to integer ME only still results in −9.7% BD-rate, which is ∼78% of the gain achieved by the reference algorithm. Especially for the HD resolution sequences (i.e., Classes B and HD), the proposed approach achieves −6.48% BD-rate, resulting in approximately 85% of the encoding gain obtained by applying the interpolation-and-search algorithm.

For the results presented in Tables VIII and IX, the interpolation-and-search algorithm of HM was used as a baseline. As can be seen, for u = 3, only small BD-rate losses are obtained for both the RA and LDP configurations. Furthermore, it is important to note that, as mentioned in Section IV, the first ten frames of the sequences listed in Table III formed the training set; hence, these frames were part of both the training and the testing phase, which generally should be avoided. However, the testing phase extended well beyond these frames, as all test sequences are longer than ten frames.

TABLE X: Search Points for the Configurations Presented in Table VIII and Table IX, for Half and Quarter Sample Search Each

In order to further demonstrate that the approach is indeed applicable to sequences outside the training set, the sequences from Table IV have been added to the testing set. Results for these are also presented separately. As can be seen, the algorithm performs equally well for these sequences.

The computational complexity of the proposed context-based fractional sample refinement algorithm is a function of the search points to be examined, which clearly depends on the value of u. In this respect, Table X provides an overview of the reduction in search points of the proposed context-based fractional sample refinement algorithm compared with the conventional interpolation-and-search method for both the RA and LDP configurations. As can be seen, the number of search points is reduced significantly. It should also be noted that, similar to the approaches discussed in Section II-C, for u = 1 no interpolation is required.

B. Comparison to Other Fast Fractional Sample Refinement Schemes

In order to demonstrate the advantage of the proposed approach, simulations were performed to compare the coding efficiency as well as the run time of the presented example implementation with other recently proposed fast fractional sample search algorithms. For this purpose, simulations were performed for representative algorithms from the categories presented in [26] and [27]: 1) fast full search algorithms; 2) reduction of searching candidate points in the search range; and 3) modeling the matching error surface with mathematical models. With respect to 1), the interpolation-and-search method is considered; corresponding results have been presented in Section V-A. Algorithms matching category 2) have been discussed in Section II-B, and simulations were performed for the pattern-based approaches in the following referred to as Cross [16], 2plus3 [17], and Direction [18].
Regarding category 3), an approach approximating the error surface based on surrounding integer sample position distortions, here referred to as DField9 [20] and discussed in Section II-C, was used. The comparison of coding efficiency losses is presented in Table VIII for the RA configuration and in Table IX for the LDP configuration. For both the RA and the LDP configuration, it can be seen that on average only the approach referred to as Cross outperforms our approach with u = 3 in terms of coding efficiency. However, the proposed approach performs only three checks on fractional sample positions, while Cross requires at least four and in the worst case six. Also, it can be seen that using u = 1 performs better than or equal to the algorithm presented in Direction, both in terms of encoding efficiency and required position checks.

Fig. 6. Comparison of algorithms for the RA configuration using all sequences listed in Table III and Table IV, averaged over all four quantization parameters in HM.

An overview comparing the number of fractional sample positions to be checked is provided in Table X. It is important to note that for the proposed approach, depending on the value of u, the number of best-case checks equals the number of worst-case checks. For instance, selecting u = 2 will always result in two positions being considered. The error surface approximation algorithm denoted as DField9 requires only one position to be considered, making it equal in this respect to the approach proposed in this paper with u = 1. In the course of this paper, the error surface approximation method was implemented using SAD as the distortion measure. Besides the number of fractional sample search positions listed in Table X, the time required for fractional sample refinement is measured in the simulations as another indicator of computational complexity.
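The search point counts compared in Table X follow directly from the structure of the algorithms: the interpolation-and-search method checks 8 half plus 8 quarter sample positions, while the proposed scheme checks exactly u positions at each of the two refinement levels. A trivial sketch:

```python
# Back-of-the-envelope fractional search point counts, following the text:
# the conventional interpolation-and-search method checks 8 + 8 = 16
# positions, while the proposed scheme checks u half plus u quarter
# sample positions, i.e., 2u in total (best case equals worst case).

def search_points(u=None):
    if u is None:                 # conventional interpolation-and-search
        return 8 + 8
    return 2 * u                  # proposed: u half + u quarter checks

def reduction(u):
    """Fractional reduction in search points versus the conventional method."""
    return 1.0 - search_points(u) / search_points()
```

This reproduces the figures quoted in the paper: u = 3 gives the 6 instead of 16 positions mentioned in the abstract, and u = 1 gives the 87.5% reduction cited in the conclusion.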
A comparison of total encoding times would be less conclusive since the overall run times strongly depend on the degree of encoder optimization. HM, as reference software, for instance performs full RD mode decisions, which leads to a very high overall encoding time. An optimized encoder would spend less time on mode decisions by applying fast decision heuristics. Hence, the contribution of fractional motion search to the overall encoding time would be larger in an optimized encoder compared with HM. To provide a comparison of required computation time, the time spent on fractional sample ME was measured for each algorithm. Results obtained for all aforementioned algorithms have been averaged over all four quantization parameters [24]. The average results for all sequences are also illustrated in Figs. 6 and 7 in relation to coding efficiency. Speedup values obtained for the different classes in no case differed by more than 0.1 from these averages. The speedup expressed in the aforementioned tables and figures was obtained using the interpolation-and-search method as discussed in Section II-A as a baseline.

Fig. 7. Comparison of algorithms for the LDP configuration using all sequences listed in Table III and Table IV, averaged over all four quantization parameters in HM.

As can be seen, the pattern-based approaches considered in this simulation do not achieve a significant decrease in fractional sample ME time; however, they cause losses in BD performance. DField9 and the proposed context-based approach, on the other hand, yield a good tradeoff between RD performance and computational complexity, especially with respect to high-resolution sequences. From the results presented before, it becomes clear that the algorithm can indeed achieve performance comparable to the error surface approximation method.
The computational overhead caused by context calculation is negligible in comparison with the overall speedup. However, as has already been argued, the context-based approach can be implemented in a much more flexible manner, allowing for an adaptive decision regarding how many samples to consider.

VI. CONCLUSION AND FUTURE WORK

In this paper, a generic context-based fractional sample MV refinement framework and an example application for an HEVC encoder are presented. The proposed example algorithm reduces the 16 fractional sample search points of the interpolation-and-search method employed in the HEVC HM reference encoder to a few most promising ones by evaluating context information based on the neighboring integer sample distortions. When only the two most promising search points are considered, i.e., a search point reduction of 87.5%, coding efficiency losses between 1.3% and 2.1% BD-rate are observed for HD test sequences. Considering the six most promising search points, i.e., the three most promising for half sample MV refinement plus the three most promising for quarter sample MV refinement, coding efficiency losses range between 0.3% and 0.4% BD-rate. Compared with state-of-the-art pattern-based fast fractional sample refinements, the proposed example algorithm always provides a better tradeoff in terms of search point reduction versus coding efficiency loss. When comparing the example algorithm with fractional sample MV refinement by error surface approximation, the advantage of the proposed framework lies in its adaptivity, which allows operating points with lower coding efficiency loss at the cost of decreased speedup. For future work, this adaptivity is of particular interest, e.g., by determining the number of promising search points to be evaluated based on an optimal tradeoff between the coding efficiency loss and/or a given budget of computation time.
For instance, in addition to defining a fixed upper bound on the number of promising search points, a minimal performance gain can be defined. Then, the most promising fractional sample positions given a certain context are only considered until the upper bound is reached or the current performance gain falls below the defined minimal performance gain. Also, in order to meet real-time processing constraints, an adaptive time-constrained parameter may further be employed. In this strategy, an additional search point is considered only when it is worth investing the additional processing time in terms of computational complexity. Besides adaptive termination, further investigation is encouraged to improve the context model ctx(C), which might lead to a better prediction of promising search points. Finally, the number of fractional sample positions to be searched in the proposed framework is constant. This is of particular interest for hardware implementations, which always consider the worst case, i.e., the maximum number of search positions. Hence, the investigation of a corresponding hardware design is another interesting topic for future work.

REFERENCES

[1] F. Bossen, B. Bross, K. Sühring, and D. Flynn, "HEVC complexity and implementation analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1685–1696, Dec. 2012.
[2] B. Bross, H. Schwarz, and D. Marpe, "The new High-Efficiency Video Coding standard," SMPTE Motion Imag. J., vol. 122, no. 4, pp. 25–35, 2013.
[3] D. Marpe et al., "Video compression using nested quadtree structures, leaf merging and improved techniques for motion representation and entropy coding," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 12, pp. 1676–1687, Dec. 2010.
[4] P. Helle et al., "Block merging for quadtree-based partitioning in HEVC," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1720–1731, Dec. 2012.
[5] G. J. Sullivan and T.
Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74–90, Nov. 1998.
[6] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[7] J. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, and T. Wiegand, "Comparison of the coding efficiency of video coding standards—Including High Efficiency Video Coding (HEVC)," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1669–1684, Dec. 2012.
[8] D. Grois, D. Marpe, A. Mulayoff, B. Itzhaky, and O. Hadar, "Performance comparison of H.265/MPEG-HEVC, VP9, and H.264/MPEG-AVC encoders," in Proc. Picture Coding Symp., Dec. 2013, pp. 394–397.
[9] D. Grois, D. Marpe, T. Nguyen, and O. Hadar, "Comparative assessment of H.265/MPEG-HEVC, VP9, and H.264/MPEG-AVC encoders for low-delay video applications," Proc. SPIE, vol. 9217, p. 92170Q, Sep. 2014.
[10] B. Bross et al., "HEVC performance and complexity for 4K video," in Proc. IEEE 3rd Int. Conf. Consum. Electron. (ICCE), Berlin, Germany, Sep. 2013, pp. 44–47.
[11] H. A. Choudhury and M. Saikia, "Survey on block matching algorithms for motion estimation," in Proc. Int. Conf. Commun. Signal Process. (ICCSP), Apr. 2014, pp. 036–040.
[12] G. Maier et al., "Pattern-based integer sample motion search strategies in the context of HEVC," Proc. SPIE, vol. 9599, p. 95991A, Sep. 2015.
[13] D. Kang, Y. Kang, and Y. Hong, "VLSI implementation of fractional motion estimation interpolation for High Efficiency Video Coding," Electron. Lett., vol. 51, no. 15, pp. 1163–1165, 2015.
[14] G. Sanchez, M. Corrêa, D. Noble, M. Porto, S. Bampi, and L.
Agostini, "Hardware design focusing in the tradeoff cost versus quality for the H.264/AVC fractional motion estimation targeting high definition videos," Analog Integr. Circuits Signal Process., vol. 73, no. 3, pp. 931–944, 2012.
[15] G. He, D. Zhou, Y. Li, Z. Chen, T. Zhang, and S. Goto, "High-throughput power-efficient VLSI architecture of fractional motion estimation for ultra-HD HEVC video encoding," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 12, pp. 3138–3142, Dec. 2015.
[16] Y. Zhang, W.-C. Siu, and T. Shen, "Fast sub-pixel motion estimation based on directional information and adaptive block classification," in Proc. 5th Int. Conf. Vis. Inf. Eng. (VIE), Jul./Aug. 2008, pp. 622–627.
[17] H. Nisar and T.-S. Choi, "Fast and efficient fractional pixel motion estimation for H.264/AVC video coding," in Proc. 16th IEEE Int. Conf. Image Process. (ICIP), Nov. 2009, pp. 1561–1564.
[18] Z. Wei, F. Fen, W. Xiaoyang, and Z. Weile, "Directionality based fast fractional pel motion estimation for H.264," J. Syst. Eng. Electron., vol. 20, no. 3, pp. 457–462, Jun. 2009.
[19] T. Sotetsumoto, T. Song, and T. Shimamoto, "Low complexity algorithm for sub-pixel motion estimation of HEVC," in Proc. IEEE Int. Conf. Signal Process., Commun. Comput. (ICSPCC), Aug. 2013, pp. 1–4.
[20] J. W. Suh and J. Jeong, "Fast sub-pixel motion estimation techniques having lower computational complexity," IEEE Trans. Consum. Electron., vol. 50, no. 3, pp. 968–973, Aug. 2004.
[21] W. Lin, K. Panusopone, D. M. Baylon, M.-T. Sun, Z. Chen, and H. Li, "A fast sub-pixel motion estimation algorithm for H.264/AVC video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 2, pp. 237–242, Feb. 2011.
[22] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 620–636, Jul. 2003.
[23] G.
Bjøntegaard, Calculation of Average PSNR Differences Between RD-Curves, document VCEG-M33, ITU-T, 2001.
[24] F. Bossen, Common Test Conditions and Software Reference Configurations, document JCTVC-L1100, Joint Collaborative Team on Video Coding (JCT-VC), 2013.
[25] Subversion Repository for the HEVC Test Model (HM) Reference Software, accessed on Nov. 29, 2013. [Online]. Available: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware
[26] X. Liyin, S. Xiuqin, and Z. Shun, "A review of motion estimation algorithms for video compression," in Proc. Int. Conf. Comput. Appl. Syst. Modeling (ICCASM), Taiyuan, China, 2010, pp. V2-446–V2-450.
[27] S. Ashwin, S. J. Sree, and S. A. Kumar, "Study of the contemporary motion estimation techniques for video coding," Int. J. Recent Technol. Eng., vol. 2, no. 1, pp. 190–194, Mar. 2013.

Georg Maier received the M.Sc. degree in computer science in the program game and media technology from Utrecht University, Utrecht, The Netherlands, in 2013. He was a Research Assistant with the Image and Video Coding Group, Fraunhofer Institute for Telecommunications–Heinrich Hertz Institute, Berlin, Germany, where he was involved in motion estimation and efficient implementations thereof for the H.265/MPEG High Efficiency Video Coding standard. Since 2014, he has been with the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (Fraunhofer IOSB), Karlsruhe, Germany, where he has been involved in visual inspection systems for the automated sorting of bulk goods. His research interests include different aspects of image processing, in particular algorithmic aspects, with a focus on real-time capabilities.

Benjamin Bross (S'11) received the Dipl.-Ing. degree in electrical engineering from RWTH Aachen University, Aachen, Germany, in 2008, with a focus on 3D image registration in medical imaging and on decoder side motion vector derivation in H.264/MPEG-4 Advanced Video Coding.
In 2009, he joined the Fraunhofer Institute for Telecommunications–Heinrich Hertz Institute, Berlin, Germany, where he is currently a Project Manager with the Video Coding and Analytics Department. Since the development of the new H.265/MPEG High Efficiency Video Coding (HEVC) standard, which started in 2010, he was very actively involved in the standardization process as a Technical Contributor and Coordinator of core experiments. In 2011, he was a part-time Lecturer with the HTW University of Applied Sciences, Berlin. In 2012, he was appointed as a Co-Chair of the Editing Ad Hoc Group and became the Chief Editor of the HEVC video coding standard. He is currently with the Fraunhofer Institute for Telecommunications–Heinrich Hertz Institute, Berlin, where he is responsible for developing HEVC conforming real-time encoders and decoders as well as investigating new video coding techniques for the next generation of video coding standards. He has authored or co-authored several fundamental HEVC-related publications, and authored two book chapters on HEVC and inter-picture prediction techniques in HEVC. Mr. Bross received the IEEE Best Paper Award at the 2013 IEEE International Conference on Consumer Electronics, Berlin, and the SMPTE Journal Certificate of Merit in 2014.

Dan Grois (SM'11) received the Ph.D. degree from the Communication Systems Engineering Department, Ben-Gurion University of the Negev (BGU), Beersheba, Israel, in 2011. From 2011 to 2013, he was a Senior Researcher with the Communication Systems Engineering Department, BGU. Since 2013, he has been a Post-Doctoral Senior Researcher with the Video Coding and Analytics Department, Fraunhofer Institute for Telecommunications–Heinrich Hertz Institute, Berlin, Germany. He has authored or co-authored about 40 publications in the area of image/video coding and data processing, which have been presented at top-tier international conferences and published in various scientific journals and books.
His research interests include image and video coding and processing, video coding standards, particularly H.265/MPEG High Efficiency Video Coding, region-of-interest scalability, computational complexity and bit-rate control, network communication and protocols, and future multimedia applications/systems. Dr. Grois is a member of the ACM and SPIE societies. He received various fellowships, including Kreitman Fellowships and the ERCIM Alain Bensoussan Fellowship, which was provided by the FP7 Marie Curie Actions COFUND Programme. He is currently a fellow of the PROVISION ITN Project, which is a part of the European Union's Marie Skłodowska-Curie Actions of the European Commission. He is a Referee of top-tier conferences and international journals, such as IEEE Transactions on Image Processing, IEEE Transactions on Multimedia, IEEE Transactions on Signal Processing, Journal of Visual Communication and Image Representation (Elsevier), IEEE Sensors, and Optical Engineering (SPIE). In 2013, he also served as a Guest Editor of Optical Engineering (SPIE).

Detlev Marpe (M'00–SM'08–F'15) received the Dipl.-Math. degree (Hons.) from the Technical University of Berlin, Berlin, Germany, in 1990, and the Dr.-Ing. degree from the University of Rostock, Rostock, Germany, in 2004. He joined the Fraunhofer Institute for Telecommunications–Heinrich Hertz Institute, Berlin, in 1999, where he is currently the Head of the Video Coding & Analytics Department and the Head of the Image and Video Coding Research Group. He was a major Technical Contributor to the entire process of the development of the H.264/MPEG-4 Advanced Video Coding (AVC) standard and the H.265/MPEG High Efficiency Video Coding (HEVC) standard, including several generations of major enhancement extensions.
In addition to the CABAC contributions for both standards, he particularly contributed to the Fidelity Range Extensions (which include the High Profile that received the Emmy Award in 2008) and the Scalable Video Coding Extensions of H.264/MPEG-4 AVC. During the recent development of its successor H.265/MPEG-HEVC, he also successfully contributed to the first model of the corresponding standardization project and further refinements. He also made successful proposals to the standardization of its Range Extensions and 3D Extensions. His academic work includes over 200 publications in image and video coding. He holds over 250 internationally issued patents and numerous patent applications in this field. His research interests include still image and video coding, signal processing for communications and computer vision, and information theory. Dr. Marpe is a member of the Informationstechnische Gesellschaft of the Verband der Elektrotechnik Elektronik Informationstechnik e.V. He was a co-recipient of two Technical Emmy Awards as a Key Contributor and a Co-Editor of the H.264/MPEG-4 AVC standard in 2008 and 2009, respectively. He received the IEEE Best Paper Award at the 2013 IEEE International Conference on Consumer Electronics, Berlin, and the SMPTE Journal Certificate of Merit in 2014. He was nominated for the German Future Prize in 2012. He was a recipient of the Karl Heinz Beckurts Award in 2011, the Best Paper Award of the IEEE Circuits and Systems Society in 2009, the Joseph von Fraunhofer Prize in 2004, and the Best Paper Award of the Informationstechnische Gesellschaft in 2004. As a Co-Founder of the Berlin-based daviko GmbH, he received the Prime Prize of the Multimedia Start-Up Competition of the German Federal Ministry of Economics and Technology in 2001. Since 2014, he has been an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology.

Heiko Schwarz received the Dipl.-Ing. degree in electrical engineering and the Dr.-Ing.
degree from the University of Rostock, Rostock, Germany, in 1996 and 2000, respectively. In 1999, he joined the Image and Video Coding Group, Fraunhofer Institute for Telecommunications–Heinrich Hertz Institute, Berlin, Germany. He has contributed successfully to the standardization activities of the ITU-T Video Coding Experts Group (ITU-T SG16/Q.6-VCEG) and the ISO/IEC Moving Pictures Experts Group (ISO/IEC JTC 1/SC 29/WG 11 - MPEG). Dr. Schwarz has been appointed as a Co-Editor of ITU-T H.264 and ISO/IEC 14496-10 and as a Software Coordinator for the SVC reference software. During the development of the scalable video coding extension of H.264/AVC, he co-chaired several ad hoc groups of the Joint Video Team of ITU-T VCEG and ISO/IEC MPEG, investigating particular aspects of the scalable video coding design.

Remco C. Veltkamp obtained an M.Sc. degree in computer science at Leiden University, and a Ph.D. degree in computer science at Erasmus University Rotterdam, The Netherlands. He is currently a Full Professor of Multimedia with Utrecht University, Utrecht, The Netherlands. He has authored over 150 refereed papers in reviewed journals and conferences, and supervised 15 Ph.D. theses. His research interests include the analysis, recognition and retrieval of, and interaction with, music, images, and 3D objects and scenes, in particular the algorithmic and experimentation aspects.

Thomas Wiegand (M'05–SM'08–F'11) received the Dipl.-Ing. degree in electrical engineering from the Technical University of Hamburg, Hamburg, Germany, in 1995 and the Dr.-Ing. degree from the University of Erlangen–Nuremberg, Erlangen, Germany, in 2000. He was a Visiting Researcher with Kobe University, Kobe, Japan; the University of California at Santa Barbara, Santa Barbara, CA, USA; and Stanford University, Stanford, CA, USA, where he also returned as a Visiting Professor. He was a Consultant with Skyfire, Inc., Mountain View, CA, USA.
Since 1995, he has been an active participant in standardization for multimedia with many successful submissions to the ITU Telecommunication Standardization Sector and the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC). In 2000, he was appointed as the Associate Rapporteur of the ITU-T Video Coding Experts Group. He is currently a Professor with the Department of Electrical Engineering and Computer Science, Technical University of Berlin, Berlin, Germany. He is also the Head of the Fraunhofer Institute for Telecommunications–Heinrich Hertz Institute, Berlin. He is also a Consultant with Vidyo, Inc., Hackensack, NJ, USA. Dr. Wiegand was a recipient of the ITU150 Award. Thomson Reuters named him in their list of the World's Most Influential Scientific Minds 2014 as one of the most cited researchers in his field. The projects that he co-chaired for the development of the H.264/Moving Pictures Experts Group-Advanced Video Coding standard have been recognized by the ATAS Primetime Emmy Engineering Award and a pair of NATAS Technology and Engineering Emmy Awards. For his research in video coding and transmission, he received numerous awards, including the Vodafone Innovations Award, the EURASIP Group Technical Achievement Award, the Eduard Rhein Technology Award, the Karl Heinz Beckurts Award, the IEEE Masaru Ibuka Technical Field Award, and the IMTC Leadership Award. He received multiple best paper awards for his publications. From 2005 to 2009, he was the Co-Chair of ISO/IEC MPEG Video.