Joint Video Team (MPEG+ITU) Document

By Jerry Taylor,2014-10-25 09:43
Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG

(ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6)

8th Meeting: Geneva, May 20-26, 2003

Document: JVT-H014

Filename: JVT-H014.doc

Title: Adaptive Rate Control with HRD Consideration

Status: Input Document to JVT

Purpose: Proposal

Author(s) or Contact(s): Zhengguo Li, Wen Gao, Feng Pan, Siwei Ma, Keng Pang Lim, Genan Feng, Xiao Lin, Susanto Rahardja, Hanqin Lu and Yan Lu

Tel:

Email:

Source: Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, China; Institute for InfoComm Research, 21 Heng Mui Keng Terrace, Singapore, 119613

    _____________________________


    1. Introduction

An encoder employs rate control to regulate the varying bit rate characteristics of the coded bitstream so as to produce high-quality decoded frames at a given target bit rate. Rate control is thus a necessary part of an encoder and has been widely studied for standards such as MPEG 2, MPEG 4 and H.263 [1,2,3,4,5]. However, rate control for JVT is more difficult than for those standards [11,12,13]. This is because the quantization parameters are used in both the rate control algorithm and rate distortion optimization (RDO), which results in the following chicken and egg dilemma: to perform RDO for the macroblocks (MBs) in the current frame, a quantization parameter must first be determined for each MB by using the mean absolute difference (MAD) of the current frame or MB [1,2,7,9]. However, the MAD of the current frame or MB is only available after the RDO. Moreover, the available channel bandwidth for the coding process can be either constant or time varying. We thus need to consider both the constant bit rate (CBR) case and the variable bit rate (VBR) case, whereas the existing schemes focus on the CBR case [1,2,3,4].

In this proposal, we present an adaptive rate control scheme for JVT by introducing the concept of a basic unit and a linear model. The basic unit can be a frame, a slice, or an MB. The linear model is used to predict the MAD of the current basic unit in the current frame from that of the basic unit in the co-located position of the previous frame. The chicken and egg dilemma is solved as follows: the target bits for the current frame are computed by adopting a leaky bucket model and linear tracking theory according to the predefined frame rate, the current buffer occupancy, the target buffer level and the available channel bandwidth [5]. To conform with the hypothetical reference decoder (HRD), the target bits are further bounded. The target bits are allocated equally to all non-coded basic units in the current frame because the MADs of non-coded basic units are not known. The MAD of the current basic unit is predicted by the linear model using the actual MAD of the basic unit in the co-located position of the previous frame. A quadratic rate-distortion (R-D) model [1,2] is used to calculate the corresponding quantization parameter, which is then used for the rate distortion optimization of each MB in the current basic unit. We focus on the VBR case, while our scheme performs equally well in the CBR case.

File:299902603.doc Page: 1 Date Saved: 2011-12-03

To verify our scheme, we test it in both the VBR case and the CBR case. The bit rate curve for the VBR case is a predefined curve. It is shown that the number of actually generated bits is kept close to the bit rate curve and that the buffer neither underflows nor overflows. For the CBR case, we compare the coding efficiency of an encoder using our rate control scheme with that of an encoder using a fixed quantization parameter. The target bit rate is generated by coding a test sequence with a fixed quantization parameter. The computed rate is then specified in the encoder using our rate control scheme. The coding efficiency is improved by up to 0.99 dB by our scheme, and the average PSNR over all test sequences is improved by 0.32 dB.

    2. Preliminary Knowledge

    In this section, we shall present the problem associated with the rate control for H.264.

    2.1 The Chicken and Egg Dilemma

The coding process of an MB related to the rate control is given by

Rate Control → Quantization Parameter → RDO → MAD → Coding

Since quantization parameters are specified in both rate control and RDO, there exists a problem when the rate control is implemented: to perform RDO for an MB, a quantization parameter must first be determined for the MB by using the MAD of the MB. However, the MAD of the current MB is only available after performing the RDO. This is a typical chicken and egg dilemma. Because of this, the rate control for H.264 is more difficult than those for MPEG 2, MPEG 4 and H.263. To study the rate control for H.264, we need to solve the problem of estimating the MAD of the current MB. Besides this, we also need to compute a target bit rate for the current MB and to determine the number of contiguous MBs that share the same quantization parameter. To solve these problems, we need the following preliminary knowledge.


    2.2 Definition of A Basic Unit

The concept of a basic unit is defined by

Definition 1 Suppose that a frame is composed of $N_{mbpic}$ MBs. A basic unit is defined to be a group of contiguous MBs which is composed of $N_{mbunit}$ MBs, where $N_{mbunit}$ is a fraction of $N_{mbpic}$.

Denote the total number of basic units in a frame by $N_{unit}$, which is computed by

$$N_{unit} = \frac{N_{mbpic}}{N_{mbunit}} \tag{1}$$

Examples of a basic unit can be an MB, a slice, a field, or a frame. For example, consider a video sequence with QCIF size, for which $N_{mbpic}$ is 99. According to Definition 1, $N_{mbunit}$ can be 1, 3, 9, 11, 33, or 99. The corresponding $N_{unit}$ is 99, 33, 11, 9, 3, and 1, respectively.

    It is noted that by employing a bigger basic unit, a higher PSNR can be achieved while the bit fluctuation is also bigger. On the other hand, by using a small basic unit, the bit fluctuation is less severe, but with slight loss in PSNR.
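The admissible basic unit sizes of Definition 1 can be enumerated with a few lines of Python. This is only an illustrative sketch: the function name `basic_unit_sizes` is hypothetical, and it assumes that "a fraction of" the frame means a size that divides the number of MBs per frame exactly, which is the reading suggested by the QCIF example.

```python
def basic_unit_sizes(n_mbpic: int) -> list[tuple[int, int]]:
    """Return (MBs per basic unit, basic units per frame) pairs.

    Assumption: a valid basic unit size divides the MB count of the frame
    exactly, so the second element is Equation (1) evaluated for that size.
    """
    return [(n, n_mbpic // n) for n in range(1, n_mbpic + 1) if n_mbpic % n == 0]


# QCIF: 99 MBs per frame
print(basic_unit_sizes(99))
```

For QCIF the sizes returned are 1, 3, 9, 11, 33 and 99 MBs, matching the list in the text.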

    2.3 A Fluid Flow Traffic Model

We shall now present a fluid flow traffic model to compute the target bit for the current coding frame. Let $N_{gop}$ denote the total number of frames in a group of pictures (GOP), $n_{i,j}$ ($i = 1, 2, \ldots$; $j = 1, 2, \ldots, N_{gop}$) denote the $j$th frame in the $i$th GOP, and $B_c(n_{i,j})$ denote the occupancy of the virtual buffer after coding the $j$th frame. We then have

$$\begin{aligned} B_c(n_{i,j+1}) &= B_c(n_{i,j}) + b(n_{i,j}) - \frac{u(n_{i,j})}{F_r} \\ B_c(n_{1,1}) &= 0 \\ B_c(n_{i+1,0}) &= B_c(n_{i,N_{gop}}) \end{aligned} \tag{2}$$

where $b(n_{i,j})$ is the number of bits generated by the $j$th frame in the $i$th GOP, $u(n_{i,j})$ is the available channel bandwidth, which can be either VBR or CBR, and $F_r$ is the predefined frame rate.
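As a rough illustration of the buffer recursion in Equation (2), the sketch below steps the virtual buffer through a few frames. The function names and the sample numbers are hypothetical; the only thing taken from the text is the update rule (add the bits of the coded frame, subtract the bits drained by the channel during one frame interval).

```python
def update_buffer(occupancy: float, frame_bits: float,
                  bandwidth: float, frame_rate: float) -> float:
    """One step of the fluid flow model: B_c + b - u / F_r."""
    return occupancy + frame_bits - bandwidth / frame_rate


def simulate(frame_bits, bandwidths, frame_rate, start_occupancy=0.0):
    """Return the buffer occupancy after each coded frame.

    bandwidths holds one value per frame, so a time-varying (VBR) channel
    is handled the same way as a constant (CBR) one.
    """
    levels = []
    occupancy = start_occupancy
    for bits, u in zip(frame_bits, bandwidths):
        occupancy = update_buffer(occupancy, bits, u, frame_rate)
        levels.append(occupancy)
    return levels


# 64 kbit/s at 16 frames/s drains 4000 bits per frame interval.
levels = simulate([6000, 2000, 2000], [64000, 64000, 64000], 16.0)
```

The model itself does not clamp the occupancy, so a negative value in such a simulation indicates that the frames used fewer bits than the channel could carry.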

    2.4 A Linear Model for MAD Prediction

We now introduce a linear model to predict the MAD of the current basic unit in the current frame from that of the basic unit in the co-located position of the previous frame. Suppose that the predicted MAD of the current basic unit in the current frame and the actual MAD of the basic unit in the co-located position of the previous frame are denoted by $MAD_{cb}$ and $MAD_{pb}$, respectively. The linear prediction model is then given by

$$MAD_{cb} = a_1 \times MAD_{pb} + a_2 \tag{3}$$

where $a_1$ and $a_2$ are the two coefficients of the prediction model. The initial values of $a_1$ and $a_2$ are set to 1 and 0, respectively. They are updated after coding each basic unit. The linear model (3) is proposed to solve the chicken and egg dilemma.
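A minimal sketch of the linear MAD predictor of Equation (3) follows. The text states only that the two coefficients start at 1 and 0 and are updated after each basic unit; the ordinary least-squares fit over a sliding window used here is an assumption made for illustration, as are the class name `MadPredictor` and the window length.

```python
class MadPredictor:
    """Predict the MAD of the current basic unit from the co-located one."""

    def __init__(self, window: int = 20):
        self.a1, self.a2 = 1.0, 0.0   # initial values given in the text
        self.window = window          # assumed history length
        self.history = []             # (previous MAD, actual MAD) pairs

    def predict(self, mad_prev: float) -> float:
        return self.a1 * mad_prev + self.a2

    def update(self, mad_prev: float, mad_actual: float) -> None:
        """Refit a1, a2 by least squares (assumed update rule)."""
        self.history.append((mad_prev, mad_actual))
        self.history = self.history[-self.window:]
        n = len(self.history)
        sx = sum(x for x, _ in self.history)
        sy = sum(y for _, y in self.history)
        sxx = sum(x * x for x, _ in self.history)
        sxy = sum(x * y for x, y in self.history)
        denom = n * sxx - sx * sx
        if n >= 2 and abs(denom) > 1e-12:   # keep old values if the fit is degenerate
            self.a1 = (n * sxy - sx * sy) / denom
            self.a2 = (sy - self.a1 * sx) / n


predictor = MadPredictor()
first_guess = predictor.predict(4.0)   # equals the previous MAD before any update
```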

    2.5 HRD Consideration

In this subsection, a lower bound and an upper bound for the target bits of each frame are determined by considering the hypothetical reference decoder (HRD) [6][7]. The lower bound and the upper bound for the frame $n_{i,j}$ are denoted by $L(n_{i,j})$ and $U(n_{i,j})$, respectively. It is also shown that conformance to the HRD is ensured if the actual frame size is always within the bounds $[L(n_{i,j}), U(n_{i,j})]$.

Let $t_r(n_{i,j})$ denote the removal time of the $j$th frame in the $i$th GOP. Also let $b_e(t)$ be the bit equivalent of a time $t$, with the conversion factor being the buffer arrival rate [8]. The initial values of the upper bound and the lower bound are given as follows:

$$\begin{aligned} L(n_{i,1}) &= T_r(n_{i,0}) + \frac{u(n_{i,0})}{F_r} \\ U(n_{i,1}) &= \left( T_r(n_{i,0}) + b_e(t_r(n_{1,1})) \right) \times \varpi \end{aligned} \tag{4}$$

where $T_r(n_{i,0})$ is the remaining bits of the $(i-1)$th GOP and $T_r(n_{1,0}) = 0$. The value of $\varpi$ is 0.9.

$L(n_{i,j})$ and $U(n_{i,j})$ ($i = 1, 2, \ldots$; $j = 2, \ldots, N_{gop}$) are computed iteratively as follows:

$$\begin{aligned} L(n_{i,j}) &= L(n_{i,j-1}) + \frac{u(n_{i,j-1})}{F_r} - b(n_{i,j-1}) \\ U(n_{i,j}) &= U(n_{i,j-1}) + \left( \frac{u(n_{i,j-1})}{F_r} - b(n_{i,j-1}) \right) \times \varpi \end{aligned} \tag{5}$$

where $B_s$ is the buffer size of the CPB; its maximum value is determined by the level and the profile [7].
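The bound recursion of Equations (4) and (5) can be sketched as below. This is a hedged reading rather than a reference implementation: the function names are hypothetical, the scaling constant 0.9 is passed as `scale`, and the bit equivalent of the first removal time is taken as a plain input (`be_removal`) because its computation is not spelled out in this section.

```python
def initial_bounds(t_r0: float, bandwidth: float, frame_rate: float,
                   be_removal: float, scale: float = 0.9):
    """Bounds for the first frame of a GOP (one reading of Equation (4)).

    t_r0       : bits left over from the previous GOP (0 for the first GOP)
    be_removal : bit equivalent of the first removal time (assumed input)
    """
    lower = t_r0 + bandwidth / frame_rate
    upper = (t_r0 + be_removal) * scale
    return lower, upper


def next_bounds(lower: float, upper: float, bandwidth_prev: float,
                frame_rate: float, bits_prev: float, scale: float = 0.9):
    """Bounds for the next frame (one reading of Equation (5))."""
    delta = bandwidth_prev / frame_rate - bits_prev
    return lower + delta, upper + delta * scale


lo, hi = initial_bounds(0.0, 64000.0, 16.0, 32000.0)
lo, hi = next_bounds(lo, hi, 64000.0, 16.0, 5000.0)
```

In this reading, a frame that uses more bits than one frame interval of channel capacity lowers both bounds for the following frame.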

With the concept of the basic unit and models (2) and (3), the steps in our scheme are given as follows:

1. Compute a target bit for the current frame by using the fluid flow traffic model (2), linear tracking theory [5] and the bounds (4) and (5).

2. Allocate the remaining bits to all non-coded basic units in the current frame equally.

3. Predict the MAD of the current basic unit in the current frame by the linear model (3) using the actual MAD of the basic unit in the co-located position of the previous frame.

4. Compute the corresponding quantization parameter by using the quadratic R-D model [1,2].

5. Perform RDO for each MB in the current basic unit by the quantization parameter derived from step 4 [8,9].

    Our proposed rate control scheme is composed of two layers: GOP layer rate control and frame layer rate control if the basic unit is selected as a frame. Otherwise, an additional basic unit layer rate control should be added. They will be presented in detail in the following sections.

    3. GOP Layer Rate Control

In this layer, we need to compute the total number of remaining bits $T_r$ for all non-coded frames in each GOP and to determine the starting quantization parameter of each GOP. Same as [5], we assume that the GOP structure is IBBPBBP...P or IPPP...P, with I being an intra-coded picture, P being a forward predicted picture and B being a bi-directional predicted picture. The length of a GOP is usually 15-30 [4].

    3.1 Total Number of Bits

At the beginning of the $i$th GOP, the total number of bits allocated for the $i$th GOP is computed as follows:

$$T_r(n_{i,0}) = \frac{u(n_{i,1})}{F_r} \times N_{gop} - B_c(n_{i-1,N_{gop}}) \tag{6}$$

Since the channel bandwidth may vary at any time, $T_r$ is updated frame by frame as follows:

$$T_r(n_{i,j}) = T_r(n_{i,j-1}) + \frac{u(n_{i,j}) - u(n_{i,j-1})}{F_r} \times (N_{gop} - j) - b(n_{i,j-1}) \tag{7}$$

In the case of CBR, i.e. $u(n_{i,j}) = u(n_{i,j-1})$, Equation (7) is simplified as

$$T_r(n_{i,j}) = T_r(n_{i,j-1}) - b(n_{i,j-1}) \tag{8}$$

In other words, Equation (7) is also applicable to the CBR case.
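The GOP budget bookkeeping of Equations (6) to (8) can be sketched as follows. The function names and the numbers are illustrative assumptions; the arithmetic follows the reading in which the budget is the channel capacity over the GOP minus the buffer occupancy left by the previous GOP, and the frame-by-frame update adds the capacity change over the remaining frames and subtracts the bits just spent.

```python
def gop_budget(bandwidth: float, frame_rate: float, n_gop: int,
               prev_occupancy: float) -> float:
    """Bits available to a new GOP (reading of Equation (6))."""
    return bandwidth / frame_rate * n_gop - prev_occupancy


def update_remaining(t_r_prev: float, u_cur: float, u_prev: float,
                     frame_rate: float, n_gop: int, j: int,
                     bits_prev: float) -> float:
    """Remaining bits before coding frame j (reading of Equation (7)).

    With u_cur == u_prev the bandwidth term vanishes and the update
    reduces to the CBR form of Equation (8).
    """
    return t_r_prev + (u_cur - u_prev) / frame_rate * (n_gop - j) - bits_prev


budget = gop_budget(64000.0, 16.0, 30, 2000.0)
cbr = update_remaining(budget, 64000.0, 64000.0, 16.0, 30, 1, 9000.0)
vbr = update_remaining(budget, 80000.0, 64000.0, 16.0, 30, 1, 9000.0)
```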

    3.2 Starting Quantization Parameter of Each GOP

In our scheme, the starting quantization parameter of the first GOP is a predefined quantization parameter $QP_0$. The I frame and the first P frame of the GOP are coded by $QP_0$. $QP_0$ is predefined based on the available channel bandwidth and the GOP length. Normally, a small $QP_0$ should be chosen if the available channel bandwidth is high and a big $QP_0$ should be used if it is low. Under the same bandwidth, $QP_0$ reduces by 1 if the GOP length increases by 30. $QP_0$ can be set manually or computed by [6]:

$$QP_0 = \begin{cases} 40 & bpp \le l_1 \\ 30 & l_1 < bpp \le l_2 \\ 20 & l_2 < bpp \le l_3 \\ 10 & bpp > l_3 \end{cases} \tag{9}$$

$$bpp = \frac{u(n_{0,0})}{F_r \, N_{pixel}}$$

where $bpp$ is the bits per pixel and $N_{pixel}$ is the number of pixels in a frame. The recommended values are $l_1 = 0.15$, $l_2 = 0.45$, $l_3 = 0.9$ for QCIF/CIF and $l_1 = 0.6$, $l_2 = 1.4$, $l_3 = 2.4$ for image sizes bigger than CIF.

The starting quantization parameter $QP_{st}$ of other GOPs is computed by

$$QP_{st} = \frac{Sum_{PQP}}{N_p} - \frac{8 \times T_r(n_{i-1,N_{gop}})}{T_r(n_{i,0})} - \min\left\{ 2, \frac{N_{gop}}{15} \right\} \tag{10}$$

where $N_p$ is the total number of P frames in the previous GOP and $Sum_{PQP}$ is the sum of the quantization parameters for all P frames in the previous GOP. Same as $QP_0$, $QP_{st}$ is adaptive to the GOP length and the available channel bandwidth.

The I frame and the first P frame are coded using $QP_{st}$.
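The two starting quantization parameters can be sketched as below, assuming the threshold reading of Equation (9) (values 40, 30, 20, 10 selected by bits per pixel) and a reading of Equation (10) in which both correction terms are subtracted from the average P-frame quantization parameter. The function names are hypothetical, and rounding the result of Equation (10) to the nearest integer is an assumption, since the text does not say how a fractional value is handled.

```python
def initial_qp(bandwidth: float, frame_rate: float, n_pixel: int,
               larger_than_cif: bool = False) -> int:
    """Starting QP of the first GOP from the bits per pixel."""
    l1, l2, l3 = (0.6, 1.4, 2.4) if larger_than_cif else (0.15, 0.45, 0.9)
    bpp = bandwidth / (frame_rate * n_pixel)
    if bpp <= l1:
        return 40
    if bpp <= l2:
        return 30
    if bpp <= l3:
        return 20
    return 10


def gop_start_qp(sum_pqp: float, n_p: int, n_gop: int,
                 t_r_prev_end: float, t_r_start: float) -> int:
    """Starting QP of a later GOP (assumed sign convention and rounding)."""
    qp = sum_pqp / n_p - 8 * t_r_prev_end / t_r_start - min(2, n_gop / 15)
    return round(qp)


qcif_pixels = 176 * 144
qp0 = initial_qp(64000.0, 15.0, qcif_pixels)    # bpp is about 0.17
qp_st = gop_start_qp(280, 10, 30, 0.0, 120000.0)
```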


    4. Frame Layer Rate Control

    The frame layer rate control scheme consists of two stages: pre-encoding and post-encoding.

    4.1 Pre-Encoding Stage:

The objective of this stage is to compute the quantization parameters of all frames. We shall first provide a simple method to compute the quantization parameters of B frames.

     4.1.1 Quantization parameters of B frames

    Since B frames are not used to predict any other frame, the quantization parameters can be greater than those of their adjacent P or I frames such that the bits could be saved for I and P frames. On the other hand, to maintain the smoothness of visual quality, the difference between the quantization parameters of two adjacent frames should not be greater than 2. Based on the observations, the quantization parameters of B frames are obtained through a linear interpolation method as follows:

Suppose that the number of successive B frames between two P frames is $L$ and the quantization parameters of the two P frames are $QP_1$ and $QP_2$, respectively. The quantization parameter of the $i$th B frame is calculated according to the following two cases:

Case 1. $L = 1$. In other words, there is only one B frame between two P frames. The quantization parameter is computed by

$$\widetilde{QB}_1 = \begin{cases} \dfrac{QP_1 + QP_2 + 2}{2} & \text{if } QP_1 \ne QP_2 \\ QP_1 + 2 & \text{otherwise} \end{cases} \tag{11}$$

Case 2. $L > 1$. In other words, there is more than one B frame between two P frames. The quantization parameters are computed by

$$\widetilde{QB}_i = QP_1 + \alpha + \max\left\{ \min\left\{ \frac{QP_2 - QP_1}{L - 1},\ 2 \times (i-1) \right\},\ -2 \times (i-1) \right\} \tag{12}$$

where $\alpha$ is the difference between the quantization parameter of the first B frame and $QP_1$, and is given by

$$\alpha = \begin{cases} -3 & QP_2 - QP_1 \le -2 \times L - 3 \\ -2 & QP_2 - QP_1 = -2 \times L - 2 \\ -1 & QP_2 - QP_1 = -2 \times L - 1 \\ 0 & QP_2 - QP_1 = -2 \times L \\ 1 & QP_2 - QP_1 = -2 \times L + 1 \\ 2 & \text{otherwise} \end{cases} \tag{13}$$

The case $QP_2 - QP_1 < -2 \times L + 1$ can only occur at the time instant at which the video sequence switches from one GOP to another GOP.

The final quantization parameter $QB_i$ is further adjusted by

$$QB_i = \min\{ \max\{ \widetilde{QB}_i, 1 \}, 51 \} \tag{14}$$
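The B-frame interpolation of Equations (11) to (14) can be sketched in Python as below. The function name `b_frame_qps` is hypothetical, and the use of floor division for the two quotients is an assumption, since the text does not state how non-integer results are rounded.

```python
def b_frame_qps(qp1: int, qp2: int, num_b: int) -> list[int]:
    """QPs of the num_b B frames between two P frames coded with qp1, qp2."""
    if num_b == 1:
        # Case 1: a single B frame
        raw = [(qp1 + qp2 + 2) // 2 if qp1 != qp2 else qp1 + 2]
    else:
        # Case 2: several B frames; alpha is the offset of the first one
        diff = qp2 - qp1
        if diff <= -2 * num_b - 3:
            alpha = -3
        elif diff == -2 * num_b - 2:
            alpha = -2
        elif diff == -2 * num_b - 1:
            alpha = -1
        elif diff == -2 * num_b:
            alpha = 0
        elif diff == -2 * num_b + 1:
            alpha = 1
        else:
            alpha = 2
        step = diff // (num_b - 1)   # assumed integer rounding
        raw = [qp1 + alpha + max(min(step, 2 * (i - 1)), -2 * (i - 1))
               for i in range(1, num_b + 1)]
    # Final clipping to the valid QP range [1, 51]
    return [min(max(q, 1), 51) for q in raw]


print(b_frame_qps(28, 30, 2))
```

For the first B frame the max/min term is always zero, so its quantization parameter is simply the first P-frame value plus the offset.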

4.1.2 Quantization Parameters of P Frames

    The quantization parameters of P frames are computed via the following two steps:

    Step 1 Determine a target bit for each P frame. Step 1 is composed of the following two sub-steps.

    The bits allocated to the current picture should be adjusted according to the current buffer occupancy and the picture complexity.

Step 1.1 Determination of target buffer occupancy.

    We predefine a target buffer level for each frame according to the frame sizes of the first I frame and the first P frame, and the average picture complexity. The function of the target buffer level is to compute a target bit for each P frame, which is then used to compute the quantization parameter. Since the quantization parameter of the first P frame is given at the GOP layer, we only need to predefine target buffer levels for other P frames in each GOP.

After coding the first P frame in the $i$th GOP, we reset the initial value of the target buffer level as

$$Tbl(n_{i,2}) = B_c(n_{i,2}) \tag{15}$$

where $B_c(n_{i,2})$ is the actual buffer occupancy after coding the first P frame in the $i$th GOP.

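The target buffer level for the subsequent P frames is determined by

$$Tbl(n_{i,j+1}) = Tbl(n_{i,j}) - \frac{Tbl(n_{i,2})}{N_p - 1} + \frac{\tilde{W}_p(n_{i,j}) \times (L + 1) \times u(n_{i,j})}{F_r \times \left( \tilde{W}_p(n_{i,j}) + \tilde{W}_b(n_{i,j}) \times L \right)} - \frac{u(n_{i,j})}{F_r} \tag{16}$$

where $\tilde{W}_p(n_{i,j})$ is the average complexity weight of P pictures, $\tilde{W}_b(n_{i,j})$ is the average complexity weight of B pictures, and $Tbl(n_{i,j})$ is the target buffer level. $\tilde{W}_p$ and $\tilde{W}_b$ are computed by

$$\begin{aligned} \tilde{W}_p(n_{i,j}) &= \frac{W_p(n_{i,j})}{8} + \frac{7 \times \tilde{W}_p(n_{i,j-1})}{8} \\ \tilde{W}_b(n_{i,j}) &= \frac{W_b(n_{i,j})}{8} + \frac{7 \times \tilde{W}_b(n_{i,j-1})}{8} \\ W_p(n_{i,j}) &= b(n_{i,j}) \times QP_p(n_{i,j}) \\ W_b(n_{i,j}) &= \frac{b(n_{i,j}) \times QP_b(n_{i,j})}{1.3636} \end{aligned} \tag{17}$$

$QP_p$ and $QP_b$ are the corresponding quantization parameters.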

In the case that there is no B frame between two P frames, i.e. $L = 0$, Equation (16) can be simplified as

$$Tbl(n_{i,j+1}) = Tbl(n_{i,j}) - \frac{Tbl(n_{i,2})}{N_p - 1} \tag{18}$$

It can be easily shown that $Tbl(n_{i,N_{gop}})$ is about 0. Thus, if the actual buffer fullness is exactly the same as the predefined target buffer level, it can be ensured that each GOP uses its own budget. However, since the rate-distortion (R-D) model and the MAD prediction model are not accurate [1,2], there usually exists a difference between the actual buffer fullness and the target buffer level. We therefore need to compute a target bit for each frame to reduce the difference between the actual buffer fullness and the target buffer level. This is achieved by the following microscopic control.
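The target buffer level recursion of Equations (16) to (18) can be illustrated with the sketch below. Function and parameter names are hypothetical. The example checks numerically that, with no B frames, the level falls linearly from its initial value to roughly zero after the remaining P frames of the GOP.

```python
def smooth_weight(avg_prev: float, current: float) -> float:
    """Running average of a complexity weight: 1/8 new value, 7/8 history."""
    return current / 8 + 7 * avg_prev / 8


def next_target_level(tbl: float, tbl_first: float, n_p: int,
                      w_p: float, w_b: float, num_b: int,
                      bandwidth: float, frame_rate: float) -> float:
    """Target buffer level for the next P frame.

    tbl_first is the level set after the first P frame of the GOP, n_p the
    number of P frames in the GOP, num_b the number of B frames between two
    P frames. With num_b == 0 the last two terms cancel.
    """
    per_frame = bandwidth / frame_rate
    correction = w_p * (num_b + 1) * per_frame / (w_p + w_b * num_b) - per_frame
    return tbl - tbl_first / (n_p - 1) + correction


# No B frames, 10 P frames: nine updates bring the level from 9000 to about 0.
level = 9000.0
for _ in range(9):
    level = next_target_level(level, 9000.0, 10, 1000.0, 0.0, 0, 64000.0, 16.0)
```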

Step 1.2 Microscopic control (target bit rate computation).

Using linear tracking theory [8], the target bits allocated for the $j$th frame in the $i$th GOP are determined based on the target buffer level, the frame rate, the available channel bandwidth and the actual buffer occupancy as follows:

$$\tilde{f}(n_{i,j}) = \frac{u(n_{i,j})}{F_r} + \gamma \times \left( Tbl(n_{i,j}) - B_c(n_{i,j}) \right) \tag{19}$$

where $\gamma$ is a constant whose typical value is 0.5 when there is no B frame and 0.25 otherwise. If the actual number of generated bits is around the target, it can be easily shown that

$$B_c(n_{i,j+1}) - Tbl(n_{i,j+1}) \approx (1 - \gamma) \times \left( B_c(n_{i,j}) - Tbl(n_{i,j}) \right) \tag{20}$$

Therefore, a tight buffer regulation can be achieved by choosing a large $\gamma$.

Meanwhile, the number of remaining bits should also be considered when the target bit is computed.

$$\hat{f}(n_{i,j}) = \frac{W_p(n_{i,j-1}) \times T_r(n_{i,j})}{W_p(n_{i,j-1}) \times N_{p,r}(j-1) + W_b(n_{i,j-1}) \times N_{b,r}(j-1)} \tag{21}$$

If the last frame is complex and uses excessive bits, more bits should be assigned to this frame.

The target bit is a weighted combination of $\tilde{f}(n_{i,j})$ and $\hat{f}(n_{i,j})$:

$$f(n_{i,j}) = \beta \times \hat{f}(n_{i,j}) + (1 - \beta) \times \tilde{f}(n_{i,j}) \tag{22}$$

where $\beta$ is a constant whose typical value is 0.75 when there is no B frame and 0.9 otherwise.

Normally, most frames overuse their budgets. To smooth the visual quality, the target bit is further adjusted by

$$f(n_{i,j}) = (1 - L \times 0.05) \times f(n_{i,j}) \tag{23}$$

To conform with the HRD, the target bit is further bounded by

$$\begin{aligned} f(n_{i,j}) &= \max\{ L(n_{i,j}),\ f(n_{i,j}) \} \\ f(n_{i,j}) &= \min\{ U(n_{i,j}),\ f(n_{i,j}) \} \end{aligned} \tag{24}$$
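The microscopic control of Equations (19) to (24) can be combined into one function, sketched below under stated assumptions: the weighting constant of Equation (22) is applied to the remaining-bits estimate of Equation (21), the two counts in Equation (21) are taken to be the numbers of P and B frames still to be coded, and all names are hypothetical.

```python
def target_bits(*, bandwidth: float, frame_rate: float,
                tbl: float, occupancy: float,
                w_p_prev: float, w_b_prev: float, t_r: float,
                n_p_rem: int, n_b_rem: int, num_b: int,
                lower: float, upper: float,
                gamma: float, beta: float) -> float:
    """Target bits for the current P frame."""
    # Buffer-based estimate: steer the occupancy towards the target level.
    f_buffer = bandwidth / frame_rate + gamma * (tbl - occupancy)
    # Budget-based estimate: share of the remaining bits (assumed meaning
    # of n_p_rem / n_b_rem: P and B frames left in the GOP).
    f_budget = w_p_prev * t_r / (w_p_prev * n_p_rem + w_b_prev * n_b_rem)
    # Weighted combination (assumed: beta weights the budget-based term).
    f = beta * f_budget + (1 - beta) * f_buffer
    # Reduce the target by 5% per B frame between two P frames.
    f *= 1 - num_b * 0.05
    # HRD bounds: first the lower bound, then the upper bound.
    return min(upper, max(lower, f))


bits = target_bits(bandwidth=64000.0, frame_rate=16.0, tbl=5000.0,
                   occupancy=7000.0, w_p_prev=1000.0, w_b_prev=0.0,
                   t_r=50000.0, n_p_rem=10, n_b_rem=0, num_b=0,
                   lower=1000.0, upper=100000.0, gamma=0.5, beta=0.75)
```

In this example the buffer is above its target level, so the buffer-based estimate (3000 bits) is below the per-frame channel share (4000 bits) and pulls the combined target below the budget-based estimate (5000 bits).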

     Step 2 Compute the quantization parameter and perform RDO.

The MAD of the current P frame is predicted by model (3) using the actual MAD of the previous P frame.

The quantization parameter $\hat{Q}_{pc}$ corresponding to the target bit is then computed by using the quadratic model provided in [1,2]. The details can be found in [1,2,5] and are thus not elaborated in this section. To maintain the smoothness of visual quality among successive frames, the quantization parameter $\tilde{Q}_{pc}$ is adjusted by

$$\tilde{Q}_{pc} = \min\{ Q_{pp} + 2,\ \max\{ Q_{pp} - 2,\ \hat{Q}_{pc} \} \} \tag{25}$$

where $Q_{pp}$ is the quantization parameter of the previous P frame.

The final quantization parameter $Q_{pc}$ is further bounded by

$$Q_{pc} = \min\{ 51,\ \max\{ \tilde{Q}_{pc},\ 1 \} \} \tag{26}$$
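The two clipping steps of Equations (25) and (26) amount to a few lines of code. In the sketch below, the function name is hypothetical and the quantization parameter delivered by the quadratic R-D model is taken as an input, since that model is not reproduced in this section.

```python
def final_p_qp(q_model: int, q_prev: int) -> int:
    """Clip the model QP of the current P frame.

    q_model : QP obtained from the quadratic R-D model (assumed given)
    q_prev  : QP of the previous P frame
    """
    # Limit the change against the previous P frame to at most 2.
    smoothed = min(q_prev + 2, max(q_prev - 2, q_model))
    # Keep the result inside the valid QP range [1, 51].
    return min(51, max(smoothed, 1))


print(final_p_qp(35, 30))   # model asks for +5, the step is limited to +2
```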
