Syntactic Evaluation of MPEG-4 Error Resilient Data Partitioning Mode

By Ronald Gonzalez, 2014-05-07 12:36

    An Alternative to the MPEG-4 Object-Based Error Resilient Video Syntax

    Luis Ducla Soares, Fernando Pereira

    Instituto Superior Técnico, Instituto de Telecomunicações


    Abstract

    This paper addresses object-based error resilient video source coding. Specifically, the current error resilient object-based syntax used in MPEG-4 is compared to a proposed alternative syntax. Some results are presented for two MPEG-4 test sequences and for two relevant error conditions. The results are presented in terms of two distortion measures, one for texture and one for shape.

    1. Context and Motivation

    In the last few years, many new digital multimedia communication services and devices, such as videotelephony, videoconference and digital television, have appeared in the market. However, all the video representation standards being used still follow an image model similar to the one used in the previous analog standards, i.e. a periodic sequence of frames or fields. These “traditional” standards, which encode video data by exploiting its statistical properties, are well known as “frame-based standards”, e.g. H.261, H.263, MPEG-1 and MPEG-2. The increasing availability of digital technology, both hardware and software based, and the fast evolution of information representation techniques are pushing towards new applications characterized by a set of video representation requirements where interactivity, flexibility, reusability of data, integration of natural and synthetic content and universal access play a major role, independently of the bitrate to be used [1].

    To provide an answer to these emerging needs, MPEG is in the process of specifying a new audiovisual representation standard, known as MPEG-4, where a completely new video data model is used [2]. A video scene will no longer be a sequence of frames but a composition of objects with specific characteristics and behavior, notably in space and time, supporting a new range of functionalities related to content-based interaction and manipulation, improved compression and universal access. In the context of MPEG-4, universal access means audiovisual information easily accessible from anywhere, at any time, in any way. Due to their growing relevance in the telecommunications arena, mobile networks play a major role here, since they are characterized by critical bandwidth and error characteristics. To address the universal access requirement, MPEG-4 followed two main approaches: i) using a video representation scheme which is intrinsically scalable, notably in terms of content (number of objects), since for the first time the scene is represented as a set of objects, with more or less relevance, and thus the sender has the capability to send more or less scene content, depending on the media characteristics; ii) using error resilient source coding, controlled at the object level, giving the encoder the possibility to selectively protect the video objects, depending on their relevance and on the media characteristics.

    Although error resilient video coding involves many aspects, notably encoding, error detection, error localization and error concealment, only the bitstream syntax and the error-free decoding procedure are standardized. The other error resilience areas do not impact interoperability and thus are left free for competition and continuous improvement.

    This paper intends to address the problem of error resilient video source coding, notably by evaluating the MPEG-4 object-based error resilient video syntax and proposing an alternative to it.

    2. MPEG-4 Object-Based Error Resilient Video Coding

    MPEG-4 allows representing 2D arbitrarily shaped natural video objects by means of a sequence of Video Object Planes (VOP). For each VOP, shape, texture and motion data is encoded. Texture coding is DCT based (8x8 blocks) and motion compensation is macroblock or block based (16x16 or 8x8 blocks). As for the shape, it is encoded using Context-based Arithmetic Encoding (CAE) [2]. Since the various scene objects are independently encoded, generating separate elementary streams, it is possible to selectively protect each object by adjusting the level of protection applied to its VOPs. An adequate distribution, among the various objects, of the resources available for error resilience, depending on each object’s relevance and on the media characteristics, should allow maximizing the final subjective impact.

    Due to the need to provide good performance in error prone environments, such as mobile networks, the MPEG-4 Visual standard specifies the syntax and semantics for some error resilience tools to be used at the source coding level [2]. The precise way to use these tools, i.e. how to encode, is left to the implementers, who need to know how to make the best use of them. These error resilience capabilities provided in the source coding layer complement the error protection capabilities supported by means of channel coding in the transport layer, which is not standardized by MPEG.

    In principle, depending on the granularity used to multiplex the texture, motion and shape data for each object, two types of object-based bitstream syntax can be considered:

    * Combined Mode - texture, motion and shape data is multiplexed at the macroblock level.
    * Separate Mode - texture, motion and shape data is multiplexed at the VOP level; this means that, within each VOP, first all the shape data is sent, then all the motion data and finally all the texture data.

    Although the separate mode looks particularly adequate to implement a selective protection of the various types of data, since they are separated at the VOP level, other issues such as bitrate overhead, delay, coding schemes, etc. strongly influence the choice of the bitstream syntax. As will be seen in the following, MPEG-4 defined two object-based error resilient bitstream syntaxes, one basically corresponding to the combined mode and another corresponding to a hybrid of the combined and separate modes.

    The main idea (and tool) behind error resilient source coding in MPEG-4 is to split the VOP information into independent resynchronization packets, separated by resynchronization markers, thus preventing error propagation from one packet to another. Although macroblocks cannot be split between two packets, the packet size (and thus the frequency of the resynchronization markers) is an encoder issue. The resynchronization markers are followed by a few important fields, which make the packets totally independent from each other. These fields include the current macroblock address, the absolute quantization parameter and, optionally, the VOP temporal reference, the VOP coding type and the f_codes, which determine the motion vector search range.

    Based on the resynchronization packet approach, two object-based error resilient video coding modes were specified by MPEG-4, depending on the way the information is structured within the packets:

    * Combined Mode (without data partitioning) - Within each resynchronization packet, the texture, motion and shape coded data is multiplexed at the macroblock level.
    * Combined Mode with Data Partitioning - While the shape and motion data is multiplexed at the macroblock level, the texture data is multiplexed at the resynchronization packet level, as indicated in Figure 1.

    Figure 1 - Syntax of the Combined Mode with Data Partitioning
    Resync Marker (17-23 bits) | MBA (1-12 bits) | QP (5 bits) | HEC (1 or X bits) | Shape & Motion Data, MB by MB (M bits) | Motion Marker (17 bits) | Texture Data (N bits) | Resync Marker (17-23 bits)

    In Figure 1, MBA is the macroblock address, QP is the quantization parameter and HEC the Header Extension Code, which tells the decoder whether more header data will follow. The combined mode with data partitioning syntax is a sequence of resynchronization packets, where the DCT coefficients are clustered together after the header information and the shape and motion data. The motion marker tells the decoder where the shape and motion data stops and the texture data begins. A detailed description of the syntax above is included in [2].

    The combined mode with data partitioning allows taking advantage of the clustering of texture data, notably by specifying an optional set of Reversible VLC (RVLC) tables, providing additional error resilience capabilities. Although the combined mode with data partitioning may be less compression efficient, notably due to the inclusion of the motion marker, it allows the use of more powerful error concealment schemes, providing a better compromise between error resilience and compression efficiency. Although there was the impression that the combined mode with data partitioning would be much more complex than the combined mode, thus becoming a burden for the decoders, a more careful evaluation made by the MPEG Implementation group showed the opposite. Following this conclusion, it shall be possible to encode any MPEG-4 video object in either one of the two error resilient modes described above, the choice of the syntax to use thus becoming an encoder issue.

    Since the combined mode with data partitioning provides the best compromise for critical error conditions, this syntax is the one used in this paper to represent the MPEG-4 error resilience performance, for comparison under different conditions with an alternative syntax to be proposed here.
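    The difference between the two MPEG-4 packet layouts can be made concrete with a small sketch that assembles one resynchronization packet from per-macroblock coded data. It assumes bits are held as Python strings of '0'/'1'. The marker patterns are placeholders chosen only to have the 17-bit lengths listed in Figure 1, not the normative MPEG-4 patterns, and the names CodedMacroblock, combined_packet and partitioned_packet are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Placeholder markers: only their 17-bit lengths follow Figure 1.
# They are assumptions for illustration, not normative MPEG-4 bit patterns.
RESYNC_MARKER = "0" * 16 + "1"
MOTION_MARKER = "1" * 5 + "0" * 11 + "1"


@dataclass
class CodedMacroblock:
    shape_motion: str  # coded shape and motion bits of one macroblock
    texture: str       # coded DCT texture bits of the same macroblock


def combined_packet(header: str, mbs: List[CodedMacroblock]) -> str:
    """Combined mode: shape, motion and texture interleaved MB by MB."""
    body = "".join(mb.shape_motion + mb.texture for mb in mbs)
    return RESYNC_MARKER + header + body


def partitioned_packet(header: str, mbs: List[CodedMacroblock]) -> str:
    """Combined mode with data partitioning: all shape and motion bits of
    the packet, then the motion marker, then all texture bits."""
    shape_motion = "".join(mb.shape_motion for mb in mbs)
    texture = "".join(mb.texture for mb in mbs)
    return RESYNC_MARKER + header + shape_motion + MOTION_MARKER + texture


if __name__ == "__main__":
    header = "0000011" + "10110" + "0"  # hypothetical MBA, QP and HEC bits
    mbs = [CodedMacroblock("101", "0011"), CodedMacroblock("11", "010")]
    overhead = len(partitioned_packet(header, mbs)) - len(combined_packet(header, mbs))
    print(overhead)  # 17: the motion marker is the only added cost here
```

    In the partitioned packet, a decoder that detects an error after the motion marker can still use all of the shape and motion data that precedes it, which is the property the concealment approach of Section 4 builds on.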

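    The reversible decoding made possible by the RVLC tables can be illustrated with a toy code. The table below is hypothetical and is not one of the MPEG-4 texture RVLC tables: every codeword is a palindrome and the set is prefix-free, so it is also suffix-free and a partition can be parsed from either end. The table also leaves some bit patterns unused, which is what lets a decoder detect an error. The sketch decodes forward until an invalid pattern appears, then decodes the reversed bits to recover the symbols after the error; it omits the overlap checks a real decoder needs when the two passes stop at different positions.

```python
# Hypothetical toy RVLC table, not one of the MPEG-4 texture tables.
# Every codeword is a palindrome and no codeword is a prefix of another,
# so the code is also suffix-free and can be parsed from either end.
RVLC = {"A": "00", "B": "11", "C": "010", "D": "101", "E": "0110", "F": "1001"}
DECODE = {code: symbol for symbol, code in RVLC.items()}


def encode(symbols):
    return "".join(RVLC[s] for s in symbols)


def decode_until_error(bits):
    """Parse left to right; stop at the first pattern that cannot start
    any codeword. Returns (symbols, error_detected)."""
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in DECODE:
            symbols.append(DECODE[word])
            word = ""
        elif not any(code.startswith(word) for code in DECODE):
            return symbols, True
    return symbols, word != ""  # a dangling partial codeword is an error too


def reversible_decode(bits):
    """Forward pass keeps the symbols before the error; a backward pass over
    the reversed bits (valid because codewords are palindromes) recovers the
    symbols after it. Overlap checks between the two passes are omitted."""
    head, error = decode_until_error(bits)
    if not error:
        return head, []
    tail_reversed, _ = decode_until_error(bits[::-1])
    return head, tail_reversed[::-1]


if __name__ == "__main__":
    bits = encode("ABFECD")
    corrupted = bits[:7] + "01" + bits[9:]  # flips the two bits at positions 7 and 8
    print(reversible_decode(corrupted))  # (['A', 'B'], ['C', 'D'])
```

    Only the symbols hit by the burst (F and E here) are lost; a plain VLC would have to discard everything after the first detected error.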
    Figure 2 - Alternative Three-Partition Error Resilient Syntax
    Resync Marker (17-23 bits) | MBA (1-12 bits) | QP (5 bits) | HEC (1 or X bits) | Shape Data (S bits) | Shape Marker (12 bits) | Motion Data (M bits) | Motion Marker (17 bits) | Texture Data (N bits) | Resync Marker (17-23 bits)

    3. An Alternative Object-Based Separate Syntax

    The specification of the MPEG-4 object-based combined mode with data partitioning syntax was partly influenced by non-technical issues, notably the fact that the shape coding algorithm was defined quite late, leaving very little time to study the possible alternative syntaxes in terms of error resilience. The adoption of a true separate syntax would require more study, notably related to the specification of additional markers and the evaluation of the compression versus resilience trade-off. These factors motivated the adoption of a more straightforward solution, i.e. a two-partition syntax, which means shape and motion data multiplexed together at the macroblock level, as shown in Figure 1.

    Although the MPEG constraints are well understood, it is now relevant to evaluate the standard syntax, notably in comparison to a true separate syntax, where shape, motion and texture are multiplexed at the resynchronization packet level. This solution should allow more powerful error protection and concealment strategies, since the various types of data could be used independently. Following the arguments above, this paper proposes an alternative object-based error resilient bitstream syntax based on three partitions. The alternative syntax proposed is depicted in Figure 2. As can be seen, a new marker needs to be introduced to separate the shape and motion data. Identifying a suitable shape marker has been less difficult than identifying the motion marker, since the arithmetic coding scheme specified by MPEG-4 for shape coding uses a bit-stuffing algorithm to avoid long runs of zeros, which makes this task easier. This new syntax allows additional error detection, localization and concealment schemes, since the shape data is now separated from the motion data and thus can be independently accessed and used.

    4. Comparison Conditions and Results

    In this paper, the performance of the proposed three-partition syntax is compared against the two-partition combined mode with data partitioning syntax, specified in the MPEG-4 Visual CD [2]. In both cases, RVLCs are used. These two syntaxes are compared here for two very different MPEG-4 test sequences. The first sequence to be considered is the Akiyo sequence, which consists of a newscaster speaking in front of a camera; a QCIF (176x144) version of this sequence, at 10 Hz, was encoded at approximately 24 kbps. The second sequence is the Cyclamen sequence, which shows a panning of a flower, from left to right; a SIF (352x240) version of this sequence, at 10 Hz, was chosen. Since this sequence has a complex shape and much more motion than the previous one, a higher bitrate of approximately 128 kbps was used. Sample frames of both sequences can be seen in Figure 3.

    Figure 3 - Sample frames for the a) Akiyo sequence; b) Cyclamen sequence

    For these tests, only one object was used because of composition issues that have not been studied yet, such as how to compose several objects whose shapes have been corrupted and no longer fit together. Thus, for the considered sequences, only the newscaster and the flower itself were used, respectively.

    As for the resynchronization packet, its size was chosen so that approximately four resynchronization packets fit in one VOP, which is the usual criterion used within MPEG. This corresponds to packet sizes of approximately 600 and 3200 bits for the Akiyo and Cyclamen sequences, respectively.
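    The packet sizes quoted above follow directly from the bitrates and the 10 Hz VOP rate. The sketch below reproduces them under the simplifying assumption that bits are spread evenly over VOPs and over the four packets of each VOP; in practice intra-coded VOPs take more bits than predicted ones, so real packets vary around these averages. bits_per_packet is a hypothetical helper name.

```python
def bits_per_packet(bitrate_bps: float, vop_rate_hz: float, packets_per_vop: int) -> float:
    """Average packet size, assuming bits are evenly spread over VOPs and packets."""
    bits_per_vop = bitrate_bps / vop_rate_hz
    return bits_per_vop / packets_per_vop


if __name__ == "__main__":
    print(bits_per_packet(24_000, 10, 4))   # Akiyo: 2400 bits/VOP -> 600.0
    print(bits_per_packet(128_000, 10, 4))  # Cyclamen: 12800 bits/VOP -> 3200.0
```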

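    Section 3 notes that the shape marker is easier to define because the MPEG-4 shape arithmetic coder uses bit stuffing to avoid long runs of zeros. The sketch below shows only the generic idea; the threshold max_zeros and the function names are assumptions, not the exact CAE stuffing rule. After stuffing, the data never contains more than max_zeros consecutive zeros, so a marker built around a longer zero run cannot be emulated inside it (a real syntax must also handle the boundary between the data and the marker).

```python
import random


def stuff(bits: str, max_zeros: int) -> str:
    """Insert a '1' after every run of max_zeros consecutive zeros."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "0" else 0
        if run == max_zeros:
            out.append("1")  # stuffed bit breaks the zero run
            run = 0
    return "".join(out)


def unstuff(bits: str, max_zeros: int) -> str:
    """Remove the '1' that stuff() inserted after each run of max_zeros zeros."""
    out, run, skip_next = [], 0, False
    for b in bits:
        if skip_next:
            skip_next, run = False, 0  # drop the stuffed '1'
            continue
        out.append(b)
        run = run + 1 if b == "0" else 0
        if run == max_zeros:
            skip_next = True
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    data = "".join(rng.choice("0001") for _ in range(2000))  # zero-heavy data
    coded = stuff(data, max_zeros=8)
    assert unstuff(coded, max_zeros=8) == data
    # A hypothetical marker starting with 9 zeros cannot occur inside coded data.
    print("0" * 9 in coded)  # False
```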
    Table 1 - Distortion values for the Akiyo sequence (averages over all frames)

                                Two-partition syntax      Three-partition syntax
    Condition                   Dn (%)     PSNR (dB)      Dn (%)     PSNR (dB)
    Critical case, no RVLC      0.0587     33.31          0.0218     33.53
    Critical case, RVLC         0.0557     33.33          0.0199     33.63
    Typical case, no RVLC       0.0169     34.83          0.0010     35.17
    Typical case, RVLC          0.0164     34.88          0.0007     35.17

    Error-free case - average PSNR: 35.34 dB; average Dn: 0.0000 %
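    Table 1 reports the two object-based distortion measures defined in Section 4: PSNR computed only over the pixels that belong to both the original and the reconstructed object, and Dn, the number of differing shape pixels divided by the size of the original object. A minimal sketch of both follows, assuming 8-bit luminance frames and binary masks stored as nested Python lists; colour components and the averaging over frames and runs are left out, and the function names are hypothetical.

```python
import math


def object_psnr(orig, recon, orig_mask, recon_mask, peak=255):
    """PSNR over the pixels that belong to both the original and the
    reconstructed object (masks hold 0/1 values)."""
    squared_error, count = 0.0, 0
    for y, row in enumerate(orig):
        for x, value in enumerate(row):
            if orig_mask[y][x] and recon_mask[y][x]:
                squared_error += (value - recon[y][x]) ** 2
                count += 1
    if count == 0:
        raise ValueError("the original and reconstructed objects do not overlap")
    mse = squared_error / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def shape_distortion_dn(orig_mask, recon_mask):
    """Dn in percent: number of shape pixels that differ between the two
    masks, divided by the number of pixels in the original object."""
    differing = sum(a != b for ra, rb in zip(orig_mask, recon_mask) for a, b in zip(ra, rb))
    object_size = sum(sum(row) for row in orig_mask)
    return 100.0 * differing / object_size


if __name__ == "__main__":
    orig_mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    recon_mask = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
    orig = [[100] * 4 for _ in range(4)]
    recon = [row[:] for row in orig]
    recon[1][1] = 110  # one texture error inside the common object area
    print(shape_distortion_dn(orig_mask, recon_mask))  # 50.0
    print(object_psnr(orig, recon, orig_mask, recon_mask))
```

    The example shows why the two measures can diverge: the wrongly added and the missing shape pixel raise Dn but do not enter the PSNR at all, since PSNR only sees the three pixels common to both objects.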

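    The critical and typical cases in Table 1 differ in the average BER of a burst-error channel. The paper does not detail how the corrupted bitstreams were generated, so the sketch below is only one simple, assumed way to produce error patterns with the stated parameters: errors occur only inside fixed-length bursts, each bit in a burst is corrupted with probability 0.5, and the burst start probability is chosen so that the long-run BER matches the target. Converting the 10 ms burst length into bits requires a channel bitrate, which is left as the parameter burst_bits; burst_error_mask and corrupt are hypothetical names.

```python
import random


def burst_error_mask(n_bits, avg_ber, burst_bits, ber_in_burst=0.5, seed=0):
    """Return a list of 0/1 flags; 1 marks a corrupted bit.

    Outside bursts no errors occur. At each bit outside a burst, a new burst
    of burst_bits bits starts with probability p_start. With f the long-run
    fraction of bits inside bursts, f = avg_ber / ber_in_burst, and the mean
    gap between bursts (1 - p) / p must equal burst_bits * (1 - f) / f,
    which gives p = f / (burst_bits * (1 - f) + f).
    """
    rng = random.Random(seed)
    f = avg_ber / ber_in_burst
    p_start = f / (burst_bits * (1 - f) + f)
    mask, remaining = [], 0
    for _ in range(n_bits):
        if remaining == 0 and rng.random() < p_start:
            remaining = burst_bits
        if remaining > 0:
            remaining -= 1
            mask.append(1 if rng.random() < ber_in_burst else 0)
        else:
            mask.append(0)
    return mask


def corrupt(bits, mask):
    """Flip every bit of a '0'/'1' string where the mask is 1."""
    return "".join(b if m == 0 else "10"[int(b)] for b, m in zip(bits, mask))


if __name__ == "__main__":
    # Critical case: average BER 1e-2. burst_bits=240 assumes a hypothetical
    # 24 kbit/s channel, where 10 ms corresponds to 240 bits.
    mask = burst_error_mask(500_000, 1e-2, burst_bits=240, seed=1)
    print(sum(mask) / len(mask))  # close to 0.01 over a long run
```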
    Regarding the error concealment, it is clear that different concealment schemes are possible for the two- and three-partition alternatives, since concealment should precisely take advantage of the bitstream syntactic structure. However, in order to avoid testing here the error concealment schemes rather than the syntaxes themselves (this will have to be done later), a similar concealment approach was used for both syntaxes: once an error is detected within a given packet, the decoder skips the bits until the next resynchronization marker is found, using only the fully correctly decoded partitions (0, 1 or 2 for the MPEG-4 syntax; 0, 1, 2 or 3 for the syntax proposed here). On top of this, the reversible decoding capability provided by the RVLC texture codes is also used, for both syntaxes. In this case, if the error is detected in the texture partition, the previous partitions are kept and the decoder tries to get as much information as possible out of the erroneous texture partition, instead of just discarding it.

    In order to evaluate the object quality, an adequate set of measures is needed. In this paper, two measures (also used within MPEG) are used: PSNR and Dn, which are distortion measures for texture and shape, respectively. The definition of PSNR for video objects is similar to the one used in the frame-based case, with the exception that it is applied only to the pixels that belong to both the original and reconstructed objects. As for Dn, it is the number of differing shape pixels between the original and reconstructed video object, divided by the total number of pixels in the original object. This value is usually expressed as a percentage, and smaller values correspond to less distortion. Of course, these measures correspond to very different types of visual artifacts. To show the importance of the Dn figures, two examples of reconstructed video objects with erroneous shape are given in Figure 4. The difference in the PSNR values, 29.14 dB and 26.67 dB for Figures 4 a) and b), respectively, does not reflect the real difference in subjective impact, which is more associated with the shape distortion, as clearly shown by the Dn values, 0.458 % and 17.6 % for Figures 4 a) and b), respectively.

    Figure 4 - Examples of erroneous shape (panels a and b)

    In this paper, two residual error conditions of mobile networks are considered: a critical case (burst errors with an average bit error rate (BER) of 10^-2 and a burst length of 10 ms) and a typical case (burst errors with an average BER of 10^-3 and a burst length of 10 ms). For both cases, the average BER during the burst is 0.5. After the bitstreams are corrupted, they are decoded with and without reversible decoding processing. In the critical case, the first 50 frames of the sequence are used, whereas in the typical case the first 100 are used. The PSNR values are averaged over 10 runs and are shown in Figures 5 and 6. In Tables 1 and 2, averages of PSNR and Dn over all the frames are shown. As can be seen, when the three-partition syntax is used, the PSNR exhibits a small increase. In terms of Dn, however, the improvement (i.e. the reduction in Dn) is much greater, justifying by itself the use of the three-partition syntax.

    Table 2 - Distortion values for the Cyclamen sequence (averages over all frames)

                                Two-partition syntax      Three-partition syntax
    Condition                   Dn (%)     PSNR (dB)      Dn (%)     PSNR (dB)
    Critical case, no RVLC      0.6225     27.70          0.2090     29.09
    Critical case, RVLC         0.6185     27.70          0.5468     28.02
    Typical case, no RVLC       0.0645     29.71          0.0577     29.80
    Typical case, RVLC          0.0609     29.77          0.0577     29.80

    Error-free case - average PSNR: 31.72 dB; average Dn: 0.0000 %

    The results obtained show that, when applying similar concealment schemes, the proposed three-partition syntax outperforms the two-partition syntax specified in MPEG-4. Since the three-partition syntax should allow more elaborate concealment schemes, which should further improve its performance, it is possible to conclude that the proposed three-partition syntax performs better than the most powerful MPEG-4 error resilient syntax.
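    The common concealment rule used for both syntaxes in Section 4 (keep only the partitions fully decoded before the first detected error, and additionally salvage texture with RVLC when the error lies in the texture partition) fits in a few lines. The sketch is hypothetical and works on partition labels only; it shows why an error in the motion data also costs the shape data in the two-partition syntax, but not in the three-partition one.

```python
from typing import List, Optional, Tuple

TWO_PARTITION = ("shape+motion", "texture")       # MPEG-4 data partitioning
THREE_PARTITION = ("shape", "motion", "texture")  # syntax proposed here


def usable_partitions(
    layout: Tuple[str, ...], first_error: Optional[int], rvlc: bool = False
) -> Tuple[List[str], bool]:
    """Return (kept_partitions, salvage_texture) for one packet.

    first_error is the index of the partition where an error was detected,
    or None for an error-free packet. Partitions before it were fully
    decoded and are kept; with RVLC, an erroneous texture partition is
    decoded from both ends instead of being discarded.
    """
    if first_error is None:
        return list(layout), False
    kept = list(layout[:first_error])
    salvage_texture = rvlc and layout[first_error] == "texture"
    return kept, salvage_texture


if __name__ == "__main__":
    # An error detected in the motion data:
    print(usable_partitions(TWO_PARTITION, 0))    # ([], False): shape is lost too
    print(usable_partitions(THREE_PARTITION, 1))  # (['shape'], False)
```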


    The authors acknowledge the support of the European Commission under the ACTS project MoMuSys. L. Ducla Soares acknowledges the support of PRAXIS XXI through his Ph.D. scholarship.


    [1] R. Koenen, F. Pereira, L. Chiariglione, “MPEG-4: Context and Objectives”, Image Communication Journal: MPEG-4 Special Issue, vol. 9, nº 4, May 1997.

    [2] MPEG Video & SNHC, “Coding of Audio-Visual Objects: Visual ISO/IEC 14496-2”, Doc. ISO/MPEG N2202, MPEG Tokyo Meeting, March 1998.

    Figure 5 - PSNR values for the Akiyo sequence for a) critical case with no reversible decoding; b) critical case with reversible decoding; c) typical case with no reversible decoding; d) typical case with reversible decoding

    Figure 6 - PSNR values for the Cyclamen sequence for a) critical case with no reversible decoding; b) critical case with reversible decoding; c) typical case with no reversible decoding; d) typical case with reversible decoding
