H.264 is an important video coding algorithm for broadcasting standards such as DVB-H and DMB. Compared to its predecessors MPEG-2 and MPEG-4 SP/ASP, H.264 achieves improved compression efficiency at the cost of increased computational complexity. Real-time execution of the H.264 decoding process poses a major challenge on mobile devices due to their limited processing capabilities. Multi-core systems provide an elegant and power-efficient solution to overcome this performance limitation. However, efficiently distributing the video algorithm among multiple processing units is a non-trivial task. It requires detailed knowledge about the algorithmic complexity, the dynamic run-time variations, and the inter-dependencies between functional blocks in multi-core environments.
H.264/MPEG-4 AVC is a block-oriented, motion-compensation-based codec standard. Context-based adaptive binary arithmetic coding (CABAC), used in H.264/AVC, is an extension of binary arithmetic coding (BAC) in which the coding probability model is dynamically adjusted based on the context of the syntax element being encoded/decoded. While the coding efficiency of CABAC is superior to that of conventional Huffman coding, the improvement comes with increased performance requirements; at least 3 GHz of computing power is required for real-time decoding of an HD sequence if a general-purpose, yet high-speed, RISC machine processes the syntax parsing. As HD digital TV broadcasting coded in the H.264/AVC Main and High Profiles is now widely deployed, the need for a high-speed CABAC decoder is growing.
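To make the decoding principle concrete, the following is a minimal sketch of an adaptive binary arithmetic decoder in the spirit of CABAC. It is not the normative H.264/AVC algorithm: the standard's 64-state context transition tables and range lookup are replaced here by a hypothetical Q15 probability estimate with a simple exponential update, and the input bytes are dummy data. The interval subdivision and renormalization structure, however, follow the binary arithmetic decoding principle described above.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of an adaptive binary arithmetic decoder in the spirit of CABAC.
 * NOT the normative H.264/AVC algorithm: the hypothetical context model
 * below uses a Q15 probability with a simple exponential update instead of
 * the standard's 64-state transition tables. */

typedef struct {
    uint16_t prob_lps;  /* probability of the least probable symbol, Q15 */
    uint8_t  mps;       /* value of the most probable symbol (0 or 1) */
} Context;

typedef struct {
    const uint8_t *buf; /* compressed bitstream (dummy data in this sketch) */
    size_t bits, pos;   /* total bits and current bit position */
    uint32_t range;     /* current interval width */
    uint32_t offset;    /* code value read from the stream */
} Decoder;

static int read_bit(Decoder *d) {
    if (d->pos >= d->bits) return 0;          /* pad past end of stream */
    int bit = (d->buf[d->pos >> 3] >> (7 - (d->pos & 7))) & 1;
    d->pos++;
    return bit;
}

static void dec_init(Decoder *d, const uint8_t *buf, size_t nbytes) {
    d->buf = buf; d->bits = nbytes * 8; d->pos = 0;
    d->range = 510;                           /* initial range, as in CABAC */
    d->offset = 0;
    for (int i = 0; i < 9; i++)               /* prime the 9-bit code window */
        d->offset = (d->offset << 1) | (uint32_t)read_bit(d);
}

static int decode_bin(Decoder *d, Context *c) {
    /* Split the interval according to the context's LPS probability. */
    uint32_t r_lps = (d->range * c->prob_lps) >> 15;
    if (r_lps == 0) r_lps = 1;                /* keep LPS sub-interval non-empty */
    d->range -= r_lps;

    int bin;
    if (d->offset >= d->range) {              /* code value in LPS sub-interval */
        bin = !c->mps;
        d->offset -= d->range;
        d->range = r_lps;
        c->prob_lps += (32768 - c->prob_lps) >> 5; /* LPS seen: raise estimate */
        if (c->prob_lps > 16384) {                 /* LPS crossed 1/2: swap MPS */
            c->prob_lps = 32768 - c->prob_lps;
            c->mps = !c->mps;
        }
    } else {                                  /* code value in MPS sub-interval */
        bin = c->mps;
        c->prob_lps -= c->prob_lps >> 5;      /* MPS seen: decay LPS estimate */
    }
    while (d->range < 256) {                  /* renormalize range to [256, 511] */
        d->range <<= 1;
        d->offset = (d->offset << 1) | (uint32_t)read_bit(d);
    }
    return bin;
}

int main(void) {
    const uint8_t stream[] = { 0xA3, 0x5C, 0x71, 0x0F };  /* dummy bytes only */
    Decoder d;
    Context c = { 8192, 0 };      /* initial LPS probability 0.25, MPS = 0 */
    dec_init(&d, stream, sizeof stream);
    for (int i = 0; i < 8; i++)
        printf("bin %d = %d\n", i, decode_bin(&d, &c));
    return 0;
}
```

The sequential dependency visible here, where each decoded bin updates the context and interval state needed for the next bin, is exactly what makes CABAC parsing so demanding on a general-purpose processor.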
The insights gained are finally used to optimize the run-time behavior of a multi-core decoding system and to find a good trade-off between core usage and buffer sizes.
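As a concrete illustration of the core-usage versus buffer-size trade-off, the following hedged sketch connects two decoder stages through a bounded macroblock queue using POSIX threads. The stage names (parser_stage, reconstruct_stage) and the buffer depth BUF_SLOTS are illustrative assumptions, not the system evaluated in this work: a deeper buffer absorbs run-time variations between stages at the cost of memory, while a shallow buffer forces the faster core to stall.

```c
#include <pthread.h>
#include <stdio.h>

#define BUF_SLOTS 4  /* hypothetical buffer depth; the trade-off parameter */

/* Bounded FIFO of macroblock indices between two pipeline stages. */
typedef struct {
    int slots[BUF_SLOTS];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} MbQueue;

static void mbq_init(MbQueue *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

static void mbq_push(MbQueue *q, int mb) {
    pthread_mutex_lock(&q->lock);
    while (q->count == BUF_SLOTS)             /* producer stalls: buffer full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[q->tail] = mb;
    q->tail = (q->tail + 1) % BUF_SLOTS;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static int mbq_pop(MbQueue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                     /* consumer stalls: buffer empty */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int mb = q->slots[q->head];
    q->head = (q->head + 1) % BUF_SLOTS;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return mb;
}

static MbQueue queue;

static void *parser_stage(void *arg) {        /* e.g., entropy decoding core */
    (void)arg;
    for (int mb = 0; mb < 100; mb++)
        mbq_push(&queue, mb);
    mbq_push(&queue, -1);                     /* end-of-stream marker */
    return NULL;
}

static void *reconstruct_stage(void *arg) {   /* e.g., pixel reconstruction core */
    (void)arg;
    int mb;
    while ((mb = mbq_pop(&queue)) != -1)
        ;                                     /* reconstruct pixels for mb here */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    mbq_init(&queue);
    pthread_create(&a, NULL, parser_stage, NULL);
    pthread_create(&b, NULL, reconstruct_stage, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    puts("pipeline done");
    return 0;
}
```

Compiled with -pthread, the sketch runs to completion; tuning BUF_SLOTS against the stall frequency of both threads is the essence of the buffer-size trade-off mentioned above.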
Increasing the coding efficiency of video codecs that use the common combination of temporal prediction and lossy transform coding is essentially a matter of reducing the remaining redundancy in the data stream. In the H.264 standard, this goal has been achieved by means of more advanced pixel processing algorithms (e.g., 1/4-pixel motion estimation) as well as more sophisticated algorithms for predicting syntax elements from neighboring macroblocks (e.g., context-adaptive VLC). However, these advanced coding tools result in significantly increased CPU and memory loads on the encoder as well as the decoder. The high computational demands pose a challenge for practical H.264 implementations in environments of limited processing power such as mobile devices. Understanding the run-time behavior of the H.264 decoder is therefore essential for meeting the desired performance requirements on the underlying platform.

If the computational requirements of a video algorithm cannot be met with a single processing unit, multi-core systems often provide an elegant and power-efficient alternative. Unfortunately, efficiently distributing the video algorithm among multiple processing units is a non-trivial task. It requires detailed knowledge about the algorithmic complexity and the inter-dependencies between functional blocks. The most severe problem, however, is that there are strong dynamic variations in run time at the basic level of the H.264 decoding process, namely the level of macroblocks. We go into more detail on this dynamic behavior in the following.

Roughly speaking, an H.264 stream can be regarded as a sequence of compressed macroblocks. In the decoding process, each macroblock is sent through the decoder pipeline one after the other in order to derive the uncompressed video information. When measuring the computation time consumed by each macroblock, large variations are observed, as the sketch below illustrates.
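The following minimal sketch shows how such per-macroblock variations can be measured. The decode_mb() function is a hypothetical stand-in for the real pipeline stages (entropy decoding, inverse transform, prediction, deblocking); here it performs a randomized amount of work purely to produce a measurable spread, mimicking content-dependent cost.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static volatile unsigned sink;  /* prevents the busy loop being optimized away */

/* Hypothetical stand-in for the real macroblock pipeline; the workload
 * varies per macroblock to mimic content-dependent decoding cost. */
static void decode_mb(int mb_idx) {
    (void)mb_idx;
    unsigned work = 1000 + (unsigned)(rand() % 20000);
    for (unsigned i = 0; i < work; i++) sink += i;
}

int main(void) {
    const int num_mbs = 8160;   /* macroblocks in one 1920x1088 frame */
    double min_ms = 1e9, max_ms = 0.0, total_ms = 0.0;

    for (int mb = 0; mb < num_mbs; mb++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        decode_mb(mb);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e6;
        if (ms < min_ms) min_ms = ms;
        if (ms > max_ms) max_ms = ms;
        total_ms += ms;
    }
    printf("per-MB time: min %.4f ms, max %.4f ms, avg %.4f ms\n",
           min_ms, max_ms, total_ms / num_mbs);
    return 0;
}
```

The gap between the minimum and maximum per-macroblock times reported by such a measurement is the dynamic variation that any static work distribution across cores must cope with.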