Week 10

Video Compression

When dealing with video compression, the same three building blocks of any data compression system apply:

  • Redundancy reduction needs to be accomplished first,
  • followed by quantization,
  • and finally entropy encoding (with Huffman, LZW, arithmetic coding, ...); a toy sketch of all three stages follows below.
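
As a toy illustration of the three stages (not any particular standard), the sketch below predicts each sample from the previous one, uniformly quantizes the prediction error with an arbitrary step size q, and estimates what an ideal entropy coder would spend from the histogram of the quantized levels:

In [ ]:
import numpy as np

def toy_codec_stages(x, q=4.0):
    """Redundancy reduction -> quantization -> entropy estimate."""
    x = np.asarray(x, dtype=float)
    # 1) Redundancy reduction: predict each sample from the previous one.
    pred = np.concatenate(([0.0], x[:-1]))
    residual = x - pred
    # 2) Quantization: uniform scalar quantizer with step size q (lossy).
    levels = np.round(residual / q).astype(int)
    # 3) Entropy coding: estimate bits/sample from the level statistics
    #    (a Huffman or arithmetic coder would approach this bound).
    _, counts = np.unique(levels, return_counts=True)
    p = counts / counts.sum()
    return levels, -(p * np.log2(p)).sum()

signal = np.cumsum(np.random.default_rng(0).normal(0, 2, 1000))  # correlated source
_, rate = toy_codec_stages(signal)
print(f"estimated rate: {rate:.2f} bits/sample")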

Towards redundancy reduction, both a temporal predictor and a decorrelating transformation are used: the transformation is applied to the unpredictable part of the prediction, i.e., to the prediction error. The temporal prediction is carried out using motion vectors that connect the current frame with one or more past frames, and possibly future frames as well. This is referred to as inter-frame coding.
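
As one minimal sketch of how such a temporal predictor can be formed, the function below runs an exhaustive block-matching search (the SAD criterion, the 16x16 block size, and the +/-8 search range are illustrative choices, not mandated by any standard):

In [ ]:
import numpy as np

def block_match(cur, ref, y, x, B=16, R=8):
    """Motion vector for the BxB block of `cur` at (y, x), found by
    exhaustive SAD search over a +/-R window in the reference frame."""
    block = cur[y:y+B, x:x+B]
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= ref.shape[0] - B and 0 <= xx <= ref.shape[1] - B:
                sad = np.abs(block - ref[yy:yy+B, xx:xx+B]).sum()
                if sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
    return best_mv

# The prediction error (the unpredictable part, to which the decorrelating
# transform is applied) is then: block - ref[y+dy:y+dy+B, x+dx:x+dx+B]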

If temporal prediction is not possible (for example, due to a scene cut or a new object appearing in the scene), then we treat the video frame, or a macroblock of the frame, as a still image and apply the transform encoding technique we learned in the last module directly to the intensity values. This is referred to as intra-frame coding.

We will also discuss video compression standards.

Motion-Compensated Hybrid Video Encoding

It is the scheme of choice for all standards, although each standard differs somewhat in its details. Applications are many: security applications, stereo and multi-view video capture, DVDs, ...

For redundancy removal we use a predictor and a transform, and the decoder is embedded in the encoder. A video frame arrives at the input of the system; say a prediction is already available, then we compute an error as the difference between the frame and its prediction. The error is the unpredictable part, so it needs to be sent to the decoder. Since there is still spatial correlation in this error image, we apply transform coding (as in JPEG): the DCT is taken and quantized, and the quantized coefficients are sent to the entropy coder. In the meantime, the quantized DCT coefficients are inverse-transformed to recover the error in the spatial domain, so that the reconstructed frame can be formed and the next prediction re-calculated through motion estimation and compensation (based on the brightness constancy constraint). To do that we need at least two frames: the current frame and the previously reconstructed one. This is referred to as motion compensation, because the motion vector points backward in time. The reason we do not use the original frame here is the error drift we would otherwise encounter, since the decoder knows nothing about the original frame. The motion parameters (motion vectors) are entropy encoded and sent, so the produced code consists of DCT coefficients plus motion vectors.
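
A per-block sketch of this loop, with the decoder mirrored inside the encoder (the single quantization step q and the plain rounding quantizer are simplifications; real standards quantize each coefficient with its own step size):

In [ ]:
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(block, prediction, q=16.0):
    """One pass of the motion-compensated hybrid loop for a single block."""
    error = block - prediction                  # unpredictable part
    coeffs = dctn(error, norm='ortho')          # decorrelating transform
    levels = np.round(coeffs / q)               # quantized -> entropy coder
    # Mirror the decoder: rebuild the error and the block from the quantized
    # data, so the next prediction uses only what the decoder will have
    # (this is what prevents error drift).
    recon_error = idctn(levels * q, norm='ortho')
    recon_block = prediction + recon_error      # goes into the reference buffer
    return levels, recon_block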

This is called inter-frame coding because the correlation between frames is taken into account. When there is no such correlation (as at a scene change), we use intra-frame coding instead: switches in the encoder bypass the prediction path, no motion estimation is performed, and only the DCT coefficients of the frame itself are coded. Every encoder also contains a rate controller, useful for many reasons: it takes a target rate as input and adjusts the coding parameters (chiefly the quantizer), which in turn change the rate for the next step.
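
A very rough sketch of the feedback idea behind a rate controller: compare the bits actually produced against the target and nudge the quantizer step accordingly (the proportional gain and the clamping bounds here are arbitrary):

In [ ]:
def update_quantizer(qstep, bits_produced, bits_target, gain=0.1,
                     qmin=1.0, qmax=112.0):
    """Proportional rate control: overshoot -> coarser quantizer -> fewer bits."""
    overshoot = (bits_produced - bits_target) / bits_target
    qstep *= 1.0 + gain * overshoot
    return min(max(qstep, qmin), qmax)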

On Video Compression Standards

They come from ITU-T (mostly for communication: H.261, H.263, ...), MPEG (mostly for entertainment: MPEG-1, MPEG-2, MPEG-4), and JCT-VC (the Joint Collaborative Team on Video Coding). All standards use motion-compensated hybrid encoding; they deviate in details such as how shape is encoded. The standards are generic: they do not specify the operation of the encoder, e.g., how motion estimation is done. This provides no quality guarantee at the decoder, but it provides flexibility. What is proprietary are the pre-processor and post-processor, the rate control, the motion estimation, and the bitstream encoder and decoder; the switching between intra and inter modes can be proprietary as well. Furthermore, there is usually audio alongside the video.

How far can we go with video compression? These standards are used by industry and are spread all over the world. Rule of thumb for a new standard: same performance as the previous standard at 1/2 the bit rate!

H.261, H.263, MPEG-1 and MPEG-2

H.261 was one of the first, from 1990. Its rates are multiples of 64 kbps. It is progressive, with a fixed 4:3 aspect ratio. H.263 targeted applications such as wireless, at rates around 56 kbps. It uses unrestricted motion vectors and advanced motion compensation, with no loop filter, and is progressive. I- (intra), P- (predictive), B- (bidirectional), and PB- (bidirectionally predictive) frames are used. It achieves the same performance as H.261 at 1/2 the bit rate.

MPEG-1 was introduced in 1992 and MPEG-2 in 1994; MPEG-4 came in 1999. The MPEG-7 and MPEG-21 standards were introduced later, but these are not video compression standards. In MPEG-1 the basic unit is the 8x8 block, and a macroblock layer groups Y (4 blocks), Cb (1), and Cr (1), i.e., 6 blocks in total. Macroblocks are combined to form a slice layer, several slices form a picture layer, and pictures are combined to form a group-of-pictures (GOP) layer. In MPEG-1 the GOP structure is fixed: IBBPBB...P frame types, and each anchor must be transmitted before the B frames that depend on it (see the reordering sketch below).
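
Because each B picture needs its future anchor as a reference, the encoder must transmit that anchor first. A small sketch of the display-order to coding-order reshuffle for such a GOP:

In [ ]:
def coding_order(gop):
    """Reorder display-order frame types (e.g. 'IBBPBBP') so each anchor
    (I or P) is coded before the B frames that reference it."""
    out, pending_b = [], []
    for i, t in enumerate(gop):
        if t == 'B':
            pending_b.append(i)       # must wait for the next anchor
        else:
            out.append(i)             # anchor first ...
            out.extend(pending_b)     # ... then the B frames it enables
            pending_b = []
    return out

gop = 'IBBPBBP'
print([f'{gop[i]}{i}' for i in coding_order(gop)])
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']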

Before encoding a macroblock we have many options and decisions. In an intra (I) picture, all macroblocks are intra-coded, and the decision is whether or not to change MQuant (the quantizer). In a predictive (P) picture, we decide whether the motion vector is zero or motion compensation is performed; within that, we can choose to encode the macroblock as inter or intra, and within inter we can still decide whether the prediction error is coded or not coded. The cheapest option is: motion vector set to zero, inter, error not coded. In a B picture we additionally decide between forward, backward, or interpolated motion compensation. In MPEG-1, on average, I, P, and B pictures use bits for a given frame in decreasing order. When B pictures are utilized, P is predicted from an I or another P, while B is predicted from the surrounding I and P pictures. This requires some buffering, so delay comes in.

MPEG-2 was introduced in 1994 and covers more applications. It handles the same types of pictures, but it can now also encode interlaced video. MPEG-2 adopted a toolkit approach (profiles and levels) and a scalable bitstream. Interlacing is visible on certain displays only. MPEG-2 supports both field and frame prediction (motion compensation).
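
In interlaced video each frame consists of two fields captured at different time instants. A minimal sketch of separating a frame into its two fields, which MPEG-2 field prediction can then motion-compensate independently:

In [ ]:
import numpy as np

def split_fields(frame):
    """Top field = even-numbered lines, bottom field = odd-numbered lines."""
    return frame[0::2, :], frame[1::2, :]

frame = np.arange(16.0).reshape(4, 4)
top, bottom = split_fields(frame)   # each field holds half the lines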

MPEG-4

It deviates from all the other standards. One of the major differences is that it looks at the content of the video: we can access and manipulate the individual objects that make up a scene. It deals with both audio and video objects, provides new tools, and merges natural and synthetic data. The bitstream is robust and flexible, and all kinds of additional applications arise.

A scene is decomposed into elementary objects, which are encoded, recomposed, and rendered. Coding of visual objects covers frames, arbitrarily shaped objects, and sprites, with particular attention to synthetic objects too. The video is divided into VOPs (Video Object Planes): VOPs are identified, encoded separately, multiplexed, and sent through the network to the decoder.

The revolution is coding the shape, with Binary Alpha Blocks (BABs), which basically form a mask. Similar to the intra and inter texture coding of MPEG-2, we have intra shape and inter shape coding, using CAE (Context-based Arithmetic Encoding). Sprites are manipulated by motion parameters computed outside the object. With face objects, three sets of facial parameters and transform data can be used.

H.264

There are no major conceptual differences, but as computer technology has advanced, many more computations can be performed. Decoders are much lighter than encoders because they lack motion estimation. H.264 is the same as MPEG-4 Part 10, also known as H.264/AVC and formerly H.26L. The new features still follow the main blocks, but more elaborate processing steps are taken along the way. These are: motion compensation and intra-prediction, the image transform, deblocking filters (they're back!), entropy coding, and frames and slices.

The block size becomes variable: starting with a 16x16 macroblock, it is divided into much smaller blocks based on a quad-tree structure, so that great flexibility is provided. Arbitrary reference frames are used: any frame may serve as a reference, specified as an index into the buffer, and it also works in bi-predictive mode. We are allowed to look up to 32 frames into the past. About intra-prediction: intra frames are natural images, so they exhibit strong spatial correlation. Macroblocks in intra-coded frames are predicted from previously coded ones, and an encoded parameter specifies the prediction mode of the macroblock. Intra-prediction can be vertical, horizontal, diagonal, ...; all modes are evaluated in the prediction phase, but only the best is chosen (a sketch follows below).
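
A sketch of that mode selection for a 4x4 block, using three of the H.264 modes (vertical, horizontal, DC) and plain SAD as the cost; a real encoder uses more modes and a rate-distortion cost:

In [ ]:
import numpy as np

def best_intra_mode(block, top, left):
    """Pick among vertical / horizontal / DC 4x4 intra prediction.
    `top` and `left` are the already-reconstructed neighboring pixels."""
    candidates = {
        'vertical':   np.tile(top, (4, 1)),            # copy row above downward
        'horizontal': np.tile(left[:, None], (1, 4)),  # copy left column across
        'DC':         np.full((4, 4), (top.sum() + left.sum()) / 8.0),
    }
    costs = {m: np.abs(block - p).sum() for m, p in candidates.items()}
    mode = min(costs, key=costs.get)
    return mode, candidates[mode]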

When it comes to the image transform, floating-point DCT and IDCT implementations can introduce mismatches between encoder and decoder. In H.264 an integer matrix is used instead, which gives a big gain in implementation with only a minimal decrease in quality.
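
The 4x4 core transform matrix of H.264 is an integer approximation of the DCT; since both encoder and decoder compute it exactly in integer arithmetic, their results match bit for bit. (The normalization that would follow is folded into the quantizer and omitted here.)

In [ ]:
import numpy as np

# H.264 4x4 forward core transform (integer approximation of the DCT).
C = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

def forward_transform(x):
    """Y = C X C^T -- exact in integer arithmetic, so the encoder's and
    decoder's reconstructions cannot drift apart."""
    return C @ x @ C.T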

Concerning deblocking filters: previous filters used to smudge edges between blocks. In H.264, the filtering of each block edge is adaptively chosen among 5 deblocking filters. This also improves objective quality.

For entropy coding, CABAC is used: it binarizes all syntax elements into bit-strings, which are then arithmetic-coded with adaptive contexts.
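
One binarization H.264 actually uses (for many syntax elements, and as the suffix of some CABAC binarizations) is the Exp-Golomb code. A sketch for unsigned values:

In [ ]:
def exp_golomb(v):
    """Order-0 Exp-Golomb bit-string for an unsigned integer v:
    (leading zeros)(1)(offset bits), e.g. 0 -> '1', 1 -> '010'."""
    bits = bin(v + 1)[2:]              # binary representation of v + 1
    return '0' * (len(bits) - 1) + bits

print([exp_golomb(v) for v in range(5)])
# ['1', '010', '011', '00100', '00101']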

With H.264, each frame consists of one or more slices: contiguous groups of macroblocks that can be encoded and decoded independently. Thus we can now have I-, P-, and B-slices.

H.265

It goes by different names, such as HEVC, and it followed the big leap in computational capacity. It supports resolutions up to 8K and uses parallel processing. It codes Coding Tree Units (CTUs) and Coding Tree Blocks (CTBs); CTUs can be of sizes 16x16, 32x32, and 64x64. With bigger CTU sizes the compression improves and encoding time decreases, while decoding time increases.
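
A toy sketch of the recursive quad-tree idea behind CTUs: keep splitting a block into four quadrants while some activity measure says it is not homogeneous (plain variance against an arbitrary threshold here; a real encoder decides splits by rate-distortion optimization):

In [ ]:
import numpy as np

def quadtree_split(block, y=0, x=0, min_size=8, thresh=100.0):
    """Return the (y, x, size) leaf blocks of a variance-driven quad-tree."""
    size = block.shape[0]
    if size <= min_size or block.var() < thresh:
        return [(y, x, size)]          # homogeneous enough: keep as one leaf
    h = size // 2
    return (quadtree_split(block[:h, :h], y,     x,     min_size, thresh) +
            quadtree_split(block[:h, h:], y,     x + h, min_size, thresh) +
            quadtree_split(block[h:, :h], y + h, x,     min_size, thresh) +
            quadtree_split(block[h:, h:], y + h, x + h, min_size, thresh))

ctu = np.random.default_rng(1).integers(0, 255, (64, 64)).astype(float)
print(len(quadtree_split(ctu)), 'leaf blocks')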

Concerning improved motion compensation, it uses quarter-pel precision with 7- and 8-tap interpolation filters, and it uses Advanced Motion Vector Prediction. Intra-prediction supports 33 directional modes (vs. 8 in H.264), plus a planar mode (2-D fitting) and a DC (flat) mode. CABAC is still used, improved for throughput capacity, and there is no VLC alternative. There is better support for parallel processing, and a new Sample Adaptive Offset in-loop filter... It supports slices (as sequences of CTUs) and tiles (as rectangular regions) for ease of use with parallel processing that runs threads.

