Group of pictures and its structure

Elecard Company
Dec 8, 2021

A group of pictures (GOP) is a set of sequential pictures that defines the order in which intra (I) and inter (P and B) frames appear.

Often a GOP is denoted using two numbers, such as M = 3, N = 12. M specifies the distance between two anchor frames (I or P), and N specifies the distance between two full pictures (I frames). For example, with M = 3 and N = 12 the GOP has the structure IBBPBBPBBPBBI, where the final I is the first frame of the next GOP.
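
As a minimal sketch of how M and N map to a frame sequence, the following Python function builds one GOP in display order. The function name `gop_pattern` is hypothetical, and the sketch assumes a fixed structure with no scene change detection and no adaptive B frame placement.

```python
def gop_pattern(m: int, n: int) -> str:
    """Return the frame types of one GOP in display order.

    m: distance between anchor frames (I or P), so m - 1 B frames sit between them.
    n: distance between I frames, i.e. the GOP length in frames.
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive")
    frames = []
    for i in range(n):
        if i == 0:
            frames.append("I")   # every GOP starts with a full picture
        elif i % m == 0:
            frames.append("P")   # an anchor frame every m positions
        else:
            frames.append("B")   # everything between anchors
    return "".join(frames)


# Appending the next GOP's I frame reproduces the notation used in the text.
print(gop_pattern(3, 12) + "I")  # IBBPBBPBBPBBI
print(gop_pattern(1, 10))        # IPPPPPPPPP
```

When N is not a multiple of M, this simplified version leaves a shorter run of B frames at the end of the GOP. A real encoder decides how to handle that case itself.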

I (IDR) frames

I frames are compressed independently of any other frames in the video sequence. The IDR frame, also referred to as a key frame, is a subtype of I frame. Decoding of the stream can begin at any IDR frame. Frames located between two IDR frames cannot reference any frames outside this interval.

Sometimes, when the scene changes, the current and previous frames differ so much that it is more efficient to start the new scene with an I frame instead of a P or B frame. Encoders can detect such changes; this capability is called scene change detection (SCD).
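
The idea behind SCD can be illustrated with a deliberately simple sketch: compare each frame with the previous one and force an I frame when they differ too much. The function names, the mean-absolute-difference metric, and the threshold value are all assumptions made for illustration. Production encoders typically rely on more elaborate measures, such as the cost of motion-compensated prediction.

```python
def mean_abs_diff(prev, curr):
    """Mean absolute difference between two frames given as flat lists of luma samples."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def choose_frame_types(frames, threshold=30.0):
    """Mark a frame 'I' when it differs strongly from the previous frame, else 'P'.

    threshold is a hypothetical tuning value, not taken from any real encoder.
    """
    types = []
    for idx, frame in enumerate(frames):
        if idx == 0 or mean_abs_diff(frames[idx - 1], frame) > threshold:
            types.append("I")
        else:
            types.append("P")
    return "".join(types)


# Two dark frames followed by two bright frames: the cut lands on the third frame.
dark_a, dark_b = [16] * 8, [18] * 8
bright_a, bright_b = [200] * 8, [198] * 8
print(choose_frame_types([dark_a, dark_b, bright_a, bright_b]))  # IPIP
```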

P and B frames

P and B frames encode the changes in the current frame relative to other frames: P frames reference preceding frames, while B frames can reference both preceding and following frames. The most versatile structure of a P and B frame sequence is two to three B frames per P frame.

B frames are usually a fraction of the size of a P frame, but each B frame adds latency because of buffering and frame reordering. The more P and B frames are used relative to I frames, the higher the compression ratio.
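
The reordering mentioned above can be sketched as follows. A B frame cannot be coded until the anchor frame that follows it in display order is available, so the coding order differs from the display order. The function name `coding_order` is hypothetical, and the sketch assumes classic (non-pyramid) B frames that reference only the nearest anchor on each side.

```python
def coding_order(display: str):
    """Return display indices in the order the frames would be coded.

    display: frame types in display order, e.g. "IBBPBBP".
    """
    order = []
    pending_b = []
    for idx, ftype in enumerate(display):
        if ftype == "B":
            pending_b.append(idx)    # must wait for the next anchor frame
        else:
            order.append(idx)        # the anchor (I or P) is coded first
            order.extend(pending_b)  # then the B frames that precede it in display order
            pending_b.clear()
    # Trailing B frames really wait for the next GOP's anchor; the sketch just appends them.
    order.extend(pending_b)
    return order


display = "IBBPBBP"
order = coding_order(display)
print(order)                               # [0, 3, 1, 2, 6, 4, 5]
print("".join(display[i] for i in order))  # IPBBPBB
```

In this example the encoder has to hold frames 1 and 2 until frame 3 has arrived and been coded, which is where the extra buffering and latency come from.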

Fig. 3 The GOP structure of video encoded in different ways: 3.1: M=1 N=10, no B frames, no SCD; 3.2: M=2 N=20, B = 1, no SCD; 3.3: M=4 N=30, B = 3, pyramid, SCD.

How to configure encoder for TV broadcasting: tips

  • Length. Long GOPs are used in files or in OTT broadcasting (for example, when the GOP length in seconds is equal to the chunk duration). For live broadcasting, a shorter GOP is recommended for several reasons, such as:

– According to the DVB standard, PAT/PMT tables should appear at least twice per second (an interval of at most 500 ms), and, as a rule, a PAT/PMT table is placed next to an I frame.

– For DVB, the channel switching time is also very important. The longer the GOP, the longer the switching will take.

  • Structure. For better quality, use a hierarchical (pyramidal) GOP. This mode allows B frames to reference each other. Adaptive selection of the number of B frames is suited for encoding highly dynamic video sequences with complex motion. In moments of such complex motion, the number of P frames increases, and the GOP structure changes.
  • Scene change detection. Most encoders detect a scene change and automatically insert a full I frame at the start of the new scene. However, if the content features frequent scene changes (e.g. news), the inserted full frames can cause the GOP structure to change. This will add several extra seconds of latency to the stream. If a buffer overrun occurs in the receiving device, the viewers will see frozen pictures and pixelation (scene change detection can be seen in Fig. 3).
  • Average (avg) encode ratio for the entire stream and for I, P, and B frames. This shows the compression ratio relative to the raw video. It can be used to verify the overall encoder performance, to check whether the encoder has maintained the required proportions, namely avg[EncRatio(I)] << avg[EncRatio(P)] << avg[EncRatio(B)], and to compare the performance of two encoders on a common set of media files.
Fig. 4 Video sequence information (including encode ratio).
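
The average encode ratio check described above can be sketched in a few lines. This assumes the encode ratio of a frame is its raw (uncompressed) size divided by its compressed size. The function name `avg_encode_ratio` and the sample frame sizes are made up for illustration and do not come from a real stream.

```python
from collections import defaultdict


def avg_encode_ratio(frames, raw_frame_bytes):
    """Average raw-to-compressed size ratio per frame type.

    frames: iterable of (frame_type, compressed_size_in_bytes) pairs.
    raw_frame_bytes: size of one uncompressed frame in bytes.
    """
    ratios = defaultdict(list)
    for ftype, size in frames:
        ratios[ftype].append(raw_frame_bytes / size)
    return {ftype: sum(vals) / len(vals) for ftype, vals in ratios.items()}


# One 1920x1080 frame in 8-bit 4:2:0 takes 1.5 bytes per pixel.
raw = 1920 * 1080 * 3 // 2

# Hypothetical compressed sizes, chosen only to illustrate the check.
sample = [("I", 300_000), ("B", 20_000), ("B", 25_000), ("P", 100_000)]

avg = avg_encode_ratio(sample, raw)
print(avg["I"] < avg["P"] < avg["B"])  # True
```

The plain `<` comparison only checks the ordering. How large the gaps between the I, P, and B ratios should be depends on the content and the encoder settings.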

To view the detailed information about the video sequence and perform deep video analysis, we recommend using Elecard StreamEye Studio tools.

Request a free demo of the Elecard CodecWorks encoder.
