MPEG Info

What is MPEG?

The Moving Picture Experts Group (MPEG) is a working group of authorities that was formed by ISO and IEC to set standards for audio and video compression and transmission. MPEG is officially a collection of ISO Working Groups and Advisory Groups under ISO/IEC JTC 1/SC 29 – Coding of audio, picture, multimedia and hypermedia information (ISO/IEC Joint Technical Committee 1, Subcommittee 29). MPEG-2 is used by DVDs and can compress a 2 hour video into a few gigabytes; while decompressing an MPEG-2 data stream requires only modest computing power, encoding video in MPEG-2 format requires significantly more processing power. MPEG-3 was designed for HDTV but was abandoned in favour of using MPEG-2 for HDTV. MPEG-V outlines an architecture and specifies associated information representations to enable interoperability between virtual worlds (e.g. digital content providers of virtual worlds, gaming, simulation), and between real and virtual worlds (e.g. sensors, actuators, vision and rendering, robotics).

MPEG stands for 'Moving Picture Experts Group'. It is a group working under the directives of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

The group's work concentrates on defining standards for the coding of moving pictures, audio and related data.

MPEG-1 defines a framework for coding moving video and audio, significantly reducing the amount of storage required with minimal perceived difference in quality. In addition, a Systems specification defines how audio and video streams can be combined to produce a system stream. This forms the basis of the coding used for the VCD format.

MPEG-2 builds on the MPEG-1 specification, adding further pixel resolutions, support for interlaced pictures, better error recovery possibilities, more chrominance information formats, non-linear macroblock quantization and the possibility of higher resolution DC components.

MPEG video compression

MPEG video compression uses several techniques to achieve high compression ratios with minimal impact on the perceived video quality.

Discrete Cosine Transformation (DCT)

The human vision system exhibits some characteristics that are exploited by MPEG video compression. One of these is that large objects are much more noticeable than detail within them. In other words, low spatial frequency information is much more noticeable than high spatial frequency information.

MPEG video compression discards some high spatial frequency information - the information which is less noticeable to the eye. The first step in this process is to convert a static picture into the frequency domain. The DCT performs this transformation.

A complete frame is split into blocks of 8x8 pixels. The DCT algorithm converts the spatial information within the block into the frequency domain. After the transformation, the top left value of the block represents the DC level (think of this as the average brightness) of the block. The value immediately to the right of this represents low frequency horizontal information. The value in the top right represents high frequency horizontal information. Similarly, the bottom left value represents high frequency vertical information.
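The transform itself can be sketched straight from the DCT-II definition. This is a naive, illustrative implementation (a real encoder uses a fast factorised DCT):

```python
import math

def dct_2d(block):
    """Naive 2D DCT-II of an n x n block (O(n^4), for illustration only)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = cu * cv * total
    return out

# A flat grey 8x8 block: only the DC (top left) coefficient is non-zero,
# and it represents the average brightness (scaled by the block size).
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

Feeding in a block with a horizontal gradient would instead place energy along the top row of `coeffs`, matching the layout described above.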

The following examples show the DCT values produced by some simple 4x4 pixel blocks. Values in the DCT output matrices range from 0 to 15.

DCT Values:

 8  0  0  0
 0  0  0  0
 0  0  0  0
 0  0  0  0

A block of grey pixels. The DC value of the DCT output represents the average brightness. All other values are zero.

DCT Values:

 8 15  0  0
 0  0  0  0
 0  0  0  0
 0  0  0  0

A low frequency horizontal component. Again the DC value of the DCT represents the average brightness of all pixels. The '15' represents the low frequency horizontal component.

DCT Values:

 8  0  0  0
 0  0  0  0
 0  0  0  0
15  0  0  0

A high frequency vertical component produces a high value in the bottom left corner of the DCT output.

DCT Values:

 8  0  0  0
 0  0  0  0
 0  0  0  0
 0  0  0 15

Diagonal stripes have high frequency information in horizontal and vertical directions, producing high values in the bottom right corner of the DCT output matrix.

Our DCT transformed values contain an accurate representation of our original block. By applying an inverse DCT to the values we regain our original pixels. Our DCT output is currently held as high precision (e.g. floating point) values. We apply a technique called quantization to reduce the precision of the values. Quantization simply means storing the value using a discrete number of bits, discarding the least significant information. Using the knowledge that high spatial frequency information is less visible to the eye than low frequency information, we can quantize the high frequency parts using fewer bits. It is important that the DC component is accurately represented.

In our example blocks above, we have used 4 bit values (in the range 0 to 15) to represent the DCT matrix. With the knowledge that the eye cannot determine high frequency information as accurately as low frequency information, we can change the number of bits to which we quantize each entry in the matrix. The DC component must be accurately represented, but we can reduce the number of bits required for other cells. The following shows an example of how many bits could be allocated for each cell in the DCT matrix:

4 3 3 2
3 2 2 2
3 2 1 1
2 2 1 1

The original matrix had 16 cells with 4 bits per cell, giving a total of 64 bits. The quantized matrix has a total of 4+3+3+2 + 3+2+2+2 + 3+2+1+1 + 2+2+1+1 = 34 bits, a saving of about 50%. A real MPEG encoder varies the number of bits to which DCT matrix values are coded on each frame.
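The bit arithmetic can be checked with a short sketch. The `quantize` helper below, which simply drops low-order bits, is an illustrative stand-in for the real MPEG quantiser:

```python
# Bit allocation from the example above: more bits for the low frequencies
# (top left), fewer for the high frequencies (bottom right).
bit_allocation = [
    [4, 3, 3, 2],
    [3, 2, 2, 2],
    [3, 2, 1, 1],
    [2, 2, 1, 1],
]

def quantize(value, nbits, full_bits=4):
    """Keep only the nbits most significant bits of a full_bits value."""
    shift = full_bits - nbits
    return (value >> shift) << shift  # the discarded low bits are lost

total = sum(sum(row) for row in bit_allocation)
original = 16 * 4
print(f"{total} bits after quantization vs {original} bits before")
```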

Modified Huffman Coding

Modified Huffman coding uses fixed tables to perform Huffman coding. The DCT output is encoded using this technique to reduce the number of bits required. The basis of Huffman encoding is that encoded symbols are a variable number of bits. Frequently used symbols consume fewer bits; less frequently used symbols consume more bits. The result is a (hopefully!) saving in the bit requirements.
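The idea can be sketched with a toy fixed table. The codes below are invented for illustration (the real MPEG tables are standardised run/level tables); note that no code is a prefix of another, so the bitstream can be decoded unambiguously:

```python
# Hypothetical fixed Huffman table: the most frequent symbol (a zero
# coefficient) gets the shortest code.
CODES = {
    0: "0",
    1: "10",
    -1: "110",
    2: "1110",
    -2: "1111",
}

def encode(symbols):
    """Concatenate the fixed variable-length codes for a run of symbols."""
    return "".join(CODES[s] for s in symbols)

# 8 symbols fit in 11 bits; 8 fixed-width 3-bit symbols would need 24.
bitstream = encode([0, 0, 1, 0, -1, 0, 0, 0])
```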

Encoding Motion

MPEG video frames are broken into blocks of 8x8 pixels which are DCT processed and quantized as outlined above. Blocks are combined into macroblocks of 16x16 or 16x8 (MPEG-2 only) pixels.

Let's consider a sequence of 6 frames. The encoder starts by encoding a complete representation of the first frame (similar to a static JPEG image). This is known as an Intra-frame (or I-frame). I-frames are necessary to give the decoder a starting point.

The encoder could choose to encode the fourth frame in the video as a Predicted frame (or P-frame). To do this it scans the first frame (the reference frame) and the fourth frame, looking for macroblock sized areas of the picture that appear similar. If the video contains moving objects, the encoder detects this. For areas of the image which have not changed between the first and fourth frames, macroblocks are skipped. Skipped macroblocks do not consume any data in the video stream; the decoder simply copies the macroblock from the previous reference frame. For areas that have changed slightly compared to the reference, the encoder takes the pixel difference and encodes this using the DCT and quantization techniques. Where the encoder can detect the movement of an object from one macroblock position to another, it encodes a motion vector and difference information. The motion vector tells the decoder how far and in what direction the macroblock has moved. Where the encoder cannot find a similar macroblock in the reference frame, the macroblock is encoded as if it were an I-frame.
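The macroblock search can be sketched as an exhaustive sum-of-absolute-differences (SAD) match over a small window. Real encoders use far faster search strategies and larger blocks; the 4x4 blocks and tiny frames here are purely illustrative:

```python
def sad(ref, cur, rx, ry, cx, cy, n=4):
    """Sum of absolute differences between the n x n block at (rx, ry) in
    the reference frame and the one at (cx, cy) in the current frame."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))

def find_motion_vector(ref, cur, cx, cy, search=2, n=4):
    """Exhaustive block match: return the (dx, dy) minimising the SAD."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx and 0 <= ry and rx + n <= len(ref[0]) and ry + n <= len(ref):
                cost = sad(ref, cur, rx, ry, cx, cy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# A bright square sits at (2, 2) in the reference frame and has moved one
# pixel right, to (3, 2), by the current frame.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for j in range(4):
    for i in range(4):
        ref[2 + j][2 + i] = 200
        cur[2 + j][3 + i] = 200

dx, dy = find_motion_vector(ref, cur, 3, 2)  # where did this block come from?
```

Here `(dx, dy)` comes out as (-1, 0): the block now at (3, 2) is found one pixel to the left in the reference frame, so only the vector (and a zero difference) needs to be transmitted.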

The other frames in the sequence (second, third, fifth and sixth) could be encoded as Bidirectional predicted frames (B-frames). Considering the second frame, this has two reference frames; the previous reference frame is frame one and the next reference frame is frame four. A B-frame can use macroblocks from either the previous or next reference frames, or preferably a combination of both. Using forward and backward motion vectors allows interpolation of the two images, reducing noise at low bitrates.

Using our example, the video would be encoded using the frame sequence I B B P B B.

It is more usual for I-frames to appear less regularly than this, perhaps every 12 frames. A more sophisticated encoder would dynamically detect which frames should be encoded using which frame types; for example, a scene change would result in an I-frame being inserted. Thus the sequence could end up looking more irregular.

By using information from the previous and next pictures, substantial savings can be made in the bit requirements for P and B pictures compared with I pictures. Typically P-frames require 30% to 50% of the number of bits required by I-frames, and B-frames require 15% to 25%.
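Those ratios make it easy to estimate the payoff over a typical group of pictures. Taking the midpoints of the quoted ranges (P at 40% of an I-frame, B at 20%, both our assumptions) and the 12-frame I-frame spacing mentioned above:

```python
# Relative frame sizes: midpoints of the ranges quoted in the text.
sizes = {"I": 1.0, "P": 0.4, "B": 0.2}

gop = "IBBPBBPBBPBB"  # one I-frame every 12 frames
total = sum(sizes[frame_type] for frame_type in gop)
average = total / len(gop)
print(f"average frame costs {average:.2f} of an I-frame")
```

Under these assumptions the average frame costs about a third of an I-frame, roughly a threefold saving over coding every frame as an I-frame.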

MPEG audio compression

Audio data is sampled at a certain sample rate. That means that a number of measurements of the audio signal are taken every second (32,000, 44,100 or 48,000 samples per second for MPEG-1 audio). Each sample is taken at a certain precision (16 bits).
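A quick calculation shows why compression is needed at all. Assuming 16-bit stereo (the stereo assumption is ours), the raw PCM bitrates are:

```python
# Uncompressed PCM bitrate = sample rate x bits per sample x channels.
raw_kbps = {rate: rate * 16 * 2 / 1000 for rate in (32000, 44100, 48000)}
for rate, kbps in raw_kbps.items():
    print(f"{rate} Hz stereo: {kbps:.1f} kbit/s raw")
```

Over a megabit per second for CD-quality audio, which is an order of magnitude more than typical compressed MPEG audio bitrates.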

MPEG audio compression uses a psycho-acoustic model of the human ear to determine portions of the audio information that can be encoded at a lower precision without affecting the listener's perception.

The first step in encoding MPEG audio is to convert a short burst of audio data (known as a frame) into the frequency domain. This uses a polyphase filter bank built around a cosine transform (a close relative of our old friend, the DCT) that splits the time samples into 32 equally spaced frequency bands. Using the psycho-acoustic model, the number of data bits used to represent the sampled data can be varied for the different frequency bands. Audio information that will not be heard is not allocated any bits.
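A toy version of the analysis step can be sketched with a plain DFT, grouping the bins into 32 equal bands. The real filter bank is a windowed, cosine-modulated design, so this is only a structural illustration, and the 1 kHz test tone and frame size are our own choices:

```python
import cmath, math

def band_energies(frame, n_bands=32):
    """Toy analysis: DFT the frame, then group the bins into equal bands."""
    n = len(frame)
    spectrum = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n)))
                for k in range(n // 2)]
    per_band = len(spectrum) // n_bands
    return [sum(spectrum[b * per_band:(b + 1) * per_band])
            for b in range(n_bands)]

# A pure 1 kHz tone sampled at 32 kHz: its energy lands in a single band,
# so bits only need to be spent on that one band.
sample_rate = 32000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
energies = band_energies(frame)
loudest = energies.index(max(energies))
```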

Frequency (Auditory) Masking

Frequency masking, or auditory masking, is a term used to describe masking of a sound at one frequency by a sound at another frequency. If a loud sound is present at a particular frequency it reduces the ability of the human ear to discern a softer sound at a second frequency. The louder the first frequency is, and the closer the two frequencies are, the greater the effect.

To illustrate: if a -6 dB signal is present at 1 kHz (1000 Hz) and another signal is present at 1.1 kHz (1100 Hz) with a loudness of -18 dB, the 1.1 kHz signal will not be heard.
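That behaviour can be captured by a toy masking model. All of the numbers here (the 10 dB drop and the 12 dB per octave slope) are invented for illustration; real psycho-acoustic models use measured spreading functions:

```python
import math

def masked_threshold(masker_db, masker_hz, probe_hz,
                     drop_db=10, slope_db_per_octave=12):
    """Toy model: the masked threshold sits drop_db below the masker and
    falls away with frequency distance (all constants are illustrative)."""
    octaves = abs(math.log2(probe_hz / masker_hz))
    return masker_db - drop_db - slope_db_per_octave * octaves

def is_masked(masker_db, masker_hz, probe_db, probe_hz):
    """True if the probe tone falls below the masker's threshold curve."""
    return probe_db < masked_threshold(masker_db, masker_hz, probe_hz)

# The example from the text: a -18 dB tone at 1.1 kHz next to a -6 dB
# masker at 1 kHz falls below the threshold curve, so it is inaudible.
inaudible = is_masked(-6, 1000, -18, 1100)
```

The same -18 dB tone moved far away in frequency (say to 4 kHz) would fall outside the masker's influence and be heard.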

Temporal Masking

Temporal masking is the masking of a sound by another sound that occurred before or after it in time. If a loud sound stops abruptly and is replaced by a soft (low volume) sound of short duration, the soft sound will not be heard until the ear can recover from the effects of the loud sound.

A similar effect in the other direction is also possible, where a soft sound of short duration is followed by a loud sound. Once again the soft sound is not heard.