Bristol Film and Video Society



4:2:0? 4:2:2? A beginner’s guide to Sampling and Quantisation, and SD cards explained!

If you’ve taken the trouble to read the specs on that new camcorder you’re thinking of spending the kids’ inheritance on, you have almost certainly come across the ratio ‘4:2:2’ or similar. For ages I wondered where these figures came from and what they mean - apart from the bigger the numbers, the higher the price! So I had a look round, couldn’t find a full and clear explanation, and after a lot of rummaging came up with the following.

We all know about 1920x1080 being ‘High Definition’ (a relative term; in its day, 405-line was ‘High Def’), but that ultimately refers to the number of pixels forming the image. How much data is being recorded to define each pixel in terms of colour and brightness? That’s where this ratio comes in. But it’s also a little misleading. The figures aren’t absolute quantities; they describe the relative rates at which the brightness signal and the two colour signals are sampled, with the ‘4’ historically standing for the standard-definition luminance sampling frequency of 13.5 MHz (4 x 3.375 MHz). So ‘4:2:2’ means each colour signal is sampled at half the rate of the brightness signal. The problem here is that if someone in the distant future, in a land far, far away, increases the underlying sampling rates, the ratio could stay the same without any real indication of the improvement! But I’m getting ahead of myself. Here’s some info which hopefully takes you gently from the beginning, explaining it as we go. Even so, it might be a good idea to get your slippers, pour a coffee, and find a comfy chair before you start; I apologise in advance for a few technical terms - just keep going, and hopefully you’ll get back on track.

What is a Pixel (‘picture element’, or ‘pel’)?

The colours within the image formed on the retina stimulate the Red, Green and Blue cones. The signals from the rods (which sense luminosity) and these cones are combined in the brain to produce the perceived brightness (luminance) and colour (chrominance). The cones are less sensitive to light than the rods (which are connected in groups, increasing their sensitivity), and this is why we tend to see things in monochrome when in the dark. The cones are concentrated in the centre of our vision, with the outer rim almost all rods. This explains why astronomers use ‘peripheral vision’ to see distant faint objects. Our eyes only see a high-definition image in an area about 3° in diameter, several times the apparent width of a full moon, but a digitised image shown on a screen needs to have uniform coverage of the luminance and chrominance, because it doesn’t know which bit we’re looking at!

With digital television, the data used to record and reproduce luminance and chrominance is covered by the term ‘Sampling’, and is usually expressed as, for example, one of these ratios:

4:4:4 (full-resolution colour; dream on, you can’t afford it)

4:2:2 (colour sampled at half the luminance rate; typically ‘broadcast quality’)

4:2:0 (colour halved both horizontally and vertically; typically used for UK PAL DV, DVDs and domestic camcorders)

4:1:1 (colour at one quarter of the luminance rate horizontally; typically used in the US - I believe because NTSC has fewer scan lines than PAL (525 against 625), so it keeps full vertical colour resolution and trades away horizontal instead)

The signal level, or value, of each of the three parts is termed ‘Quantisation’ (i.e. ‘how much’, or ‘what quantity’), and the number of binary bits used to define each value sets how many distinct levels it can take: 8 bits gives 256 levels (0-255). But read on… we’re not there yet.

Although basically one element of an image, the term Pixel means different things to different people in different contexts. For example, in the world of printing, a pixel is a single point of colour, and one of the multitude of individual colour spots that make up a picture.

With colour TVs and monitors, a Pixel is made up of Subpixels; for example, usually (but not always) one red, one green and one blue. Collectively they reproduce the required colour, shade and brightness. These subpixels are often vertical bars within the square pixel, like this:

[Image: a single square pixel divided into vertical red, green and blue subpixel bars]

(Source: en.wiki/Pixel)

The processed signals obtained from the sensor in the camera are usually referred to as YUV, or ‘Component’:

Y Luminance (brightness)

B-Y (“B minus Y”, or U/Cb)

R-Y (“R minus Y”, or V/Cr)

What happened to Green?

See the boxout further down…

Basically, don’t panic!

You may see connectors with these legends on your video kit. As the human eye is more sensitive to brightness than colour, the colour information is usually sacrificed first during compression.

Composite video systems such as PAL, NTSC and SECAM are all analogue compression schemes which embed a subcarrier in the luminance signal so that colour pictures are available in the same bandwidth as monochrome (apologies for the techie language). In comparison with a progressive-scan RGB picture, interlaced composite video has a compression factor of 6:1. (Source: ‘An Introduction to Digital Video’ by John Watkinson.)

Colour - Additive or Subtractive?

Colours generated on a screen or projected by a light source perform differently from colours mixed as paint or printed on a page. This is because emissive or projected colours are those you wish to reproduce, whereas printed or painted surfaces (which are viewed with reflected light) absorb the colours you don’t want. As an aside, gels used on lamps cannot add or change any colour; they merely filter out any part of the spectrum you don’t want.

If the colour you want isn’t being emitted by the light source, you will be left in the dark! So for example, a Red gel will only allow red light through it and hold back any others, which is why the lamp appears to be less bright.

Additive

Primary colours are Red, Green and Blue, commonly referred to as RGB. Spotlights with colour gels, and colour images formed by Cathode Ray Tubes and flat screens, are all examples of ‘Additive’ colour. Additive Secondary colours are Cyan, Magenta and Yellow (CMY).

Subtractive

Primary colours are Cyan, Magenta and Yellow. Paint and printing colours are ‘Subtractive’ colours: the pigments reflect only the colours you want to see, and absorb the rest. Subtractive Secondary colours are Red, Green and Blue.

I’m including the next bit to explain the Quantisation levels (0-255) mentioned later, so pay attention!

Analogue signals can be represented by any value between a minimum and a maximum, say between 0 (zero) volts and 1 volt, but all this new digital video kit uses Binary signals, or just one of two voltages on a wire: let’s say either 0 (zero) volts or 1 volt. That’s it; just two states, 0 or 1. Nothing in-between. This makes life easier for electronic circuitry; think of it as an on/off switch. Any decent kit will probably employ 'slicing' technology or similar. The signal is ones and zeros, resulting in a square waveform, but any frequency drop-off (usually HF) exhibited by, for example, an interconnecting cable, results in a slope rather than a sharp ‘cliff edge’ on the waveform. Slicing effectively cuts a horizontal line through the waveform at its mid point (say at 0.5 volts for a zero to 1v signal).

The circuit then generates a brand new clean waveform, based on whether the signal was going up or down at the point where the waveform is intersected. This makes it very robust as far as signal degradation is concerned.
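
If it helps to see slicing in code, here’s a minimal Python sketch with invented voltage samples - everything above the mid-point threshold becomes a 1, everything below a 0 (a simplification: a real slicer works on the points where the waveform crosses the threshold, but the idea is the same):

    # A simplified 'slicer': compare each sample of a degraded waveform
    # with the mid-point threshold and regenerate clean binary levels.
    # Sample voltages below are invented for illustration.

    THRESHOLD = 0.5  # mid-point of a 0 V to 1 V signal

    received = [0.05, 0.4, 0.9, 0.95, 0.6, 0.2, 0.1, 0.55, 0.85, 1.0]

    regenerated = [1 if v > THRESHOLD else 0 for v in received]
    print(regenerated)  # [0, 0, 1, 1, 1, 0, 0, 1, 1, 1]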

To continue…

With our ordinary Decimal numbers, we ‘carry one’ when we get to ten. With Binary, we ‘carry one’ when we get to two. It seems confusing, but all you really need is confidence. You can add, subtract, multiply and divide just as you can with Decimal. Binary really comes into its own with big numbers. Using your fingers and thumbs, what’s the biggest number you can count up to? It’s not 10! Let’s have a look at some equivalent values...

Long time since you left school? Here’s a refresher…

Binary values

Shown here with four ‘bits’, or characters, to keep the table manageable; note that leading zeros are included:

4-bit Binary Decimal value

0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 8
1001 9
1010 10
1011 11
1100 12
1101 13
1110 14
1111 15
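
If you fancy double-checking that table, here’s a tiny Python sketch (Python just being a convenient calculator here) that reproduces it:

    # Reproduce the 4-bit binary table: each number from 0 to 15
    # alongside its binary form, leading zeros included.
    for n in range(16):
        print(format(n, '04b'), n)

    # And back the other way, as worked through in the boxout at the end:
    print(int('1011', 2))  # prints 11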

8-bit Binary (as used for luminance and chrominance alike), showing individual bit values…

|Bit value: |128 |64 |32 |16 |8 |4 |2 |1 |
|Example (all ones): |1 |1 |1 |1 |1 |1 |1 |1 |

All eight ones total 255, the maximum 8-bit value. And here is how the sampling and quantisation come together for standard-definition 4:2:2:

|Sampling ratio: |4 |2 |2 |
|Signal: |Y |Cb |Cr |
|Bits per sample: |8 |8 |8 |
|Sampling frequency: |13.5 MHz |6.75 MHz |6.75 MHz |
|Samples per line:* |720 |360 |360 |
|Bit rate: |165.9 Mbps in total | | |
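
That 165.9 Mbps figure can be checked from the other rows of the table. A short Python sketch, using the standard 576 active lines and 25 frames per second for 625-line/50Hz video:

    # Active-video bit rate for 8-bit 4:2:2 standard definition (625/50).
    # Figures as per the table above; blanking intervals are excluded.

    samples_per_line = {'Y': 720, 'Cb': 360, 'Cr': 360}
    active_lines = 576
    frames_per_second = 25
    bits_per_sample = 8

    bits_per_second = (sum(samples_per_line.values())
                       * active_lines * frames_per_second * bits_per_sample)
    print(bits_per_second / 1e6, 'Mbps')  # 165.888, i.e. about 165.9 Mbps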

Typical compression formats

JPEG Joint Photographic Experts Group

MPEG Moving Picture Experts Group

MPEG1 (1992)

Supports video coding up to about 1.5 Mbit/s and stereo audio at 192 kbit/s. Used for CD-i and Video CD.

It does not support interlaced video. It has a resolution of about 240 lines - similar to VHS.

MPEG2 (1994)

4-9 Mbit/s; supports interlaced video beyond 400 lines. Used for DVDs and Super VCDs.

SDTV 3-15 Mbit/s, HDTV 15-30 Mbit/s. Also used for Freeview and satellite TV.

MPEG3 (not in use)

Originally intended for HDTV, but this is now covered by MPEG2.

MPEG4

ASF, DivX, WMV; low-bandwidth systems, satellite TV, mobile phones.

DV, MiniDV

Fixed 5:1 intra-frame compression. (Each frame is independent and complete.)

JPEG compression is used for still images. It uses discrete cosine transform (don’t ask me!) and provides lossy compression at ratios up to 100:1 and more. Some images can be compressed at 10:1 or even 20:1 with little apparent loss of sharpness.

Each pixel of the image has a value representing its brightness and colour. For the colour information, the image is divided into blocks of 16 by 16 pixels, which are reduced to 8 by 8 by averaging each group of four neighbouring pixels - the eye barely notices the loss. Further compression is then applied to each block.
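
To make the averaging idea concrete, here’s a loose Python sketch (the sample values are invented, and real JPEG does rather more than this - the discrete cosine transform comes afterwards):

    # Toy 4x4 block of colour samples (values invented): shrink to 2x2 by
    # averaging each group of four neighbours, halving the resolution in
    # both directions.

    block = [
        [200, 202, 100, 102],
        [198, 200, 104, 106],
        [ 50,  52, 150, 152],
        [ 54,  56, 148, 154],
    ]

    half = []
    for r in range(0, 4, 2):
        row = []
        for c in range(0, 4, 2):
            group = (block[r][c] + block[r][c + 1]
                     + block[r + 1][c] + block[r + 1][c + 1])
            row.append(group // 4)  # average of the 2x2 group
        half.append(row)

    print(half)  # [[200, 103], [53, 151]]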

JPEG++ allows selected areas of the image to have different compression ratios.

MPEG compression is used for moving images. Offering about three times greater compression than JPEG, it works by comparing successive frames and recording the differences. If, for example, the background doesn’t change, it simply repeats it, thus allowing areas containing moving subjects more file space for detailed data. For example, a newsreader may be sitting at a desk; the background is stationary, the head may move a little, the mouth will move a lot. This generates a group of pictures, or ‘GOP’, where a reference frame is followed by many frames (possibly 14) which show accumulating changes to the first. Then you start again with a new reference frame.

MPEG uses three types of image:

Type I - Intraframe, or I-frame (also referred to as a Key frame).

A reference frame in which the whole image is encoded without reference to other frames.

Type P - Predictive, or P-frame.

Encoded using motion compensation relative to the previous I or P frame.

Type B - Bi-directional, or B-frame.

Encoded using motion prediction from the previous and next reference images, which must be either I or P frames (a B frame cannot itself serve as a reference).
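
To picture a GOP, here’s a hedged Python sketch that builds one common 15-frame arrangement (the exact pattern varies between encoders; this is illustrative, not a fixed rule):

    # One common 15-frame group of pictures (GOP): an I frame followed by
    # a repeating B, B, P pattern.

    gop = []
    for i in range(15):
        if i == 0:
            gop.append('I')   # key frame, complete in itself
        elif i % 3 == 0:
            gop.append('P')   # predicted from the previous I or P frame
        else:
            gop.append('B')   # predicted from the references either side
    print(''.join(gop))       # IBBPBBPBBPBBPBB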

But there seems to be a conflict: why the move towards HD and even 4K (of which there are several standards! - let alone 8K SHV) and 3D, if we are all going to be watching films ‘on the move’ on a daft little screen on a mobile?!

If you are approached to make a ‘4K’ film for a client, please ensure that they are aware of the aspect ratio and frame rate (fps). They might assume that 4K is the same as (or better than) UHD. Just to illustrate the confusion, here are some of the current '4K' standards (‘4K’ is an approximate term): (Sources: TV-bay magazine, Wikipedia)

|Standard |Picture size |Mpixels |Aspect ratio |Notes |
|QFHD 4K and 4KTV |3840 x 2160 |8.3M pixels |1.78:1 |4x 1920x1080 (HDTV); 4K UHD 16:9 domestic TV |
|Digital Cinema 4K |4096 x 1714 |7.02M pixels |2.39:1 | |
|Cinemascope Crop |4096 x 1716 |7.03M pixels |2.39:1 |Possibly the same as Digital Cinema 4K, above |
|Flat Crop |3996 x 2160 |8.63M pixels |1.85:1 | |
|Digital Cinema 4K |4096 x 2160 |8.85M pixels |1.90:1 |4x 2048x1080 (2K); ‘DCI 4K’ Full Frame |
|Academy 4K |3656 x 2664 |9.74M pixels |1.37:1 | |
|Full Aperture 4K |4096 x 3112 |12.75M pixels |1.32:1 | |

4K distributions must have a frame rate of 24fps (i.e. same as film). It’s a minefield for the unwary!
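
If you want to verify the pixel counts and aspect ratios above, the arithmetic is simple. A quick Python sketch with a few of the entries (the dictionary contents are just copied from the table):

    # Megapixels and aspect ratio from frame dimensions.
    standards = {
        'QFHD 4K':           (3840, 2160),
        'Digital Cinema 4K': (4096, 2160),
        'Full Aperture 4K':  (4096, 3112),
    }

    for name, (width, height) in standards.items():
        print(f'{name}: {width * height / 1e6:.2f}M pixels, '
              f'{width / height:.2f}:1')
    # QFHD 4K: 8.29M pixels, 1.78:1
    # Digital Cinema 4K: 8.85M pixels, 1.90:1
    # Full Aperture 4K: 12.75M pixels, 1.32:1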

The latest High Definition standard is SHV 8K (Super Hi-Vision) which is 7680 x 4320, 33.18M pixels.

That’s 16 times HD, with 22.2-channel surround sound.

Expect it sometime around 2022... Best start saving now... and 11K is in development...

For comparison, I have gleaned the following from my Panasonic HC-X1 camera user guide (and I have selected recording formats for UK 50Hz as opposed to US 60Hz)...

|File type |Picture size |Mpixels |Aspect Ratio |fps |Bit Rate Mbps |Min. SD type |Recording time 32GB card |
|MOV/MP4 |4K 4096 x 2160 |8.85 |17:9 |24p |100 |U3 SDHC |40m |
|MOV/MP4 |UHD 3840 x 2160 |8.29 |16:9 |50p |150 |U3 SDXC |(64GB minimum: 55m) |
|MOV/MP4 |UHD 3840 x 2160 |8.29 |16:9 |25p |100 |U3 SDHC |40m |
|MOV/MP4 |FHD 1920 x 1080 |2.07 |16:9 |50p |ALL-I 200 |U3 SDHC |20m |
|MOV/MP4 |FHD 1920 x 1080 |2.07 |16:9 |50p |100 |U3 SDHC |40m |
|MOV/MP4 |FHD 1920 x 1080 |2.07 |16:9 |50p |50 |C10/U1 |1h 20m |
|MOV/MP4 |FHD 1920 x 1080 |2.07 |16:9 |25p |ALL-I 200 |U3 SDHC |20m |
|MOV/MP4 |FHD 1920 x 1080 |2.07 |16:9 |25p |50 |C10/U1 |1h 10m |
|MOV/MP4 |FHD 1920 x 1080 |2.07 |16:9 |50i |50 |C10/U1 |1h 20m |
|AVCHD |1920 x 1080 |2.07 |16:9 |50p |25 |C4 |2h 40m |
|AVCHD |1920 x 1080 |2.07 |16:9 |50i |21 |C4 |3h 00m |
|AVCHD |1920 x 1080 |2.07 |16:9 |50i |17 |C4 |4h 15m |
|AVCHD |1440 x 1080 |1.56 |12:9 (4:3) |50i |5 |C4 |13h 20m |
|AVCHD |1280 x 720 |0.92 |16:9 |50p |8 |C4 |8h 30m |
|AVCHD |720 x 576 |0.41 |11.25:9 |50i |9 |C4 |8h 15m |
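
As a rough sanity check on those recording times, you can estimate them from the card capacity and bit rate. A small Python sketch (assuming decimal gigabytes and ignoring the camera’s file overheads, which is why the manufacturer’s quoted figures come out a little lower):

    # Rough recording-time estimate from card capacity and bit rate
    # (video stream only; no allowance for file-system overheads).

    def recording_minutes(card_gb, bitrate_mbps):
        total_megabits = card_gb * 8 * 1000  # GB -> megabits
        return total_megabits / bitrate_mbps / 60

    print(round(recording_minutes(32, 100)))  # ~43; Panasonic quote 40m
    print(round(recording_minutes(32, 50)))   # ~85; Panasonic quote 1h 20m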

Be aware that there is a maximum file size when making long recordings; longer recordings are split into multiple files, which are then concatenated (joined back together) on import:

AVCHD scene: 4GB - about 30 mins.

MOV/MP4 scene when using an SDHC card: 4GB/30 mins, or 96GB/3 hours if using an SDXC card.

So now you know all about Sampling and Quantisation, and why they’re important, with some other info thrown in for good measure to back it up. Now, here are a few words about selecting SD cards...

Selecting SD (‘Secure Digital’) Cards.

SDHC = Secure Digital High Capacity. SDXC = Secure Digital eXtra Capacity (SDXC devices will also accept SDHC cards, so they are backwards compatible).

This applies to my Panasonic HC-X1 ‘4K’ camera, but I expect this advice from Panasonic would apply to similar cameras from other manufacturers. Basically it needs an SDHC or SDXC card; the maximum size recommended is 128 GB, and a Class 10 or better is recommended to enable shooting 50 Mbps MOV/MP4. To use the Super Slow Recording function plus some other features, the card needs to be a minimum of U3 (UHS Speed Class 3). The manufacturer or brand of the SD card does not really matter; the specifications mentioned above are the most important when it comes to purchasing an SD card for this camera, and there is no need to focus on the V rating.

I suspected that I had been tempted to purchase SD cards with specifications (and therefore prices!) exceeding those required, so I looked at the SanDisk website (other brands are available) and the SD Association website, hoping to find a simple chart showing the hierarchy of ratings. I was disappointed! I had to look at individual offerings and tried to generate a chart myself, so the following is ‘E&OE’, but hopefully useful. I am also aware that you shouldn’t rely on recordings on SD cards to last forever, so to take the path of minimum risk I only use them once, back them up, and put the camera originals (locked to avoid accidental changes) in a safe place. First, a bit of background:

The SD Association devised a way to standardise the speed ratings for different cards. These are defined as ‘Speed Class’, where the number corresponds to the absolute minimum sustained/sequential write speed in MB/sec. Cards can be rated as Class 2 (minimum write speed of 2MB/s), Class 4 (4MB/s), Class 6 (6MB/s) or Class 10 (10MB/s).

The next rating is the UHS Speed Class. This stands for Ultra High Speed and refers to minimum sustained writing performance for recording video. UHS came about due to 4K-capable video devices needing faster write speeds.

The SD Association has two UHS Speed Classes; UHS speed class U1 supports a minimum 10MB/sec write speed, and speed class U3 supports a minimum 30MB/sec. As a rule of thumb, 4K-capable camcorders will usually require at least a U3 rated SD card, but if you intend to shoot AVCHD, a C4 or Ultra C10 card is likely to suffice - see below.

Video speed class was introduced to facilitate 4K and 8K recording; V10=10MB/sec, V30=30MB/sec etc.

Things get a little more confusing as UHS Speed Class-rated devices will also use one of two UHS Bus Interfaces that indicate the theoretical maximum read and write speeds. They’ll be listed as either UHS-I or UHS-II to show which interface is used. UHS-I devices have a maximum read speed of 104MB/s, whereas a UHS-II card has a maximum read speed of 312MB/s. Note that unlike the UHS Speed Class, these are not sustained speeds.
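
To put that advice into a form you can reuse, here’s a hedged little Python helper: feed it a recording bit rate in Mbps and it suggests the slowest card class whose guaranteed minimum write speed still covers it. The thresholds are the SD Association minimums quoted above; your camera manual’s specific demands always take precedence:

    # Given a recording bit rate in Mbps, suggest the slowest card
    # marking whose guaranteed minimum sustained write speed covers it.

    SPEED_CLASSES = [  # (marking, guaranteed minimum write speed, MB/s)
        ('C2', 2), ('C4', 4), ('C6', 6),
        ('C10 / U1 / V10', 10), ('U3 / V30', 30), ('V60', 60), ('V90', 90),
    ]

    def minimum_card(bitrate_mbps):
        needed_mb_per_s = bitrate_mbps / 8  # 8 bits (b) to a byte (B)
        for marking, write_speed in SPEED_CLASSES:
            if write_speed >= needed_mb_per_s:
                return marking
        return 'faster than V90'

    print(minimum_card(25))   # AVCHD 50p 25 Mbps -> C4
    print(minimum_card(50))   # FHD 50 Mbps       -> C10 / U1 / V10
    print(minimum_card(100))  # UHD/4K 100 Mbps   -> U3 / V30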

Availability (SanDisk card descriptions). Listed roughly in order of R/W speed, and probably price! Some specs are incomplete, especially the Read/Write speeds, where only one figure is stated.

SDHC C4, sizes 4GB-64GB

Ultra SDXC UHS-I, R/W 30/48, C10, 16GB-64GB

Ultra SDHC/SDXC UHS-I, R/W 80, C10, 8GB-128GB (Packaging states SDHC UHS-1 ‘Full HD video’).

Ultra Plus SDHC/SDXC, R/W 80, C10, U1, 16GB-128GB

Extreme UHS-I, R/W 90, C10, U3, V30, 16GB-256GB

Extreme UHS-I, R/W 150, C10, U3, 16GB-256GB

Extreme Plus SDHC/SDXC, UHS-I, R/W 90/60, C10 4KUHD, U3, V30, 16GB-64GB

Extreme Pro 4K UHD, SDXC, UHS-I, R/W 95/90, C10, U3, V30, 32GB-512GB

Extreme Pro UHS-II, R/W 300/260, C10, U3, 32GB-128GB

Yes, it is confusing, but over-specifying will just be a waste of money - just match your camera’s needs. With so many cards to choose from, not everyone stocks a full range, so you will need to shop around; prices differ widely. Offered E&OE! If you have any comments or corrections, please let me know. I hope you find it useful.

Pete Heaven, BFVS. August 2019

-----------------------

One Pixel

R, G, B Subpixels

Red, Green and Blue are the colours usually reproduced by, for example, the electron ‘guns’ of a colour CRT (Cathode Ray Tube - remember them?!), or the LCD/LED/OLED pixel elements of a flat screen.

These colours are emissive, and therefore ‘Additive’.

Notice how the Binary value builds from the right, and adds another column to the left when the decimal value goes to 2, just as you would when you go from 9 to 10 using Decimal. But with Binary, each bit is simply worth double the one to the right of it.

Each of the four bits shown in the table above has a value which should be totalled to obtain the Decimal equivalent. So for example…

1011 means, reading from the left: (1 x 8)+(0 x 4)+(1 x 2)+(1 x 1) = 11

4-bit Binary offers a maximum value of 15 (or 16 levels if you include zero).

8-bit Binary offers a maximum value of 255 (or 256 levels if you include zero).

Using Binary with 10 fingers+thumbs (10-bits) you can count from zero up to 1,023 - but a bit of a problem if you have arthritic joints. And that nice man who gave you a friendly gesture was really just indicating ‘six’… (or was it four?)

Try it…

* Active video signal, excluding Vertical and Horizontal Blanking Intervals

B-Y and R-Y are usually referred to as ‘difference’ signals. The human eye is more sensitive to errors in green than in the other two primary colours, and green makes the greatest contribution to the Y element, so think of it as a reference signal. The Y element is used to display monochrome images. A full explanation is given in ‘An Introduction to Digital Video’ by John Watkinson, under “Colour difference signals”. Published by Focal Press, ISBN 0-240-51637-0, it’s an expensive and very technical book which probably won’t suit the average enthusiast. Try the web for used copies.

I’ve included this table for reference, just in case you’re interested.

Remember that there are 8 bits (b) to a byte (B).

DCI = Digital Cinema Initiatives

Vertical resolution:

The number of horizontal lines in the image, counting down the screen.

Horizontal resolution:

The number of pixels in each line, counting across the screen.
