How Computers Work – Course Information



Computer Media

Media

File Compression

• Redundancy & Algorithms

• Searching for Patterns

• Lossy and Lossless Compression

MP3 Files

• Sampling Rates

• MPEG – Moving Picture Experts Group

• Perceptual Noise Shaping

JPEG – Joint Photographic Experts Group

• JPEG is lossy

• Rows and Columns (Pixels)

• Only noticeable differences shown

DVD and CD Formats

• DVD-R – Digital Video Disc Recordable (written to once)

• DVD-RW – Digital Video Disc Rewritable (written to 1,000 times)

• DVD+R DL – Dual Layer – Requires Dual Layer DVD Burner

• DVD+R

• DVD+RW – Can be erased and recorded over 1,000 times

• DVD-RAM – DVD Random Access Memory can be written to 100,000 times

• DVD-ROM – DVD Read-Only Memory – for commercially stamped DVDs and DVD movies

• CD-R – Compact Disc Recordable, for CD burning – typical 700 MB capacity

• CD-RW – Compact Disc Rewritable – 1,000 times

Movies Stored on DVD Discs

• MPEG-2

• NTSC – National Television System Committee

• PAL – Phase Alternating Line

• MPEG Encoder

• Intraframe – provides the least compression

• Predicted – only data that has changed since the last frame

• Bidirectional – interpolates from data in surrounding frames

How File Compression Works

by Tom Harris

If you download many programs and files off the Internet, you've probably encountered ZIP files before. This compression system is a very handy invention, especially for Web users, because it lets you reduce the overall number of bits and bytes in a file so it can be transmitted faster over slower Internet connections, or take up less space on a disk. Once you download the file, your computer uses a program such as WinZip or StuffIt to expand the file back to its original size. If everything works correctly, the expanded file is identical to the original file before it was compressed.

Most types of computer files are fairly redundant: they have the same information listed over and over again. File-compression programs simply get rid of the redundancy. Instead of listing a piece of information over and over again, a file-compression program lists that information once and then refers back to it whenever it appears in the original file.

As an example, in John F. Kennedy's 1961 inaugural address, he delivered this famous line:

"Ask not what your country can do for you -- ask what you can do for your country."

The quote has 17 words, made up of 61 letters, 16 spaces, one dash and one period. If each letter, space or punctuation mark takes up one unit of memory, we get a total file size of 79 units. To get the file size down, we need to look for redundancies. Immediately, we notice that:

• "ask" appears two times

• "what" appears two times

• "your" appears two times

• "country" appears two times

• "can" appears two times

• "do" appears two times

• "for" appears two times

• "you" appears two times

Ignoring the difference between capital and lower-case letters, roughly half of the phrase is redundant. Nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote. To construct the second half of the phrase, we just point to the words in the first half and fill in the spaces and punctuation.
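As a quick check on the counts above, a few lines of Python can split the quote into words and tally them. This is a sketch that assumes, as the text does, that capital and lower-case letters are treated alike and punctuation is ignored; the dash is dropped because it contains no letters.

```python
from collections import Counter

QUOTE = "Ask not what your country can do for you -- ask what you can do for your country."

# Lower-case each token and keep only its letters, so "Ask" matches "ask"
# and "country." matches "country"; the "--" token becomes empty and is dropped.
words = ["".join(ch for ch in token if ch.isalpha()) for token in QUOTE.lower().split()]
words = [w for w in words if w]

counts = Counter(words)
repeated = [w for w, n in counts.items() if n > 1]  # in order of first appearance

print(len(words), "words,", sum(len(w) for w in words), "letters")  # 17 words, 61 letters
print("distinct words:", len(counts))                             # distinct words: 9
print("repeated:", repeated)
# repeated: ['ask', 'what', 'your', 'country', 'can', 'do', 'for', 'you']
```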

Redundancy and Algorithms

Most compression programs use a variation of the LZ adaptive dictionary-based algorithm to shrink files. "LZ" refers to Lempel and Ziv, the algorithm's creators, and "dictionary" refers to the method of cataloging pieces of data.

The system for arranging dictionaries varies, but it could be as simple as a numbered list. When we go through Kennedy's famous words, we pick out the words that are repeated and put them into the numbered index. Then, we simply write the number instead of writing out the whole word.

So, if this is our dictionary:

1  ask
2  what
3  your
4  country
5  can
6  do
7  for
8  you

Our sentence now reads:

"1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4"

If you knew the system, you could easily reconstruct the original phrase using only this dictionary and number pattern. This is what the expansion program on your computer does when it expands a downloaded file. You might also have encountered compressed files that open themselves up. To create this sort of file, the programmer includes a simple expansion program with the compressed file. It automatically reconstructs the original file once it's downloaded.
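The expansion step described above can be sketched in a few lines of Python. This is a minimal illustration, not how real ZIP tools work: it assumes the text has already been split into lower-case words with the final period dropped, it assumes the text itself contains no numbers, and the `encode`/`decode` names are hypothetical. The word list is the eight-entry dictionary from the text.

```python
# Hypothetical word dictionary: entry 1 is "ask", entry 2 is "what", and so on.
DICTIONARY = ["ask", "what", "your", "country", "can", "do", "for", "you"]

def encode(tokens):
    """Replace each dictionary word with its 1-based number; copy other tokens unchanged."""
    return [str(DICTIONARY.index(t) + 1) if t in DICTIONARY else t for t in tokens]

def decode(codes):
    """Expand each number back into its dictionary word; copy other tokens unchanged."""
    return [DICTIONARY[int(c) - 1] if c.isdigit() else c for c in codes]

tokens = "ask not what your country can do for you -- ask what you can do for your country".split()
encoded = encode(tokens)
print(" ".join(encoded))          # 1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4
print(decode(encoded) == tokens)  # True
```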

But how much space have we actually saved with this system? "1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4" is certainly shorter than "Ask not what your country can do for you -- ask what you can do for your country." But keep in mind that we need to save the dictionary itself along with the file.

In an actual compression scheme, figuring out the various file requirements would be fairly complicated; but for our purposes, let's go back to the idea that every character and every space takes up one unit of memory. We already saw that the full phrase takes up 79 units. Our compressed sentence (including spaces) takes up 37 units, and the dictionary (words and numbers) also takes up 37 units. This gives us a file size of 74, so we haven't reduced the file size by very much.
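The unit counts can be checked the same way. Following the text's simplified model, every character or space is one unit and the dash counts as a single unit (written here as one hyphen); each dictionary entry costs its letters plus one unit for its number.

```python
DICTIONARY = ["ask", "what", "your", "country", "can", "do", "for", "you"]
COMPRESSED = "1 not 2 3 4 5 6 7 8 - 1 2 8 5 6 7 3 4"  # the dash stored as one unit

sentence_units = len(COMPRESSED)
# Each entry costs its letters plus one unit for its single-digit number.
dictionary_units = sum(len(word) + 1 for word in DICTIONARY)

print(sentence_units, dictionary_units, sentence_units + dictionary_units)  # 37 37 74
```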

But this is only one sentence! You can imagine that if the compression program worked through the rest of Kennedy's speech, it would find these words and others repeated many more times. And, as we'll see in the next section, it would also be rewriting the dictionary to get the most efficient organization possible.

Searching for Patterns

In our previous example, we picked out all the repeated words and put those in a dictionary. To us, this is the most obvious way to write a dictionary. But a compression program sees it quite differently: It doesn't have any concept of separate words -- it only looks for patterns. And in order to reduce the file size as much as possible, it carefully selects which patterns to include in the dictionary.

If the compression program scanned Kennedy's phrase, the first redundancy it would come across would be only a couple of letters long. In "ask not what your," there is a repeated pattern of the letter "t" followed by a space -- in "not" and "what." If the compression program wrote this to the dictionary, it could write a "1" every time a "t" was followed by a space. But in this short phrase, this pattern doesn't occur enough to make it a worthwhile entry, so the program would eventually overwrite it.

The next thing the program might notice is "ou," which appears in both "your" and "country." If this were a longer document, writing this pattern to the dictionary could save a lot of space -- "ou" is a fairly common combination in the English language. But as the compression program worked through this sentence, it would quickly discover a better choice for a dictionary entry: Not only is "ou" repeated, but the entire words "your" and "country" are both repeated, and they are actually repeated together, as the phrase "your country." In this case, the program would overwrite the dictionary entry for "ou" with the entry for "your country."

The phrase "can do for" is also repeated, one time followed by "your" and one time followed by "you," giving us a repeated pattern of "can do for you." This lets us write 15 characters (including spaces) with one number value, while "your country" only lets us write 13 characters (with spaces) with one number value, so the program would overwrite the "your country" entry as just "r country," and then write a separate entry for "can do for you." The program proceeds in this way, picking up all repeated bits of information and then calculating which patterns it should write to the dictionary. This ability to rewrite the dictionary is the "adaptive" part of LZ adaptive dictionary-based algorithm.

Using the patterns we picked out above, and adding "_" for spaces, we come up with this larger dictionary:

1  ask_
2  what_
3  you
4  r_country
5  _can_do_for_you

And this smaller sentence:

"1not_2345_--_12354"

The sentence now takes up 18 units of memory, and our dictionary takes up 41 units. So we've compressed the total file size from 79 units to 59 units! This is just one way of compressing the phrase, and not necessarily the most efficient one. (See if you can find a better way!)
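The larger pattern dictionary can be checked by expanding the compressed sentence back out. In this sketch (plain Python, with the numbers and entries copied from the text), every digit is treated as a dictionary reference and every other character is copied through unchanged. Counting each character, including both hyphens of the dash, reproduces the 18 + 41 = 59 units given in the text.

```python
# Numbered dictionary from the text; "_" stands for a space.
DICTIONARY = {
    "1": "ask_",
    "2": "what_",
    "3": "you",
    "4": "r_country",
    "5": "_can_do_for_you",
}
ENCODED = "1not_2345_--_12354"

# Each digit is a dictionary reference; every other character is copied as-is.
decoded = "".join(DICTIONARY.get(ch, ch) for ch in ENCODED)
print(decoded.replace("_", " "))
# ask not what your country can do for you -- ask what you can do for your country

sentence_units = len(ENCODED)
dictionary_units = sum(len(number) + len(entry) for number, entry in DICTIONARY.items())
print(sentence_units, dictionary_units, sentence_units + dictionary_units)  # 18 41 59
```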

In most languages of the world, certain letters and words often appear together in the same pattern. Because of this high rate of redundancy, text files compress very well. A reduction of 50 percent or more is typical for a good-sized text file. Most programming languages are also very redundant because they use a relatively small collection of commands, which frequently go together in a set pattern. Files that include a lot of unique information, such as graphics or MP3 files, cannot be compressed much with this system because they don't repeat many patterns.

If a file has a lot of repeated patterns, the rate of reduction typically increases with file size. You can see this just by looking at our example -- if we had more of Kennedy's speech, we would be able to refer to the patterns in our dictionary more often, and so get more out of each entry's file space.
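Both effects can be observed with Python's standard `zlib` module, which implements DEFLATE, a lossless method that pairs LZ77-style back-references (a close relative of the dictionary idea described here) with Huffman coding. It is not the exact scheme walked through above, and repeating the quote verbatim is an exaggerated stand-in for a long speech, but the trend is the same: the repeated text shrinks more as it grows, while random bytes, which have no patterns to exploit, do not shrink at all.

```python
import os
import zlib

# Stand-in for a longer, repetitive text: the quote repeated (an exaggeration of real speech).
QUOTE = b"Ask not what your country can do for you -- ask what you can do for your country. "

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)

for copies in (1, 10, 100):
    text = QUOTE * copies
    print(f"{len(text):>5} bytes of text   -> {ratio(text):.2f}")

noise = os.urandom(8_000)  # random bytes contain almost no repeated patterns
print(f"{len(noise):>5} bytes of random -> {ratio(noise):.2f}")
```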

This efficiency also depends on the specific algorithm used by the compression program. Some programs are particularly suited to picking up patterns in certain types of files, and so may compress them more succinctly. Others have dictionaries within dictionaries, which might compress efficiently for larger files but not for smaller ones.

Lossy and Lossless Compression

The type of compression we've been discussing here is called lossless compression, because it lets you recreate the original file exactly. All lossless compression is based on the idea of breaking a file into a "smaller" form for transmission or storage and then putting it back together on the other end so it can be used again.

Lossy compression works very differently. These programs simply eliminate "unnecessary" bits of information, tailoring the file so that it is smaller. This type of compression is used a lot for reducing the file size of bitmap pictures, which tend to be fairly bulky. To see how this works, let's consider how your computer might compress a scanned photograph.

A lossless compression program can't do much with this type of file. While large parts of the picture may look the same -- the whole sky is blue, for example -- most of the individual pixels are a little bit different. To make this picture smaller without compromising the resolution, you have to change the color value for certain pixels. If the picture had a lot of blue sky, the program would pick one color of blue that could be used for every pixel. Then, the program rewrites the file so that the value for every sky pixel refers back to this information. If the compression scheme works well, you won't notice the change, but the file size will be significantly reduced.
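The sky example can be imitated with simple color quantization: each pixel's red, green and blue values are snapped to the middle of a 32-wide bucket, so slightly different blues collapse into one shared color. This is a sketch of the general idea under made-up assumptions (the bucket size, the random seed and the sky colors are all hypothetical), not any particular image format's method. After quantization the pixels form long runs of identical values that a lossless step can store compactly, but the original values cannot be recovered.

```python
import random

def quantize(pixel, step=32):
    """Snap each 0-255 channel to the middle of its step-wide bucket (lossy)."""
    return tuple((channel // step) * step + step // 2 for channel in pixel)

def run_lengths(values):
    """Group identical neighbors into [value, count] runs; fewer runs compress better."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

random.seed(1)
# Hypothetical "sky": 1,000 blue pixels that each differ slightly from (100, 150, 230).
sky = [tuple(c + random.randint(-3, 3) for c in (100, 150, 230)) for _ in range(1000)]
simplified = [quantize(px) for px in sky]

print(len(set(sky)), "distinct colors before,", len(set(simplified)), "after")
print(len(run_lengths(sky)), "runs before,", len(run_lengths(simplified)), "after")
print(simplified[0])  # (112, 144, 240)
```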

Of course, with lossy compression, you can't get the original file back after it has been compressed. You're stuck with the compression program's reinterpretation of the original. For this reason, you can't use this sort of compression for anything that needs to be reproduced exactly, including software applications, databases and presidential inauguration speeches.

How MP3 Files Work

If you have read How CDs Work, then you know something about how CDs store music. A CD stores a song as digital information. The data on a CD uses an uncompressed, high-resolution format. Here's what happens when a CD is created:

• Music is sampled 44,100 times per second. The samples are 2 bytes (16 bits) long.

• Separate samples are taken for the left and right speakers in a stereo system.

So a CD stores a huge number of bits for each second of music:

44,100 samples/second * 16 bits/sample * 2 channels = 1,411,200 bits per second

Let's break that down: 1,411,200 bits per second equals 176,400 bytes per second. If an average song is three minutes long, then the average song on a CD consumes about 32 million bytes of space. That's a lot of space for one song, and it's especially large when you consider that over a 56K modem, it would take close to two hours to download that one song.
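The CD arithmetic above is easy to reproduce. This sketch uses the figures from the text and assumes a modem that actually sustains its nominal 56,000 bits per second; under that assumption the download takes about 76 minutes, so the "close to two hours" figure presumably allows for real dial-up connections running below their nominal rate.

```python
SAMPLE_RATE = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2           # left and right
SONG_SECONDS = 3 * 60  # a three-minute song
MODEM_BPS = 56_000     # assumption: the modem's nominal speed is actually achieved

bits_per_second = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS
bytes_per_second = bits_per_second // 8
song_bytes = bytes_per_second * SONG_SECONDS
modem_minutes = song_bytes * 8 / MODEM_BPS / 60

print(bits_per_second)          # 1411200
print(bytes_per_second)         # 176400
print(song_bytes)               # 31752000
print(round(modem_minutes, 1))  # 75.6
```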

The MP3 format is a compression system for music. It reduces the number of bytes in a song while keeping the sound very close to the original. The goal of the MP3 format is to compress a CD-quality song by a factor of 10 to 14 without noticeably affecting the CD-quality sound. With MP3, a 32-megabyte (MB) song on a CD compresses down to about 3 MB. This lets you download a song in minutes rather than hours, and store hundreds of songs on your computer's hard disk without taking up that much space.
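Applying the factor of 10 to 14 to the CD figures gives the expected MP3 file size and the bit rate it implies. This is plain arithmetic on the numbers in the text; 128 kbit/s appears only as a familiar reference point, since it is a common MP3 bit rate.

```python
CD_BITS_PER_SECOND = 44_100 * 16 * 2  # 1,411,200 bits per second, as computed for CDs
CD_SONG_MB = 32                       # the text's rounded size of a three-minute CD song

for factor in (10, 14):
    mp3_mb = CD_SONG_MB / factor
    mp3_kbps = CD_BITS_PER_SECOND / factor / 1000
    print(f"{factor}:1 -> about {mp3_mb:.1f} MB at {mp3_kbps:.0f} kbit/s")

# A 128 kbit/s MP3 corresponds to a compression factor of roughly 11.
print(f"128 kbit/s is about {CD_BITS_PER_SECOND / 128_000:.1f}:1")
```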

The Name

MPEG is the acronym for Moving Picture Experts Group. This group has developed compression systems used for video data. For example, DVD movies, HDTV broadcasts and DSS satellite systems use MPEG compression to fit video and movie data into smaller spaces. The MPEG compression system includes a subsystem to compress sound, called MPEG audio Layer-3. We know it by its abbreviation, MP3.

Is it possible to compress a song without hurting its quality? We use compression algorithms for images all the time. For example, a GIF file is a compressed image. So is a JPG file. We create Zip files to compress text. So we are familiar with compression algorithms for images and words and we know they work. To make a good compression algorithm for sound, a technique called perceptual noise shaping is used. It is "perceptual" partly because the MP3 format uses characteristics of the human ear to design the compression algorithm. For example:

• There are certain sounds that the human ear cannot hear.

• There are certain sounds that the human ear hears much better than others.

• If two sounds play at the same time and are close in pitch, we hear the louder one but may not hear the softer one.

Using facts like these, certain parts of a song can be eliminated without significantly hurting the quality of the song for the listener. Compressing the rest of the song with well-known compression techniques shrinks the song considerably -- by a factor of 10 at least. When you are done creating an MP3 file, what you have is a "near CD quality" song. The MP3 version of the song does not sound exactly the same as the original CD song because some of it has been removed, but it's very close.
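A toy version of this idea represents a short slice of sound as a list of tones (pitch and loudness), then drops the tones a listener would not hear. The hearing range is the commonly quoted 20 Hz to 20 kHz; the 100 Hz masking band and 20 dB margin are hypothetical round numbers chosen for illustration. Real MP3 encoders use a much more detailed psychoacoustic model, so treat this only as a sketch of the principle.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tone:
    freq_hz: float   # pitch
    level_db: float  # loudness

HEARING_RANGE_HZ = (20, 20_000)  # commonly quoted limits of human hearing
MASK_BAND_HZ = 100               # hypothetical: how close in pitch a masking tone must be
MASK_MARGIN_DB = 20              # hypothetical: how much louder the masking tone must be

def audible(tones):
    """Keep only tones inside the hearing range that no nearby, louder tone masks."""
    low, high = HEARING_RANGE_HZ
    kept = []
    for tone in tones:
        if not low <= tone.freq_hz <= high:
            continue  # outside the range of human hearing
        masked = any(
            abs(other.freq_hz - tone.freq_hz) <= MASK_BAND_HZ
            and other.level_db - tone.level_db >= MASK_MARGIN_DB
            for other in tones
        )
        if not masked:
            kept.append(tone)
    return kept

tones = [Tone(440, 80), Tone(470, 50), Tone(25_000, 90), Tone(1_000, 40)]
print([t.freq_hz for t in audible(tones)])  # [440, 1000]
```

Here the 470 Hz tone is dropped because a 30 dB louder tone sits only 30 Hz away, and the 25 kHz tone is dropped because it is above the hearing range.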

From this description, you can see that MP3 is nothing magical. It is simply a file format that compresses a song into a smaller size so it is easier to move around on the Internet and store.

What is JPEG?

JPEG (pronounced "jay-peg") is a standardized image compression mechanism. It stands for Joint Photographic Experts Group, the original name of the committee that wrote the standard. JPEG is designed for compressing either full-color or gray-scale images of natural, real-world scenes.  It works well on photographs, naturalistic artwork, and similar material; not so well on lettering, simple cartoons, or line drawings.

It is one of the most widely used image formats on the Web today. JPEG files usually end with the extension .jpg or .jpeg.

JPEG is "lossy," meaning that the decompressed image isn't quite the same as the one you started with.  JPEG is designed to exploit known limitations of the human eye, notably the fact that small color changes are perceived less accurately than small changes in brightness. 
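One way JPEG encoders commonly exploit this is to separate brightness from color and then store the color at lower resolution. The sketch below uses the RGB-to-YCbCr conversion from the JFIF file format and averages the two color channels over 2 × 2 blocks (often called 4:2:0 subsampling); the tiny 4 × 4 image is made-up test data. Brightness keeps all 16 samples while each color channel drops to 4, halving the data before any other compression step.

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF conversion: Y is brightness; Cb and Cr are blue and red color differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_2x2(plane):
    """Average each 2x2 block of a channel (assumes even width and height)."""
    return [
        [(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
         for j in range(0, len(plane[0]), 2)]
        for i in range(0, len(plane), 2)
    ]

def channel(image, index):
    """Pull one of the three YCbCr values out of every pixel."""
    return [[rgb_to_ycbcr(*px)[index] for px in row] for row in image]

def samples(plane):
    return sum(len(row) for row in plane)

# Made-up 4x4 image: an orange area on the left, a light blue area on the right.
image = [[(200, 120, 60), (202, 118, 61), (90, 160, 220), (92, 158, 222)] for _ in range(4)]

y_plane = channel(image, 0)                  # brightness: kept at full resolution
cb_plane = subsample_2x2(channel(image, 1))  # color: one sample per 2x2 block
cr_plane = subsample_2x2(channel(image, 2))

before = 3 * samples(y_plane)
after = samples(y_plane) + samples(cb_plane) + samples(cr_plane)
print(before, "->", after, "samples")  # 48 -> 24 samples
```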

A useful property of JPEG is that the degree of lossiness can be varied by adjusting compression parameters.  This means that the image maker can trade off file size against output image quality.  You can make *extremely* small files if you don't mind poor quality.  Conversely, if you aren't happy with the output quality at the default compression setting, you can jack up the quality until you are satisfied, and accept lesser compression.
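The quality setting usually works by scaling the quantization table, the list of step sizes used to round off fine detail. The sketch follows the scaling formula used by the Independent JPEG Group's libjpeg library (other encoders may differ) and, for brevity, uses only the first row of the example luminance table from Annex K of the JPEG standard. The coefficient value 37 is an arbitrary example. Lower quality means larger steps, so more detail is rounded away.

```python
# First row of the example luminance quantization table (JPEG standard, Annex K).
BASE_ROW = [16, 11, 10, 16, 24, 40, 51, 61]

def scale_table(base, quality):
    """Scale step sizes for a 1-100 quality setting, following libjpeg's formula."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality  # a percentage
    return [max(1, min(255, (step * scale + 50) // 100)) for step in base]

def reconstruct(coefficient, step):
    """Round a coefficient to a multiple of step: the value a decoder gets back."""
    return round(coefficient / step) * step

for quality in (25, 50, 75, 100):
    steps = scale_table(BASE_ROW, quality)
    print(quality, steps, "37 ->", reconstruct(37, steps[0]))
# 25 [32, 22, 20, 32, 48, 80, 102, 122] 37 -> 32
# 50 [16, 11, 10, 16, 24, 40, 51, 61] 37 -> 32
# 75 [8, 6, 5, 8, 12, 20, 26, 31] 37 -> 40
# 100 [1, 1, 1, 1, 1, 1, 1, 1] 37 -> 37
```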

Why use JPEG?

There is one very good reason: to make your image files smaller while keeping them virtually indistinguishable from the original.

Making image files smaller is a plus for transmitting files across networks and for archiving libraries of images.  Being able to compress a 2 Mbyte full-color file down to, say, 100 Kbytes makes a big difference in disk space and transmission time!  And JPEG can easily provide 20:1 compression of full-color data. 

How does JPEG work?

To understand how JPEG works, you need to understand a little about your computer display. 

An image generated on your computer can be thought of as a table. This table consists of cells, arranged in the rows and columns associated with any table.

            Columns
Rows    cell  cell  cell  cell  cell
        cell  cell  cell  cell  cell
        cell  cell  cell  cell  cell
        cell  cell  cell  cell  cell

These cells are filled with what your computer display calls "pixels".  So if you have your display set at 640 x 480, you have 640 columns and 480 rows of pixels being displayed.  For uncompressed images, your computer assigns values for EACH pixel.  For instance, pixel 1-1= color 00000001, pixel 1-2= color 00000002, pixel 1-3= color 00000034,....pixel 640-480= color 00000085.
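The arithmetic behind storing a value for each pixel is simple. This sketch assumes 24-bit true color, one byte each for red, green and blue, and reuses the 20:1 JPEG ratio quoted earlier; 2 to the 24th power is the number of distinct colors such a pixel can hold.

```python
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 3  # assumption: 24-bit true color, one byte each for red, green and blue

pixels = WIDTH * HEIGHT
raw_bytes = pixels * BYTES_PER_PIXEL

print(pixels)                      # 307200 pixels
print(raw_bytes)                   # 921600 bytes, uncompressed
print(raw_bytes // 20)             # 46080 bytes at a 20:1 compression ratio
print(2 ** (8 * BYTES_PER_PIXEL))  # 16777216 possible colors per pixel
```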

At true color (24 bits per pixel), your computer monitor can display about 16.7 million colors. So I arbitrarily assigned the numbers 00000001 to 16777216 to these colors. Your eye is unable to distinguish, let's say, color 00100000 from color 00100001. Compression tries to keep only the differences you can actually notice. For instance, why should a computer display colors 00000001 and 00000002 when you can't see the difference? If it displayed 00000001 for both pixels, you most likely would not be able to tell the difference.

Understanding DVD and CD Formats

By Stephen K. Weaver

All DVD formats have features in common and can be used for similar tasks: data storage, video, and audio recording. But even though these discs look exactly the same, not all will work in every television, DVD player, or DVD computer drive. All DVD formats (and CD formats) have significant technical differences, but here we'll focus on why the differences exist and how they affect you. Why are there so many DVD formats? It all boils down to the DVD manufacturers. Each manufacturer chooses or develops a specific DVD format; that's what they invest in, produce, then market.

The DVD format is labeled in the ad or on the box of any DVD hardware or disc media you plan to purchase. Pay attention when you buy so your discs match your DVD players, DVD recorders and DVD software. Recently, manufacturers have produced hardware that can handle multiple DVD formats, but until these are in wide use, you'll want to pay attention to what you purchase.

Here's an overview of DVD acronyms (and CD acronyms) and main functions that set each apart.

DVD-R

DVD-R (Digital Video Disc Recordable) discs can be written to once. DVD-R is used for most home DVD players and movies. Our reviewers prefer the DVD-R format for copying movies. Pioneer developed this format. DVD-R is also supported by Panasonic, Toshiba, Apple Computer, Hitachi, NEC, Samsung and Sharp, and by the DVD Forum.

DVD-RW

DVD-RW (Digital Video Disc Rewritable) discs are based on the DVD-R format but can be written to 1,000 times.

DVD+R DL (Dual Layer)

This type of DVD has two recording layers, allowing it to hold nearly twice as much data as a single-layer disc (about 8.5 GB versus 4.7 GB). You can burn data to this type of disc only if you have a dual-layer DVD burner.

DVD+R

DVD+R (Digital Video Disc Recordable) discs can be written to once. This format is a little faster for accessing data and a bit more expensive. Our reviewers prefer the DVD+R format for data recording. This format was developed after DVD-R and is in competition with that format. Eventually, one of the two formats may dominate the marketplace and eliminate the other. The + format is supported by Sony, Philips, HP, Dell, Ricoh, Yamaha and other manufacturers.

DVD+RW

DVD+RW is a re-recordable format based on the DVD+R format. The data on a DVD+RW disc can be erased and recorded over 1,000 times without damaging the medium. DVDs created by a +R/+RW device can now be read by most commercial DVD-ROM players.

DVD-RAM

DVD-RAM (Digital Video Disc Random Access Memory) discs can be written to 100,000 times. This format is different from DVD-R and DVD+R. The drive and media are much more expensive. These discs are often encased in plastic for resilience. They are popular for professional DVD video editing and other applications requiring multiple rewrites, edits or backups.

DVD-ROM

Digital Video Disc Read Only Memory is the format of commercial DVDs that are stamped, not burned with a DVD writer. DVD movies you buy or rent are DVD-ROM format.

CD-R

CD-R (Compact Disc Recordable) discs dominate the CD-writing market. They hold up to 700 MB of data and can be written to once. The blank CDs you buy on a spindle to back up your music collection are CD-R discs.

CD-RW

CD-RW (Compact Disc Rewritable) discs are based on the CD-R format and can be written to 1,000 times.

How are movies stored on DVD discs?

Even though the storage capacity of a DVD is huge, the uncompressed video data of a full-length movie would never fit on a DVD. In order to fit a movie on a DVD, you need video compression. A group called the Moving Picture Experts Group (MPEG) establishes the standards for compressing moving pictures.

When movies are put onto DVDs, they are encoded in MPEG-2 format and then stored on the disc. This compression format is a widely accepted international standard. Your DVD player contains an MPEG-2 decoder, which can uncompress this data as quickly as you can watch it.

A movie is usually filmed at a rate of 24 frames per second. This means that every second, there are 24 complete images displayed on the movie screen. American and Japanese television uses a format called NTSC (National Television System Committee). NTSC displays about 30 frames per second, but it does this as a sequence of 60 fields, each of which contains alternating lines of the picture. Other countries use the Phase Alternating Line (PAL) format, which displays 50 fields per second but at a higher resolution (see How Video Formatting Works for details on these formats). Because of the differences in frame rate and resolution, an MPEG movie needs to be formatted for either the NTSC or the PAL system.

The MPEG encoder that creates the compressed movie file analyzes each frame and decides how to encode it. The compression uses some of the same technology as still image compression to eliminate redundant or irrelevant data. It also uses information from other frames to reduce the overall size of the file. Each frame can be encoded in one of three ways:

• As an intraframe, which contains the complete image data for that frame. This method of encoding provides the least compression.

• As a predicted frame, which contains just enough information to tell the DVD player how to display the frame based on the most recently displayed intraframe or predicted frame. This means that the frame contains only the data that relates to how the picture has changed from the previous frame.

• As a bidirectional frame. In order to display this type of frame, the player must have the information from the surrounding intraframe or predicted frames. Using data from the closest surrounding frames, it uses interpolation, which is sort of like averaging, to calculate the position and color of each pixel.
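The three frame types can be illustrated with a toy model in which a frame is just a short list of pixel brightness values. This is a deliberate simplification: real MPEG-2 encoders work on blocks of pixels, use motion estimation, and store correction data for predicted and bidirectional frames, none of which is modeled here. The function names and frame values are hypothetical.

```python
def encode_predicted(previous, current):
    """Predicted-frame sketch: keep only (position, new value) for pixels that changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]

def decode_predicted(previous, changes):
    """Rebuild the frame by applying the stored changes to the previous frame."""
    frame = list(previous)
    for i, value in changes:
        frame[i] = value
    return frame

def interpolate_bidirectional(before, after):
    """Bidirectional-frame sketch: average the pixels of the surrounding frames."""
    return [(a + b) // 2 for a, b in zip(before, after)]

# Hypothetical 8-pixel frames (one brightness value per pixel).
i_frame = [10, 10, 10, 10, 50, 50, 50, 50]     # intraframe: stored in full
next_frame = [10, 10, 10, 10, 50, 50, 90, 90]  # only the last two pixels changed

p_frame = encode_predicted(i_frame, next_frame)
print(p_frame)                                           # [(6, 90), (7, 90)]
print(decode_predicted(i_frame, p_frame) == next_frame)  # True
print(interpolate_bidirectional(i_frame, next_frame))    # [10, 10, 10, 10, 50, 50, 70, 70]
```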

Depending on the type of scene being converted, the encoder will decide which types of frames to use. If a newscast were being converted, a lot more predicted frames could be used because most of the scene is unaltered from one frame to the next. On the other hand, if a very fast action scene were being converted, in which things changed very quickly from one frame to the next, more intraframes would have to be encoded. The newscast would compress to a much smaller size than the action sequence. This is why the storage capacity of digital video recorders (which store video on a hard drive using the MPEG format) can vary depending what type of show you are recording.
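The encoder's choice can be sketched with the same kind of toy frames. The decision rule here is hypothetical (switch to an intraframe whenever more than half the pixels changed), and real encoders are far more sophisticated, but it shows why a mostly static newscast needs to store much less data than an action scene.

```python
def plan_frames(frames, threshold=0.5):
    """Pick 'I' or 'P' for each frame and count how many values must be stored.

    Hypothetical rule: the first frame is an intraframe, and any later frame in which
    more than `threshold` of the pixels changed is also stored as an intraframe.
    """
    types, stored = ["I"], len(frames[0])
    for previous, current in zip(frames, frames[1:]):
        changed = sum(1 for a, b in zip(previous, current) if a != b)
        if changed / len(current) > threshold:
            types.append("I")
            stored += len(current)  # the full image
        else:
            types.append("P")
            stored += 2 * changed   # a position and a value per changed pixel
    return "".join(types), stored

# Made-up 10-pixel frames: a newscast where one pixel flickers, and fast action.
newscast = [[5] * 10, [5] * 9 + [6], [5] * 9 + [7], [5] * 9 + [6]]
action = [[10 * i + j for j in range(10)] for i in range(4)]

print(plan_frames(newscast))  # ('IPPP', 16)
print(plan_frames(action))    # ('IIII', 40)
```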

If all of this sounds complicated, then you are starting to get a feeling for how much work your DVD player does to decode an MPEG-2 movie. A lot of processing power is required -- even some computers with DVD players can't keep up with the processing required to play a DVD movie.

