Polytechnic University, Dept. of Electrical and Computer Engineering

EE4414 Multimedia Communication System II

Fall 2005, Yao Wang

___________________________________________________________________________________

Second Exam (12/8, 11:00-12:50)

Closed-book, 1 sheet of notes (single or double sided) allowed, no peeking into neighbors!

SOLUTION

1. Video coding standards (20 pt)

a) Describe two features incorporated in the H.263 video coding standard that helped to improve the coding efficiency over the earlier H.261 video coding standard. (2.5pt for each feature)

i) Half-pel accuracy motion estimation instead of integer-pel. ii) Variable block size for motion compensation: a 16x16 block may be divided into four 8x8 blocks, with a motion vector estimated for each 8x8 block separately. This is helpful when the 16x16 block contains two objects moving differently.

If a student lists another legitimate feature, it is acceptable too.

b) Why do MPEG-1 and MPEG-2 use the GOP structure with periodic I-frames? (2 pt) For video conferencing or video phone applications, can the encoder insert I-frames periodically? What may be the problem? (3 pt)

The GOP structure enables random access, which is important for video broadcasting, video streaming, and DVD playback, the applications targeted by MPEG-1/2. Inserting I-frames periodically generally causes the bit stream to have rate spikes at the I-frames. When the bit stream is sent over a constant-rate channel, the I-frame data takes a longer time to send, which causes variable delay at the receiver. In order to display the video at a constant frame rate, a large smoothing buffer is needed at the receiver. This significantly increases the delay between when a frame is sent and when it is decoded and displayed; the delay may exceed several seconds. For the video distribution applications targeted by MPEG-1/2, this delay is typically acceptable. However, for video conferencing/telephony applications, the acceptable delay is between 150 ms and 400 ms. Therefore, inserting I-frames periodically is not advisable for video conferencing/telephony applications.
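To make the delay argument concrete, here is a small back-of-the-envelope sketch in Python; all the numbers (channel rate, frame rate, frame sizes) are assumed for illustration only.

```python
# Back-of-the-envelope sketch of the I-frame delay problem (assumed numbers):
# an I-frame is several times larger than a P-frame, so over a constant-rate
# channel it takes several frame intervals to transmit.

channel_rate = 1_000_000      # channel capacity in bits/s (assumed)
frame_rate = 30               # frames/s, so one frame interval is ~33 ms
i_frame_bits = 200_000        # assumed I-frame size in bits
p_frame_bits = 20_000         # assumed P-frame size in bits

frame_interval = 1 / frame_rate
i_tx_time = i_frame_bits / channel_rate   # 200 ms to send the I-frame
p_tx_time = p_frame_bits / channel_rate   # 20 ms to send a P-frame

print(f"frame interval:    {frame_interval * 1000:.0f} ms")
print(f"I-frame send time: {i_tx_time * 1000:.0f} ms")   # exceeds one interval
print(f"P-frame send time: {p_tx_time * 1000:.0f} ms")

# To keep displaying at a constant frame rate, the receiver must buffer at
# least (i_tx_time - frame_interval) of extra delay per I-frame.
extra_delay = i_tx_time - frame_interval
print(f"extra buffering delay per I-frame: {extra_delay * 1000:.0f} ms")
```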

c) What is scalable coding? (2 pt) Why is it beneficial for video streaming applications? (3pt)

Scalable coding generates, for each group of video frames, a bit stream that can be truncated either at any point or at several defined points. When a user receives a truncated bit stream, he/she sees a correspondingly lower video quality (lower spatial resolution, temporal frame rate, or color accuracy, or a combination of these). In video streaming applications, the same video is often requested by users with different access bandwidths or decoding/display capabilities. Without scalable coding, multiple versions of the video have to be encoded at different bit rates and different spatial/temporal resolutions. With scalable coding, only a single scalable bit stream needs to be generated; based on each user's available bandwidth and decoding/display capability, only a partial set of the bit stream needs to be delivered. In a broadcast/multicast application, the complete bit stream is sent down a multicast tree, but different nodes of the tree may choose to deliver only parts of the bit stream, depending on the available bandwidth below that node. (Note that discussion of multicast/broadcast is not required.)
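As an illustration of the idea, the following sketch picks the subset of layers of a scalable stream that fits a user's bandwidth; the layer names and rates are assumed values, not from any standard.

```python
# Illustrative sketch of serving one scalable stream to users with different
# bandwidths. Layers must be taken in order: each enhancement layer is only
# useful if all layers below it were delivered.

layers = [
    ("base",      500_000),    # base layer: lowest resolution/frame rate
    ("enhance-1", 500_000),    # first enhancement layer (assumed rate)
    ("enhance-2", 1_000_000),  # second enhancement layer (assumed rate)
]

def layers_for_bandwidth(available_bps):
    """Deliver layers in order until the next one would exceed the budget."""
    chosen, used = [], 0
    for name, rate in layers:
        if used + rate > available_bps:
            break
        chosen.append(name)
        used += rate
    return chosen, used

for bw in (600_000, 1_200_000, 2_500_000):
    names, used = layers_for_bandwidth(bw)
    print(f"{bw / 1e6:.1f} Mbit/s user -> send {names} ({used / 1e6:.1f} Mbit/s)")
```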

d) What is object-based coding? (2pt) What are the three types of information specified for each object? (2 pt) Which standard uses object-based coding? (1pt)

Object-based coding refers to coding the different moving objects that may exist in a video separately, so that the decoder can access the bits for different objects separately. The decoder can choose to decode certain objects but not others, or display the decoded objects from different view angles. The three types of information specified for each object are: shape, motion, and texture (color variation). MPEG-4 uses object-based coding.

2. Digital TV systems (10 pt)

a) Describe the major components in the US ATSC system and the method used for each component. (5 pt)

The US ATSC system includes audio coding, video coding, data multiplexing, channel coding, and modulation. Audio coding uses the Dolby AC-3 standard. Video coding follows the MPEG-2 video standard, using either MP@HL or MP@ML. Multiplexing follows the MPEG-2 systems standard. Channel coding is realized by concatenating an outer Reed-Solomon code with an inner trellis code, with a data interleaver in between. Modulation is accomplished with 8-VSB, which uses 8-ASK for mapping digital symbols to analog waveforms and vestigial sideband (VSB) filtering to reduce the bandwidth to 6 MHz total.

b) Repeat the same for the European DVB system. (5 pt)

The European DVB system also consists of the same five components. Video coding and multiplexing follow the MPEG-2 video and MPEG-2 systems standards, as in ATSC. For audio, stereo sound is the standard format, coded in the MPEG-2 audio format (only the MPEG-2 BC mode is required, which is equivalent to MPEG-1 Layer 2). Channel coding is quite similar, but the inner code is a punctured convolutional code. DVB uses a very different modulation technique: it combines QAM with OFDM.

3. (10 pt) Consider the transmission of digital signals over a channel with 5 MHz bandwidth.

a) If we use 8 ASK for modulating the digital bits into an analog waveform, what is the maximum bit rate the channel can support? (5 pt)

By the Nyquist criterion, a channel of bandwidth B can carry at most 2B symbols/s, so a 5 MHz channel can support at most 10 Msymbols/s. With 8-ASK, each symbol carries 3 bits, so the maximum bit rate is 30 Mbit/s.

b) Now, suppose we further use a channel code with rate 2/3 to protect the pay load information, what is the maximum bit rate at which the information can be sent? (5 pt)

Maximum information bit rate is 30 × 2/3 = 20 Mbit/s.
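The two calculations can be written out as follows (assuming ideal Nyquist signaling, i.e., 2B symbols/s over a channel of bandwidth B):

```python
# Worked version of Problem 3.
import math

bandwidth_hz = 5e6
symbol_rate = 2 * bandwidth_hz                 # Nyquist: 10 Msymbols/s
bits_per_symbol = math.log2(8)                 # 8-ASK -> 3 bits/symbol
raw_bit_rate = symbol_rate * bits_per_symbol   # 30 Mbit/s   (part a)

code_rate = 2 / 3                              # 2 info bits per 3 coded bits
info_bit_rate = raw_bit_rate * code_rate       # 20 Mbit/s   (part b)

print(f"a) max bit rate:         {raw_bit_rate / 1e6:.0f} Mbit/s")
print(f"b) max information rate: {info_bit_rate / 1e6:.0f} Mbit/s")
```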

4. (10 pt) Describe how to map a digital signal to an analog waveform using 4-QAM. For the following sequence of bits, 01101100, sketch the resulting analog signal.

In 4-QAM, the bit stream is divided into pairs, and each pair of bits is mapped to one of four constellation points, i.e., one of four combinations of amplitudes (I, Q) for the cosine and sine carriers. Equivalently, the four points are four phases of a constant-amplitude carrier. [Figure: 4-QAM constellation and bit-pair mapping.] With the above mapping, the test sequence 01101100 consists of four symbols (01, 10, 11, 00), and the resulting waveform is a sinusoid whose phase changes from symbol to symbol. [Figure: resulting waveform.]
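A minimal modulator sketch follows; the bit-pair-to-constellation-point assignment below is one common Gray mapping and may differ from the mapping in the exam figure.

```python
# Minimal 4-QAM (QPSK) modulator sketch. The bit-pair -> (I, Q) assignment
# is one common Gray mapping; the exam's figure may use a different one.
import numpy as np

MAPPING = {            # bit pair -> (I, Q) amplitudes
    (0, 0): (+1, +1),
    (0, 1): (-1, +1),
    (1, 1): (-1, -1),
    (1, 0): (+1, -1),
}

def qam4_waveform(bits, fc=2.0, samples_per_symbol=100):
    """Map a bit sequence to s(t) = I*cos(2*pi*fc*t) - Q*sin(2*pi*fc*t)."""
    t = np.arange(samples_per_symbol) / samples_per_symbol  # one symbol period
    segments = []
    for k in range(0, len(bits), 2):
        i_amp, q_amp = MAPPING[(bits[k], bits[k + 1])]
        segments.append(i_amp * np.cos(2 * np.pi * fc * t)
                        - q_amp * np.sin(2 * np.pi * fc * t))
    return np.concatenate(segments)

bits = [0, 1, 1, 0, 1, 1, 0, 0]   # the test sequence 01101100
s = qam4_waveform(bits)            # 4 symbols, each with a distinct phase
```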

5. TCP/UDP/RTP (15 pt)

a) What information is included in the UDP packet header? (1pt) Can the receiver detect packet loss based on the UDP packet headers? (1pt) How does the RTP packet header enable detection of lost packets? (3pt)

The UDP header includes only the source and destination ports (plus a length field and a checksum). Together with the IP header, which includes the sender's and receiver's IP addresses, this allows packets to be delivered to the desired port of the desired recipient. However, the receiver cannot detect packet loss based on the UDP headers. The RTP header includes a packet sequence number, which increases sequentially over all packets sent in a session. The receiver can detect packet loss by observing whether there are gaps in the sequence numbers of received packets (after a certain delay).
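A sketch of this detection logic follows (assuming packets arrive in order; RTP sequence numbers are 16 bits, so the modular difference handles wrap-around):

```python
# Sketch of loss detection from RTP sequence numbers.

def detect_losses(received_seq_numbers):
    """Yield the sequence numbers of packets presumed lost."""
    expected = None
    for seq in received_seq_numbers:
        if expected is not None:
            gap = (seq - expected) % 65536     # modular gap (16-bit wrap)
            for i in range(gap):               # every skipped number is a loss
                yield (expected + i) % 65536
        expected = (seq + 1) % 65536

print(list(detect_losses([100, 101, 104, 105])))   # -> [102, 103]
```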

b) What types of information can the receiver send to the transmitter in the RTCP receiver reports? (3pt) Propose one way the transmitter can adjust the encoder operation based on the receiver reports. (2pt)

The receiver can report the average packet loss rate and delay observed over the past time period. When the transmitter observes an increased packet loss rate or delay, it can reduce the encoding rate to relieve congestion in the network, or invoke stronger error resilience in the video coder so that lost packets do not lead to severe degradation of the decoded video. It may also increase the channel coding redundancy.
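For example, a sender might apply a simple rule like the following sketch; the loss threshold, back-off/probe factors, and rate bounds are assumed values, not part of any standard:

```python
# Hypothetical sender-side adaptation driven by RTCP receiver reports:
# back off multiplicatively when loss is reported, probe upward slowly
# when the network looks clean.

def adapt_rate(current_bps, loss_fraction,
               backoff=0.8, probe=1.05,
               min_bps=100_000, max_bps=4_000_000):
    if loss_fraction > 0.02:          # >2% loss: assume congestion, back off
        new_rate = current_bps * backoff
    else:                             # clean report: probe for more bandwidth
        new_rate = current_bps * probe
    return max(min_bps, min(max_bps, new_rate))

rate = 2_000_000
for loss in (0.0, 0.05, 0.08, 0.0):   # loss fractions from successive reports
    rate = adapt_rate(rate, loss)
    print(f"loss {loss:.0%} -> encode at {rate / 1e6:.2f} Mbit/s")
```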

c) Describe two things that TCP does but RTP/UDP does not. (3pt) Which one causes more delay? (1pt) Which one is more suitable for audio and video delivery? (1pt)

TCP requires the receiver to acknowledge received packets, and retransmits packets that are not acknowledged within a defined time-out. TCP has a connection establishment phase before data transmission starts. TCP also performs flow control, to avoid receiver buffer overflow, and congestion control, to avoid congestion at intermediate routers. RTP/UDP does none of this. TCP's retransmission mechanism gives it unpredictable delay, which can be very long if the network is overloaded. RTP/UDP by itself does not invoke retransmission (although retransmission can be invoked at the application layer), so it introduces no additional delay beyond UDP itself. For either interactive conferencing/telephony or streaming applications, RTP/UDP is more suitable because it incurs lower delay.

6. (10 pt) Why does the receiver of a video streaming session need a buffer? (2 pt) What is the advantage of using a large buffer? (1 pt) What is the disadvantage? (1pt) What is the application-layer protocol developed for Internet video streaming? (2 pt) What does the protocol govern? (2 pt) What protocols can be used for the data transport? (2pt)

The packets of a video streaming session often arrive with different delays due to variability in network loading (and different packets may be delivered over different paths). This delay variation is called delay jitter. For discussion purposes, assume each packet contains the data for one video frame. If the decoder immediately decoded and displayed each received packet, the displayed video would have a variable frame rate due to delay jitter. In order to display the decoded video at a constant frame rate, a buffer is needed to store the received packets as they arrive; the decoder takes packets out of the buffer, decodes them, and displays them at a constant rate. It is possible that a packet arrives later than its scheduled decoding time; that packet is considered lost even though it eventually arrives. (If a student answers "To overcome delay jitter of arrived packets", it is fine.)

With a large buffer, longer delays are tolerated for all packets, so fewer packets are dropped for being late, and the displayed video has better quality. However, this also means that a user has to wait longer after issuing a request before the video starts.
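The playout logic and the buffer-size trade-off can be sketched as follows, continuing the one-frame-per-packet assumption above; the arrival times and buffer delays are made-up values.

```python
# Sketch of receiver playout with a jitter buffer: frame n is scheduled for
# decoding at (first_arrival + buffer_delay + n / frame_rate); a packet that
# arrives after its deadline is treated as lost. Times are in seconds.

def playout(arrivals, buffer_delay, frame_rate=30.0):
    """arrivals[n] is the arrival time of frame n. Returns lost frame ids."""
    base = arrivals[0] + buffer_delay        # playout time of frame 0
    lost = []
    for n, t_arrive in enumerate(arrivals):
        deadline = base + n / frame_rate
        if t_arrive > deadline:
            lost.append(n)                   # arrived too late to display
    return lost

arrivals = [0.000, 0.040, 0.120, 0.100, 0.135]    # jittery arrival times
print(playout(arrivals, buffer_delay=0.05))  # small buffer -> frame 2 is late
print(playout(arrivals, buffer_delay=0.10))  # larger buffer -> nothing lost
```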

RTSP is the application-layer protocol for video streaming. It governs the interaction between the streaming server and its clients, enabling connection setup, play, pause, rewind, stop, etc. It allows TCP, UDP, or RTP/UDP at the transport layer.

7. (10 pt) What is the acceptable delay for effective audio/video phone/conferencing applications? (3pt) At what layer does the SIP protocol reside? (2pt) What are the primary functions of the SIP protocol? (3pt) What transport-layer protocols can it work with? (2pt)

150 ms is definitely acceptable. Up to 400 ms can be tolerated.

SIP sits at the application layer. It performs call setup, including finding the current IP address of the callee. It also allows the two ends to exchange information about the audio/video codecs each can accept. It can work with TCP, UDP, or RTP/UDP at the transport layer.

8. (15 pts) Describe the similarities and differences between watermarking, fingerprinting, and steganography, in terms of intended applications and design objectives. For ease of discussion, assume the cover media is an image. (5pt each)

Watermarking is a special type of data hiding, in which the embedded data carries information that can be used to prove the ownership or integrity of the cover media. Typical applications include ownership assertion, copy prevention and control, and detection of unauthorized alteration. A major design objective is that the watermark must be robust to malicious attacks.

Fingerprinting is a special application of watermarking that embeds a different watermark (an ID, or fingerprint) in each legal copy of a protected document. In the event that a fingerprinted document is duplicated and distributed illegally, the original distributor can try to identify the embedded fingerprint in the illegal copies, to trace the original buyer of the document. As with general watermarking applications, fingerprinting requires the watermark to be robust to malicious attacks. In addition, the watermark must withstand "collusion", which refers to generating a new version of the document by combining many legal copies that carry different fingerprints.

Steganography is another special type of data hiding, in which the embedded data is a secret message that is typically unrelated to the cover media. The cover media can be chosen randomly or to facilitate conveying the secret message. Because multimedia documents (audio, image, or video) have a lot of redundancy, they are often chosen as cover media for embedding secret messages (which could themselves be an image, plain text, or encrypted text). The main design objective for steganography is to minimize the detectability of the embedded message. With watermarking, by contrast, the embedding algorithm should be designed so that the embedded watermark is robust to common benign operations and malicious attacks; this is usually not a concern for steganography.
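As a concrete (and deliberately naive) illustration of data hiding, the following sketch embeds a message in the least-significant bits of image pixels. LSB embedding is a classic textbook technique but is easy to detect; it is shown only to make the embed/extract idea concrete.

```python
# Minimal least-significant-bit (LSB) embedding sketch: hide one message bit
# in the LSB of each pixel, changing each used pixel's value by at most 1.
import numpy as np

def embed(cover, message_bits):
    stego = cover.copy()
    flat = stego.ravel()                      # view into the copy
    for i, bit in enumerate(message_bits):
        flat[i] = (flat[i] & 0xFE) | bit      # overwrite the pixel's LSB
    return stego

def extract(stego, n_bits):
    return [int(p) & 1 for p in stego.ravel()[:n_bits]]

cover = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)  # toy "image"
msg = [0, 1, 1, 0, 1, 0, 0, 1]
stego = embed(cover, msg)
assert extract(stego, len(msg)) == msg        # message recovered exactly
assert int(np.max(np.abs(stego.astype(int) - cover.astype(int)))) <= 1
```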

