


IEEE MULTIMEDIA COMMUNICATIONS TECHNICAL COMMITTEE

E-LETTER

Vol. 4, No. 10, November 2009

CONTENTS

Message from Editor-in-Chief 2

HIGHLIGHT NEWS & INFORMATION 3

IEEE MMTC Meeting Agenda 3

Call For Nominations: Editor-In-Chief IEEE MMTC E-Letter 4

IEEE GLOBECOM 2010 Call for Tutorial Proposals 5

IEEE GLOBECOM 2010 Call for Workshop Proposals 6

SPECIAL ISSUE ON NEW RESEARCH TRENDS 7

Toward Next Generation of Multimedia Computing and Networking 7

Guest Editor: Guan-Ming Su, Marvell Semiconductor, USA 7

Trends in Multimedia Communications Over Mobile Networks 9

Hamid Gharavi (IEEE Fellow), National Institute of Standards & Technology 9

Searching Music in the Emotion Plane 13

Yi-Hsuan Yang and Homer H. Chen (IEEE Fellow), National Taiwan University, Taiwan 13

Distributed Optimization for Wireless Visual Sensor Networks 17

Yifeng He and Ling Guan (IEEE Fellow), Ryerson University, Canada 17

A New Generation of Wireless Multimedia Link-Layer Protocols 19

Hayder Radha (IEEE Fellow), Michigan State University, USA 19

TECHNOLOGY ADVANCES 30

Four Suggestions for Research on Multimedia QoE Using Subjective Evaluations 30

Greg Cermak, Verizon Labs, USA 30

From Cross-Layer Optimization to Cognitive Source Coding for Multimedia Transmission: Adapting Content Formats to the Network 34

Simone Milani, University of Padova, Italy 34

Focused Technology Advances Series 37

Application Layer QoS Provisioning for Wireless Multimedia Networks with Cognitive Radios 37

F. Richard Yu, Carleton University, Canada 37

MMTC COMMUNICATIONS & EVENTS 40

Call for Papers of Selected Journal Special Issues 40

Call for Papers of Selected Conferences 41


Message from Editor-in-Chief

Welcome to the November issue of the E-Letter! First, I would like to call your attention to a few notices posted on pages 3-4. With the winter and holiday seasons approaching, many members will be planning family vacations, and Hawaii could be a warm and wonderful place to consider, not least because IEEE GLOBECOM 2009 will be held there during Nov. 30-Dec. 4. Our MMTC meeting will take place during the conference, and all members are invited to attend (please see the message from our MMTC Chair, Dr. Qian Zhang, on page 3). It is also time to nominate the new Editor-in-Chief for this E-Letter; please send your nomination to Qian by Nov. 15 (the Call for Nominations is posted on page 4).

Next, I would like to thank Dr. Guan-Ming Su (Marvell Semiconductor, USA), who has continued his efforts from the September issue and put together a second wonderful Special Issue on New Research Trends, with four invited position papers contributed by top scientists in the field. Please check out this special issue, starting with Dr. Su's Guest Editorial on page 7.

In the following article, Dr. Greg Cermak (Verizon Labs, USA) offers positions, directions, and suggestions on consumer research and subjective quality evaluation for multimedia communication applications, based on his many years of experience in VQEG and other quality standardization efforts. After that, Dr. Simone Milani (University of Padova, Italy) introduces the term “Cognitive Source Coding” in his paper, which refers to a cognitive-radio-like technology that receives a description of the network conditions from the lowest layers of the protocol stack and adopts the most appropriate source coding solution from a set of possible choices. It differs from cross-layer design in that the latter jointly tunes transmission parameters at different layers without changing the structure of the coding architecture, whereas cognitive source coding implies reconfiguring the architecture of the source coder depending on network status. Please find more details in his paper.

In the focused technology column, Dr. F. Richard Yu (Carleton University, Canada) demonstrates an integrated approach to optimizing application-layer QoS for wireless multimedia communication networks. In this work, the multimedia intra-refreshing rate is jointly optimized with the access strategy and spectrum sensing for media transmission in a cognitive radio network.

As always, I thank all the Editors of the E-Letter and our authors for making this issue a success.

Thank you very much.

Haohong Wang

Editor-in-Chief, MMTC E-Letter

HIGHLIGHT NEWS & INFORMATION

IEEE MMTC Meeting Agenda

Nov. 30 – Dec. 4, 2009

Honolulu, Hawaii

Dear MMTC members,

It is exciting that we will have another MMTC meeting coming soon, at GLOBECOM 2009 in beautiful Hawaii, USA, from Nov. 30 to Dec. 4. I am looking forward to seeing all of you there for our MMTC meeting, which has the following draft agenda.

0. Informal discussion and networking time

1. Welcome new members / introductions

2. Last meeting minutes approval (ICC 2009)

3. MMTC Best Paper Award 2009 winner announcement

Two papers received this award this year, and the authors of the following paper have decided to receive their plaques at GLOBECOM 2009:

B. Li, S.-S. Xie, G. Y. Keung, J.-C. Liu, I. Stoica, H. Zhang and X.-Y. Zhang, "An Empirical Study of the Coolstreaming System," IEEE Journal on Selected Areas in Communications, Special Issue on Advances in Peer-to-Peer Streaming System, 25(9):1627- 1639, December 2007.

4. Report on Conferences activities

CCNC 2010

Globecom 2009

ICC 2010

Globecom 2010

5. Recent changes for ICME

6. MMTC IGs Reports - all IG chairs

7. Sub-committees Report

8. Report on E-Letter activity and call for nominations for the future EiC

9. Suggestions & discussions – everyone

10. Adjourn

Looking forward to seeing you in Hawaii, USA soon.

Cheers,

Qian Zhang

IEEE MMTC Chair

Call For Nominations: Editor-In-Chief IEEE MMTC E-Letter

Thanks to the great efforts of Haohong (the current EiC), our MMTC has continued its E-Letter in an impressive way. You can check all the E-Letters from .

The term of the current editor-in-chief (EiC) of the IEEE MMTC E-Letter comes to an end in January 2010, and we have set up a nominating committee to assist in selecting the next EiC. The EiC is responsible for maintaining the highest editorial quality, for setting the technical direction of the papers published in the E-Letter, and for maintaining a reasonable pipeline of articles for publication.

Nominating committee member list:

Heather Yu, heathery.yu@

Wenjun Zeng, zengw@missouri.edu

Madjid Merabti, m.merabti@livjm.ac.uk

Rob Fish, rob.fish@

Shueng-Han Gary Chan, gchan@cse.ust.hk

Haohong Wang, haohongwang@

Qian Zhang, qianzh@cse.ust.hk

I would like to take this opportunity to invite all members to send nominations to any of the committee members. Your help in identifying capable MMTC members to take our E-Letter to the next level is greatly appreciated.

Best,

Qian Zhang

IEEE MMTC Chair


IEEE GLOBECOM 2010 Call for Tutorial Proposals

IEEE GLOBECOM 2010 opens all tutorial/lecture sessions to conference attendees for FREE. We invite submission of tutorial proposals for either 3.5-hour sessions (presented on Dec. 6th and 10th) or 1.5-hour sessions (presented on Dec. 7-9) giving overview presentations on topics of interest to the conference. No more than ONE speaker is recommended for each 1.5-hour lecture session. Proposals will be evaluated on the importance, timeliness, and conference coverage of the topic, the track record of the instructor, and prior experience instructing tutorials. The final decisions on accepting tutorial proposals will also reflect space limitations at the conference venue.

Required information in the proposal (up to 3 pages):

• Title and Abstract of the lecture

• Detailed outline of topics covered

• Preferred length of the lecture

• Full contact information of speakers

• Biography of speakers

• History of the tutorial presentation

Important Dates:

• Submission: 15 December, 2009

• Decision Notification: 15 January 2010

All proposals must be submitted in PDF format via EDAS, which requires first registering the proposal with its title, keywords, authors' names, and an abstract. Only complete proposals with all required information will be considered. Once a proposal is accepted, we will work with the speaker(s) on a contract, which defines the remuneration, copyright, cancellation policy, and so on. Please address all questions regarding tutorials to the GLOBECOM 2010 TPC Vice-Chair:

Dr. Khaled El-Maleh


IEEE GLOBECOM 2010 Call for Workshop Proposals

IEEE GLOBECOM 2010 features advanced workshops on December 6th and 10th, 2010, to explore special topics and provide international forums for scientists, engineers, and users to exchange and share their experiences, new ideas, and research results. The proceedings of the workshop program will be published by the IEEE Communications Society and included in the IEEE Digital Library.

Topics covered in workshops will include, but are NOT limited to:

• Cloud Computing and Communications

• Cognitive communications and networks

• Smart Grids

• Next Gen communications & networks

• Social networking

• Satellite and space communications

• Vehicular communications and networks

• Service-oriented Internet

• Broadband Communications

• Mobile and Ad Hoc Networks

• Wireless Networking

• Internet Quality of Service

• Security of Communication Networks

• Multimedia Communications & Services

• Wearable and Pervasive Computing

• Ubiquitous and Intelligent Services

• Distributed and Mobile Computing

• Internet Architectures & Services

• Grid and P2P Computing

• Cyber-physical computing

Required information in the proposal (up to 4 pages):

• Title of the workshop

• Workshop scope and dates

• Full contact of workshop organizer(s)

• Track record of workshop organizer(s)

• Expected number of paper submissions

• Draft Call for paper of the workshop

• Tentative list of TPC members

Important Dates:

• Submission: 15 December, 2009

• Decision Notification: 15 January 2010

Early-bird proposals (submitted before November 15, 2009) are highly encouraged and will receive a decision within a month of receipt. For each approved workshop, at least one organizer must commit to registering for and monitoring the onsite workshop operation. All proposals, as well as any questions, should be sent to the GLOBECOM 2010 TPC Vice-Chair:

Prof. Xiaobo Zhou <zbo@cs.uccs.edu>


SPECIAL ISSUE ON NEW RESEARCH TRENDS

Toward Next Generation of Multimedia Computing and Networking

Guest Editor: Guan-Ming Su, Marvell Semiconductor, USA

guanmingsu@

With the rapid growth of computational power and communication systems, multimedia computing and networking have become vibrant and attractive fields, since their underlying foundations provide new capabilities for natural human presentation. However, multimedia exhibits dramatically different characteristics from traditional data in many respects, including presentation form/timing, computing/coding elements, and transmission mechanisms. To provide a satisfactory quality of experience for multimedia services, we face more difficult and diverse challenges than we have ever faced in the data domain. Continuing our special issue from this September, in this issue we invite top-notch researchers to analyze the new research trends in multimedia computing and networking and provide their valuable suggestions.

The first article, “Trends in multimedia communications over mobile networks” by Hamid Gharavi, discusses the challenges, and corresponding solutions, in transmitting multimedia over mobile networks. More specifically, the author first examines the impact of channel fading, co-channel interference, and packet loss on video quality. The author then presents recent research trends, such as MIMO technologies and error control coding methods in the OSI protocol stack (including cyclic redundancy checks, forward error correction, and unequal error protection), to resolve the aforementioned problems. The author also addresses the research potential of network coding in unreliable multi-hop environments. Finally, the need to consider tradeoffs and cross-layer optimization among overhead, latency, and performance for multimedia over mobile networks is stated.

Multimedia retrieval has become an important topic owing to the fast-growing amount of multimedia content. The second article, “Searching music in the emotion plane” by Yi-Hsuan Yang and Homer Chen, introduces a new music search method based on the emotion plane. Unlike traditional methods that categorize emotion into discrete classes, the authors propose using the 2-D real plane to represent emotion. Further applications based on this approach are addressed. The authors also indicate open issues and future research directions in music emotion recognition.

Wireless visual sensor networks have become an important research topic owing to their wide range of new applications. The third article, “Distributed optimization for wireless visual sensor networks” by Yifeng He and Ling Guan, overviews the design considerations and tradeoffs (such as power, lifetime, video quality, and time-varying channels) in such networks and formulates the whole network as an optimization problem. Since each node knows only its neighbors' information, the authors suggest that a distributed optimization framework is a more efficient and effective solution.

As the level of heterogeneity and the high-bandwidth requirements of multimedia applications increase radically, the widely adopted link-layer protocols, mainly ARQ, cannot provide a satisfactory quality of experience at the end-user. The fourth article, “A new generation of wireless multimedia link-layer protocols” by Hayder Radha, highlights the importance of achieving both reliability and stability in the wireless link layer for both real-time and delay-insensitive applications. The author first analyzes the disadvantages of existing ARQ-based protocols and reviews solutions to overcome these shortcomings. Based on this discussion, the author presents a framework for next-generation wireless link-layer protocols that satisfy both the reliability and stability requirements.

As illustrated in this special issue, research in multimedia computing and networking has attracted significant interest. There are many new challenges that will require ground-breaking solutions for these emerging applications. We would like to thank all the authors for their contributions and hope these articles stimulate further research in the area of multimedia computing and networking.


Guan-Ming Su received the B.S.E. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1996 and the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park, USA, in 2001 and 2006, respectively.

He was with the R&D Department, Qualcomm, Inc., San Diego, CA, during the summer of 2005, and with ESS Technology, Fremont, CA, in 2006. He is currently with the video R&D department at Marvell Semiconductor, Inc., Santa Clara, CA. His research interests are multimedia communications and multimedia signal processing.

Dr. Su is an associate editor of Journal of Communications and a guest editor in Journal of Communications special issue on Multimedia Communications, Networking, and Applications. He serves as the Publicity Co-Chair of IEEE GLOBECOM 2010.

Trends in Multimedia Communications Over Mobile Networks

Hamid Gharavi (IEEE Fellow), National Institute of Standards & Technology

gharavi@

Increasing demand for high quality multimedia services has been a driving force in the technological evolution of high bandwidth wireless/mobile communications systems and standards. The most challenging aspect is the support of higher quality video, which requires a higher bandwidth. Although the operational third generation (3G) wireless systems may be capable of handling low data rate video [1], [2], the problem is that mobile cellular technology is not ready to offer reliable real-time video services. In addition, as the proliferation of mobile video accelerates, the next generation wireless communication systems must aim at providing higher per user data rate services to support higher quality real-time audio/video services, especially as new applications, such as ubiquitous on line journalism and live citizen reporting, are emerging.

It is important to note that in mobile environments a higher per-user bandwidth does not necessarily guarantee higher-quality video reception. Thus, a major technical challenge will be to cope with frequency-selective fading due to the use of larger bandwidths. As far as transmission at the physical layer is concerned, schemes such as OFDM (Orthogonal Frequency Division Multiplexing) and MC-CDMA (Multi-Carrier Code Division Multiple Access), which are capable of providing frequency diversity, can indeed enhance robustness to frequency-selective fading.

In addition, recent technological breakthroughs in Multiple Input Multiple Output (MIMO) techniques [3], [4], i.e., providing space-time diversity, have already made a significant impact on transmission over mobile channels in terms of reliability and throughput. Needless to say, a deep fade across all the wireless channels may still erase some of the information; consequently, the use of error control coding - particularly for real-time multimedia - remains a major research topic in mobile communications.

Error control coding methods, such as error detection and error correction, have traditionally been applied at the physical layer in most digital cellular communication systems (where they are also referred to as channel coding). In the case of the IEEE 802.11 standard for Wireless Local Area Networks (WLAN), for instance, the physical and link layers are responsible for handling error control coding for IP packets. This includes a 16-bit CRC (cyclic redundancy check) error detection field at the physical layer and packet retransmissions via the MAC (Medium Access Control) sub-layer.
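To make the error-detection step concrete, the following Python sketch implements a bitwise 16-bit CRC using the CCITT polynomial 0x1021 as an illustrative choice; the exact polynomial and parameters used by any particular physical layer may differ.

```python
def crc16_ccitt(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with the CCITT polynomial (illustrative parameters)."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

frame = b"multimedia payload"
fcs = crc16_ccitt(frame)

# The receiver recomputes the CRC over the received frame; a mismatch
# indicates corruption and triggers a MAC-layer retransmission.
assert crc16_ccitt(frame) == fcs
assert crc16_ccitt(b"multimedia pAyload") != fcs  # single-bit error detected
```

Any single-bit error is guaranteed to change the checksum, which is why a CRC mismatch is a reliable trigger for link-layer retransmission.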

However, in a packet based wireless network environment where communications may need to be integrated into the framework of the OSI (Open Systems Interconnection) model, the error control coding strategy is a challenging issue. This is mainly a consequence of the layering structure in the OSI protocol stack, where the protection against packet loss is handled separately and independently for each layer. For instance, in addition to the link and physical layers, error control coding in the form of Forward Error Correction (FEC) is also applied at the application layer. For video transmission particularly, a combination of multi-layer coding and FEC with a differing level of protection for each layer (also known as unequal error protection: UEP), has been an effective approach for transmitting video over multipath fading mobile channels [5], [6].

Nonetheless, when the channel condition is unknown and there are multiple receivers, the use of fixed-rate error control coding can be wasteful and unreliable. Recently, a new generation of rateless codes, such as Raptor codes [7], has been considered for file download in the Digital Video Broadcast for Handheld (DVB-H) standard [8]. With a Raptor code, which is a category of Fountain code, the encoder can generate as many encoded symbols as needed from a block of data on the fly. Raptor codes have an advantage over traditional fixed-rate erasure codes, such as Reed-Solomon codes, in their ability to manage the overhead when channel conditions are unknown.
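The rateless idea can be illustrated with a toy sketch. This is not a Raptor code; it is a simple random linear fountain code over GF(2), in which the encoder can emit coded symbols indefinitely and the receiver decodes as soon as it has collected enough linearly independent ones. Packet contents and sizes are made up for the example.

```python
import random

def encode_symbol(source, rng):
    """Emit one coded symbol: the XOR of a random nonempty subset of packets."""
    k = len(source)
    mask = 0
    while mask == 0:                     # the subset is encoded as a k-bit mask
        mask = rng.getrandbits(k)
    value = 0
    for i in range(k):
        if mask >> i & 1:
            value ^= source[i]
    return mask, value

def decode(symbols, k):
    """Gaussian elimination over GF(2); returns the k source packets,
    or None if the received symbols do not yet span the source block."""
    pivots = {}                          # highest set bit -> (mask, value)
    for mask, val in symbols:
        while mask:
            b = mask.bit_length() - 1
            if b not in pivots:
                pivots[b] = (mask, val)
                break
            pm, pv = pivots[b]
            mask ^= pm
            val ^= pv
    if len(pivots) < k:
        return None
    recovered = [0] * k
    for b in sorted(pivots):             # back-substitute from the lowest bit up
        mask, val = pivots[b]
        for j in range(b):
            if mask >> j & 1:
                val ^= recovered[j]
        recovered[b] = val
    return recovered

rng = random.Random(7)
source = [0x11, 0x22, 0x33, 0x44]        # four one-byte "packets"
symbols, decoded = [], None
while decoded is None:                   # keep pulling symbols off the channel
    symbols.append(encode_symbol(source, rng))
    decoded = decode(symbols, len(source))
assert decoded == source
```

Note the rateless property: the encoder never commits to a code rate; the receiver simply collects symbols until decoding succeeds, which is exactly what makes such codes attractive when the loss rate is unknown.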

The flexibility and reliability of the Raptor code for UEP have been studied by a number of researchers in recent years [9], [10]. IETF RFC 5053 [11] defines procedures for generating the Raptor FEC and its application to the reliable delivery of data objects. In addition, for Digital Video Broadcast, two-layer protection for real-time multimedia data has recently been proposed, where the first (base) layer is protected by a 1-D interleaved parity code and the enhancement layer is produced by the Raptor code [12]. It should be noted that in this AL-FEC approach, the source packets are carried in separate RTP (Real-time Transport Protocol) streams. The next layer in the protocol stack is the transport layer. Of the two popular transport protocols, TCP and UDP (User Datagram Protocol), UDP is the preferred protocol for real-time media. Nonetheless, unlike the connection-oriented TCP, UDP is a best-effort transport protocol. Since UDP cannot provide reliable packet transmission, additional error control coding may be needed to prevent significant loss of video quality.

Indeed, further coding can be applied at the next lower layer in the protocol stack, the network layer, which is responsible for routing RTP/UDP/IP packets to their destinations. Mobile ad-hoc networks (MANETs), being autonomous, are not quite capable of reliably distributing RTP/UDP/IP packets [13], [14]. One major obstacle is the dynamically changing network topology, which manifests itself in frequent route changes and, consequently, potentially long delays [15]. Co-channel interference from other users is another problematic factor that can severely impact end-to-end throughput performance. While mitigating the effect of interference continues to be an active research topic, a new approach, known as network coding, has recently emerged [15]. Its concept is based on performing coding operations inside the network, rather than having packets simply received and forwarded by the intermediate nodes (routers). By intelligently mixing packets in multicast routing, it is possible to enhance network throughput performance. Although network coding appears to work well for wired networks, its suitability for unreliable multihop environments carrying real-time multimedia services is now becoming a hot research topic.
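The packet-mixing idea can be shown with the classic two-way relay example: instead of forwarding two packets in two separate transmissions, the relay broadcasts their XOR once, and each endpoint recovers the other's packet using its own copy. The following is a minimal sketch with made-up packet contents (padded to equal length for simplicity).

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, as a relay node would mix packets."""
    return bytes(x ^ y for x, y in zip(a, b))

packet_a = b"hello from A!!!!"   # packet node A wants to send to B
packet_b = b"reply from B...."   # packet node B wants to send to A

# The relay mixes the two packets and broadcasts a single coded packet,
# saving one transmission compared with plain store-and-forward.
coded = xor_bytes(packet_a, packet_b)

# Each endpoint XORs the broadcast with its own packet to recover the other's.
assert xor_bytes(coded, packet_a) == packet_b   # node A recovers B's packet
assert xor_bytes(coded, packet_b) == packet_a   # node B recovers A's packet
```

This is the throughput gain in miniature: three transmissions (A to relay, B to relay, one coded broadcast) replace four.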

Finally, it is worth noting that, in contrast to fixed wireless communications, existing mobile networks are still incapable of supporting reliable, interactive, high-quality video services. This is mainly due to a number of factors, such as increasing demand for live video transmission, bandwidth limitations, co-channel interference, and ever-changing channel conditions. Developing methods such as space-time diversity for cooperative transmission in multihop networks is becoming the most active research topic in combating multipath fading [16], [17]. In addition, in such environments, error control coding techniques continue to play a major role in supporting high-quality multimedia services for the next generation of wireless/mobile networks. As far as their deployment is concerned, the major problem is that they are applied separately within each OSI layer without attention to the tradeoffs between overhead, latency, and performance. Addressing this will involve cross-layer optimization in order to maximize their efficiency in accordance with service quality requirements.

References

1. H. Gharavi and S. M. Alamouti, “Multipriority Video Transmission for Third Generation Wireless Communication Systems,” Proceedings of the IEEE, vol. 87, no. 10, pp. 1751-1763, October 1999.

2. L. Hanzo, P. Cherriman, and J. Streit, Video Compression and Communications: H.261, H.263, H.264, MPEG4 and Proprietary Codecs as well as HSDPA-Style Adaptive Turbo-Transceivers, John Wiley and IEEE Press, September 2007.

3. S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.

4. V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456-1467, 1999.

5. R. Stedman, H. Gharavi, L. Hanzo, and R. Steele, “Transmission of Coded Images via Mobile Channels,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 1, pp. 15-26, February 1993.

6. H. Gharavi, “Pilot Assisted 16-QAM for Video Communication,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 2, pp. 77-89, February 2002.

7. A. Shokrollahi, “Raptor Codes,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2551-2567, June 2006.

8. ETSI, TS 102 472 V1.2.1, “Digital Video Broadcasting (DVB); IP Datacast over DVB-H: Content Delivery Protocols,” Dec. 2006.

9. N. Rahnavard, B. N. Vellambi, and F. Fekri, “Rateless codes with unequal error protection property,” IEEE Transactions on Information Theory, vol. 53, no. 4, pp. 1521-1532, April 2007.

10. D. Sejdinovic, D. Vukobratovic, A. Doufexi, V. Senk, and R. Piechocki, “Expanding window fountain codes for unequal error protection,” Proc. 41st Asilomar Conference, Pacific Grove, pp. 1020-1024, 2007.

11. M. Luby, A. Shokrollahi, M. Watson, and T. Stockhammer, “Raptor Forward Error Correction Scheme for Object Delivery,” IETF RFC 5053, October 2007.

12. A. Begen and T. Stockhammer, “DVB Application-Layer Hybrid FEC Protection,” draft-ietf-fecframe-dvb-al-fec-02, August 11, 2009.

13. H. Gharavi and K. Ban, “Multihop Sensor Network Design for Wideband Communications,” Proceedings of the IEEE, vol. 91, no. 8, pp. 1221-1234, August 2003.

14. H. Gharavi, “Control Based Mobile Ad-hoc Networks for Video Communications,” IEEE Transactions on Consumer Electronics, vol. 52, no. 2, pp. 383-391, May 2006.

15. R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network information flow,” IEEE Transactions on Information Theory, vol. 46, pp. 1204-1216, July 2000.

16. S. Katti, H. Rahul, W. Hu, D. Katabi, M. Médard, and J. Crowcroft, “XORs in the air: practical wireless network coding,” in Proc. ACM SIGCOMM’06, Pisa, Italy, Sep. 11-15, 2006, pp. 243-254.

17. C. Fragouli, J. Widmer, and J.-Y. Le Boudec, “A network coding approach to energy efficient broadcasting: from theory to practice,” in Proceedings of IEEE INFOCOM, Apr. 2006, pp. 1-11.

Hamid Gharavi received the Ph.D. degree from Loughborough University, Loughborough, U.K., in 1980. He joined AT&T Bell Laboratories, Holmdel, in 1982. He was then transferred to Bell Communications Research (Bellcore) after the AT&T-Bell divestiture, where he became a Consultant on video technology and a Distinguished Member of Research Staff. In 1993, he joined Loughborough University as Professor and Chair of Communication Engineering. Since September 1998, he has been with the National Institute of Standards and Technology (NIST), US Department of Commerce, Gaithersburg, MD.

Dr Gharavi was a core member of the Study Group XV (Specialist Group on Coding for Visual Telephony) of the International Communications Standardization Body CCITT (ITU-T). He was selected as one of the six university academics to be appointed to the U.K. Government’s Technology Foresight Panel in Communications to consider the future through 2015 and make recommendations for allocation of key research funds. His research interests include video/image transmission, wireless multimedia, mobile communications and third generation wireless systems, and mobile ad-hoc networks. He holds eight U.S. patents related to these topics.

Dr Gharavi received the Charles Babbage Premium Award from the Institute of Electronics and Radio Engineering in 1986, and the IEEE CAS Society Darlington Best Paper Award in 1989. He has been a Distinguished Lecturer of the IEEE Communication Society. In 1992 Dr Gharavi was elected a Fellow of IEEE for his contributions to low bit-rate video coding and research in subband coding for image and video applications. He has been a Guest Editor for a number of special issues. Dr Gharavi served as a member of the Editorial Board of the PROCEEDINGS OF THE IEEE from January, 2003 to December, 2008. He is currently a member of the Editorial board, IET Image Processing. He served as an Associate Editor for the IEEE Transactions on CAS for Video Technology (CSVT) from 1996 to 2006. He then became the Deputy Editor-in-Chief of this IEEE Transactions through December 31, 2009. Dr Gharavi was recently appointed to serve as the new Editor-in-Chief for the IEEE Transactions on CSVT.

Searching Music in the Emotion Plane

Yi-Hsuan Yang and Homer H. Chen (IEEE Fellow), National Taiwan University, Taiwan

affige@, homer@cc.ee.ntu.edu.tw

There have been tremendous efforts and significant progress in providing media streaming over the Internet, given its great potential. Music has played an important role throughout human history, even more so in the digital age. Never before has such a large collection of music been created and accessed daily. Because almost all music is created to convey emotion, organizing and retrieving music by emotion is a meaningful way to access music information. The proliferation of tiny mobile devices also calls for content-based retrieval of music through a small display space.

Music emotion recognition (MER) aims at recognizing the affective content of music signals. A typical approach is to categorize emotions into a number of classes (e.g., happy, angry, sad and relaxing) and apply machine learning techniques to train a classifier [1]–[3]. This approach, though widely adopted, faces the granularity issue in practice, because classifying emotions into only a handful of classes cannot meet the user demand for effective information access. Using a finer granularity for emotion description does not necessarily address the issue since language is inherently ambiguous, and the description for the same emotion varies from person to person.

Instead, we propose to view emotions from a dimensional perspective and define emotions in a 2-D plane in terms of arousal (how exciting or calming) and valence (how positive or negative), the two emotion dimensions found to be most fundamental by cognitive studies [4]. In this way, MER becomes the prediction of the arousal and valence (AV) values of a song corresponding to a point in the emotion plane [5]–[8]. The granularity and ambiguity issues associated with emotion classes no longer exist since no categorical classes are needed. Moreover, because the 2-D emotion plane provides a simple means for a user interface, novel emotion-based music organization, browsing, and retrieval applications can easily be created for mobile devices.

1. Emotion-Based Retrieval

The advantages of the emotion-based approach are that each music sample can be represented as a point in the emotion plane and that the similarity between music samples can be measured by Euclidean distance. As shown in Fig. 1, a user can retrieve music of a certain emotion by simply specifying a point in the emotion plane. The system then returns the music samples whose AV values are close to the point. A user can also generate an emotion-based playlist by drawing a trajectory in the emotion plane. This way, songs of various emotions corresponding to different points on the trajectory are added to the playlist and played back in order.
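A minimal sketch of this retrieval style, using Euclidean distance in the valence-arousal plane; the song catalog and its AV coordinates below are hypothetical placeholders.

```python
import math

# Hypothetical catalog: song title -> (valence, arousal), both in [-1, 1].
catalog = {
    "Song A": ( 0.8,  0.7),   # happy / exciting
    "Song B": (-0.6,  0.5),   # tense / angry
    "Song C": (-0.5, -0.6),   # sad / calm
    "Song D": ( 0.6, -0.4),   # relaxing
}

def retrieve(query, k=2):
    """Return the k songs whose AV points are closest to the query point."""
    return sorted(catalog, key=lambda s: math.dist(query, catalog[s]))[:k]

def playlist_from_trajectory(points):
    """Map each point on a user-drawn trajectory to its nearest song."""
    return [retrieve(p, k=1)[0] for p in points]

# Point query: songs nearest to a happy/exciting region of the plane.
print(retrieve((0.7, 0.6)))               # -> ['Song A', 'Song D']

# Trajectory query: a playlist that drifts from exciting to calm.
print(playlist_from_trajectory([(0.8, 0.7), (0.6, -0.4), (-0.5, -0.6)]))
```

The same distance function serves both interaction modes from Fig. 1: a single point returns the nearest songs, and a trajectory returns an ordered playlist.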

One can also couple other musical metadata, such as artist name, genre, or lyrics, with emotion to narrow down the search range. For example, one can specify an artist, and the system will display all songs of that artist in the emotion plane. It is also possible to play back music that matches the user's mood as detected from physiological, prosodic, or facial cues [9]. This retrieval paradigm is functionally powerful since people's selection criteria are often related to their emotional state at the moment of music selection [10].

[pic]

Fig. 1. With emotion-based music retrieval, a user can retrieve music of certain emotions by specifying a point or drawing a trajectory in the 2-D emotion plane [5], [6].

2. Emotion Recognition

MER can be formulated as a regression problem [5] by viewing arousal and valence as real values in [-1, 1]. Then a regression model can be trained to predict the AV values. More specifically, given N inputs (xi, yi), 1≤ i ≤N, where xi is a feature vector of the ith input sample, and yi is the real value to be predicted, a regression model (regressor) R(·) is created by minimizing the mismatch (i.e., mean squared difference) between the predicted and the ground truth values. Many good regression algorithms, such as support vector regression (SVR) or Gaussian process regression [11], are readily available. In [5], two SVR models are trained for arousal and valence respectively. A schematic diagram of this MER system is shown in Fig. 2.
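A dependency-free sketch of this regression formulation, substituting ordinary least squares for SVR to keep the example self-contained (any regressor R(·) fits the same recipe); the features and ground-truth AV values below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: N = 100 songs, each with a 5-D feature vector x_i.
X = rng.normal(size=(100, 5))
arousal = X @ np.array([0.8, 0.4, 0.0, 0.0, 0.0])    # toy ground truth y_i
valence = X @ np.array([0.0, 0.0, 0.7, -0.3, 0.0])

def fit(A, y):
    """Train a regressor R(.) by minimizing mean squared error (least squares)."""
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda x: x @ w

# One regressor per emotion dimension, mirroring the two-model setup in [5].
r_arousal = fit(X, arousal)
r_valence = fit(X, valence)

song = X[0]
av_point = (float(r_arousal(song)), float(r_valence(song)))
print(av_point)   # the song's predicted position in the emotion plane
```

Swapping `fit` for an SVR or Gaussian process regressor changes only the training call; the prediction of an (arousal, valence) point per song stays the same.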

Typically, timbral, rhythmic, melodic, and harmonic features are extracted to represent the acoustic properties of a song. Because of its ability to model auditory sensation based on psychoacoustic models, the computer program PsySound [12] is often employed for feature extraction. The use of mid-level features such as chord progression or genre metadata has also been explored [13], [14]. Many features, such as loudness (loud/soft), tempo (fast/slow), and pitch (high/low), have been found relevant to arousal, but only a few features are relevant to valence. Thus, valence recognition is more challenging than arousal recognition.

Typically a subjective test is conducted to collect the ground truth needed for model training. The subjects are asked to annotate the music pieces by rating their emotion perception of the music pieces using either the standard ordinal rating scale or the graphic rating scale [5], [15]. Because emotion perception is subjective, each music piece is annotated by multiple subjects and the ground truth is set to the average rating.

[pic]

Fig. 2. The schematic diagram of an MER system [5]

3. Challenges

As MER is still in its infancy, there are many open issues. Some major issues and proposed solutions are discussed in this section.

1. Subjectivity of Emotion Perception

Emotion perception is intrinsically influenced by many factors such as cultural background, generation, sex, and personality. Developing a general retrieval model that performs equally well for everyone is therefore challenging. This can be seen in Fig. 3, where each circle corresponds to the annotation of a song in the emotion plane by one subject. Simply assigning one emotion value to each song in a deterministic manner does not work well in practice because emotion perception varies greatly from person to person.

The subjectivity issue can be addressed by personalizing the MER system [7], [15]. We can ask a user to annotate a small number of songs and use the annotations to train a personalized model. A two-stage personalization scheme is proposed in [7]: one model predicts the general perception of a song, and the other predicts the difference between the general perception and a user's individual perception. This keeps the personalization process simple because the music content and the individuality of the user are treated separately. A more sophisticated scheme could also take into account the demographic properties, music preferences, or listening context of the user.
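A drastically simplified sketch of the two-stage idea: a general model supplies the population-level prediction, and a per-user correction learned from a handful of that user's own annotations adjusts it. Here the correction is just a constant offset, whereas [7] trains a second regressor; all names and numbers are hypothetical:

```python
def personalize(general_pred, user_annotations):
    """Stage 2: learn a constant valence offset from a few user annotations.
    general_pred: song -> population-level valence prediction.
    user_annotations: song -> this user's own valence rating."""
    diffs = [rating - general_pred[s] for s, rating in user_annotations.items()]
    offset = sum(diffs) / len(diffs)
    return lambda song: general_pred[song] + offset

# Hypothetical data: this user perceives everything a bit more negatively.
general = {"s1": 0.5, "s2": 0.1, "s3": -0.3, "s4": 0.7}
user    = {"s1": 0.3, "s2": -0.1}           # only two songs annotated
predict = personalize(general, user)        # personalized predictor
```

The appeal of the two-stage split is visible even in this toy: the general model is trained once on everyone's data, while the cheap per-user stage needs only a few annotations.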

2. Difficulty of Emotion Annotation

The emotion annotation process of MER requires the subjects to rate emotion on a continuum, but such rating has been found to impose a heavy cognitive load on the subjects [8]. In addition, it is difficult to ensure a consistent rating scale across and within subjects [16]. As a result, the quality of the ground truth varies, which in turn degrades the accuracy of MER.

To address this issue, ranking-based emotion annotation has been proposed [8]. Instead of assigning exact emotion values, a subject compares the affective content of two songs and determines, for example, which song has the higher arousal value. The rankings of music emotion are then converted to numerical values by a greedy algorithm [17]. Empirical evaluation shows that this scheme relieves the subjects' annotation burden and enhances the quality of the ground truth. It is also possible to use an online game to harness so-called human computation and make the annotation process more engaging [18].
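A toy conversion from pairwise rankings to numerical values, using a simple win-minus-loss score as a stand-in for the greedy ordering algorithm of [17]:

```python
def rankings_to_values(songs, comparisons):
    """comparisons: (winner, loser) pairs, e.g. 'winner has higher arousal'.
    Returns a per-song score scaled to [-1, 1]."""
    score = {s: 0 for s in songs}
    for winner, loser in comparisons:
        score[winner] += 1
        score[loser] -= 1
    top = max(abs(v) for v in score.values()) or 1  # avoid division by zero
    return {s: v / top for s, v in score.items()}

# Hypothetical comparisons collected from a subject: a > b > c in arousal.
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
values = rankings_to_values(["a", "b", "c"], pairs)
```

The subject only ever answered "which of these two is more aroused?", yet the output is a numerical arousal value per song usable as regression ground truth.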

[pic]

Fig. 3. Emotion annotations in the emotion plane for four songs: (a) Smells like teen spirit by Nirvana, (b) A whole new world by Peabo Bryson and Regina Belle, (c) The rose by Janis Joplin, and (d) Tell Laura I love her by Ritchie Valens. Each circle corresponds to the annotation of a song by a subject [7]

3. Semantic Gap Between Audio Signal and Human Perception

The viability of an MER system largely rests on the accuracy of emotion recognition. However, due to the semantic gap between the object feature level and the human cognitive level of emotion perception, it is difficult to accurately compute the emotion values, especially the valence values. Which intrinsic elements of music, if any, cause a listener to have a specific emotional response is still far from well understood. While mid-level audio features such as chords, rhythmic patterns, and instrumentation carry more semantic information, robust techniques for extracting such features have yet to be developed.

Available data for MER are not limited to the raw audio signal. Complementary to the audio signal, lyrics are semantically rich and have a profound impact on human perception of music [19]. It is often easy to tell from the lyrics whether a song expresses sadness or happiness. Incorporating lyrics into MER is feasible because most popular songs sold in the market come with lyrics [20]. One can analyze lyrics using natural language processing to generate textual feature descriptions of music. It has been shown that using lyrics indeed improves valence recognition [21], [22].
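As a minimal illustration of how textual features could complement audio for valence, the sketch below scores lyrics against a tiny sentiment lexicon. The lexicon words and weights are entirely hypothetical; real systems in [21], [22] use far richer text features:

```python
# Hypothetical valence lexicon: word -> valence weight in [-1, 1].
VALENCE_LEXICON = {"love": 0.8, "happy": 0.9, "dance": 0.6,
                   "alone": -0.6, "cry": -0.8, "goodbye": -0.5}

def lyric_valence(lyrics):
    """Average lexicon weight of the lyric words found; 0.0 if none match."""
    words = lyrics.lower().split()
    hits = [VALENCE_LEXICON[w] for w in words if w in VALENCE_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```

Such a score would be one extra input dimension alongside the acoustic features when training the valence regressor.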

4. Conclusion

The past decade has witnessed a growing interest in analyzing the affective content of music. In this article, we have described a new music retrieval paradigm that allows users to search music in the emotion plane. It opens up a new playground for advanced research on music emotion recognition and understanding.

Acknowledgments

This work was supported by the National Science Council of Taiwan under the contract number NSC 97-2221-E-002-111-MY3.

References

[1] T. Li and M. Ogihara, “Detecting emotion in music,” in Proc. ISMIR, 2003.

[2] L. Lu et al., “Automatic mood detection and tracking of music audio signals,” IEEE Trans. Audio, Speech and Language Processing, vol. 14, no. 1, pp. 5–18, 2006.

[3] X. Hu et al., “The 2007 MIREX audio mood classification task: Lessons learned,” in Proc. ISMIR, pp. 462–467, 2008.

[4] R. E. Thayer, The Biopsychology of Mood and Arousal. New York: Oxford University Press, 1989.

[5] Y.-H. Yang et al., “A regression approach to music emotion recognition,” IEEE Trans. Audio, Speech and Language Processing, vol. 16, no. 2, pp. 448–457, 2008.

[6] Y.-H. Yang et al., “Mr. Emo: Music retrieval in the emotion plane,” in Proc. ACM Multimedia, pp. 1003–1004, 2008.

[7] Y.-H. Yang et al., “Personalized music emotion retrieval,” in Proc. ACM SIGIR, pp. 748–749, 2009.

[8] Y.-H. Yang and H. H. Chen, “Music emotion ranking,” in Proc. ICASSP, pp. 1657–1660, 2009.

[9] T.-L. Wu et al., “Interactive content presenter based on expressed emotion and physiological feedback,” in Proc. ACM Multimedia, pp. 1009–1010, 2008.

[10] P. N. Juslin and J. A. Sloboda, Music and Emotion: Theory and Research. Oxford: Oxford University Press, 2001.

[11] A. Sen and M. Srivastava, Regression Analysis: Theory, Methods, and Applications. New York: Springer, 1990.

[12] D. Cabrera, “PSYSOUND: A computer program for psycho-acoustical analysis,” in Proc. Australian Acoustic Society Conf., pp. 47–54, 1999.

[13] H.-T. Cheng et al., “Automatic chord recognition for music classification and retrieval,” in Proc. ICME, pp. 1505–1508, 2008.

[14] Y.-C. Lin et al., “Exploiting genre for music emotion classification,” in Proc. ICME, pp. 618–621, 2009.

[15] Y.-H. Yang et al., “Music emotion recognition: The role of individuality,” in Proc. ACM Int. Workshop on Human-Centered Multimedia, pp. 13–21, 2007.

[16] S. Ovadia, “Ratings and rankings: Reconsidering the structure of values and their measurement,” Int. J. Social Research Methodology, vol. 7, no. 5, pp. 403–414, 2004.

[17] W. W. Cohen et al., “Learning to order things,” J. Artificial Intelligence Research, vol. 10, pp. 243–270, 1999.

[18] Y. E. Kim et al., “MoodSwings: A collaborative game for music mood label collection,” in Proc. ISMIR, 2008.

[19] S. Omar Ali et al., “Songs and emotions: Are lyrics and melodies equal partners?” Psychology of Music, vol. 34, no. 4, pp. 511–534, 2006.

[20] J. Fornäs, “The words of music,” Popular Music and Society, vol. 26, no. 1, pp. 37–53, 2003.

[21] Y.-H. Yang et al., “Toward multi-modal music emotion classification,” in Proc. PCM, pp. 70–79, 2008.

[22] C. Laurier et al., “Multimodal music mood classification using audio and lyrics,” in Proc. ICMLA, pp. 1–6, 2008.

Yi-Hsuan Yang received the B.S. degree in Electrical Engineering from National Taiwan University, Taiwan, in 2006. He is currently working toward the Ph.D. degree at the Graduate Institute of Communication Engineering, National Taiwan University. His research interests include multimedia information retrieval and analysis, machine learning, and affective computing. He has published over 20 technical papers in these areas.

Mr. Yang is a recipient of the Microsoft Research Asia Fellowship, 2008–2009.

Homer H. Chen (S’83-M’86-SM’01-F’03) received the Ph.D. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign, Urbana.

Since August 2003, he has been with the College of Electrical Engineering and Computer Science, National Taiwan University, Taiwan, R.O.C., where he is Irving T. Ho Chair Professor. Prior to that, he held various R&D management and engineering positions with US companies over a period of 17 years, including AT&T Bell Labs, Rockwell Science Center, iVast, and Digital Island. He was a US delegate for ISO and ITU standards committees and contributed to the development of many new interactive multimedia technologies that are now part of the MPEG-4 and JPEG-2000 standards. His professional interests lie in the broad area of multimedia signal processing and communications. 

Dr. Chen is an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology. He served as Associate Editor of IEEE Transactions on Image Processing from 1992 to 1994, Guest Editor of IEEE Transactions on Circuits and Systems for Video Technology in 1999, and Associate Editor of Pattern Recognition from 1989 to 1999.

Distributed Optimization for Wireless Visual Sensor Networks

Yifeng He and Ling Guan (IEEE Fellow), Ryerson University, Canada

yhe@ee.ryerson.ca, lguan@ee.ryerson.ca

A wireless visual sensor network (WVSN) consists of geographically distributed video sensors that communicate with each other over wireless channels. Unlike conventional wireless sensor networks, each video sensor in a WVSN has a camera component to capture video and a processing component to compress it. WVSNs have a wide range of applications, such as video surveillance, emergency response, environmental tracking, and health monitoring [1].

Sensor nodes are typically battery powered, and battery replacement is infrequent or even impossible in many sensing applications. Hence, much research in WVSNs has been focused on maximization of a utility function (e.g., the network lifetime, the video quality) by optimizing the power allocation at each sensor [2].

In a mesh-based WVSN, each video sensor transmits its compressed video stream, via relays at other sensors, to a sink for further analysis and decision making. The total power dissipation at a video sensor consists mainly of the encoding, transmission, and reception power consumption, with encoding accounting for the major part [2]. Based on the power-rate-distortion (P-R-D) analytical model [1], a video sensor can either increase the encoding power or increase the source rate to meet an encoding distortion requirement. However, increasing the encoding power raises the power consumption of the source node, while increasing the source rate causes the downstream nodes to consume more power relaying the traffic. How power should be allocated at each sensor depends on the pre-defined utility function. For example, to maximize the network lifetime, defined as the minimum node lifetime, a video sensor far from the sink should encode at a lower source rate with a higher encoding power, while a sensor close to the sink should encode at a higher source rate with a lower encoding power. Such an allocation lets the sensors exhaust their energy at almost the same time, thus maximizing the network lifetime.

In a WVSN, each node knows only its neighbors and has no global knowledge. A centralized algorithm is therefore not appropriate; distributed algorithms, which require only local message exchange, match the distributed nature of WVSNs well. Distributed optimization provides efficient solutions for many resource allocation problems in WVSNs. For example, network lifetime maximization for a WVSN has been studied in [2], where the problem is formulated to maximize the network lifetime by jointly optimizing the source rate and encoding power at each video sensor and the link rates for each session, subject to flow conservation constraints and the required quality of the collected video. The formulation is a convex optimization problem [3], so a distributed solution can be developed using the properties of Lagrangian duality [4].
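The WVSN formulation of [2] is too large for a short snippet, but the flavor of a distributed solution via Lagrangian duality [3], [4] can be shown on a toy problem: maximize the sum of log(r_i) over the node rates subject to a shared capacity constraint sum(r_i) <= C. Each "node" updates its rate from a single link price, and the price follows the constraint violation (a projected subgradient step); the problem and all constants are illustrative, not the actual WVSN model:

```python
def dual_decomposition(n_nodes=4, capacity=8.0, step=0.05, iters=3000):
    """Toy dual decomposition: maximize sum(log(r_i)) s.t. sum(r_i) <= capacity.
    Each node needs only the current price, mirroring local message exchange."""
    price = 1.0
    rates = [0.0] * n_nodes
    for _ in range(iters):
        # Local step: each node maximizes log(r_i) - price * r_i  =>  r_i = 1 / price.
        rates = [1.0 / price] * n_nodes
        # Master step: the price rises when the constraint is violated.
        price = max(1e-6, price + step * (sum(rates) - capacity))
    return rates, price
```

For this symmetric toy the optimum is r_i = capacity / n_nodes = 2.0 at price 0.5, and the iteration converges to it; in [2] the same decomposition pattern couples rates, encoding powers, and link flows instead.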

The optimization problem in a WVSN is time-varying due to channel dynamics and content dynamics. The channel dynamics are caused by channel fading or interference. The content dynamics arise because the P-R-D characteristics differ across segments of the video [5]: some segments contain much object motion and require more bits to encode, while others contain only static scenes and require relatively few bits. To cope with these dynamics, each video sensor adaptively learns the channel model and the P-R-D model from the channel and source statistics collected in the recent past. Based on the estimated models, a new optimization problem is formulated; each video sensor then performs distributed optimization and adjusts its outputs (e.g., bit rate, power) toward the optimal solution of the new formulation.

In summary, a wireless visual sensor network is a distributed system in which each sensor node has only a local view. Distributed optimization can therefore provide an efficient solution to the resource allocation problem in a WVSN.

REFERENCES:

[1] Z. He and D. Wu, “Resource allocation and performance analysis of wireless video sensors,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 5, pp. 590–599, May 2006.

[2] Y. He, I. Lee, and L. Guan, “Distributed algorithms for network lifetime maximization in wireless visual sensor networks,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 5, pp. 704–718, May 2009.

[3] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.

[4] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, “Layering as optimization decomposition: A mathematical theory of network architectures,” Proc. IEEE, vol. 95, no. 1, pp. 255–312, Jan. 2007.

[5] Z. He, W. Cheng, and X. Chen, “Energy minimization of portable video communication devices based on power-rate-distortion optimization,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 5, pp. 596–608, May 2008.

[pic]

Yifeng He (M’09) received his Ph.D. degree in Electrical Engineering from Ryerson University, Canada, in 2008. He is currently an assistant professor at Ryerson University. His research interests include wireless video streaming, wireless visual sensor networks, peer-to-peer streaming, and distributed optimization for multimedia communications. He is the recipient of the 2008 Governor General’s Gold Medal in Canada and the 2007 Pacific-Rim Conference on Multimedia (PCM) Best Paper Award.

[pic]

Ling Guan (S’88-M’90-SM’96-F’08) received his Ph.D. degree in Electrical Engineering from the University of British Columbia, Canada, in 1989. He is currently a professor and a Tier I Canada Research Chair in the Department of Electrical and Computer Engineering at Ryerson University, Toronto, Canada. He has held visiting positions at British Telecom (1994), Tokyo Institute of Technology (1999), Princeton University (2000), and Microsoft Research Asia (2002). He has published extensively in multimedia processing and communications, human-centered computing, machine learning, and adaptive image and signal processing. He is a recipient of the 2005 IEEE Transactions on Circuits and Systems Best Paper Award.

A New Generation of Wireless Multimedia Link-Layer Protocols

Hayder Radha (IEEE Fellow), Michigan State University, USA

radha@egr.msu.edu

1. Introduction

Despite the unprecedented success and proliferation of wireless LANs over the past decade, there are a few arguably major shortcomings in the underlying link-layer protocols of well-established wireless systems. These shortcomings are expected to be exacerbated as the heterogeneity and bandwidth requirements of emerging multimedia applications increase dramatically. In particular, popular wireless link-layer protocols, such as the retransmission (ARQ) based approach employed by the IEEE 802.11 standard suite [58], are designed to achieve some level of reliability by discarding corrupted packets at the receiver and performing one or more retransmission attempts until a packet is received error-free or a maximum number of retransmission attempts is reached. In addition to our contributions [1]-[26] in this area, many other leading research efforts (see, for example, [27]-[57]) have highlighted the inefficiencies of the retransmission approach used by the current IEEE link-layer protocol and proposed a variety of remedies. Many of these remedies focus on ARQ-based packet combining schemes and cooperative decoding [52]-[56]; others employ cross-layer strategies with some form of channel coding, usually implemented at higher layers, especially at the application multimedia layer [5][6][8][9][12]-[16][57].

Although these and other recently proposed remedies for the wireless link-layer address some aspects of the reliability issue, they largely ignore the stability dimension, and especially the heterogeneous nature and demands of the data and applications at higher layers. We believe that emerging and future wireless networks supporting high-end heterogeneous applications cannot afford piecemeal solutions. Ideally, the wireless link-layer protocol must meet the reliability and stability requirements of all applications (realtime or not) while maximizing throughput. Here, stability can be simply (and coarsely) defined, for legacy and realtime multimedia applications alike, as the set of conditions that ensure continuous availability of content for processing/decoding at the receiver [1]. Simply stated, instability can be synonymous with the well-known underflow event in multimedia applications; under limited buffer constraints, overflow events naturally contribute to the instability of the application as well.

Recently, a new wireless framework, Automatic Code Embedding (ACE) [1], has been developed that achieves reliability and stability while maximizing bandwidth efficiency. ACE is designed to support a broad range of applications, both legacy (i.e., TCP-based) and realtime, including high-end wireless multimedia such as HDTV over wireless, remote presence services, gaming, and immersive applications. We believe that successful wireless link-layer protocols have to (a) address the reliability and stability issues jointly for the broad range of heterogeneous applications (real-time and non-real-time) that ride on these protocols; (b) take advantage of “side information” in conjunction with intelligent feedback to maximize throughput; and (c) be flexible and adaptive in the face of changing channel conditions and traffic demands.

To that end, ACE is a good starting point for a new generation of wireless link-layer protocols that employ rate-adaptive channel coding with intelligent feedback and possibly with ARQ (when needed). More importantly, ACE is built on the philosophy of “fixing the problem at the source,” with the understanding that the link-layer is the lowest layer at which one can address the reliability and stability issues while trying to maximize capacity utilization and the associated throughput. It is important to note that one can envision some form of “collaborative” physical/link-layer design that achieves the aforementioned reliability and stability requirements for heterogeneous applications. In this article, however, we focus on the wireless link-layer design issues, with the understanding that many of the ideas presented here could be realized in cooperation with the physical layer [1].

In this article, we briefly outline the major shortcomings of the current ARQ-based link-layer and highlight some leading remedies to overcome them, motivating the need for a new generation of wireless link-layer protocols. We then provide a chronological account of key protocol designs that led to the ACE framework. Finally, we highlight the salient features of the ACE protocol and its architecture, as a representative of a new generation of wireless link-layer designed with reliability and stability considerations in mind.

2. Why do we need a new wireless link-layer protocol?

To motivate the need for a new generation of wireless link layer protocols, we highlight the key issues with the 802.11 link-layer protocol. In particular, one can identify two major shortcomings with such ARQ-based protocols:

1. Inefficient reliability

The 802.11 ARQ approach discards corrupted packets that are mostly error-free, even when there is only a single bit error in the packet. Hence, the effective throughput of 802.11 systems can be significantly improved. This issue has led many efforts to propose new link-layer and cross-layer protocols that utilize corrupted (or partial) packets instead of discarding them [5]-[51]. In addition to Hybrid ARQ (HARQ) based methods [40]-[42], examples of recent efforts for combating the inefficiencies of ARQ-based wireless protocols include Cross-Layer Design with Side-information (CLDS) [5]-[11], packet combining [29]-[44], Partial Packet Recovery (PPR) [51], ZipTx [39], and Automatic Code Embedding (ACE) [1]. Some of these approaches, such as PPR and packet combining, exploit physical-layer information about the quality of individual bits to improve the probability of recovering corrupted packets. Others, such as CLDS, ZipTx, and ACE, utilize information available in current 802.11 link-layer protocols in conjunction with error-correcting codes to recover corrupted packets.

In particular, Cross-Layer Design with Side-information (CLDS) [5]-[11] demonstrated a significant increase in throughput by utilizing corrupted packets under current 802.11 systems. More importantly, under CLDS it can be shown that the mere use of binary side information (whether a packet is corrupted or not), which is already available in the current 802.11 link-layer protocol, can increase the effective information-theoretic capacity significantly [5]-[7]. More details about CLDS are provided below.

2. Largely absent stability

The ARQ approach is designed to provide “reliability in the long run,” where information is eventually delivered to the destination. Even then, the link-layer does not guarantee delivery, and the reliability burden (due to wireless errors) is carried by higher layers, especially for applications that require guaranteed delivery. More importantly, the ARQ-based 802.11 link-layer and other recent protocols largely ignore the stability aspect of data communications, in terms of maintaining a sustainable flow, which is critical for a dynamic and heterogeneous ubiquitous wireless environment. Although many leading efforts have addressed the reliability and associated throughput inefficiency of the current 802.11 link-layer (as highlighted above), current ARQ and many emerging link-layer protocols rely on (or arguably shift the problem to) higher layers to provide reliable and stable flow control for both realtime and non-realtime traffic. In conjunction with the inefficient reliability approach, this design strategy has led to a great deal of throughput inefficiency and to other major technical challenges at higher layers. A well-known example is the TCP-over-wireless performance degradation phenomenon, which has led to major research efforts and numerous studies attempting to mitigate the shortcomings of the lower layers.

3. The evolution from cross-layer protocols toward a reliable and stable link-layer protocol

The ACE protocol has been in the making since the early 2000s [8][22][23][24]. Several research tracks contributed to an insightful protocol design. For example, comprehensive studies of the measurement, analysis, and modeling of the “MAC-to-MAC channel” error process [17]-[24] provided a foundation for designing more reliable wireless protocols at the MAC/link layer and above. Here, the “MAC-to-MAC channel,” or simply the “MAC channel” (or the “link-layer channel”), is an abstraction that includes the physical layer and its underlying channel. Hence, errors detected at the MAC/link layer represent the error characteristics of this abstract channel. These MAC channel errors are referred to as residue errors; they are errors that are not corrected by the error-correcting capabilities of the physical layer.

As highlighted above, an important conclusion of recent studies in this area is that the simple ARQ strategy is very inefficient under realistic channel conditions. This led to the intuition that wireless systems are better off simply utilizing corrupted packets instead of dropping them [7]. One major early direction under this intuition is the idea of passing corrupted packets to higher layers, where further and more efficient reliability functions (e.g., application-layer FEC) can be applied to them [59][23][7][5]. This opened the door to a variety of cross-layer protocols that perform significantly better than the conventional ARQ protocol.

More specifically, one can consider the different “MAC-to-MAC” channel models shown in Figure 1. Each of these channel models maps a MAC/link-layer reliability protocol to a corresponding abstract channel. The first channel is based on the ARQ protocol, which maps simply to an erasure channel model due to the packet drops induced by the underlying wireless channel errors. Let this channel have an information-theoretic capacity C_ARQ, which represents the maximum amount of information one can convey per transmitted symbol. Now consider a second channel model, shown in Figure 1b. Under this model, the receiver (i) does not drop a corrupted packet when the errors are within the payload only, but (ii) drops a packet when one or more errors impact the header (regardless of whether the payload is error-free). This channel model, which we refer to as a Cross-Layer Design (CLD) channel, is a hybrid error-erasure channel, since some packets are corrupted (with errors) and some packets are dropped (due to errors in the header) [5]-[7]. Let the capacity of this second channel be C_CLD. It is important to note that under the CLD channel model, the MAC/link layer does not provide the receiver any side information about the status of a packet (corrupted or not). In other words, the receiver is blind about any received packet; under CLD, the receiver does not know whether a packet has errors or is error-free. Consequently, one can define a third channel, which is basically CLD in conjunction with binary side information about the status of each packet (corrupted or not). Let this third channel, CLDS, have capacity C_CLDS.

Now one can compare the information-theoretic capacities of the three channels, C_ARQ, C_CLD, and C_CLDS, based on the corresponding models. In particular, it can be shown that under realistic channel conditions, the capacity C_CLD of the cross-layer design channel (without side information) is “usually” higher than the capacity C_ARQ of the traditional channel; the ARQ channel may have a higher capacity only if the corrupted packets are severely corrupted. More importantly, one can show that the capacity C_CLDS of the cross-layer design with side information is “always” higher than the capacity C_ARQ of the conventional channel: C_CLDS > C_ARQ [5]-[7]. And under realistic channel conditions, C_CLDS is significantly higher than C_ARQ. This basic and fundamental result clearly leads to the conclusion that any viable wireless link-layer should not adopt a plain ARQ scheme that drops corrupted packets. This conclusion can arguably be stretched further: “regardless of how badly corrupted these packets are, do not drop them.”
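The qualitative gap between the ARQ (erasure) channel and the side-information channel is easy to see numerically under a simplified memoryless model: independent bit errors at rate e, a packet of n payload bits, and header effects and retransmission overhead ignored. These modeling assumptions are ours, for illustration only:

```python
import math

def binary_entropy(p):
    """H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def arq_capacity(e, n):
    """ARQ discards any packet with >= 1 bit error: an erasure channel whose
    per-bit capacity is the probability the whole n-bit packet survives."""
    return (1 - e) ** n

def clds_capacity(e):
    """With corrupted packets kept and flagged by side information, each
    payload bit behaves like a BSC with capacity 1 - H(e)."""
    return 1 - binary_entropy(e)
```

For a bit error rate of 1e-3 and a 1000-byte (8000-bit) packet, `arq_capacity` collapses to a few times 1e-4 while `clds_capacity` stays near 0.99, matching the article's point that dropping corrupted packets wastes almost all of the channel.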

[pic]

(a)

[pic]

(b)

[pic]

(c)

Figure 1: Three channel models that represent (a) the conventional ARQ (erasure) channel; (b) a cross-layer design (CLD) hybrid error-erasure channel; and (c) a CLD with side-information (CLDS) channel. Here, “residue errors” are errors that are not corrected by the physical layer and hence are observed at the MAC/link layer.

From the above discussion, the cross-layer design with side-information (CLDS) protocol model is the best among the three models shown in Figure 1. The CLDS protocol employs a hybrid error-erasure channel coding scheme to jointly correct errors in corrupted packets and recover lost packets (dropped due to errors in the header) [5][6]. Such cross-layer protocols can be further enhanced by reducing the number of packet drops caused by header errors. Novel methods based on relatively simple detection- and estimation-theoretic tools can be adopted to “estimate” a corrupted header and hence determine whether the packet belongs to the receiver [12]-[16]. Under this strategy, if the probability that the packet belongs to the receiver is very close to one, the receiver keeps the packet. These header detection/estimation techniques benefit greatly from accurate modeling of the MAC channel under consideration [17]-[22].

In summary, one can achieve significant improvements over conventional ARQ methods by employing cross-layer designs based on (i) preserving corrupted packets (i.e., not dropping them due to errors in the payload); (ii) employing a form of error-correction and erasure-recovery capability at a higher layer; and (iii) utilizing header detection/estimation to reduce the number of packet drops due to errors in the header.

4. Fixing the problem at the source: The next generation wireless link-layer

The above discussion of cross-layer designs that improve on traditional ARQ-based link-layers raises the following question: why not fix the problem at the source? In particular, we have seen that keeping (not dropping) corrupted packets in conjunction with some form of channel coding can provide significant improvements. A natural question is, why not include a channel-coding-based scheme within the link-layer itself that achieves reliable communication? By following such a strategy, all types of traffic and data (not only realtime multimedia) can benefit from a simple and robust link-layer architecture, while preserving the integrity of the link-layer and the layers above it. This new thinking of a reliable link-layer based on a channel coding scheme with feedback was the first major step toward a next-generation wireless link-layer, and it led to a new family of reliable wireless link-layer protocols that we refer to as Packet Embedded Error Control (PEEC) protocols [2][3][4].

The second important question is the following: can the link layer provide not only reliable communication but also stability for higher layers? As mentioned above, stability implies that higher layers should not be starved for data at any time during a session. This concept applies to both realtime and non-realtime data [1]. For one thing, realizing a reliable and stable link layer can virtually eliminate the well-known wireless TCP problem, which has occupied the attention of many leading research efforts mainly because of the unreliable and inefficient nature of current ARQ-based link layers. Below, we outline the main features of the PEEC (reliable link-layer) and ACE (reliable and stable link-layer) protocol architectures.

1. A reliable wireless link-layer: Packet Embedded Error Control (PEEC) protocol

PEEC is based on the simple idea that each packet should protect itself with error correcting codes embedded in it. If the link-layer at the receiver side can correct all errors, the packet is moved to the higher layers; otherwise, the corrupted packet is kept in a buffer at the link-layer receiver while waiting for more redundant bits from the transmitter. Hence, there is a feedback mechanism between the receiver and transmitter that is similar to the feedback currently used by wireless link-layer protocols. In this case, however, the feedback can be used to request additional redundancy and can also be used to adjust the channel coding rate by the transmitter.

Consequently, each packet can carry two types of redundant symbols: Type-I and Type-II parity symbols. Type-I parity symbols are redundancy for the packet currently being transmitted, while Type-II parity symbols are redundancy for previously transmitted packets waiting in the receiver buffer (because they could not be corrected using their own Type-I parity symbols). The basic architecture and a simple scenario of the PEEC protocol are shown in Figure 2.

Figure 2 The Packet Embedded Error Control (PEEC) protocol.
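The PEEC feedback loop described above can be sketched as a toy simulation. The fragment below is an abstraction, not the actual PEEC implementation: correction capability is reduced to a simple integer error budget, and the capacity parameters are invented for illustration.

```python
def peec_simulate(error_counts, type1_capacity=2, type2_per_packet=1):
    """Toy PEEC-style link layer (abstract sketch, not the real protocol).

    Each packet carries Type-I parity able to correct up to
    `type1_capacity` symbol errors.  A packet whose errors exceed that
    capacity is buffered, and every subsequent packet piggybacks
    Type-II parity adding `type2_per_packet` units of correction
    capability to the oldest buffered packet until it is recovered.
    Returns the packet indices delivered to the higher layer, in the
    order they are recovered.
    """
    delivered, buffered = [], []  # buffered: [packet_idx, residual_errors]
    for i, errs in enumerate(error_counts):
        # Type-II parity in this packet helps the oldest buffered packet.
        if buffered:
            buffered[0][1] -= type2_per_packet
            if buffered[0][1] <= 0:
                delivered.append(buffered.pop(0)[0])
        # Type-I parity protects the current packet itself.
        if errs <= type1_capacity:
            delivered.append(i)
        else:
            buffered.append([i, errs - type1_capacity])
    return delivered

# Packet 1 has 4 errors (2 beyond Type-I capacity); Type-II parity
# piggybacked on packets 2 and 3 recovers it without retransmitting
# the whole packet.
print(peec_simulate([0, 4, 1, 0]))  # → [0, 2, 1, 3]
```

The key contrast with ARQ is visible in the trace: the corrupted packet is never discarded and never resent in full; only incremental redundancy crosses the channel.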

Figure 3 Design architecture of the ACE protocol (ACE Sender and ACE Receiver).

2. A reliable and stable wireless link-layer: The Automatic Code Embedding (ACE) Framework

ACE is built on the reliable PEEC protocol described above. The most basic type of wireless link-layer communication is contention-free point-to-point communication carrying a single traffic flow. In this scheme, the sender has a single task: to transmit information packets reliably to the receiver. Depending on the nature of the traffic flow (realtime or non-realtime), however, the sender should also avoid throughput instability at the receiver. In [1], we proposed a paradigm shift in which both reliability and stability are ensured using an Automatic Code Embedding (ACE) wireless link-layer protocol. An important conclusion of this work is that various traffic demands (in terms of reliability and stability requirements) can be met using a packet-by-packet code-embedding rate constraint that is independent of traffic type. Our results show the feasibility of designing a stable and reliable link layer over 802.11 channels [1] and, more importantly, provide clear evidence that significantly improved throughput is achievable with this type of link layer.

The design architectures of the ACE sender and receiver are illustrated in Figure 3. The ACE sender has two components. The first, Channel State Prediction, predicts the link-layer wireless channel condition for the next transmission interval based on receiver feedback (provided to the sender in the acknowledgment (ACK) packet). The second, Parity Allocation, generates a new codeword and adds the appropriate number of parity bits to the packet for the next transmission. On the receiving side, upon reception of a link-layer packet, the ACE receiver first attempts to decode the codeword embedded in it. If decoding succeeds, the information symbols in the codeword are immediately passed to the higher layer; if it fails, the codeword is placed in the receiver buffer for future recovery.
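As a rough illustration of the two sender components, the sketch below (with invented parameters; this is not the actual ACE prediction or allocation algorithm) predicts the next-interval BER with an exponentially weighted moving average of ACK feedback and sizes the embedded parity accordingly.

```python
import math

def predict_ber(feedback_bers, alpha=0.75, prev=0.01):
    """Channel State Prediction sketch: EWMA over per-ACK BER feedback.

    An illustrative stand-in for the ACE predictor; the smoothing factor
    and initial estimate are arbitrary assumptions of this sketch.
    """
    est = prev
    for b in feedback_bers:
        est = alpha * b + (1 - alpha) * est
    return est

def allocate_parity(payload_bits, ber, margin=1.25):
    """Parity Allocation sketch: size the embedded parity so that the
    expected number of bit errors (plus a safety margin) is covered,
    using a crude heuristic of one parity bit per expected error."""
    expected_errors = payload_bits * ber
    return math.ceil(margin * expected_errors)

# Predict the channel from three recent feedback reports, then size the
# parity for an 8000-bit payload.
ber_hat = predict_ber([0.02, 0.03, 0.025])
print(allocate_parity(8000, ber_hat))  # → 255
```

A real code would need more parity per correctable error than this heuristic assumes; the point here is only the sender's control flow from feedback to prediction to allocation.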

The decoding operations and buffer management of the ACE receiver are performed in a Packet Decoding and Buffer Management component. The second component of the ACE receiver, Channel State Estimation, estimates the channel condition by utilizing the physical- and link-layer side-information embedded in the received packet (channel state inference). It is important to note that accurate estimation and prediction of the channel condition have a critical impact on the performance of the ACE framework. This is because ACE employs Low-Density Parity-Check (LDPC) codes for decoding link-layer packets, and LDPC codes use soft-decision decoding (an iterative belief propagation method) that requires knowledge of the channel bit error rate (BER). It is therefore essential to identify practically observable variables that can be used for reasonably robust channel state inference/prediction (CSI/CSP).
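The dependence of soft-decision decoding on the BER estimate can be made concrete. For a binary-symmetric channel with crossover probability p, belief propagation is initialized with per-bit log-likelihood ratios LLR(y) = (1 - 2y) ln((1-p)/p); a minimal sketch (illustrative, not ACE's actual decoder front end):

```python
import math

def channel_llrs(hard_bits, ber_estimate):
    """Per-bit log-likelihood ratios for a binary-symmetric channel.

    Soft-decision LDPC decoding starts from channel LLRs; for a BSC
    with crossover probability p, LLR(y) = (1 - 2y) * ln((1-p)/p).
    This is why the receiver's channel-state estimate matters: an
    inaccurate p misweights every message in belief propagation.
    """
    p = min(max(ber_estimate, 1e-9), 0.5 - 1e-9)  # clamp to valid range
    scale = math.log((1 - p) / p)
    return [(1 - 2 * y) * scale for y in hard_bits]

# Positive LLRs for received 0s, negative for 1s; the magnitude (the
# decoder's confidence) shrinks as the estimated BER grows toward 0.5.
print(channel_llrs([0, 1, 0], 0.01))
print(channel_llrs([0, 1, 0], 0.2))
```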

5. Conclusion

In this article, we provided an overview of a variety of solutions developed recently to overcome the shortcomings of the ARQ-based wireless link layer. These solutions range from packet-combining approaches to cross-layer methods. We believe that all of these efforts should be leveraged to develop a new wireless link-layer framework, one that provides both reliability and stability for the higher layers instead of shifting the problem to them. More importantly, the research, academic, and industrial communities must collaborate to develop a new wireless link-layer standard that harnesses all of the advances made in this area over the past decade.

References

1] S. Soltani, K. Misra, and H. Radha, “On Link-Layer Reliability and Stability for Wireless Communication,” ACM MOBICOM, 2008.

2] S. Soltani, H. Radha, “PEEC: A Channel-Adaptive Feedback-Based Error Control Protocol for Wireless MAC Layer," IEEE JSAC Special Issue on Exploiting Limited Feedback in Tomorrow's Wireless Communication Networks, vol. 26, no. 8, pp. 1376–1385, 2008.

3] Sohraab Soltani and Hayder Radha, “Delay Constraint Error Control Protocol for Real-Time Video Communication,” IEEE Transactions on Multimedia, vol. 11, no. 4, pp. 742–751, June 2009.

4] Sohraab Soltani and Hayder Radha, "Performance Evaluation of Error Control Protocols over Finite-State Markovian Channels", Proceedings of the Conference of Information Sciences and Systems (CISS’08), Princeton University, NJ, USA, March 2008.

5] Shirish Karande and Hayder Radha, "Hybrid Erasure-Error Protocols for Wireless Video," IEEE Transactions on Multimedia, vol. 9, no. 2, pp. 307 – 319, February 2007.

6] Shirish Karande and Hayder Radha, “The Utility of Hybrid Error Erasure LDPC (HEEL) Codes for Wireless Multimedia," IEEE International Conference on Communications (ICC), May 2005.

7] Shirish Karande and Hayder Radha, “Does Relay of Corrupted Packets Lead to Capacity Improvement?," IEEE Wireless Communications and Networking Conference (WCNC), March 2005.

8] Syed Ali Khayam, Shirish S. Karande, Michael Krappel, and Hayder Radha, "Cross-Layer Protocol Design for Real-Time Multimedia Applications over 802.11b Networks," IEEE International Conference on Multimedia and Expo (ICME), July 2003.

9] Y. Cho, S. Karande, K. Misra, H. Radha, J. Yoo, and J. Hong, "On Channel Capacity Estimation and Prediction for Rate Adaptive Wireless Video," IEEE Transactions on Multimedia, vol. 10, no. 7, Nov. 2008.

10] S. Karande, S. A. Khayam, Y. Cho, K. Misra, H. Radha, J. Kim and J. Hong, “On Channel State Inference and Prediction Using Observable Variables in 802.11b Networks," IEEE International Conference on Communications (ICC), Glasgow, UK, June 2007.

11] Shirish Karande, Utpal Parrikar, Kiran Misra, and Hayder Radha, “Utilizing Signal to Silence Ratio indications for improved Video Communication in presence of 802.11b Residue Errors," IEEE International Conference on Multimedia & Expo (ICME), July 2006.

12] Syed A. Khayam and Hayder Radha, “Maximum-Likelihood Header Estimation: A Cross-Layer Methodology for Wireless Multimedia,” IEEE Transactions on Wireless Communications, vol. 6, no. 11, pp. 3946-3954, November 2007.

13] Syed Ali Khayam and Hayder Radha, "Comparison of Conventional and Cross-Layer Multimedia Transport Schemes for Wireless Networks," Springer Journal of Wireless Personal Communications (WPC), pages(s) 535-548, July 2009.

14] Syed Ali Khayam, Shirish Karande, Muhammad Usman Ilyas, and Hayder Radha, "Header Detection to Improve Multimedia Quality over Wireless Networks," IEEE Transactions on Multimedia, vol. 9, no. 2, pp. 377-385, February 2007.

15] Syed Ali Khayam, Shirish Karande, Muhammad Usman Ilyas, and Hayder Radha, “Improving Wireless Multimedia Quality using Header Detection with Priors," IEEE International Conference on Communications (ICC), June 2006.

16] Syed Ali Khayam, Muhammad U. Ilyas, Klaus Pörsch, Shirish Karande, and Hayder Radha, “A Statistical Receiver-based Approach for Improved Throughput of Multimedia Communications over Wireless LANs," IEEE International Conference on Communications (ICC), May 2005.

17] Syed A. Khayam and Hayder Radha, “Constant-Complexity Models for Wireless Channels," IEEE INFOCOM, April 2006.

18] Syed Ali Khayam, Hayder Radha, Selin Aviyente, and John R. Deller, Jr., “Markov and Multifractal Wavelet Models for Wireless MAC-to-MAC Channels,” Elsevier Performance Evaluation Journal, vol. 64, no. 4, pp. 298-314, May 2007.

19] Shirish Karande, U. Parrikar, Kiran Misra, and Hayder Radha, “On Modeling of 802.11b Residue Errors," Conference on Information Sciences & Systems (CISS), March 2006.

20] Syed Ali Khayam and Hayder Radha, “Linear-Complexity Models for Wireless MAC-to-MAC Channels," ACM/Kluwer Wireless Networks (WINET) Journal - Special Issue on Selected Papers from MSWiM’03, vol. 11, no. 5, pp. 543-555, September 2005.

21] Syed Ali Khayam, Selin Aviyente, and Hayder Radha, “On Long-Range Dependence in High-Bitrate Wireless Residual Channels," Conference on Information Sciences and Systems (CISS), March 2005.

22] Syed Ali Khayam and Hayder Radha, “Markov-based Modeling of Wireless Local Area Networks," ACM Mobicom International Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM), September 2003.

23] Syed Ali Khayam, Shirish Karande, Hayder Radha, and Dmitri Loguinov, “Analysis and Modeling of Errors and Losses over 802.11b LANs for High-Bitrate Real-Time Multimedia," EURASIP Signal Processing: Image Communication, vol.18, no.7, pp. 575-595, August 2003.

24] Shirish Karande, Syed Ali Khayam, Michael Krappel, and Hayder Radha, “Analysis and Modeling of Errors at the 802.11b Link-Layer," IEEE International Conference on Multimedia and Expo (ICME), July 2003.

25] Muhammad U. Ilyas, and Hayder Radha, “Measurement Based Analysis and Modeling of the Error Process in IEEE 802.15.4 LR-WPANs,” Proceedings of the 27th IEEE International Conference on Computer Communications (INFOCOM’08), Phoenix, AZ, United States, April, 2008.

26] Muhammad U. Ilyas, Moonseong Kim, and Hayder Radha , "Reducing Packet Losses in Networks of Commodity IEEE 802.15.4 Sensor Motes Using Cooperative Communication and Diversity Combination," Proceedings of the 28th IEEE International Conference on Computer Communications (INFOCOM'09), Rio de Janeiro, Brazil, April 19 - 25, 2009.

27] D. Aguayo, J. Bicket, S. Biswas, G. Judd, and R. Morris. “Link-level Measurements from an 802.11b Mesh Network”. In SIGCOMM, 2004.

28] J. G. Kim and M. M. Krunz, “Delay analysis of selective repeat ARQ for a Markovian source over a wireless channel," IEEE Trans. Veh. Technol., vol. 49, no. 5, pp. 1968–1981, Sep. 2000.

29] J. G. Kim and M. M. Krunz, “Delay analysis of selective repeat ARQ for a Markovian source over a wireless channel," IEEE Trans. Veh. Technol., vol. 49, no. 5, pp. 1968–1981, Sep. 2000.

30] P. S. Sindhu, “Retransmission error control with memory", IEEE Transactions on Communications, vol. COM-25, no. 5, pp. 473–479, May 1977.

31] S. S. Chakraborty, E. Yli-Juuti, and M. Liinaharja, “An adaptive ARQ scheme with packet combining," IEEE Communications Letters, vol. 2, no. 7, pp. 200–202, July 1998.

32] M. Gidlund, “Receiver-based packet combining in IEEE 802.11a wireless LAN," in Proc. IEEE Radio and Wireless Conference (RAWCON), August 2003, pp. 47–50.

33] Y. Liang and S. S. Chakraborty, “ARQ and packet combining with post-reception selection diversity," in Proc. 60th IEEE Semiannual Vehicular Technology Conference (VTC Fall), 2004.

34] Q. Zhang and S. A. Kassam, “Hybrid ARQ with selective combining for fading channels," IEEE Journal on Selected Areas in Communications, vol. 17, no. 5, pp. 867–874, May 1999.

35] T. W. A. Avudainayagam, J.M. Shea and L. Xin. Reliability Exchange Schemes for Iterative Packet Combining in Distributed Arrays. Proc. of the IEEE WCNC, volume 2, pages 832-837, 2003.

36] S. S. Chakraborty, E. Yli-Juuti, and M. Liinaharja. An ARQ Scheme with Packet Combining. IEEE Comm. Letters, 1998.

37] H. Yomo, S. S. Chakraborty, and R. Prasad, “IEEE 802.11 WLAN with Packet Combining", International Conference on Computer and Device 2004 (CODEC-04), January, 2004, Kolkata, India

38] Grace Woo, Pouya Kheradpour, Dawei Shen, and Dina Katabi, “Beyond the Bits: Cooperative Packet Recovery Using Physical Layer Information," ACM MOBICOM, 2007.

39] K. C. Lin, N. Kushman, and D. Katabi. Ziptx: Harnessing partial packets in 802.11 networks. In Mobicom’08, September 2008.

40] E. Soljanin. Hybrid ARQ in Wireless Networks. DIMACS Workshop on Network Inform. Theory, March 2003.

41] E. C. Strinati, S. Simoens, and J. Boutros, “Performance evaluation of some Hybrid ARQ schemes in IEEE 802.11a Networks“, IEEE VTC, 4(4):2735- 2739, 2003

42] G. Caire and D. Tuninetti, “The throughput of hybrid-ARQ protocols for the Gaussian collision channel”, IEEE Trans. Inform. Theory, July 2001.

43] S. Cheng and M. C. Valenti. “Macrodiversity packet combining for the ieee 802.11a uplink”. In IEEE WCNC, 2005.

44] M. C. Valenti. “Improving uplink performance by macrodiversity combining packets from adjacent access points”. IEEE WCNC, pages 636– 641, 2003.

45] S. Lin and D. J. Costello Jr., “Error Control Coding: Fundamentals and Applications," Englewood Cliffs, NJ: Prentice-Hall, 1983.

46] S. Lin and P. S. Yu, “A hybrid ARQ scheme with parity retransmission for error control of satellite channels," IEEE Trans. Commun., vol. 30, pp. 1701–1719, July 1982.

47] Y. Wang and S Lin, “A modified selective-repeat type-II hybrid ARQ system and its performance analyses," IEEE Transactions on Communications 31(5), pp. 124-133, 1983.

48] G. Caire and D. Tuninetti, "The throughput of hybrid-ARQ protocols for the Gaussian collision channel", IEEE Trans. Inform. Theory, vol. 47, pp. 1971–1988, July 2001.

49] D. Chase, “Code-combining: A maximum likelihood decoding approach for combining an arbitrary number of noisy packets,” IEEE Trans. Commun., vol. COMM-33, no. 5, pp. 385–393, May 1985.

50] J. C. Bolot, S. Fosse-Parisis, and D. Towsley, “Adaptive FEC-based error control for internet telephony,” in Proc. IEEE INFOCOM ’99, 1999, vol. 3, pp. 1453–1460.

51] K. Jamieson and H. Balakrishnan. “PPR: Partial Packet Recovery for Wireless Networks”. In ACM SIGCOMM, Kyoto, Japan, August 2007.

52] M. Gidlund, “Receiver-based packet combining in IEEE 802.11a wireless LAN," in Proc. IEEE Radio and Wireless Conference (RAWCON), August 2003, pp. 47–50.

53] T. W. A. Avudainayagam, J.M. Shea and L. Xin. “Reliability Exchange Schemes for Iterative Packet Combining in Distributed Arrays.” Proc. of the IEEE WCNC, volume 2, pages 832-837, 2003.

54] Y. Liang and S. S. Chakraborty, “ARQ and packet combining with post-reception selection diversity," in Proc. 60th IEEE Semiannual Vehicular Technology Conference (VTC Fall), 2004.

55] H. Yomo, S. S. Chakraborty, and R. Prasad, “IEEE 802.11 WLAN with Packet Combining", International Conference on Computer and Device 2004 (CODEC-04), Kolkata, India, January, 2004.

56] Grace Woo, Pouya Kheradpour, Dawei Shen, and Dina Katabi, “Beyond the Bits: Cooperative Packet Recovery Using Physical Layer Information," ACM MOBICOM, 2007.

57] K. C. Lin, N. Kushman, and D. Katabi. “Ziptx: Harnessing partial packets in 802.11 networks,” In Mobicom’08, September 2008.

58] IEEE Computer Society LAN MAN Standard Committee, “Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications," IEEE Std. 802.11-1999, New York, 1999.

59] The Lightweight User Datagram Protocol (UDP-Lite).


Hayder Radha received the Ph.M. and Ph.D. degrees from Columbia University in 1991 and 1993, the M.S. degree from Purdue University in 1986, and the B.S. degree (with honors) from Michigan State University (MSU) in 1984 (all in electrical engineering). Currently, he is a Professor of Electrical and Computer Engineering (ECE) at MSU, the Associate Chair for Research of the ECE Department, and the Director of the Wireless and Video Communications Laboratory. Professor Radha was with Philips Research (1996-2000), where he worked as a Principal Member of Research Staff and then as a Consulting Scientist in the Video Communications Research Department. He was a Member of Technical Staff at Bell Laboratories where he worked between 1986 and 1996 in the areas of digital communications, image processing, and broadband multimedia.

 

Professor Radha is a Fellow of the IEEE, and he was appointed as a Philips Research Fellow in 2000 and a Bell Labs’ Distinguished Member of Technical Staff in 1992. He is an elected member of the IEEE Technical Committee on Image, Video, and Multidimensional Signal Processing (IVMSP) and the IEEE Technical Committee on Multimedia Signal Processing (MMSP). He served as Co-Chair and Editor of a Video Coding Experts Group of the International Telecommunications Union – Telecommunications Section (ITU-T) between 1994-1996. He served on the Editorial Board of IEEE Transactions on Multimedia and the Journal on Advances in Multimedia. He also served as a Guest Editor for the special issue on Network-Aware Multimedia Processing and Communications of the IEEE Journal on Selected Topics in Signal Processing. Professor Radha is a recipient of the Bell Labs Distinguished Member of Technical Staff Award, the AT&T Bell Labs Ambassador Award, AT&T Circle of Excellence Award, the MSU College of Engineering Withrow Distinguished Scholar Award for outstanding contributions to engineering, and the Microsoft Research Content and Curriculum Award. He is a recipient of National Science Foundation (NSF) grants under the Theoretical Foundation, Communications Research, Research in Networking Technology and Systems (NeTS), and Cyber-Trust programs.  His current research areas include wireless communications and networking, video communications, image processing, compressed sensing, sensor networks, and network coding. He has more than 150 peer-reviewed papers and 30 US patents in these areas.

TECHNOLOGY ADVANCES

Four Suggestions for Research on Multimedia QoE Using Subjective Evaluations

Greg Cermak, Verizon Labs, USA

gregory.cermak@

The suggestions that follow are based on my experience doing consumer research and on my contact with video and multimedia practitioners in T1A1, VPQM, VQEG, and QoMEX. There have been many hours of debates about consumer QoE and how to measure it.

Suggestion 1: Measure the consumer experience, but do not try to understand the consumer. The root cause of much distress and fruitless discussion regarding subjective evaluation may be that engineers have the idea that improving customer experience requires understanding why consumers act as they do. Understanding why consumers act as they do is a practical impossibility, and is unnecessary for the kind of product and service evaluations reported in the organizations listed above. Instead, measuring consumers’ behavior while they are interacting with products or product descriptions provides observable data about the products, and avoids potential sources of argument and confusion.

The distinction between (a) understanding why people perceive, feel, and act as they do, and (b) measuring behavior when a person interacts with some stimulus may seem overly subtle. It has been the subject of academic debates at least since the beginnings of experimental psychology. The important point is that the distinction has practical consequences for doing research on quality of customer experience (QoE).

Understanding why a single person judges something pleasant or unpleasant (e.g., a packet loss artifact is more/less annoying than a compression artifact) can require much time and effort on the part of the experimenter. If the testing program has on the order of two dozen consumer “subjects,” the amount of time required to understand the judgments of each one is a practical impossibility in most industrial labs. Furthermore, even if very detailed data on two dozen subjects were obtained, finding common threads among them can be difficult and subjective; seeing the common threads depends on the judgment of the experimenter.

The alternative proposed here is to concentrate on properties of the product first, properties of the interaction of the customer with the product second, and properties of the customer last or not at all. For example, in the case of multimedia, the idea would be to spend most time and effort creating a set of multimedia examples that capture the important elements of multimedia products/services of interest. (In the case of video quality, the important elements such as bit rate and packet loss are known in advance.) Expose consumers to the multimedia examples, and collect some sort of more or less objective rating of the QoE for each example. If the study is designed carefully (see below), the result will be that the relative importance of each of the products’ elements will be revealed. Also, the relative improvement in judged QoE may be revealed as each individual element is improved separately. The engineer will have clear direction for how to improve the overall QoE of the product.

Naturally, there are exceptions to every rule, and the author himself has expended some effort to understand individual differences in perception of QoE for VoIP and videoconferencing [1, 2]. However, in general, and especially for group projects, keep in mind the suggestion that good project results can be achieved without trying to understand the consumer in any depth. Understanding the consumer is actually useful as background information; it makes it possible to proceed on a consumer research project confidently and efficiently. However, background understanding of the consumer does not usually lead to the kind of actionable results that multimedia product engineers need. Some good sources for background information on consumer needs regarding communication are [3, 4, 5]; and for entertainment [6]. A current example of work to understand the end-user is [7].

Suggestion 2: Think of products, services, and lab stimuli as multiattribute “objects.” Of course, it is obvious that products and services are complicated, and that overall QoE depends on many elements. Nevertheless, it often happens in meetings that discussions get turned so that only the influence of a single attribute, or of a few attributes, is considered. A multiattribute framework may counteract this tendency. Also, by thinking of products as multiattribute objects, machinery for experimental design and data analysis follows naturally.

The kind of argument that suffers from lack of a multiattribute perspective might go like this: “If we improve attribute X of our product/service, then QoE will certainly go up.” In fact, that is only true if it is possible to improve attribute X without negatively affecting other attributes, and if QoE increases monotonically with increases in attribute X. Very often changing one attribute in fact also changes other attributes, and the effect of the other attributes compensates for the improvement in the first attribute. For practical use in industrial lab situations, we are assuming a “compensatory” model of utility. Compensatory models are not always as accurate as “noncompensatory” models, but they are almost always good approximations, and they provide a much more experimenter-friendly context for discussing experimental designs and analyses.

That is, a good general-purpose approximation is that overall utility or QoE follows a weighted additive form: QoE = b0 + b1x1 + b2x2 + … + bnxn + e, where xi refers to a measurement of the ith attribute, bi refers to the relative weight of the attribute in contributing to the overall utility or QoE of the product/service, and e refers to random error. That is, different attributes have potentially different effects, but they all count (unless the weight is zero). And they can all be accountedted for in this general model. Further, the effects are potentially separable, just as the b weights are distinct.
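Fitting this additive model is ordinary least squares. A minimal sketch follows, with synthetic stimuli, invented attribute weights, and noise-free ratings chosen purely for illustration:

```python
import numpy as np

# Sketch: estimate the attribute weights b of the additive QoE model
# QoE = b0 + b1*x1 + ... + bn*xn + e by ordinary least squares.
# Stimuli are a 3x3 factorial of two attributes; the "true" weights
# and the ratings they generate are synthetic.
X = np.array([[x1, x2] for x1 in (0, 1, 2) for x2 in (0, 1, 2)], float)
true_b = np.array([1.0, 2.0, -0.5])       # intercept and two weights
qoe = true_b[0] + X @ true_b[1:]          # noise-free ratings

A = np.hstack([np.ones((len(X), 1)), X])  # design matrix with intercept
b_hat, *_ = np.linalg.lstsq(A, qoe, rcond=None)
print(np.round(b_hat, 6))
```

With noise-free data the weights are recovered exactly; with real ratings, the size of the residual e relative to the b's tells the engineer how much of QoE the chosen attributes actually explain.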

Another kind of argument that can be forestalled with the multiattribute model might go like this: “We can’t tell what the effect of the codec is, because it also depends on the scene.” True enough for a single scene and codec, but if one has a collection of scenes processed by codecs, then the effect of the scenes can be separated from the effect of the codec(s). Which brings up the topic, referred to earlier, of experimental machinery and data analysis.

Experimental design and data analysis have gone hand-in-hand since the time of Ronald Fisher (see Wikipedia article). Both are based on a generalization of the additive compensatory model sketched above. In order to actually achieve a situation (such as a study of QoE) in which the simple additive model applies, it is necessary to arrange for the elements of the stimuli (e.g., multimedia recordings) to be logically independent. That is, in the collection of stimuli as a whole, the various elements that vary need to do so independently, and all other elements should be held constant. This situation is achieved in a “full factorial” experimental design in which every combination of the elements or attributes is present. Such a design is expensive and not strictly necessary; fractional factorial designs exist, and random sampling of attribute combinations produces good results. See Wikipedia on “factorial design” and “fractional factorial design.”
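A full factorial stimulus set is straightforward to generate; the attribute names and levels below are invented for illustration:

```python
from itertools import product
import random

# Sketch: generate test stimuli as a designed experiment.  In a full
# factorial design, every combination of attribute levels appears once,
# so the attributes vary independently across the stimulus set.
attributes = {
    "bitrate_kbps": [500, 2000, 8000],
    "packet_loss_pct": [0, 1, 5],
    "codec": ["H.264", "MPEG-2"],
}
full = list(product(*attributes.values()))
print(len(full))  # 3 * 3 * 2 = 18 stimuli

# A cheaper alternative to a formal fractional factorial: a random
# sample of the combinations, which also keeps the attributes nearly
# independent across the reduced stimulus set.
random.seed(0)
subset = random.sample(full, 9)
```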

The usual statistical tool for analyzing data from designed experiments is analysis of variance (ANOVA) or its cousin, the general linear model. The main point is that thinking about products or stimuli as collections of attributes with additive effects leads both to a way of talking about the products, and to a way of designing studies and analyzing data. It also leads to a way of thinking about consumers and research subjects (below). Professional societies of interest include the Psychometric Society, the Society for Judgment and Decision Making, and the INFORMS Society for Marketing Science.

Suggestion 3: Think of consumers and research subjects as having different weights for the importance of product attributes. In the expression for the additive attribute model above, the attribute weights depend only on the attribute, not on the individual consumer or subject. Think of these weights as being the average across a sample of consumers. However, one can also think of the weights as being particular to the individual subjects (representable by adding another subscript). After all, consumers notoriously do weight different product attributes differently – for example, I may weight fuel economy very highly and you may weight acceleration very highly. These differences in taste are often quite stable, not due to inconsistent responding or random noise in the data.

The multiattribute way of thinking of consumers and product attributes immediately accommodates individual differences among consumers. Various data analysis tools for representing individual differences in consumers’ preferences have been adopted by the marketing science community over the past 40 years. Some key words are individual differences multidimensional scaling, preference mapping, and latent class analysis. Examples of the application of multiattribute methods to consumer research can be found in Paul E. Green’s work [8].

If consumers differ in their tastes, and if we have a natural way to represent those differences, then there is less reason to do research with a small homogeneous sample of consumers – such as college students or lab employees. Of course, it could be that a particular product is intended only for a very restricted segment of the market, such as college students or lab employees, but that is rarely the case. If larger and more representative samples of consumers are considered at least theoretically desirable, then the practical issues arise: how to recruit them, how to pay them, how to deal with them in the lab. Search on Green Book for recruiting and paying subjects. Consider hiring experimental psychologists or human factors specialists for dealing with human subjects in the lab.

Another consequence of thinking of consumers as having legitimate reasons for being different from each other – and of these differences being captured in a multiattribute model – is that there is less reason for discarding consumers’ data. Consider the model

QoE = b0j + b1jx1 + b2jx2 + … + bnjxn + ej, where the subscript j refers to an individual consumer, the b’s again refer to weights, the x’s refer to values of attributes such as bit rate, and e refers to random error. Then two consumers’ data can correlate poorly either because their error terms e are very large or because their b weights are quite different. It is the large error term that indicates “bad” data, not possible differences in b-weights. Proper experimental design can make it possible to distinguish between the two cases with ANOVA. The correlation tests that are frequently used in video quality research do not distinguish between the two cases [9].
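The point can be made concrete with a small synthetic example (invented weights and stimuli): two noise-free subjects whose b-weights differ completely produce ratings that are uncorrelated, even though neither data set contains any error at all.

```python
import numpy as np

# Two subjects rate the same 3x3 factorial stimuli with different
# attribute weights and zero random error.  Their ratings correlate
# at zero, yet neither data set is "bad": each subject's own additive
# model fits perfectly, so the disagreement lies entirely in the
# b-weights, not in the error term ej.
X = np.array([[x1, x2] for x1 in (0, 1, 2) for x2 in (0, 1, 2)], float)
ratings_a = 3.0 * X[:, 0]  # subject a weights only attribute 1
ratings_b = 3.0 * X[:, 1]  # subject b weights only attribute 2

r = np.corrcoef(ratings_a, ratings_b)[0, 1]
print(round(r, 6))  # independent factorial columns -> 0.0
```

A correlation screen would flag one of these subjects as an outlier and discard perfectly lawful data; a per-subject ANOVA would instead attribute the disagreement to the weights.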

Suggestion 4: Do not spend your time arguing about the proper rating scale. No rating scale is the “correct” rating scale [10], and most rating scales produce results that are reasonable approximations of each other, especially for aggregate data [11]. Better uses of effort are in producing multimedia stimuli according to a good orthogonal design, or in understanding more about the processes involved in interpersonal communication [e.g., 3, 4, 5].

References:

[1] Cermak, G. W. “Verbal descriptors for VoIP speech sounds.” International Journal of Speech Technology, 7, 81-91, 2004.

[2] Cermak, G. W. “Multimedia quality as a function of bandwidth, packet loss, and latency.” International Journal of Speech Technology, 8, 259-270, 2005.

[3] Clark, H. H. Using Language. Cambridge University Press, 1996.

[4] Nofsinger, R. E. Everyday Conversation. Sage Publications, 1991.

[5] Short, J., Williams, E., & Christie, B. The Social Psychology of Telecommunications. New York: Wiley, 1976.

[6] Cermak, G. W.  “An approach to mapping entertainment alternatives.”  In R. R. Dholakia, N. Mundorf, and N. Dholakia (Eds.), New Infotainment Technologies in the Home (pp. 115-134).  Mahwah, NJ: L. Erlbaum Associates, 1996.

[7] Aaltonen, V., Takatalo, J., Hakkinen, J., Lehtonen, M., Nyman, G., and Schrader, M. “Measuring mediated communication experience.” First International Workshop on Quality of Multimedia Experience. San Diego, July, 2009.

[8] ()

[9] K. Brunnstrom, G. Cermak, D. Hands, M. Pinson, F. Speranza, and A. Webster. Draft Final Report From the Video Quality Experts Group On the Validation of Objective Models of Multimedia Quality Assessment, Phase I. ©2008 VQEG.

[10] Shepard, R.N. “Psychological relations and psychophysical scales: On the status of ‘direct’ psychophysical measurement.” Journal of Mathematical Psychology, 24, 21-57, 1981.

[11] Cox, E. P.  “The optimal number of response alternatives for a scale: a review.”  Journal of Marketing Research, vol. XVII, Nov., 1980, 407-422.

[pic]

Gregory W. Cermak received a B.A. in psychology from the University of California, Santa Barbara, in 1968, and a Ph.D. in psychology from Stanford University in 1972.

He worked at the General Motors Research Laboratories from 1972 through 1986, at Information Resources, Inc. from 1987 to 1988, and at the GTE/Verizon Laboratories in Waltham, MA from 1988 to the present. He has published in psychophysics, acoustics, air quality, market research, speech quality, and video quality. He has recently been working with the Video Quality Experts Group on validating objective measures of video quality.

From Cross-Layer Optimization to Cognitive Source Coding for Multimedia Transmission: Adapting Content Formats to the Network

Simone Milani, University of Padova, Italy

simone.milani@dei.unipd.it

The advent of wireless multimedia communications has made it evident that traditional network protocol stack configurations are not adequate for delivering multimedia content over heterogeneous, time-varying networks. The massive amount of data that characterizes multimedia signals, together with strict Quality of Service (QoS) requirements on bandwidth, delay, and delay jitter, makes it difficult to provide multimedia content to end users at a satisfying quality. These problems are further exacerbated by wireless channels, which are characterized by high data loss rates and varying transmission conditions, and by the limits of traditional transmission protocols and infrastructures.

In order to mitigate these problems, several optimization strategies have been proposed to increase the QoS level of multimedia transmissions by adapting each layer to the transmitted information and the network conditions. However, the modularization of traditional layered architectures can lead to significant inefficiencies when the transmission parameters are configured blindly [1].

In recent years, considerable research effort has been devoted to efficient cross-layer (CL) solutions that aim at maximizing the quality of the video signal delivered to the end user by allowing a synergistic interaction between different protocol layers [1]. The main goal of these architectures is to improve transmission performance by jointly tuning the parameters of each layer according to holistic algorithms.

Some of the proposed solutions jointly tune the parameters of the source and channel coders according to the transmission conditions and the characteristics of the video sequence [2]. Other solutions [3, 4] differentiate the priorities and retransmission policies of packets according to the significance of the contained data in the decoding process. Still others accurately control the transmission power in order to vary the Signal-to-Noise Ratio as needed [5].

All these solutions can be combined to optimize the final performance [6], but the computational complexity becomes critical because of the large number of parameters involved in the optimization process. Moreover, most of the proposed solutions focus on finding the optimal parameter setting given a fixed set of source and channel coding solutions at the different layers.
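A toy instance of such a joint parameter search can be sketched as follows (an illustrative model of ours, not any specific scheme from [2]-[6]; the R-D curve, FEC effect, and budget are assumed): a fixed channel budget is split between source bits and FEC redundancy, and the split minimizing expected distortion is found by exhaustive search, whose cost grows with every added parameter.

```python
# Toy joint source-channel optimization: split a fixed channel budget
# between source bits and FEC redundancy to minimize expected distortion.
# The distortion model and all constants are illustrative assumptions.

TOTAL_KBPS = 400.0
LOSS_PROB = 0.1

def expected_distortion(source_kbps, fec_fraction):
    d_source = 2000.0 / source_kbps                    # assumed R-D curve
    residual_loss = LOSS_PROB * (1.0 - fec_fraction)   # assumed FEC benefit
    return d_source + 300.0 * residual_loss

# Exhaustive search over the two-parameter grid, subject to the budget.
candidates = [
    (s, f)
    for s in range(50, 401, 10)
    for f in (x / 10.0 for x in range(10))
    if s * (1.0 + f) <= TOTAL_KBPS          # budget constraint
]
best_s, best_f = min(candidates, key=lambda sf: expected_distortion(*sf))
print(best_s, best_f)
```

Even in this two-parameter toy the search visits every feasible pair; with per-layer parameters added, the grid grows multiplicatively, which is the complexity problem noted above.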

In this scenario, Cognitive Source Coding schemes widen the set of possible cross-layer solutions and significantly improve the performance of traditional schemes. The term Cognitive Source Coding (CSC) has been introduced in analogy with Cognitive Radio [7] architectures adopted for radio transmissions. As defined by Haykin in [8], “Cognitive radio is an intelligent wireless communication system that is aware of its surrounding environment (i.e., outside world), and uses the methodology of understanding-by-building to learn from the environment and adapt its internal states to statistical variations in the incoming RF stimuli by making corresponding changes in certain operating parameters (e.g., transmit-power, carrier-frequency, and modulation strategy) in real-time.”

Similarly, it is possible to change the coding format of a multimedia signal according to the available network resources in order to transmit it effectively to a remote user. CSC schemes receive a description of the network conditions from the lower layers of the protocol stack (e.g., available bandwidth, number of transmission paths, packet loss probability, average delay, and jitter) and adopt the most appropriate source coding solution from a set of possible choices. CSC schemes must therefore be designed to satisfy specific requirements:

• providing robust multimedia communications anywhere and anytime while granting a certain level of Quality-of-Experience (QoE) to the end user;

• using the available transmission capacity effectively;

• limiting the required computational load, the involved hardware resources, and the complexity of the transmission architecture.

As with Cognitive Radio systems, reconfigurability is one of the key elements that make it possible to satisfy these requirements. Orchestrating the different functional blocks of the coding architecture improves the effectiveness of the transmission in terms of the perceptual quality experienced by the end user. The control unit of a CSC solution enables and reconfigures the available functional blocks, selecting those that prove most suitable for the network status and for the characteristics of the signal to be transmitted. An effective CSC scheme therefore needs to identify the key elements common to the implemented source coding solutions and to design an effective interconnecting network that can be easily reconfigured. Moreover, efficient optimization algorithms must adapt the coding parameters to the features of the coded video signal and to the available data rate. In this way, the coded bit stream fully exploits the transmission capacity available to the terminal, avoiding bandwidth waste or underutilization. Finally, reconfigurability limits the size of coding devices while increasing the number of implemented coding solutions.

From these premises, CSC schemes lie within the range of cross-layer coding solutions, but they differ in that most CL solutions jointly tune the transmission parameters at different layers without changing the structure of the coding architecture, whereas CSC implies reconfiguring the architecture of the source coder itself depending on the network status.

As an example, the solution in [9] reconfigures a standard H.264/AVC video coder in order to support both single description (SD) and multiple description (MD) coding. The proposed architecture switches between the SD coder and the MD coder according to the characteristics of the channel, which are inferred from a set of control messages received at MAC level.
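The control logic of such a scheme can be sketched in a few lines (our hypothetical illustration, not the actual architecture of [9]; the state fields and the 5% loss threshold are assumed values):

```python
from dataclasses import dataclass

# Hypothetical CSC control-unit sketch: pick the coder configuration
# from the network state reported by the lower layers.

@dataclass
class NetworkState:
    loss_prob: float   # packet loss probability, e.g. from MAC-level feedback
    n_paths: int       # number of available transmission paths

def select_coder(state: NetworkState) -> str:
    """Return "MD" (multiple description) or "SD" (single description).

    The 5% loss threshold is an assumed value, not one from the paper."""
    if state.n_paths >= 2 and state.loss_prob > 0.05:
        return "MD"    # redundant descriptions pay off on lossy multi-path links
    return "SD"        # reliable channel: spend all bits on one description

print(select_coder(NetworkState(loss_prob=0.10, n_paths=2)))  # → MD
print(select_coder(NetworkState(loss_prob=0.01, n_paths=2)))  # → SD
```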

Other examples are offered by solutions that dynamically switch between traditional video coding and Distributed Video Coding (DVC) in order to match both the video signal characteristics and the need for robust video coding. Many DVC schemes [10] adaptively combine Wyner-Ziv video coding with traditional non-predictive source coding depending on the reliability of the channel. Whenever multiple channels are available, it is possible to differentiate the adopted source coding solution according to the packet loss probabilities measured from the data carried by RTCP packets [11]. In these cases, strong similarities between source coding solutions of different natures permit reusing a large number of functional units, so the required device size and implementation costs are significantly reduced.

Following this trend, research is focusing on more efficient low-complexity CSC solutions that enable stronger reuse of the available units and maximize the quality of the video sequence reconstructed at the decoder. Many efforts concentrate on assigning the most appropriate source coding solution to given transmission conditions. Moreover, video system designers are investigating effective optimization strategies that process information about the state of the network to infer the most appropriate configuration. Finally, significant research work is devoted to identifying novel video coding schemes that can be easily integrated with the existing ones.

References

[1] M. van der Schaar and S. Shankar, “Cross-layer wireless multimedia transmission: challenges, principles, and new paradigms,” IEEE Wireless Commun., vol. 12, no. 4, pp. 50–58, Aug. 2005.

[2] Q. Qu, Y. Pei, J. W. Modestino, X. Tian, and B. Wang, “Cross-layer QoS control for video communication over wireless ad hoc networks,” EURASIP Journal on Wireless Communications and Networking, vol. 5, no. 5, pp. 743–756, Oct. 2005.

[3] B. Girod and N. Färber, “Feedback-based error control for mobile video transmission,” Proc. of the IEEE, vol. 87, no. 10, pp. 1707–1723, Oct. 1999.

[4] A. Ksentini, M. Naimi, and A. Guéroui, “Toward an improvement of H.264 video transmission over IEEE 802.11e through a cross-layer architecture,” IEEE Commun. Mag., vol. 44, no. 1, pp. 107–114, Jan. 2006.

[5] Y. Eisenberg, C. E. Luna, T. N. Pappas, R. Berry, and A. K. Katsaggelos, “Joint source coding and transmission power management for energy efficient wireless video communication,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 411–424, Jun. 2002.

[6] A. K. Katsaggelos, Y. Eisenberg, F. Zhai, R. Berry, and T. N. Pappas, “Advances in efficient resource allocation for packet-based real-time video transmission,” Proc. of the IEEE, vol. 93, no. 1, pp. 135–147, Jan. 2005.

[7] J. Mitola and G. Q. Maguire, Jr., “Cognitive radio: Making software radios more personal,” IEEE Personal Commun. Mag., vol. 6, no. 6, pp. 13–18, Aug. 1999.

[8] S. Haykin, “Cognitive radio: Brain-empowered wireless communications,” IEEE J. Sel. Areas Commun., vol. 23, no. 2, pp. 201–220, Feb. 2005 (invited).

[9] S. Milani, G. Calvagno, R. Bernardini, and P. Zontone, “Cross-layer joint optimization of FEC channel codes and multiple description coding for video delivery over IEEE 802.11e links,” in Proc. of IEEE FMN 2008 (co-located with NGMAST 2008), Cardiff, Wales, UK, Sep. 17–18, 2008, pp. 472–478.

[10] F. Pereira, C. Brites, J. Ascenso, and M. Tagliasacchi, “Wyner-Ziv video coding: A review of the early architectures and further developments,” in Proc. of ICME 2008, Hannover, Germany, Jun. 23–26, 2008, pp. 625–628.

[11] S. Milani and G. Calvagno, “A distributed video coding approach for multiple description video transmission over lossy channels,” in Proc. of EUSIPCO 2009, Glasgow, Scotland, UK, Aug. 24–28, 2009.

[pic]

Simone Milani was born in Camposampiero (PD), Italy, in 1978. He received the Laurea degree in Telecommunication Engineering in 2002 and the Ph.D. degree in Electronics and Telecommunication Engineering in 2007, both from the University of Padova, Italy. In 2006 he was a visiting Ph.D. student at the University of California, Berkeley, under the supervision of Prof. K. Ramchandran, and in 2007 he was a post-doc researcher at the University of Udine, Italy, collaborating with Prof. R. Rinaldo. He has also worked with STMicroelectronics, Agrate Brianza, Italy, as a consulting engineer.

He is currently a research associate at the University of Padova, working on the research project "Analysis and implementation of a scalable video coder for transmission over heterogeneous unreliable networks based on distributed coding principles" under the supervision of Prof. Giancarlo Calvagno.

His main research topics are digital signal processing, source coding, joint source-channel coding, robust video transmission over lossy packet networks, distributed source coding, and cognitive source coding.

He is a member of the IEEE Information Theory and Signal Processing Societies and has been a reviewer for several magazines and international conferences.

Focused Technology Advances Series

Application Layer QoS Provisioning for Wireless Multimedia Networks with Cognitive Radios

F. Richard Yu, Carleton University, Canada

richard_yu@carleton.ca

Abstract

Most previous work on wireless multimedia communication networks concentrates on lower layer quality of service (QoS), such as blocking probability, throughput, and radio resource utilization, as design criteria. However, from a user’s point of view, application layer QoS, such as multimedia distortion, is more important than QoS at other layers. In addition, recent studies show that maximizing lower layer QoS does not necessarily benefit QoS at the multimedia application layer. The problem is more severe in cognitive radio (CR) networks, where CR-based secondary users would have a strictly lower QoS than radio services that enjoy guaranteed spectrum access. Therefore, it is necessary to take an integrated approach to jointly optimize application layer QoS for wireless multimedia communication networks.

1. Introduction

Recently, there has been significant growth in the use of wireless multimedia communication services. With the growing demand of resource-intensive multimedia applications in wireless networks, quality of service (QoS) provisioning is one of the major challenges in designing wireless multimedia communication networks.

Although several schemes have recently been proposed for QoS provisioning [1], [2], most previous work concentrates on lower layer QoS, such as blocking probability, throughput, and radio resource utilization, as design criteria. As a consequence, other QoS measures, such as distortion for multimedia applications, are mostly ignored in the literature. However, recent studies in cross-layer design show that schemes that are optimal from the lower layers' perspective (e.g., maximizing throughput) do not necessarily benefit QoS at the application layer for some multimedia applications, such as video [3]-[5]. Moreover, from a user's point of view, QoS at the application layer is more important than that at other layers. The problem is more severe in cognitive radio (CR) networks [6], where CR-based secondary users would have a strictly lower QoS than radio services that enjoy guaranteed spectrum access. Therefore, if application layer QoS is not carefully considered in wireless multimedia communication networks, the perceived reduction in application layer QoS may impede the success of wireless multimedia technologies with cognitive radios.

Multimedia applications such as video telephony, conferencing, and video surveillance are being targeted for wireless networks, including CR networks. Lossy video compression standards, such as MPEG-4 and H.264, exploit the spatial and temporal redundancy in video streams to reduce the bandwidth required to transmit video. Compressed video comprises intra- and inter-coded frames. The intra refreshing rate is an important application layer parameter [7]. Adaptively adjusting the intra refreshing rate for online video encoding applications can improve error resilience to the time-varying wireless channels available to secondary users in CR networks.

In this letter, we take an integrated design approach to jointly optimize application layer QoS for multimedia transmission over cognitive radio networks. Based on the sensed channel condition, secondary users can adapt the intra refreshing rate at the application layer, in addition to the parameters at other layers.

2. Rate-Distortion (R-D) Model for Multimedia Applications

Highly compressed video data is vulnerable to packet losses: a single bit error may cause severe distortion [8]. This vulnerability makes error resilience at the video encoder essential. Intra update, also called intra refreshing, of macroblocks (MBs) is one approach to video error resilience and protection [9]. An intra-coded MB does not need information from previous frames, which may already have been corrupted by channel errors; this makes intra-coded MBs an effective way to mitigate error propagation. With inter-coded MBs, in contrast, channel errors from previous frames may propagate to the current frame along the motion compensation path [10].

Given a source-coding bit rate Rs and an intra refreshing rate, we need a model to estimate the corresponding source distortion Ds. The authors in [7] provide a closed-form distortion model that accounts for the varying characteristics of the input video, the sophisticated data representation scheme of the coding algorithm, and the intra refreshing rate. Based on a statistical analysis of error propagation, error concealment, and channel decoding, a theoretical framework is developed to estimate the channel distortion Dc. Coupled with the R-D model for source coding and time-varying wireless channels, an adaptive mode selection is proposed for wireless video coding and transmission.

We will use the rate-distortion model described in [7] in our study. The R-D model facilitates adaptive intra-mode selection and joint source-channel rate control. The total end-to-end distortion comprises Ds, the quantization distortion introduced by the lossy video encoder to meet a target bit rate, and Dc, the distortion resulting from channel errors. For DCT-based video coding, intra coding of an MB or a frame usually requires more bits than inter coding, since inter coding removes the temporal redundancy between two neighboring frames; inter coding of MBs thus has much better R-D performance than intra mode. Decreasing the intra refreshing rate therefore decreases the source distortion for a target bit rate. However, inter coding relies on information in previous frames, and packet losses due to channel errors cause error propagation along the motion-compensation path until the next intra-coded MB is received, so increasing the intra refreshing rate decreases the channel distortion. Thus there is a tradeoff between source and channel distortion when selecting the intra refreshing rate. We aim to find the optimal intra refreshing rate that minimizes the total end-to-end distortion given the channel bandwidth and packet loss ratio.
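This tradeoff can be illustrated numerically (with assumed functional forms of our own, not the closed-form model of [7]): source distortion grows with the intra refreshing rate, channel distortion shrinks with it, and a simple grid search finds the minimizing rate.

```python
import numpy as np

# Toy model of the source/channel distortion tradeoff. The functional
# forms and constants are illustrative assumptions only.
def total_distortion(beta, rate_kbps=200.0, loss_prob=0.05):
    d_source = (100.0 / rate_kbps) * (1.0 + 2.0 * beta)   # intra MBs cost bits
    d_channel = loss_prob * 50.0 / (1.0 + 10.0 * beta)    # less error propagation
    return d_source + d_channel

# Grid search over the intra refreshing rate beta in [0, 1].
betas = np.linspace(0.0, 1.0, 101)
best = betas[np.argmin([total_distortion(b) for b in betas])]
print(f"optimal intra refreshing rate ~ {best:.2f}")
```

For these toy numbers the minimum sits at an interior point; raising the assumed loss probability pushes the optimal refreshing rate higher, matching the qualitative argument above.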

3. Multimedia Transmission over Cognitive Radio Networks

The system time is slotted. At the beginning of a slot, the transmitter of secondary users will select a set of channels to sense. Based on the sensing outcome, the transmitter will decide whether or not to access a channel. If the transmitter decides to access a channel, some application layer parameters will be selected and the video content will be transmitted. At the end of the slot, the receiver will acknowledge the transfer by sending the perceived channel gain back to the transmitter. We will assume a system for real-time multimedia applications where packets are discarded if a primary user is using the slot or if the channel is not accessed.

4. Solving the Application Layer QoS Provisioning Problem in Cognitive Radio Networks

In wireless multimedia networks with cognitive radios, we need to determine the optimal policy for channel sensing selection, sensor operating point, access decision, and intra refreshing rate that minimizes application layer distortion subject to a constraint on the probability of collision with primary users. Because of channel sensing and CSI errors, the system state cannot be directly observed, so we formulate the whole system as a partially observable Markov decision process (POMDP). Deriving a single POMDP formulation for all policies under the collision probability constraint would result in a constrained POMDP; however, constrained POMDPs require randomized policies to achieve optimality, which are often intractable. Therefore, we use the separation principle in [11] for the sensor operating point and the access decision. The spectrum sensor operating point is set such that the probability of missed detection of a busy channel used by primary users equals the required probability of collision.

At the beginning of the slot, the system transitions to a new state. Using a policy derived from the POMDP, a channel is selected for spectrum sensing. An access decision is then made based on the sensing observation. Using the belief of the channel state, an intra refreshing rate is selected. The receiver acknowledges the transfer by sending the quantized perceived channel gain back to the secondary transmitter. The immediate cost for the time slot is derived from the operations performed in the slot.
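The per-slot belief of the channel state can be sketched with a minimal two-state example (our simplified illustration, not the multi-channel POMDP described in this letter; the sensor error rates, transition probabilities, and access threshold are assumed values):

```python
# Minimal sketch: Bayesian belief update for a single two-state channel
# ("free"/"busy") observed through an imperfect spectrum sensor, followed
# by a simple threshold access decision. All numeric values are assumptions.

def update_belief(p_free, sensed_free, p_fa=0.1, p_md=0.05,
                  p_stay_free=0.9, p_stay_busy=0.8):
    """Return the updated P(channel free) after one slot's observation.

    p_fa: false alarm (free channel sensed busy); p_md: missed detection
    (busy channel sensed free)."""
    # Markov transition of the channel state at the slot boundary.
    prior = p_free * p_stay_free + (1.0 - p_free) * (1.0 - p_stay_busy)
    # Likelihood of the sensing observation under each true state.
    if sensed_free:
        like_free, like_busy = 1.0 - p_fa, p_md
    else:
        like_free, like_busy = p_fa, 1.0 - p_md
    num = prior * like_free
    return num / (num + (1.0 - prior) * like_busy)

belief = 0.5
for obs in (True, True, False):            # sensing outcomes over three slots
    belief = update_belief(belief, obs)
    access = belief > 0.9                  # assumed access threshold
    print(f"P(free) = {belief:.2f}, access = {access}")
```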

5. Conclusions

In wireless multimedia communication networks, application layer QoS, such as multimedia distortion, should be taken into consideration. In this letter, we took an integrated design approach to jointly optimize multimedia intra-refreshing rate, an application layer parameter, together with access strategy and spectrum sensing for multimedia transmission in a CR network.

References

[1] F. R. Yu, V. W. S. Wong, and V. C. M. Leung, “A new QoS provisioning method for adaptive multimedia in wireless networks,” IEEE Trans. Veh. Tech., vol. 57, pp. 1899–1909, May 2008.

[2] C.-F. Tsai, C.-J. Chang, F.-C. Ren, and C.-M. Yen, “Adaptive radio resource allocation for downlink OFDMA/SDMA systems with multimedia traffic,” IEEE Trans. Wireless Commun., vol. 7, pp. 1734–1743, May 2008.

[3] M. van der Schaar and S. Shankar, “Cross-layer wireless multimedia transmission: challenges, principles, and new paradigms,” IEEE Wireless Comm., vol. 12, pp. 50–58, Aug. 2005.

[4] S. Khan, Y. Peng, E. Steinbach, M. Sgroi, and W. Kellerer, “Application-driven cross-layer optimization for video streaming over wireless networks,” IEEE Comm. Mag., vol. 44, pp. 122–130, Jan. 2006.

[5] Z. Han, G.-M. Su, A. Kwasinski, M. Wu, and K. J. R. Liu, “Multiuser distortion management of layered video over resource limited downlink multicode-CDMA,” IEEE Trans. Wireless Commun., vol. 5, no. 11, pp. 3056–3067, 2006.

[6] S. Haykin, “Cognitive radio: Brain-empowered wireless communications,” IEEE J. Sel. Areas Commun., vol. 23, pp. 201–220, Feb. 2005.

[7] Z. He, J. Cai, and C. Chen, “Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding,” IEEE Trans. Circ. Sys. Video Tech., vol. 12, pp. 511–523, June 2002.

[8] K. Stuhlmüller, N. Färber, M. Link, and B. Girod, “Analysis of video transmission over lossy channels,” IEEE J. Sel. Areas Commun., vol. 18, pp. 1012–1032, Jun. 2000.

[9] J. Y. Liao and J. Villasenor, “Adaptive intra block update for robust transmission of H.263,” IEEE Trans. Circ. Sys. Video Tech., vol. 10, pp. 30–35, Feb. 2000.

[10] G. Cote, S. Shirani, and F. Kossentini, “Optimal mode selection and synchronization for robust video communications over error-prone networks,” IEEE J. Sel. Areas Commun., vol. 18, pp. 952–965, June 2000.

[11] Y. Chen, Q. Zhao, and A. Swami, “Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors,” IEEE Trans. Inform. Theory, vol. 54, May 2008.

[pic]

F. Richard Yu (S’00-M’04-SM’08) received the PhD degree in electrical engineering from the University of British Columbia (UBC) in 2003. From 2002 to 2004, he was with Ericsson (in Lund, Sweden), where he worked on the research and development of 3G cellular networks. From 2005 to 2006, he was with a start-up in California, USA, where he worked on research and development in the areas of advanced wireless communication technologies and new standards. He joined the Carleton School of Information Technology and the Department of Systems and Computer Engineering at Carleton University in 2007, where he is currently an Assistant Professor. He received the Leadership Opportunity Fund Award from the Canada Foundation for Innovation in 2009 and best paper awards at IEEE/IFIP TrustCom 2009 and the Int’l Conference on Networking 2005. His research interests include cross-layer design, security, and QoS provisioning in wireless networks.

He has served on the Technical Program Committee (TPC) of numerous conferences, as Co-Chair of ICUMT-CWCN 2009, and as TPC Co-Chair of IEEE IWCMC 2009, VTC 2008-Fall Track 4, and WiN-ITS 2007. He is a senior member of the IEEE.

MMTC COMMUNICATIONS & EVENTS

Call for Papers of Selected Journal Special Issues

Ad Hoc Networks (Elsevier)

Special Issue on Multimedia Ad Hoc and Sensor Networks

Guest Editors: Tommaso Melodia, Martin Reisslein

Paper Submission deadline: December 15, 2009

Target Publishing Issue: 3rd Quarter, 2010

CfP Weblink:

Multimedia System Journal

Special Issue on Wireless Multimedia Transmission Technology and Application

Guest Editors: Gabriel-Miro Muntean, Pascal Frossard, Haohong Wang, Yan Zhang, Liang Zhou

Paper Submission deadline: Jan. 15, 2010

Target Publishing Issue: 4th Quarter, 2010

Cfp Weblink:

Call for Papers of Selected Conferences

IEEE GLOBECOM 2010

Website:        

Dates:         December 6-10, 2010

Location:         Miami, USA

Submission Due:    March 15, 2010

Next Issue Partial Content Preview

Scaling P2P Content Delivery Systems Reliably by Exploiting Unreliable System Resources

Kannan Ramchandran et al., University of California, Berkeley, USA

Unified Reliable and Secure Media Transmission: Challenges and Approaches

Chang Wen Chen, University at Buffalo, State University of New York, USA

Peer-to-Peer Streaming of Scalable Coded Video

Mohammed Ghanbari, University of Essex, UK

Locality Aware P2P Delivery: The Way to Scale Internet Video

Jin Li, Microsoft Research, USA

Context-aware Multimedia Services in Ambient-enhanced Collaborative Environments

Min Chen, Seoul National University, Korea

Cooperative Multimedia Communications

Andres Kwasinski, Rochester Institute of Technology, USA

E-Letter Editorial Board

EDITOR-IN-CHIEF

Haohong Wang

TCL-Thomson Electronics

USA

EDITOR

Philippe Roose, IUT of Bayonne, France

Chonggang Wang, NEC Laboratories America, USA

Guan-Ming Su, Marvell Semiconductor, USA

Shiguo Lian, France Telecom R&D Beijing, China

Antonios Argyriou, Philips Research, Netherlands

MMTC Officers

CHAIR

Qian Zhang

Hong Kong University of Science and Technology

China

VICE CHAIRS

Wenjun Zeng, University of Missouri, Columbia, USA

Madjid Merabti, Liverpool John Moores University, UK

Zhu Li, Hong Kong Polytechnic University, China

Nelson Fonseca, Universidade Estadual de Campinas, Brazil

SECRETARY

Bin Wei

AT&T Labs Research

USA

-----------------------

[1] Many researchers may argue for a more stringent separation between the physical and link layers, with the caveat that such separation is necessary for adherence to the traditional OSI layer model, and hence for maintaining flexibility in designing and developing these two layers separately and independently.

-----------------------

[Figure: link-layer incremental-redundancy (Type-II hybrid ARQ) exchange. The transmitter sends the 1st packet (data plus Type-I parity); the receiver fails to decode it and reports this via feedback; the transmitter sends the 2nd packet with extra Type-II parity for the 1st packet; the receiver decodes the 2nd packet successfully and uses the extra Type-II parity to decode the 1st packet.]
