Playback Synchronization using XBMC



Media Playback Synchronization in XBMC
Edoardo Daelli (edaelli@uccs.edu) and Michael J. Kopps (mkopps@uccs.edu)

Abstract—In many homes, there are many computers and many speaker systems. However, it is difficult or expensive to play a single piece of music on all of these systems at once. Streaming media servers do exist, but they lack the ability to control the stream from any of the play points. These streaming servers also require an always-on machine to stream the music, a system that in many cases does not exist in the average person's home.

Index Terms—Audio Synchronization, Xbox Media Center, XBMC, Home Theatre PC, Home Audio.

INTRODUCTION

Many homes contain multiple computer systems, and many families have large digital music and even multimedia collections. However, not many homes have been wired for a whole home audio system. Even in those that are, the wiring is often not suitable for high quality audio playback. A system in which users can easily synchronize their computer audio playback would allow families to implement a whole home audio system without the daunting and expensive task of installing cabling, speakers, and control systems.

There currently exist several off-the-shelf solutions that can play music over a standard wireless network. However, these solutions are relatively expensive and require specialized equipment. They are also closed source, which restricts how much users can customize them. Finally, these systems require a server computer to stream the music and control the streams, which in many homes is an inconvenience.

The authors propose a new solution that allows users to play media throughout their homes using the computer hardware they already own. Although the problem sounds simple, it has numerous facets that only emerge in a detailed analysis.
Timing control is the biggest issue to be overcome, especially across multiple hardware platforms running at different speeds and on different operating systems.

Design

High Level Design

Synchronizing the playback of media on multiple devices in the same local area seems a straightforward task. However, a closer inspection of the problem reveals a multitude of issues. To bound the scope of the project, the following constraint was adopted: this project aims only to synchronize the playback of devices that are all in the same locality. Since the audio output of two or more of these devices may be heard simultaneously, these devices must be near each other, and they can therefore be connected to the same local area network (LAN). Imposing this limitation replaces the higher latency and lower speed of internet communication with the low latency and high speed of a LAN. This helps both in clock synchronization and in the acquisition of the actual media.

Communication

In the proof of concept, the systems participating in the synchronized media playback are controlled using the HTTP API present in the Xbox Media Center (XBMC). This API is open source and allows a remote computer to retrieve the current playlist of an XBMC instance. XBMC will be modified to provide the playlist as a SMIL (Synchronized Multimedia Integration Language) document. SMIL was chosen because it is an extremely flexible, standard format capable of transmitting both the playlist and the play times with great precision. SMIL can also carry custom tags if functionality is ever needed that does not exist in the standard format.

HTTP is limited by its request/response communication style: a server can only answer requests and cannot push messages to a client on its own. Notifying other nodes when songs are changed, paused, advanced, etc., therefore cannot be done over HTTP alone. This functionality does not exist in the proof of concept version.
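As an illustration of the kind of SMIL playlist a node could serve, the following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The SMIL elements `smil`, `head`, `meta`, `body`, `seq`, and `audio` are standard; the function name `build_smil_playlist` and the `meta` names (`current-index`, `song-start-utc`, `elapsed-ms`) are hypothetical, since the exact document layout used in the modified XBMC is not specified here.

```python
import xml.etree.ElementTree as ET


def build_smil_playlist(songs, current_index, start_time_utc, elapsed_ms):
    """Build a SMIL document describing a playlist and its playback state.

    songs          -- network locations of the songs, in play order
    current_index  -- index into songs of the track currently playing
    start_time_utc -- NTP-synchronized wall-clock time (seconds) the track started
    elapsed_ms     -- milliseconds elapsed since the current track started
    The <meta> names are hypothetical; SMIL allows arbitrary name/content pairs.
    """
    smil = ET.Element("smil")
    head = ET.SubElement(smil, "head")
    ET.SubElement(head, "meta", name="current-index", content=str(current_index))
    ET.SubElement(head, "meta", name="song-start-utc", content=f"{start_time_utc:.3f}")
    ET.SubElement(head, "meta", name="elapsed-ms", content=str(elapsed_ms))
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")  # <seq> plays its children one after another
    for src in songs:
        ET.SubElement(seq, "audio", src=src)
    return ET.tostring(smil, encoding="unicode")
```

The `<seq>` container expresses the sequential order of the playlist directly, which is part of what makes SMIL a natural fit: timing and ordering are first-class concepts in the format rather than conventions layered on top of it.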
The XBMC project is abandoning the HTTP API in favor of a remote procedure call system called JSON-RPC. This system allows full duplex communication through a widely available interface. It will provide the means to directly control the playback of items with full remote control functionality, so that fast forward, pause, stop, next, previous, and other playback controls can be multicast to every participating node.

Clock Synchronization Using NTP

In order to synchronize the playback of media files across multiple pieces of hardware, all of these systems must share a common time source. To maximize extensibility, the design uses the Network Time Protocol (NTP). This protocol is relatively lightweight and can therefore be implemented on a wide variety of hardware platforms, and reference implementations exist for UNIX, Linux, and Windows. David Mills, the creator of NTP, has measured the error of a workstation synchronized to a primary server over a LAN to be on the order of 20 μs [1]. This accuracy can be achieved by polling the NTP server once every 15 seconds, which adds a trivial amount of traffic to the network. With clock differences of only microseconds, the participating computers can readily synchronize playback against this common clock.

What Is Synchronized?

In playing files simultaneously across multiple computers, the inevitable question arises: what is synchronized? Specifically, how much error can each computer introduce before a human can discern that the sources are not synchronized?

Audio/Video Synchronization

There has been research into synchronizing audio and video streams, often called the “lip sync” problem. The IBM European Networking Center [2] has conducted experiments to determine which offsets are noticeable.
They found that the audio and video signals can be offset by up to ±80 milliseconds without humans noticing; most subjects perceived samples in this range as in sync. Beyond ±160 milliseconds, subjects virtually always noticed a synchronization problem, and the skew was distracting to the overall experience. Between these two ranges, subjects had more difficulty identifying synchronization problems or were not bothered by them. Interestingly, participants found audio lagging behind the video less distracting than audio leading the video.

Audio/Audio Synchronization

There is very little research investigating the amount of delay that can be tolerated when listening to two audio sources. As a result, a small offshoot of this project focused on how much skew could be tolerated before the listener could notice interference.

Two songs were selected as reference music. Bach's Toccata & Fugue in D minor was selected for its melodic character and very subtle detail. Breaking Benjamin's Blow Me Away was selected for its loud, singular beats, which make a synchronization problem easy for a listener to distinguish. The initial experimental setup used Audacity to duplicate the stereo recordings, creating two left and two right tracks. Each stereo pair was then offset by varying amounts. With the Toccata & Fugue, interference became noticeable at between 100 and 125 milliseconds, and 75 milliseconds was judged a good target skew for this piece. When the track was changed to the Breaking Benjamin song, it became obvious that 75 milliseconds was too great a skew. However, this also revealed a deficiency in the experimental setup: interference could be heard even when only the left or only the right channel was played.
The interference was suspected to be caused by either the computer's audio card or the Audacity software: one of these was summing the two left sources and the two right sources before sending the signal to the speakers. As a result, the speakers were reproducing a signal that was already distorted. To remove this deficiency, each stereo recording was condensed into a single mono track. The mono recording was then duplicated and used as both the left and the right track of a new stereo recording, with one copy offset by the skew under test. The left and right speakers thus emulated two distinct computers in the distributed system, each reproducing a clean original signal, and the two signals were combined only in the air before being interpreted by the listener's brain.

These modifications to the experimental setup had no noticeable effect on the Toccata & Fugue. With the Breaking Benjamin song, however, a given skew became much more noticeable. Here, a skew of 75 milliseconds produced a very distinct lag that was very distracting to the listener. It had the qualities of a very persistent echo, and the overall quality was very low: details could not be heard, and the words in the vocals were muddy and harder to distinguish. The same results were heard at 50 milliseconds. At 30 milliseconds, the audio quality was significantly higher, but an echo could still be heard. Although the vocals were much clearer, the subtle details were still quite muddy, lowering the overall quality below what is expected of fully synchronized playback. At 20 milliseconds, there was again a noticeable improvement in the overall quality of the playback. At this level, it became difficult for someone relatively unfamiliar with the song to hear the skew, and from a distance of approximately 5 meters the skew was completely unnoticeable.
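The corrected test setup, a mono recording duplicated onto both stereo channels with one channel delayed by the skew under test, can be sketched in Python as follows. This is an illustration only: the experiment itself was done in Audacity, and the function name `skewed_stereo`, the 44.1 kHz default sample rate, and the representation of audio as a list of float samples are assumptions.

```python
def skewed_stereo(mono, skew_ms, sample_rate=44100):
    """Return (left, right) channels emulating two unsynchronized nodes.

    The left channel is the original mono signal; the right channel is the
    same signal delayed by skew_ms (preceded by silence). Each speaker thus
    reproduces a clean copy, and the two copies only combine in the air.
    The sample rate and list-of-floats representation are assumptions.
    """
    delay = round(skew_ms * sample_rate / 1000)  # skew expressed in samples
    left = list(mono) + [0.0] * delay            # pad the end so both channels match in length
    right = [0.0] * delay + list(mono)           # pad the start to delay this channel
    return left, right
```

At 44.1 kHz, the 20 ms skew that became hard to notice corresponds to a delay of 882 samples, and 10 ms corresponds to 441 samples.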
Further trials were run at skews of 10, 12.5, 15, and 17.5 milliseconds; with each smaller skew it became more difficult, even for an experienced listener, to hear the skew at a short distance from the speakers (approximately 0.75 meters).

This experiment has shown that the tolerance for playback skew differs between types of music. In slower and more subtle music such as classical, skew is much more difficult to identify, while more staccato music tends to expose any difference in playback. The 10 to 20 millisecond range was therefore identified as the target playback delta to maintain.

Design Implementation

Based on the results of the experiment described above, our goal was to write software that could synchronize audio between two separate computers (or nodes) to within 10 to 15 milliseconds. Our proposed solution changes XBMC in two ways:
1. Modify the source code so that any computer on a local network can query any other computer's playlist; and
2. Modify the source code so that a computer on a local network can request a playlist from another computer and synchronize to it.

The idea is to allow a node to “join” the current playlist of another computer on the network.
To join a playlist, a client needs the following information from the server:
- a sequential list of the songs to be played;
- the location of those songs on the network;
- the currently playing song; and
- how far into the currently playing song the server is.

Assuming all the machines on the network have synchronized clocks (using NTP, as described in previous sections), the client can set up its own playlist to match the server's and start playing the current song at the computed offset.

To satisfy the conditions above, the following implementation was created.

Server Node - Serving a computer's playlist:

Once a node in the local network is playing the contents of a playlist, new code added to XBMC allows it to communicate (or serve) the contents of that playlist to other nodes in the network. The playlist information is sent over an HTTP connection as a SMIL document consisting of:
- a header with information about the playlist;
- a list of songs to be played sequentially;
- how much time has elapsed since the currently playing song started; and
- the start time of the currently playing song.

Client Node - Requesting a node's playlist:

When a node in the network wants to join another computer's playlist, it sends an HTTP request with a predefined command string that instructs the server node to reply with the SMIL document described above.
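On the client side, the join step reduces to parsing the server's SMIL document and computing how far into the current song to start. The sketch below is a minimal illustration, assuming the SMIL carries hypothetical `meta` entries named `current-index` and `song-start-utc` and that both clocks are NTP-synchronized, so the offset is simply the client's current time minus the server-reported start time. The function `plan_join` and the sample document are illustrative, not XBMC's API.

```python
import time
import xml.etree.ElementTree as ET


def plan_join(smil_text, now=None):
    """Return (playlist, current_index, offset_seconds) for joining a server.

    smil_text -- SMIL document received from the server node
    now       -- local NTP-synchronized wall-clock time in seconds
                 (defaults to time.time())
    The meta names 'current-index' and 'song-start-utc' are hypothetical.
    """
    root = ET.fromstring(smil_text)
    meta = {m.get("name"): m.get("content") for m in root.iter("meta")}
    playlist = [a.get("src") for a in root.iter("audio")]  # play order from <seq>
    current = int(meta["current-index"])
    start = float(meta["song-start-utc"])
    if now is None:
        now = time.time()
    # With synchronized clocks, the server's position in the song is now - start.
    offset = max(0.0, now - start)
    return playlist, current, offset


# Hypothetical example of a document a server node might return.
EXAMPLE = """<smil><head>
<meta name="current-index" content="1"/>
<meta name="song-start-utc" content="1000.000"/>
</head><body><seq>
<audio src="smb://nas/music/a.mp3"/>
<audio src="smb://nas/music/b.mp3"/>
</seq></body></smil>"""
```

For example, `plan_join(EXAMPLE, now=1042.5)` yields the two-song playlist, index 1, and an offset of 42.5 seconds. Computing the offset from the start time rather than from the elapsed-time field avoids having to correct for the time the HTTP reply spent in transit, since the start time does not go stale.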
Once the client receives the SMIL document, it can replicate the playlist, calculate the offset into the current song, and “join” the server node.

Implementation Issues

While implementing the solution above, a few issues were encountered:
- XBMC's music player lacked support for starting a song at a given offset.
- Dependencies on low-level media hardware sometimes caused synchronization problems.
- Differences in processor speed caused differences in software execution speed.
- The NTP server being used was inaccurate.

Alternative Implementation

Design

To overcome these issues, an alternative solution was developed as a proof of concept for our design. In this design, a node no longer joins a playlist. Instead, it copies the playlist from a server locally and then sends a message instructing the server node to skip to the next song at a specified time.

Since the start time of the next song is now synchronized, there is no longer any need to start a song at an offset: every song is always started from the beginning.

To work around the hardware dependencies, two identical laptop computers were used as the server and client nodes. The NTP inaccuracies remained, and we mitigated them by querying the NTP server more often. Figure 1 illustrates the proof of concept algorithm:

[Diagram: message sequence between Node 1 and Node 2 from Time 0 to Time “x”: Request playlist; Reply with SMIL playlist; Request “skip” at time “x”; Reply with new SMIL playlist; at Time “x”, Nodes are Synchronized!]
Figure 1. Alternative implementation flow.

Pseudo code

The following pseudo code descriptions guided the implementation of the alternative design:

Client Node:
  Send command to server to get SMIL document.
  If SMIL document received successfully:
    Parse SMIL document.
    Create playlist to match server.
    Send command to server to skip to the next song in 3 s.
    Wait until the synchronization time.
    Start playlist from the second song.

Server Node:
  When starting playback of a song, record the time playback started.
  Listen for client requests for playlists.
  Once a request is made:
    Generate and send SMIL playlist (using recorded playback start time).
    Wait for response.
  Listen for client requests to skip to the next song.
  Once a request for skipping is made:
    Read the requested time to skip to the next song.
    Wait for the requested time.
    Skip to the next song.

Results

The alternative implementation of the design proved to be a good starting point. Code written for the XBMC server node was able to communicate through the HTTP API, and the nodes were able to share their playlists as SMIL documents. Code written for the XBMC client node was able to use the information from the SMIL documents to create and start playlists at predetermined times.

For our tests, both nodes had their clocks synchronized with NTP and were running on exactly the same hardware and operating system. This setup showed good results: in most tests there was no noticeable delay between the two nodes when playlists were synchronized. Even though the test machines were identical, care had to be taken to make sure they were both running at the same speed and that no power saving options were enabled. A few tests were executed where one of the machines was in “power save” mode.
This caused the systems to operate at different clock frequencies and yielded poor results, with a noticeable delay between the audio played by the two nodes.

Conclusion

Synchronizing audio playback across different nodes in a local area network is a complex problem, but one that can be solved using cheap, readily available components already present in most households today. NTP can synchronize the nodes' clocks within an acceptable range on a local network, and SMIL proved to be a powerful method of communicating playlists and timing information between computers.

When synchronizing audio streams, one must ensure that the streams are played no more than 15 ms apart so that sound quality is not compromised by interference between the nodes playing the media. This requires much more control over playback than was available during the proof of concept development.

This project can be extended in several ways. First, as previously described, the communication should use JSON-RPC, which allows bidirectional messaging; this will be important for controlling the devices that have joined the playback group. Further, the project can be extended to other media players, such as Windows Media Player, MediaMonkey, and VLC. This will allow a more diverse range of devices to participate and, ultimately, more people to use the software.
References

[1] D. L. Mills, “NTP precision time synchronization,” Network Time Synchronization Research Project, July 5, 2008 (accessed March 1, 2010).
[2] R. Steinmetz, “Human perception of jitter and media synchronization,” IEEE J. Select. Areas Commun., vol. 14, pp. 61-72.