MP3 STEGANOGRAPHY - Terminally Incoherent



MP3 STEGANOGRAPHY

Applying Steganography to Music Captioning

Lukasz Grzegorz Maciak

Micheal Alexis Ponniah

Renu Sharma

TABLE OF CONTENTS

MP3 STEGANOGRAPHY

TABLE OF CONTENTS

PROJECT GOAL

WHY MP3?

THE ANATOMY OF AN MP3

MP3 Encoding

MP3 File Structure

MP3 Headers

STEGANOGRAPHY ON MP3 FILES

Non Steganographic Methods

ID3v2 Comment Frame

Lyrics3 Standard

Non Steganographic Methods Summary

Encode Time Steganography

Low Bit Encoding

Phase Encoding

Spread Spectrum

Echo Data Hiding

Flaws of Encode Time MP3 Steganography

Existing Implementations

Post Encoding MP3 Steganography

Unused Header Bit Stuffing

Padding Byte Stuffing

DESIGNING STEGANOGRAPHIC SOFTWARE FOR MP3 FILES

Specifications

Steganographic Module – Implementation Notes

StegIO Header Algorithm

Steganographic Module - Padding Byte Stuffing

Steganographic Module Usage

Steganographic Module – Implementation Issues

IMPLEMENTING GRAPHICAL USER INTERFACE FRONT END

JAVA and MP3:

Java Sound API

Java Media Framework (JMF)

JAVA BASED MP3 PLAYER

MP3 Player Implementation:

MP3 Player Implementation details:

MP3 Player Screenshots

FUTURE WORK

REFERENCES

FIGURES

EQUATIONS

PROJECT GOAL

The goal of this project is to embed textual information into popular media using steganography. It can be assumed that the text is relatively short when compared to the media file. A good example of this is the relationship between a recorded song and its lyrics. The audio file containing the recording is much larger than the song lyrics stored as a plain ASCII file. Therefore it is probably safe to assume that the smaller file could be steganographically embedded into the larger one without impacting the quality. A similar argument could be made about video data and closed captioning information.

This project concentrates on the song/lyrics relationship in order to create a steganographically driven karaoke machine. The song lyrics will be seamlessly embedded into an audio file, and then displayed on the screen when the file is played. This research will include the implementation of a steganographic algorithm for encoding data inside audio files, as well as a technique to dynamically extract that data and play it back.

WHY MP3?

The MP3 format is a very good target for this research because it is currently one of the most popular music encodings. Potential users of karaoke software are most likely to pick the mp3 format above any other audio encoding on the market. Because the end goal of this project is to create a usable piece of software, catering to the tastes and needs of the end users seems to be a good idea.

Furthermore, mp3 is an open standard, which means that it is well documented and accessible. Thus uncovering the inner workings of this format does not pose any legal threat to the researchers. On the other hand, choosing a proprietary, closed format such as Windows Media Audio (WMA) could put the researchers in legal jeopardy.

Doubtlessly, any steganographic research will rely heavily on exploiting certain properties of the data format chosen as the information carrier. This project is no different. However, the actual research process, data examination and implementation steps could be replicated for other media to create analogous solutions.

THE ANATOMY OF AN MP3

The implementation proposed in this paper relies heavily on the specific properties of the mp3 data format. Therefore it is only logical to start the discussion by reviewing the structure of these media files.

The mp3 format is designed to store audio data, which is different from the visual information stored in images. Therefore image steganography techniques may not always work with audio data. Furthermore, unlike some image data formats, mp3 files are compressed and encoded in a very storage-conscious way. Thus they are not the best host files for steganographic data.

MP3 Encoding

MP3 is a lossy data format which aims to preserve the sound quality while minimizing storage space. The encoding process takes into account the properties of the human auditory system. For example, humans cannot hear frequencies below 20Hz or above 20kHz. Furthermore, the human ear is often unable to distinguish between two or more notes with specific frequencies when they are played together. Thus an mp3 encoder can safely discard any sounds with frequencies outside the audible range, and needs to store only a single copy out of a group of similar sounding notes. This is of course not a trivial process. Mp3 encoders employ complex psychoacoustic modeling to perceptually optimize the data.[1]

The encoding process is very complex and involves both perceptual optimization, as well as more conventional data compression methods. Figure 1 shows a conceptual model of an mp3 algorithm:

[pic]

Figure 1

Explaining the actual encoding procedure is out of the scope of this paper. However, there are several tasks performed by the encoder that may be of some interest to a steganography researcher.

The mp3 encoder breaks the audio data into small fragments called frames. Each frame represents a fraction of a second. The size of the frame depends on the audio resolution or bit rate. The most convenient way (algorithmically) to do this is to assume a constant bit rate throughout the recording, thus forcing the same size onto all frames. However, music is not structured this way. Very often, highly dynamic sequences including vocals and many instruments playing at the same time are interwoven with very simple melodic tracks. Therefore using a constant bit rate (CBR) is not always economical. The MP3 specification allows the data to be stored in a variable bit rate format (VBR), which means that the audio frames are not all the same size. [1]

Each frame is perceptually analyzed using the psychoacoustic model. The frequencies that are not audible are discarded, or allocated minimal number of bits. The exact inner workings of this procedure are complex and beyond the scope of this paper.

Once perceptual optimization is done, the data is compressed using Huffman coding. This is a lossless algorithm, so the audio information is preserved while the storage space decreases. [1] This is an important fact for a steganographer.

Because of the nature of the compression algorithm, Huffman coded data cannot be easily modified. Huffman coded data is stored using variable length bit strings that are matched against a lookup table. The most frequently used characters are encoded with the shortest possible strings, while the rare ones are coded with longer strings. Thus it is possible that certain values have two or three bit codes. [4] Inverting a single bit therefore can completely change a value of the coded data.

Furthermore, the data cannot be easily divided into bytes, words, etc. The least significant bit of a given byte may actually be the most significant bit of a Huffman coded character. Therefore least significant bit substitution cannot be easily done on Huffman coded data.
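The effect of a single inverted bit on variable length codes can be illustrated with a toy example. The sketch below uses a made-up four symbol prefix code (an assumption chosen for illustration, not one of the Huffman tables used by mp3) and shows that flipping one bit changes not only one symbol but also the number of symbols decoded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PrefixCodeDemo {
    // Toy prefix code, assumed for illustration only (not an mp3 Huffman table).
    private static final Map<String, Character> TABLE = new LinkedHashMap<>();
    static {
        TABLE.put("0", 'a');
        TABLE.put("10", 'b');
        TABLE.put("110", 'c');
        TABLE.put("111", 'd');
    }

    /** Decodes a string of '0'/'1' characters by matching codes against the table. */
    static String decode(String bits) {
        StringBuilder out = new StringBuilder();
        StringBuilder code = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            code.append(bit);
            Character symbol = TABLE.get(code.toString());
            if (symbol != null) {
                out.append(symbol);
                code.setLength(0); // start matching the next code
            }
        }
        return out.toString();
    }

    /** Returns a copy of the bit string with the bit at the given index inverted. */
    static String flipBit(String bits, int index) {
        char[] chars = bits.toCharArray();
        chars[index] = (chars[index] == '0') ? '1' : '0';
        return new String(chars);
    }
}
```

With this table the bit string 0101100 decodes to "abca". Inverting only its first bit gives 1101100, which decodes to "cca": the code boundaries shift, so every following symbol is affected.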

Compressed audio data is then reassembled. Each frame is pre-pended with a header which stores information about the bit rate, sample rate and other meta-data [1].

MP3 File Structure

MP3 files are therefore composed of short data frames, each preceded by a header. An MP3 file can also contain meta-data tags. There are two types of these tags. ID3v1 is the older format, which is appended to the end of the file. This tag is always 128 bytes long and contains seven fields which specify the artist name, song title, album, genre, etc. Because of its static size and lack of flexibility, this tag type is slowly being replaced by the more advanced ID3v2 standard. [6]

The newer, more flexible ID3v2 tags are pre-pended to the file. Their structure is almost as flexible as the structure of the mp3 file itself. ID3v2 tags are composed of their own frames which store various bits of information. This might be the standard character strings such as artist name and song title or more advanced information about the way the file was encoded. ID3v2 tags can be used to provide useful hints to the decoder. As an example, equalization curves are often stored in ID3v2 tags. There is no set size limit on ID3v2 tags so in theory they can grow indefinitely. [5]

MP3 files in circulation can include either tag type. There is no clear preference, so a steganographer has to be prepared to deal with information tags present either before or after the audio data stream. However, it is logical to assume that ID3v1 tags will become increasingly rare in the future. Figure 2 below shows a conceptual model of an mp3 file:

[pic]

Figure 2
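Because tags may sit on either side of the audio stream, a tool that edits frames first has to locate the audio region. The following is a minimal sketch of that step, assuming the whole file is held in a byte array. It relies on two properties of the tag formats: an ID3v1 tag occupies the last 128 bytes and begins with the characters TAG, and an ID3v2 tag begins with a 10 byte header (the characters ID3, two version bytes, a flags byte and a four byte size in which only the low 7 bits of each byte are used). The class and method names are hypothetical, and the optional ID3v2.4 footer is ignored.

```java
final class AudioRegion {
    static final int ID3V1_SIZE = 128;
    static final int ID3V2_HEADER_SIZE = 10;

    /** Offset of the first byte after a leading ID3v2 tag, or 0 if none is present. */
    static int audioStart(byte[] file) {
        if (file.length < ID3V2_HEADER_SIZE
                || file[0] != 'I' || file[1] != 'D' || file[2] != '3') {
            return 0;
        }
        // The tag size is a "synchsafe" integer: four bytes, 7 significant bits each.
        int size = ((file[6] & 0x7F) << 21) | ((file[7] & 0x7F) << 14)
                 | ((file[8] & 0x7F) << 7) | (file[9] & 0x7F);
        return ID3V2_HEADER_SIZE + size;
    }

    /** Offset one past the last audio byte; excludes a trailing ID3v1 tag if present. */
    static int audioEnd(byte[] file) {
        int start = file.length - ID3V1_SIZE;
        boolean hasTag = start >= 0
                && file[start] == 'T' && file[start + 1] == 'A' && file[start + 2] == 'G';
        return hasTag ? start : file.length;
    }
}
```

Frame scanning would then be restricted to the bytes between audioStart and audioEnd, so that tag contents are never mistaken for frame headers.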

Due to their extensibility, the ID3v2 tags would be an interesting target for embedding information; however, they are not guaranteed to be present in every mp3 file. Thus the best approach is to embed the data into the data frames. Before discussing steganographic methodology, however, it is best to take a closer look at the data frame.

MP3 Headers

As mentioned before, MP3 files can be encoded with a variable bit rate (VBR), which makes the frames vary in size. Since the frame sizes are not obvious, it is necessary to be able to identify where a frame starts and where it ends. This is not as difficult as it would first appear. Each frame is preceded by a frame header. All headers are very similar in structure and content. In fact, they will often be identical. Thus, identifying an mp3 header is just a matter of pattern matching.

Each header starts with a block of set bits called the Sync block (see Figure 3). The original MPEG specification defines the sync as 12 bits; Table 1 follows the common convention, introduced with the unofficial MPEG 2.5 extension, of treating the first 11 bits as the sync and the following 2 bits as the version field. The Sync is a string of ones which is supposed to help the decoder home in on a header. Therefore, to find a frame one simply needs to detect 11 or 12 consecutive bits set to 1, depending on the convention used.

[pic]

Figure 3

However, this pattern is not necessarily unique to a header. In fact, it can easily be found in any longer data block. There are a few other checks that can be performed to identify a 4 byte data block as a header:

• The Layer field cannot be 00

• The Bit-rate field cannot be 0000 or 1111

• The Frequency field cannot be 11

A 4 byte block which starts with the Sync and does not violate the conditions listed above is probably a header. [7]
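The checks listed above can be sketched in a few lines of Java. The example assumes the header is passed as four unsigned byte values in an int array and uses the 11 bit sync layout of Table 1. The class and method names are hypothetical; this is not the Header class from the project, only an illustration of the same tests.

```java
final class HeaderCheck {
    /** header holds the four header bytes as unsigned values (0..255). */
    static boolean isValid(int[] header) {
        if (header == null || header.length != 4) return false;
        // First 11 bits set: all of byte 0 plus the top three bits of byte 1.
        boolean sync = header[0] == 0xFF && (header[1] & 0xE0) == 0xE0;
        int layer = (header[1] >> 1) & 0x3;     // field C
        int bitrate = (header[2] >> 4) & 0xF;   // field E
        int frequency = (header[2] >> 2) & 0x3; // field F
        return sync && layer != 0 && bitrate != 0 && bitrate != 0xF && frequency != 0x3;
    }

    /** Field G: true when the header announces one extra padding byte. */
    static boolean hasPadding(int[] header) {
        return (header[2] & 0x02) != 0;
    }
}
```

For instance, the bytes FF FB 90 64 pass all of the checks, while FF FB F0 64 is rejected because its bit rate field is 1111. Passing the checks only makes a block a probable header, since audio data can contain the same pattern by chance.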

Figure 4 shows an alternative view of the mp3 header in which the fields are marked with characters. Table 1 provides brief explanations of each field.

[pic]

Figure 4

Table 1

|Position |Purpose |Length (bits) |
|A |Frame sync |11 |
|B |MPEG audio version (MPEG-1, 2, etc.) |2 |
|C |MPEG layer (Layer I, II, III, etc.) |2 |
|D |Protection (if on, then checksum follows header) |1 |
|E |Bitrate index (lookup table used to specify bitrate for this MPEG version and layer) |4 |
|F |Sampling rate frequency (44.1kHz, etc., determined by lookup table) |2 |
|G |Padding bit (on or off, compensates for unfilled frames) |1 |
|H |Private bit (on or off, allows for application-specific triggers) |1 |
|I |Channel mode (stereo, joint stereo, dual channel, single channel) |2 |
|J |Mode extension (used only with joint stereo, to conjoin channel data) |2 |
|K |Copyright (on or off) |1 |
|L |Original (off if copy of original, on if original) |1 |
|M |Emphasis (respects emphasis bit in the original recording; now largely obsolete) |2 |

Frame size is a function of bit-rate and sampling frequency. The size of a given frame in bytes can be obtained using the following equation:

Equation 1

[pic]
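For MPEG-1 Layer III, this relationship is commonly written as frame size = floor(144 * bit rate / sampling rate) + padding, with the bit rate in bits per second, the sampling rate in Hz and the padding equal to 0 or 1. The sketch below evaluates that form in Java. The class name and the lookup tables, which cover only MPEG-1 Layer III, are assumptions made for illustration and are not taken from the project code.

```java
final class FrameSize {
    // Bit rates in kbps for MPEG-1 Layer III, indexed by header field E (1..14).
    private static final int[] BITRATE_KBPS =
            {0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320};
    // Sampling rates in Hz for MPEG-1, indexed by header field F (0..2).
    private static final int[] SAMPLE_RATE_HZ = {44100, 48000, 32000};

    /** Frame length in bytes, including the 4 byte header. */
    static int frameBytes(int bitrateIndex, int sampleRateIndex, boolean padded) {
        if (bitrateIndex < 1 || bitrateIndex > 14
                || sampleRateIndex < 0 || sampleRateIndex > 2) {
            throw new IllegalArgumentException("reserved or free-format index");
        }
        int bitsPerSecond = BITRATE_KBPS[bitrateIndex] * 1000;
        // Integer division performs the floor in the formula.
        return 144 * bitsPerSecond / SAMPLE_RATE_HZ[sampleRateIndex] + (padded ? 1 : 0);
    }
}
```

At 128 kbps and 44.1 kHz this gives 417 bytes for an unpadded frame and 418 bytes for a padded one, which is why such streams alternate between the two sizes.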

Most of the fields which are not involved in the size calculation are inessential. In fact, some of them are very rarely used at all. However, field G, the Padding Bit, is very significant. Sometimes a frame needs to be padded with an extra byte so that the stream matches its nominal bit rate. This means that some frames contain 1 byte of useless data. This can be easily exploited by a steganographer.

STEGANOGRAPHY ON MP3 FILES

Having reviewed the mp3 format and its properties in detail, it is now possible to discuss the actual steganographic approaches to storing data in audio files. The available methods can be divided into three categories:

• Non-Steganographic methods

• Encode time mp3 steganography

• Post encoding mp3 steganography

Non Steganographic Methods

Embedding lyrics into audio files is not a new idea. There are several methods to embed textual information in mp3 files. This can be done by either using the ID3v2 comment tags or the special extension of ID3 called Lyrics3.

ID3v2 Comment Frame

The ID3v2 tags have limited support for embedding song lyrics. As per section 4.11 of the specification document, the comment frame “is intended for any kind of full text information that does not fit in any other frame.” [8]

This means that lyrics can be easily embedded into and extracted from the comment frame. This method is not very efficient because there are no guidelines on how to format the lyrics, how to split them among text frames, and so on. Furthermore, not all mp3 files use ID3v2 tags, so this is not a universal approach.

Lyrics3 Standard

To overcome the shortcomings of ID3v2 Comment Frame the ID3 standard was extended with the Lyrics3 specification. This document describes how to add an extra frame between the audio block and ID3 tag which would contain song lyrics accompanied by meta-data. [9]

[pic]

Figure 5

Figure 5 shows a possible way to embed a Lyrics3 frame into an mp3 file. As shown in the picture, the lyrics block always starts with the literal string LYRICSBEGIN and ends with LYRICSEND. The Lyrics3 spec also includes fields specifying the author of the lyrics (i.e. the person who wrote them down), a link to an album cover and timing information describing when each line of text should be displayed on the screen.[9]
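Since the block is delimited by two literal strings, extracting it amounts to a text search. The following is a minimal sketch that assumes the whole file is available as a byte array and that only the LYRICSBEGIN and LYRICSEND markers described above are used. The class name is hypothetical, and the additional fields mentioned in the specification are not parsed.

```java
import java.nio.charset.StandardCharsets;

final class Lyrics3Reader {
    private static final String BEGIN = "LYRICSBEGIN";
    private static final String END = "LYRICSEND";

    /** Returns the text between the two markers, or null if no block is found. */
    static String extract(byte[] file) {
        // ISO-8859-1 maps every byte to exactly one character, so offsets are preserved.
        String text = new String(file, StandardCharsets.ISO_8859_1);
        int begin = text.lastIndexOf(BEGIN);
        int end = text.lastIndexOf(END);
        if (begin < 0 || end < begin + BEGIN.length()) {
            return null;
        }
        return text.substring(begin + BEGIN.length(), end);
    }
}
```

Searching from the end of the file reflects the fact that the block sits after the audio data, which reduces the chance of matching a marker that happens to occur inside the audio stream.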

There are many existing plug-ins for popular mp3 players that work with this standard.

Non Steganographic Methods Summary

Unfortunately, neither of the methods described above can be used in conjunction with this project. While both are valid and successfully implemented approaches, they have nothing to do with steganography. This project's main focus is the development of a steganographic method of embedding lyrics into music files.

Encode Time Steganography

As the name suggests, encode time mp3 steganography is conducted during the mp3 encoding process. These methods require the researcher to either implement their own encoder, or modify an existing one to introduce steganographic data into the audio stream after the psychoacoustic optimization, but before Huffman compression.

Encode time mp3 steganography methods include:

• Low Bit Encoding

• Phase Coding

• Spread Spectrum Coding

• Echo Data Hiding

Low Bit Encoding

Low bit encoding, or Least Significant Bit (LSB) encoding, replaces the least significant bit of each unit of the host file with a bit of the steganographic data. This method assumes that the alteration introduces only a minute difference into the host file which would be hard to detect. This is not necessarily true for mp3 files. Since an mp3 file is composed of bit field headers (where flipping a single bit can, for example, cause the decoder to misinterpret the frame size) and Huffman coded data, it is nearly impossible to do real LSB steganography.[2]

Thus the only possible approach is to conduct LSB at encode time, on psycho-acoustically compressed data before it reaches the Huffman compression stage. Since Huffman compression is lossless, the embedded information will be preserved. The embedded information can then be recovered by decompressing the Huffman code of each frame and extracting the least significant bits.

While effective, this method can introduce significant levels of noise into the music data. This is unacceptable, as this project aims to introduce as little noise as possible.
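The basic LSB operation can be sketched independently of any encoder. The example below works on a plain array of integer values standing in for the data available before the Huffman stage. That array, the class name and the choice of one message bit per value (most significant bit first) are assumptions made for illustration, not part of any real mp3 encoder.

```java
final class LsbDemo {
    /** Returns a copy of the host values with the message bits written into their LSBs. */
    static int[] embed(int[] host, byte[] message) {
        if (message.length * 8 > host.length) {
            throw new IllegalArgumentException("host too small for message");
        }
        int[] out = host.clone();
        for (int i = 0; i < message.length * 8; i++) {
            int bit = (message[i / 8] >> (7 - i % 8)) & 1; // most significant bit first
            out[i] = (out[i] & ~1) | bit;                  // overwrite only the lowest bit
        }
        return out;
    }

    /** Reads length bytes back from the least significant bits. */
    static byte[] extract(int[] stego, int length) {
        byte[] message = new byte[length];
        for (int i = 0; i < length * 8; i++) {
            message[i / 8] |= (stego[i] & 1) << (7 - i % 8);
        }
        return message;
    }
}
```

Each value changes by at most 1, which is the source of the added noise: the change is small per value, but it is applied to every value that carries a message bit.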

Phase Encoding

Phase encoding is a much more complex method than the simplistic LSB encoding. Phase encoding “works by substituting the phase of an initial audio segment with a reference phase that represents the data. The phase of subsequent segments is adjusted in order to preserve the relative phase between segments” [10]

The IBM researchers claimed to achieve a capacity that “varied from 8 bps to 32 bps, depending on the sound context” [10]

Similarly to LSB, this method requires manipulation of data before Huffman encoding. It is supposed to introduce much less noise into the recording.

Spread Spectrum

Spread spectrum encoding aims to spread out the encoded data across the available frequencies. The text is modulated using pseudorandom noise sequence which then becomes the key. Then the modulated data is attenuated and added to the original file as additive random noise. [10]

Due to the wide spread of the noise, it is possible that it will be almost inaudible to a human ear. However due to the complexity of implementing this method, it was beyond the scope of this project.

Echo Data Hiding

Text can be embedded in audio data by introducing an echo to the original signal. The data is then hidden by varying three parameters of the echo: initial amplitude, decay rate, and offset. It is possible to time the echo in such a way that a human ear will not be able to distinguish between the two signals, and registers the echo as some added resonance. However, doing so is not trivial. [10]

There is no set rule for how long the delay between the signal and the echo should be for the echo to blend with the original signal. According to IBM research, the best delay for the coded echo is 1/1000 of a second for most listeners. [10]

Flaws of Encode Time MP3 Steganography

Encode time steganography is possibly the best method to hide data in audio files. It provides the best diffusion and confusion rates and is optimal for data hiding. However, it has some important flaws.

To implement one of the encode time methods a researcher needs to either create a new mp3 encoder or modify an existing one. The decoder also needs to be modified to extract the embedded data at decode time. This makes the encode time methods difficult and time consuming to implement.

In addition, encode time methods are inextricably tied to a single implementation of an mp3 encoder/decoder. This creates portability and extensibility issues, and prevents researchers from using proprietary or non-extensible encoding algorithms.

Furthermore to encode textual data one has to have access to a non-mp3 source audio file in the form of a wav file. One should not assume that users of the lyric embedding software will have music stored in a source format. Most of the time music collectors collect mp3’s, not wav’s. Requiring users to have a collection of wav files readily available defeats the purpose of this project. Ideally, a user should be able to obtain a random mp3 file and embed lyrics inside of it without the need for a source file.

Existing Implementations

The MP3Stego software created by Fabien Petitcolas is a great example of encode time steganography. His algorithm manipulates the data at a very early stage of encoding, before the audio data is optimized. He included built in checks which prevent the introduced noise from exceeding the psychoacoustic thresholds optimized by the mp3 encoder. His algorithm resembles phase encoding methodology. [11]

This implementation is however limited, as it requires the user to have a 16 bit mono wav source file encoded with 44100Hz pulse code modulation. It does not allow the user to encode stereophonic wav files. [2]

It is difficult to expect anyone to have source files which fit this specification. Thus Petitcolas's MP3Stego software is not a good model for this project.

Post Encoding MP3 Steganography

There is very little research done in the area of post encoding steganography. This is probably due to the fact that post encoding methods cannot achieve very good diffusion of steganographic data. The mp3 file format is not very flexible and, as mentioned above, writing over random bytes can irreversibly corrupt audio data.

However, for the purpose of this project this seems to be the right approach. Since lyrics do not need to be hidden, diffusion and confusion concerns are not an issue. Post encode time methods usually exploit the quirks of mp3 file structure, and hide data in blocks that do not belong to actual audio data sequences. This makes audio corruption virtually impossible.

There are at least two possible approaches to post encoding steganography:

• Unused Header Bit Stuffing

• Padding Byte Stuffing

Unused Header Bit Stuffing

The mp3 frame headers are composed of the various fields explained in Table 1. Some of these fields are very rarely used. Good examples are the Private bit, the Copyright bit, the Original bit, and the Emphasis field. Most mp3 players completely ignore these fields. It is also safe to assume that their value is not essential to the integrity of the audio frame. Changing one of these fields may at worst cause the player to misinterpret copyright information about the given frame.

Therefore each audio frame contains at least 4 bits that can be used for embedding data. Koso et al. proposed using this method to implement digital watermarking, but embedding lyrics would be a logical extension of this approach. At 4 bits per frame, one is able to achieve a decent capacity. [12]
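The bit manipulation involved is simple. The sketch below is a conservative variant that uses only the three single bit flags (Private, Copyright and Original) and leaves the Emphasis field untouched, on the assumption that a decoder which honours emphasis might alter playback. The bit positions follow the field order of Table 1, the header is passed as four unsigned byte values, and the class and method names are hypothetical.

```java
final class HeaderBitStuffing {
    // Byte 2, bit 0: Private (H). Byte 3, bit 3: Copyright (K). Byte 3, bit 2: Original (L).

    /** Stores the low three bits of value in the Private, Copyright and Original flags. */
    static void writeBits(int[] header, int value) {
        header[2] = (header[2] & 0xFE) | ((value >> 2) & 0x1);
        header[3] = (header[3] & 0xF3) | ((value & 0x3) << 2);
    }

    /** Reads the three flags back as a value in the range 0..7. */
    static int readBits(int[] header) {
        return ((header[2] & 0x1) << 2) | ((header[3] >> 2) & 0x3);
    }
}
```

All other header fields, including those used in the frame size calculation, are masked out of the write and therefore keep their original values.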

Padding Byte Stuffing

Some mp3 audio frames are padded with an “empty” byte so that the stream matches its nominal bit rate. On average mp3 files tend to have hundreds of frames which need to be padded. Since the padding bytes do not carry any audio information they are a good target for data stuffing.

Padding byte stuffing is an attractive method because it is relatively straightforward to implement and has good average storage capacity. It is possible to encode information at 1 byte per frame as long as padding bytes are available. Since lyrics files are usually small, there should usually be enough padding bytes to contain the whole text.

Surprisingly, there is very little research dealing with padding byte stuffing. In fact there is very little documentation available regarding the nature of padding bytes, their location in the frame and similar information. It seems to be a new avenue for steganographic research.

Thus this project proposes to use padding byte stuffing to embed song lyrics into mp3 files.

DESIGNING STEGANOGRAPHIC SOFTWARE FOR MP3 FILES

Specifications

After considering all available options, the following set of specifications and requirements was agreed upon by the whole research team:

1. The project must be platform independent and portable.

2. The implementation should not be tied to a single mp3 decoder.

3. As per 1 and 2, Post Encoding Steganography should be used.

4. The software should use Padding Byte Stuffing as the primary steganographic method.

5. A graphical mp3 player interface with a text box for displaying lyrics should be part of the implementation.

6. The lyrics should be synchronized with the song.

7. There should be a clear separation between the steganographic layer and the display layer.

Java was chosen as the primary language for implementing the software due to the platform independence requirement. This was also the language the research team was familiar with.

As per requirement 7, the project was divided into two distinct parts: the steganographic layer and the presentation layer.

Steganographic Module – Implementation Notes

The steganographic layer was implemented as a stand alone application. The mp3stego.jar file can be downloaded from

The application performs two functions: embedding the text into mp3 files and extracting embedded text. To conserve space the text is compressed using the lzw algorithm.

The lzw algorithm is a dictionary based compression. “The dictionary starts off with 256 entries, one for each possible character (single byte string). Every time a string not already in the dictionary is seen, a longer string consisting of that string appended with the single character following it in the text, is stored in the dictionary. The output consists of integer indices into the dictionary. These initially are 9 bits each, and as the dictionary grows, can increase to up to 16 bits.” [14]

[pic]

Figure 6

Figure 6 shows an example application of the lzw algorithm. In the example above the savings are 12.5%. Longer text will usually yield more compression than a short one. In fact, documents of significant length can even be compressed by a factor of 2 or more. [15]

For song lyrics (which are usually relatively short – text documents below 5kB) an average compression rate of 20 to 30% was achieved.
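The dictionary building step described above can be illustrated with a short sketch. This is not the implementation by Cheok Yan Cheng used in the project. It is a minimal textbook version that returns the dictionary indices as a list of integers, omits the packing of those indices into 9 to 16 bit fields, and assumes that every input character has a value below 256. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LzwSketch {
    /** Compresses text into a list of dictionary indices. */
    static List<Integer> compress(String text) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(String.valueOf((char) i), i);
        List<Integer> out = new ArrayList<>();
        String current = "";
        for (char c : text.toCharArray()) {
            String extended = current + c;
            if (dict.containsKey(extended)) {
                current = extended;               // keep growing the match
            } else {
                out.add(dict.get(current));       // emit the longest known string
                dict.put(extended, dict.size());  // learn the new, longer string
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) out.add(dict.get(current));
        return out;
    }

    /** Rebuilds the text by growing the same dictionary on the decoding side. */
    static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) return "";
        List<String> dict = new ArrayList<>();
        for (int i = 0; i < 256; i++) dict.add(String.valueOf((char) i));
        String previous = dict.get(codes.get(0));
        StringBuilder out = new StringBuilder(previous);
        for (int code : codes.subList(1, codes.size())) {
            // A code equal to dict.size() refers to the entry that is about to be added.
            String entry = code < dict.size() ? dict.get(code) : previous + previous.charAt(0);
            out.append(entry);
            dict.add(previous + entry.charAt(0));
            previous = entry;
        }
        return out.toString();
    }
}
```

For the 24 character string TOBEORNOTTOBEORTOBEORNOT this sketch emits 16 indices, because repeated fragments such as TO, BE and OR are replaced by single dictionary entries. Repetitive text such as a song chorus benefits from the same effect.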

The java based implementation of lzw used in this project was developed by Cheok Yan Cheng. [12]

Figure 7 shows the UML diagram of a proposed implementation. Below is a short overview of the most important classes.

mpstego.mp3.Header – can be initialized with an integer array of size 4 which represents a header read from the file. The isValid() method can be used to test if the array represents a valid mp3 header. If it does, this class allows one to calculate the size of the frame described by this header, and availability of a padding byte.

mpstego.file.text.press – implementation of the lzw compression algorithm developed by Cheok Yan Cheng. It will generate a compressed lzw file that can be used for embedding in an audio file.

[pic]

Figure 7

mpstego.file.text.lzw.Decompress - implementation of the lzw decompression algorithm developed by Cheok Yan Cheng. It reads an lzw file and generates a plain text ASCII file.

mpstego.file.mp3.StegIO – is the input/output class. The mp3 file is edited in place using the java.io.RandomAccessFile class. StegIO loops through the file, reading 4 byte segments, and tries to match a header using the Header class. Once a header is identified, StegIO seeks to the padding byte location and either reads or writes a byte. StegIO also obtains the lyrics from a text file using java.io.FileInputStream.

StegIO has a dual role of reading and writing data into the file. Initially, the plan was to split these roles among two different classes. However, the process of identifying headers and searching for padding bytes is quite complicated. The same algorithm must be used in both cases. It would be a bad design choice to replicate the code in two different classes. Thus reading and writing was integrated in a single class.

Figure 8 and Figure 9 show potential flow of control diagram. The Main class calls either the Compress or Decompress to do the lzw compression, and then calls the StegIO class to do actual embedding. This is a very simple, and yet powerful design.

[pic]

Figure 8

[pic]

Figure 9

StegIO Header Algorithm

The StegIO class uses a fairly straightforward (but not trivial) algorithm to find padding bytes. The procedure used to accomplish this can be expressed in the pseudo code in Figure 10.

|length = length of the message in bytes |
|message = length (as a 4 byte integer) + message |
|put the message on a byte queue |
| |
|while(queue is not empty && not at end of file) |
|{ |
|    Header = read 4 bytes from the file |
| |
|    if(Header is_valid) |
|    { |
|        if(Header contains padding byte) |
|        { |
|            seek to the last byte of the frame |
|            pop a byte from the queue |
|            write the popped byte into the file |
|        } |
|        seek to the start of the next frame |
|    } |
|    else |
|    { |
|        seek back 3 bytes (advance one byte and retry) |
|    } |
|} |

Figure 10

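To read the data back, the process is reversed, as shown in the pseudo code in Figure 11.

|length = 4 (enough to read the length prefix) |
|counter = 0 |
|while(counter < length && not at end of file) |
|{ |
|    Header = read 4 bytes from the file |
|    if(Header is_valid) |
|    { |
|        if(Header contains padding byte) |
|        { |
|            seek to the last byte of the frame |
|            read a byte from the file |
|            push the read byte onto queue |
|            counter = counter + 1 |
|            if(counter == 4) |
|            { |
|                length = 4 + to_integer(pop 4 bytes from queue) |
|            } |
|        } |
|        seek to the start of the next frame |
|    } |
|    else |
|    { |
|        seek back 3 bytes (advance one byte and retry) |
|    } |
|} |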
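The two loops can be sketched in Java over an in-memory copy of the file. This is not the StegIO class from the project. It is a simplified illustration that handles only MPEG-1 Layer III headers, uses the commonly quoted frame length formula floor(144 * bit rate / sampling rate) + padding, and adopts the same assumption as the project, namely that the padding byte is the last byte of a padded frame. As the Implementation Issues section reports, that assumption did not hold reliably in practice, so the sketch should only be run on synthetic data or on redundant copies of a file. All names are hypothetical.

```java
import java.util.Arrays;

final class PaddingStuffingSketch {
    private static final int[] BITRATE_KBPS =
            {0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320};
    private static final int[] SAMPLE_RATE_HZ = {44100, 48000, 32000};

    /** Frame length in bytes for an MPEG-1 Layer III header at pos, or -1 if none. */
    static int frameLength(byte[] file, int pos) {
        if (pos + 4 > file.length) return -1;
        int b0 = file[pos] & 0xFF, b1 = file[pos + 1] & 0xFF, b2 = file[pos + 2] & 0xFF;
        if (b0 != 0xFF || (b1 & 0xFE) != 0xFA) return -1; // sync, MPEG-1, Layer III
        int bitrate = b2 >> 4, rate = (b2 >> 2) & 0x3;
        if (bitrate == 0 || bitrate == 15 || rate == 3) return -1;
        int padding = (b2 >> 1) & 1;
        return 144 * BITRATE_KBPS[bitrate] * 1000 / SAMPLE_RATE_HZ[rate] + padding;
    }

    static boolean isPadded(byte[] file, int pos) {
        return (file[pos + 2] & 0x02) != 0;
    }

    /** Overwrites the assumed padding bytes in place; returns how many bytes fit. */
    static int embed(byte[] file, byte[] payload) {
        int pos = 0, written = 0;
        while (written < payload.length && pos < file.length) {
            int length = frameLength(file, pos);
            if (length < 0 || pos + length > file.length) { pos++; continue; } // resync
            if (isPadded(file, pos)) file[pos + length - 1] = payload[written++];
            pos += length; // jump to the start of the next frame
        }
        return written;
    }

    /** Reads back up to count bytes from the same positions. */
    static byte[] extract(byte[] file, int count) {
        byte[] out = new byte[count];
        int pos = 0, read = 0;
        while (read < count && pos < file.length) {
            int length = frameLength(file, pos);
            if (length < 0 || pos + length > file.length) { pos++; continue; } // resync
            if (isPadded(file, pos)) out[read++] = file[pos + length - 1];
            pos += length;
        }
        return Arrays.copyOf(out, read);
    }
}
```

Jumping by the computed frame length after every valid header keeps the scan aligned with frame boundaries, so that a sync pattern occurring inside audio data is less likely to be mistaken for a header.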

Steganographic Module - Padding Byte Stuffing

Using the padding byte stuffing method to embed the lyrics into an mp3 file poses a significant problem: how to store compressed text in a way that is easy to extract later? For example, how does the extraction algorithm know that it has read the last byte of lyrics? One could assume that padding bytes always hold the value of 0, but that does not always have to be true.

Another method would be to put a stop marker as the last byte of the compressed text. However, since the text is stored as a compressed binary file, it is not possible to predict which byte values it will contain. Since lzw tokens can be from 9 to 16 bits long, some data bytes can contain unpredictable combinations.

Thus another method of marking the end of lyrics is needed. For the purpose of this project, the length of the lyrics in bytes is prepended to the textual message. It is stored as an integer, so the first 4 padding bytes always contain the information about the length of the lyrics data.

This makes it possible to encode up to 2^32 bytes of data in a given mp3 file (2^31 - 1 if the length is read as a signed Java int). Fortunately no mp3 file can possibly contain that much empty space. Thus this limit is perfectly acceptable.
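The length prefix scheme can be sketched with java.nio.ByteBuffer, which writes an int as four bytes in big-endian order by default. The class and method names are hypothetical, and the byte order is an assumption, since the project description does not state which order was used.

```java
import java.nio.ByteBuffer;

final class LengthFraming {
    /** Returns the payload preceded by its length as a 4 byte big-endian integer. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Reads the 4 byte length and returns exactly that many payload bytes. */
    static byte[] unframe(byte[] framed) {
        ByteBuffer buffer = ByteBuffer.wrap(framed);
        int length = buffer.getInt();
        if (length < 0 || length > buffer.remaining()) {
            throw new IllegalArgumentException("length prefix does not match the data");
        }
        byte[] payload = new byte[length];
        buffer.get(payload);
        return payload;
    }
}
```

Checking the prefix against the number of bytes actually available guards against the case where the file never contained embedded data and the first four bytes read are arbitrary.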

Steganographic Module Usage

To embed text into an mp3 file one needs to specify the mp3 file and the lyrics file on the command line:

java -jar mp3stego.jar mp3_file.mp3 text_file.txt

Where mp3_file.mp3 is the path to the host mp3 file and text_file.txt is a plain text ASCII file containing the lyrics. This will create a compressed lzw file with the same name as the text file in the same directory. The user does not need to do anything with this file, but should be aware that it will be created.

To extract lyrics one needs to specify just the mp3 file name on the command line:

java -jar mp3stego.jar mp3_file.mp3

This will create both a compressed lzw file and plain text ASCII file in the same directory as the mp3 file.

Steganographic Module – Implementation Issues

The main problem with the Padding Byte Stuffing method is the limited amount of information available about how mp3 frames are padded. This project assumed that the padding byte is the very last byte of the audio block of a frame, but this assumption is not necessarily correct.

Unfortunately the algorithm developed for this implementation failed to correctly identify padding bytes. This was either because this assumption was incorrect, or due to programming errors. The text data was stuffed into what appeared to be the last byte of the audio frame, but these bytes were empty (i.e. 0 or 0xFF) only half of the time.

Thus the written data usually introduced a significant amount of noise into the recording. Because the mp3 file is always edited “in place”, it is highly recommended to use this algorithm only on redundant copies of mp3 files due to the high risk of data corruption.

Because of time constraints, it was not possible to identify a single source of the data corruption. It is likely that this is due to the limited knowledge about the nature of padding bytes. Padding byte stuffing might not be a valid steganographic method.

It is also possible that the code contains a trivial error that is yet to be discovered.

IMPLEMENTING GRAPHICAL USER INTERFACE FRONT END

Having a functional steganographic back end was only part of the project. The next part was implementing a graphical user interface to play mp3 files and display lyrics on screen. Once again, this is not a trivial task. Since the language of choice for this project was Java, it is worth reviewing the support for playing mp3 files in this language.

JAVA and MP3:

Java provides a low-level API for effecting and controlling the input and output of sound media, including both audio and Musical Instrument Digital Interface (MIDI) data, called the Java Sound API. More advanced features can be found in external libraries created by the Java community. One example of such a library set is the Java Media Framework (JMF).

Java Sound API

The Java Sound API provides the lowest level of sound support on the Java platform. It provides application programs with a great amount of control over sound operations, and it is extensible. For example, the Java Sound API supplies mechanisms for installing, accessing, and manipulating system resources such as audio mixers, MIDI synthesizers, other audio or MIDI devices, file readers and writers, and sound format converters. The Java Sound API does not include sophisticated sound editors or graphical tools, but it provides capabilities upon which such programs can be built. It emphasizes low-level control beyond that commonly expected by the end user.

Java Media Framework (JMF)

There are other Java platform APIs that have sound-related elements. The Java Media Framework (JMF) is a higher-level API that is currently available as a Standard Extension to the Java platform. JMF specifies a unified architecture, messaging protocol, and programming interface for capturing and playing back time-based media. JMF provides a simpler solution for basic media-player application programs, and it enables synchronization between different media types, such as audio and video. On the other hand, programs that focus on sound can benefit from the Java Sound API, especially if they require more advanced features, such as the ability to carefully control buffered audio playback or directly manipulate a MIDI synthesizer. Other Java APIs with sound aspects include Java 3D and APIs for telephony and speech. An implementation of any of these APIs might use an implementation of the Java Sound API internally, but is not required to do so. [15]

JAVA BASED MP3 PLAYER

Due to limited support in the standard API, a Java-based mp3 player has to employ some external libraries. JMF is a mature project and is employed in many existing Java applications, so it seemed a logical choice.

MP3 Player Implementation:

JMF is a very popular framework, and there is an abundance of existing JMF-based players. Writing another player from scratch would be an interesting exercise, but it is out of scope for this project (which is focused on steganography). Therefore, instead of reinventing the wheel, the choice was made to use an open source Java player called “Java MP3 player”. []

It is a purely Swing-based MP3 player.

The goals in using this player were:

1. To read text from a file.

2. To display the lyrics and change them as the song plays.

MP3 Player Implementation details:

The javax.media.Player interface provides the method getMediaTime(), which returns a Time object. This method is used to track the time elapsed from the start of the song to any given moment during playback.

MP3 Player Screenshots

Figure 10 shows the main window of the mp3 player. It contains all the important controls (i.e., play, stop, rewind). The long empty rectangle is a label which will display the lyrics.

[pic]

Figure 11

Figure 11 shows a screen allowing the user to create a playlist. This generates an m3u file which can then be played as shown in Figure 12.

[pic]

Figure 12

[pic]

Figure 13

[pic]

Figure 14

Figure 14 shows the actual player in action. The lyrics are displayed in the viewing area, along with the progress bar.

The player uses the steganographic module to extract the lyrics from an mp3 file prior to playing the song.

SYNCHRONIZING MUSIC AND TEXT

An integral part of the project was to develop a way to synchronize the music with the text displayed on the screen to create a karaoke-like effect. To achieve this, the embedded lyrics had to include some sort of metadata specifying when each line should be displayed on the screen.

Lyrics Meta Data Format

The following standard for recording lyrics was adopted:

• Each line starts with an integer specifying the time offset from the beginning of the song (in milliseconds)

• The time offset is followed by a pipe character (“|”)

• The pipe character is followed by the text to be displayed on the screen

• The line is ended with a single newline character (“\n”)

• The lyrics file is terminated with a single hash mark (“#”) on a new line
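The format rules above can be sketched as a small parser. This is only an illustration under stated assumptions, not the project's actual code: the class names LyricLine and LyricsParser are hypothetical, and the choice to skip lines without a pipe character is an assumption the format description does not cover.

```java
import java.util.ArrayList;
import java.util.List;

/** One timed lyric line: offset from the start of the song plus the text to show. */
final class LyricLine {
    final long offsetMillis;
    final String text;

    LyricLine(long offsetMillis, String text) {
        this.offsetMillis = offsetMillis;
        this.text = text;
    }
}

/** Hypothetical parser for the "offset|text" lyrics format. */
final class LyricsParser {
    static List<LyricLine> parse(String lyrics) {
        List<LyricLine> lines = new ArrayList<>();
        for (String raw : lyrics.split("\n")) {
            String line = raw.trim();
            if (line.equals("#")) {
                break; // a lone hash mark terminates the lyrics
            }
            int pipe = line.indexOf('|');
            if (pipe <= 0) {
                continue; // assumption: ignore blank lines and lines with no offset
            }
            // Throws NumberFormatException if the offset is not an integer.
            long offset = Long.parseLong(line.substring(0, pipe).trim());
            String text = line.substring(pipe + 1).trim();
            lines.add(new LyricLine(offset, text));
        }
        return lines;
    }
}
```

Calling LyricsParser.parse on the extracted lyrics string yields the lines in file order; anything after the terminating hash mark is ignored.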

Figure 15 shows how a sample lyrics file would be formatted.

    19100|May the good Lord be with you
    21250|Down every road you roam
    28300|And may sunshine and happiness
    31150|surround you when you're far from home

    …

    207790|Forever Young
    217240|For, forever young
    227050|Forever Young
    #

Figure 15
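Given offsets like those in Figure 15, the line to display at any moment is the one with the latest offset that does not exceed the elapsed playback time. The following is a minimal sketch of that lookup, assuming the elapsed time is already available in milliseconds (for example, converted from the value returned by getMediaTime()). The class name LyricsTimeline and its methods are hypothetical and are not part of the player described here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that picks the lyric line for a given playback position. */
final class LyricsTimeline {
    // Lines keyed by their start offset in milliseconds, kept in sorted order.
    private final TreeMap<Long, String> lines = new TreeMap<>();

    void add(long offsetMillis, String text) {
        lines.put(offsetMillis, text);
    }

    /**
     * Returns the line with the latest offset not after elapsedMillis,
     * or an empty string if playback has not reached the first line yet.
     */
    String lineAt(long elapsedMillis) {
        Map.Entry<Long, String> entry = lines.floorEntry(elapsedMillis);
        return entry == null ? "" : entry.getValue();
    }
}
```

A player could poll lineAt periodically with the current playback position and update the lyrics label whenever the returned text changes.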

Synchronization Method

The time offsets were obtained manually: the time reading for each line was obtained by timing the song with a stopwatch and recording the result. This method is far from perfect, but due to time constraints it was the only markup method available to this research team.

Because of the limits of human reflexes and response time, this method introduces a small delay. The lyrics are displayed a fraction of a second after the vocalist starts singing, because this is how long it takes for the human brain to process the information and record the time. Thus the time offsets often have to be tweaked by hand to achieve full synchronization.
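If the reaction delay were roughly constant across a song, a uniform shift could serve as a first pass before fine-tuning by hand. That is an assumption: the text only states that the offsets were tweaked manually. The sketch below shows such a shift; the class name OffsetCorrection and the 300 ms delay used in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical first-pass correction for stopwatch-timed lyric offsets. */
final class OffsetCorrection {
    /** Moves every offset earlier by reactionDelayMillis, never going below zero. */
    static List<Long> shiftEarlier(List<Long> offsetsMillis, long reactionDelayMillis) {
        List<Long> corrected = new ArrayList<>(offsetsMillis.size());
        for (long offset : offsetsMillis) {
            corrected.add(Math.max(0L, offset - reactionDelayMillis));
        }
        return corrected;
    }
}
```

For example, shiftEarlier applied to the offsets 19100 and 21250 with a delay of 300 would produce 18800 and 20950; individual lines could then still be adjusted by hand.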

FUTURE WORK

Because of the problems with finding Padding Bytes, it would be very interesting to implement the steganographic module with another method. The Unused Header Bit approach seems to be the most appropriate one. It could be easily modified to work within the existing framework of the stego module.

It would also be worthwhile to slightly redesign the class structure to achieve better encapsulation. The StegIO class should be made abstract, and the actual reading and writing tasks should be moved into its subclasses. This is purely a cosmetic change, but it would greatly improve the readability and maintainability of the code base.

The manual synchronization method should also be improved or replaced with an automated one. It would be a very interesting project to see whether time offset information could be generated by analyzing an mp3 file, given a lyrics file.

REFERENCES

1. Hacker, Scot, “MP3, The Definitive Guide”, 1st Edition, March 2000, O'Reilly Publishing.

2. Noto, Mark, “MP3Stego: Hiding Text in MP3 Files”, September 2001, SANS Institute.

3. Koichi Takagi, Shigeyuki Sakazawa, Yasuhiro Takishima, “Light Weight MP3 Watermarking Method for Mobile Terminals”, KDDI R&D Labs, MM'05 November 6-11, 2005 Singapore, ACM

4. “Huffman Code”, Wikipedia,

5. M. Nilsson, “ID3 tag version 2.4.0 - Main Structure”, November 2000,

6. Predrag Supurovic, “MPEG Audio Frame Header”, 1998 DataVoyage,

7. “The Private Life of MP3 Frames”,

8. M. Nilsson, “ID3 tag version 2.3.0”, February 1999,

9. Strnad Peter, Gingold Peter, “Lyrics3 Tag v2.00”, Jun 1998,

10. Bender W., Gruhl D., Morimoto N., Lu A., “Techniques for data hiding”, 1996, IBM,

11. Petitcolas Fabien A. P., “mp3stego”, 1997–2005,

12. Koso A., Turi A., and Obimbo C., “Embedding Digital Signatures in MP3s”, from proceedings 477 Internet and Multimedia Systems, and Applications, 2005

13. Cheng Cheok Yan, “Introduction On Text Compression Using Lempel, Ziv, Welch (LZW) method”,

14. “LZW”, Wikipedia,

15. “Information-and-Entropy”, MIT, Spring 2003,

FIGURES

1. Fraunhofer-Gesellschaft,

2. Image © 2005, Lukasz Grzegorz Maciak.

3. “The Private Life of MP3 Frames”,

4. Hacker, Scot, “MP3, The Definitive Guide”, 1st Edition, March 2000, O'Reilly Publishing.

5. Strnad Peter, Gingold Peter, “Lyrics3 Tag v2.00”, Jun 1998,

6. “Information-and-Entropy”, MIT, Spring 2003,

7. Image © 2005, Lukasz Grzegorz Maciak

8. Image © 2005, Lukasz Grzegorz Maciak

9. Image © 2005, Lukasz Grzegorz Maciak

10. Code Snippet © 2005, Lukasz Grzegorz Maciak

11. Code Snippet © 2005, Lukasz Grzegorz Maciak

12. Image © 2005, Micheal Alexis Ponniah

13. Image © 2005, Micheal Alexis Ponniah

14. Image © 2005, Micheal Alexis Ponniah

EQUATIONS

1. Hacker, Scot, “MP3, The Definitive Guide”, 1st Edition, March 2000, O'Reilly Publishing.
