Structure of Speech - University of Delaware

Structure of Speech

• Physical acoustics
    - Time-domain representation
    - Frequency-domain representation
    - Sound shaping

• Speech acoustics
    - Source-Filter Theory
    - Speech source characteristics
    - Speech filter characteristics

• Acoustic Phonetics
    - Classification and features
    - Segmental structure
    - Coarticulation
    - Suprasegmentals

Physical acoustics pertains to what sound is and how it is described. Most simply, sound is the propagation of pressure variations through a medium such as air, water, or train rails. We cover the representation of sound in the time domain as the time course of pressure variations, and in the frequency domain as the amplitude and phase of one or more sinusoids. We draw a distinction between sound sources (things that generate sound by vibrating) and the way sound is altered by the acoustic properties of objects it interacts with, such as resonating tubes. Physical acoustics provides the background needed to understand how speech sounds are generated and controlled by the vocal tract and articulators. The discussion of speech acoustics is primarily about speech production. Finally, we will examine the relationship between the linguistic characteristics of speech and the structure of the acoustic speech signal. This will include discussion of segments or phonemes (corresponding roughly to vowels and consonants), sub-phonemic acoustic features of speech such as voicing, manner, and place that can be used to characterize phonemes, and properties that extend over several phonemes or even an entire utterance.


Physical Acoustics: Time-domain

• Simple waveforms
    - Amplitude, frequency, and phase
    - Physical versus perceptual characteristics

• Complex waveforms
    - Periodic and aperiodic waveforms

Simple waveforms are sinusoidal signals that are completely described by their amplitude, frequency, and phase. Amplitude describes the degree of fluctuation in pressure, frequency is the number of pressure fluctuations per unit time, and phase is the relative alignment of the pressure fluctuations with respect to a specific instant. Perceptually, amplitude corresponds to the loudness of a sound, and frequency to its pitch. Under most conditions, phase is not perceptually significant. The relationship between changes in loudness/pitch and changes in amplitude/frequency is roughly logarithmic. That is, at low amplitudes/frequencies, small changes result in large perceived changes in loudness or pitch, while at high amplitudes/frequencies much larger amplitude/frequency changes are needed to produce the same perceived change in loudness or pitch.

Complex waveforms contain multiple sinusoidal (simple) waveforms. Sinusoids are periodic: each cycle of a sinusoid is repeated exactly in the next cycle. Complex waveforms may also be periodic if they contain cyclic patterns that repeat exactly. Aperiodic waveforms are complex waveforms that do not contain cyclic patterns that repeat exactly. Many biological and other real-world signals, including the speech signal, may contain sequences of very similar patterns that are nearly but not exactly the same. Such sounds are similar to periodic sounds and are called quasiperiodic.
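The difference between periodic and quasiperiodic signals can be made concrete with a small sketch. The helper `max_cycle_mismatch`, the 8000 Hz sampling rate, the 100 Hz and 200 Hz components, and the 2% per-cycle amplitude drift are all illustrative assumptions rather than values from the slides. The idea is to compare each sample with the sample exactly one cycle later.

```python
import math

def max_cycle_mismatch(samples, period):
    """Largest absolute difference between each sample and the one a full period later."""
    return max(abs(samples[i + period] - samples[i])
               for i in range(len(samples) - period))

fs = 8000      # assumed sampling rate in samples per second
period = 80    # samples in one cycle of a 100 Hz pattern at this rate
n = 4 * period

# Periodic complex waveform: 100 Hz and 200 Hz sinusoids repeat together every 80 samples.
periodic = [math.sin(2 * math.pi * 100 * i / fs)
            + 0.5 * math.sin(2 * math.pi * 200 * i / fs)
            for i in range(n)]

# Quasiperiodic variant (hypothetical): same shape, but each cycle is 2% larger than the first.
quasi = [(1 + 0.02 * (i // period)) * periodic[i] for i in range(n)]

mismatch_periodic = max_cycle_mismatch(periodic, period)
mismatch_quasi = max_cycle_mismatch(quasi, period)
```

For the periodic signal the mismatch is zero up to floating-point rounding, while the drifting signal shows a small but clearly nonzero mismatch from one cycle to the next.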


Simple Waveform

For all integers i:

$$y_i = a \cdot \sin\left(\phi + \frac{2\pi i f}{f_s}\right)$$

where a = amplitude, \phi = phase, f = frequency, and f_s = sampling rate.

NOTES:
1) By definition, simple waveforms are sinusoids. Any sinusoid can be completely specified by three parameters:

    • Amplitude (a) - The extent of pressure variation.
    • Frequency (f) - The rate of pressure variation in terms of the number of complete cycles of the sinusoid per second (Hertz, abbreviated Hz).
    • Phase (φ) - An offset of the function with respect to a specific time.

2) Although we are technically dealing with analog signals, this formula is a discrete approximation to a sine function, with one additional parameter: the sampling rate (fs). For frequency specified in Hz, fs is the number of equally spaced instants per second at which the function is evaluated.
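As a minimal sketch of the sampled sinusoid described above, the function below evaluates a * sin(phase + 2*pi*i*f/fs) at successive integer values of i. The function name `simple_waveform` and the example values (a 200 Hz tone sampled at 8000 Hz) are assumptions chosen for illustration.

```python
import math

def simple_waveform(a, f, phi, fs, n_samples):
    """Return samples y_i = a * sin(phi + 2*pi*i*f/fs) for i = 0 .. n_samples - 1."""
    return [a * math.sin(phi + 2 * math.pi * i * f / fs)
            for i in range(n_samples)]

# Assumed example: unit amplitude, 200 Hz, zero phase, 8000 samples per second.
# At this rate one cycle of a 200 Hz tone spans 8000 / 200 = 40 samples.
tone = simple_waveform(a=1.0, f=200.0, phi=0.0, fs=8000.0, n_samples=80)
```

With these values the waveform starts at zero, reaches its positive peak a quarter cycle later (sample 10), and begins its second cycle at sample 40.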


Simple Waveforms

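[Figure: three sinusoid panels (top left, top right, bottom left) and, at bottom right, sound buttons labeled 175 Hz, 225 Hz, 275 Hz, 325 Hz, 375 Hz, 425 Hz, and 475 Hz]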

NOTES:
1) Here are several sinusoids illustrating variations in frequency, amplitude, and phase. These parameters are orthogonal (each can be varied independently of the others), although the diagrams do not vary them one at a time: the top right frame is both lower in amplitude and higher in frequency than the top left.

2) The three graphs are:

    • Top left - 200 Hz tone.
    • Top right - 600 Hz tone at lower amplitude.
    • Bottom left - two 200 Hz tones of the same amplitude differing in phase.

3) It is worth noting that the physical features frequency and amplitude correspond to the perceptual features of pitch and loudness respectively, but the relationships between physical and perceptual features are not linear. The relationship is roughly logarithmic: small physical changes at low frequencies/amplitudes are perceptually much larger than equal physical changes at high frequencies/amplitudes. This is illustrated in the sounds linked to the buttons on the bottom right. All steps are 50 Hz, but the pitch difference between 175 and 225 Hz is greater than the pitch difference between 425 and 475 Hz. [I don't know how to make this link work in a PDF file - HTB]
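The unequal pitch steps can be approximated numerically. The sketch below measures each 50 Hz step on a logarithmic scale in semitones (12 times the base-2 logarithm of the frequency ratio). Using semitones as a stand-in for perceived pitch distance is an assumption made here for illustration, since the notes say only that the relationship is roughly logarithmic. The function name `interval_in_semitones` is likewise hypothetical.

```python
import math

def interval_in_semitones(f_low, f_high):
    """Size of the step from f_low to f_high on a logarithmic (semitone) scale."""
    return 12 * math.log2(f_high / f_low)

# Both steps are 50 Hz wide, but they differ in size on a log-frequency scale.
low_step = interval_in_semitones(175.0, 225.0)
high_step = interval_in_semitones(425.0, 475.0)
```

Under this approximation the 175 to 225 Hz step is a little over 4 semitones, while the 425 to 475 Hz step is just under 2 semitones, which matches the direction of the difference described in note 3.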


Complex waveforms

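For all integers i, with components indexed by integer k from 0 to K:

$$y_i = \sum_{k=0}^{K} a_k \cdot \sin\left(\phi_k + \frac{2\pi i f_k}{f_s}\right)$$

where a_k = amplitude of the kth component, \phi_k = phase of the kth component, f_k = frequency of the kth component, and f_s = sampling rate.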

NOTES:
1) Complex waveforms can be described as the summation of a series of two or more simple waveforms.

2) In the discrete but unbounded case, K = ∞. Later, we will explore the consequences of using finite i and K.
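The summation can be sketched directly for a finite number of components and samples. The function name `complex_waveform`, the tuple layout (amplitude, frequency, phase), and the two example components (200 Hz and 600 Hz at an 8000 Hz sampling rate) are assumptions for illustration only.

```python
import math

def complex_waveform(components, fs, n_samples):
    """Sum sinusoids: y_i = sum over k of a_k * sin(phi_k + 2*pi*i*f_k/fs).

    components is a list of (a_k, f_k, phi_k) tuples.
    """
    return [
        sum(a * math.sin(phi + 2 * math.pi * i * f / fs)
            for a, f, phi in components)
        for i in range(n_samples)
    ]

# Assumed example: a 200 Hz tone plus a weaker 600 Hz tone, both with zero phase.
components = [(1.0, 200.0, 0.0), (0.5, 600.0, 0.0)]
wave = complex_waveform(components, fs=8000.0, n_samples=80)
```

Because 600 Hz is an integer multiple of 200 Hz, this particular sum is periodic: the pattern repeats every 40 samples, the length of one 200 Hz cycle at the assumed sampling rate.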
