Formant - Wikipedia
Sundberg models the vocal tract as a tube resonator closed at one end (the glottis) and open at the other (the lips), suggesting that the three prominent formants seen in vowel sounds correspond to the lowest resonances of such a tube. Formants are vocal tract resonances, and they should not be confused with the harmonics of the voice, which are set by the vibration of the vocal folds. Both FFT spectra (a general method, not specific to the voice) and linear predictive coding (LPC) are used to estimate formants from recorded sound, and LPC estimates of the resonances can be compared with direct measurements of the acoustic impedance of the tract.
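As a rough check on the closed-tube picture, a uniform tube closed at one end and open at the other resonates at odd multiples of a quarter wavelength, f_n = (2n - 1)c/(4L). The sketch below assumes a tract length of 0.175 m and a speed of sound of 350 m/s; both are illustrative values, not measurements, and a real vocal tract is far from uniform.

```python
# Assumed, illustrative values (not measurements):
SPEED_OF_SOUND = 350.0  # m/s, roughly the speed of sound in warm, humid air
TRACT_LENGTH = 0.175    # m, a plausible length for an adult vocal tract


def closed_tube_resonances(length, c=SPEED_OF_SOUND, count=3):
    """Resonance frequencies (Hz) of a uniform tube closed at one end and open at the other.

    Such a tube supports odd multiples of a quarter wavelength: f_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * c / (4.0 * length) for n in range(1, count + 1)]


if __name__ == "__main__":
    for n, f in enumerate(closed_tube_resonances(TRACT_LENGTH), start=1):
        print(f"resonance {n}: {f:.0f} Hz")
```

With these assumed values the first three resonances come out at 500, 1500 and 2500 Hz: one below 1 kHz, one between 1 and 2 kHz, and one between 2 and 3 kHz.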
The vertical axis on vowel planes such as these roughly corresponds to the jaw position (high or low) or the size of the lip opening, and the horizontal axis corresponds to the position of the tongue constriction. Ghonim et al. give vowel planes for two accents of English. These data were gathered in a large, automated survey in which respondents from the US and Australia identified synthesised words of the form h[vowel]d. You can map your own accent in this way on the survey's web site.
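Vowel planes of this kind are usually plots of the first two formant frequencies, F1 and F2, so identifying a vowel can be caricatured as finding the nearest reference point in that plane. The sketch below does this with three hypothetical reference vowels; the centroid values are rough illustrations chosen for this example, not data from the survey, and real listeners use far more than two numbers.

```python
import math

# Hypothetical, illustrative (F1, F2) centroids in Hz; NOT data from the survey.
VOWEL_CENTROIDS = {
    "i": (270.0, 2290.0),  # close front vowel
    "a": (730.0, 1090.0),  # open vowel
    "u": (300.0, 870.0),   # close back vowel
}


def classify_vowel(f1, f2, centroids=VOWEL_CENTROIDS):
    """Return the vowel whose centroid is nearest to (f1, f2).

    Distances are measured on log-frequency axes, so a given frequency ratio counts
    the same at low and high frequencies (an assumption made for this sketch).
    """
    def distance(centroid):
        c1, c2 = centroid
        return math.hypot(math.log(f1 / c1), math.log(f2 / c2))

    return min(centroids, key=lambda vowel: distance(centroids[vowel]))


if __name__ == "__main__":
    print(classify_vowel(280.0, 2200.0))  # near the close front reference
    print(classify_vowel(700.0, 1150.0))  # near the open reference
```

A high F1 corresponds to an open jaw, and F2 tracks the front-back position of the tongue constriction, which is why the two axes of a vowel plane map onto articulation.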
Vowels and some other phonemes may be sustained over time. Many consonants, by contrast, are brief, so for them the variation with time of the associated frequency bands is important, as is the broad-band sound associated with the opening or closing of the tract (Smits et al.). Like vowels, the liquids (r and l) and the nasal consonants (n, m, gn) are voiced and have a characteristic set of spectral peaks.
For these, the tongue provides a narrower constriction than for vowels. In speech, vowels are in a sense less important than consonants. In singing, on the other hand, vowels are more important, because the vowel is sustained to produce a note. The geometry of the vocal and nasal tracts determines how they filter the sound, and the acoustical properties resulting from this geometry are also thought to affect the operation of the vocal folds. We discuss these complications below.
Contrasting the voice with wind instruments
If we neglect the influence of the articulators on the larynx, we have the Source-Filter model.
Superficially, it may seem obvious to a singer that the larynx and the articulators are independent: one can sing many different vowels on the same pitch, and many pitches on the same vowel. To someone who plays brass instruments, however, an analogous argument would seem a very odd approximation. The range of fundamental frequencies of the trombone lies within the range of its bore resonances, and the player's lips vibrate at a frequency close to one of those resonances. You are probably thinking that the approximation that the resonator doesn't affect the source is most questionable at high pitches, when the fundamental frequency of the voice enters the range of the vocal tract resonances.
The Source-Filter model
In the Source-Filter model (Fant), interactions between sound waves in the mouth and the source of sound are neglected. Although oversimplified, this model explains many important characteristics of voice production. In a whispered voice, turbulent flow between the vocal folds produces a broad-band sound containing very many frequencies. For normal voiced speech, however, the motion of the vocal folds is a periodic vibration that modulates the flow of air, which produces a harmonic spectrum.
More details and graphs are given below. A schematic of the source-filter model (Wolfe et al.) illustrates these ideas. In it, the periodic spectrum corresponds to normal speech and singing (see "What is a sound spectrum?"): the vertical lines indicate the harmonics, and the lowest of these is the fundamental frequency at which the vocal folds vibrate. The continuous spectrum corresponds to whispering. The vertical axes are logarithmic.
Either signal is input to the vocal tract, which we treat as a filter whose gain has peaks at two frequencies in the range sketched. At the mouth, high frequencies are radiated more efficiently, as indicated in the next graph.
Voice Acoustics: an introduction
The last pair of graphs sketches the spectra of the output sound. The vertical axes of all graphs are logarithmic. On another site, we give some practical examples of the source-filter model with sound files. We shall discuss these in the more detailed sections below. For the fundamental frequency sketched, the third, ninth and neighbouring harmonics are radiated more efficiently than the other harmonics.
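Logarithmic axes are convenient here because the model multiplies spectra. Writing S(f) for the source spectrum, T(f) for the gain of the tract and R(f) for the radiation at the mouth (notation introduced just for this example), the output spectrum P(f) and its level in decibels are:

```latex
P(f) = S(f)\,T(f)\,R(f)
\quad\Longrightarrow\quad
20\log_{10}\lvert P(f)\rvert = 20\log_{10}\lvert S(f)\rvert + 20\log_{10}\lvert T(f)\rvert + 20\log_{10}\lvert R(f)\rvert
```

On logarithmic axes the output can therefore be read off by adding the three curves point by point.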
Peaks in the radiation of the whispered sound occur at similar frequencies. In speech, these high-power frequency bands (the broad peaks in the spectral envelope, called formants) are very important. The frequencies at which they occur are close to, but not exactly equal to, those of the peaks in the gain function of the tract.
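The source-filter chain can be sketched numerically: multiply each harmonic of a periodic source by the gain of the tract and by a radiation factor, then look for the harmonics that stand out. Everything in this sketch is an assumption made for illustration: each tract resonance is an idealised second-order resonance with a chosen centre frequency and bandwidth, the source harmonics fall as 1/n^2 (about -12 dB per octave), and radiation rises in proportion to frequency (about +6 dB per octave).

```python
import math


def resonance_gain(f, fr, bandwidth):
    """Gain of one idealised second-order resonance at frequency f (Hz).

    Equal to 1 at 0 Hz and to fr / bandwidth at f = fr.
    """
    q = fr / bandwidth
    x = f / fr
    return 1.0 / math.sqrt((1 - x * x) ** 2 + (x / q) ** 2)


def output_spectrum(f0, resonances, n_harmonics):
    """(frequency, amplitude) of each harmonic after source, tract filter and radiation.

    Assumed (not measured): source harmonic n has amplitude 1/n**2, and radiation
    at the mouth is proportional to frequency.
    """
    spectrum = []
    for n in range(1, n_harmonics + 1):
        f = n * f0
        source = 1.0 / n ** 2
        tract = 1.0
        for fr, bw in resonances:
            tract *= resonance_gain(f, fr, bw)
        radiation = f
        spectrum.append((f, source * tract * radiation))
    return spectrum


def local_peaks(spectrum):
    """Frequencies of harmonics that are louder than both of their neighbours."""
    return [
        spectrum[i][0]
        for i in range(1, len(spectrum) - 1)
        if spectrum[i][1] > spectrum[i - 1][1] and spectrum[i][1] > spectrum[i + 1][1]
    ]


if __name__ == "__main__":
    # Hypothetical tract: resonances at 500 Hz and 1500 Hz (centre, bandwidth) in Hz.
    spec = output_spectrum(100.0, [(500.0, 80.0), (1500.0, 100.0)], 30)
    print(local_peaks(spec))
```

With a 100 Hz fundamental and these assumed resonances, the locally loudest harmonics are the 5th and 15th. Because the assumed source falls faster than the radiation rises, the envelope as a whole tilts downwards, which is one reason why the peaks of the output envelope need not coincide exactly with the peaks of the tract gain.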
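There is a good reason why the various spectra in the preceding figure are only sketched: the source spectrum and the gain function of the vocal tract cannot be measured directly while a person is speaking. So, in the illustrations below, where we illustrate the Source-Filter model experimentally, we have to resort to indirect measurements.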
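Here, we do that using an electroglottograph (EGG), which passes a small, high-frequency electric current across the neck between electrodes placed on either side of the larynx. The magnitude of the current that flows varies as the folds come into contact and separate.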
What are formants?
In the figure, the spectra at the top are from an EGG signal. Below them are measurements of the resonances of the vocal tract, made at the mouth during speech. This technique gives a quasi-continuous line whose peaks identify the resonances, and it also shows the harmonics of the voice. Below those are the spectra of the sound measured for that particular vowel, in the same gesture. Here we contrast two vowels. The top graphs are experimental measurements of the vocal fold contact.
Note that this measurement of the source shows little difference between the two vowels, as the Source-Filter model assumes. The next pair of graphs are measurements of the vocal tract, made from the mouth, during the vowel.
The broad peaks identify resonances of the vocal tract; the sharp lines are the harmonics. Because the tract is in a different configuration for each vowel, the resonances occur at different frequencies. The next two rows show the voice output for voiced speech and for whispering, measured in the same vocal gesture.
Some difficulties
Before we leave this brief overview, it is worth noting that much about the voice is still incompletely understood. One reason for this is the difficulty of doing experiments.
Some of the data that we should like to know (the gain function of the vocal tract sketched above, or the mass and force distribution in the vocal folds, for instance) are impossible to measure while the voice is operating, for practical as well as ethical reasons. For most human physiology, much information has been obtained from other species whose organs function in similar ways. When it comes to the voice, however, there is no such similar species: no-one is very interested in the voice of the lab rat.
Much of our knowledge comes from experiments using just the sound of the voice as experimental input. Other knowledge comes from medical imaging. Another approach is to use a mathematical model: represent the voice by a simplified physical system, solve the equations for this simple system to predict the sound it would make, and see how this correlates with the sounds of speech or singing. Another is to make artificial systems with the shape of the vocal tract and some sort of aero-mechanical oscillator at the position of the glottis.
Yet other knowledge comes from other experiments and observations that are often, for practical and ethical reasons, somewhat indirect. Because of the importance of the human voice, these are all active research areas. We now look more closely at some of the topics introduced above.
Other reviews are given by, for example, Lieberman and Blumenstein; Stevens; Hardcastle and Laver; Johnson; and Clark et al. References are given below.
The source at the larynx
To speak or to sing, we usually expel air from the lungs.
The air passes between the vocal folds, which are muscular tissues in the larynx. If we get the air pressure and the tension and position of the vocal folds just right, the folds vibrate at acoustic frequencies. This means we have an oscillating valve, letting puffs of air flow into the vocal tract at some frequency f0. These sketches illustrate the larynx, viewed from above, in position for phonation and for breathing.
Technically, we move the arytenoid cartilages closer together than in their separated breathing position, which brings the vocal folds closer to each other; this is called adduction (Scherer). The aperture between the folds is called the glottis. Compared with the breathing position, the narrow glottis restricts the flow of air, so the steady pressure drop across the larynx is greater when the aperture is small. The higher pressure drop means that the speed of air through the glottis is high, but the small cross section means that the volume flow, in litres per second, is less.
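The trade-off between speed and volume flow can be sketched with Bernoulli's equation for an ideal orifice: a pressure drop dp accelerates air to a speed v = sqrt(2 dp / rho), and the volume flow is the aperture area times that speed. This ignores viscous losses and the motion of the folds, and the pressures and areas below are hypothetical values picked to show the trend, not measurements.

```python
import math

AIR_DENSITY = 1.2  # kg/m^3, approximate density of air at room temperature


def glottal_flow(pressure_drop, area, rho=AIR_DENSITY):
    """Ideal (lossless) Bernoulli estimate of flow through an orifice.

    pressure_drop in Pa, area in m^2. Returns (speed in m/s, volume flow in L/s).
    """
    speed = math.sqrt(2.0 * pressure_drop / rho)  # from dp = rho * v**2 / 2
    volume_flow = area * speed * 1000.0           # m^3/s converted to L/s
    return speed, volume_flow


# Hypothetical values, chosen only to illustrate the trend.
breathing = glottal_flow(pressure_drop=10.0, area=1e-4)   # wide glottis, small pressure drop
phonation = glottal_flow(pressure_drop=800.0, area=5e-6)  # narrow glottis, larger pressure drop

if __name__ == "__main__":
    print(f"breathing: {breathing[0]:.1f} m/s, {breathing[1]:.2f} L/s")
    print(f"phonation: {phonation[0]:.1f} m/s, {phonation[1]:.2f} L/s")
```

With these assumed numbers, the narrow glottis gives the faster jet but the smaller volume flow, which is the pattern described above.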
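Which breath lasts longer: one used for phonation, or one simply exhaled through the open glottis? Because the volume flow through the narrow glottis is smaller, the phonated breath does. The schematic sketches the vocal folds in cross section. In (a), the pressure acting below the vocal folds tends to force them upwards and apart. This pressure difference is also responsible for accelerating air through the glottis to produce the high-speed air flow. The rapid air flow through the glottis creates a suction (black arrows) that tends to draw the folds back together.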
These alternating effects tend to excite a cycle of closing and opening of the folds, which is assisted by the elasticity of the folds (which provides a restoring force), their mass (which provides inertia) and the inertia of the air flow itself, which maintains high flow rates even as the folds are closing. Muscles do not directly vibrate the vocal folds; the vibration is the passive effect described above (Van den Berg). However, muscles contribute to its control, by determining how much the folds are pushed together and how much they are stretched.
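The roles of elasticity and mass can be caricatured by the simplest oscillator, a single mass on a spring, whose natural frequency is f = sqrt(k/m)/(2 pi). This one-mass sketch is a deliberate oversimplification (it ignores the air flow that actually drives the folds), and the stiffness and mass values are hypothetical; it only shows the trend that stiffening the folds (raising k) or setting less mass in motion (lowering m) raises the frequency.

```python
import math


def natural_frequency(stiffness, mass):
    """Natural frequency (Hz) of an ideal, undamped mass-spring oscillator.

    stiffness: spring constant k in N/m; mass: moving mass m in kg.
    """
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)


# Hypothetical values, chosen only to give a frequency in the range of speech.
k, m = 40.0, 1e-4
base = natural_frequency(k, m)          # roughly 100 Hz
stiffer = natural_frequency(4 * k, m)   # four times the stiffness
lighter = natural_frequency(k, m / 2)   # half the moving mass

if __name__ == "__main__":
    print(f"{base:.1f} Hz, {stiffer:.1f} Hz, {lighter:.1f} Hz")
```

Quadrupling the stiffness doubles the frequency, and halving the moving mass raises it by a factor of about 1.41 (the square root of 2).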
If you get these parameters right, and hold them steady, you can produce a note with a fixed pitch, which means that the folds are vibrating in a regular, periodic way. In normal speech, the pitch varies during each syllable, usually in a smooth way. The speed of sound c is about m.
Different registers and vocal mechanisms
How can we cover a wide range of pitch? On a violin or guitar, one can change the length of a string, but to cover a large range, one can also cross to a new string.
In trumpets, trombones, clarinets, flutes, etc., one can change the length of the pipe with valves, a slide or keys, but one can also change registers, which means changing the mode of vibration in the pipe. In the voice, we can change the muscle tension and the pressure to vary the pitch. However, to cover a range of a few octaves, we usually need to use different registers (Garcia). The distinctions among registers in singing are not always clear, though, because changing registers involves both laryngeal and vocal tract adjustments (Miller). The vocal folds can vibrate in at least four different ways, called mechanisms (Roubeau et al.).
In mechanism M0, the tension of the folds is so low that the vibration is not periodic, meaning that successive vibrations have substantially different lengths. M0 sounds low but has no clear pitch (Hollien and Michel). Mechanism M1 is used to produce low and medium pitches. In M1, virtually all of the mass and length of the vocal folds vibrates (Behnke), and frequency is regulated by muscular tension (Hirano et al.).
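The irregularity of M0 can be quantified from a sequence of measured vibration periods. A common measure in voice analysis, often called local jitter, is the mean absolute difference between consecutive periods divided by the mean period. The sketch below computes that quantity; the period values in the example are hypothetical.

```python
def local_jitter(periods):
    """Mean absolute difference between consecutive periods, divided by the mean period.

    Returns 0 for a perfectly periodic vibration; larger values mean less regular vibration.
    """
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


# Hypothetical period sequences in milliseconds.
steady = [8.0, 8.0, 8.0, 8.0, 8.0]          # regular, periodic vibration
irregular = [12.0, 18.0, 10.0, 20.0, 14.0]  # successive periods differ substantially

if __name__ == "__main__":
    print(local_jitter(steady), local_jitter(irregular))
```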
In M1, the glottis is open for a relatively short fraction of a vibration period (Henrich et al.).
Mechanism M2 is used to produce medium and high pitches for women, and high pitches for men. In M2, a reduced fraction of the vocal fold mass vibrates: the moving section involves about two thirds of their length, but less of the breadth. The glottis is open for a longer fraction of the vibration period (Henrich et al.).
Little has been published on the fourth mechanism, M3. Although some people use M0 in speech, especially at the end of sentences, and coloratura sopranos are said to use M3 in their highest range, speech and singing usually use M1 and M2.
Consequently, with their lower overall range, men typically use M1 for nearly all speech and most singing. However, in some styles of pop music and some operatic styles, men use M2 extensively.
Plosives and, to some degree, fricatives modify the placement of formants in the surrounding vowels.
The time course of these changes in vowel formant frequencies is referred to as 'formant transitions'. If the fundamental frequency of the underlying vibration is higher than a resonance frequency of the system, then the formant usually imparted by that resonance will be mostly lost.
This is most apparent in the example of soprano opera singers, who sing high enough that their vowels become very hard to distinguish. Control of resonances is an essential component of the vocal technique known as overtone singing, in which the performer sings a low fundamental tone and creates sharp resonances to select upper harmonics, giving the impression of several tones being sung at once.
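A resonance can only shape the output at frequencies where the source has energy, that is, at the harmonics n f0. The sketch below lists the harmonics that fall within an assumed bandwidth of a resonance. It shows both effects described above: when f0 lies above the resonance frequency, no harmonic falls near it and the formant is largely lost, while a narrow resonance tuned to a multiple of f0, as in overtone singing, singles out one harmonic. The frequencies and bandwidths used are hypothetical.

```python
def harmonics_near(f0, resonance, bandwidth, max_freq=5000.0):
    """Harmonics n * f0 (Hz) lying within +/- bandwidth / 2 of a resonance frequency."""
    half_width = bandwidth / 2.0
    found = []
    n = 1
    while n * f0 <= max_freq:
        f = n * f0
        if abs(f - resonance) <= half_width:
            found.append(f)
        n += 1
    return found


if __name__ == "__main__":
    # Hypothetical examples (all values in Hz).
    print(harmonics_near(100.0, 500.0, 100.0))   # low voice: the 5th harmonic sits on the resonance
    print(harmonics_near(1000.0, 500.0, 100.0))  # high voice: f0 is above the resonance, nothing nearby
    print(harmonics_near(110.0, 880.0, 30.0))    # narrow resonance tuned to the 8th harmonic
```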
Spectrograms are used to visualise formants. The second formant typically occurs just above the first, between 1 and 2 kHz, and the third just above that, between 2 and 3 kHz. When you look at a spectrogram of speech, you will see formants everywhere, in both vowels and consonants. To understand why, recall the source-filter theory of speech production: the vocal tract filters a source sound, such as the vibration of the vocal folds or turbulent noise. Formants occur, and are seen on spectrograms, around frequencies that correspond to the resonances of the vocal tract.
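A spectrogram is a sequence of short-time spectra: the signal is cut into overlapping frames, each frame is windowed, and the magnitude of its discrete Fourier transform is computed. Formants then show up as bands of high magnitude across frames. The sketch below uses only the Python standard library and a direct (slow) DFT for clarity; the frame length, hop size and Hann window are illustrative choices, and a real analysis would use an FFT library.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0 .. n/2 of a real frame (direct O(n^2) evaluation)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * b * k / n) for k, x in enumerate(frame)))
        for b in range(n // 2 + 1)
    ]


def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Return (bin frequencies in Hz, list of magnitude spectra, one per frame)."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        frames.append(dft_magnitudes([x * w for x, w in zip(segment, window)]))
    freqs = [b * sample_rate / frame_len for b in range(frame_len // 2 + 1)]
    return freqs, frames


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(1024)]
    freqs, frames = spectrogram(tone, fs)
    loudest = max(range(len(freqs)), key=lambda b: frames[0][b])
    print(f"{len(frames)} frames; loudest bin in first frame: {freqs[loudest]} Hz")
```

The test tone stands in for a single strong spectral component; with speech, each frame would instead show several broad peaks at the formant frequencies.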
But there is a difference between oral vowels on the one hand, and consonants and nasal vowels on the other. For consonants, the vocal tract also has antiresonances at one or more frequencies, due to oral constrictions. These antiresonances attenuate or eliminate formants at or near their frequencies, so that the formants appear weakened or missing altogether on spectrograms. That is why, for example, it is difficult to see the low-frequency formants for the two instances of [s] in the example spectrogram.
Furthermore, nasal consonants and nasal vowels can exhibit additional formants, nasal formants, arising from resonance within the nasal branch. Consequently, nasal vowels may show one or more additional formants due to nasal resonance, while one or more oral formants may be weakened or missing due to nasal antiresonance.
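The effect of antiresonances and nasal resonances on the tract gain can be sketched by extending the idealised-resonance picture: an extra resonance adds a peak, while an antiresonance, modelled here simply as the reciprocal of a resonance, divides the gain and carves a dip. The resonance and antiresonance frequencies and bandwidths below are hypothetical, and real nasal coupling is more complicated than independent factors like these.

```python
import math


def resonance_gain(f, fr, bandwidth):
    """Gain of an idealised second-order resonance: 1 at 0 Hz, fr / bandwidth at f = fr."""
    q = fr / bandwidth
    x = f / fr
    return 1.0 / math.sqrt((1 - x * x) ** 2 + (x / q) ** 2)


def tract_gain(f, resonances, antiresonances=()):
    """Gain at frequency f: product of resonance gains, divided by each antiresonance's gain."""
    gain = 1.0
    for fr, bw in resonances:
        gain *= resonance_gain(f, fr, bw)
    for fz, bw in antiresonances:
        gain /= resonance_gain(f, fz, bw)  # reciprocal of a resonance gives a dip at fz
    return gain


# Hypothetical oral resonances (frequency, bandwidth) in Hz.
oral = [(500.0, 80.0), (1500.0, 100.0)]

if __name__ == "__main__":
    print("oral only, at 500 Hz:", tract_gain(500.0, oral))
    print("with antiresonance at 500 Hz:", tract_gain(500.0, oral, [(500.0, 80.0)]))
    print("oral only, at 250 Hz:", tract_gain(250.0, oral))
    print("with nasal resonance at 250 Hz:", tract_gain(250.0, oral + [(250.0, 60.0)]))
```

An antiresonance placed on an oral resonance removes that peak, while an added nasal resonance raises the gain near its own frequency, matching the weakened and additional formants described above.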