
Thursday, August 4, 2016

Measuring vowels...

So, it's been over a year since I blogged. My apologies! I did finish my dissertation, graduate, and secure a position as a post-doc. Blogging sorta dropped down the ole priority list. Anyway, I'll try to post more regularly in the future.

Now that I'm Dr. Mountain Man, I have had the pleasure of telling lots of people about my work recently. One thing I did notice is that when I described what I did for my dissertation, people often didn't really know what I was talking about. So, I figured a blog post could help!

This particular post was inspired by what I did a lot of for my dissertation: measuring vowels. This is a somewhat time-consuming task, necessary but not glamorous (at times it can get boring). However, I have noticed that non-linguists have literally no idea what I mean when I say that I measure vowels. When people asked me what all I did for my dissertation, one of the things I would tell them was 'measuring vowels'. The blank stares were sort of disconcerting. I realized that lots of people don't really know what a vowel actually is, so I decided to remedy that situation!

Caveat: this will be a very nerdy, wonkish post. If you'd like to avoid the nuts and bolts, jump down to the last paragraph.

Most people tend to think of vowels as 'A, E, I, O, U, and sometimes Y'. This is something that I was taught in elementary school. But this little mnemonic is really talking about letters. When we linguists refer to vowels, we are referring to the sounds themselves. A vowel is actually kind of hard to define. The typical definition mentions a voiced sound with a relatively open vocal tract. Voiced means the vocal folds are vibrating. Open vocal tract means the air can flow with no obstruction, i.e., your tongue isn't really close to anything. So, there are many more sounds that fit this description than the 6 letters mentioned earlier. In fact, depending on the variety, there are about 16 vowels in English. You can hear the vowels in the following words: beet, bit, bait, bet, bat, but, bot, bought, boat, good, boot, but, bite, bout, void, cute (these can vary by region/nation/social factors, etc.).

Since the vocal tract is relatively open and the vocal folds are vibrating, some really cool acoustic things are happening. Your vocal tract is basically a long tube that is closed at one end (your larynx) and open at the other (your mouth and nose). It is a certain length and thickness, and has two main sections: the part behind your tongue (your pharynx) and your oral cavity (I am going to discuss nasality and nasal vowels in another post). The length, thickness, and two-part nature of the tube mean that it has certain resonant properties (certain frequencies will be enhanced). The buzzing of your vocal folds produces a spectrum of harmonics (basically, they vibrate at a certain frequency and at the whole number multiples of that frequency). And, since your tongue can be in different positions, the relative sizes of the cavities (your pharynx and oral cavity) can change. This produces a filter on the harmonic spectrum from the vocal folds. This filter means that certain areas of the spectrum are 'enhanced' and others are not. These enhanced areas are called formants. So, you take the harmonic spectrum that results from the buzzing of your vocal folds (the source), pass it through the resonance of the tube as shaped by the relative sizes of the cavities (the filter), and what comes out has formants, which we can then measure. A picture (from here) is probably worth a couple hundred words:



When we measure vowels, we are measuring formants, which are the result of the relationship between the cavities and their impact on the spectrum. As the tongue moves, the relative sizes and shapes of the two cavities change, and this produces the different vowel sounds. For example, for the vowel in the word beet, the tongue is high in the oral cavity. This means the pharyngeal cavity (your throat) is relatively large and the oral cavity is relatively small. Large things vibrate more slowly than small things, so the formant associated with the pharyngeal cavity (F1) is relatively low, and the formant associated with the oral cavity (F2) is relatively high, because that cavity is smaller (both of these are still dependent on the harmonic spectrum of the source).
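For the really nerdy among you, here is a rough sketch of the source and the filter in Python. It assumes the simplest possible case: a uniform tube, closed at one end and open at the other, with the tongue in a neutral position (so not the vowel in beet). The 17.5 cm length, the speed of sound, the 110 Hz fundamental, and all the function names are my own illustrative assumptions, and a real vocal tract is a lot messier than this.

```python
# Approximate speed of sound in warm, moist air, in cm per second (assumed value).
SPEED_OF_SOUND_CM_S = 35000.0


def harmonics(f0_hz, count):
    """The source: whole number multiples of the fundamental frequency."""
    return [f0_hz * n for n in range(1, count + 1)]


def neutral_tube_resonances(length_cm, count=3):
    """The filter, in its simplest form.

    A uniform tube closed at one end and open at the other resonates at
    odd multiples of a quarter wavelength: F_n = (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


def nearest_harmonic(resonance_hz, f0_hz):
    """The harmonic closest to a resonance, which gets enhanced the most."""
    n = max(1, round(resonance_hz / f0_hz))
    return n * f0_hz


# A 17.5 cm tube is a common textbook stand-in for an adult vocal tract.
resonances = neutral_tube_resonances(17.5)
print(resonances)  # [500.0, 1500.0, 2500.0]

# With a 110 Hz voice, these are the harmonics sitting nearest each resonance.
print([nearest_harmonic(r, 110) for r in resonances])  # [550, 1540, 2530]
```

Notice that the resonances belong to the tube and the harmonics belong to the vocal folds. The formants we measure are where the first boosts the second.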

Now, there is one other aspect that needs to be mentioned. Males tend to have bigger heads and longer vocal tracts than females, and as a result, their frequencies are affected. A longer tube vibrates more slowly, and we hear that as a lower pitch (this is modulated by many, many things, but at its core, males tend to have 'deeper' voices and lower pitch). This affects all the frequencies and their resulting formants. So, when we compare males and females, we have to adjust for these differences. This is called 'vowel normalization'. There are literally books on the subject, so I can't really go into the details here, but think of it as a way to adjust for the difference in vocal tract length while maintaining some of the differences related to pronunciation.
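To give a flavor of what normalization looks like, here is a minimal sketch of one well-known approach, Lobanov normalization, which turns each speaker's formant values into z-scores using that speaker's own mean and standard deviation. I am not claiming this is the best method or the one used in any particular study. The F1 values below are invented for illustration, and the second speaker is simply the first one scaled up by 20%.

```python
from statistics import mean, stdev


def lobanov_normalize(formants_hz):
    """Z-score one speaker's measurements of one formant (e.g., all their F1s)."""
    mu = mean(formants_hz)
    sigma = stdev(formants_hz)
    return [(f - mu) / sigma for f in formants_hz]


# Invented F1 values (Hz) for three vowels from two hypothetical speakers.
# Speaker B has a shorter vocal tract, so every value is 20% higher.
speaker_a_f1 = [300.0, 500.0, 700.0]
speaker_b_f1 = [360.0, 600.0, 840.0]

print(lobanov_normalize(speaker_a_f1))
print(lobanov_normalize(speaker_b_f1))
```

In raw Hz the two speakers look different, but after normalization their vowels line up, because the only difference between them was overall scale. Real data are never this tidy, which is part of why there are whole books on the subject.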

Now, that is probably way more information than is necessary, but it points towards what I did for part of my dissertation and other research. I record a person, and then I transcribe what we said. I then locate all the vowels in the speech stream, measure the first and second formants (F1 and F2), and normalize the data. Finally, I run statistical tests to compare the results, based on all sorts of linguistic and social factors. So, now you know what we linguists mean when we say we 'measure vowels'. We are actually measuring the frequencies of formants, which are the result of a filter (the resonance of the vocal tract and the articulation - where and how the speaker's tongue and jaw are positioned) placed on the source (the vibration of the vocal folds). Simple, right?