Robert Mac Auslan, Ph.D. Vice President, Operations, Phonologics, Inc.
Joel Mac Auslan, Ph.D. Chief Technology Officer, Phonologics, Inc.
Linda J. Ferrier-Reid, Ph.D. Chief Linguistics Officer, Phonologics, Inc.
English is the lingua franca of business, medicine, government, technology, and many other fields, and the need to speak it intelligibly is growing worldwide with the expansion of global business. The poor intelligibility of non-native speakers is therefore a problem in both the academic world and the workforce.
To train for better intelligibility, we need to be able to judge that intelligibility quickly and accurately. Human judges need extensive training, and their judgments are often biased and inconsistent. Technology is stepping in to provide quicker and more objective ratings of speaker intelligibility. This article introduces a variety of such technologies available today and the areas in which they are particularly critical.
Reduced Intelligibility Can Lead to Fatal Miscommunications
Miscommunication can occur in any human interaction, as medical institutions know to their cost. Anecdotes of such miscommunications are very common, particularly in the airline industry, where the results can be fatal.
Communication in the air is generally carried out in English. Indeed, nothing underscores the subtle complexities of speech communication more strikingly than the miscommunications that occur among pilots, crewmembers, and air traffic controllers. When different words or phrases sound exactly or nearly alike, it can be problematic. Confusion is possible, for example, because “left” can sound very much like “west.”
In a Federal Air Surgeon’s Medical Bulletin article entitled “Thee…Uhhmm…Ah…, ATC-Pilot Communications,” Mike Wayda writes, “When you produce these hesitations while speaking, you are using … ‘place holders,’ or ‘filled pauses,’ a type of speech dysfluency especially common in pilot-controller exchanges.” Until recently, such speech dysfluencies and other mistakes were not considered to be important; however, new research suggests that there is a correlation between miscommunications and mistakes.
What is Intelligibility?
How do we define intelligibility, and how is it measured? Intelligibility refers to a listener’s ability to recognize and understand a word, phrase, or sentence produced by a non-impaired speaker. It is influenced by the social and linguistic context of the speech: if the listener is familiar with the topic under discussion, intelligibility will be higher. It is also higher when the speaker is in a noise-free environment. Finally, intelligibility varies according to how familiar the listener is with the speaker’s speech pattern. (A well-known phenomenon is the miraculous improvement over time in a non-native speaker’s intelligibility as judged by his or her teacher, when objective testing shows no real improvement!)
Intelligibility is often measured by the number of phonemes that can be accurately transcribed from listening to recorded speech. It is also often rated on Likert scales, where the listener selects from options ranging from, for example, “totally unintelligible” to “completely intelligible.”
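As a concrete illustration of the phoneme-count measure, here is a minimal sketch in Python. It is not any test’s actual scoring code: the function and the phoneme notation are ours for illustration, and the alignment is done with the standard-library difflib matcher.

```python
from difflib import SequenceMatcher

def percent_phonemes_correct(target, transcribed):
    """Score a listener's transcription against the target phoneme sequence.

    Both arguments are lists of phoneme symbols, e.g. ["r", "ai", "s"].
    Returns the fraction of target phonemes the listener recovered.
    """
    matcher = SequenceMatcher(None, target, transcribed)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(target)

# "rice" with the final /s/ dropped: 2 of 3 target phonemes recovered.
print(percent_phonemes_correct(["r", "ai", "s"], ["r", "ai"]))  # ~0.67
```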
What is a “Foreign Accent”?
We are interested in foreign accent to the extent that it reduces intelligibility. (We concentrate only on pronunciation and ignore vocabulary and grammar.) Non-native speakers are often unintelligible because the speech patterns of their first language interfere with their pronunciation of American English. Indian speakers, for example, often substitute /v/ for /w/. Some languages, such as Mandarin Chinese, do not allow obstruents (sounds created by restricting airflow through the oral cavity) at the end of a word or syllable, so the final consonant is omitted: in the word “rice,” the final /s/ sound is left off. In some languages the /t/ sound is produced more like a /d/, which can lead to meaning confusions such as English listeners hearing “die” instead of “tie”!
Prosodic effects are also important. Prosody covers a number of systems that affect intelligibility, including intonation and sentence stress or accent, determined in English mostly by the speaker’s focus and whether an item is being mentioned for the first time in the conversation. Unfortunately, there are few simple rules to guide the learner of English; word stress patterns must generally be learned on a word-by-word basis. In addition, speakers of tone languages, such as Mandarin Chinese, have difficulty carrying an uninterrupted pitch contour over an utterance and assigning correct sentence stress to the most important word or words in a sentence. To the ears of native speakers, their productions sound “jerky.”
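To make the pitch-contour idea concrete, here is a brief sketch using the open-source librosa library. The audio file name is a placeholder, and the gap measure at the end is our own crude illustrative proxy for “jerkiness,” not a validated metric from any of the tests discussed here.

```python
import numpy as np
import librosa

# Load an utterance (placeholder file name) and track its pitch contour
# with librosa's probabilistic YIN (pyin) estimator.
y, sr = librosa.load("utterance.wav")
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),  # ~65 Hz, below most speaking voices
    fmax=librosa.note_to_hz("C7"),  # generous upper bound
    sr=sr,
)

# pyin returns NaN for frames with no detected pitch. A high proportion of
# undetected frames inside a single utterance is one rough sign of an
# interrupted, "jerky" pitch contour.
print(f"Frames without pitch: {np.isnan(f0).mean():.0%}")
```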
How Did Speech Assessment Evolve?
Human-Scored Testing
Initially, all speech testing relied on the judgments of a human listener, who is, of course, prone to fatigue, bias, and unreliability. This is probably still the most common way to evaluate speaking effectiveness and intelligibility. Speakers are evaluated while reading, responding to prompts, or conversing freely.
The SPEAK Test (www.toeflgoanywhere.org)
The Speaking Proficiency English Assessment Kit (SPEAK) is an oral test developed by the Educational Testing Service (ETS) and perhaps epitomizes the traditional way of evaluating speech. Its aim is to evaluate the examinee’s proficiency in spoken English. ETS also developed the TOEFL iBT test, which covers four skills: listening, reading, speaking, and writing. The Speaking portion of that test is scored by human listeners and, according to ETS, has undergone extensive statistical and reliability analysis. The Speaking section of the TOEFL is not available separately from the other sections, but institutions wishing to test speaking skills only may use the TOEIC (Test of English for International Communication) Speaking Test, also developed by ETS and available as a stand-alone assessment.
Acoustic Analysis of Speech
Since acoustic analysis methods became readily available in the 1960s, there has been a steady stream of research documenting particular features of standard American English speech in single words and sentences and, more recently, of non-native speech, allowing comparison of the two. These studies have enabled the computer analysis of speech in programs such as the Versant Testing System, Carnegie Speech Assessment, and the Automated Pronunciation Screening Test (APST), which use large-scale statistical studies of native and non-native speech as the basis for their assessments. Because of the difficulty of training listeners to achieve reasonable reliability with one another, and the time it takes to score spoken tests, computer-based testing offers the hope of more rapid and reliable intelligibility assessment. The three tests noted above are described further below.
The Versant Testing System (www.versant.com)
Versant Technology originally developed a telephone-based test in which the speaker repeated items or responded to prompts; this first test primarily evaluated speaker fluency. More recently, Versant has developed a computer-based system, described on its website:
“The Versant testing system, based on the patented Ordinate® technology, uses a speech processing system that is specifically designed to analyze speech from native and non-native speakers of the language tested. In addition to recognizing words, the system also locates and evaluates relevant segments, syllables, and phrases in speech. The Versant testing system then uses statistical modeling techniques to assess the spoken performance.”
“Base measures are then derived from the linguistic units (segments, syllables, words), based on statistical models built from the performance of native and non-native speakers. The base measures are combined into four diagnostic sub-scores using advanced statistical modeling techniques. Two of the diagnostic sub-scores are based on the content of what is spoken, and two are based on the manner in which the responses are spoken. An Overall Score is calculated as a weighted combination of the diagnostic sub-scores.”
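The final step in the quoted description amounts to a weighted average of the diagnostic sub-scores. Here is a minimal sketch of that step; the sub-score names and weights are invented for illustration, not Versant’s actual model parameters.

```python
# Invented names and weights: two content-based and two manner-based
# sub-scores combined into a single overall score.
WEIGHTS = {
    "sentence_mastery": 0.30,  # content
    "vocabulary": 0.20,        # content
    "fluency": 0.30,           # manner
    "pronunciation": 0.20,     # manner
}

def overall_score(subscores):
    """Weighted combination of diagnostic sub-scores."""
    return sum(WEIGHTS[name] * value for name, value in subscores.items())

print(overall_score({"sentence_mastery": 61, "vocabulary": 58,
                     "fluency": 72, "pronunciation": 66}))  # 64.7
```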
Carnegie Speech Assessment (www.carnegiespeech.com)
This system “uses speech recognition and pinpointing technology under license from Carnegie Mellon University to assess an individual’s speech. By pinpointing exactly what was correct and incorrect in the speaker’s pronunciation, grammar and fluency, accurate and objective English assessments can be made.” Specific features, as described on the website, include the following (a sketch of one such scoring approach appears after the list):
- Rapid assessment of spoken English by analyzing each student’s speech against a statistical composite voice model of native speakers.
- Self-directed tutorials that reduce administrative requirements and costs.
- Tunable grading scale customizes results to each organization’s operational or educational requirements.
- Immediately available and objective reports that can be compared across multiple applicants as well as across the business and educational enterprises.
- Detailed reports on individual users allow information on each applicant’s language proficiency to flow from hiring to training departments, eliminating redundant assessments.
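To illustrate what scoring speech “against a statistical composite voice model of native speakers” can look like in general, here is a sketch using a Gaussian mixture model over acoustic feature vectors. This is a generic technique with random stand-in data, not Carnegie Speech’s actual implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in data: in practice these would be acoustic feature frames
# (e.g., MFCCs) pooled from many native speakers, and frames from one
# student's recording.
rng = np.random.default_rng(0)
native_frames = rng.normal(size=(5000, 13))
student_frames = rng.normal(0.5, 1.0, size=(300, 13))

# Fit the "composite voice model" to the pooled native frames.
native_model = GaussianMixture(n_components=8, covariance_type="diag")
native_model.fit(native_frames)

# Average log-likelihood per frame: higher means the student's speech
# sits closer to the native composite.
print(native_model.score(student_frames))
```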
The Phonologics Automated Pronunciation Screening Test (APST) (www.phonologics.com)
APST uses knowledge-based speech analysis and is based on careful study and acoustic analysis of the target speech. It is designed to test large groups of non-native speakers quickly, accurately, and objectively. Speakers first practice recording items and then read words and sentences, which are recorded into the computer. These recordings are sent to Phonologics via the web, where they are automatically scored, and a report is made available to the test administrator within minutes. The test provides sub-scores on particular aspects of speech and a summary score that indicates the intelligibility of the speaker to American English listeners.
The initial human-scored version of APST was developed to screen the large numbers of non-native speakers at Northeastern University in Boston, MA. The program provided a summary and sub-scores and was used with standard TOEFL scores to determine whether international teaching assistants should be allowed into the lab or classroom or first receive intelligibility training. This first version showed the need for a more objective and quickly scored version of the test. A second automated prototype was developed with funding from NIH. Further development of APST has been under the auspices of Speech Technology and Applied Research Corp.
How Well Do Automated Intelligibility Tests Correspond with Human Judgments?
It is important to test how well automated tests correspond with the judgments of human listeners. To check this, the authors first obtained APST intelligibility rankings for three non-native speakers and one native speaker. They then took the recordings used for the APST analysis and asked five native English listeners to judge the speech. The judges were asked to do two things: rate the speakers on a nine-point intelligibility scale and place them in the top, middle, or bottom intelligibility positions. On both measures, the human evaluators rated the speakers consistently with their APST scores. (A full version of this study is available on the Phonologics website.) The study thus showed that APST agrees with human judges: the test does what it says it does and may be used with confidence.
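Agreement of this kind is commonly quantified with a rank correlation. The sketch below uses invented stand-in numbers, not the study’s actual data; the full study is on the Phonologics website.

```python
from scipy.stats import spearmanr

# Invented example: four speakers scored by an automated test and rated
# by one human judge on a nine-point intelligibility scale.
automated_scores = [52, 61, 70, 95]   # three non-native speakers + one native
judge_ratings = [3, 5, 6, 9]          # nine-point scale from one listener

rho, p_value = spearmanr(automated_scores, judge_ratings)
print(f"Spearman rank correlation: {rho:.2f}")  # 1.00 here: same ordering
```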
These new technologies offer the prospect of accurate results that agree with the judgments of human listeners, but without the labor and time commitments, and with the promise of greater objectivity. They allow us to place speakers in classes or positions more quickly and accurately, and without the bias that can so often creep into the human-scored process.