Linda J. Ferrier-Reid, Ph.D., Chief Linguistics Officer, Phonologics, Inc.
Doctor Ferrier is also professor emeritus in the Department of Speech-Language Pathology and Audiology at Northeastern University in Boston, MA.
Robert Mac Auslan, Ph.D., Vice President, Academic Operations, Phonologics, Inc.
Doctor Robert Mac Auslan is also a professor in the IALC at Washington State University.
Joel Mac Auslan, Ph.D., Chief Technology Officer, Phonologics, Inc. Joel Mac Auslan is also the founder and President of Speech Technology and Applied Research (STAR).
Rita Mac Auslan, Ph.D., Director, Customer Relations, Phonologics, Inc.
Rita Mac Auslan is also an adjunct professor at both Southern New Hampshire University’s ILE and Manchester Community College.
The purpose of this article is to describe the current version of the Automated Pronunciation Screening Test (APST), a computer-based test that analyzes digitized speech samples from non-native speakers of English (NNSs) who speak with a foreign accent and provides a norm-referenced intelligibility score. We also briefly describe a validity test we conducted on APST results using human judges.
APST was developed to screen large numbers of NNSs quickly and objectively, such as international teaching assistants who must be evaluated before being allowed to teach, or call-center employees whose intelligibility is critical. It measures the intelligibility of such speakers to American English listeners; it does not measure other aspects of accent, such as vocabulary and grammar. Other tests generally used for this purpose rely on human judgments of different dimensions of speech intelligibility, and such rating scales are subjective, time-consuming, and often inaccurate. APST is intended primarily for institutional use.
What is APST?
Technologically, APST is built upon knowledge-based speech analysis, acoustics, and the physiology of human speech production. The phrase knowledge-based refers to a system that is based on the careful study and analysis of the target; in this case, non-native speech. Two key components of such a system are a knowledge base (derived from the expertise of a human “domain expert”) and inference mechanisms (a decision or classification engine). APST was developed initially with support from NIH and more recently from private investors. In its current version, APST is suitable for adult and high-school-age users, and we are currently carrying out research on its applicability to junior-high-level students.
The test includes a recording component, the APST Recorder, which is downloaded onto the user’s computer (Mac or Windows), and a web component, the Phonologics scoring engine, to which the spoken recordings are uploaded and scored. The scores are then returned very rapidly to the test administrator as either individual or class/group scores. APST takes from seven to fifteen minutes of speaker time, depending on how cautious the speaker is. The scoring engine also performs a final validity check to determine whether the recording was rendered inadequate by poor recording conditions. It provides an overall score, with ranges that can be used for placement purposes, and five sub-scores that have some diagnostic functionality.
Before beginning to use APST with multiple test-takers, the customer and Phonologics staff work together to install, test, and validate the APST Recorder on each computer to be used. Once this has been done, testing can begin.
The test is designed to be given in a quiet, preferably enclosed area using a headset. When the test-taker double-clicks the APST Recorder icon, the Phonologics logo appears, followed by the APST Login screen and then the test screen. After the test-taker enters his or her information, an initial sound check allows adjustment of the speaker volume and microphone/headset settings. Then the test begins.
The words and phrases recorded while “This word or phrase is for practice only” is visible are not scored. The test-taker clicks the Record button, which then turns into a Stop button, and repeats the word on the screen once the blue timing bar appears. The test-taker can end the recording with the Stop button or let the time run out. For each word or phrase, the test-taker can press the Go Back to Prior Word button once and rerecord. When “This word or phrase is for practice only” disappears, the scored test has begun.
When the test is finished, the Test Completion screen appears. When the test-taker clicks OK, an optional survey is displayed. The test-taker can take the survey and click Complete Test or just scroll to the end and click Complete Test. When the Congratulations notice appears, the test-taker can close the screen.
The Test Administrator will be sent an email with instructions for retrieving the test results from the online Administrator’s Console. Scores can be reviewed online or downloaded either as individual scoresheets or a combined (group) scoresheet.
Scores consist of an overall score and five subscores:
- coarticulation – how combinations of consonants and vowels are pronounced in certain contexts
- prosody – the cadence or rhythm of the speaker
- phonetics – how the speaker’s English sounds differ from those of a native speaker
- fluency – how smoothly a speaker produces phrases
- enunciation – the emphasis and clarity of syllables
An interpretation sheet for the scores is included with the scoresheets.
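As a rough illustration, an individual scoresheet can be thought of as a record containing the overall score and the five subscores. The field names and the numeric scale in this sketch are hypothetical, not the actual Phonologics format:

```python
# Hypothetical representation of an APST individual scoresheet.
# Field names and the numeric values are illustrative assumptions;
# the actual Phonologics scoresheet format may differ.
scoresheet = {
    "test_taker": "SMJPA5",      # speaker ID (example drawn from the study below)
    "overall": 62,               # overall intelligibility score (hypothetical scale)
    "subscores": {
        "coarticulation": 58,    # consonant/vowel combinations in context
        "prosody": 65,           # cadence or rhythm of the speaker
        "phonetics": 60,         # how the sounds differ from a native speaker's
        "fluency": 70,           # smoothness of phrase production
        "enunciation": 57,       # emphasis and clarity of syllables
    },
}

# A combined (group) scoresheet would then simply be a list of such records.
group = [scoresheet]
mean_overall = sum(s["overall"] for s in group) / len(group)
```

A group download would aggregate many such records, which is what makes class-level review and placement cutoffs straightforward to apply.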
Who Should Use APST?
APST is valid for use with all NNSs of high-school age or older. It is in use both academically and commercially, both to determine an applicant’s qualification for a job requiring spoken American English (such as health-care provider or teaching assistant) and for placement in language classes. It serves both for placement determination (does this student need spoken-language training?) and to show progress.
Validity Study of the Effectiveness of APST
As part of its usual research and development, Phonologics tests each new version of its products, which are iteratively modified and improved based on feedback from our users. To test the agreement of APST results with human judges, we carried out a small study asking the following question:
Will APST results agree with 1) the intelligibility ratings and 2) the position rankings of a small group of naïve native American English listeners who listen to digitized recordings of sentences produced by NNSs? That is, will the current version of APST and the listeners agree on whether preselected speakers are most intelligible, least intelligible, or in the middle of the intelligibility range?
We relied upon our early research findings, showing that:
- Sentence Intelligibility (i.e., the number of words in sentences heard correctly) correlated highly with rating scales of intelligibility (nine-point anchored scale) in sentences by naïve listeners.
- Phonetic intelligibility (i.e., the percentage of phonemes correct in single words, assessed by a phonetician) also correlated highly with rating scales of intelligibility, supplying additional confidence in intelligibility rating scales.
- Listener-provided rankings of NNSs can be used to sort speakers into most intelligible, least intelligible, and mid-range.
Five native speakers of American English, consisting of four untrained listeners and one trained pronunciation coach, listened to four sets of recordings in order to evaluate (using a nine-point scale) and rank (top/middle/bottom) their overall intelligibility. The recordings consisted of one Spanish male (SMJPA5) with a mid-range intelligibility score from the APST, two Chinese females, one with a very high score (CFJLO5) and the other with a very low score (CFXLO9), and one male native speaker with a very high APST score (EMSGA21). The APST ranking is thus EMSGA21, CFJLO5, SMJPA5, CFXLO9. On both measures, the evaluators all rated the recording subjects in a fashion consistent with their APST scores.
Note: To calculate Mean and Median in the graphs below, T/M/B was converted to 0/1/2.
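The conversion described in the note can be sketched as follows. The listener rankings shown are hypothetical, for illustration only; the study's actual rankings appear in the graphs.

```python
from statistics import mean, median

# Convert Top/Middle/Bottom rankings to numbers, as described in the note.
TMB = {"T": 0, "M": 1, "B": 2}

# Hypothetical rankings of one speaker by the five listeners (illustration only).
rankings = ["T", "T", "M", "T", "T"]
values = [TMB[r] for r in rankings]

print(mean(values))    # arithmetic mean of the converted rankings
print(median(values))  # median of the converted rankings
```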
Using both the nine-point scale and the T/M/B ranking system, the research team tested the likelihood that listeners would score subjects in a fashion consistent with the APST by chance. For both the T/M/B ranking system and the nine-point scale, we were able to reject this hypothesis.
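The details of the chance calculation are not given here, but one way to sketch such a test, under the assumption that a random listener assigns T/M/B labels to the four speakers uniformly and independently, is to enumerate all possible labelings and count those consistent with the APST ordering. This is an illustrative reconstruction, not the research team's actual statistic:

```python
from itertools import product

# Speakers in APST order, best to worst: EMSGA21, CFJLO5, SMJPA5, CFXLO9.
# A T/M/B labeling (coded 0/1/2) is "consistent" with that ordering if the
# labels never improve as we move down the list, i.e. they are weakly increasing.
labelings = list(product(range(3), repeat=4))          # 3^4 = 81 possible labelings
consistent = [s for s in labelings
              if all(s[i] <= s[i + 1] for i in range(3))]

p_one = len(consistent) / len(labelings)   # chance of one random listener agreeing
p_all = p_one ** 5                         # chance of all five agreeing independently

print(len(consistent), p_one, p_all)
```

Under these assumptions, all five listeners agreeing with the APST ordering by chance is well below the conventional 0.05 threshold, which is the shape of the argument for rejecting the chance hypothesis.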
We conclude from this study that the current version of APST agrees closely with human judges and that our test is valid and may be used with confidence. Our findings are all the more striking because the nine-point-scale ratings corresponded exactly to the speakers’ relative APST scores. The T/M/B rankings also corresponded closely to the APST scores, with one notable exception: a highly intelligible native speaker received an M (Middle) ranking even though the same listener gave that speaker the highest possible score on the nine-point scale. We assume this discrepancy may have been due to a clerical error in recording the listener’s judgments.