Any message or interaction that comes before the caller's first task-based interaction should be as efficient as possible. This certainly applies when IVRs serve a multilingual caller base, a situation that is becoming increasingly common (Ahlén et al., 2004; Klie, 2010). The most common second language in the US is Spanish; in Canada, it is French. Some other countries, such as India, may need to support more than two languages.

To provide the shortest possible path to the initial prompt, a good solution is to give each language its own phone number, which the IVR can identify from its DNIS (dialed number identification service) value (Balentine, 2007).
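With one number per language, choosing the language becomes a simple lookup on the dialed number and takes no caller time. The sketch below assumes the telephony platform hands the application the DNIS as a digit string; the phone numbers, the `DNIS_LANGUAGE` table, and `language_for_call` are hypothetical names for illustration, not part of any particular IVR platform.

```python
# Hypothetical dialed numbers (DNIS values); substitute the real published numbers.
DNIS_LANGUAGE = {
    "8005550100": "en",  # primary-language number
    "8005550101": "es",  # secondary-language number
}
PRIMARY_LANGUAGE = "en"


def language_for_call(dnis: str) -> str:
    """Pick the prompt language from the number the caller dialed.

    Unknown or misconfigured numbers fall back to the primary language.
    """
    return DNIS_LANGUAGE.get(dnis, PRIMARY_LANGUAGE)


print(language_for_call("8005550101"))  # prints es
```

Falling back to the primary language for an unrecognized DNIS is a design assumption; it keeps a configuration mistake from stranding callers before the initial prompt.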

Go straight from the language prompt to the next menu, incorporating its recognition into that menu
If an enterprise prefers a single phone number, the most efficient approach developed to date avoids a separate dialog step for language selection. Instead, provide a key that switches from the primary to the secondary language, and announce that key just before the initial prompt plays in the primary language. For example, suppose the initial prompt is a main menu with three task-based options. The beginning of the call, including language selection, could sound something like the following:
  • Welcome to AutoRez car rentals. Para español, oprima nueve. To get help with a reservation, say make, change, or cancel.

Unlike earlier approaches to language selection, this approach delays primary-language callers only for the time it takes to play the second-language prompt. Once that prompt has played, primary-language callers move directly into the initial prompt without taking any explicit action. Second-language callers also benefit, because they have the length of the entire initial prompt to press the key (in this example, 9) that switches languages.

In earlier strategies, where language selection was its own dialog step, there was always a risk that a second-language caller would press the language-switching key just as that step timed out. The key press would then land in the initial prompt, leaving the caller in the primary language and, worse, with a selection already made from the initial prompt. With this strategy (Balentine, 2007), second-language callers have all the time available in the initial prompt, including timeouts and associated help messages, to press the language-switching key, so a 2 to 3 second language-selection timeout no longer poses any risk.
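The combined state can be modeled as a single loop in which the language key stays active for the whole life of the main menu, including no-input reprompts. The sketch below is a simulation, not code for any real IVR platform: the event tuples, `run_initial_menu`, `MenuResult`, the reprompt limit, and the Spanish menu words in `GRAMMARS` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical caller events: ("dtmf", key), ("speech", word), or ("noinput", None).
Event = Tuple[str, Optional[str]]

LANGUAGE_KEY = "9"  # the key announced as "Para español, oprima nueve"
MAX_NOINPUTS = 2    # assumed reprompt limit; real systems tune this

# Hypothetical per-language menu vocabularies, each mapped to a task name.
GRAMMARS = {
    "en": {"make": "make", "change": "change", "cancel": "cancel"},
    "es": {"hacer": "make", "cambiar": "change", "cancelar": "cancel"},
}


@dataclass
class MenuResult:
    language: str        # "en" (primary) or "es" (secondary)
    task: Optional[str]  # recognized task, or None if the caller never chose one


def run_initial_menu(events: Iterable[Event]) -> MenuResult:
    """Simulate one dialog state that combines language selection with the main menu.

    The language key is accepted for the life of the state, including
    after no-input reprompts, so there is no separate language timeout.
    """
    language = "en"
    noinputs = 0
    for kind, value in events:
        if kind == "dtmf" and value == LANGUAGE_KEY and language == "en":
            # Switch languages; a real IVR would now replay the menu in Spanish.
            language = "es"
            noinputs = 0
        elif kind == "speech" and value in GRAMMARS[language]:
            return MenuResult(language, GRAMMARS[language][value])
        elif kind == "noinput":
            noinputs += 1
            if noinputs > MAX_NOINPUTS:
                break  # give up; a real IVR might transfer here
        # Any other input counts as a no-match, and the menu is simply reprompted.
    return MenuResult(language, None)


# A Spanish speaker who stays silent once, then presses 9 during the reprompt:
print(run_initial_menu([("noinput", None), ("dtmf", "9"), ("speech", "cancelar")]))
# prints MenuResult(language='es', task='cancel')
```

Resetting the no-input count on a language switch is an assumption of this sketch: it gives second-language callers a full set of reprompts in their own language rather than charging them for silences that occurred while the primary-language menu was playing.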

The timeouts in a separate language-selection step also penalized primary-language callers, who had to wait for the timeout to expire if they preferred not to press a key to reach the initial prompt in the primary language. Waiting out a timeout often annoys and frustrates these callers.

Sometimes the initial prompt of an application asks for a numeric string (e.g., a user ID or account number) rather than offering a menu. In that case, use the same overall strategy but assign language selection to the * key, because every digit key may be needed for the string itself.
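For a numeric initial prompt, the language key has to sit outside the digit range. The sketch below simulates touchtone entry only (spoken digits are not modeled) and assumes that `#` ends the entry and that pressing `*` discards any partially entered digits, since the prompt would be replayed in the other language. The function name and both of those behaviors are hypothetical choices for illustration.

```python
from typing import Iterable, Optional, Tuple

LANGUAGE_KEY = "*"  # digits 0-9 are all reserved for the ID itself
TERMINATOR = "#"    # assumed end-of-entry key


def collect_account_number(keys: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Collect a DTMF account number while keeping '*' free for language switching.

    Returns (language, account_number); account_number is None if entry never completed.
    """
    language = "en"
    digits = ""
    for key in keys:
        if key == LANGUAGE_KEY:
            if language == "en":
                language = "es"
                digits = ""  # assumed: restart entry after the prompt replays in Spanish
        elif key.isdigit():
            digits += key
        elif key == TERMINATOR and digits:
            return language, digits
    return language, None


print(collect_account_number("*4417#"))  # prints ('es', '4417')
```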

"Using a touchtone key rather than a speech prompt for language selection avoids a number of issues associated with multilingual language selection via speech, is very concise, and is extendible to more than one alternate language, within reason" (Lewis, 2011, p. 204).

If for whatever reason you must have a separate dialog state for the language selection, be wary of asking the caller to say the name of the language
Prompts like this exist in many systems.
  • Para español, diga español.

Beyond all the reasons discussed above why a separate language-selection step is generally a bad idea, asking callers to say the language name has an added disadvantage: the utterance itself can be hard to recognize. In many cases the speech-enabled IVR is offered only in the primary language. Callers who select the secondary language are either transferred or routed to a touchtone (DTMF) version of the application. In both situations, the only language loaded in the recognizer is the primary one, so the spoken name of the secondary language must be recognized using only the phonemes of the primary language. Sometimes this works, but often it makes the recognition task difficult.

References

Ahlén, S., Kaiser, L., & Olvera, E. (2004). Are you listening to your Spanish speakers? Speech Technology, 9(4), 10-15.

Balentine, B. (2007). It’s better to be a good machine than a bad person. Annapolis, MD: ICMI Press.

Klie, L. (2010). When in Rome. Speech Technology, 15(3), 20-24.

Lewis, J. R. (2011). Practical speech user interface design. Boca Raton, FL: CRC Press, Taylor & Francis Group.