Some background on habitability
Habitability refers to the ease, naturalness, and effectiveness with which callers can use spoken-language systems (Watt, 1968). According to habitability theory, there are four domains in which users of spoken-language systems must stay:

  • Conceptual (stock quote systems can't respond to questions about current road traffic)
  • Functional (the system might only be able to handle simple requests or might also be able to parse multiple tokens from an utterance)
  • Syntactic (there might be many or only a few structural alternatives allowed by the grammar)
  • Lexical (the system might allow many or only a few, if any, synonyms for the words in the application)

"A natural language system must be made habitable in all four domains because it will be difficult for users to learn which domain is violated when the system rejects an expression" (Ogden & Bernick, 1997, p. 139).

The elements of habitability are largely controlled by the capability of the active grammar(s).

Start with the smallest grammar that has a chance of being habitable
All other things being equal, the more habitable an application is, the lower its recognition accuracy, because a larger grammar increases the potential for acoustic confusion. Indeed, one common cause of misrecognition is acoustic confusability among the currently active phrases in the grammar (Fosler-Lussier et al., 2005).

Callers tend to mimic what they hear in prompts (see Mimicry of Prompt), so as far as possible, create choices that have high acoustic distinctiveness. Don't go overboard, though. If "A" and "B" are what everybody calls two things and there are no reasonable synonyms for either, don't force one option into an ill-fitting synonym. Once you've created the choices, restrict the grammar to those choices. Adding the acoustically confusable synonyms back in defeats the purpose of making the choices in the prompt distinct. Also keep in mind that systems that strive for a more conversational tone will necessarily have more complex grammars, although it is often possible to achieve both a conversational tone and choices that have high acoustic distinctiveness. In addition, somewhat counterintuitively, longer menu options such as "I need assistance" can be easier for the recognizer to understand than shorter menu options such as "help." This has to be balanced, however, against the caller's need to hear, understand, and repeat back the menu option for it to be usable at all.
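One rough way to screen a proposed set of choices before any recognition data exists is to compare every pair of phrases and flag the ones that look alike. The sketch below is only an illustration. It uses spelling similarity from Python's difflib as a crude stand-in for acoustic similarity, and the function names and the 0.8 threshold are assumptions, not part of any recognizer's toolkit. Spelling misses pairs that sound alike but are written differently, so a real check would compare phonetic transcriptions and would still need to be validated against recognition results.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 spelling-similarity score (a rough proxy, not an acoustic measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def confusable_pairs(phrases, threshold=0.8):
    """List phrase pairs whose spelling similarity meets the (assumed) threshold."""
    flagged = []
    for a, b in combinations(phrases, 2):
        score = similarity(a, b)
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    # Most similar pairs first, so the riskiest choices are reviewed first.
    return sorted(flagged, key=lambda pair: pair[2], reverse=True)


if __name__ == "__main__":
    choices = ["flying from addison", "flying from madison", "billing"]
    for a, b, score in confusable_pairs(choices):
        print(f"review: {a!r} vs {b!r} (similarity {score})")
```

Pairs that are flagged are candidates for rewording in the prompt, not automatic rejections. The designer still decides whether a reasonable, more distinct alternative exists.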

Build grammars that contain what callers are most likely going to say for each of the menu options
The grammar for each option needs to include more than just the verbatim wording given in the prompt. Include every reasonable utterance based on the prompt, and don't forget to check the reprompts for variations. However, you don't want to overgenerate your grammars from the start. Put in what's reasonable, not what's possible. Tuning is when you expand the list of synonyms based on actual utterances.

Consider this prompt:

  • To get you to the person who can help you the fastest, please choose one of the following... You can say website help, set up payment arrangements, special offers, billing, cancel my service, or for everything else say customer care. (1.5 second silence) You can also say main menu to stay in the self-service system.

What synonyms should be included for 'set up payment arrangements'? You would definitely want to include 'payment arrangements' by itself without the 'set up' part. Beyond that, it's probably best to wait for tuning data before adding anything else.
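As a minimal sketch of what such a starting grammar might look like, the example below maps each menu option from the prompt above to the short list of phrases accepted for it. The option names (such as payment_arrangements) and the helper functions are hypothetical, and a production grammar would normally be written in the recognizer's own grammar format rather than in Python.

```python
# Initial grammar: each option lists only the phrases that are reasonable
# before tuning. The option names on the left are hypothetical labels.
GRAMMAR = {
    "website_help": ["website help"],
    "payment_arrangements": ["set up payment arrangements", "payment arrangements"],
    "special_offers": ["special offers"],
    "billing": ["billing"],
    "cancel_service": ["cancel my service"],
    "customer_care": ["customer care"],
    "main_menu": ["main menu"],
}


def normalize(text):
    """Lowercase and collapse whitespace so lookups ignore casing and spacing."""
    return " ".join(text.lower().split())


def build_lookup(grammar):
    """Invert the grammar into a phrase -> option table, rejecting ambiguous phrases."""
    lookup = {}
    for option, phrases in grammar.items():
        for phrase in phrases:
            key = normalize(phrase)
            if key in lookup and lookup[key] != option:
                raise ValueError(f"{phrase!r} is listed under two options")
            lookup[key] = option
    return lookup


def interpret(utterance, lookup):
    """Return the option for an in-grammar utterance, or None if out of grammar."""
    return lookup.get(normalize(utterance))


LOOKUP = build_lookup(GRAMMAR)
```

Keeping the phrase lists this short makes it easy to see, during tuning, exactly which synonyms were added later and why.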

Consider ALL prompts at a dialog state
When determining which grammar items to include, make sure to review all prompts that may be spoken at a given dialog state. While an initial prompt may offer an option one way, the reprompts may provide alternate wording that should also be included in the grammar. For example, at a given input state the prompts may be:

Initial Prompt: Which would you like to do? "hear my balances", "make a payment", or "set up payment arrangements".
First Reprompt: Sorry? You can say "balances", "make a payment" or "payment arrangements".

The grammar for this dialog state would need to include both "hear (my) balances" and "balances", as well as "set up payment arrangements" and "payment arrangements", based on the prompted options.
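The parenthesized notation used above, where "(my)" marks an optional word, can be expanded mechanically into the full phrase list for the state. The following is a small sketch under that assumption. The expand and grammar_for_state functions are hypothetical helpers, and nested parentheses are not handled.

```python
import re
from itertools import product


def expand(pattern):
    """Expand a pattern such as 'hear (my) balances' into every phrase it allows."""
    # A token is either a parenthesized optional chunk or a single word.
    tokens = re.findall(r"\([^)]*\)|\S+", pattern)
    choices = []
    for token in tokens:
        if token.startswith("(") and token.endswith(")"):
            choices.append(["", token[1:-1]])  # optional: absent or present
        else:
            choices.append([token])  # required word
    phrases = {" ".join(part for part in combo if part) for combo in product(*choices)}
    return sorted(phrases)


def grammar_for_state(patterns):
    """Union the expansions of all patterns collected from every prompt at a state."""
    phrases = set()
    for pattern in patterns:
        phrases.update(expand(pattern))
    return sorted(phrases)


# Patterns gathered from the initial prompt and the first reprompt.
STATE_PATTERNS = [
    "hear (my) balances",
    "balances",
    "make a payment",
    "(set up) payment arrangements",
]
STATE_PHRASES = grammar_for_state(STATE_PATTERNS)
```

Collecting the patterns from every prompt at the state into one list makes it harder to overlook wording that appears only in a reprompt.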

Test and tune the grammar
Some phrases will work better than others. During habitability testing, usability testing, and grammar tuning, identify the phrases that have relatively high failure rates and modify them as indicated by the specific patterns of failure. These modifications can include:

  • Modifying the grammar (FSG) or language model (SLM) (see Directed Dialog vs. SLM)
  • Adding alternative pronunciations for frequently misrecognized words
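Finding the phrases with relatively high failure rates is mostly a matter of counting. The sketch below assumes a hypothetical log of (transcribed, recognized) pairs, where the first item is a human transcription of what the caller said and the second is what the recognizer returned. The log format, the function name, and the minimum-attempts cutoff are all assumptions for illustration.

```python
from collections import Counter


def failure_rates(log, min_attempts=20):
    """Return (phrase, failure rate) pairs, worst first.

    log: iterable of (transcribed, recognized) string pairs (hypothetical format).
    min_attempts: phrases with fewer attempts are skipped as too noisy to judge.
    """
    attempts = Counter()
    failures = Counter()
    for transcribed, recognized in log:
        attempts[transcribed] += 1
        if recognized != transcribed:
            failures[transcribed] += 1
    rates = {
        phrase: failures[phrase] / count
        for phrase, count in attempts.items()
        if count >= min_attempts
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

The rates only show where to look. The specific pattern of failure, such as which phrase a failing utterance was mistaken for, still has to be read from the individual log entries.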

For example, consider the US airports "Addison" and "Madison". Suppose one of the prompts in the application is "Where are you flying from?" Also suppose callers mimic the prompt and say "Flying from Addison" or "Flying from Madison". Due to coarticulation, "from Addison" and "from Madison" will sound very much alike. You can't arbitrarily decide to not recognize an airport in an airport grammar, but there are some things you could do:

  • Consider changing the prompt to something like "What's your departure airport?" to reduce the likelihood that callers would start responses with "from"
  • In the grammar, make state names optional for high-traffic airports but require them for low-traffic airports known to be confusable with high-traffic airports. Design help prompting and disambiguation routines to get the caller through the dialog step seamlessly (e.g., using n-best and skip lists to eliminate already-rejected possibilities; see Using n-best Lists, Advanced confirmation and error correction with confidence levels and n-best lists)
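The n-best and skip list idea in the last item can be sketched as follows. The example assumes the recognizer returns its hypotheses as (text, confidence) pairs sorted from most to least confident. The function name, the data shape, and the 0.3 confidence floor are assumptions for illustration, not any particular platform's API.

```python
def next_candidate(n_best, skip_list, min_confidence=0.3):
    """Return the best hypothesis the caller has not already rejected, or None.

    n_best: list of (text, confidence) pairs, assumed sorted best first.
    skip_list: set of hypotheses the caller has already said "no" to.
    """
    for text, confidence in n_best:
        if text in skip_list:
            continue  # already rejected in an earlier confirmation
        if confidence < min_confidence:
            break  # everything after this is even less confident
        return text
    return None


def confirm_loop(n_best, caller_accepts):
    """Offer candidates one at a time until one is accepted or none remain."""
    skip_list = set()
    while True:
        candidate = next_candidate(n_best, skip_list)
        if candidate is None:
            return None  # fall back to a reprompt or other error handling
        if caller_accepts(candidate):
            return candidate
        skip_list.add(candidate)
```

With this approach, a caller who rejects "Madison" is offered "Addison" next instead of being asked to repeat the airport and risk the same misrecognition.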

Pick synonyms that fix the problem
When a synonym is available, consider using it in both the prompt and the grammar. For example, suppose the system is confusing "no" and "new". Depending on the context, you might be able to replace "new" with "recent" in the prompt and drop "new" from the grammar.

If it isn't obvious what synonym to use, there are a number of strategies for finding one:

  • Check a thesaurus (but don't rely on it without user validation)
  • Check the enterprise's website to see if it uses synonyms that would be less confusable (but don't rely on this approach without user validation)
  • As part of user testing (either habitability or usability), ask participants who experience a recognition issue what word(s) they would prefer to use
  • Check logs or recorded calls to see if user behavior provides any clues as to potential synonyms

The ultimate check on any change is whether the outcome variables of recognition accuracy and task completion rates improve.

References


Fosler-Lussier, E., Amdal, I., & Kuo, H.-K. J. (2005). A framework for predicting speech recognition errors. Speech Communication, 46, 153–170.

Ogden, W. C., & Bernick, P. (1997). Using natural language interfaces. In M. Helander, T. K. Landauer, & P. Prabhu (Eds.), Handbook of human-computer interaction (pp. 137–161). Amsterdam, Netherlands: Elsevier.

Watt, W. C. (1968). Habitability. American Documentation, 19(3), 338–351.