Be relevant

Relevance is the defining characteristic of one of the four Gricean maxims: the maxim of relation. Grice noted that participants in a conversation assume a principle of cooperation. In his words (Grice, 1975, p. 45), “Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.” Research by Polkosky (2008) found that of four key elements of VUI usability, the most important was User Goal Orientation: the extent to which the IVR kept moving the caller toward completion of the desired goal.

Limit the number of words and sentences
Try to use exactly the length necessary. This is easy to say, but how do you know when enough is enough, or too much? Some specific ways to achieve conciseness are:
  • Avoid lengthy lead-in phrases for lists of options – for example, instead of prompting with “Please choose one of the following options,” just use “Please select”.
  • Try phrasing a prompt as a question rather than a statement to make the prompt more concise and more conversational; for example, “Transfer how much?” instead of “Please say the amount you would like to transfer.”
  • If removing a word will not change the meaning of a prompt or message, then consider removing it (while keeping in mind that a certain amount of structural variation in a group of prompts increases the naturalness of the dialog).
  • In general, if there is a choice between long and short words that mean essentially the same thing, choose the short word. In most cases, the short word will be more common than the long word, and callers will hear and process it more quickly. For example, “use” is often a better choice than “utilize,” and “help” is often better than “assistance.” This is straightforward in the portions of the IVR where you are explaining something to a caller. However, there will be times when you, as the designer, must choose the word or phrase that the IVR prompts the caller to say. In those situations, it's often better to use words or phrases that are at least two to three syllables long. Longer utterances give phonetic recognition engines more speech to work with, so they tend to produce fewer recognition errors than one-syllable words and phrases. In this case, “assistance” would be better than “help.”
  • Good prompts do not necessarily use good grammar. In normal conversation, many natural phrases do not abide by the rules of grammar, and are often far shorter than their grammatical equivalents. Remember, people speak differently than they write.
  • Use pronouns, contractions, and ellipsis to avoid excessive repetition and reduce length. Ellipsis is the omission of a word or phrase necessary for a complete syntactical construction but not necessary for understanding, for example, “Transfer funds from your checking account, savings, or money market?” rather than “Transfer funds from your checking account, your savings account, or your money market account?”

Ultimately, it's a balancing act between clarity and brevity. Craft messages that “thread the needle” between the goals of conciseness and appropriate, conversational service provider tone (Polkosky, 2008). This is not easy. It takes time and multiple drafts and reviews. In a famous quotation, Pascal apologizes for the length of a letter because he didn’t have the time to make it shorter. VUI designers must make the time.

Don't play unnecessary non-speech audio
Another potential source of unnecessary audio is non-speech audio, especially branding tones; for example, Sprint’s sound of a pin dropping. In radio or television spots, the timing is designed so that the branding audio fits. When re-used in an IVR, elements of the branding audio, especially an echo or fading effect, can make it last longer than it seems to.

When designing audio formatting, keep the tones short: tones can be as short as 50 to 75 ms, and should typically be no longer than 500 to 1000 ms (Balentine & Morgan, 2001). Shorter tones are generally less obtrusive, so callers are more likely to perceive them as useful rather than distracting. If re-using branding tones, analyze them for trimming opportunities to ensure that they do not waste any time, but be sure to work with the enterprise's marketing department to get their buy-in on any edits – enterprises take their branding very seriously.
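As a rough sketch of how the duration guidance above might be applied during an audio review, the hypothetical helper below labels a tone by its length in milliseconds. The function name and the labels are illustrative assumptions; the thresholds come from the range cited above, treating 500 ms as a soft limit and 1000 ms as a hard one.

```python
# Hypothetical review helper; thresholds follow the range cited in the text.
MIN_TONE_MS = 50     # tones can be as short as 50 to 75 ms
SOFT_MAX_MS = 500    # the typical upper bound starts around 500 ms
HARD_MAX_MS = 1000   # and should not exceed about 1000 ms


def classify_tone(duration_ms: float) -> str:
    """Return a review label for a non-speech tone of the given length."""
    if duration_ms < MIN_TONE_MS:
        return "below cited range"
    if duration_ms <= SOFT_MAX_MS:
        return "ok"
    if duration_ms <= HARD_MAX_MS:
        return "consider trimming"
    return "too long"
```

A branding tone that measures 750 ms, for instance, would be flagged as a trimming candidate to raise with the marketing department rather than rejected outright.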

The key here is "unnecessary." Not all non-speech audio qualifies. Specifically, see the section below on waiting.

Use discourse markers judiciously
The pragmatic purpose of discourse markers in conversation is to indicate a relation between the most recently spoken utterance and the next one (Cohen et al., 2004; Gleason & Ratner, 1993).

Some common discourse marker types and examples are:
  • Enumerative: first, second, next, then, finally
  • Reinforcing: also, furthermore, in addition, what’s more
  • Equative: equally, likewise, similarly
  • Transitional: by the way, incidentally, now
  • Summative: then, in conclusion, to sum up
  • Apposition: namely, in other words, for example
  • Result: consequently, so, therefore, as a result
  • Inferential: else, otherwise, then, in other words, in that case
  • Reformulatory: better, in other words, rather
  • Replacive: alternatively, rather, on the other hand, otherwise
  • Contrastive: instead, by comparison, on the other hand
  • Concessive: anyway, besides, however, nevertheless, still, after all
  • Temporal: meantime, meanwhile
  • Attitudinal: actually, strictly speaking, technically
  • Acknowledgment: OK, alright, thank you, thanks
  • Signal problem: oh, hmm, sorry

Properly used, discourse markers can provide concise guidance to callers regarding the direction of the conversation. Probably the most common types used in VUIs are enumerative (when providing a list of instructions), transitional (when changing topics -- e.g., "By the way, did you know you're eligible for a free upgrade?"), acknowledgment, and signal problem. Think of discourse markers as a kind of spice to use when crafting conversations -- just the right amount at just the right time. Also, keep in mind that some common discourse markers (e.g., “OK”) can have very different meanings depending on the intonation, so be sure to specify the intonation when necessary (Lewis, 2011).

Watch for overuse of the same markers, especially acknowledgments. Vary them to keep the system from sounding repetitive and robotic.
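One way to avoid repeating the same acknowledgment is to rotate through a small set and never pick the same marker twice in a row. The sketch below is only an illustration: the class name `MarkerRotator` is hypothetical, and the marker list is taken from the acknowledgment examples above.

```python
import random

# Acknowledgment markers from the list above.
ACKNOWLEDGMENTS = ["OK", "Alright", "Thanks", "Thank you"]


class MarkerRotator:
    """Pick a discourse marker without repeating the previous pick."""

    def __init__(self, markers, rng=None):
        # Drop duplicates while preserving order.
        self._markers = list(dict.fromkeys(markers))
        if len(self._markers) < 2:
            raise ValueError("need at least two distinct markers to vary them")
        self._rng = rng or random.Random()
        self._last = None

    def next(self) -> str:
        # Exclude the marker used last time, then choose among the rest.
        choices = [m for m in self._markers if m != self._last]
        self._last = self._rng.choice(choices)
        return self._last
```

Random selection is a design choice here; a fixed round-robin order would also avoid back-to-back repeats, though a caller on a long call might start to notice the pattern.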

Apologize very little if at all
Avoid excessive apologies in no-match error reprompting. If a user says something out-of-grammar in response to a system prompt, a system response like "I'm sorry, I didn't get that" sounds disingenuous and may further frustrate the user. However, a short "Sorry?" immediately followed by the reprompt can be effective.

In the case of no-input error reprompting, an apology is not warranted. A verbatim repetition of the initial prompt is usually sufficient. Users often don't respond to a system prompt because they are distracted, and simply hearing the prompt again is often enough to get them back on track.

Consider apologies after a complete failure at accomplishing a task, such as "I'm sorry I'm not getting it; let me transfer you to an agent." This acknowledges that we're giving up and sending them on, that we have failed the task at hand.
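The three cases above (no match, no input, and complete failure) can be sketched as a single response function. This is a minimal illustration, not a platform API: the event names, the function name, and the limit of three errors before transfer are all assumptions.

```python
MAX_ERRORS = 3  # assumed limit before giving up; not specified in the text


def error_response(event: str, error_count: int, prompt: str) -> str:
    """Build the system response for a recognition error.

    event is "nomatch" or "noinput"; error_count includes the current error.
    """
    if error_count >= MAX_ERRORS:
        # Complete failure: this is the one place an apology is warranted.
        return "I'm sorry I'm not getting it; let me transfer you to an agent."
    if event == "nomatch":
        # Short "Sorry?" immediately followed by the reprompt.
        return f"Sorry? {prompt}"
    if event == "noinput":
        # No apology: repeat the initial prompt verbatim.
        return prompt
    raise ValueError(f"unknown event: {event}")
```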

Provide feedback when users must wait
It is important to inform callers any time there is the potential for a delay (large grammar loads, web service delays, etc.). The worst thing for the caller is to experience dead air and wonder if the system is still there. Sometimes, a simple "One moment" does the trick. It tells the caller that the IVR needs a moment to retrieve some information, much as an agent would while looking up account information.

Another approach is to play audio during a delay so the caller knows the system hasn't disconnected. This can be used on its own or with verbal cues like "one moment." Speech applications can be designed such that any time a certain amount of delay has occurred, some audio plays. A popular form of audio to use is the percolator sound. It makes a "thinking" sound that lets the caller know the system is active even though no prompts are being played.
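The delay-triggered feedback described above might be sketched as a simple threshold check. The two thresholds and the action strings below are assumptions chosen for illustration; appropriate values depend on the platform and on how long the back-end operations typically take.

```python
from typing import List

SPOKEN_CUE_AFTER_S = 1.0    # assumed delay before saying "One moment"
FILLER_AUDIO_AFTER_S = 3.0  # assumed delay before starting filler audio


def delay_feedback(elapsed_s: float) -> List[str]:
    """Return the feedback that should have started after elapsed_s seconds."""
    actions = []
    if elapsed_s >= SPOKEN_CUE_AFTER_S:
        actions.append("say: One moment")
    if elapsed_s >= FILLER_AUDIO_AFTER_S:
        actions.append("loop: percolator audio")
    return actions
```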

In troubleshooting IVRs, playing music is a good way to let the caller know the system is in "wait" mode. For example, if the system instructs the caller to locate the modem and unplug the power source for 10 seconds, it may say something like "Just say 'I'm ready' when you've completed this step." During the wait, the system plays hold music, which tells the caller that even though no prompt is playing, the system is still there, listening for whenever the caller is ready to speak.

Additionally, it is very helpful to inform the caller of the expected wait time before an agent becomes available if they have requested to speak to one. If the wait time is longer than 5 minutes, callers may prefer to return to automation and attempt to complete their task within the IVR.
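A minimal sketch of the wait-time announcement follows, assuming the expected wait is available in whole minutes. The five-minute threshold comes from the text above; the function name and the prompt wording, including the "main menu" keyword, are hypothetical.

```python
AUTOMATION_OFFER_THRESHOLD_MIN = 5  # from the guidance above


def queue_message(expected_wait_min: int) -> str:
    """Announce the expected wait and, for long waits, offer automation."""
    unit = "minute" if expected_wait_min == 1 else "minutes"
    message = f"The expected wait time is about {expected_wait_min} {unit}."
    if expected_wait_min > AUTOMATION_OFFER_THRESHOLD_MIN:
        # Hypothetical wording for the return-to-automation offer.
        message += " To go back to the automated system instead, say 'main menu'."
    return message
```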


When to Landmark

It's important to landmark significant chunks of functionality
Probably the most commonly landmarked point is the main menu. In some instances, landmarking with "Main Menu" may help the caller remember which menu the IVR refers to as the "main" menu, so later on in the call, if the caller hears a main menu option, she can remember where that will take her. It is also effective to overlay an audio icon with verbiage, like "Main Menu." Sometimes it's easier for a caller to associate certain points in the IVR with a sound instead of trying to remember what the system actually said.

Other examples of places to landmark are common or important tasks within the system, like "make a payment" or "change of address." At the beginning of such tasks, it's helpful to remind the caller what she's starting. That said, if the caller utters "make a payment" at a main menu and the IVR confirms "You'd like to make a payment. Is that correct?" it doesn't always make sense to turn around and landmark with "Make a Payment." It becomes redundant and wastes the caller's time. Conversely, if the utterance "make a payment" doesn't trigger a confirmation prompt, it makes absolute sense to landmark with "Make a payment." This gives the caller confidence that the IVR understood her properly and isn't about to take her somewhere she doesn't want to go.
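The rule above (landmark only when the caller has not already heard a confirmation of the same task) can be expressed as a small decision function. The sketch below is illustrative; the function and parameter names are assumptions.

```python
from typing import Optional


def task_entry_landmark(task_label: str, was_confirmed: bool) -> Optional[str]:
    """Return the landmark text to play on entering a task, or None to skip it."""
    if was_confirmed:
        # The confirmation prompt already told the caller where she is going.
        return None
    return task_label
```

For example, `task_entry_landmark("Make a payment", was_confirmed=False)` returns the landmark text, while the same call with `was_confirmed=True` returns `None` so the dialog moves straight into the task.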

As a word of caution, it is possible to "over-landmark." Imagine driving to visit a friend at her new home in Tennessee for the first time. Looking for landmarks is an effective way to ensure one is on the right path and following the directions correctly. However, if the driver is given too many landmarks, they become confusing and counter-productive. The same is true of an IVR.

Landmarks vs. Implicit Confirmations

Landmarks and implicit confirmations are very similar. A landmark is a form of implicit confirmation, but the converse isn't necessarily true. If a caller selects "customer service" and the IVR implicitly confirms, "Okay, customer service," this wouldn't be considered a landmark because transferring the caller to an agent will end the interaction with the IVR, so there's no reason for the caller to remember the customer service landmark. In this case, the IVR is simply informing the caller that it understood what she said.

Additional Landmarking Tips

It almost goes without saying that the landmark should be a term both familiar to callers and consistent with the rest of the system prompting. In banking applications, the term "transfer funds" is used quite often. If the main menu offers an option called "transfer funds" and the caller says "transfer funds," then a landmark prompt that says "move money" will likely confuse the caller. Even though "move money" and "transfer funds" mean the same thing, the caller isn't expecting to hear "move money" and will more than likely panic, thinking she is being sent down the wrong path.

In an application employing a statistical language model (SLM), it may be necessary to gather data for a certain period of time to see which utterances are the most popular. This is a great way to decide which utterances to use as landmarks. Using terms that the business owners believe are commonly understood is not always the best approach. It's better to let the data speak for itself.
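As a sketch of letting the data speak, the helper below counts logged caller utterances per destination and picks the most frequent one as the candidate landmark term. The log format (pairs of destination and utterance) and the function name are assumptions; real recognition logs will need their own parsing.

```python
from collections import Counter


def pick_landmark_terms(logged):
    """Map each destination to its most frequently spoken utterance.

    logged is an iterable of (destination, utterance) pairs.
    """
    counts = {}
    for destination, utterance in logged:
        # Normalize case and whitespace so variants are counted together.
        term = utterance.strip().lower()
        counts.setdefault(destination, Counter())[term] += 1
    return {dest: c.most_common(1)[0][0] for dest, c in counts.items()}
```

If callers routed to the transfer task say "transfer funds" more often than "move money," the first phrase becomes the candidate landmark, whatever term the business owners prefer.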

Helping the caller along

Help callers when they don't have the information they're prompted for
Some strategies for helping callers through an information gathering task include:
  • Preparing the caller with a prompt at the beginning of the interaction as to what information they will need to provide.
  • Designing wait logic so the caller can fetch the requested information while the system waits for a key word or phrase to proceed.
  • Providing help if the information may be something the caller has access to but can't find, like a serial number on a product or a case number in a reference letter.
  • Providing “I don’t know” logic that can take the caller to a help dialog (see previous bullet) or offer the caller access to an agent.


Balentine, B., & Morgan, D. P. (2001). How to build a speech recognition application: A style guide for telephony dialogues, 2nd edition. San Ramon, CA: EIG Press.

Cohen, M. H., Giangola, J. P., & Balogh, J. (2004). Voice user interface design. Boston, MA: Addison-Wesley.

Gleason, J. B., & Ratner, N. B. (1993). Psycholinguistics. Fort Worth, TX: Harcourt Brace Jovanovich.

Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics, Volume 3: Speech acts (pp. 41–58). New York, NY: Academic Press.

Lewis, J. R. (2011). Practical speech user interface design. Boca Raton, FL: CRC Press, Taylor & Francis Group.

Polkosky, M. D. (2008). Machines as mediators: The challenge of technology for interpersonal communication theory and research. In E. Konijn (Ed.), Mediated interpersonal communication (pp. 34–57). New York, NY: Routledge.