The chapter on recordings has another section on prosody. Prosody works only when the execution matches the intent. This section focuses on the intent, and that section covers the execution in more detail.

Format information for “easy hearing”
When conveying information to the caller, think about how to present it so that it is most likely to be heard and understood. This matters for a single sentence, and it matters even more as playback grows beyond a single sentence.

Pauses are important. They give the caller processing time and provide auditory cues about how information is grouped. Use them when you are listing actions to take (as in a menu) or listing results (e.g., store locations). Pauses can also be effective after you’ve presented a chunk of possible actions (i.e., a menu) and before inviting the user to take another action (e.g., repeat the menu, start over, or go back).

Be careful not to make pauses too long. In my experience, they should be no longer than about 3 seconds. Longer pauses tend to be perceived as a technical glitch and confuse the user.
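If your platform renders prompts from W3C SSML (an assumption; many platforms also accept recorded audio inside SSML), the pause guidance above can be expressed with `<break>` elements. The sketch below is illustrative only: `menu_ssml`, `pause`, and the default pause lengths are hypothetical choices, not values from this guide. It puts short pauses between menu options, a longer pause after the whole chunk of options, and clamps every pause at about 3 seconds.

```python
from xml.sax.saxutils import escape

# Upper bound taken from the guidance above: pauses much longer than
# about 3 seconds tend to sound like a technical glitch.
MAX_PAUSE_MS = 3000


def pause(ms: int) -> str:
    """Return an SSML break element, clamped to 0..MAX_PAUSE_MS."""
    ms = max(0, min(ms, MAX_PAUSE_MS))
    return f'<break time="{ms}ms"/>'


def menu_ssml(options, closing, item_pause_ms=500, chunk_pause_ms=1000):
    """Build SSML for a menu (hypothetical helper; defaults are illustrative).

    Short pauses separate the options, a longer pause follows the whole
    chunk of options, and then the closing invitation is played.
    """
    parts = []
    for i, option in enumerate(options):
        if i > 0:
            parts.append(pause(item_pause_ms))
        parts.append(escape(option))  # escape &, <, > in prompt text
    parts.append(pause(chunk_pause_ms))
    parts.append(escape(closing))
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    print(menu_ssml(
        ["For billing, press 1.", "For technical support, press 2."],
        "To hear these options again, press 9.",
        chunk_pause_ms=5000,  # requested 5 s is clamped to 3 s
    ))
```

The clamp lives in one place so that no caller of `pause` can accidentally produce a pause long enough to sound like dead air.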

Consider the rate of speech in the prompt recordings. Will your user population include people who may prefer a slower rate (e.g., older populations)? Do your prompts carry many information elements? If the answer to either question is yes, slow the rate down. You may also design the system, or parts of it, so that users can pause, resume, and even skip prompts.

For conveying long or complex information, consider adding turn-taking. Having the caller respond gives them some control over the process. Questions like “Does that make sense so far?” or “Do you need more detail?” let the caller set the pace, which settles the designer’s debate about how much information is too much.

Use the longest chunks possible
Some concatenation of prompts for playback will always be necessary, but minimize it wherever possible. In the early days of IVR, storage space was at a premium, so messages were reused as much as possible: broken down into their smallest pieces and reassembled on the fly. That constraint no longer exists.

Look at what your system uses the most, and record prompts in their entirety whenever possible. As an example, say you have a prompt along these lines:
  • I have a hotel reservation for 3 nights.

In the old days, that prompt would be broken into three parts: “I have a hotel reservation for,” the number, and “nights.” Someone going above and beyond would also record the singular “night.” Going even further, they’d realize “one night” could be recorded as a single unit.

Look at your company’s statistics to see which values are really common. Perhaps 95% of bookings are for one to ten nights. In that case, record a complete prompt for each value from one to ten, and use the concatenated version as a fallback for anything over ten.
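The whole-prompt-first, concatenate-as-fallback approach above can be sketched as a small selection function. This is a minimal sketch under stated assumptions: the file names, `WHOLE_PROMPTS`, `number_prompt`, and `reservation_prompts` are all hypothetical, and a real system would load its prompt inventory from its own configuration.

```python
# Hypothetical prompt inventory; all file names are illustrative only.
# Complete recordings for the common cases (one to ten nights), e.g.
# "I have a hotel reservation for three nights."
WHOLE_PROMPTS = {n: f"reservation_nights_{n}.wav" for n in range(1, 11)}

PREFIX = "reservation_for.wav"  # "I have a hotel reservation for"
NIGHTS = "nights.wav"           # "nights"


def number_prompt(n: int) -> str:
    """Hypothetical file name for a recorded number."""
    return f"number_{n}.wav"


def reservation_prompts(nights: int) -> list[str]:
    """Return the recordings to play, in order, for a reservation length."""
    if nights < 1:
        raise ValueError("a reservation must be at least one night")
    if nights in WHOLE_PROMPTS:
        # Common case: one complete recording with natural prosody
        # and no concatenation seams.
        return [WHOLE_PROMPTS[nights]]
    # Rare case (more than ten nights): fall back to concatenation.
    return [PREFIX, number_prompt(nights), NIGHTS]


if __name__ == "__main__":
    print(reservation_prompts(3))
    print(reservation_prompts(14))
```

Because one through ten are covered by complete recordings, the singular “one night” never reaches the concatenated path. A fallback that could be hit with a value of one would also need the singular “night” recording.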

There are more examples on the sister page to this one.

Intonation and emphasis

Intonation and emphasis are tightly coupled. Intonation cues the listener about what is going on and what is expected of them, and when. Be aware that these rules and tendencies are not universal across languages and cultures.


  • Define intonation - pitch, tones and contours.
    • Types of contours:
      • Rising-falling, final
      • Rising
      • Rising-falling, non-final
  • Use of intonation in prompt recording.
    • Yes/No questions
    • Lists
    • Questions to elicit information
    • Multiple choice questions

Use a final (rising-falling) intonation for a declarative sentence.

Use a flat intonation when recording a prompt segment that will be concatenated into the middle of a longer prompt.


Determine and document emphasis
The words that receive the emphasis tell the listener a lot. Emphasis can be used to stress what’s important, new, or out of the ordinary. Misplaced emphasis can render a prompt completely ineffective. Documenting the emphasis in your prompt listing helps the voice talent capture your intent.

A fun example of emphasis is the following sentence:
  • I never said she stole my money.

Reading it seven times with the emphasis on a different word each time results in seven different meanings.

More pertinent to IVRs, pauses, emphasis, and inflection make the difference between a yes/no prompt and a multiple-choice prompt with two options. Look at this prompt:
  • Should I list the departure cities or arrival cities for you?

If the voice talent plows right through this with a strong upward inflection at the end, it sounds like a yes/no question. Emphasize “departure,” pause ever so briefly, emphasize “arrival,” and keep the end almost flat instead of rising, and it becomes multiple choice.

Emphasis can have more subtle effects as well, such as in this example:
  • You must speak with an agent before contacting your bank to stop the payment.

If the emphasis is on "must," the statement risks sounding condescending. Putting more on "speak" than "must" makes it more informational.

In this prompt, "home" and "phone" can't have the same emphasis. "Home" needs more to show that it contrasts with the other options.
  • To change your home phone, press 1. Cell, 2. Work, 3. Or to return to the main menu, press 4.