Working with your audio agency

Create a thorough, annotated list of messages
Part of the preparation for a recording session is to compile a recording list (also called a script or recording manifest) of all the audio segments that the voice talent will record. It should include:
  • The file name for each segment
  • Notes to the voice talent and coach about how to speak the segment (see Coaching, inflection)
  • Post-processing notes for the engineer

Agencies can work with different formats for the manifest, but typically have a preferred format. To minimize your effort in developing your format, ask the agency to send you a few samples.
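As a starting point before the agency's samples arrive, a manifest can be sketched as one row per audio segment. The column names, the sample file name, and the CSV layout below are illustrative assumptions, not an agency standard; the agency's preferred format takes precedence.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class ManifestEntry:
    """One row of a recording manifest (column names are hypothetical)."""
    file_name: str        # file name for the segment
    script_text: str      # the words the voice talent will speak
    coaching_notes: str   # notes to the voice talent and coach
    engineer_notes: str   # post-processing notes for the engineer


def manifest_to_csv(entries):
    """Render manifest entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(ManifestEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Hypothetical example entry.
entries = [
    ManifestEntry(
        file_name="welcome.wav",
        script_text="Welcome to Example Bank.",
        coaching_notes="Warm, unhurried; falling final inflection",
        engineer_notes="No silence before the first word",
    )
]
csv_text = manifest_to_csv(entries)
print(csv_text)
```

Keeping the manifest in a structured form like this makes it easy to export to whatever layout the agency prefers.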

Make sure the agency maintains a high degree of consistency across recording sessions
Tips for ensuring this include:
  • Keep the recording area, and especially microphone placement, the same from session to session
  • Identify a few key phrases (e.g., the introduction) to use for future sessions to help the voice talent recapture the desired style and tone


Generally plan to use .wav recordings that are 8 kHz, 8-bit, CCITT u-law compressed
Note that this is a post-processing recommendation. Always record messages at higher fidelity, generally full CD quality (44.1 kHz, 16-bit).

Although there are exceptions, the typical sampling rate for audio segments to store and play over the phone is 8 kHz.

The typical bit depth (sample size) is 8 bits, although 16 bits is also common.

The most common file formats for VoiceXML-compliant IVRs are .wav and .au.

For .wav files, it is common to use CCITT u-law compression, although any other compression supported by the system should be fine.
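When the agency delivers files, it can be worth checking that they match the requested format. The following is a minimal sketch that reads the "fmt " chunk of a RIFF/WAVE file by hand (Python's standard `wave` module handles only uncompressed PCM, so it is not used here). The function names are hypothetical, and the mono (one-channel) requirement is an assumption typical of telephony rather than something stated above.

```python
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_MULAW = 0x0007  # CCITT (ITU-T G.711) u-law


def read_wav_format(data: bytes) -> dict:
    """Return the basic fields of the 'fmt ' chunk of a RIFF/WAVE file."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        body = pos + 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or body + 16 > len(data):
                raise ValueError("truncated fmt chunk")
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, body
            )
            return {
                "format_tag": tag,
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        # Chunks are padded to an even number of bytes.
        pos = body + chunk_size + (chunk_size & 1)
    raise ValueError("no fmt chunk found")


def is_telephony_ulaw(fmt: dict) -> bool:
    """True for 8 kHz, 8-bit, u-law audio; mono is an added assumption."""
    return fmt == {
        "format_tag": WAVE_FORMAT_MULAW,
        "channels": 1,
        "sample_rate": 8000,
        "bits_per_sample": 8,
    }


def _fake_wav(tag, channels, rate, bits):
    """Build a header-only WAVE file in memory for demonstration."""
    align = channels * bits // 8
    fmt = struct.pack("<HHIIHH", tag, channels, rate, rate * align, align, bits)
    riff_body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    return b"RIFF" + struct.pack("<I", len(riff_body)) + riff_body


delivered = _fake_wav(WAVE_FORMAT_MULAW, 1, 8000, 8)
master = _fake_wav(WAVE_FORMAT_PCM, 2, 44100, 16)
print(is_telephony_ulaw(read_wav_format(delivered)))
print(is_telephony_ulaw(read_wav_format(master)))
```

For a real file, pass the bytes read from disk (opened in binary mode) to `read_wav_format`.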

Make sure audio segments have proper trimming
The recording manifest should specify how the agency should trim (remove silence from) the beginnings and ends of the audio segments during post-production. It is common to base trimming rules on the ending punctuation in the written script: for example, 500 ms for sentence-final punctuation, 250 ms for commas, 100 ms for colons, and 0 ms for everything else. Unless otherwise specified, initial trimming is 0 ms.
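The example trimming rules can be written down as a small lookup, which is handy for filling in the engineer notes of a manifest automatically. This is a sketch under assumptions: the function name is hypothetical, "sentence-final punctuation" is taken to mean a period, question mark, or exclamation point, and the rule is applied to the last non-space character of the script text.

```python
LEADING_TRIM_MS = 0  # initial trimming value unless otherwise specified


def ending_trim_ms(script_text: str) -> int:
    """Return the example end-of-segment trimming value, in milliseconds,
    based on the final punctuation mark of the written script."""
    text = script_text.rstrip()
    if not text:
        return 0
    last = text[-1]
    if last in ".?!":   # sentence-final punctuation (assumed set)
        return 500
    if last == ",":
        return 250
    if last == ":":
        return 100
    return 0            # all other endings


for line in ["Please hold.", "For sales,", "Your balance is:", "seven"]:
    print(f"{line!r}: {ending_trim_ms(line)} ms")
```

Any segment that needs a different value can simply override the computed one in its manifest entry.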

Maintain copies of all recorded audio in its originally recorded format
If you have copies of all recorded audio, then with a little practice you can do some of your own post-production, including using an audio editor to create new segments from existing ones (keeping in mind coarticulation effects and inflection when doing this editing). This can be helpful when you need a slight variation on an existing prompt and you need it fast. Your odds of success rise with the size of your recording library, but it is not always possible.

Here's a basic example. Let's say you have a message that says a particular specialist is available 9AM-5PM. The business comes to you and says, "Guess what, we extended hours until 7PM, and it happened yesterday. Can you change the IVR?" Chances are you have any number of recordings with "seven" in them from that system's messages. Chances are also that, because they were recorded to be used in strings, they won't sound right here. But let's say you have another department whose hours are 8AM-7PM. Now you're talking. You have a much better chance of being able to extract the entire "7PM" or even "to 7PM" section out of that one for your new message. What if your only 7:00 is a message that says you'll reopen at "7AM tomorrow," where it has AM instead of PM? In this case, coarticulation is not your friend. The "n" at the end of "seven" sounds completely different going into "AM" than it does going into "PM."

This is a skill that takes some practice and some linguistic background. Don't assume it's always doable.
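The mechanical part of such an edit, cutting a time range out of an original uncompressed recording, can be sketched with Python's standard `wave` module. The function names and the millisecond positions are hypothetical; finding cut points that actually sound right is the listening work described above, and no code does that for you.

```python
import wave


def slice_pcm(frames: bytes, sample_rate: int, sample_width: int,
              channels: int, start_ms: int, end_ms: int) -> bytes:
    """Return the raw PCM bytes between start_ms and end_ms."""
    frame_size = sample_width * channels           # bytes per sample frame
    start = (start_ms * sample_rate // 1000) * frame_size
    end = (end_ms * sample_rate // 1000) * frame_size
    return frames[start:end]


def extract_segment(src_path: str, dst_path: str,
                    start_ms: int, end_ms: int) -> None:
    """Copy one time range of an uncompressed PCM .wav file to a new file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    clip = slice_pcm(frames, params.framerate, params.sampwidth,
                     params.nchannels, start_ms, end_ms)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)      # same rate, width, and channel count
        dst.writeframes(clip)      # frame count in the header is corrected


# In-memory demonstration: one second of 16-bit mono silence at 44.1 kHz.
one_second = bytes(44100 * 2)
clip = slice_pcm(one_second, 44100, 2, 1, start_ms=250, end_ms=750)
print(len(clip))
```

Working from the original high-fidelity masters, rather than the compressed telephony files, keeps the edit clean until the final conversion step.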