From a usability metrics perspective (Sauro & Lewis, 2012), the fundamental measurements at the task level are measures of effectiveness (e.g., successful task completion rate), efficiency (e.g., successful task completion time), and satisfaction (collected at the end of a task, at the end of a session, or both). Bloom et al. (2005) provided ten key criteria, based on classical usability metrics but focused on IVRs, for measuring the effectiveness of voice user interfaces.

Caller satisfaction
Assessed at a minimum with one or two five-point Likert items, such as “I was satisfied with the automated portion of this call” and “I was satisfied with the agent during this call.” For more detailed evaluation of caller satisfaction, use psychometrically qualified instruments such as the 34-item Subjective Assessment of Speech System Interfaces (SASSI) (Hone & Graham, 2000), the 11-item Pragmatic Rating Scale for Dialogues (Polkosky, 2002), or the 25-item Framework of SUI Service Quality (Polkosky, 2008).

Perceived ease of use
Assessed at a minimum with one five-point Likert item, such as a variant of the Single Ease Question (Sauro & Lewis, 2012): “The application was easy to use.” For more detailed evaluation of perceived ease of use, see the items for the User Goal Orientation and Customer Service Behavior factors of the Framework of SUI Service Quality (Polkosky, 2008).

Perceived quality of output
Assessed at a minimum with two five-point Likert items (“The voice was understandable” and “The voice sounded good”). For more detailed evaluation of voice quality, see the five items for the Speech Characteristics factor of the Framework of SUI Service Quality (Polkosky, 2008) or, for a multidimensional assessment, the 15-item MOS-X (Polkosky & Lewis, 2003).

Perceived first-call resolution rate
Assessed at a minimum with a yes or no answer to the question, “Did you accomplish your goal?”

Time to task
The time that callers spend in the IVR before they can begin the desired task. Lower values of time-to-task lead to greater caller satisfaction. Items that increase time-to-task include lengthy up-front instructions, references to a Web site, and marketing messages (see What Not to Include at the Beginning).

Task completion rate
The rate at which callers actually accomplish tasks (an objective measure in contrast to the subjective measure of perceived first-call resolution rate).

Task completion time
The time required for callers to complete tasks. Generally, shorter task times are better for both the caller and for the service provider.

Correct transfer rate
Percentage of transferred calls that reach the correct agent.

Abandonment rate
Percentage of callers who hang up before completing a task. Ideally the design of the IVR and associated logging discriminates between expected (probably not a problem) and unexpected (probably a problem) disconnections (see Logging Strategy).

Containment rate
Percentage of calls not transferred to human agents. Although this is a common metric, it is deeply flawed: by this definition, a caller who hangs up without completing a task still counts as contained. See the discussion about this in Logging Strategy.
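
The objective criteria above (time to task, task completion rate, task completion time, correct transfer rate, abandonment rate, and containment rate) can in principle be computed from call logs. The following Python sketch shows one possible way to do so. It is only an illustration: the CallRecord fields and the summarize function are hypothetical names, not part of any real logging system, and the choice of the median to summarize times is an assumption made here because timing data are often skewed. The sketch also does not separate expected from unexpected disconnections, which a real logging design should try to do.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class CallRecord:
    """One logged call. All field names are hypothetical."""
    seconds_to_task_start: Optional[float]     # None if the caller never began a task
    task_completed: bool                       # True if the caller accomplished the task
    seconds_to_task_complete: Optional[float]  # None unless the task was completed
    transferred: bool                          # True if the call went to a human agent
    transfer_correct: Optional[bool]           # None unless the call was transferred
    hung_up_before_completion: bool            # True if the caller disconnected early


def rate(numerator: int, denominator: int) -> Optional[float]:
    """Return a proportion, or None when there is nothing to divide by."""
    return numerator / denominator if denominator else None


def summarize(calls: List[CallRecord]) -> Dict[str, Optional[float]]:
    """Compute the objective IVR metrics for a list of call records."""
    total = len(calls)
    transferred = [c for c in calls if c.transferred]
    start_times = [c.seconds_to_task_start for c in calls
                   if c.seconds_to_task_start is not None]
    completion_times = [c.seconds_to_task_complete for c in calls
                        if c.task_completed and c.seconds_to_task_complete is not None]
    return {
        "task_completion_rate": rate(sum(c.task_completed for c in calls), total),
        "abandonment_rate": rate(sum(c.hung_up_before_completion for c in calls), total),
        # Containment counts every call that was not transferred,
        # including calls abandoned without task completion.
        "containment_rate": rate(total - len(transferred), total),
        "correct_transfer_rate": rate(
            sum(c.transfer_correct is True for c in transferred), len(transferred)),
        "median_time_to_task": median(start_times) if start_times else None,
        "median_task_completion_time": median(completion_times) if completion_times else None,
    }
```

Calling summarize on a list of CallRecord objects returns a dictionary of proportions (0 to 1) and times in seconds. A value of None means the metric is undefined for that data set, for example the correct transfer rate when no calls were transferred.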

References

Bloom, J., Gilbert, J. E., Houwing, T., Hura, S., Issar, S., Kaiser, L., et al. (2005). Ten criteria for measuring effective voice user interfaces. Speech Technology, 10(9), 31–35.

Hone, K. S., & Graham, R. (2000). Towards a tool for the subjective assessment of speech system interfaces (SASSI). Natural Language Engineering, 6(3–4), 287–303.

Polkosky, M. D. (2002). Initial psychometric evaluation of the Pragmatic Rating Scale for Dialogues (Tech. Rep. 29.3634). Boca Raton, FL: IBM.

Polkosky, M. D. (2008). Machines as mediators: The challenge of technology for interpersonal communication theory and research. In E. Konijn (Ed.), Mediated interpersonal communication (pp. 34–57). New York, NY: Routledge.

Polkosky, M. D., & Lewis, J. R. (2003). Expanding the MOS: Development and psychometric evaluation of the MOS-R and MOS-X. International Journal of Speech Technology, 6, 161–182.

Sauro, J., & Lewis, J. R. (2012). Quantifying the user experience: Practical statistics for user research. Burlington, MA: Morgan Kaufmann.