A confidence level is a measure of how closely a speech recognizer matches a spoken utterance to a recognized word or set of words. An n-best list is a list of the top n matches, ranked by confidence.

It is not always necessary to complicate dialog specifications with confidence levels or n-best lists. In fact, it's possible to use this information inappropriately, for example, by assuming that a confidence score of 0 is absolute proof that an utterance was out of grammar, which is not necessarily true (Rolandi, 2003).

At other times, however, it is possible to use this information to refine strategies for confirmation and error correction (Balentine & Morgan, 2001; Cohen et al., 2004; Lewis, 2011).

Using confidence to modify the confirmation strategy
If confidence is high, hold back on confirmation; if confidence is low, consider a reprompting strategy. The following examples are from Balentine and Morgan (2001).

Example 1: High confidence

  • System: Which stock?

  • Caller: Texaco <recognized with high confidence>

  • System: How many shares of Texaco? <note implicit confirmation in this prompt>

  • Caller: 500.

  • System: Sell 500 shares of Texaco?

  • Caller: Yes.

Example 2: Low confidence

  • System: Which stock?

  • Caller: Texaco <recognized as PepsiCo, but with low confidence>

  • System: How many shares? <note avoidance of implicit confirmation>

  • Caller: 500.

  • System: One more time, please. What was that stock?

  • Caller: Texaco <recognized this time with high confidence>

  • System: Sell 500 shares of Texaco?

  • Caller: Yes.


These next examples are based on Boyce and Viets (2010).

Example 3: High confidence

  • System: Thank you for calling XYZ Bank. What would you like to do?

  • Caller: Transfer funds. <recognized with very high confidence>

  • System: OK, how much?

Example 4: Low confidence

  • System: Thank you for calling XYZ Bank. What would you like to do?

  • Caller: Transfer funds. <recognized with lower confidence, but not so low as to trigger explicit confirmation at this point>

  • System: OK, transfer funds. If that's not right, say Go Back. How much do you want to transfer?
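
The pattern in Examples 3 and 4 can be sketched as a mapping from confidence score to confirmation strategy. This is only an illustration: the threshold values, the 0.0 to 1.0 scale, the names Recognition, confirmation_strategy, and next_prompt, and the wording of the explicit confirmation prompt are all assumptions, not taken from the cited sources. Real thresholds depend on the recognizer and should be tuned against field data.

```python
from dataclasses import dataclass

# Hypothetical thresholds on an assumed 0.0-1.0 confidence scale.
HIGH_CONFIDENCE = 0.80
LOW_CONFIDENCE = 0.45


@dataclass
class Recognition:
    text: str          # recognized phrase, e.g. "transfer funds"
    confidence: float  # recognizer's confidence score


def confirmation_strategy(result: Recognition) -> str:
    """Map a confidence score to one of three confirmation strategies."""
    if result.confidence >= HIGH_CONFIDENCE:
        return "none"      # proceed without confirming, as in Example 3
    if result.confidence >= LOW_CONFIDENCE:
        return "implicit"  # echo the item and offer a way back, as in Example 4
    return "explicit"      # too uncertain: ask before moving on


def next_prompt(result: Recognition) -> str:
    """Build the next system prompt for the funds-transfer dialog."""
    strategy = confirmation_strategy(result)
    if strategy == "none":
        return "OK, how much?"
    if strategy == "implicit":
        return (f"OK, {result.text}. If that's not right, say Go Back. "
                "How much do you want to transfer?")
    # Assumed wording for the explicit case; the source gives no example.
    return f"Did you say {result.text}?"
```

Keeping the thresholds in named constants makes it easier to adjust them per grammar or per dialog state without touching the prompt logic.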

Using n-best lists for disambiguation
Consider using n-best lists to disambiguate among homophones or among top recognition candidates whose confidence scores are close.

It's best to use back-end logic to perform the disambiguation, for example, using checksums to distinguish valid from invalid account codes.
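
As a rough sketch of checksum-based disambiguation, the code below walks an n-best list of account codes and keeps the highest-ranked candidate that passes a checksum. It assumes the account codes use the Luhn check digit scheme, which may not match a given back end. The function names and the (text, confidence) tuple layout are also assumptions made for illustration.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:   # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def first_valid_candidate(n_best):
    """Return the highest-ranked candidate that passes the checksum.

    n_best is assumed to be a list of (text, confidence) pairs, already
    sorted from most to least confident. Returns None if nothing passes.
    """
    for text, _confidence in n_best:
        if luhn_valid(text):
            return text
    return None


# Hypothetical recognition result: the top hypothesis has one wrong digit.
n_best = [("79927398710", 0.62), ("79927398713", 0.58), ("79927398718", 0.31)]
account = first_valid_candidate(n_best)
```

Here the second hypothesis is selected even though the first has a higher confidence score, because only the second passes the checksum. If no candidate passes, the dialog can fall back to a reprompt.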

When this isn't possible and the caller's utterance is unlikely to be out of grammar, the system can respond to a rejected first candidate by offering the next-best candidate in the following dialog turn, for example:

  • Caller: Buy Texaco.

  • System: Do you want to buy PepsiCo?

  • Caller: No.

  • System: Buy Texaco?

  • Caller: Yes.
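
The exchange above can be sketched as a walk down the n-best list that skips any candidate the caller has already rejected. The confidence scores, the next_candidate helper, and the (text, confidence) tuple layout are hypothetical and chosen only to mirror the dialog.

```python
def next_candidate(n_best, rejected):
    """Return the best-ranked candidate the caller has not yet rejected.

    n_best is assumed to be a list of (text, confidence) pairs sorted from
    most to least confident. Returns None when the list is exhausted.
    """
    for text, _confidence in n_best:
        if text not in rejected:
            return text
    return None  # nothing left to offer: fall back to reprompting


# Hypothetical n-best list for the caller's "Buy Texaco."
n_best = [("PepsiCo", 0.51), ("Texaco", 0.48)]
rejected = set()

first = next_candidate(n_best, rejected)   # offered as "Do you want to buy PepsiCo?"
rejected.add(first)                        # caller says "No"
second = next_candidate(n_best, rejected)  # offered as "Buy Texaco?"
```

Tracking rejected items in a set keeps the system from offering the same wrong candidate twice, including after a later reprompt in the same dialog.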

Also see Skip Lists and Using n-best Lists.

References

Balentine, B., & Morgan, D. P. (2001). How to build a speech recognition application: A style guide for telephony dialogues (2nd ed.). San Ramon, CA: EIG Press.

Boyce, S., & Viets, M. (2010). When is it my turn to talk?: Building smart, lean menus. In W. Meisel (Ed.), Speech in the user interface: Lessons from experience (pp. 108–112). Victoria, Canada: TMA Associates.

Cohen, M. H., Giangola, J. P., & Balogh, J. (2004). Voice user interface design. Boston, MA: Addison-Wesley.

Lewis, J. R. (2011). Practical speech user interface design. Boca Raton, FL: CRC Press, Taylor & Francis Group.

Rolandi, W. (2003). When you don’t know what you don’t know. Speech Technology, 8(4), 28.