File:Recognising Conversational Speech - What an Incremental ASR Should Do for a Dialogue System and How to Get There BaumannEtAl16IWSDS.pdf
Timo Baumann, Casey Kennington, Julian Hough and David Schlangen
Automatic speech recognition (ASR) is not only becoming increasingly accurate, but is also increasingly adapted to producing timely, incremental output. However, overall accuracy and timeliness alone are insufficient for interactive dialogue systems, which require stable output and responsiveness to the utterance as it unfolds. Furthermore, for a dialogue system to achieve deep understanding of user utterances, phenomena such as disfluencies should be preserved or marked up for use by downstream components, such as language understanding, rather than filtered out. Similarly, word timing can be informative for analyzing deictic expressions in a situated environment and should be available for analysis. Here we investigate the overall accuracy and incremental performance of three widely used systems and discuss their suitability from these perspectives. From the differing performance along these measures, we provide a picture of the requirements for incremental ASR in dialogue systems and describe freely available tools for using and evaluating incremental ASR.
Keywords: Automatic speech recognition (ASR), Spoken dialogue system (SDS), Sphinx-4, Google’s web-based ASR API, Kaldi
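The stability requirement mentioned in the abstract can be made concrete with a small illustration. The sketch below is not taken from the paper or its tools. It assumes that an incremental recogniser emits a sequence of partial word hypotheses, and it counts how many word additions and revocations occur between successive hypotheses. The function names `count_edits` and `edit_overhead` and the example word sequence are hypothetical and chosen only for illustration.

```python
def count_edits(prev, curr):
    """Count the words revoked from prev plus the words added in curr.

    Assumption: a change is modelled as revoking everything after the
    common prefix and then adding the new suffix.
    """
    common = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        common += 1
    revoked = len(prev) - common
    added = len(curr) - common
    return added + revoked


def edit_overhead(hypotheses):
    """Fraction of edits that were unnecessary (0.0 means fully stable).

    Assumption: the necessary edits are exactly the words of the final
    hypothesis, each added once and never revoked.
    """
    total = 0
    prev = []
    for hyp in hypotheses:
        total += count_edits(prev, hyp)
        prev = hyp
    if total == 0:
        return 0.0
    necessary = len(hypotheses[-1])
    return (total - necessary) / total


# Hypothetical partial results for one utterance
partials = [
    ["take"],
    ["take", "the"],
    ["take", "a"],          # "the" is revoked and replaced by "a"
    ["take", "a", "red"],
    ["take", "a", "red", "cross"],
]
overhead = edit_overhead(partials)
```

Under these assumptions, a lower value means that downstream components have to revise their analyses less often, which is one way to read the stability that a dialogue system needs from an incremental recogniser.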