File:DNN-HMM Based Multilingual Recognizer of Telephone Speech F3-DP-2016-Fiala-Jiri-DP Jiri Fiala.pdf
This thesis deals with multilingual acoustic modeling based on a shared global phone inventory for five East European languages: Czech, Russian, Hungarian, Slovak and Polish, all of which are available within SpeechDat-E, a set of telephone speech databases. Because SAMPA is used with a non-normalized convention to represent the phonetic content of the individual languages, and different symbols in several cases represent the same phone, a mapping to the general X-SAMPA phonetic alphabet was proposed as the first step. The impact of multilingual acoustic modeling was analyzed on the basis of continuous speech recognition. The analysis of acoustic modeling in the LVCSR task was performed for the GMM-HMM system and for the DNN-HMM approach. Experiments were performed for LVCSR with language-specific acoustic models as well as with the multilingual system. The individual recognizers were implemented with the Kaldi toolkit. One of the goals of this thesis is to provide a tutorial-style description of Kaldi usage and to create a recipe for the SpeechDat databases. Depending on the language, the best results obtained with the HMM recognizers were 18%-28% WER. DNN-HMM improved the results by about 4% WER on average. The results for the multilingual HMM system ranged from 25% to 37% WER. The DNN approach had a significant impact on speech recognition accuracy for the multilingual system as well, reducing the WER by about 9% on average.
Keywords: continuous speech recognition; LVCSR; GMM-HMM system; DNN-HMM system; multilingual system; multilingual acoustic modeling; IPA; SAMPA; X-SAMPA; Kaldi
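The results in the abstract are reported as word error rate (WER). As a reminder of what that metric measures, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; this is not the scoring code used in the thesis, which relies on Kaldi.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration only; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.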