Файл:Vocal Tract Length Perturbation (VTLP) improves speech recognition.pdf

Материал из Материалы по машинному обучению
Перейти к: навигация, поиск
Vocal_Tract_Length_Perturbation_(VTLP)_improves_speech_recognition.pdf(0 × 0 пикселей, размер файла: 287 КБ, MIME-тип: application/pdf)

Navdeep Jaitly ndjaitly@cs.toronto.edu University of Toronto, 10 King’s College Rd., Toronto, ON M5S 3G4 CANADA Geoffrey E. Hinton hinton@cs.toronto.edu University of Toronto, 10 King’s College Rd., Toronto, ON M5S 3G4 CANADA


Abstract

Augmenting datasets by transforming inputs in a way that does not change the label is a crucial ingredient of the state of the art methods for object recognition using neural networks. However this approach has (to our knowledge) not been exploited successfully in speech recognition (with or without neural networks). In this paper we lay the foundation for this approach, and show one way of augmenting speech datasets by transforming spectrograms, using a random linear warping along the frequency dimension. In practice this can be achieved by using warping techniques that are used for vocal tract length normalization (VTLN) - with the difference that a warp factor is generated randomly each time, during training, rather than fitting a single warp factor to each training and test speaker (or utterance). At test time, a prediction is made by averaging the predictions over multiple warp factors. When this technique is applied to TIMIT using Deep Neural Networks (DNN) of different depths, the Phone Error Rate (PER) improved by an average of 0.65% on the test set. For a Convolutional neural network (CNN) with convolutional layer in the bottom, a gain of 1.0% was observed. These improvements were achieved without increasing the number of training epochs, and suggest that data transformations should be an important component of training neural networks for speech, especially for data limited projects.

История файла

Нажмите на дату/время, чтобы просмотреть, как тогда выглядел файл.

Дата/времяРазмерыУчастникПримечание
текущий17:27, 22 декабря 20160 × 0 (287 КБ)Slikos (обсуждение | вклад)
  • Вы не можете перезаписать этот файл.

Следующая 1 страница ссылается на данный файл: