UTokyo Repository, The University of Tokyo
 


To link to this page (thesis), please use the following URL: http://hdl.handle.net/2261/51733

Title: Research on Dynamic Features Derived From Speech Structure
Other title: 音声の構造的表象から導出される動的特徴に関する研究
Author: Shimizu, Shinya
Author (other language): 清水, 信哉
Issue date: 2012-03-22
Abstract: With the spread of smartphones, automatic speech recognition (ASR) systems are becoming increasingly popular as an interface to computers. While their success shows that ASR systems have reached a practical level, the basic algorithm of state-of-the-art ASR systems is still based on the Hidden Markov Model (HMM), which has been the de facto standard for ASR since the 1980s. HMM-based algorithms assume the frame-by-frame Markov property in order to reduce the amount of computation to a realistic level. Because of this assumption, long-term features that cannot be defined for each time frame, such as word duration, can never be considered. Researchers have developed various methods to improve the performance of ASR systems under the constraint of the Markov property. However, ASR algorithms are undergoing a paradigm shift. New-paradigm algorithms that do not assume the Markov property have been proposed, and they have shown better performance than the old-paradigm HMM-based algorithms within practical computation time. These new-paradigm algorithms can consider long-term features, which can never be considered in the old-paradigm algorithms. Therefore, researchers are now investigating effective long-term features. Speech structure is one such long-term feature and can potentially be an effective feature for the new-paradigm algorithms. Speech structure was proposed as a feature that is invariant to non-linguistic variations, such as differences in speakers and recording environments. While speech structure has been applied to several applications, such as pronunciation proficiency assessment, and has shown good performance, it has not been applied to continuous speech recognition, because it is not a frame-by-frame feature but a long-term feature and cannot be used as a feature for the old-paradigm algorithms. In contrast, the new-paradigm algorithms can leverage speech structure.
A preliminary experiment combining speech structure with a new-paradigm algorithm has already been carried out and showed good performance. However, the current implementation of speech structure is still immature and can be improved in several respects. Dynamic features are one of them. Dynamic features are defined as temporal derivatives of static features. They were first proposed in 1986 and are now used effectively in almost all speech systems, including ASR, speech synthesis, and speaker identification. However, no algorithm for leveraging dynamic features in speech structure has been proposed, and dynamic features were omitted in previous studies on speech structure. To solve this problem, I propose two algorithms that leverage dynamic features derived from speech structure: differential speech structure and trajectory speech structure. With these algorithms, dynamic features can be used effectively in speech systems based on speech structure. Several experiments were carried out to show the effectiveness of the proposed methods. Using the differential speech structure, an 11.0% relative decrease in word error rate was obtained in an isolated word recognition experiment. Furthermore, using the trajectory speech structure, a 28.5% relative decrease in word error rate was obtained in an experiment on N-best rescoring of isolated word recognition. These results show that the proposed methods work effectively and strengthen speech structure as a feature for the new-paradigm algorithms.
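The abstract defines dynamic features as temporal derivatives of static features. As an illustration only, the sketch below computes conventional regression-based delta features over a sequence of static feature vectors. It is not the differential or trajectory speech structure proposed in the thesis, which the abstract does not specify. The function name `delta`, the `window` parameter, the edge-padding choice, and the toy input are all assumptions made for this example.

```python
def delta(frames, window=2):
    """Approximate the temporal derivative of static features.

    frames: list of equal-length feature vectors, one per time frame.
    window: number of frames on each side used in the regression
            (an assumed default, not taken from the thesis).
    Frames beyond either end are replaced by the first or last frame.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Normalizer of the standard delta regression formula.
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


# Toy static features: the first dimension rises by 1 per frame,
# the second dimension is constant.
static = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
dynamic = delta(static)
```

In the toy input, the interior frame of the linearly rising dimension gets a delta equal to its slope, and the constant dimension gets a delta of zero. Near the edges the padding makes the estimate smaller than the true slope.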
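Description: Report number: ; Date of degree conferral: 2012-03-22 ; Degree level: Master's ; Degree type: Master of Information Science and Technology ; Diploma number: ; Graduate school and department: Graduate School of Information Science and Technology, Department of Information and Communication Engineering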
URI: http://hdl.handle.net/2261/51733
Appears in collections: 025 Master's theses
1244025 Master's theses (Department of Information and Communication Engineering)

Files in this item:

File: 48106415.pdf ; Description: (none) ; Size: 5.17 MB ; Format: Adobe PDF

All items stored in this repository are protected by copyright.

 
