WEKO3
アイテム
Efficient Sequence Data Analysis with Hidden Markov Models
https://doi.org/10.15083/00005482
https://doi.org/10.15083/00005482def0d023-2613-4d41-9722-1c9f545a4f56
名前 / ファイル | ライセンス | アクション |
---|---|---|
48087421.pdf (1.4 MB)
|
|
Item type | 学位論文 / Thesis or Dissertation(1) | |||||
---|---|---|---|---|---|---|
公開日 | 2014-02-24 | |||||
タイトル | ||||||
タイトル | Efficient Sequence Data Analysis with Hidden Markov Models | |||||
言語 | ||||||
言語 | eng | |||||
資源タイプ | ||||||
資源タイプ識別子 | http://purl.org/coar/resource_type/c_46ec | |||||
資源タイプ | thesis | |||||
ID登録 | ||||||
ID登録 | 10.15083/00005482 | |||||
ID登録タイプ | JaLC | |||||
その他のタイトル | ||||||
その他のタイトル | 隠れマルコフモデルによる高速な系列データの解析手法 | |||||
著者 |
Fujiwara, Yasuhiro
× Fujiwara, Yasuhiro |
|||||
著者別名 | ||||||
識別子Scheme | WEKO | |||||
識別子 | 11440 | |||||
姓名 | 藤原, 靖宏 | |||||
著者所属 | ||||||
値 | 東京大学大学院情報理工学系研究科電子情報学専攻 | |||||
著者所属 | ||||||
値 | Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo | |||||
Abstract | ||||||
内容記述タイプ | Abstract | |||||
内容記述 | Sequential data analysis, a relatively young and interdisciplinary field of computer science, is the process of extracting patterns, rules, or meaning from large sequence data sets by combining methods from statistics and artificial intelligence with database management. With recent tremendous technical advances in processing power, storage capacity, and Internet, sequential data analysis is seen as an increasingly important approach by modern business. This is because it can give an informational advantage by transforming unprecedented quantities of sequence data into business intelligence. It is currently used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery. Since sequential data analysis can bring real value for real applications, demand for novel technologies in sequential data analysis is growing these days. Hidden Markov model (HMM) is a popular tool for sequential data analysis, and is receiving considerable attention in various communities. And many applications that use HMM have emerged such as sequence labeling, speech recognition, mental task classification, biological analysis, traffic monitoring, and anomaly detection. The Viterbi algorithm is used in these applications. However, the Viterbi algorithm unfortunately requires quadratic CPU time for the number of states which are used in HMMs. And data in these applications are recently explosively increasing. Therefore efficient classification approach is needed in HMM. To enhance the processing speed in HMM, many approximation approaches have been proposed [Ney 92, F. Jelinek 99, Tsuruoka 05, Toutanova 03, Siddiqi 05b, Cohn 06, Jeong 09]. However, these approaches have two major problems. First, they cannot guarantee error bound. Therefore these approaches can affect the qualities of real applications. Second, approximate algorithms usually require hyperparameters, which control the tradeoff between accuracy and efficiency, and have to be manually adjusted for each task. The proposed approaches in this thesis can efficiently process HMM without sacrificing the computational results; the proposed approaches output the results as exactly same as the Viterbi algorithm. And the proposed approaches need the same memory cost as the Viterbi algorithm; they can handle large size of dataset used in real application. This thesis mainly handles three typical problems in HMM; the first problem is exact and efficient states sequence detection for single HMM and static sequence of of arbitrary length, and the second problem is exact and efficient identification of the model whose state sequence has the highest likelihood for the given query sequence, and the third problem is exact and efficient monitoring of streaming data sequences to find the best model. I propose Staggered decoding for the first problem and SPIRAL for the second and third problem, a fast search method for HMM datasets. The proposed approaches are based on two ideas; approximation and pruning. Approximation is an idea that aggregates several state to discard unlikely states or models. And pruning is an idea that computes exact likelihood of viable states/models by pruning unlikely state transition. Experiments verify the effectiveness of Staggered decoding and SPIRAL. That is, they are much faster than the naive Viterbi algorithm based method. And, to show the generality of our approach for other graphical data structure, I applied our approach for the problem to monitoring best centrality nodes of time-evolving graphs. | |||||
書誌情報 | 発行日 2011-09-27 | |||||
日本十進分類法 | ||||||
主題Scheme | NDC | |||||
主題 | 007 | |||||
学位名 | ||||||
学位名 | 博士(情報理工学) | |||||
学位 | ||||||
値 | doctoral | |||||
学位分野 | ||||||
値 | Information Science and Technology (情報理工学) | |||||
学位授与機関 | ||||||
学位授与機関名 | University of Tokyo (東京大学) | |||||
研究科・専攻 | ||||||
値 | Department of Information and Communication Engineering, Graduate School of Information Science and Technology (情報理工学系研究科電子情報学専攻) | |||||
学位授与年月日 | ||||||
学位授与年月日 | 2011-09-27 | |||||
学位授与番号 | ||||||
学位授与番号 | 甲第27570号 | |||||
学位記番号 | ||||||
値 | 博情第355号 |