2021-07-31T21:30:06Zhttps://repository.dl.itc.u-tokyo.ac.jp/oaioai:repository.dl.itc.u-tokyo.ac.jp:000054912021-03-01T19:38:43Z隠れマルコフモデルによる高速な系列データの解析手法Efficient Sequence Data Analysis with Hidden Markov ModelsFujiwara, Yasuhiro007University of Tokyo (東京大学)博士(情報理工学)Sequential data analysis, a relatively young and interdisciplinary field of computer science, is the process of extracting patterns, rules, or meaning from large sequence data sets by combining methods from statistics and artificial intelligence with database management. With recent tremendous technical advances in processing power, storage capacity, and Internet, sequential data analysis is seen as an increasingly important approach by modern business. This is because it can give an informational advantage by transforming unprecedented quantities of sequence data into business intelligence. It is currently used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery. Since sequential data analysis can bring real value for real applications, demand for novel technologies in sequential data analysis is growing these days. Hidden Markov model (HMM) is a popular tool for sequential data analysis, and is receiving considerable attention in various communities. And many applications that use HMM have emerged such as sequence labeling, speech recognition, mental task classification, biological analysis, traffic monitoring, and anomaly detection. The Viterbi algorithm is used in these applications. However, the Viterbi algorithm unfortunately requires quadratic CPU time for the number of states which are used in HMMs. And data in these applications are recently explosively increasing. Therefore efficient classification approach is needed in HMM. To enhance the processing speed in HMM, many approximation approaches have been proposed [Ney 92, F. Jelinek 99, Tsuruoka 05, Toutanova 03, Siddiqi 05b, Cohn 06, Jeong 09]. However, these approaches have two major problems. First, they cannot guarantee error bound. Therefore these approaches can affect the qualities of real applications. Second, approximate algorithms usually require hyperparameters, which control the tradeoff between accuracy and efficiency, and have to be manually adjusted for each task. The proposed approaches in this thesis can efficiently process HMM without sacrificing the computational results; the proposed approaches output the results as exactly same as the Viterbi algorithm. And the proposed approaches need the same memory cost as the Viterbi algorithm; they can handle large size of dataset used in real application. This thesis mainly handles three typical problems in HMM; the first problem is exact and efficient states sequence detection for single HMM and static sequence of of arbitrary length, and the second problem is exact and efficient identification of the model whose state sequence has the highest likelihood for the given query sequence, and the third problem is exact and efficient monitoring of streaming data sequences to find the best model. I propose Staggered decoding for the first problem and SPIRAL for the second and third problem, a fast search method for HMM datasets. The proposed approaches are based on two ideas; approximation and pruning. Approximation is an idea that aggregates several state to discard unlikely states or models. And pruning is an idea that computes exact likelihood of viable states/models by pruning unlikely state transition. Experiments verify the effectiveness of Staggered decoding and SPIRAL. That is, they are much faster than the naive Viterbi algorithm based method. And, to show the generality of our approach for other graphical data structure, I applied our approach for the problem to monitoring best centrality nodes of time-evolving graphs.thesis2011-09-272011-09-27application/pdf甲第27570号https://repository.dl.itc.u-tokyo.ac.jp/record/5491/files/48087421.pdfeng