Efficient Sequence Data Analysis with Hidden Markov Models

Fujiwara, Yasuhiro

WEKO3

lat lon distance

[[sub_check.contents]]

[[sub_radio.contents]]

Field does not validate

[[sub_attr.contents]]　

インデックスツリー

アイテム

Efficient Sequence Data Analysis with Hidden Markov Models

https://doi.org/10.15083/00005482

名前 / ファイル	ライセンス	アクション
48087421.pdf (1.4 MB)

Item type

学位論文 / Thesis or Dissertation(1)

公開日

2014-02-24

タイトル

Efficient Sequence Data Analysis with Hidden Markov Models

言語

eng

資源タイプ

資源

http://purl.org/coar/resource_type/c_46ec

タイプ

thesis

ID登録

10.15083/00005482

ID登録タイプ

JaLC

その他のタイトル

隠れマルコフモデルによる高速な系列データの解析手法

著者

Fujiwara, Yasuhiro

著者別名

識別子Scheme

WEKO

識別子

11440

姓名

藤原, 靖宏

著者所属

東京大学大学院情報理工学系研究科電子情報学専攻

著者所属

Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo

Abstract

内容記述タイプ

Abstract

内容記述

Sequential data analysis, a relatively young and interdisciplinary field of computer science, is the process of extracting patterns, rules, or meaning from large sequence data sets by combining methods from statistics and artificial intelligence with database management. With recent tremendous technical advances in processing power, storage capacity, and Internet, sequential data analysis is seen as an increasingly important approach by modern business. This is because it can give an informational advantage by transforming unprecedented quantities of sequence data into business intelligence. It is currently used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery. Since sequential data analysis can bring real value for real applications, demand for novel technologies in sequential data analysis is growing these days. Hidden Markov model (HMM) is a popular tool for sequential data analysis, and is receiving considerable attention in various communities. And many applications that use HMM have emerged such as sequence labeling, speech recognition, mental task classification, biological analysis, traffic monitoring, and anomaly detection. The Viterbi algorithm is used in these applications. However, the Viterbi algorithm unfortunately requires quadratic CPU time for the number of states which are used in HMMs. And data in these applications are recently explosively increasing. Therefore efficient classification approach is needed in HMM. To enhance the processing speed in HMM, many approximation approaches have been proposed [Ney 92, F. Jelinek 99, Tsuruoka 05, Toutanova 03, Siddiqi 05b, Cohn 06, Jeong 09]. However, these approaches have two major problems. First, they cannot guarantee error bound. Therefore these approaches can affect the qualities of real applications. Second, approximate algorithms usually require hyperparameters, which control the tradeoff between accuracy and efficiency, and have to be manually adjusted for each task. The proposed approaches in this thesis can efficiently process HMM without sacrificing the computational results; the proposed approaches output the results as exactly same as the Viterbi algorithm. And the proposed approaches need the same memory cost as the Viterbi algorithm; they can handle large size of dataset used in real application. This thesis mainly handles three typical problems in HMM; the first problem is exact and efficient states sequence detection for single HMM and static sequence of of arbitrary length, and the second problem is exact and efficient identification of the model whose state sequence has the highest likelihood for the given query sequence, and the third problem is exact and efficient monitoring of streaming data sequences to find the best model. I propose Staggered decoding for the first problem and SPIRAL for the second and third problem, a fast search method for HMM datasets. The proposed approaches are based on two ideas; approximation and pruning. Approximation is an idea that aggregates several state to discard unlikely states or models. And pruning is an idea that computes exact likelihood of viable states/models by pruning unlikely state transition. Experiments verify the effectiveness of Staggered decoding and SPIRAL. That is, they are much faster than the naive Viterbi algorithm based method. And, to show the generality of our approach for other graphical data structure, I applied our approach for the problem to monitoring best centrality nodes of time-evolving graphs.

書誌情報

発行日 2011-09-27

日本十進分類法

主題Scheme

NDC

主題

007

学位名

博士(情報理工学)

学位

値

doctoral

学位分野

Information Science and Technology (情報理工学)

学位授与機関

学位授与機関名

University of Tokyo (東京大学)

研究科・専攻

Department of Information and Communication Engineering, Graduate School of Information Science and Technology (情報理工学系研究科電子情報学専攻)

学位授与年月日

2011-09-27

学位授与番号

甲第27570号

学位記番号

博情第355号

戻る

views

See details

	Views

Versions

Ver.1

2021-03-01 19:38:43.631126

Show All versions

Cite as

エクスポート

OAI-PMH

JPCOAR 2.0
JPCOAR 1.0
DublinCore
DDI

Other Formats

JSON
BIBTEX

インデックスリンク

インデックスツリー

アイテム

Efficient Sequence Data Analysis with Hidden Markov Models

× Fujiwara, Yasuhiro

Versions

Share

Cite as

エクスポート