Synchrony-based Audiovisual Analysis
https://doi.org/10.15083/00002430
Name / File | License | Action |
---|---|---|
37067413.pdf (5.0 MB) | | |
Item type | Thesis or Dissertation(1) | |||||
---|---|---|---|---|---|---|
Publication date | 2012-03-01 | |||||
Title | ||||||
Title | Synchrony-based Audiovisual Analysis | |||||
Language | ||||||
Language | eng | |||||
Keyword | ||||||
Subject | Audiovisual analysis | |||||
Subject Scheme | Other | |||||
Keyword | ||||||
Subject | Synchrony | |||||
Subject Scheme | Other | |||||
Resource type | ||||||
Resource | http://purl.org/coar/resource_type/c_46ec | |||||
Type | thesis | |||||
ID registration | ||||||
ID registration | 10.15083/00002430 | |||||
ID registration type | JaLC | |||||
Alternative title | ||||||
Alternative title | 同期性に基づく音と映像の統合解析 | |||||
Author | 劉, 玉宇 | |||||
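Author alias | ||||||
Identifier | 6707 | |||||
Identifier Scheme | WEKO | |||||
Name | Liu, Yuyu | |||||
Author affiliation | ||||||
Author affiliation | Department of Information and Communication Engineering, Graduate School of Information Science and Technology (大学院情報理工学系研究科電子情報学専攻) | |||||
Author affiliation | ||||||
Author affiliation | Graduate School of Information Science and Technology, Department of Information and Communication Engineering, The University of Tokyo | |||||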
Abstract | ||||||
Description type | Abstract | |||||
Description | This thesis presents a computational framework for jointly analyzing auditory and visual information. The integration of audiovisual information is based on synchrony evaluation, motivated by the neuroscience finding that synchrony is a key cue by which humans integrate perception across sensory modalities. The work in this thesis focuses on two questions: how to perform audiovisual analysis with synchrony evaluation, and where to apply it. To answer the first question, we develop novel, effective methods for analyzing audiovisual correlation, and we classify and experimentally compare the existing techniques, including the ones we developed. Since this is the first work that classifies and experimentally compares the methods of this field, it supplies a basis for designing algorithms that computationally analyze audiovisual correlation. To answer the second question, we apply audiovisual correlation analysis to three different problems. The first problem is the detection of a speaker's face region in a video, for which previous solutions either require special devices such as a microphone array or yield only highly fragmented results. Assuming that the speaker is stationary within an analysis time window, we introduce a novel method that analyzes the audiovisual correlation for the speaker using a newly introduced audiovisual differential feature and quadratic mutual information, and we integrate the result of this correlation analysis into graph cut-based image segmentation to compute the speaker's face region. This method not only yields a smooth detected face region but is also robust against changes of background, view, and scale. The second problem is the localization of sound sources. General sound sources are diverse in type and usually non-stationary while emitting sound. To solve this problem, we develop an audiovisual correlation maximization framework to trace the movement of the sound source, and we introduce an audiovisual inconsistency feature to extract audiovisual events for all kinds of sound sources. We also propose an incremental computation of mutual information that significantly speeds up the computation. In the experiments, this method successfully localizes different moving sound sources. The third problem is the recovery of drifted audio-to-video synchronization, which previously required both a special device and dedicated human effort. Considering that the correlation reaches its maximum only when the audio is synchronized with the video, we develop an automatic recovery method that analyzes the audiovisual correlation for a given speaker in the video clip. The recovery demonstrates high accuracy on both simulated and real data. While the theoretical justification and experimental justification are performed independently, this thesis taken as a whole lays the necessary groundwork for jointly analyzing audiovisual information based on synchrony evaluation. | |||||
Bibliographic information | Date of issue 2009-09 | |||||
Nippon Decimal Classification | ||||||
Subject | 548 | |||||
Subject Scheme | NDC | |||||
Degree name | ||||||
Degree name | 博士(情報理工学) / Doctor (Information Science and Technology) | |||||
Degree | ||||||
Value | doctoral | |||||
Degree field | ||||||
Information Science and Technology (情報理工学) | ||||||
Degree-granting institution | ||||||
Degree-granting institution name | University of Tokyo (東京大学) | |||||
Graduate school / Department | ||||||
Department of Information and Communication Engineering, Graduate School of Information Science and Technology (情報理工学系研究科電子情報学専攻) | ||||||
Date degree granted | ||||||
Date degree granted | 2009-09-28 | |||||
Degree number | ||||||
Degree number | 甲第25373号 | |||||
Diploma number | ||||||
博情第255号 |
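
The abstract's third problem rests on the observation that audiovisual correlation peaks only when the audio is synchronized with the video. The Python sketch below illustrates that idea only and is not the thesis's method: it swaps in plain Pearson correlation where the thesis uses mutual information, and the function name `estimate_av_offset`, the one-dimensional per-frame features, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def estimate_av_offset(audio_feat, video_feat, max_lag):
    """Search lags in [-max_lag, max_lag] for the best audio/video alignment.

    Hypothetical helper: a positive lag means audio_feat[t + lag] lines up
    with video_feat[t]. Pearson correlation stands in for the mutual
    information measure described in the abstract.
    """
    audio_feat = np.asarray(audio_feat, dtype=float)
    video_feat = np.asarray(video_feat, dtype=float)
    n = min(len(audio_feat), len(video_feat))
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Take the overlapping parts of the two sequences for this lag.
        if lag >= 0:
            a, v = audio_feat[lag:n], video_feat[:n - lag]
        else:
            a, v = audio_feat[:n + lag], video_feat[-lag:n]
        # Skip degenerate overlaps where correlation is undefined.
        if len(a) < 2 or a.std() == 0 or v.std() == 0:
            continue
        corr = np.corrcoef(a, v)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# Synthetic example (assumed data): the audio feature is the video feature
# delayed by 7 frames, plus a small amount of noise.
rng = np.random.default_rng(0)
video_feat = rng.standard_normal(500)
true_lag = 7
audio_feat = np.roll(video_feat, true_lag) + 0.1 * rng.standard_normal(500)

lag, corr = estimate_av_offset(audio_feat, video_feat, max_lag=20)
print(lag, round(float(corr), 3))
```

Shifting the audio track back by the returned lag would restore alignment in this toy setting. Real recordings would need per-frame audio and visual features extracted from the clip, which this sketch does not cover.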