UT Repository, The University of Tokyo
 


Please use this identifier to cite or link to this item: http://hdl.handle.net/2261/25847

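Title: IMPROVING COHERENCE IN MULTI-DOCUMENT SUMMARIZATION THROUGH PROPER ORDERING OF SENTENCES
Other title: 複数文書自動要約における要約文の並び順による一貫性向上に関する研究 (A Study on Improving Coherence through Summary Sentence Ordering in Automatic Multi-Document Summarization)
Author: Bollegala, Danushka
Author (other language): ボッレーガラ, ダヌシカ
Keywords: multi-document summarization
sentence ordering
text coherence
machine learning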
Issue Date: 2-Feb-2007
Abstract: The problem of extracting salient information to include in a summary has been researched extensively in the field of automatic text summarization. However, the coherent arrangement of the extracted information has received little attention. In particular, in extractive multi-document text summarization, sentences that convey important information are selected from a set of documents, and there is no guarantee that this set of extracted sentences will form a coherent summary by itself. The order in which information is presented is an important factor affecting the coherence of a summary. This thesis focuses on the problem of automatically generating a coherent summary from a given set of documents by ordering the extracted sentences. I propose two different approaches to this problem: a pair-wise sentence comparison approach and a bottom-up text structuring approach. The pair-wise sentence comparison approach first compares all possible pairs of sentences and decides a partial ordering within each pair. It then creates a total ordering that optimizes a certain function. In the bottom-up text structuring approach, I define four criteria for sentence ordering: chronology, topical-closeness, precedence and succedence. I then use support vector machines to integrate these four criteria and compute the strength of association between two sentences. For training I use a set of manually ordered summaries. A hierarchical text clustering algorithm is used to produce a total ordering of sentences. I begin by ordering the pair of sentences that has the highest strength of association. I then repeatedly order the two text segments with the maximum association strength until a single segment containing all sentences in order is formed. I compare the sentence orderings produced by the proposed algorithm against manually ordered summaries using various rank correlation measures. Moreover, I perform a subjective grading of the generated summaries.
Both automatic evaluation and subjective grading suggest that the proposed sentence ordering algorithms significantly outperform all existing sentence ordering methods for multi-document summarization. Moreover, I investigate the problem of automatically evaluating a sentence ordering for its coherence and propose Average Continuity as an automatic evaluation measure for this task. The proposed measure shows a high correlation with human ratings.
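
The bottom-up procedure and the rank-correlation comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names (order_bottom_up, kendall_tau, toy_strength) are hypothetical, the association strength is a hand-written toy table rather than the SVM-based combination of the four criteria, and Kendall's tau is used only as one common example of a rank correlation measure, since the abstract does not say which measures were used.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, ...]  # an ordered run of sentence indices


def order_bottom_up(n: int,
                    strength: Callable[[Segment, Segment], float]) -> List[int]:
    """Greedy bottom-up ordering: start with one segment per sentence and
    repeatedly concatenate the ordered pair (a, b) with the highest
    association strength, until a single segment remains."""
    segments: List[Segment] = [(i,) for i in range(n)]
    while len(segments) > 1:
        # Consider every ordered pair of distinct segments (a before b).
        a, b = max(permutations(segments, 2),
                   key=lambda pair: strength(pair[0], pair[1]))
        segments.remove(a)
        segments.remove(b)
        segments.append(a + b)
    return list(segments[0]) if segments else []


def kendall_tau(candidate: Sequence[int], reference: Sequence[int]) -> float:
    """Kendall's tau between two orderings of the same items (at least two):
    1.0 for identical orderings, -1.0 for exactly reversed ones."""
    pos = {s: i for i, s in enumerate(candidate)}
    n = len(reference)
    # Count pairs that appear in opposite relative order in the two orderings.
    discordant = sum(1
                     for i in range(n)
                     for j in range(i + 1, n)
                     if pos[reference[i]] > pos[reference[j]])
    return 1.0 - 4.0 * discordant / (n * (n - 1))


# Toy strength (an assumption for illustration): score of placing the last
# sentence of segment a directly before the first sentence of segment b.
PREF = {(2, 0): 0.9, (0, 3): 0.8, (3, 1): 0.7}


def toy_strength(a: Segment, b: Segment) -> float:
    return PREF.get((a[-1], b[0]), 0.1)


ordering = order_bottom_up(4, toy_strength)
reference = [2, 0, 3, 1]  # stands in for a manually ordered summary
tau = kendall_tau(ordering, reference)
```

With this toy table the merges proceed (2)+(0), then (2,0)+(3), then (2,0,3)+(1), so the sketch recovers the reference ordering exactly. In a realistic setting, the strength function would be replaced by a trained classifier's score.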
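Description: Report number: ; Date degree conferred: 2007-03- ; Degree category: Master's; Degree type: Master's () ; Diploma number: Master's No. ; Graduate school and department: Graduate School of Information Science and Technology, Department of Information and Communication Engineering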
URI: http://hdl.handle.net/2261/25847
Appears in Collections: 1244025 Master's Theses (Department of Information and Communication Engineering)
025 Master's Theses

Files in This Item:

File            Description   Size        Format
Bollegala.pdf                 528.73 kB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
