Extraction of Paraphrasing Pattern by Aligned Corpora of Web and Mobile Terminal News Articles (Webと携帯端末向けの新聞記事の対応コーパスからの文末言い換え抽出)

Iwakoshi, Moritaka (岩越, 守孝); Masuda, Hidetaka (増田, 英孝); Nakagawa, Hiroshi (中川, 裕志)

http://hdl.handle.net/2261/29447

Name / File	License	Action
v12n5_07.pdf (302.9 kB)

Item type

Journal Article(1)

Publication date

2009-12-15

Title

Webと携帯端末向けの新聞記事の対応コーパスからの文末言い換え抽出

Language

jpn

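Keyword

Subject scheme

Other

Subject

Paraphrase (言い換え)

Keyword

Subject scheme

Other

Subject

Mobile terminal (携帯端末)

Keyword

Subject scheme

Other

Subject

Web

Keyword

Subject scheme

Other

Subject

Sentence-final expression (文末表現)

Keyword

Subject scheme

Other

Subject

Paraphrase

Keyword

Subject scheme

Other

Subject

Mobile terminal

Keyword

Subject scheme

Other

Subject

Web

Keyword

Subject scheme

Other

Subject

Sentence final part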

Resource type

Resource

http://purl.org/coar/resource_type/c_6501

Type

journal article

Other title

Extraction of Paraphrasing Pattern by Aligned Corpora of Web and Mobile Terminal News Articles

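Author

Iwakoshi, Moritaka (岩越, 守孝)
Masuda, Hidetaka (増田, 英孝)
Nakagawa, Hiroshi (中川, 裕志)

Author alternative name

Identifier scheme

WEKO

Identifier

106368

Full name

Iwakoshi, Moritaka

Author alternative name

Identifier scheme

WEKO

Identifier

106369

Full name

Masuda, Hidetaka

Author alternative name

Identifier scheme

WEKO

Identifier

106370

Full name

Nakagawa, Hiroshi

Author affiliation

School of Engineering, Tokyo Denki University (東京電機大学工学部)

Author affiliation

Information Technology Center, The University of Tokyo (東京大学情報基盤センター)

Author affiliation

School of Engineering, Tokyo Denki University

Author affiliation

Information Technology Center, The University of Tokyo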

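Abstract

Description type

Abstract

Description

In this study, we collected, over a period of about three years, both newspaper articles distributed for mobile terminals, which are several tens of characters long, and Web newspaper articles, which are several hundred characters long. From the collected corpus, we automatically extracted paraphrase expressions such as contractions of sentence-final expressions. First, we aligned the corpus of mobile newspaper articles and Web newspaper articles, both collected from the Web, at the article level and then at the sentence level. Next, we extracted the sentence-final expression of each mobile article sentence using morphological analysis and gathered the Web article sentences corresponding to that sentence. We then extracted candidate paraphrase expressions morpheme by morpheme from the end of each Web article sentence, scored them using frequency and other measures, and removed inappropriate paraphrases that drop necessary nouns, thereby improving the accuracy of paraphrase extraction.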
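The sentence-level alignment step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure the authors used, so the character-bigram Dice coefficient, the function names, and the `threshold` value below are all assumptions made for illustration only.

```python
def char_bigrams(text):
    """Return the set of character bigrams in a string."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice(a, b):
    """Dice coefficient over character bigrams (an assumed similarity measure)."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def align_sentences(mobile_sentences, web_sentences, threshold=0.3):
    """Pair each mobile sentence with its most similar Web sentence.

    A pair is kept only if the similarity reaches the (hypothetical) threshold.
    """
    pairs = []
    for m in mobile_sentences:
        best = max(web_sentences, key=lambda w: dice(m, w), default=None)
        if best is not None and dice(m, best) >= threshold:
            pairs.append((m, best))
    return pairs


# Hypothetical example sentences, not taken from the corpus.
web = ["首相が来週訪米する予定だ。", "株価は続落した。"]
pairs = align_sentences(["首相が訪米"], web)
```

Character bigrams are used here only because they require no tokenizer; any similarity measure over aligned articles could be substituted.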

Abstract

Description type

Abstract

Description

We collected, over three years, both Web newspaper articles of several hundred characters and their counterparts distributed for mobile terminals, which consist of fifty to a hundred characters. We then automatically extracted a number of candidate paraphrases of the sentence-final part from them. First, we aligned the two corpora at the article level and then at the sentence level. Next, we extracted the final part of each mobile article sentence using a morphological analyzer and collected the counterpart expressions from the Web article sentences. Finally, we extracted candidate morpheme sequences from the final part of each Web article sentence, and we propose a combination of two methods to improve the extraction accuracy: 1) ranking based on frequency, branching factor, and string length, and 2) filtering to remove inappropriate expressions that eliminate semantically indispensable nouns.
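The ranking and filtering steps named in the abstract can be sketched roughly as follows. The abstract lists the features (frequency, branching factor, string length) but not how they are combined, so the scoring formula here is an assumption, as are the function name `rank_candidates`, the `max_len` parameter, the simplified `"noun"` tag, and the rule that a noun is "indispensable" when it does not appear in the mobile ending. Input morphemes are assumed to be pre-tokenized `(surface, pos)` pairs from some morphological analyzer.

```python
import math
from collections import Counter, defaultdict


def rank_candidates(web_endings, mobile_ending, max_len=4):
    """Rank sentence-final morpheme sequences of Web sentences.

    web_endings: list of morpheme lists [(surface, pos), ...], each taken from
    a Web sentence aligned to a mobile sentence that ends in mobile_ending.
    """
    freq = Counter()
    left_contexts = defaultdict(set)
    for morphemes in web_endings:
        for k in range(1, min(max_len, len(morphemes)) + 1):
            cand = tuple(morphemes[-k:])
            freq[cand] += 1
            if len(morphemes) > k:
                # Record the morpheme that precedes the candidate.
                left_contexts[cand].add(morphemes[-k - 1][0])

    scored = []
    for cand, f in freq.items():
        # Filtering: drop candidates containing a noun that the mobile
        # ending lacks, since paraphrasing would lose that noun.
        if any(pos == "noun" and surface not in mobile_ending
               for surface, pos in cand):
            continue
        branching = len(left_contexts[cand])
        # Assumed score: frequency x log(1 + branching factor) x length.
        score = f * math.log(1 + branching) * len(cand)
        scored.append(("".join(s for s, _ in cand), score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical aligned Web sentence endings for the mobile ending "方針".
web_endings = [
    [("政府", "noun"), ("は", "particle"), ("方針", "noun"),
     ("を", "particle"), ("固め", "verb"), ("た", "aux")],
    [("同社", "noun"), ("も", "particle"), ("方針", "noun"),
     ("を", "particle"), ("固め", "verb"), ("た", "aux")],
]
ranked = rank_candidates(web_endings, "方針")
```

In this toy input the sequence 方針を固めた ranks first because it occurs in both sentences, is preceded by two different morphemes, and is the longest surviving candidate.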

Bibliographic information

Journal of Natural Language Processing (自然言語処理)

Vol. 12, No. 5, pp. 157-184, issued 2005-10

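ISSN

Source identifier type

ISSN

Source identifier

13407619

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10472659

Format

Description type

Other

Description

application/pdf

Nippon Decimal Classification

Subject scheme

NDC

Subject

007

Publisher

The Association for Natural Language Processing (言語処理学会)

Publisher alternative name

The Association for Natural Language Processing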


Versions

Ver.1

2021-03-02 00:39:13.626357
