A study on attributional and relational similarity between word pairs on the Web

BOLLEGALA, DANUSHKA

https://doi.org/10.15083/00002426

Name / File	License	Action
48077408.pdf (1.2 MB)

Item type

Thesis or Dissertation (1)

Publication date

2012-03-01

Title

A study on attributional and relational similarity between word pairs on the Web

Language

eng

Keyword

Subject scheme

Other

Subject

similarity

Keyword

Subject scheme

Other

Subject

Web Mining

Resource type

Resource

http://purl.org/coar/resource_type/c_46ec

Type

thesis

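ID registration

10.15083/00002426

ID registration type

JaLC

Alternative title

ウェブ上での単語対間の属性類似性に関する研究 (A study on attributional similarity between word pairs on the Web)

Author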

BOLLEGALA, DANUSHKA

Author alternative name

Identifier scheme

WEKO

Identifier

6700

Full name

ボッレーガラ, ダヌシカ

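Author affiliation

Department of Information and Communication Engineering, Graduate School of Information Science and Technology (大学院情報理工学系研究科電子情報学専攻)

Author affiliation

Graduate School of Information Science and Technology, Department of Information and Communication Engineering, The University of Tokyo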

Abstract

Description type

Abstract

Description

Similarity is a fundamental concept that extends across numerous fields such as artificial intelligence, natural language processing, cognitive science, and psychology. Similarity provides the basis for learning, generalization, and recognition. Similarity can be broadly divided into two types: semantic (or attributional) similarity and relational similarity. Attributional similarity is the correspondence between the attributes of two objects. If two objects have identical or close attributes, then those two objects are considered attributionally similar. For example, the two concepts car and automobile have an identical set of attributes: both have four wheels and doors, and both are used for transportation. Consequently, the two words car and automobile show a high degree of semantic (attributional) similarity and are considered synonymous. On the other hand, relational similarity is the correspondence between the implicit semantic relations that exist between two pairs of words. For example, consider the two word-pairs (ostrich, bird) and (lion, cat). Ostrich is a large bird and lion is a large cat. The implicitly stated semantic relation "is a large" holds between the two words in each word-pair. Therefore, those two word-pairs are considered relationally similar. Typically, word analogies show a high degree of relational similarity.

This thesis addresses the problem of measuring both semantic (attributional) and relational similarity between words or pairs of words from the web. I define the two types of similarity in detail and analyze the concept of similarity from numerous viewpoints, including its philosophical, linguistic, and mathematical interpretations. In particular, I compare different models of similarity, such as the contrast model, the transformational model, and the relational model.

I present a supervised approach to measure the semantic similarity between two words using a web search engine. To measure the attributional similarity between two given words, I first search for those words individually and also in a conjunctive combination in a web search engine. I then extract lexical patterns that describe numerous semantic relations between the two words under consideration. Moreover, I compute popular co-occurrence measures such as the Jaccard coefficient, Overlap coefficient, Dice coefficient, and pointwise mutual information, using the page counts retrieved from a search engine. All of these measures are integrated with lexical patterns through a machine learning framework. The training data for the algorithm are selected from synsets in WordNet. The proposed method reports a high correlation with human ratings in a benchmark dataset for semantic similarity. Moreover, the proposed semantic similarity measure is used in a community clustering task and a word sense disambiguation task. Both tasks show the ability of the proposed semantic similarity measure to accurately compute the similarity between named entities. This is particularly useful because semantic similarity measures that require manually created lexical resources such as dictionaries are unable to compute the similarity between named entities, which are not well covered by dictionaries.

Chapter 3 studies the problem of relational similarity. Given two word-pairs (A, B) and (C, D), I propose a relational similarity measure, relsim((A, B), (C, D)), to compute the similarity between the implicit semantic relations that hold between the two words A and B, and between C and D. To represent the implicitly stated semantic relations between two words, I extract lexical patterns from the snippets retrieved from a web search engine for the two words. However, not all lexical patterns describe a different semantic relation. Some relations can be represented by more than one lexical pattern. For example, both patterns "X is a Y" and "Y such as X" describe a hypernymic relation between X and Y. The extracted patterns are then clustered using distributional similarity to identify the different patterns that describe a particular semantic relation. Finally, machine learning approaches are used to compute the relational similarity between two given word-pairs using the lexical patterns extracted for each word-pair. I experiment with support vector machines and an information-theoretic metric learning approach to learn a relational similarity measure.

The second half of this thesis describes the applications of semantic and relational similarity. As a working problem, I concentrate on personal name disambiguation on the web. A person's name can be ambiguous on the web for two main reasons. First, different people can share the same name (the namesake disambiguation problem). Second, a single individual can have multiple aliases on the web (the alias detection problem). Chapter 4 analyzes the namesake disambiguation problem, whereas Chapter 5 focuses on the alias detection problem. I propose fully automatic methods to solve both of these problems with high accuracy. Extracting attributes for a particular person, such as date of birth, nationality, affiliation, and job title, is particularly helpful to disambiguate that person from his or her namesakes on the web. I present some preliminary work that I have conducted in this area in Chapter 6.

In Chapter 7, I present a relational model of semantic similarity that links the relational and semantic similarity measures that were introduced in the thesis. In contrast to the feature model of semantic similarity, which models objects using their attributes, the relational model attempts to compute the semantic similarity between two given words directly using the numerous semantic relations that hold between the two words. In Chapter 8, I conclude this thesis with a description of potential future work in web-based similarity measurement.
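
The abstract names four co-occurrence measures computed from search engine page counts. As a rough illustration only, the sketch below uses the standard textbook definitions of these measures, which may differ from the exact variants defined in the thesis. The function names, the argument names (`p` and `q` for the page counts of each word alone, `pq` for the page count of the conjunctive query, `n` for an assumed total number of indexed pages), and the zero-count handling are all assumptions made for this example.

```python
import math


def jaccard(p: int, q: int, pq: int) -> float:
    """Jaccard coefficient: pages with both words over pages with either word."""
    union = p + q - pq
    return pq / union if union > 0 else 0.0


def overlap(p: int, q: int, pq: int) -> float:
    """Overlap coefficient: pages with both words over the smaller page count."""
    smaller = min(p, q)
    return pq / smaller if smaller > 0 else 0.0


def dice(p: int, q: int, pq: int) -> float:
    """Dice coefficient: twice the joint count over the sum of the two counts."""
    total = p + q
    return 2 * pq / total if total > 0 else 0.0


def pmi(p: int, q: int, pq: int, n: int) -> float:
    """Pointwise mutual information (base 2) from page counts.

    n is an assumed estimate of the number of pages indexed by the
    search engine. Returning 0.0 for zero counts is a choice made for
    this sketch, not something stated in the abstract.
    """
    if p == 0 or q == 0 or pq == 0:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))


if __name__ == "__main__":
    # Hypothetical page counts, invented purely for illustration.
    p, q, pq, n = 100, 50, 25, 800
    print(jaccard(p, q, pq), overlap(p, q, pq), dice(p, q, pq), pmi(p, q, pq, n))
```

In a setting like the one the abstract describes, scores of this kind would be combined with lexical pattern features as inputs to a learned model rather than used on their own.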

Bibliographic information

Date of issue 2009-09

Nippon Decimal Classification

Subject scheme

NDC

Subject

548

Degree name

Doctor of Information Science and Technology (博士(情報理工学))

Degree

Value

doctoral

Degree field

Information Science and Technology (情報理工学)

Degree-granting institution

Degree-granting institution name

University of Tokyo (東京大学)

Graduate school / Department

Department of Information and Communication Engineering, Graduate School of Information Science and Technology (情報理工学系研究科電子情報学専攻)

Date degree granted

2009-09-28

Degree grant number

甲第25374号

Diploma number

博情第256号

Versions

Ver.1

2021-03-01 19:59:32.297832