A study on attributional and relational similarity between word pairs on the Web

https://doi.org/10.15083/00002426
e34908f3-5828-4f31-a29e-714a2150e817
File: 48077408.pdf (1.2 MB)
Item type: Thesis or Dissertation(1)
Publication date: 2012-03-01
Title: A study on attributional and relational similarity between word pairs on the Web
Language: eng
Keywords (subject scheme: Other): similarity; Web Mining
Resource type: thesis (http://purl.org/coar/resource_type/c_46ec)
ID registration: 10.15083/00002426 (registration type: JaLC)
Alternative title: ウェブ上での単語対間の属性類似性に関する研究 (A study on attributional similarity between word pairs on the Web)
Author: BOLLEGALA, DANUSHKA (WEKO author ID 6699)
Author alternative name: ボッレーガラ, ダヌシカ (identifier scheme: WEKO; identifier: 6700)
Author affiliation: Graduate School of Information Science and Technology, Department of Information and Communication Engineering, The University of Tokyo (大学院情報理工学系研究科電子情報学専攻)
Abstract
Similarity is a fundamental concept that extends across numerous fields, such as artificial intelligence, natural language processing, cognitive science, and psychology. Similarity provides the basis for learning, generalization, and recognition. Similarity can be broadly divided into two types: semantic (or attributional) similarity and relational similarity. Attributional similarity is the correspondence between the attributes of two objects: if two objects have identical or close attributes, they are considered attributionally similar. For example, the concepts car and automobile share an identical set of attributes: both have four wheels and doors, and both are used for transportation. Consequently, the words car and automobile show a high degree of semantic (attributional) similarity and are considered synonymous. Relational similarity, on the other hand, is the correspondence between the implicit semantic relations that exist between two pairs of words. For example, consider the word-pairs (ostrich, bird) and (lion, cat). An ostrich is a large bird, and a lion is a large cat. The implicitly stated semantic relation "is a large" holds between the two words in each word-pair; therefore, the two word-pairs are considered relationally similar. Word analogies typically show a high degree of relational similarity. This thesis addresses the problem of measuring both semantic (attributional) and relational similarity between words or pairs of words from the web. I define the two types of similarity in detail and analyze the concept of similarity from numerous viewpoints, including its philosophical, linguistic, and mathematical interpretations. In particular, I compare different models of similarity, such as the contrast model, the transformational model, and the relational model. I present a supervised approach to measure the semantic similarity between two words using a web search engine.
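The contrast model mentioned above is usually associated with Tversky's feature-matching account, in which the similarity of a to b, with feature sets A and B, is S(a, b) = theta * f(A ∩ B) - alpha * f(A - B) - beta * f(B - A). The following is a minimal sketch under stated assumptions: f is taken to be set cardinality, the weights are arbitrary, and the bicycle feature set is a hypothetical addition to the car/automobile example. It is not the thesis's own formulation.

```python
def contrast_similarity(a: set[str], b: set[str],
                        theta: float = 1.0, alpha: float = 0.5,
                        beta: float = 0.5) -> float:
    """Tversky-style contrast model with f = set cardinality (an assumption).

    S(a, b) = theta * f(A & B) - alpha * f(A - B) - beta * f(B - A)
    """
    common = len(a & b)   # shared features raise similarity
    only_a = len(a - b)   # features distinctive to a lower it
    only_b = len(b - a)   # features distinctive to b lower it
    return theta * common - alpha * only_a - beta * only_b


# Hypothetical feature sets built from the car/automobile example.
car = {"four wheels", "doors", "used for transportation"}
automobile = {"four wheels", "doors", "used for transportation"}
bicycle = {"two wheels", "pedals", "used for transportation"}

print(contrast_similarity(car, automobile))  # 3.0
print(contrast_similarity(car, bicycle))     # -1.0
```

With these choices, the identical feature sets of car and automobile score 3.0, while car and bicycle score -1.0: one shared feature is outweighed by four distinctive ones.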
To measure the attributional similarity between two given words, I first query a web search engine for each word individually and for the two words combined in a conjunctive query. I then extract lexical patterns that describe the various semantic relations between the two words under consideration. Moreover, I compute popular co-occurrence measures, such as the Jaccard coefficient, the Overlap coefficient, the Dice coefficient, and pointwise mutual information, using the page counts retrieved from the search engine. All of these measures are integrated with the lexical patterns through a machine learning framework. The training data for the algorithm are selected from synsets in WordNet. The proposed method achieves a high correlation with human ratings on a benchmark dataset for semantic similarity. Moreover, the proposed semantic similarity measure is applied to a community clustering task and a word sense disambiguation task. Both tasks demonstrate that the proposed measure can accurately compute the similarity between named entities. This is particularly useful because semantic similarity measures that rely on manually created lexical resources, such as dictionaries, cannot compute the similarity between named entities, which dictionaries cover poorly. Chapter 3 studies the problem of relational similarity. Given two word-pairs (A, B) and (C, D), I propose a relational similarity measure, relsim((A, B), (C, D)), that computes the similarity between the implicit semantic relations holding between A and B and between C and D. To represent the implicitly stated semantic relations between two words, I extract lexical patterns from the snippets that a web search engine returns for the two words. However, distinct lexical patterns do not necessarily describe distinct semantic relations, because a single relation can be expressed by more than one lexical pattern. For example, both the patterns "X is a Y" and "Y such as X" describe a hypernymic relation between X and Y.
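The page-count co-occurrence measures listed above can be sketched as follows. This is a minimal illustration, assuming the common web-adapted definitions in which H(P), H(Q), and H(P AND Q) are the page counts for the two single-word queries and the conjunctive query. The index size `n_docs`, the noise threshold `c`, and the page counts in the usage line are assumptions, not values reported in the thesis. In the described method these scores are combined with lexical-pattern features in a learned model; that combination is not shown here.

```python
import math


def web_cooccurrence_scores(h_p: int, h_q: int, h_pq: int,
                            n_docs: float = 1e10, c: int = 5) -> dict[str, float]:
    """Page-count-based co-occurrence scores for two words P and Q.

    h_p, h_q: page counts for the queries "P" and "Q".
    h_pq:     page count for the conjunctive query "P AND Q".
    n_docs, c: assumed index size and noise threshold (illustrative only).
    """
    if h_pq <= c:
        # Very small co-occurrence counts are treated as noise and zeroed.
        return {"jaccard": 0.0, "overlap": 0.0, "dice": 0.0, "pmi": 0.0}
    jaccard = h_pq / (h_p + h_q - h_pq)
    overlap = h_pq / min(h_p, h_q)
    dice = 2 * h_pq / (h_p + h_q)
    # PMI with probabilities estimated as page count / index size.
    pmi = math.log2((h_pq / n_docs) / ((h_p / n_docs) * (h_q / n_docs)))
    return {"jaccard": jaccard, "overlap": overlap, "dice": dice, "pmi": pmi}


# Hypothetical page counts for a word pair.
scores = web_cooccurrence_scores(h_p=1000, h_q=500, h_pq=100)
print(scores)
```

For these counts the Overlap score is 100 / 500 = 0.2. The threshold `c` is meant to suppress unreliable scores for word pairs that co-occur on only a handful of pages.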
The extracted patterns are then clustered using distributional similarity to identify the different patterns that describe a particular semantic relation. Finally, machine learning approaches are used to compute the relational similarity between two given word-pairs using the lexical patterns extracted for each word-pair. I experiment with support vector machines and with an information-theoretic metric learning approach to learn a relational similarity measure. The second half of this thesis describes applications of semantic and relational similarity. As a working problem, I concentrate on personal name disambiguation on the web. A personal name can be ambiguous on the web for two main reasons. First, different people can share the same name (the namesake disambiguation problem). Second, a single individual can have multiple aliases on the web (the alias detection problem). Chapter 4 analyzes the namesake disambiguation problem, whereas Chapter 5 focuses on the alias detection problem. I propose fully automatic methods that solve both problems with high accuracy. Extracting attributes of a particular person, such as date of birth, nationality, affiliation, and job title, is particularly helpful for distinguishing that person from his or her namesakes on the web; Chapter 6 presents preliminary work in this area. In Chapter 7, I present a relational model of semantic similarity that links the relational and semantic similarity measures introduced earlier in the thesis. In contrast to the feature model of semantic similarity, which models objects using their attributes, the relational model attempts to compute the semantic similarity between two given words directly from the various semantic relations that hold between them. In Chapter 8, I conclude the thesis with a description of potential future work in web-based similarity measurement.
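The relational pipeline described above (extracting lexical patterns from snippets, clustering patterns by distributional similarity, then comparing word-pairs) can be sketched as below. This is a simplified illustration, not the thesis's implementation: the pattern extractor, the greedy threshold clustering, the 0.5 threshold, and all snippets and counts are hypothetical, and plain cosine similarity over cluster-count vectors (`relsim_cosine`) stands in for the SVM or metric-learning model the thesis actually trains.

```python
import math
import re
from collections import Counter


def extract_patterns(snippet: str, a: str, b: str, max_gap: int = 4) -> list[str]:
    """Hypothetical extractor: mark a as X and b as Y, keep short X...Y spans."""
    a, b = a.lower(), b.lower()
    tokens = re.findall(r"\w+", snippet.lower())
    tokens = ["X" if t == a else "Y" if t == b else t for t in tokens]
    patterns = []
    for i, t in enumerate(tokens):
        if t not in ("X", "Y"):
            continue
        other = "Y" if t == "X" else "X"
        # Look ahead at most max_gap intervening words for the other marker.
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            if tokens[j] == other:
                patterns.append(" ".join(tokens[i:j + 1]))
                break
            if tokens[j] == t:
                break
    return patterns


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_patterns(pattern_pairs: dict[str, Counter],
                     threshold: float = 0.5) -> list[list[str]]:
    """Greedy clustering of patterns by their word-pair distributions.

    Patterns are visited from most to least frequent; each joins the most
    similar existing cluster if cosine similarity exceeds threshold,
    otherwise it starts a new cluster. Centroids are summed counts.
    """
    clusters: list[tuple[list[str], Counter]] = []
    order = sorted(pattern_pairs, key=lambda p: -sum(pattern_pairs[p].values()))
    for p in order:
        vec = pattern_pairs[p]
        best, best_sim = None, threshold
        for members, centroid in clusters:
            s = cosine(vec, centroid)
            if s > best_sim:
                best, best_sim = (members, centroid), s
        if best is None:
            clusters.append(([p], Counter(vec)))
        else:
            best[0].append(p)
            best[1].update(vec)
    return [members for members, _ in clusters]


def relsim_cosine(pats_1: list[str], pats_2: list[str],
                  clusters: list[list[str]]) -> float:
    """Compare two word-pairs via counts of the pattern clusters they use."""
    index = {p: k for k, members in enumerate(clusters) for p in members}
    v1 = Counter(index[p] for p in pats_1 if p in index)
    v2 = Counter(index[p] for p in pats_2 if p in index)
    return cosine(v1, v2)


# Hypothetical distributional statistics: word-pairs each pattern occurs with.
pattern_pairs = {
    "X is a large Y": Counter({("ostrich", "bird"): 4, ("lion", "cat"): 3}),
    "X is the largest Y": Counter({("ostrich", "bird"): 2, ("whale", "mammal"): 2}),
    "X lives in the Y": Counter({("lion", "savanna"): 3, ("ostrich", "desert"): 1}),
}
clusters = cluster_patterns(pattern_pairs)

pats_ab = [p for s in ["The ostrich is a large bird.", "The ostrich is the largest bird."]
           for p in extract_patterns(s, "ostrich", "bird")]
pats_cd = extract_patterns("The lion is a large cat.", "lion", "cat")
pats_ef = extract_patterns("The lion lives in the savanna.", "lion", "savanna")

print(relsim_cosine(pats_ab, pats_cd, clusters))  # 1.0
print(relsim_cosine(pats_ab, pats_ef, clusters))  # 0.0
```

Here "X is a large Y" and "X is the largest Y" fall into one cluster because they co-occur with overlapping word-pairs, so (ostrich, bird) and (lion, cat) come out relationally similar even though they do not share every surface pattern.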
Bibliographic information: issue date 2009-09
Nippon Decimal Classification (NDC): 548
Degree name: Doctor (Information Science and Technology), 博士(情報理工学)
Degree: doctoral
Degree field: Information Science and Technology (情報理工学)
Degree-granting institution: University of Tokyo (東京大学)
Graduate school and department: Department of Information and Communication Engineering, Graduate School of Information Science and Technology (情報理工学系研究科電子情報学専攻)
Date of degree conferral: 2009-09-28
Degree conferral number: 甲第25374号
Diploma number: 博情第256号
Versions

Ver.1 2021-03-01 19:59:32.297832

Powered by WEKO3

