{"created":"2021-03-01T06:19:02.464795+00:00","id":2432,"links":{},"metadata":{"_buckets":{"deposit":"394f7ee8-3ffb-442c-81f2-2068e414fbf8"},"_deposit":{"id":"2432","owners":[],"pid":{"revision_id":0,"type":"depid","value":"2432"},"status":"published"},"_oai":{"id":"oai:repository.dl.itc.u-tokyo.ac.jp:00002432","sets":["34:105:330","9:233:280"]},"item_7_alternative_title_1":{"attribute_name":"その他のタイトル","attribute_value_mlt":[{"subitem_alternative_title":"ウェブ上での単語対間の属性類似性に関する研究"}]},"item_7_biblio_info_7":{"attribute_name":"書誌情報","attribute_value_mlt":[{"bibliographicIssueDates":{"bibliographicIssueDate":"2009-09","bibliographicIssueDateType":"Issued"},"bibliographic_titles":[{}]}]},"item_7_date_granted_25":{"attribute_name":"学位授与年月日","attribute_value_mlt":[{"subitem_dategranted":"2009-09-28"}]},"item_7_degree_grantor_23":{"attribute_name":"学位授与機関","attribute_value_mlt":[{"subitem_degreegrantor":[{"subitem_degreegrantor_name":"University of Tokyo (東京大学)"}]}]},"item_7_degree_name_20":{"attribute_name":"学位名","attribute_value_mlt":[{"subitem_degreename":"博士(情報理工学)"}]},"item_7_description_5":{"attribute_name":"抄録","attribute_value_mlt":[{"subitem_description":"Similarity is a fundamental concept that extends across numerous fields such as artificial intelligence, natural language processing, cognitive science and psychology. Similarity provides the basis for learning, generalization and recognition. Similarity can be broadly divided into two types: semantic (or attributional) similarity, and relational similarity. Attributional similarity is the correspondence between the attributes of two objects. If two objects have identical or close attributes, then those two objects are considered attributionally similar. For example, the two concepts, car and automobile, both have an identical set of attributes: four wheels, doors, and both are used for transportation. 
Consequently, the two words car and automobile show a high degree of semantic (attributional) similarity and are considered synonymous. On the other hand, relational similarity is the correspondence between the implicit semantic relations that exist between two pairs of words. For example, consider the two word-pairs (ostrich, bird) and (lion, cat). An ostrich is a large bird and a lion is a large cat. The implicit semantic relation 'is a large' holds between the two words in each word-pair. Therefore, those two word-pairs are considered relationally similar. Typically, word analogies show a high degree of relational similarity. This thesis addresses the problem of measuring both semantic (attributional) and relational similarity between words or pairs of words from the web. I define the two types of similarity in detail and analyze the concept of similarity from numerous viewpoints, including its philosophical, linguistic, and mathematical interpretations. In particular, I compare different models of similarity, such as the contrast model, the transformational model, and the relational model. I present a supervised approach to measure the semantic similarity between two words using a web search engine. To measure the attributional similarity between two given words, I first search for those words individually and also in a conjunctive combination in a web search engine. I then extract lexical patterns that describe numerous semantic relations between the two words under consideration. Moreover, I compute popular co-occurrence measures such as the Jaccard coefficient, Overlap coefficient, Dice coefficient, and pointwise mutual information, using the page counts retrieved from a search engine. All those measures are integrated with lexical patterns through a machine learning framework. The training data for the algorithm are selected from synsets in WordNet. The proposed method reports a high correlation with human ratings in a benchmark dataset for semantic similarity. 
Moreover, the proposed semantic similarity measure is used in a community clustering task and a word sense disambiguation task. Both tasks show the ability of the proposed semantic similarity measure to accurately compute the similarity between named-entities. This is particularly useful because semantic similarity measures that require manually created lexical resources such as dictionaries are unable to compute the similarity between named-entities, which are not well covered by dictionaries. Chapter 3 studies the problem of relational similarity. Given two word-pairs (A,B) and (C,D), I propose a relational similarity measure, relsim((A,B),(C,D)), to compute the similarity between the implicit semantic relations that hold between A and B and between C and D. To represent the implicitly stated semantic relations between two words, I extract lexical patterns from the snippets retrieved from a web search engine for the two words. However, not all lexical patterns describe a different semantic relation. Some relations can be represented by more than one lexical pattern. For example, the patterns 'X is a Y' and 'Y such as X' both describe a hypernymic relation between X and Y. The extracted patterns are therefore clustered using distributional similarity to identify the different patterns that describe a particular semantic relation. Finally, machine learning approaches are used to compute the relational similarity between two given word-pairs using the lexical patterns extracted for each word-pair. I experiment with support vector machines and an information-theoretic metric learning approach to learn a relational similarity measure. The second half of this thesis describes the applications of semantic and relational similarity. As a working problem, I concentrate on personal name disambiguation on the web. A name of a person can be ambiguous on the web for two main reasons. First, different people can share the same name (namesake disambiguation problem). 
Second, a single individual can have multiple aliases on the web (alias detection problem). Chapter 4 analyzes the namesake disambiguation problem, whereas Chapter 5 focuses on the alias detection problem. I propose fully automatic methods to solve both these problems with high accuracy. Extracting attributes for a particular person, such as date of birth, nationality, affiliation, and job title, is particularly helpful to disambiguate that person from his or her namesakes on the web. I present some preliminary work that I have conducted in this area in Chapter 6. In Chapter 7, I present a relational model of semantic similarity that links the relational and semantic similarity measures introduced in the thesis. In contrast to the feature model of semantic similarity, which models objects using their attributes, the relational model attempts to compute the semantic similarity between two given words directly using the numerous semantic relations that hold between the two words. 
In Chapter 8, I conclude this thesis with a description of potential future work in web-based similarity measurement.","subitem_description_type":"Abstract"}]},"item_7_dissertation_number_26":{"attribute_name":"学位授与番号","attribute_value_mlt":[{"subitem_dissertationnumber":"甲第25374号"}]},"item_7_full_name_3":{"attribute_name":"著者別名","attribute_value_mlt":[{"nameIdentifiers":[{"nameIdentifier":"6700","nameIdentifierScheme":"WEKO"}],"names":[{"name":"ボッレーガラ, ダヌシカ"}]}]},"item_7_identifier_registration":{"attribute_name":"ID登録","attribute_value_mlt":[{"subitem_identifier_reg_text":"10.15083/00002426","subitem_identifier_reg_type":"JaLC"}]},"item_7_select_21":{"attribute_name":"学位","attribute_value_mlt":[{"subitem_select_item":"doctoral"}]},"item_7_subject_13":{"attribute_name":"日本十進分類法","attribute_value_mlt":[{"subitem_subject":"548","subitem_subject_scheme":"NDC"}]},"item_7_text_22":{"attribute_name":"学位分野","attribute_value_mlt":[{"subitem_text_value":"Information Science and Technology (情報理工学)"}]},"item_7_text_24":{"attribute_name":"研究科・専攻","attribute_value_mlt":[{"subitem_text_value":"Department of Information and Communication Engineering, Graduate School of Information Science and Technology (情報理工学系研究科電子情報学専攻)"}]},"item_7_text_27":{"attribute_name":"学位記番号","attribute_value_mlt":[{"subitem_text_value":"博情第256号"}]},"item_7_text_4":{"attribute_name":"著者所属","attribute_value_mlt":[{"subitem_text_value":"大学院情報理工学系研究科電子情報学専攻"},{"subitem_text_value":"Graduate School of Information Science and Technology Department of Information and Communication Engineering The University of Tokyo"}]},"item_creator":{"attribute_name":"著者","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"BOLLEGALA, 
DANUSHKA"}],"nameIdentifiers":[{"nameIdentifier":"6699","nameIdentifierScheme":"WEKO"}]}]},"item_files":{"attribute_name":"ファイル情報","attribute_type":"file","attribute_value_mlt":[{"accessrole":"open_date","date":[{"dateType":"Available","dateValue":"2017-05-31"}],"displaytype":"detail","filename":"48077408.pdf","filesize":[{"value":"1.2 MB"}],"format":"application/pdf","licensetype":"license_note","mimetype":"application/pdf","url":{"label":"48077408.pdf","url":"https://repository.dl.itc.u-tokyo.ac.jp/record/2432/files/48077408.pdf"},"version_id":"91767dbc-8727-4466-88d5-82baf21dfce4"}]},"item_keyword":{"attribute_name":"キーワード","attribute_value_mlt":[{"subitem_subject":"similarity","subitem_subject_scheme":"Other"},{"subitem_subject":"Web Mining","subitem_subject_scheme":"Other"}]},"item_language":{"attribute_name":"言語","attribute_value_mlt":[{"subitem_language":"eng"}]},"item_resource_type":{"attribute_name":"資源タイプ","attribute_value_mlt":[{"resourcetype":"thesis","resourceuri":"http://purl.org/coar/resource_type/c_46ec"}]},"item_title":"A study on attributional and relational similarity between word pairs on the Web","item_titles":{"attribute_name":"タイトル","attribute_value_mlt":[{"subitem_title":"A study on attributional and relational similarity between word pairs on the Web"}]},"item_type_id":"7","owner":"1","path":["280","330"],"pubdate":{"attribute_name":"公開日","attribute_value":"2012-03-01"},"publish_date":"2012-03-01","publish_status":"0","recid":"2432","relation_version_is_last":true,"title":["A study on attributional and relational similarity between word pairs on the Web"],"weko_creator_id":"1","weko_shared_id":null},"updated":"2022-12-19T04:50:33.029109+00:00"}