Language Evolves, so should WordNet - Automatically Extending WordNet with the Senses of Out of Vocabulary Lemmas

A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL

OF THE UNIVERSITY OF MINNESOTA BY

Jonathan Rusert

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

Ted Pedersen

April 2017

© Jonathan Rusert 2017

Acknowledgements

I would like to acknowledge and thank the individuals who motivated and guided me along the way to completing this thesis.

First, I would like to thank my advisor, Dr. Ted Pedersen. Dr. Pedersen always gave timely and useful feedback on both the thesis code and the thesis text. He was also patient, yet kept me on track to complete my thesis in a timely manner.

Second, I would like to thank Dr. Pete Willemsen, our director of graduate studies. Like Dr. Pedersen, Dr. Willemsen provided a good base of knowledge in the writing of the thesis. He also helped answer any thesis questions and provided a sublime starting motivation when writing the thesis.

Third, I would like to thank Dr. Bruce Peckham for serving on my thesis committee and for providing support and feedback on my thesis.

Next, I would like to thank my fellow grad students for their help and support throughout the thesis process. They helped me complete the required classes and kept me sane along the way.

Finally, I would like to thank my good friends and family for their support throughout this process. I would specifically like to thank Ankit, Austin, and Mitch for providing motivational support as friends. Also, I would like to thank my parents, Nathan and Lori Rusert, for their continued support.

Thank you.

Dedication

To my parents, Nathan and Lori Rusert. Thank you for your continued love and support.

Abstract

This thesis presents a method that finds the best location at which to insert the sense of a word not currently found in the lexical database WordNet. Currently, WordNet contains common words that are already well established in the English language. However, many technical terms and examples of jargon suddenly become popular, and new slang expressions and idioms arise. WordNet will only stay viable to the degree to which it can incorporate such terminology in an automatic and reliable fashion. To solve this problem, we have developed an approach which measures the relatedness of the definition of a novel sense to the definitions of all senses with the same part of speech in WordNet. These measurements were made using a variety of measures, including Extended Gloss Overlaps, Gloss Vectors, and Word2Vec. After identifying the definition most related to the novel sense, we determine whether this sense should be merged as a synonym of an existing sense or attached to it as a hyponym. Our method participated in a shared task on Semantic Taxonomy Enhancement conducted as a part of SemEval-2016, where it fared much better than a random baseline and was comparable to various other participating systems. This approach is not only effective; it also represents a departure from existing techniques and thereby expands the range of possible solutions to this problem.
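The general shape of the approach described above can be illustrated with a minimal sketch. It is not the system evaluated in the thesis: the relatedness score here is a plain count of shared content words rather than Extended Gloss Overlaps, Gloss Vectors, or Word2Vec, and the glosses, sense identifiers, stopword list, and `merge_threshold` rule are all hypothetical values chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "for",
             "that", "is", "with", "on", "by", "as"}


def tokens(gloss):
    """Lowercase a gloss and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", gloss.lower()) if w not in STOPWORDS}


def overlap(gloss_a, gloss_b):
    """Simplified relatedness: number of content words the two glosses share."""
    return len(tokens(gloss_a) & tokens(gloss_b))


def place_sense(new_gloss, pos, candidates, merge_threshold=3):
    """Pick the most related existing sense with the same part of speech.

    candidates maps a sense id to a (part_of_speech, gloss) pair.
    Returns (best_sense_id, action), where action is "merge" or "attach".
    The threshold rule is an assumed stand-in for the merge/attach decision.
    """
    best_id, best_score = None, -1
    for sense_id, (cand_pos, gloss) in candidates.items():
        if cand_pos != pos:
            continue  # only compare against senses with the same part of speech
        score = overlap(new_gloss, gloss)
        if score > best_score:
            best_id, best_score = sense_id, score
    action = "merge" if best_score >= merge_threshold else "attach"
    return best_id, action


# Toy stand-in for WordNet glosses (hypothetical data, not real WordNet entries).
TOY_SENSES = {
    "dog#n#1": ("n", "a domesticated carnivorous mammal kept as a pet"),
    "phone#n#1": ("n", "electronic equipment that converts sound into "
                       "electrical signals for transmission"),
    "run#v#1": ("v", "move fast by using one's feet"),
}

close_match = place_sense(
    "a small domesticated mammal kept as a pet", "n", TOY_SENSES)
loose_match = place_sense(
    "a mammal trained to herd sheep", "n", TOY_SENSES)
```

In this toy setting a large overlap is treated as evidence of synonymy (merge), while a smaller overlap leads to attaching the new sense as a hyponym of the closest match. A real system would replace `overlap` with one of the relatedness measures named above and `TOY_SENSES` with the glosses of all WordNet senses sharing the novel sense's part of speech.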
