Date of Award

2011

Degree Type

Thesis

Degree Name

Computer Science Honours

School

School of Computer and Security Science

Faculty

Computing, Health and Science

First Advisor

Dr. Mike Johnstone

Abstract

This thesis investigates research performed in the area of natural language processing. It is the aim of this research to compare a selection of predominant word sense disambiguation algorithms, and also determine if they can be optimised by small changes to the parameters used by the algorithms. To perform this research, several word sense disambiguation algorithms will be implemented in Java, and run on a range of test corpora. The algorithms will be judged on metrics such as speed and accuracy, and any other results obtained; while an algorithm may be fast and accurate, there may be other factors making it less desirable. Finally, to demonstrate the purpose and usefulness of using better algorithms, the algorithms will be used in conjunction with a real world application. Five algorithms were used in this research: The standard Lesk algorithm, the simplified Lesk algorithm, a Lesk algorithm variant using hypernyms, a Lesk algorithm variant using synonyms, and a baseline performance algorithm. While the baseline algorithm should have been less accurate than the other algorithms, testing found that it could disambiguate words more accurately than any of the other algorithms, seemingly because the baseline makes use of statistical data in WordNet, the machine readable dictionary used for testing; data unable to be used by the other algorithms. However, with a few modifications, the Simplified Lesk algorithm was able to reach performance just a few percent lower than that of the baseline algorithm. It is the aim of this research to apply word sense disambiguation to automatic concept mapping, to determine if more accurate algorithms are able to display noticeably better results in a real world application. It was found in testing, that the overall accuracy of the algorithm had little effect on the quality of concept maps produced, but rather depended on the text being examined.

Share

 
COinS