e-space
Manchester Metropolitan University's Research Repository

    The development of a fuzzy semantic sentence similarity measure

    Chandran, Gautam David (2013) The development of a fuzzy semantic sentence similarity measure. Doctoral thesis (PhD), Manchester Metropolitan University.

    Available under License Creative Commons Attribution Non-commercial No Derivatives.

    Abstract

    A problem in the field of semantic sentence similarity is that existing sentence similarity measures cannot accurately represent the effect that perception-based (fuzzy) words, which are commonly used in natural language, have on sentence similarity. This research project developed a new sentence similarity measure to address this problem. The new measure, Fuzzy Algorithm for Similarity Testing (FAST), is a novel ontology-based similarity measure that uses concepts from fuzzy sets and computing with words to represent fuzzy words accurately. Through human experimentation, fuzzy sets were created for six categories of words based on their levels of association with particular concepts. These fuzzy sets were then defuzzified, and the results were used to create new ontological relations between the fuzzy words they contain; from these relations a new fuzzy ontology was created. These relationships underpin a new ontology-based fuzzy semantic text similarity algorithm that can show the effect of fuzzy words on sentence similarity, as well as the effect that fuzzy words have on non-fuzzy words within a sentence. To evaluate FAST, two new test datasets were created through questionnaire-based human experimentation. This involved developing a robust methodology for creating usable fuzzy datasets (including an automated method that was used to create one of the two fuzzy datasets). FAST was evaluated through experiments conducted using the new fuzzy datasets. The results showed that FAST correlated more closely with human test results than two existing sentence similarity measures did, demonstrating its success in representing the similarity between pairs of sentences containing fuzzy words.
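
    As a rough illustration of two steps the abstract mentions, defuzzifying a fuzzy set into a single value and correlating a measure's scores with human ratings, the sketch below may help. It is not the thesis's implementation: the abstract does not say which defuzzification method or correlation statistic was used, so centroid defuzzification and Pearson correlation are assumptions, and the word "huge", its 0 to 10 size scale, and its membership values are hypothetical.

```python
from math import sqrt


def centroid(xs, memberships):
    """Centroid (centre-of-gravity) defuzzification of a discrete fuzzy set.

    xs: points on the scale; memberships: membership degree at each point.
    Assumption: the thesis may use a different defuzzification method.
    """
    total = sum(memberships)
    if total == 0:
        raise ValueError("fuzzy set has no non-zero membership")
    return sum(x * m for x, m in zip(xs, memberships)) / total


def pearson(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical fuzzy set for the word "huge" on a 0-10 size scale.
scale = [6, 7, 8, 9, 10]
huge = [0.0, 0.25, 0.75, 1.0, 0.5]
crisp_huge = centroid(scale, huge)  # single crisp value for "huge"

# Hypothetical scores: a measure's output vs. mean human ratings.
measure_scores = [1.0, 2.0, 3.0]
human_ratings = [2.0, 4.0, 6.0]
agreement = pearson(measure_scores, human_ratings)
```

    In a setup like this, crisp values for words in the same category could be compared to place the words relative to one another, and a higher correlation with human ratings would indicate closer agreement with human judgement.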
