Adel, Naeemeh ORCID: https://orcid.org/0000-0003-4449-7410 (2022) Fuzzy natural language similarity measures through computing with words. Doctoral thesis (PhD), Manchester Metropolitan University.
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
A vibrant area of research is the understanding of human language by machines, so that they can engage in conversation with humans to achieve set goals. Human language is inherently fuzzy, with words meaning different things to different people depending on the context. Fuzzy words are words with a subjective meaning; they are typically used in everyday natural language dialogue, are often ambiguous and vague, and depend on an individual’s perception. Fuzzy Sentence Similarity Measures (FSSM) are algorithms that compare two or more short texts containing fuzzy words and return a numeric measure of the similarity of meaning between them. The motivation for this research is to create a new FSSM called FUSE (FUzzy Similarity mEasure). FUSE is an ontology-based similarity measure that uses Interval Type-2 Fuzzy Sets to model relationships between categories of human perception-based words. Four versions of FUSE (FUSE_1.0 to FUSE_4.0) have been developed, investigating the presence of linguistic hedges, the expansion of fuzzy categories and their use in natural language, the incorporation of logical operators such as ‘not’, and the introduction of the fuzzy influence factor. FUSE has been compared to several state-of-the-art, traditional semantic similarity measures (SSMs), which do not consider the presence of fuzzy words. FUSE has also been compared to the only published FSSM, FAST (Fuzzy Algorithm for Similarity Testing), which has a limited dictionary of fuzzy words and uses Type-1 Fuzzy Sets to model relationships between categories of human perception-based words. Results show that FUSE improves on the limitations of traditional SSMs and the FAST algorithm, achieving a higher correlation with the average human rating (AHR) than either on several published and gold-standard datasets.
To validate FUSE in the context of a real-world application, versions of the algorithm were incorporated into a simple Question & Answer (Q&A) dialogue system (DS), referred to as FUSION, to evaluate the improvement in natural language understanding. FUSION was tested on two different scenarios using human participants, and the results were compared to those of a traditional SSM known as STASIS. In the DS experiments, FUSION achieved a True rating of 88.65%, compared with an average True rating of 61.36% for STASIS. These results show that the FUSE algorithm can be used within real-world applications; evaluation of the DS showed an improvement in natural language understanding, allowing semantic similarity to be calculated more accurately from natural user responses.
The key contributions of this work can be summarised as follows. First, the development of a new methodology to model fuzzy words using Interval Type-2 fuzzy sets, leading to the creation of a fuzzy dictionary for nine fuzzy categories, a resource that other researchers in natural language processing and Computing with Words can use in other fuzzy applications such as semantic clustering. Second, the development of an FSSM known as FUSE, which was expanded over four versions, investigating the incorporation of linguistic hedges, the expansion of fuzzy categories and their use in natural language, the inclusion of logical operators such as ‘not’, and the introduction of the fuzzy influence factor. Third, the integration of the FUSE algorithm into a simple Q&A DS, referred to as FUSION, which demonstrated that an FSSM can be used in a real-world practical implementation, therefore making FUSE and its fuzzy dictionary generalisable to other applications.
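The abstract evaluates similarity measures by their correlation with the average human rating (AHR). It does not state which correlation coefficient the thesis uses, so the following is only a minimal sketch assuming Pearson's coefficient. The function name `pearson` and all numeric values are hypothetical illustrations, not data or code from the thesis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical average human ratings for four sentence pairs (scale 0..1).
ahr = [0.10, 0.35, 0.60, 0.90]
# Hypothetical outputs of two similarity measures on the same pairs.
measure_a = [0.15, 0.30, 0.65, 0.85]
measure_b = [0.40, 0.20, 0.50, 0.45]

r_a = pearson(measure_a, ahr)
r_b = pearson(measure_b, ahr)
print(f"measure_a vs AHR: r = {r_a:.3f}")
print(f"measure_b vs AHR: r = {r_b:.3f}")
```

Under this reading, a measure whose scores track the human ratings more closely (here `measure_a`) yields the higher coefficient, which is the sense in which one measure "achieves a higher correlation with the AHR" than another.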