An automatic corpus based method for a building Multiple Fuzzy Word Dataset

Chandran, D, Crockett, K, Mclean, D and Crispin, AJ (2015) An automatic corpus based method for a building Multiple Fuzzy Word Dataset. In: FUZZ-IEEE 2015.

Available under License In Copyright.

Abstract

Fuzzy sentence semantic similarity measures are designed to be applied to real world problems where a computer system is required to assess the similarity between human natural language and words or prototype sentences stored within a knowledge base. Such measures are often developed for a specific corpus or domain in which only a limited set of words and sentences is evaluated. As new “fuzzy” measures are developed, the research challenge is how to evaluate them. Traditional approaches have involved rigorous and complex human involvement in compiling benchmark datasets and obtaining human similarity ratings. Existing datasets often contain few fuzzy words and so do not allow the fuzzy measures to be exhaustively tested. This paper presents an automatic method for the generation of a Multiple Fuzzy Word Dataset (MFWD) from a corpus. A Fuzzy Sentence Pairing Algorithm is used to extract and augment high, medium and low similarity sentence pairs with multiple fuzzy words. Human ratings are collected through crowdsourcing and the MFWD is evaluated using both fuzzy and traditional sentence similarity measures. The results indicated that fuzzy measures returned a higher correlation with human ratings than traditional measures.
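
The evaluation step described in the abstract, comparing each similarity measure against human ratings by correlation, can be illustrated with a minimal sketch. The abstract does not state which correlation statistic was used, so Pearson's r is an assumption here. The function name `pearson` and every rating value below are hypothetical placeholders for illustration, not data or results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores for four sentence pairs (NOT from the paper).
human_ratings = [0.9, 0.5, 0.1, 0.7]  # averaged human similarity ratings
measure_a = [0.8, 0.6, 0.2, 0.6]      # scores from one similarity measure
measure_b = [0.4, 0.5, 0.5, 0.3]      # scores from another similarity measure

r_a = pearson(human_ratings, measure_a)
r_b = pearson(human_ratings, measure_b)
print(f"measure A vs human: r = {r_a:.3f}")
print(f"measure B vs human: r = {r_b:.3f}")
```

Under this setup, the measure with the higher coefficient tracks the human ratings more closely, which is the kind of comparison the abstract reports between fuzzy and traditional measures.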

Item Type:	Conference or Workshop Item
Peer-reviewed:	No
Date Deposited:	18 May 2016 13:46
Publisher:	IEEE
Additional Information:	This is an Author Final Copy of a paper published in Fuzzy Systems (FUZZ-IEEE), 2015 IEEE International Conference, published by and copyright IEEE.
Divisions:	Faculties > Science and Engineering
URI:	https://e-space.mmu.ac.uk/id/eprint/609602
DOI:	https://doi.org/10.1109/FUZZ-IEEE.2015.7337877
