e-space
Manchester Metropolitan University's Research Repository

    The Effect of Image Similarity on Melanoma Classification

    Xie, Hongyuan (ORCID: https://orcid.org/0009-0001-8256-8681) and Zhang, Yanlong (ORCID: https://orcid.org/0000-0002-9046-2289) (2023) The Effect of Image Similarity on Melanoma Classification. In: ICCBDC '23: Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing, pp. 89-95. Presented at ICCBDC 2023: 2023 7th International Conference on Cloud and Big Data Computing, 17-19 August 2023, Manchester, UK.

    Published Version
    Available under License Creative Commons Attribution.


    Abstract

    Skin cancer is one of the most common types of cancer, and research is increasingly focused on using deep learning algorithms to perform diagnosis in experimental settings. Deep neural networks can assist early detection; however, their accuracy can depend heavily on factors such as dataset quality and class distribution. This study investigates how visually similar images in the publicly available ISIC 2019 dataset affect melanoma classification. The negative effect of image duplication is well known in deep learning; however, the effect of image similarity is under-researched. In this work, we use an open source image similarity algorithm to identify similar images in the ISIC 2019 dataset. We identify groups of similar images at different similarity thresholds and investigate the effect on a classification model of removing the images identified at each threshold. We then evaluate the best performing model on the ISIC 2019 test set. Our results show that the best performing model was DenseNet201 when trained using the 100% similarity threshold images, and InceptionResNetV2 when trained using the 95% similarity threshold images. These results indicate that highly similar images present in the ISIC 2019 training set introduce a performance-degrading bias, and that removing them boosts model performance.
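
    The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity algorithm or say how images were removed within a group, so the pairwise `similarity` callable, the toy scores, the image identifiers, and the keep-the-first-image policy below are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def group_similar(
    ids: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Return groups (size > 1) of ids linked by similarity >= threshold."""
    # Union-find: each id starts as the root of its own group.
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the groups of every pair that meets the threshold.
    for a, b in combinations(ids, 2):
        if similarity(a, b) >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]


def drop_similar(
    ids: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Assumed policy: keep the first image of each group, drop the rest."""
    to_drop = {i for g in group_similar(ids, similarity, threshold) for i in g[1:]}
    return [i for i in ids if i not in to_drop]


# Hypothetical pairwise scores standing in for a real similarity algorithm.
SCORES = {
    frozenset({"img_a", "img_b"}): 1.00,
    frozenset({"img_b", "img_c"}): 0.96,
    frozenset({"img_c", "img_d"}): 0.40,
}


def toy_similarity(a: str, b: str) -> float:
    return SCORES.get(frozenset({a, b}), 0.0)


IDS = ["img_a", "img_b", "img_c", "img_d"]

if __name__ == "__main__":
    for t in (1.00, 0.95):
        print(t, drop_similar(IDS, toy_similarity, t))
```

    Lowering the threshold from 1.00 to 0.95 merges more images into the same group, so more images are dropped from the training pool. Comparing every pair is quadratic in the number of images, so a dataset of ISIC 2019's size would likely need an index or hashing step first.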
