e-space
Manchester Metropolitan University's Research Repository

    Deep Multi-Source Visual Fusion With Transformer Model for Video Content Filtering

    Nagarajan, Senthil Murugan (ORCID: https://orcid.org/0000-0001-9284-7724), Devarajan, Ganesh Gopal (ORCID: https://orcid.org/0000-0003-0036-7841), Jerlin M, Asha, Arockiam, Daniel (ORCID: https://orcid.org/0000-0001-5564-2332), Bashir, Ali Kashif (ORCID: https://orcid.org/0000-0003-2601-9327) and Al Dabel, Maryam M (ORCID: https://orcid.org/0000-0003-4371-8939) (2025) Deep Multi-Source Visual Fusion With Transformer Model for Video Content Filtering. IEEE Journal on Selected Topics in Signal Processing, 19 (4). pp. 613-622. ISSN 1932-4553

    Accepted Version
    Available under License Creative Commons Attribution.

    Abstract

    As YouTube content continues to grow, advanced filtering systems are crucial to ensuring a safe and enjoyable user experience. We present MFusTSVD, a multi-modal model for classifying YouTube video content by analyzing text, audio, and video images. MFusTSVD uses specialized methods to extract features from audio and video images, while processing text data with BERT Transformers. Our key innovation includes two new BERT-based multi-modal fusion methods: B-SMTLMF and B-CMTLRMF. These methods combine features from different data types and improve the model's ability to understand each type of data, including detailed audio patterns, leading to better content classification and speech-related separation. MFusTSVD is designed to perform better than existing models in terms of accuracy, precision, recall, and F-measure. Tests show that MFusTSVD consistently outperforms popular models like Memory Fusion Network, Early Fusion LSTM, Late Fusion LSTM, and multi-modal Transformer across different content types and evaluation measures. In particular, MFusTSVD effectively balances precision and recall, which makes it especially useful for identifying inappropriate speech and audio content, as well as broader categories, ensuring reliable and robust content moderation.
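The abstract reports results as accuracy, precision, recall, and F-measure but does not spell out how they are computed. The sketch below uses the standard binary-classification definitions, taking F-measure to be the balanced F1 score (an assumption; the abstract does not say which variant is used). The function names and the toy labels are hypothetical and are not taken from the paper or its data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true) if y_true else 0.0


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class marked `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels (hypothetical): 1 = inappropriate content, 0 = acceptable content.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it falls sharply when either precision or recall is low, which is why it is commonly used to summarize the balance between the two in content moderation settings.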

