Nagarajan, Senthil Murugan (ORCID: https://orcid.org/0000-0001-9284-7724), Devarajan, Ganesh Gopal (ORCID: https://orcid.org/0000-0003-0036-7841), Jerlin M, Asha, Arockiam, Daniel (ORCID: https://orcid.org/0000-0001-5564-2332), Bashir, Ali Kashif (ORCID: https://orcid.org/0000-0003-2601-9327) and Al Dabel, Maryam M (ORCID: https://orcid.org/0000-0003-4371-8939) (2025) Deep Multi-Source Visual Fusion With Transformer Model for Video Content Filtering. IEEE Journal on Selected Topics in Signal Processing, 19 (4). pp. 613-622. ISSN 1932-4553
Accepted Version. Available under License Creative Commons Attribution.
Abstract
As YouTube content continues to grow, advanced filtering systems are crucial to ensuring a safe and enjoyable user experience. We present MFusTSVD, a multi-modal model for classifying YouTube video content by analyzing text, audio, and video images. MFusTSVD extracts features from audio and video images with specialized methods and processes text data with BERT Transformers. Our key innovation is two new BERT-based multi-modal fusion methods, B-SMTLMF and B-CMTLRMF, which combine features from the different data types and improve the model's understanding of each modality, including detailed audio patterns, leading to better content classification and better separation of speech-related content. In experiments, MFusTSVD consistently outperforms popular models such as Memory Fusion Network, Early Fusion LSTM, Late Fusion LSTM, and the multi-modal Transformer in accuracy, precision, recall, and F-measure across different content types. In particular, MFusTSVD effectively balances precision and recall, which makes it especially useful for identifying inappropriate speech and audio content as well as broader categories, ensuring reliable and robust content moderation.
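To make the multi-modal setup described above concrete, the following is a minimal PyTorch sketch of a fusion classifier that combines text, audio, and video features. It is illustrative only: the feature dimensions, the simple concatenation-based fusion head, and the class count are assumptions, not the paper's design, and the actual B-SMTLMF and B-CMTLRMF fusion methods are not specified on this page.

```python
# Minimal sketch of a multi-modal fusion classifier (assumed design,
# not the paper's B-SMTLMF / B-CMTLRMF methods). Each modality's
# feature vector is projected to a shared space, concatenated, and
# passed through a small classification head.
import torch
import torch.nn as nn

class MultiModalFusionClassifier(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, video_dim=512,
                 hidden_dim=256, num_classes=4):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        # Fuse by concatenation, then classify.
        self.classifier = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_feat, audio_feat, video_feat):
        fused = torch.cat([
            torch.relu(self.text_proj(text_feat)),
            torch.relu(self.audio_proj(audio_feat)),
            torch.relu(self.video_proj(video_feat)),
        ], dim=-1)
        return self.classifier(fused)

# Usage with dummy batch-of-2 features. In practice the text vector
# would come from a BERT encoder (e.g. the pooled [CLS] embedding),
# and the audio/video vectors from modality-specific extractors.
model = MultiModalFusionClassifier()
text = torch.randn(2, 768)    # BERT-style pooled text embeddings
audio = torch.randn(2, 128)   # e.g. summarized log-mel features
video = torch.randn(2, 512)   # e.g. frame-level CNN embeddings
logits = model(text, audio, video)
print(logits.shape)  # torch.Size([2, 4])
```

Concatenation is the simplest fusion baseline (comparable in spirit to the Early Fusion LSTM the abstract benchmarks against); the paper's contribution lies in replacing this step with its BERT-based fusion methods.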