Ye, Hong, Cai, Jijing ORCID: https://orcid.org/0009-0009-9965-8454, Deng, Jiangtao
ORCID: https://orcid.org/0009-0000-9688-2631, Wang, Xiaodong
ORCID: https://orcid.org/0009-0001-1730-0080, Bashir, Ali Kashif
ORCID: https://orcid.org/0000-0003-2601-9327, Fang, Kai
ORCID: https://orcid.org/0000-0003-0419-1468 and Wang, Wei
(2025)
Efficient Machine Learning-Based Semantic Segmentation Algorithm for Consumer-Grade UAV Remote Sensing.
IEEE Transactions on Consumer Electronics.
pp. 1-14.
ISSN 0098-3063
Accepted Version
Available under License Creative Commons Attribution.
Abstract
The computational complexity of the Transformer model grows quadratically with input sequence length. For high-resolution remote sensing images, this causes a sharp increase in computational cost and memory consumption, which limits the use of Transformers in consumer-grade unmanned aerial vehicle (UAV) remote sensing. To address this issue, we propose an efficient machine learning-based semantic segmentation algorithm (EMLSSA). First, EMLSSA incorporates the hash clustering attention (HCAttention) mechanism, which employs the locality-sensitive hashing (LSH) algorithm to group similar features into hash buckets, enabling dynamic token clustering. Tokens in the same hash bucket are then aggregated by weighted summation, which compresses the features and reduces the computational complexity of self-attention. Second, EMLSSA incorporates the frequency multi-layer perceptron (FMLP) mechanism, which combines frequency-domain and spatial-domain information to enhance the Transformer's ability to perceive local features. Experimental results show that EMLSSA-B4 reduces computational cost by 11.7% on the FLAME, PWD, EarthVQA, and Potsdam datasets while maintaining segmentation performance comparable to SegFormer-B4.
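The token-clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' HCAttention implementation. It assumes random-hyperplane (sign-of-projection) LSH and a softmax over token L2 norms as the aggregation weights, because the abstract specifies only "LSH" and "weighted summation". The function names `lsh_bucket_ids` and `aggregate_buckets` are hypothetical.

```python
import math
import random


def lsh_bucket_ids(tokens, num_planes, seed=0):
    """Assign each token a bucket id from the signs of random projections.

    Assumption: random-hyperplane LSH; the paper's hash family may differ.
    """
    rng = random.Random(seed)
    dim = len(tokens[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(num_planes)]
    ids = []
    for tok in tokens:
        bucket = 0
        for plane in planes:
            dot = sum(p * x for p, x in zip(plane, tok))
            # One bit per hyperplane: which side of the plane the token lies on.
            bucket = (bucket << 1) | (1 if dot >= 0 else 0)
        ids.append(bucket)
    return ids


def aggregate_buckets(tokens, bucket_ids):
    """Collapse every bucket into one token by weighted summation.

    Assumption: weights are a softmax over token L2 norms within the bucket.
    """
    groups = {}
    for tok, b in zip(tokens, bucket_ids):
        groups.setdefault(b, []).append(tok)

    merged = {}
    for b, members in groups.items():
        scores = [math.sqrt(sum(x * x for x in t)) for t in members]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        dim = len(members[0])
        merged[b] = [sum(e / total * t[i] for e, t in zip(exps, members))
                     for i in range(dim)]
    return merged


# Three toy 2-D tokens: the first two point the same way, the third is opposite.
tokens = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
ids = lsh_bucket_ids(tokens, num_planes=4)
merged = aggregate_buckets(tokens, ids)
```

With N input tokens compressed to K bucket representatives, attention computed against the representatives scales with N·K rather than N², which is the kind of saving the abstract attributes to HCAttention.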