Remote Sensing Scene Classification Based on Local Selection Vision Transformer
Abstract
|
Remote sensing scene classification aims to assign specific semantic labels to aerial images and is a fundamental and important task in remote sensing image interpretation. Existing studies mainly use convolutional neural networks (CNNs) to learn global and local features and improve the discriminative representation of the network. However, the receptive field of CNN-based approaches limits their ability to model long-range dependencies among local features. In recent years, the Vision Transformer (ViT) has shown strong performance on conventional classification tasks. Its self-attention mechanism connects every patch token with a classification token, capturing the contextual relationships between image pixels and taking global information in the spatial domain into account. In this paper, we propose a remote sensing scene classification network based on a local selection ViT. The input image is first split into small patches, which are flattened into a token sequence to which position encodings are added; the resulting sequence is then fed into an encoder. In addition, to learn local discriminative features, a local selection module is inserted before the last encoder layer; it selects the discriminative tokens as the input to that layer, whose output is used for the final classification. Experimental results show that the proposed method achieves good results on two large remote sensing scene classification datasets (AID and NWPU). |
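As a rough illustration of the pipeline the abstract describes (splitting the image into patches, then keeping only discriminative tokens before the last encoder layer), the following is a minimal NumPy sketch. The function names are hypothetical, the encoder layers themselves are omitted, and using the class token's attention weights as the selection score is an assumption for illustration, not the paper's exact criterion.

```python
import numpy as np

def split_into_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    H, W, C = image.shape
    p = patch_size
    patches = (image
               .reshape(H // p, p, W // p, p, C)   # grid of p x p blocks
               .transpose(0, 2, 1, 3, 4)           # group blocks together
               .reshape(-1, p * p * C))            # flatten each block
    return patches  # shape: (num_patches, patch_dim)

def select_local_tokens(tokens, cls_attention, k):
    """Keep the class token plus the k patch tokens that the class token
    attends to most strongly; only these enter the final encoder layer."""
    # tokens: (1 + N, D), with tokens[0] the class token
    # cls_attention: (N,) attention weights from the class token to each patch
    top_idx = np.argsort(cls_attention)[::-1][:k]   # indices of top-k patches
    selected = np.concatenate([tokens[:1], tokens[1 + top_idx]], axis=0)
    return selected, top_idx
```

For a 224×224×3 input with 16×16 patches this yields the standard 196 tokens; the selection step then shrinks the sequence to `k + 1` tokens before the last layer, which is the source of the local discriminative focus the abstract claims.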
Source
|
Laser & Optoelectronics Progress, 2023, 60(22): 2228005 [Core collection]
|
DOI
|
10.3788/LOP230539
|
Keywords
|
remote sensing scene classification; deep learning; Vision Transformer; local features
|
Address
|
1. Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
|
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
1006-4125 |
Subject
|
Automation and computer technology |
Funding
|
National Natural Science Foundation of China; National Science Fund for Distinguished Young Scholars
|
Accession number
|
CSCD:7622265
|