
Remote Sensing Scene Classification Based on Local Selection Vision Transformer


Yang Kai 1,2; Lu Xiaoqiang 1*
Abstract: Remote sensing scene classification aims to assign specific semantic labels to aerial images and is a fundamental and important task in remote sensing image interpretation. Existing studies mainly use convolutional neural networks (CNNs) to learn global and local features and improve the discriminative representation of the network. However, the receptive field of CNN-based methods limits their ability to model long-range dependencies among local features. In recent years, the Vision Transformer (ViT) has shown strong performance on conventional classification tasks. Its self-attention mechanism connects each patch token with the classification token, capturing the contextual relationships between image pixels and taking global information in the spatial domain into account. This paper proposes a remote sensing scene classification network based on a local selection ViT. The input image is first split into small patches, which are flattened into a token sequence to which position encodings are added; the resulting sequence is then fed into the encoder. In addition, to learn local discriminative features, a local selection module is inserted before the last encoder layer: it selects the most discriminative tokens as the input to that layer, from which the final classification output is obtained. Experimental results show that the proposed method achieves good performance on two large remote sensing scene classification datasets (AID and NWPU).
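The pipeline the abstract describes ends with a local selection step: before the last encoder layer, only the patch tokens most attended by the classification token are kept. The following is a minimal NumPy sketch of that token-selection idea only, not the authors' implementation; the function name, `attn_to_cls`, and `k` are illustrative assumptions.

```python
import numpy as np

def select_discriminative_tokens(tokens, attn_to_cls, k):
    """Keep the class token plus the k patch tokens that receive the
    highest attention from the class token (sketch of the local
    selection module; names are illustrative, not the authors' code)."""
    cls_token, patch_tokens = tokens[:1], tokens[1:]
    top_idx = np.argsort(-attn_to_cls)[:k]        # indices of top-k patches
    selected = patch_tokens[np.sort(top_idx)]     # keep original patch order
    return np.concatenate([cls_token, selected], axis=0)

# Toy example: 1 class token + 4 patch tokens, embedding dim 3.
tokens = np.arange(15, dtype=float).reshape(5, 3)
attn = np.array([0.1, 0.5, 0.2, 0.3])             # class-to-patch attention
out = select_discriminative_tokens(tokens, attn, k=2)
# out.shape is (3, 3): the class token plus the 2 selected patch tokens
```

In a full model, `attn_to_cls` would come from the class token's self-attention weights in the preceding layer, and the reduced sequence would be fed into the final encoder block for classification.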
Source: Laser & Optoelectronics Progress, 2023, 60(22): 2228005 [CSCD core collection]
DOI 10.3788/LOP230539
Keywords: remote sensing scene classification; deep learning; Vision Transformer; local features
Affiliations

1. Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, China

2. University of Chinese Academy of Sciences, Beijing 100049, China

Language: Chinese
Document type: Research article
ISSN 1006-4125
Subject: Automation and computer technology
Funding: National Natural Science Foundation of China; National Science Fund for Distinguished Young Scholars
CSCD record number: CSCD:7622265

Cited by (1)

1. Zhu Tong. Incremental learning method for fine-grained bird recognition based on prompt learning. Laser & Optoelectronics Progress, 2024, 61(24): 2437008


