Remote Sensing Scene Classification Based on Local Selection Vision Transformer
Abstract
|
Remote sensing scene classification aims to assign specific semantic labels to aerial images and is a fundamental and important task in remote sensing image interpretation. Existing studies mainly use convolutional neural networks (CNNs) to learn global and local features and improve the discriminative representation of the network. However, the receptive field of CNN-based approaches limits their ability to model long-range dependencies among local features. In recent years, the Vision Transformer (ViT) has shown strong performance on conventional classification tasks. Its self-attention mechanism connects every patch token with a classification token, capturing the contextual relationships between image pixels and taking global information in the spatial domain into account. In this paper, we propose a remote sensing scene classification network based on a local selection ViT. The input image is first split into small patches, which are flattened into a token sequence to which position encodings are added; the resulting sequence is then fed into an encoder. In addition, to learn local discriminative features, a local selection module is inserted before the last encoder layer; it selects the discriminative tokens as the input to that layer, whose output is used for the final classification. Experimental results show that the proposed method achieves good results on two large remote sensing scene classification datasets (AID and NWPU). |
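As a rough illustration of the pipeline the abstract describes (splitting the image into patches, then keeping only discriminative tokens before the last encoder layer), the following is a minimal NumPy sketch. The function names are hypothetical, the encoder layers themselves are omitted, and using the class token's attention weights as the selection score is an assumption for illustration, not the paper's exact criterion.

```python
import numpy as np

def split_into_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    H, W, C = image.shape
    p = patch_size
    patches = (image
               .reshape(H // p, p, W // p, p, C)   # grid of p x p blocks
               .transpose(0, 2, 1, 3, 4)           # group blocks together
               .reshape(-1, p * p * C))            # flatten each block
    return patches  # shape: (num_patches, patch_dim)

def select_local_tokens(tokens, cls_attention, k):
    """Keep the class token plus the k patch tokens that the class token
    attends to most strongly; only these enter the final encoder layer."""
    # tokens: (1 + N, D), with tokens[0] the class token
    # cls_attention: (N,) attention weights from the class token to each patch
    top_idx = np.argsort(cls_attention)[::-1][:k]   # indices of top-k patches
    selected = np.concatenate([tokens[:1], tokens[1 + top_idx]], axis=0)
    return selected, top_idx
```

For a 224×224×3 input with 16×16 patches this yields the standard 196 tokens; the selection step then shrinks the sequence to `k + 1` tokens before the last layer, which is the source of the local discriminative focus the abstract claims.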
Source
|
Laser & Optoelectronics Progress, 2023, 60(22): 2228005 [Core collection]
|
DOI
|
10.3788/LOP230539
|
Keywords
|
remote sensing scene classification; deep learning; Vision Transformer; local features
|
Address
|
1. Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
|
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
1006-4125 |
Subject
|
Automation and computer technology |
Funding
|
National Natural Science Foundation of China; National Science Fund for Distinguished Young Scholars
|
Accession number
|
CSCD:7622265
|