
Scene Text Image Super-Resolution Reconstruction Based on Perceiving Multi-Domain Character Distance
Original title: 多域字符距离感知的场景文本图像超分辨率重建


Abstract Scene text image super-resolution (STISR) aims to improve the resolution and legibility of text in low-resolution images. In spatially deformed or low-resolution text images, however, text regions lack detail, so semantic cues and visual features are hard to align with character positions, and text recognition suffers. To address this, the paper proposes a scene text image super-resolution reconstruction method based on perceiving multi-domain character distance (PMDC), which strengthens visual and semantic features and improves text regions and texture detail. First, asymmetric convolution and a semantic prior module extract visual and semantic features from the text image. Second, the character distance perception module fuses the visual and semantic features to obtain an enhanced position encoding that perceives changes in inter-character spacing and semantic similarity. Finally, the guiding cues and visual features are combined to reorganize pixels into the super-resolution text image. On the public TextZoom dataset, compared with the recent text super-resolution network TATT, PMDC raises the peak signal-to-noise ratio (PSNR) by 0.11 dB, improving text clarity and edge texture detail, and raises average recognition accuracy by 1.5% (the published English abstract reports 1.4%), improving the readability of text images.
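The abstract reports its fidelity gain as a PSNR difference in dB. As a reading aid, the sketch below computes PSNR from its standard definition, 10 * log10(MAX^2 / MSE), and converts a dB gain into the equivalent change in mean squared error. It is not the authors' evaluation code: the function names and the toy pixel values are assumptions made for illustration, and the paper's exact evaluation protocol on TextZoom is not stated in this record.

```python
import math


def mse(reference, restored):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)


def psnr(reference, restored, max_value=255.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE); infinite for identical inputs."""
    err = mse(reference, restored)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE changes when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


# Toy flattened 2x2 grayscale patches (made-up values, not from TextZoom).
hr = [52, 55, 61, 59]  # hypothetical high-resolution reference
sr = [54, 55, 60, 57]  # hypothetical super-resolved output

print(round(psnr(hr, sr), 2))              # 44.61
print(round(mse_ratio_for_gain(0.11), 3))  # 0.975
```

Under this definition, a 0.11 dB PSNR gain corresponds to roughly 2.5% lower mean squared error at the same peak value.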
Source Acta Electronica Sinica (电子学报), 2024, 52(7): 2262-2270 [Core Collection]
DOI 10.12263/DZXB.20240090
Keywords computer vision; scene text image; super-resolution; attention mechanism; feature information association
Address College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108

Language Chinese
Document type Research article
ISSN 0372-2112
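Subject Electronic technology, communication technology; automation technology, computer technology
Funding National Natural Science Foundation of China; Fujian Province Distinguished Young Scholars Project; Key Research Project of the Fujian Provincial Department of Education; Fuzhou Science and Technology Bureau Project
Accession number CSCD:7805457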

