

融合语言模型的端到端中文语音识别算法
An End-to-End Chinese Speech Recognition Algorithm Integrating Language Model


Authors: 吕坤儒 1; 吴春国 1,2,3; 梁艳春 1,2,3; 袁宇平 1; 任智敏 1; 周柚 1,2; 时小虎 1,2,3
Abstract: To address the poor robustness of speech recognition models on Chinese speech and their lack of language-modeling ability, which prevents them from effectively distinguishing homophones and near-homophones, this paper proposes an end-to-end Chinese speech recognition algorithm that integrates a language model. An acoustic model from speech to Pinyin is built on a Deep Fully Convolutional Neural Network (DFCNN) with Connectionist Temporal Classification (CTC), and a language model from Pinyin to Chinese characters is constructed from the Transformer encoder. A speech frame decomposition model is then designed to connect the output of the acoustic model to the input of the language model, overcoming the difficulty that the error gradient of the language model cannot be propagated back to the acoustic model, and enabling joint end-to-end training of the two models. The method is validated on real data sets. Experimental results show that introducing the language model reduces the character error rate (CER) by 21%, and the end-to-end joint training plays a key role, accounting for a 43% effect on performance. Compared with five mainstream algorithms, the proposed method achieves a markedly lower error rate, reducing the CER by 28% relative to the best-performing baseline, DeepSpeech2.
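The record describes a frame decomposition step that turns per-frame CTC posteriors from the acoustic model into token-level inputs for the Pinyin-to-character language model, so that gradients can flow through the junction. The record does not give the mechanism, so the sketch below is only one plausible reading, assuming a greedy CTC path whose repeated non-blank segments are collapsed by averaging their frame posteriors into soft token distributions; the function name `frame_decomposition` and the `BLANK` index are hypothetical.

```python
import numpy as np

BLANK = 0  # assumed CTC blank index

def frame_decomposition(frame_probs):
    """Collapse per-frame CTC posteriors into token-level distributions.

    frame_probs: (T, V) array of per-frame Pinyin posteriors.
    Returns (U, V): one averaged distribution per decoded token, giving
    the language model soft inputs rather than hard symbol IDs.
    """
    path = frame_probs.argmax(axis=1)           # greedy CTC path
    segments, prev = [], None
    for t, label in enumerate(path):
        if label == BLANK:
            prev = None                         # blank separates tokens
            continue
        if label == prev:
            segments[-1].append(t)              # extend repeated label
        else:
            segments.append([t])                # start a new token segment
            prev = label
    if not segments:
        return np.zeros((0, frame_probs.shape[1]))
    # Average the frames of each segment into one token-level distribution.
    return np.stack([frame_probs[seg].mean(axis=0) for seg in segments])
```

Because each output row is a weighted combination of acoustic posteriors rather than a discrete argmax symbol, a downstream loss on the language model can, in principle, backpropagate into the acoustic network; the paper's actual decomposition may handle path selection differently.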
Source: Acta Electronica Sinica (电子学报), 2021, 49(11): 2177-2185 [Core Library]
DOI: 10.12263/DZXB.20201187
Keywords: speech recognition; connectionist temporal classification; language model; acoustic model; speech frame decomposition
Affiliations:

1. College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China

2. Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, Jilin, China

3. College of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519041, Guangdong, China

Language: Chinese
Document type: Research article
ISSN: 0372-2112
Subject: Automation and computer technology
Funding: National Natural Science Foundation of China; Jilin Province Budgetary Capital Construction Fund; International Cooperation Project of the Guangdong Provincial Department of Science and Technology; Natural Science Foundation of Jilin Province
Record number: CSCD:7109400

References: 28
Cited by: 4 papers

1. 侯海薇. Research progress of deep clustering based on unsupervised representation learning. Pattern Recognition and Artificial Intelligence, 2022, 35(11): 999-1014. (CSCD cited 2)

2. 沈逸文. Lightweight Chinese speech recognition combined with Transformer. Application Research of Computers, 2023, 40(2): 424-429. (CSCD cited 4)

