基于组合-卷积神经网络的中文新闻文本分类
A Combined-Convolutional Neural Network for Chinese News Text Classification

Zhang Yu (张昱) 1,2   Liu Kaifeng (刘开峰) 1 *   Zhang Quanxin (张全新) 3   Wang Yange (王艳歌) 1   Gao Kailong (高凯龙) 1
Abstract: Most existing research on news classification targets English text, and the traditional machine learning methods in common use extract local text-block features incompletely when processing long texts. To address the lack of a dedicated term set for Chinese news classification, this paper builds a vocabulary suited to Chinese news classification by constructing a data index, and combines it with word2vec pre-trained word vectors to construct text features. To address incomplete feature extraction, it modifies the structure of the classical convolutional neural network model and studies how different convolution and pooling operations affect the classification results. To improve the precision of Chinese news text classification, the paper proposes and implements a combined-convolutional neural network model and designs effective model regularization and optimization methods. The experimental results show that the combined-convolutional neural network model reaches a precision of 93.69% on Chinese news text classification, which is 6.34% and 1.19% higher than the best traditional machine learning method and the classical convolutional neural network model, respectively, and that it also outperforms the comparison models in recall and F-measure.
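The vocabulary-construction step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the news text has already been segmented into tokens. The function names `build_vocab` and `encode`, the reserved `<PAD>` and `<UNK>` tokens, and the size limits are hypothetical choices made for illustration; this record does not describe the paper's exact indexing procedure.

```python
from collections import Counter

# Hypothetical reserved tokens (not taken from the paper).
PAD, UNK = "<PAD>", "<UNK>"


def build_vocab(tokenized_docs, max_size=5000, min_freq=1):
    """Map the most frequent tokens to integer indices (0 = padding, 1 = unknown)."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    # Sort by descending frequency, then by token, so the index is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [tok for tok, freq in ranked if freq >= min_freq][: max_size - 2]
    return {tok: idx for idx, tok in enumerate([PAD, UNK] + kept)}


def encode(doc, vocab, seq_len):
    """Convert tokens to indices, truncating or right-padding to seq_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in doc[:seq_len]]
    return ids + [vocab[PAD]] * (seq_len - len(ids))


# Toy corpus of pre-segmented documents (illustrative data only).
docs = [["股市", "上涨", "股市"], ["球队", "获胜"]]
vocab = build_vocab(docs)

# "下跌" is not in the toy corpus, so it maps to the unknown-token index.
encoded = encode(["股市", "下跌"], vocab, seq_len=4)
```

In a pipeline of this kind, the resulting index sequences would typically be used to look up rows of a pre-trained word2vec embedding matrix before being passed to the convolutional layers. That wiring is an assumption here and is not shown.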
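Source: Acta Electronica Sinica (电子学报), 2021, 49(6): 1059-1067 [CSCD Core Collection]
DOI: 10.12263/DZXB.20200134
Keywords: natural language processing; word vector; combined-convolutional neural network; Chinese news; text classification
Addresses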

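1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, 100044

2. State Key Laboratory for Geomechanics and Deep Underground Engineering, China University of Mining and Technology (Beijing), Beijing, 100083

3. School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081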

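Language: Chinese
Document type: Research article
ISSN: 0372-2112
Subject: Automation technology; computer technology
Funding: Beijing University of Civil Engineering and Architecture Outstanding Lecturer Training Program; National Key Research and Development Program of China; Ministry of Education 2018 Industry-University Cooperative Education Project; Fundamental Research Funds for Beijing Municipal Universities; Beijing University of Civil Engineering and Architecture Graduate Innovation Project
CSCD accession number: CSCD:7018836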
