A Combined-Convolutional Neural Network for Chinese News Text Classification
(Original title: 基于组合-卷积神经网络的中文新闻文本分类)
Abstract

Most existing research on news classification targets English text, and the traditional machine learning methods in common use extract local text-block features incompletely when processing long texts. To address the lack of a dedicated term set for Chinese news classification, a vocabulary suited to Chinese news classification is built by constructing a data index, and text features are constructed with word2vec pre-trained word vectors. To address incomplete feature extraction, the structure of the classical convolutional neural network is modified and the effects of different convolution and pooling operations on classification results are studied. To improve the precision of news text classification, this paper proposes and implements a combined-convolutional neural network model and designs effective regularization and optimization methods for it. Experiments show that the model reaches a precision of 93.69% on Chinese news text classification, which is 6.34% and 1.19% higher than the best traditional machine learning method and the classical convolutional neural network model, respectively, and that it also outperforms the comparison models in recall and F-measure.
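The abstract mentions building a vocabulary by constructing a data index before the word2vec features are attached. The record does not describe how this is done, so the following is only a minimal sketch of one common way to do it, assuming the news texts have already been segmented into tokens. The names `build_vocab`, `encode`, `PAD`, `UNK`, and the `max_size` and `seq_len` parameters are hypothetical and not taken from the paper.

```python
from collections import Counter

# Hypothetical reserved tokens for padding and out-of-vocabulary words;
# the paper's actual conventions are not stated in this record.
PAD, UNK = "<PAD>", "<UNK>"


def build_vocab(tokenized_docs, max_size=5000):
    """Map the most frequent tokens to integer indices (0 = PAD, 1 = UNK)."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = {PAD: 0, UNK: 1}
    for tok, _ in counts.most_common(max_size - len(vocab)):
        vocab[tok] = len(vocab)
    return vocab


def encode(doc, vocab, seq_len):
    """Convert tokens to indices, then truncate or right-pad to seq_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in doc[:seq_len]]
    return ids + [vocab[PAD]] * (seq_len - len(ids))


# Toy, already-segmented documents (illustrative data only).
docs = [["股市", "上涨", "股市"], ["球队", "夺冠"]]
vocab = build_vocab(docs)
encoded = [encode(d, vocab, seq_len=4) for d in docs]
```

In a pipeline of this kind, the resulting index sequences would typically be used to look up rows of a pre-trained embedding matrix, which then feed the convolutional layers.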
Source: Acta Electronica Sinica (电子学报), 2021, 49(6): 1059-1067 [Core Collection]
DOI: 10.12263/DZXB.20200134
Keywords: natural language processing; word vector; combined-convolutional neural network; Chinese news; text classification
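Addresses:
1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044
2. State Key Laboratory for Geomechanics and Deep Underground Engineering, China University of Mining and Technology (Beijing), Beijing 100083
3. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081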
Language: Chinese
Document type: Research article
ISSN: 0372-2112
Subject: Automation technology, computer technology
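Funding: Outstanding Lecturer Cultivation Program of Beijing University of Civil Engineering and Architecture; National Key Research and Development Program of China; Ministry of Education 2018 Industry-University Cooperative Education Project; Fundamental Research Funds for Beijing Municipal Universities; Graduate Innovation Project of Beijing University of Civil Engineering and Architecture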
CSCD accession number: CSCD:7018836