异构分类器堆叠泛化及其在恶意评论检测中的应用
Stacked Generalization of Heterogeneous Classifiers and Its Application in Toxic Comments Detection

Authors: 吕品 (Lü Pin) 1; 于文兵 (Yu Wenbing) 2; 汪鑫 (Wang Xin) 1; 计春雷 (Ji Chunlei) 1; 周曦民 (Zhou Ximin) 3
Abstract  Toxic comment detection is an important task for limiting the negative impact of social media platforms on their users, and it is an important area of natural language processing. A single classifier used for toxic comment detection gives unstable accuracy, and boosting ensembles give comparatively low accuracy; to address both problems, a stacked generalization method over heterogeneous classifiers is proposed. The method uses a deep recurrent neural network to turn the multi-label toxic comment classification problem into binary classification, which keeps model accuracy from becoming unstable. During stacked generalization it combines two individual classifiers, GRU (Gated Recurrent Unit) and NB-SVM (Naive Bayes-Support Vector Machine), which differ in model structure and classification bias, and exploiting that difference improves accuracy. Comparative experiments on the Wikipedia toxic comment dataset show that the proposed method outperforms boosting ensembles, indicating that stacked generalization of heterogeneous classifiers is feasible and effective for toxic comment detection.
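The stacking idea in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the GRU and NB-SVM base classifiers are replaced by two lightweight stand-ins (a word-level naive Bayes log-count-ratio scorer and a character-trigram similarity scorer), the meta-learner is assumed to be logistic regression, and the comments in `DATA` are invented toy examples. All class and function names are hypothetical. What the sketch does show is the core mechanism of stacked generalization: base learners with different structure are scored out-of-fold, and those scores become the training features of a second-level model.

```python
import math
from collections import Counter

# Toy comments invented for this sketch (1 = toxic, 0 = clean); not the paper's data.
DATA = [
    ("you are an idiot", 1), ("shut up you moron", 1),
    ("what a stupid idiot", 1), ("you moron go away", 1),
    ("nobody likes you stupid", 1), ("go away idiot", 1),
    ("thanks for the helpful edit", 0), ("great article thanks", 0),
    ("please add a source here", 0), ("nice work on this page", 0),
    ("i agree with this edit", 0), ("the source looks good", 0),
]
TEXTS = [text for text, _ in DATA]
LABELS = [label for _, label in DATA]


def trigrams(text):
    """Character trigrams of a comment, as a set."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class LogCountRatio:
    """Word-level naive Bayes log-count-ratio scorer (stand-in for NB-SVM)."""

    def fit(self, texts, labels, alpha=1.0):
        pos, neg = Counter(), Counter()
        for t, y in zip(texts, labels):
            (pos if y == 1 else neg).update(set(t.split()))
        vocab = set(pos) | set(neg)
        p_norm = sum(pos[f] + alpha for f in vocab)
        q_norm = sum(neg[f] + alpha for f in vocab)
        self.r = {f: math.log((pos[f] + alpha) / p_norm)
                  - math.log((neg[f] + alpha) / q_norm) for f in vocab}
        return self

    def score(self, text):
        return sum(self.r.get(f, 0.0) for f in set(text.split()))


class TrigramSimilarity:
    """Instance-based scorer on character trigrams (stand-in for the GRU)."""

    def fit(self, texts, labels):
        self.pos = [trigrams(t) for t, y in zip(texts, labels) if y == 1]
        self.neg = [trigrams(t) for t, y in zip(texts, labels) if y == 0]
        return self

    def score(self, text):
        g = trigrams(text)

        def mean_jaccard(group):
            return sum(len(g & h) / len(g | h) for h in group) / len(group)

        return mean_jaccard(self.pos) - mean_jaccard(self.neg)


def out_of_fold_scores(model_class, texts, labels, k=3):
    """Score each comment with a model that never saw it during fitting."""
    scores = [0.0] * len(texts)
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        model = model_class().fit([texts[i] for i in train],
                                  [labels[i] for i in train])
        for i in range(fold, len(texts), k):
            scores[i] = model.score(texts[i])
    return scores


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def fit_meta(rows, labels, lr=0.1, epochs=500):
    """Logistic-regression meta-learner trained by plain SGD (an assumed choice)."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


BASE_CLASSES = [LogCountRatio, TrigramSimilarity]
# Level 0: out-of-fold scores become the meta-learner's training features.
META_ROWS = list(zip(*(out_of_fold_scores(c, TEXTS, LABELS) for c in BASE_CLASSES)))
META_W, META_B = fit_meta(META_ROWS, LABELS)
# Refit each base learner on all data for use at prediction time.
BASE_MODELS = [c().fit(TEXTS, LABELS) for c in BASE_CLASSES]


def predict_toxic_probability(text):
    """Level 1: combine the base scores with the learned meta-weights."""
    x = [m.score(text) for m in BASE_MODELS]
    return sigmoid(META_B + sum(wi * xi for wi, xi in zip(META_W, x)))


print(predict_toxic_probability("you stupid idiot"))
print(predict_toxic_probability("thanks for the great edit"))
```

Fitting the meta-learner on out-of-fold scores rather than on scores from models that saw the same comments is the step that distinguishes stacking from simple averaging: it keeps the second level from trusting a base learner merely because that learner memorized its training data.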
Source: 电子学报 (Acta Electronica Sinica), 2019, 47(10): 2228-2234 [Core collection]
DOI 10.3969/j.issn.0372-2112.2019.10.026
Keywords: stacked generalization; toxic comments; recurrent neural network; NB-SVM; word embedding
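Affiliations

1. School of Electronic Information, Shanghai Dianji University, Shanghai, 201306

2. School of Arts and Sciences, Shanghai Dianji University, Shanghai, 201306

3. Shanghai Supercomputer Center, Shanghai, 201203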

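Language: Chinese
Document type: Research article
ISSN: 0372-2112
Subject: Automation technology, computer technology
Funding: Shanghai Education Science Research Project; Advantage Discipline of Computer Science and Technology, Shanghai Dianji University
CSCD accession number: CSCD:6668677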

References (21 in total, 2 pages)

1.  周孟. 基于情感标签的极性分类 [Polarity classification based on sentiment labels]. 电子学报 (Acta Electronica Sinica), 2017, 45(4): 1018-1024. CSCD citations: 2
2.  卜湛. 在线评论情感计算与博弈预测 [Sentiment computing and game prediction for online reviews]. 电子学报 (Acta Electronica Sinica), 2015, 43(12): 2530-2535. CSCD citations: 7
3.  翟延东. 一种基于WordNet的短文本语义相似性算法 [A WordNet-based semantic similarity algorithm for short texts]. 电子学报 (Acta Electronica Sinica), 2012, 40(3): 617-620. CSCD citations: 4
4.  Ellery Wulczyn. Ex machina: Personal attacks seen at scale. Proceedings of the 26th International Conference on World Wide Web, 2017: 1391-1399. CSCD citations: 4
5.  Maeve Duggan. Online harassment, 2014. CSCD citations: 1
6.  Dawei Yin. Detection of harassment on web 2.0. Proceedings of the Content Analysis in the WEB 2.0 Workshop at WWW2009, 2009: 1-7. CSCD citations: 1
7.  Prashant Ravi. Detecting Insults in Social Commentary, 2019. CSCD citations: 1
8.  Maral Dadvar. Improving cyberbullying detection with user context. ECIR 2013, Lecture Notes in Computer Science, vol 7814, 2013: 693-696. CSCD citations: 1
9.  Chen Ying. Detecting offensive language in interactive social media to protect adolescent online safety. International Conference on Privacy, Security, Risk and Trust and International Conference on Social Computing, 2013: 71-80. CSCD citations: 1
10.  Nemanja Djuric. Hate speech detection with comment embeddings. Proceedings of the 24th International Conference on World Wide Web, 2015: 29-30. CSCD citations: 1
11.  Georgakopoulos Spiros V. Convolutional neural networks for toxic comment classification. Proceedings of the 10th Hellenic Conference on Artificial Intelligence, 2018: Article No. 35. CSCD citations: 1
12.  Betty van Aken. Challenges for toxic comment classification: an in-depth error analysis. Proceedings of the Second Workshop on Abusive Language Online, 2018: 33-42. CSCD citations: 1
13.  Wolpert David H. Stacked generalization. Neural Networks, 1992, 5(2): 241-259. CSCD citations: 217
14.  Breiman Leo. Stacked regressions. Machine Learning, 1996, 24: 49-64. CSCD citations: 48
15.  Naimi Ashley I. Stacked generalization: an introduction to super learning. European Journal of Epidemiology, 2018, 33(5): 459-464. CSCD citations: 8
16.  Jin Chen. Using stacked generalization to combine SVMs in magnitude and shape feature spaces for classification of hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 2009, 47(7): 2193-2205. CSCD citations: 10
17.  Gao Lei. Detecting online hate speech using context aware models. Proceedings of the International Conference Recent Advances in Natural Language Processing, 2017: 260-266. CSCD citations: 1
18.  Yang Zichao. Hierarchical attention networks for document classification. The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016: 1480-1489. CSCD citations: 1
19.  Wang Sida. Baselines and bigrams: simple, good sentiment and topic classification. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, 2012: 90-94. CSCD citations: 5
20.  Mai Ibrahim. Imbalanced toxic comments classification using data augmentation and deep learning. 17th IEEE International Conference on Machine Learning and Applications, 2018: 875-878. CSCD citations: 1
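Citing documents: 4

1.  吴浩. 基于BERT-RCNN的中文违规评论识别研究 [Research on recognizing non-compliant Chinese comments based on BERT-RCNN]. 中文信息学报 (Journal of Chinese Information Processing), 2022, 36(1): 92-103. CSCD citations: 2

2.  张秀全. 基于近红外-可见光高光谱的堆叠泛化模型褐土有机质预测 [Predicting organic matter in cinnamon soil with a stacked generalization model based on near-infrared and visible hyperspectral data]. 光谱学与光谱分析 (Spectroscopy and Spectral Analysis), 2023, 43(3): 903-910. CSCD citations: 5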
