Health Rumor Detection Based on a Pre-Trained Language Model
Abstract
|
Most current rumor detection research targets social media data, where text sequences are short. When the input is a multi-sentence paragraph or a long document, existing methods fail to extract effective features, which degrades detection performance. To capture the information that matters for rumor detection, we propose I-BERT-BiLSTM (Improved-BERT-BiLSTM), a health rumor detection method for long texts. The method first extracts a summary from the document-level text, then feeds the summary into a deep network built on multi-layer self-attention for feature extraction, and finally passes the resulting features to a BiLSTM for rumor classification. Experiments show that I-BERT-BiLSTM achieves 97.75% accuracy on a self-built health rumor dataset and 91.15% on a public dataset. |
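The first stage of the pipeline described in the abstract, condensing a long document into a short summary before encoding, can be sketched as follows. This is only an illustrative sketch: the abstract does not say which summarizer the authors use, so a TextRank-style extractive ranking is assumed here, and every function name (`split_sentences`, `tokenize`, `similarity`, `rank_sentences`, `summarize`) is hypothetical. The `+ 1` inside the logarithms is a simplification to avoid division by zero and is not part of the original TextRank formula.

```python
import math
import re


def split_sentences(text):
    # Split on Chinese/Western sentence-ending marks; "." is left out so
    # decimals such as 97.75% are not broken apart.
    found = re.findall(r"[^。！？!?\n]+[。！？!?]?", text)
    return [s.strip() for s in found if s.strip()]


def tokenize(sentence):
    # Latin words and digits as whole tokens, CJK text as single characters.
    return set(re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower()))


def similarity(a, b):
    # Token overlap normalised by log token counts (assumed variant, see lead-in).
    if not a or not b:
        return 0.0
    return len(a & b) / (math.log(len(a) + 1) + math.log(len(b) + 1))


def rank_sentences(sentences, damping=0.85, iterations=50):
    # PageRank-style power iteration over the sentence similarity graph.
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, max_sentences=3):
    # Keep the highest-scoring sentences, returned in their original order.
    sentences = split_sentences(text)
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:max_sentences])]


if __name__ == "__main__":
    doc = ("Drinking vinegar cures the flu! Doctors say vinegar does not "
           "cure the flu! The flu vaccine prevents the flu! Bananas are yellow!")
    for sentence in summarize(doc, max_sentences=3):
        print(sentence)
```

The later stages (the self-attention encoder and the BiLSTM classifier) are not shown, since they depend on a deep learning framework and pretrained weights that the abstract does not specify. In such a pipeline the summary would stand in for the full document so that the input fits the encoder's maximum sequence length.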
Source
|
Journal of Systems Science and Mathematical Sciences (系统科学与数学), 2022, 42(10): 2582-2589 [CSCD core collection]
|
DOI
|
10.12341/jssms22646KSS
|
Keywords
|
rumor detection; pre-trained language model; summary extraction; I-BERT-BiLSTM
|
Address
|
Communication University of China, Beijing, 100024
|
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
1000-0577 |
Subject
|
Automation and computer technology |
Funding
|
Fundamental Research Funds for the Central Universities, Communication University of China
|
Accession number
|
CSCD:7356056
|