
基于短语向量和主题加权的关键词抽取方法 (Keyword Extraction Method Based on Phrase Vectors and Topic Weighting)
The Theme-Weighted Keyphrase Extraction Algorithm Based on Phrase Embedding

孙新 1,2   盖晨 1   申长虹 1   张颖捷 1  
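Abstract Existing keyword extraction algorithms lack an effective representation of phrases. To extract keyphrases that better reflect the topic of a text, this paper proposes PhraseVecRank, a keyword extraction method based on phrase vectors. First, a phrase vector construction model based on LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network) autoencoders is designed to address the semantic representation of complex phrases. Then, the phrase vectors are used to compute a topic weight for each candidate phrase, and topic-weighted ranking improves keyword extraction. Experiments on public datasets and on academic paper data show that the proposed method effectively extracts keyphrases related to the topic information of the text, and that the phrase vectors built with the autoencoders better represent the semantic information of phrases.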
Published English abstract Keyword extraction is a fundamental problem in natural language processing. This paper proposes PhraseVecRank, a keyphrase extraction algorithm based on phrase embeddings. First, a phrase vector construction model based on LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network) is designed to address the semantic representation of complex phrases. Then, PhraseVecRank uses the phrase embeddings to calculate a theme weight for each candidate phrase, and combines the semantic similarity between candidate phrase embeddings with co-occurrence information to calculate edge weights; ranking with these topic weights improves keyphrase extraction. Experimental results show that PhraseVecRank effectively extracts keyphrases that cover the topic information of a text, and that the proposed phrase embedding models better represent the semantic information of phrases.
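The ranking step described in the abstract (a topic weight per candidate phrase, and edge weights combining embedding similarity with co-occurrence) can be illustrated with a small personalized-PageRank-style sketch. This is not the authors' implementation: the record does not give the formulas, so several choices here are assumptions made only for illustration. The document topic is taken as the mean of the candidate vectors, the topic weight is cosine similarity to that mean, and the edge weight is cosine similarity multiplied by the co-occurrence count. The function names `cosine` and `topic_weighted_rank` and the toy vectors are hypothetical, and the LSTM/CNN autoencoder that would produce real phrase vectors is not implemented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_weighted_rank(vectors, cooccur, damping=0.85, iters=50):
    """Rank candidate phrases with a topic-weighted random walk.

    vectors: dict mapping phrase -> embedding (toy values here; assumed to
             come from some phrase-embedding model).
    cooccur: dict mapping (phrase_a, phrase_b) -> co-occurrence count.
    """
    phrases = sorted(vectors)
    dim = len(vectors[phrases[0]])

    # Assumption: the document "topic" is the mean of all candidate vectors.
    doc_vec = [sum(vectors[p][i] for p in phrases) / len(phrases) for i in range(dim)]

    # Assumption: topic weight = non-negative cosine similarity to the topic,
    # normalized to sum to 1 (uniform if every similarity is zero).
    raw = {p: max(cosine(vectors[p], doc_vec), 0.0) for p in phrases}
    total = sum(raw.values())
    if total == 0:
        topic = {p: 1.0 / len(phrases) for p in phrases}
    else:
        topic = {p: raw[p] / total for p in phrases}

    # Assumption: edge weight = semantic similarity * co-occurrence count,
    # stored symmetrically (undirected graph).
    edge = {p: {} for p in phrases}
    for (a, b), count in cooccur.items():
        weight = max(cosine(vectors[a], vectors[b]), 0.0) * count
        if a != b and weight > 0:
            edge[a][b] = edge[a].get(b, 0.0) + weight
            edge[b][a] = edge[b].get(a, 0.0) + weight
    out_sum = {p: sum(edge[p].values()) for p in phrases}

    # Power iteration: the teleport term is biased by the topic weight.
    score = {p: 1.0 / len(phrases) for p in phrases}
    for _ in range(iters):
        score = {
            p: (1 - damping) * topic[p]
            + damping * sum(score[q] * w / out_sum[q] for q, w in edge[p].items())
            for p in phrases
        }
    ranked = sorted(phrases, key=lambda p: score[p], reverse=True)
    return ranked, score


# Toy input: two related phrases and one off-topic phrase.
toy_vectors = {
    "neural network": [1.0, 0.1],
    "keyphrase extraction": [0.9, 0.3],
    "coffee break": [-0.2, 1.0],
}
toy_cooccur = {
    ("neural network", "keyphrase extraction"): 3,
    ("keyphrase extraction", "coffee break"): 1,
}
ranked, scores = topic_weighted_rank(toy_vectors, toy_cooccur)
print(ranked)
```

In this toy input the off-topic phrase has both a low topic weight and a weak edge, so it ends up at the bottom of the ranking. Swapping in a different topic vector or edge-weight formula only changes the `topic` and `edge` computations; the iteration itself stays the same.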
Source Acta Electronica Sinica (电子学报), 2021, 49(9): 1682-1690 [CSCD core collection]
DOI 10.12263/DZXB.20200014
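Keywords phrase vector; autoencoder; topic weighting; keyword extraction
Address

1. School of Computer Science, Beijing Institute of Technology; Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing, 100081

2. Southeast Academy of Information Technology, Beijing Institute of Technology, Putian, Fujian, 351100

Language Chinese
Document type Research article
ISSN 0372-2112
Subject Automation technology, computer technology
Funding National Key Research and Development Program of China
Accession number CSCD:7077342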

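Citing documents 1

1. 周炫余. Chinese keyword extraction method with multimodal information-enhanced representation (多模态信息增强表示的中文关键词抽取方法). Journal of Tsinghua University (Science and Technology), 2024, 64(10): 1785-1796
Cited in CSCD 0 times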
