Title (Chinese, translated): Keyword Extraction Method for Sparse Geo-entity Relations
Title (English): A Method of Context Enhanced Keyword Extraction for Sparse Geo-entity Relation

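Authors: 余丽 Yu Li (1), 陆锋 Lu Feng (1, *), 刘希亮 Liu Xiliang (1), 程诗奋 Cheng Shifen (1), 张雪英 Zhang Xueying (2); * corresponding author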
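Abstract (Chinese original, translated): Extracting geo-entity relations embedded in web texts requires keyword extraction methods that are timely and robust. Compared with supervised learning, unsupervised learning methods can capture the dynamic features of text and discover newly emerging relation types, and have therefore attracted much attention. Frequency-based keyword extraction methods in particular have been widely studied. However, geo-entity relations are sparsely distributed in web texts, so frequency-based methods are difficult to apply directly to keyword extraction for geo-entity relations. To address this problem, this paper proposes a context enhanced keyword extraction method built on publicly accessible web resources. First, based on online encyclopedias and an open synonym dictionary, enhanced contexts are created through context merging and semantic fusion, reducing the sparseness of terms in the contexts. Next, the Domain Frequency and Entropy frequency-statistics methods automatically build a large-scale corpus from the enhanced contexts. Then, lexical features are selected and their weights computed from this corpus, in order to widen the differences among terms in a context. Finally, the selected lexical features are used to measure the importance of each term in the enhanced contexts, and the term with the highest weight is taken as the keyword describing the geo-entity relation. Experiments were conducted on large-scale real web texts. The results show that, for recognizing geo-entity relation keywords, the proposed method achieves an average precision of 85.5%, which is 41% and 36% higher than the Domain Frequency and Entropy methods, respectively; for recognizing newly emerging keywords, its precision reaches 60.3%. The context enhanced keyword extraction method effectively handles the sparse distribution of geo-entity relations and can support the extraction of geo-entity relations from web texts.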
Abstract (English version): Geo-entity relation recognition from web texts requires keyword extraction methods that are both timely and robust. Unsupervised learning methods attract particular attention because they can capture dynamic variations of features in text and discover new relation types. Frequency-based keyword extraction methods have been studied extensively; however, the sparse distribution of geo-entity relations in web texts makes it difficult to apply them directly to geo-entity relation keyword extraction. This paper proposes a context enhanced keyword extraction method to solve this problem. First, the contexts of geo-entities are enhanced through context merging and semantic fusion, using online encyclopedias and an open synonym dictionary, to reduce the sparseness of terms. Second, two well-known frequency-based statistical methods (Domain Frequency and Entropy) are used to automatically build a large-scale corpus. Third, lexical features and their weights are determined statistically from this corpus. Finally, all terms in the enhanced contexts are scored by their lexical features, and the highest-scoring term is selected as the keyword of each geo-entity pair. Experiments were conducted on large-scale real web texts. The results show that the method reaches an average precision of 85.5%, improving on the Domain Frequency and Entropy methods by 41% and 36%, respectively. For newly emerging keywords, it achieves a precision of 60.3%.
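The four steps in the abstract (enhance contexts, build a corpus with frequency statistics, weight lexical features, score terms) can be illustrated with a toy Python sketch. It is not the authors' implementation: the synonym table, the (word, POS) token format, the seed rule standing in for the Domain Frequency and Entropy statistics, the single POS-tag feature and the frequency × weight score are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for an open synonym dictionary (variant -> canonical).
SYNONYMS = {"adjoins": "borders", "neighbors": "borders", "seat": "capital"}


def enhance(snippets):
    """Context merging + semantic fusion (sketch).

    `snippets` is an iterable of (pair, [(word, pos), ...]) gathered from
    several sources (e.g. web pages and encyclopedia text). Snippets of the
    same entity pair are concatenated and each word is mapped to its
    canonical synonym, so variant wordings add up instead of staying sparse.
    """
    contexts = defaultdict(list)
    for pair, tokens in snippets:
        contexts[pair].extend((SYNONYMS.get(w, w), pos) for w, pos in tokens)
    return dict(contexts)


def seed_keywords(contexts, min_count=2):
    """Frequency-based seed labelling (a simplified, hypothetical stand-in
    for the Domain Frequency / Entropy step): a pair's most frequent term
    becomes a seed only if it occurs at least `min_count` times."""
    seeds = {}
    for pair, tokens in contexts.items():
        tf = Counter(tokens)
        best = min(tf, key=lambda wp: (-tf[wp], wp[0]))
        if tf[best] >= min_count:
            seeds[pair] = best
    return seeds


def feature_weights(seeds):
    """Weight of each lexical feature (here only the POS tag) = share of
    seed keywords that carry it."""
    if not seeds:
        return {}
    counts = Counter(pos for _, pos in seeds.values())
    return {pos: c / len(seeds) for pos, c in counts.items()}


def extract(contexts, weights):
    """Score every term as in-context frequency x feature weight and return
    the top-scoring word for each pair (ties broken alphabetically)."""
    result = {}
    for pair, tokens in contexts.items():
        tf = Counter(tokens)
        best = min(tf, key=lambda wp: (-tf[wp] * weights.get(wp[1], 0.0), wp[0]))
        result[pair] = best[0]
    return result


# Toy input, invented for illustration only.
snippets = [
    (("Beijing", "Tianjin"), [("adjoins", "v"), ("city", "n")]),
    (("Beijing", "Tianjin"), [("borders", "v"), ("near", "p")]),
    (("Beijing", "Tianjin"), [("borders", "v")]),
    (("Hebei", "Shandong"), [("borders", "v"), ("province", "n")]),
    (("Hebei", "Shandong"), [("neighbors", "v")]),
    (("Nanjing", "Jiangsu"), [("capital", "n"), ("province", "n")]),
    (("Nanjing", "Jiangsu"), [("seat", "n")]),
    (("Hefei", "Anhui"), [("located", "v"), ("in", "p"), ("province", "n")]),
]
contexts = enhance(snippets)
weights = feature_weights(seed_keywords(contexts))
for pair, word in extract(contexts, weights).items():
    # Expected: borders, borders, capital, located
    print(pair, word)
```

In this toy run the Hefei-Anhui pair never yields a confident seed (every term occurs only once), yet it still receives a keyword because the feature weights learned from the other pairs favour verbs; that is the kind of sparse case the abstract targets.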
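Source: Journal of Geo-information Science (地球信息科学学报), 2016, 18(11): 1465-1475 [CSCD core collection]
DOI: 10.3724/SP.J.1047.2016.01465
Keywords: geographic information retrieval; geo-entity relation; keyword extraction; text mining; context enhancement
Affiliations: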

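1. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences; State Key Laboratory of Resources and Environmental Information System; Beijing 100101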

2. Nanjing Normal University; Key Laboratory of Virtual Geographic Environment, Ministry of Education; Nanjing 210046

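Language: Chinese
Document type: Research paper
ISSN: 1560-8999
Funding: National High-tech R&D Program of China (863 Program); National Natural Science Foundation of China
CSCD accession number: CSCD:5845073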

References (27 in total; the first 20 are listed)

1. Jones C B. Geographical information retrieval. International Journal of Geographical Information Science, 2008, 22(3): 219-228. Cited by 21
2. Vasardani M. Locating place names from place descriptions. International Journal of Geographical Information Science, 2013, 27(12): 2509-2532. Cited by 6
3. Derungs C. From text to landscape: locating, identifying and mapping the use of landscape features in a Swiss Alpine corpus. International Journal of Geographical Information Science, 2014, 28(6): 1272-1293. Cited by 1
4. Purves R S. The design and implementation of SPIRIT: a spatially aware search engine for information retrieval on the Internet. International Journal of Geographical Information Science, 2007, 21(7): 717-745. Cited by 13
5. Yuan Y C (袁烨城). Research on spatial relation recognition based on semantic knowledge (in Chinese). Journal of Geo-information Science, 2014, 16(5): 681-690. Cited by 3
6. Yu L (余丽). Extraction of geographic information embedded in web texts: research progress and prospects (in Chinese). Journal of Geo-information Science, 2015, 17(2): 127-134. Cited by 20
7. Li W W. Towards geospatial semantic search: exploiting latent semantic relations in geospatial data. International Journal of Digital Earth, 2014, 7(1): 17-37. Cited by 1
8. Yang B (杨博). Research progress on open information extraction (in Chinese). Journal of Chinese Information Processing, 2014, 28(4): 1-11, 36. Cited by 11
9. Yan Y L. Unsupervised relation extraction by mining Wikipedia texts using information from the web. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2009: 1021-1029. Cited by 2
10. Shen M M. Extracting semantic relations to enrich domain ontologies. Journal of Intelligent Information Systems, 2012, 39(3): 749-761. Cited by 1
11. Zhang W R (张苇如). Entity relation extraction based on Wikipedia and pattern clustering (in Chinese). Journal of Chinese Information Processing, 2012, 26(2): 75-81, 127. Cited by 8
12. Yu L (余丽). A Bootstrapping method for open geo-entity relation extraction (in Chinese). Acta Geodaetica et Cartographica Sinica, 2016, 45(5): 616-622. Cited by 16
13. Mesquita F. Clustering techniques for open relation extraction. Proceedings of the SIGMOD/PODS 2012 PhD Symposium, USA, 20 May 2012: 27-32. Cited by 1
14. Qin B (秦兵). Unsupervised Chinese open entity relation extraction (in Chinese). Journal of Computer Research and Development, 2015, 52(5): 1029-1035. Cited by 20
15. Chen J X. Unsupervised feature selection for relation extraction. Proceedings of the 2nd International Joint Conference on Natural Language Processing, 2005: 262-267. Cited by 1
16. Zhang X Y (张雪英). A method for constructing semantic constraint relations between spatial relation terms and geographic entity feature types (in Chinese). Geomatics and Information Science of Wuhan University, 2012, 37(11): 1266-1270. Cited by 8
17. Schockaert S. Mining topological relations from the web. Proceedings of the 19th International Conference on Database and Expert Systems Applications, 2008: 652-656. Cited by 1
18. Smole D. Evaluation of inductive logic programming for information extraction from natural language texts to support spatial data recommendation services. International Journal of Geographical Information Science, 2011, 25(11): 1809-1827. Cited by 2
19. Elia A. A linguistic-based method for automatically extracting spatial relations from large non-structured data. Proceedings of the 13th International Conference on Algorithms and Architectures for Parallel Processing, 2013: 193-200. Cited by 2
20. Cao C G. A practical approach to extracting names of geographical entities and their relations from the web. The 7th International Conference on Knowledge Science, Engineering and Management, Lecture Notes in Computer Science, 2014: 200-221. Cited by 1
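Citing articles (7 in total; 2 listed)

1. Lu F (陆锋). On geographic knowledge graphs (in Chinese). Journal of Geo-information Science, 2017, 19(6): 723-734. Cited by 47

2. Wang J B (王姬卜). A method for constructing a geo-entity relation corpus based on automatic back-annotation (in Chinese). Journal of Geo-information Science, 2018, 20(7): 871-879. Cited by 6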
