
基于模式和投影学习的领域概念上下位关系自动识别研究
Automatically Identifying Hypernym-Hyponym Relations of Domain Concepts with Patterns and Projection Learning


Wang Sili (王思丽)¹,²,*; Zhu Zhongming (祝忠明)¹,²; Yang Heng (杨恒)¹; Liu Wei (刘巍)¹

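Abstract (translated from the Chinese)	[Objective] To automatically identify hypernym-hyponym relations between domain concepts, addressing the problem of automatically acquiring and establishing semantic relations between domain concepts in automated domain-ontology construction. [Methods] The traditional unsupervised pattern-based method is combined with the current state-of-the-art supervised projection-learning method, applied to the automatic identification of hypernym-hyponym relations between domain concepts, and evaluated experimentally. [Results] The method identifies the set of hypernyms of a domain concept, with a precision of 0.88 in the medical domain and 0.83 in the general domain, and an average precision of 0.85 on the BLESS evaluation benchmark. [Limitations] Owing to syntactic ambiguity, corpus quality, and similar factors, the model's precision has not yet peaked, and some relations are misidentified. [Conclusions] The method can discover hypernyms for different senses of the same concept word and also performs well on low-frequency words and named entities. Future work could down-weight high-frequency top-level hypernyms and improve the quality of the supervised corpus.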
Abstract (English)	[Objective] This paper aims to automatically identify hypernym-hyponym relations between domain concepts, in support of automated domain-ontology construction. [Methods] First, we combined the traditional unsupervised pattern-based method with the advanced supervised projection learning method to automatically identify hypernym-hyponym relations of domain concepts. Then, we examined the new method in an empirical study. [Results] The proposed method could identify the hypernym sets of domain concepts. The identification precision was 0.88 in the medical domain, 0.83 in the general domain, and 0.85 on the benchmark dataset BLESS. [Limitations] More research is needed to reduce the weight of high-frequency top-level words and improve the corpus quality. Some relations are also misidentified. [Conclusions] The proposed model could find hypernyms for different meanings of the same concept, and could also handle low-frequency words and named entities.
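The two families of methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single English "such as" pattern, the function names, and the toy 2-dimensional vectors are all assumptions made for illustration, and the projection matrix `phi` is assumed to have been learned already by some supervised training step that is not shown.

```python
import math
import re

# Separators between listed hyponyms; longer alternatives come first.
SEPARATOR = r"(?:, and |, or |, | and | or )"

# One illustrative Hearst pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Single-word terms only; the patterns actually used in the paper are not given in this record.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+" + SEPARATOR + r"?)+)")


def hearst_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for hyponym in re.split(SEPARATOR, match.group(2)):
            if hyponym:  # skip the empty piece left by a trailing separator
                pairs.append((hyponym, hypernym))
    return pairs


def project(phi, vec):
    """Multiply matrix phi (a list of rows) by the vector vec."""
    return [sum(w * x for w, x in zip(row, vec)) for row in phi]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def rank_hypernyms(query_vec, phi, candidates):
    """Project the query embedding with phi, then rank candidate hypernyms by cosine."""
    projected = project(phi, query_vec)
    scored = [(name, cosine(projected, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(hearst_pairs("diseases such as diabetes, asthma and influenza are common"))
    toy_phi = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical "learned" projection
    toy_candidates = {"disease": [0.0, 1.0], "animal": [1.0, 0.0]}
    print(rank_hypernyms([1.0, 0.0], toy_phi, toy_candidates))
```

In a combined system, pairs found by patterns could serve as candidates or training signal for the projection step; how the paper actually couples the two is not stated in this record.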
Source	Data Analysis and Knowledge Discovery (数据分析与知识发现), 2020, 4(11): 15-25 [Extended database]
DOI	10.11925/infotech.2096-3467.2020.0299
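Keywords	Hearst patterns; projection learning; word embedding; hypernym-hyponym relations; domain concepts
Affiliations	1. Literature and Information Center, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, 730000 2. University of Chinese Academy of Sciences, Beijing, 100049
Language	Chinese
Document type	Research article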
ISSN	2096-3467
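Subject	General social sciences; automation and computer technology
Funding	Key R&D Program project of the Ministry of Science and Technology of China; Chinese Academy of Sciences 2019 "Light of West China" Program; 2018 Literature and Information Innovation Capacity Building Project of the Literature and Information Center, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences
Accession number	CSCD:6853463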

References: 31 in total

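Citing documents: 2

1. Dai Zhihong (戴志宏). Hypernym-hyponym relation extraction methods and their application in financial markets. Data Analysis and Knowledge Discovery, 2021, 5(10): 60-70
Cited 0 times

2. Zhang Guofang (张国防). Quantifying asymmetric relations between documents based on subject-term co-occurrence. Data Analysis and Knowledge Discovery, 2023, 7(3): 110-120
Cited 0 times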

