Extracting Fine-grained Knowledge Units from Texts with Deep Learning
(Original Chinese title: 基于深度学习的文本中细粒度知识元抽取方法研究)

Yu Li (余丽)^1,2; Qian Li (钱力)^1,3,*; Fu Changlei (付常雷)^1; Zhao Huaming (赵华茗)^1  (* corresponding author)

Abstract	[Objective] This paper improves the bootstrapping method and builds a deep learning model to extract multiple types of fine-grained knowledge units from texts. [Methods] First, we built a lexicon for each type of knowledge unit using a search engine and keywords from Elsevier. Second, we automatically created a large-scale annotated corpus with the bootstrapping method, using a knowledge-unit scoring model and a pattern scoring model to control annotation quality. Finally, we trained an LSTM-CRF model on the corpus annotated with multiple types of knowledge units and used it to extract new knowledge units from texts. [Results] From the abstracts of 17,756 ACL papers, we extracted four types of knowledge units: study scope, research method, experimental data, and evaluation criteria with their values. Manual evaluation gave an average precision of 91%. [Limitations] Model parameters were preset and tuned manually, and the method's applicability to texts from other domains has not been validated. [Conclusions] Introducing scoring models for knowledge units and patterns effectively mitigates semantic drift. Extracting knowledge units with the deep learning model is fast and precise, providing an efficient and reliable means of data acquisition for intelligent big-data analysis in intelligence studies.
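The abstract names a pattern scoring model and a knowledge-unit scoring model as the safeguards against semantic drift, but this record does not give their formulas. The Python sketch below is therefore a hedged illustration, not the authors' method. It substitutes Basilisk-style scoring (a smoothed RlogF score for patterns and an AvgLog score for candidate terms, after Thelen and Riloff), assumes a pattern is simply the (previous word, next word) context around a candidate span, and then turns the grown lexicon into BIO labels of the kind a sequence labeler such as LSTM-CRF is commonly trained on. The functions `contexts`, `bootstrap`, and `label_bio`, the tag name, and all thresholds are hypothetical.

```python
"""Hypothetical bootstrapping sketch for growing a knowledge-unit lexicon.

Assumptions (not taken from the paper): a pattern is the (previous word,
next word) context around a candidate span; patterns are scored with a
smoothed RlogF and candidates with Basilisk-style AvgLog.
"""
import math
from collections import defaultdict

Pattern = tuple[str, str]  # (word before the span, word after the span)


def contexts(tokens: list[str], max_len: int = 3):
    """Yield (pattern, candidate) for each 1..max_len-token span with a word on each side."""
    for i in range(1, len(tokens) - 1):
        for n in range(1, max_len + 1):
            j = i + n
            if j >= len(tokens):
                break
            yield (tokens[i - 1], tokens[j]), " ".join(tokens[i:j])


def bootstrap(sentences: list[list[str]], seeds: set[str], iterations: int = 3,
              top_patterns: int = 5, top_terms: int = 5,
              min_pattern_score: float = 0.5, min_term_score: float = 1.0) -> set[str]:
    lexicon = set(seeds)
    index: dict[Pattern, set[str]] = defaultdict(set)  # pattern -> spans it extracts
    for tokens in sentences:
        for pattern, candidate in contexts(tokens):
            index[pattern].add(candidate)

    accepted: set[Pattern] = set()
    for _ in range(iterations):
        # Pattern score (smoothed RlogF): (F / N) * log2(F + 1), where F is the
        # number of extracted spans already in the lexicon and N all extracted spans.
        scored = []
        for pattern, spans in index.items():
            f = len(spans & lexicon)
            if f > 0 and pattern not in accepted:
                scored.append(((f / len(spans)) * math.log2(f + 1), pattern))
        scored.sort(reverse=True)
        accepted.update(p for s, p in scored[:top_patterns] if s >= min_pattern_score)

        # Candidate score (AvgLog): mean of log2(F + 1) over the accepted
        # patterns that extract the candidate.
        support: dict[str, list[float]] = defaultdict(list)
        for pattern in accepted:
            f = len(index[pattern] & lexicon)
            for span in index[pattern] - lexicon:
                support[span].append(math.log2(f + 1))
        ranked = sorted(((sum(v) / len(v), c) for c, v in support.items()), reverse=True)
        new_terms = {c for s, c in ranked[:top_terms] if s >= min_term_score}
        if not new_terms:
            break  # stop rather than admit low-confidence terms (limits semantic drift)
        lexicon |= new_terms
    return lexicon


def label_bio(tokens: list[str], lexicon: set[str], tag: str) -> list[str]:
    """Greedy longest-match BIO labelling of one tokenized sentence."""
    max_len = max((len(term.split()) for term in lexicon), default=0)
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in lexicon:
                labels[i] = f"B-{tag}"
                labels[i + 1:i + n] = [f"I-{tag}"] * (n - 1)
                i += n
                break
        else:
            i += 1  # no lexicon term starts at position i
    return labels


if __name__ == "__main__":
    corpus = [s.split() for s in (
        "we use CRF for tagging",
        "we use LSTM for tagging",
        "we use SVM for parsing",
        "results on ACL data",
    )]
    lexicon = bootstrap(corpus, {"CRF", "LSTM"})
    print(sorted(lexicon))                           # ['CRF', 'LSTM', 'SVM']
    print(label_bio(corpus[2], lexicon, "METHOD"))  # ['O', 'O', 'B-METHOD', 'O', 'O']
```

A pattern is accepted only if its extractions already overlap the lexicon, and a candidate is added only if its supporting patterns score above a threshold; raising `min_pattern_score` or `min_term_score` trades recall for resistance to semantic drift. In the toy corpus, seeding with CRF and LSTM adds SVM through the pattern (use, for), after which no further candidate qualifies and the loop stops.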
Source	Data Analysis and Knowledge Discovery (数据分析与知识发现), 2019, 3(1): 38-45 [CSCD Extended Database]
DOI	10.11925/infotech.2096-3467.2018.1352
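Keywords	Knowledge unit extraction; Named entity recognition; Deep learning; Bootstrapping; LSTM-CRF
Address	1. National Science Library, Chinese Academy of Sciences, Beijing 100190
	2. State Key Laboratory of Resources and Environmental Information System, Beijing 100101
	3. Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190
Language	Chinese
Document type	Research article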
ISSN	2096-3467
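Subject	Automation technology; computer technology
Funding	National Natural Science Foundation of China; National Social Science Fund of China; Youth Innovation Team Project of the National Science Library, Chinese Academy of Sciences
Accession number	CSCD:6552313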

References	27

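Citing articles	6 (first 2 listed)

1. Zhou Haichen. Exploring the identification of academic innovation contributions in academic full texts (学术全文本的学术创新贡献识别探索). Journal of the China Society for Scientific and Technical Information (情报学报), 2020, 39(8): 845-851. Cited 3 times.

2. Li Jiao. Research on automatic classification based on a multi-factor algorithm (基于多因子算法的自动分类研究). Data Analysis and Knowledge Discovery (数据分析与知识发现), 2020, 4(11): 43-51. Cited 0 times.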