
Agriculture-related product name extraction and category labeling based on ontology and conditional random field


Abstract Traditional information extraction methods based on the Conditional Random Field (CRF) need a large labeled training corpus when applied to agriculture-related product name extraction and category labeling; manual labeling is costly, and extraction precision is low. To address this problem, a method combining an agricultural ontology with CRF is proposed, in which the automatic extraction and classification of agriculture-related product names is treated as a sequence labeling task. First, the raw data is segmented into words, and word, part-of-speech, geographical attribute and ontology concept features are selected. Then, the CRF model parameters are trained with an improved quasi-Newton algorithm and decoding is performed with the Viterbi algorithm. Four groups of comparative experiments were completed and seven categories were identified, and CRF was compared experimentally with the Hidden Markov Model (HMM) and the Maximum Entropy Markov Model (MEMM). Finally, CRF was applied to the analysis of supply and demand trends for agricultural products. With an appropriate feature template, adding ontology concepts raised the overall precision, recall and F-score of the CRF open test by 10.20%, 59.78% and 37.17% respectively. This demonstrates the feasibility and effectiveness of combining ontology with CRF for extracting agriculture-related product names and categories, and shows that the method can help match supply and demand for agricultural products.
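The pipeline described in the abstract (per-token features followed by Viterbi decoding over label sequences) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the record does not give the paper's feature templates, its seven categories, or any trained weights, so `token_features`, the label names such as `B-FRUIT`, and the scores below are hypothetical, and training with a quasi-Newton optimizer is not shown.

```python
from typing import Dict, List, Tuple

# A token is (word, part_of_speech, geo_attribute, ontology_concept).
# This layout is an assumption; the abstract only names the four feature types.
Token = Tuple[str, str, str, str]


def token_features(sent: List[Token], i: int) -> Dict[str, str]:
    """Build a feature dict for token i from the four feature types."""
    word, pos, geo, concept = sent[i]
    feats = {"word": word, "pos": pos, "geo": geo, "concept": concept}
    if i > 0:
        # Context features from the previous token (hypothetical template).
        feats["prev_word"] = sent[i - 1][0]
        feats["prev_concept"] = sent[i - 1][3]
    else:
        feats["BOS"] = "1"  # marks the beginning of the sentence
    return feats


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence.

    emissions[t][y] is the score of label y at position t;
    transitions[(p, y)] is the score of moving from label p to label y.
    Missing entries count as 0.0.
    """
    if not emissions:
        return []
    best = [{y: emissions[0].get(y, 0.0) for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Pick the best previous label for reaching y at position t.
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t].get(y, 0.0)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token sentence.
labels = ["B-FRUIT", "I-FRUIT", "O"]
emissions = [{"O": 2.0}, {"B-FRUIT": 1.5, "O": 1.0}, {"I-FRUIT": 1.0, "O": 0.9}]
transitions = {("O", "I-FRUIT"): -5.0, ("B-FRUIT", "I-FRUIT"): 1.0}
print(viterbi(emissions, transitions, labels))
```

In a real CRF the emission and transition scores would come from learned weights applied to features like those produced by `token_features`; the sketch only shows how decoding selects one label per token once such scores exist.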
Source Journal of Computer Applications, 2017, 37(1): 233-238 [Extended database]
DOI 10.11772/j.issn.1001-9081.2017.01.0233
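Keywords conditional random field ; agricultural ontology ; agriculture-related product name ; supply and demand trend ; sequence labeling
Address

Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, 230031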

Language Chinese
Document type Research article
ISSN 1001-9081
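Subject Automation technology, computer technology
Funding National Key Technology R&D Program ; Key Deployment Project of the Chinese Academy of Sciences ; Anhui Province Science and Technology Research Project
Accession number CSCD:5897675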

