Research and Development of Named Entity Recognition in Chinese Electronic Medical Record
Original title: 中文电子病历命名实体识别的研究与进展

Du Jinhua (杜晋华) 1, Yin Hao (尹浩) 1*, Feng Song (冯嵩) 2
Abstract: Massive electronic medical record (EMR) data are an important raw material for research on intelligent healthcare, but the semi-structured or even unstructured nature of EMR text makes subsequent analysis and use extremely difficult. Although deep-learning-based named entity recognition (NER) has in recent years become the core technology for automated information extraction from EMRs, many challenges remain because Chinese electronic medical records (CEMRs) have distinctive textual characteristics: the non-standard and specialized nature of the record text, the uniqueness of medical entities, and the scarcity of annotated corpora. This paper surveys research and progress on NER in CEMRs. It systematically reviews the concept of NER, the related theoretical models, and the main factors that limit recognition accuracy and efficiency for CEMRs; analyzes in detail how CEMR NER methods have evolved from the perspective of technical development; and experimentally verifies and analyzes the performance of CEMR NER, pointing out the shortcomings of existing models and directions for improvement. Because CCKS, a domestic evaluation conference on Chinese information processing, has continued to focus on CEMR NER in recent years, the paper also compares, longitudinally and cross-sectionally, all representative CCKS evaluation papers in this field from the past five years, and through in-depth experiments on mainstream models seeks ideas for further progress in the field.
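Source: Acta Electronica Sinica (电子学报), 2022, 50(12): 3030-3053 [CSCD Core Collection]
DOI: 10.12263/DZXB.20220485
Keywords: Chinese electronic medical record; named entity recognition; deep learning; pre-trained model; natural language processing; healthcare informatization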
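Affiliations

1. Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, 100084

2. Xiangya Hospital, Central South University, Changsha, Hunan, 410008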

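Language: Chinese
Document type: Research article
ISSN: 0372-2112
Subject: Automation technology, computer technology
Funding: National Key R&D Program of China; National Natural Science Foundation of China
CSCD accession number: CSCD:7415480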

References: 110 in total
