Abstract
|
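Massive electronic medical record (EMR) data are an important raw material for research on intelligent healthcare. However, EMR text is semi-structured or even unstructured, which makes its subsequent analysis and use very difficult. Although deep-learning-based named entity recognition (NER) has in recent years become the core technology for automated information extraction from EMRs, Chinese electronic medical records (CEMR) have distinctive textual characteristics, including the non-standard and specialized nature of the record text, the uniqueness of medical entities, and the scarcity of annotated corpora, so this research still faces many challenges. This paper surveys the research and progress on CEMR NER. It systematically reviews the concept of NER, the related theoretical models, and the main factors that limit the accuracy and efficiency of CEMR NER; analyzes in detail, from the perspective of technical development, how CEMR NER methods have evolved; and experimentally verifies and analyzes in depth the performance of CEMR NER, pointing out the shortcomings of existing models and directions for improvement. Because CCKS, a domestic evaluation conference related to Chinese information processing, has continued to focus on CEMR NER in recent years, this paper also gives a longitudinal and cross-sectional comparison of all representative CCKS evaluation papers in this field over the past five years and, through in-depth experiments and study on mainstream models, seeks ideas for further progress in the field. |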
Abstract in other language
|
Massive electronic medical record (EMR) data are an important raw material for research on medical intelligence, but the semi-structured or even unstructured nature of EMR text makes it extremely difficult to analyze and use. Although named entity recognition (NER) based on deep learning has become a core technology for automated information extraction from electronic medical records in recent years, many challenges remain, given the distinctive textual characteristics of Chinese electronic medical records (CEMR): the non-standard and specialized nature of medical record text, the uniqueness of medical entities, and the scarcity of annotated corpora. This paper reviews the research and progress of named entity recognition in Chinese electronic medical records. It systematically sorts out the concept of named entity recognition, the related theoretical models, and the main factors limiting the accuracy and efficiency of named entity recognition in Chinese electronic medical records; analyzes in detail the evolution of named entity recognition methods for Chinese electronic medical records from the perspective of technical development; and experimentally verifies and analyzes in depth the performance of named entity recognition in Chinese electronic medical records, pointing out the shortcomings of existing models and directions for improvement. Because CCKS, a domestic evaluation conference related to Chinese information processing, has continued to focus on named entity recognition in Chinese electronic medical records in recent years, this paper presents a longitudinal and cross-sectional comparison of all the representative CCKS evaluation papers in this field over the past five years, and seeks ideas for the continued advancement of the field through in-depth experiments and study on mainstream models. |
Source
|
Acta Electronica Sinica (电子学报)
, 2022, 50(12): 3030-3053 [Core Collection]
|
DOI
|
10.12263/DZXB.20220485
|
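Keywords
|
Chinese electronic medical record
;
named entity recognition
;
deep learning
;
pre-trained model
;
natural language processing
;
medical informatization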
|
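Affiliations
|
1.
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, 100084
2.
Xiangya Hospital, Central South University, Changsha, Hunan, 410008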
|
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
0372-2112 |
Subject
|
Automation technology, computer technology |
Funding
|
National Key Research and Development Program of China
;
National Natural Science Foundation of China
|
Accession number
|
CSCD:7415480
|