中文电子病历命名实体识别的研究与进展
Research and Development of Named Entity Recognition in Chinese Electronic Medical Record
Abstract
|
Massive electronic medical record (EMR) data is an important raw material for research on intelligent healthcare, but the semi-structured or even unstructured nature of EMR text makes it very difficult to analyze and use. Although named entity recognition (NER) based on deep learning has in recent years become a core technology for automated information extraction from EMRs, the task remains challenging for Chinese electronic medical records (CEMR) because of their distinctive textual characteristics: the non-standard and specialized nature of the record text, the uniqueness of medical entities, and the scarcity of annotated corpora. This paper surveys research and progress on NER in Chinese electronic medical records. It systematically reviews the concept of NER, the related theoretical models, and the main factors that limit recognition accuracy and efficiency for CEMR; analyzes in detail how CEMR NER methods have evolved from the perspective of technical development; and experimentally verifies and analyzes the performance of CEMR NER, pointing out the shortcomings of existing models and directions for improvement. Because CCKS, a domestic evaluation conference on Chinese information processing, has continued to focus on CEMR NER in recent years, the paper also gives a longitudinal and cross-sectional comparison of all representative CCKS evaluation papers in this field over the past five years and, through in-depth experiments on mainstream models, seeks ideas for advancing the field. |
Source
|
Acta Electronica Sinica (电子学报)
, 2022, 50(12): 3030-3053 [Core Collection]
|
DOI
|
10.12263/DZXB.20220485
|
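Keywords
|
Chinese electronic medical record
;
named entity recognition
;
deep learning
;
pre-trained model
;
natural language processing
;
medical informatization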
|
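Address
|
1.
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, 100084
2.
Xiangya Hospital, Central South University, Changsha, Hunan, 410008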
|
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
0372-2112 |
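Subject
|
Automation technology, computer technology |
Funding
|
National Key Research and Development Program of China
;
National Natural Science Foundation of China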
|
Accession number
|
CSCD:7415480
|