Chinese Named Entity Recognition Model Based on Transformer Encoder


Abstract: Named Entity Recognition (NER) is an important task in Natural Language Processing (NLP), and Chinese NER is generally more difficult than its English counterpart. Traditional Chinese entity recognition models typically use deep neural networks to label every character in the text and then identify named entities from the resulting label sequence, but such character-based sequence labeling struggles to capture word-level information. This paper proposes a Chinese NER model based on the Transformer encoder. In the character embedding layer, a dictionary-enhanced character vector encoding method injects word information into the character vectors. To address the Transformer encoder's loss of relative character position information during the attention computation, the attention calculation is modified and a relative position encoding method is introduced. Finally, a Conditional Random Field (CRF) model produces the optimal tag sequence. Experimental results show that the model reaches F1 scores of 94.7% and 58.2% on the Resume and Weibo Chinese NER datasets respectively, improving on NER models based on the Bidirectional Long Short-Term Memory (BiLSTM) network and the Iterated Dilated Convolutional Neural Network (ID-CNN), with better recognition performance and faster convergence.
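Where the abstract describes modifying the Transformer encoder's attention so that relative character positions survive the computation, a minimal sketch may help. The following PyTorch code is an illustrative, single-head approximation in the spirit of Transformer-XL (ref. 11) and TENER (ref. 17); the class names, bias parameters, and dimensions are assumptions for illustration, not the authors' implementation, and an even d_model is assumed.

```python
# Sketch of self-attention with relative position encoding, in the spirit of
# Transformer-XL (ref. 11) and TENER (ref. 17). All names and dimensions are
# illustrative assumptions, not the paper's code.
import math
import torch
import torch.nn as nn

def relative_position_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal encodings for every signed offset -(seq_len-1)..(seq_len-1)."""
    pos = torch.arange(-seq_len + 1, seq_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    enc = torch.zeros(2 * seq_len - 1, d_model)  # assumes even d_model
    enc[:, 0::2] = torch.sin(pos * div)
    enc[:, 1::2] = torch.cos(pos * div)
    return enc  # shape: (2*seq_len - 1, d_model)

class RelativeSelfAttention(nn.Module):
    """Single-head self-attention whose scores depend on the signed offset
    j - i, so character order survives the attention computation (vanilla
    dot-product attention with absolute encodings loses this distinction)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Learned global biases for the content and position terms,
        # following the Transformer-XL score decomposition.
        self.u = nn.Parameter(torch.zeros(d_model))
        self.w = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape                                 # (batch, seq, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        rel = relative_position_encoding(n, d).to(x.device)      # (2n-1, d)
        content = torch.einsum('bid,bjd->bij', q + self.u, k)    # (b, n, n)
        position = torch.einsum('bid,rd->bir', q + self.w, rel)  # (b, n, 2n-1)
        # Re-index so position[:, i, j] corresponds to offset j - i.
        idx_i = torch.arange(n, device=x.device).unsqueeze(1)
        idx_j = torch.arange(n, device=x.device).unsqueeze(0)
        position = position[:, idx_i, idx_j - idx_i + n - 1]     # (b, n, n)
        # TENER drops the 1/sqrt(d) scaling to keep the attention sharper.
        attn = torch.softmax(content + position, dim=-1)
        return torch.einsum('bij,bjd->bid', attn, v)
```

As a usage check, RelativeSelfAttention(128)(torch.randn(2, 10, 128)) returns a (2, 10, 128) tensor; in the pipeline the abstract describes, a CRF layer would then decode the optimal tag sequence from such encoder outputs.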
Source: Computer Engineering, 2022, 48(7): 66-72
DOI: 10.19678/j.issn.1000-3428.0061432
Keywords: natural language processing; Chinese named entity recognition; Transformer encoder; conditional random field; relative position encoding
Address: School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Language: Chinese
Document type: Research article
ISSN: 1000-3428
Subject: Automation and computer technology
Funding: Natural Science Foundation of the Higher Education Institutions of Jiangsu Province
Accession number: CSCD:7251236

References (20)

1.  殷章志. Research on Chinese named entity recognition with a fused character-word model. Journal of Chinese Information Processing, 2019, 33(11): 95-100, 106. Cited 21
2.  王红. Semantic relation extraction with attention-based LSTM. Application Research of Computers, 2018, 35(5): 1417-1420, 1440. Cited 23
3.  Huang Z. Bidirectional LSTM-CRF models for sequence tagging, 2015. Cited 5
4.  Zhang Y. Chinese NER using lattice LSTM. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018: 1554-1564. Cited 69
5.  杜琳. Extraction and automatic classification of traditional Chinese medicine record texts based on BERT and Bi-LSTM with an attention mechanism. Computer Science, 2020, 47(S2): 416-420. Cited 10
6.  Zeng D H. LSTM-CRF for drug-named entity recognition. Entropy, 2017, 19(6): 283. Cited 3
7.  Yan S. Bidirectional GRU with multi-head attention for Chinese NER. Proceedings of the 5th Information Technology and Mechatronics Engineering Conference, 2020: 1160-1164. Cited 1
8.  Ding R X. A neural multi-digraph model for Chinese NER with gazetteers. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 1462-1467. Cited 5
9.  Vaswani A. Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000-6010. Cited 672
10.  Guo S G. Transformer winding deformation detection based on BOTDR and ROTDR. Sensors, 2020, 20(7): 2062. Cited 1
11.  Dai Z H. Transformer-XL: attentive language models beyond a fixed-length context. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 1-15. Cited 1
12.  Shaw P. Self-attention with relative position representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018: 1-25. Cited 1
13.  Mikolov T. Efficient estimation of word representations in vector space, 2013. Cited 21
14.  张华伟. A neural network collaborative recommendation model based on Word2Vec. Cyberspace Security, 2019, 10(6): 25-28. Cited 1
15.  章跃琳. Research on online product feature extraction and text classification based on Word2Vec, 2019. Cited 1
16.  Lei S. Research on an improved Word2Vec optimization strategy based on a statistical language model. Proceedings of the International Conference on Information Science, Parallel and Distributed Systems, 2020: 356-359. Cited 1
17.  Yan H. TENER: adapting Transformer encoder for named entity recognition, 2019. Cited 2
18.  张应成. A business information entity recognition model based on BiLSTM-CRF. Computer Engineering, 2019, 45(5): 308-314. Cited 14
19.  Dai N. Style Transformer: unpaired text style transfer without disentangled latent representation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 5997-6007. Cited 4
20.  Gao M. An attention-based IDCNNs-CRF model for named entity recognition on clinical electronic medical records. Proceedings of the International Conference on Artificial Neural Networks, 2019: 231-242. Cited 2
Citing articles (3)

1.  史占堂. Chinese named entity recognition based on a CNN-Head Transformer encoder. Computer Engineering, 2022, 48(10): 73-80. Cited 4
2.  王春东. An adversarial example generation method for Chinese text based on corrective comprehension. Computer Engineering, 2023, 49(2): 37-45. Cited 0


