Fusion of Morphological Features for Mongolian Part of Speech Based on Maximum Entropy Model
Original title: 融合形态特征的最大熵蒙古文词性标注模型

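Abstract: The maximum entropy model has achieved good results in part-of-speech tagging research because it accommodates many kinds of constraint information and fits natural language models well. Taking it as the basic framework, this paper proposes a maximum entropy part-of-speech tagging model for Mongolian that incorporates linguistic features. First, feature templates are defined and selected according to the word-formation characteristics of Mongolian and the results of statistical analysis; a large set of candidate features is extracted from the training corpus, and rules are applied to filter out erroneous or invalid features. Then the parameters of the maximum entropy probability model are trained. Experimental results show that the maximum entropy model incorporating Mongolian morphological features tags Mongolian text well.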
English abstract: Part-of-speech (POS) tagging is a basic research topic in natural language processing and plays an important role in syntactic analysis, semantic analysis, machine translation, and related tasks. The maximum entropy model is a strong statistical model because it integrates diverse constraints well, and it has been widely used in POS tagging research. This paper proposes an approach to Mongolian POS tagging that combines linguistic morphological features within a maximum entropy model. Mongolian has a long history, yet there has been comparatively little research on Mongolian language processing. Mongolian is a typical agglutinative language, characterized by rich morphology and a high level of ambiguity. First, based on an analysis of Mongolian script, context feature templates and word-internal feature templates are defined, and features are extracted from the training corpus. Then, various morphological features of words are integrated into the maximum entropy model, and the improved iterative scaling (IIS) algorithm is used to estimate the model parameters. Experimental results on closed and open test sets prepared for the Mongolian POS tagging task show that the maximum entropy model with integrated morphological features outperforms the hidden Markov model (HMM) and is well suited to Mongolian text.
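The pipeline outlined in the abstract (feature templates, candidate feature extraction, feature filtering, parameter training) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the suffix templates, the frequency cutoff used as a filtering rule, and the toy words and tags are all hypothetical, and plain gradient ascent stands in for the IIS algorithm named in the abstract.

```python
import math
from collections import Counter, defaultdict

# Made-up toy corpus (NOT real Mongolian): "nouns" end in -an, "verbs" in -na.
TRAIN = [
    [("baran", "N"), ("yavna", "V")],
    [("doran", "N"), ("irna", "V")],
    [("saran", "N"), ("suuna", "V")],
]

def extract_features(words, i):
    """Apply context and word-internal (suffix) templates to the token at i."""
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    feats = [f"w={w}", f"prev={prev_w}", f"next={next_w}"]
    # Suffixes are a hypothetical stand-in for morphological features.
    for k in (1, 2, 3):
        if len(w) > k:
            feats.append(f"suf{k}={w[-k:]}")
    return feats

def build_dataset(sentences):
    """Turn tagged sentences into (candidate features, tag) pairs."""
    data = []
    for sent in sentences:
        words = [w for w, _ in sent]
        for i, (_, tag_label) in enumerate(sent):
            data.append((extract_features(words, i), tag_label))
    return data

def select_features(data, min_count=2):
    """Filtering rule assumed here: drop features seen fewer than min_count times."""
    counts = Counter(f for feats, _ in data for f in feats)
    return {f for f, c in counts.items() if c >= min_count}

def probabilities(weights, tags, feats):
    """Maximum entropy form: p(t | x) is proportional to exp(sum of weights)."""
    scores = {t: sum(weights[(f, t)] for f in feats) for t in tags}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def train(data, kept, epochs=200, lr=0.5):
    """Fit weights by batch gradient ascent on the log-likelihood (not IIS)."""
    tags = sorted({t for _, t in data})
    weights = defaultdict(float)
    filtered = [([f for f in feats if f in kept], t) for feats, t in data]
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in filtered:
            p = probabilities(weights, tags, feats)
            for f in feats:
                grad[(f, gold)] += 1.0      # observed feature count
                for t in tags:
                    grad[(f, t)] -= p[t]    # expected feature count
        for key, g in grad.items():
            weights[key] += lr * g / len(filtered)
    return weights, tags

def tag(weights, tags, kept, words):
    """Tag each word independently with its most probable tag."""
    result = []
    for i in range(len(words)):
        feats = [f for f in extract_features(words, i) if f in kept]
        p = probabilities(weights, tags, feats)
        result.append(max(tags, key=lambda t: p[t]))
    return result

data = build_dataset(TRAIN)
kept = select_features(data)
weights, tags = train(data, kept)
print(tag(weights, tags, kept, ["kilan", "tomna"]))  # two unseen toy words
```

Because the two test words never occur in the toy training data, only the suffix and sentence-boundary features can fire for them. That is the intuition behind adding word-internal features for a morphologically rich language. A realistic tagger would also condition on previous tags and decode whole sentences jointly, which this sketch omits.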
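Source: Journal of Computer Research and Development (计算机研究与发展), 2011, 48(12): 2385-2390 [Core collection]
Keywords: morphological features; maximum entropy model; Mongolian; part-of-speech tagging; parameter estimation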
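Addresses:
1. Department of Computer Science and Technology, Hefei University; Anhui Provincial Key Laboratory of Network and Intelligent Information Processing, Hefei 230601
2. School of Mongolian Studies, Inner Mongolia University, Hohhot 010021
3. Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230001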

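Language: Chinese
Document type: Research article
ISSN: 1000-1239
Subject: Automation technology, computer technology
Funding: National Natural Science Foundation of China; Humanities and Social Sciences Research Project of the Ministry of Education of China
Accession number: CSCD:4408067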

References (15)

1. Brill E. A simple rule-based part of speech tagger. HLT'91: Proc of the Workshop on Speech and Natural Language, 1992: 112-116 (CSCD citations: 1)
2. Brill E. Transformation based error driven learning and natural language processing: A case study in part of speech tagging. Computational Linguistics, 1995, 21(4): 543-565 (CSCD citations: 42)
3. Black E. Decision tree models applied to the labeling of text with parts-of-speech. HLT'91: Proc of the Workshop on Speech and Natural Language, 1992: 117-121 (CSCD citations: 1)
4. Brants T. TnT: A statistical part of speech tagger. Proc of the 6th Conf on Applied Natural Language Processing, 2000: 224-231 (CSCD citations: 1)
5. Lee S Z. Lexicalized hidden Markov models for part-of-speech tagging. Proc of the 18th Conf on Computational Linguistics, 2000: 481-487 (CSCD citations: 1)
6. Bar-haim R. Part of speech tagging of modern Hebrew text. Natural Language Engineering, 2008, 14(2): 223-251 (CSCD citations: 1)
7. Gimenez J. Fast and accurate part of speech tagging: The SVM approach. Proc of the 4th Int Conf on Recent Advances in Natural Language Processing, 2010: 158-165 (CSCD citations: 1)
8. Ratnaparkhi A. A maximum entropy model of part of speech tagging. Proc EMNLP, Computational Linguistics, 1996: 133-141 (CSCD citations: 1)
9. Gamback B. Methods for Amharic part of speech tagging. Proc of the EACL 2009 Workshop on Language Technologies for African Languages--AfLaT, 2009: 104-111 (CSCD citations: 1)
10. Zhao Yan (赵岩). A maximum entropy part-of-speech tagging model incorporating clustered trigger-pair features. Journal of Computer Research and Development, 2006, 43(2): 268-274 (in Chinese) (CSCD citations: 14)
11. Zhao Wei (赵伟). A new method for automatic Chinese part-of-speech tagging based on an improved maximum entropy model. Journal of Computer Research and Development, 2006, 43(Suppl.): 174-178 (in Chinese) (CSCD citations: 2)
12. Marsi E. Memory-based morphological analysis generation and part of speech tagging of Arabic. Proc of the ACL Workshop on Computational Approaches to Semitic Languages, 2005: 1-8 (CSCD citations: 1)
13. Fadaei H. Persian POS tagging using probabilistic morphological analysis. International Journal of Computer Applications in Technology, 2010, 38(4): 264-273 (CSCD citations: 1)
14. Wudabala (乌达巴拉). A Mongolian-English machine translation system based on a hybrid strategy, 2007 (in Chinese) (CSCD citations: 1)
15. Malouf R. A comparison of algorithms for maximum entropy parameter estimation. Proc of the 19th Int Conf on Computational Linguistics, 2002 (CSCD citations: 1)
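Citing documents (6 in total; 2 listed)

1. Yu Hongzhi (于洪志). Research on maximum entropy Tibetan part-of-speech tagging incorporating syllable features. Journal of Chinese Information Processing (中文信息学报), 2013, 27(5): 160-165 (CSCD citations: 7)
2. Palidan Tuerxun (帕力旦·吐尔逊). Maximum entropy Uyghur part-of-speech tagging incorporating morphological features. Journal of Northwest University (Natural Science Edition) (西北大学学报. 自然科学版), 2015, 45(5): 721-726 (CSCD citations: 0)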
