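Abstract
To preserve the large amount of grammatical and semantic information carried by Mongolian affixes and to reduce the size of the Mongolian dictionary, Mongolian part-of-speech (POS) tagging needs to tag both stems and affixes. To address this problem, a Mongolian POS tagging method based on conditional random fields (CRF) is proposed. Exploiting the CRF model's ability to incorporate arbitrary features, the method makes full use of Mongolian contextual information and adds new statistical features that target the mutual influence between morphemes. In a closed test on a Mongolian POS-tagged corpus of 38,000 sentences, the method achieved a tagging accuracy of 96.65%, outperforming a POS tagging model based on the hidden Markov model (HMM).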
Abstract (English)
Mongolian part-of-speech tagging must tag both stems and affixes, in order to preserve the large amount of syntactic and semantic information carried by affixes and to reduce the size of the Mongolian dictionary. This paper presents a new CRF-based approach to Mongolian part-of-speech tagging. Taking advantage of the CRF's ability to use arbitrary features as input, the system exploits not only the context of words but also new statistical features that model the mutual influence between morphemes. The system was tested on a 38,000-sentence part-of-speech tagged corpus provided by Inner Mongolia University. In a closed test, the POS tagging accuracy on the test set reached 96.65%, outperforming the HMM-based model.
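The abstract describes three ideas: splitting words into stem and affix morphemes so that each receives a tag, adding features that capture how morphemes influence one another, and decoding with a CRF. A minimal sketch of how these fit together is shown below, assuming a linear-chain CRF over the flattened morpheme sequence. The feature templates (`host_stem`, `first_affix`, `suf2`), the tag names, the placeholder morphemes, and the weights are all hypothetical illustrations, not the authors' feature set or a trained model. Viterbi decoding returns the tag sequence with the highest unnormalized score (sum of active feature weights plus transition weights); the partition function is not needed for this argmax, and training is omitted.

```python
BOS, EOS = "<s>", "</s>"


def flatten(sentence):
    """Split each word [stem, affix1, affix2, ...] into separate morpheme tokens.

    Returns (morpheme, kind, host_stem, first_affix) tuples, so that both
    stems and affixes receive their own tag.
    """
    seq = []
    for word in sentence:
        stem, affixes = word[0], word[1:]
        first_affix = affixes[0] if affixes else "none"
        seq.append((stem, "stem", stem, first_affix))
        for affix in affixes:
            seq.append((affix, "affix", stem, first_affix))
    return seq


def morpheme_features(seq, i):
    """Hypothetical feature templates for morpheme i (not the paper's set)."""
    m, kind, host_stem, first_affix = seq[i]
    feats = [f"m={m}", f"kind={kind}", f"suf2={m[-2:]}",
             f"m-1={seq[i - 1][0] if i > 0 else BOS}",
             f"m+1={seq[i + 1][0] if i + 1 < len(seq) else EOS}"]
    # Hypothetical morpheme-interaction features: an affix sees its host
    # stem, and a stem sees the first affix attached to it.
    if kind == "affix":
        feats.append(f"host_stem={host_stem}")
    else:
        feats.append(f"first_affix={first_affix}")
    return feats


def local_score(seq, i, tag, emit):
    """Sum of the weights of all (feature, tag) pairs active at position i."""
    return sum(emit.get((f, tag), 0.0) for f in morpheme_features(seq, i))


def sequence_score(seq, labels, emit, trans):
    """Unnormalized linear-chain CRF score of one complete tag sequence."""
    total, prev = 0.0, BOS
    for i, tag in enumerate(labels):
        total += local_score(seq, i, tag, emit) + trans.get((prev, tag), 0.0)
        prev = tag
    return total


def viterbi(seq, tags, emit, trans):
    """Highest-scoring tag sequence under the linear-chain score above."""
    if not seq:
        return []
    # best[i][tag] = (best score of a path ending in tag at i, previous tag)
    best = [{t: (local_score(seq, 0, t, emit) + trans.get((BOS, t), 0.0), None)
             for t in tags}]
    for i in range(1, len(seq)):
        row = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + trans.get((p, t), 0.0)) for p in tags),
                key=lambda pair: pair[1])
            row[t] = (score + local_score(seq, i, t, emit), prev)
        best.append(row)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(seq) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Toy example: placeholder morphemes, tags, and hand-set weights (not trained).
TAGS = ["N", "V", "CASE", "TENSE"]
sentence = [["stem_a", "-sfx_case"], ["stem_b", "-sfx_tense"]]
seq = flatten(sentence)
emit = {("kind=affix", "CASE"): 1.0, ("kind=affix", "TENSE"): 1.0,
        ("m=-sfx_case", "CASE"): 2.0, ("m=-sfx_tense", "TENSE"): 2.0,
        ("host_stem=stem_b", "TENSE"): 0.5,
        ("first_affix=-sfx_case", "N"): 1.5,
        ("first_affix=-sfx_tense", "V"): 1.5}
trans = {("N", "CASE"): 1.0, ("V", "TENSE"): 1.0, ("CASE", "V"): 0.5}
print(viterbi(seq, TAGS, emit, trans))  # ['N', 'CASE', 'V', 'TENSE']
```

Because CRF features may look at any position in the morpheme sequence, adding interaction templates such as `host_stem` changes only the feature function, not the decoder. In practice the weights would be learned from a tagged corpus with a CRF toolkit such as CRF++ or CRFsuite rather than set by hand.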
Source
Journal of Computer Applications (计算机应用), 2010, 30(8): 2038-2040 [CSCD core collection]
Keywords
stem; affix; conditional random field (CRF); part-of-speech tagging; morpheme
Address
Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China
Language
Chinese
Document type
Research paper
ISSN
1001-9081
Subject
Automation technology; computer technology
Funding
Knowledge Innovation Program of the Chinese Academy of Sciences
Accession number
CSCD:3909767