Research of Deep Filtering Lexical Reordering Table (调序规则表的深度过滤研究)
Abstract
|
In statistical machine translation systems, the lexical reordering table and the phrase table are usually very large. Optimizing and filtering the phrase table has long been an active research topic, whereas filtering the lexical reordering table has received almost no attention. This paper treats the filtering of the lexical reordering table as a short-text classification problem and proposes a filtering model based on an autoencoder. The model first scores the reordering rules with an autoencoder-based classifier, then filters the lexical reordering table using a minimal-difference strategy, and finally recomputes the reordering score table from the filtered rules for use in machine translation decoding. Experiments on public English-Chinese and Uyghur-Chinese corpora show that the model reduces the size of the lexical reordering table by 40% while raising the BLEU (bilingual evaluation understudy) score by 0.19 and 0.26, respectively. |
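The three steps named in the abstract (score, filter, recompute) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the abstract does not define the minimal-difference strategy, so the code assumes it means keeping the rules whose classifier score is closest to their original table score. The field names (`table_prob`, `clf_prob`), the helper functions, the orientation labels, and the add-alpha smoothing are all hypothetical, and the classifier scores are taken as given rather than produced by an autoencoder.

```python
from collections import Counter, defaultdict

# Assumed orientation classes (monotone / swap / discontinuous).
ORIENTATIONS = ("mono", "swap", "discontinuous")


def filter_rules(rules, keep_ratio=0.6):
    """Keep the fraction of rules with the smallest score difference.

    Assumption: "minimal difference" is read here as the absolute gap
    between the classifier score and the original table score.
    """
    ranked = sorted(rules, key=lambda r: abs(r["clf_prob"] - r["table_prob"]))
    n_keep = max(1, round(len(ranked) * keep_ratio))
    return ranked[:n_keep]


def recompute_scores(rules, alpha=0.5):
    """Re-estimate p(orientation | src, tgt) from the surviving rules.

    Uses relative frequency with add-alpha smoothing (an assumption).
    """
    counts = defaultdict(Counter)
    for r in rules:
        counts[(r["src"], r["tgt"])][r["orientation"]] += 1
    table = {}
    for pair, c in counts.items():
        total = sum(c.values()) + alpha * len(ORIENTATIONS)
        table[pair] = {o: (c[o] + alpha) / total for o in ORIENTATIONS}
    return table


if __name__ == "__main__":
    # Toy rule instances with made-up scores.
    demo = [
        {"src": "a", "tgt": "x", "orientation": "mono", "table_prob": 0.8, "clf_prob": 0.85},
        {"src": "a", "tgt": "x", "orientation": "swap", "table_prob": 0.1, "clf_prob": 0.7},
    ]
    print(recompute_scores(filter_rules(demo, keep_ratio=0.5)))
```

With `keep_ratio=0.6` the filter drops 40% of the rules, matching the reduction reported in the abstract; the ratio itself is a free parameter in this sketch.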
Source
|
Journal of Frontiers of Computer Science and Technology, 2017, 11(5): 785-793 [Core Collection]
|
DOI
|
10.3778/j.issn.1673-9418.1603056
|
Keywords
|
autoencoder; filtering model; lexical reordering table; machine translation
|
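Address
|
1. Xinjiang Key Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011
2. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011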
|
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
1673-9418 |
Subject
|
Automation technology, computer technology |
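Funding
|
National High Technology Research and Development Program of China (863 Program); Strategic Priority Research Program of the Chinese Academy of Sciences; "West Light" Program of the Chinese Academy of Sciences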
|
Accession number
|
CSCD:5978162
|