Tsuyoshi Okita -muenchen.de
Statistical Machine Translation: Decoding
Tsuyoshi Okita
Ludwig-Maximilians-Universität Munich
November 22, 2016
Credit: Slides by Ales Tamchyna
Outline
What features are used in PBMT?
How to compute the score of a translation?
Search for the best translation: decoding.
Overview of the translation process.
Making decoding tractable: beam search.
Other decoding algorithms.
Log-Linear Model
We know how to score a full translation hypothesis:

$$P(e, a \mid f) \propto \exp \sum_i \lambda_i f_i(e, a, f)$$

$\lambda_i$ ... feature weights
$f_i$ ... feature functions
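As a minimal sketch of how this score could be computed, the snippet below evaluates the unnormalized log-linear score for two hypotheses. The feature names, weights, and feature values are hypothetical and chosen only for illustration; they do not come from any real system.

```python
import math


def loglinear_score(weights, features):
    """Unnormalized log-linear score: exp(sum_i lambda_i * f_i)."""
    return math.exp(sum(weights[name] * value for name, value in features.items()))


# Hypothetical feature weights (lambda_i); real weights are tuned on held-out data.
weights = {"tm": 1.0, "lm": 0.5, "word_penalty": -0.2}

# Hypothetical feature values f_i(e, a, f): log-probabilities and a word count.
hyp_a = {"tm": math.log(0.4), "lm": math.log(0.01), "word_penalty": 5}
hyp_b = {"tm": math.log(0.2), "lm": math.log(0.05), "word_penalty": 5}

score_a = loglinear_score(weights, hyp_a)
score_b = loglinear_score(weights, hyp_b)

# exp is monotone, so ranking by the score equals ranking by the weighted sum.
best = max([hyp_a, hyp_b], key=lambda h: loglinear_score(weights, h))
```

Because the exponential is monotone, a decoder that only needs the best hypothesis can compare the weighted sums directly and skip both the exponential and the normalization.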
Log-Linear Model: Features
Typical baseline feature set for PBMT:
Phrase translation probability, both direct and inverse: $P_{TM}(e \mid f)$, $P_{TM_{inv}}(f \mid e)$
Lexical translation probability (direct and inverse): $P_{lex}(e \mid f)$, $P_{lex_{inv}}(f \mid e)$
Language model probability: $P_{LM}(e)$
Phrase penalty.
Word penalty.
Distortion penalty.
Lexical Weights (Plex )
The problem: many extracted phrases are rare. (Esp. long phrases might only be seen once in the parallel corpus.)
P("modrý autobus přistál na Marsu" | "a blue bus lands on Mars") = 1
P("a blue bus lands on Mars" | "modrý autobus přistál na Marsu") = 1
Is that a reliable probability estimate?
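To see where a probability of 1 comes from, here is a small sketch that assumes the usual relative-frequency estimate, P(e|f) = count(e, f) / count(f). The slides do not state the estimator, so this is an assumption, and the list of extracted phrase pairs is made up for illustration.

```python
from collections import Counter


def phrase_translation_probs(phrase_pairs):
    """Relative-frequency estimate P(e|f) = count(e, f) / count(f)."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(f for f, _ in phrase_pairs)
    return {(f, e): c / source_counts[f] for (f, e), c in pair_counts.items()}


# Hypothetical extracted phrase pairs as (source f, target e).
extracted = [
    # A long phrase pair seen only once in the parallel corpus.
    ("a blue bus lands on Mars", "modrý autobus přistál na Marsu"),
    # A short, frequent source phrase with two competing translations.
    ("bus", "autobus"),
    ("bus", "autobus"),
    ("bus", "autobus"),
    ("bus", "sběrnice"),
]

probs = phrase_translation_probs(extracted)
rare = probs[("a blue bus lands on Mars", "modrý autobus přistál na Marsu")]
frequent = probs[("bus", "autobus")]
```

Under this estimate, a phrase seen once has a single observed translation, so it receives probability 1 regardless of whether the pair is actually correct.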
P("; distortion carried - over" | "; zkreslení") = 1
P("; zkreslení" | "; distortion carried - over") = 1
Data from the "wild" are noisy. Word alignment contains errors.
This is a real phrase pair from our best English-Czech system. Both $P_{TM}(e \mid f)$ and $P_{TM_{inv}}(f \mid e)$ say that this is a perfect translation.
Lexical Weights (Plex )
Decompose the phrase pair into word pairs. Look at the word-level translation probabilities. Several possible definitions, e.g.:
$$P_{lex}(e \mid f, a) = \prod_{j=1}^{l_e} \frac{1}{|\{i \mid (i, j) \in a\}|} \sum_{(i, j) \in a} w(e_j, f_i)$$
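[Figure: word alignment between "psací stroj" and "a typewriter"; the alignment links carry the word translation probabilities 0.1, 0.3, and 0.2.]

$$P_{lex}(\text{"a typewriter"} \mid \text{"psací stroj"}) = \frac{1}{1} \cdot 0.1 \times \frac{1}{2} \cdot (0.3 + 0.2) = 0.025$$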
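The worked example above can be reproduced with a short sketch of the lexical weight computation. The function name and data layout are hypothetical. The assignment of the three probabilities to specific links is also an assumption read off the arithmetic: "a" has one link with weight 0.1, and "typewriter" has two links with weights 0.3 and 0.2. Unaligned target words are not covered by the formula as given, so the sketch rejects them rather than guessing a convention.

```python
def lexical_weight(target_words, source_words, alignment, w):
    """P_lex(e|f, a): product over target words of the mean w over aligned source words.

    alignment is a list of (i, j) pairs: source index i, target index j.
    w maps (target_word, source_word) to a word translation probability.
    """
    score = 1.0
    for j, e_word in enumerate(target_words):
        links = [i for (i, jj) in alignment if jj == j]
        if not links:
            # The formula above does not define this case; refuse to guess.
            raise ValueError(f"target word {e_word!r} is unaligned")
        score *= sum(w[(e_word, source_words[i])] for i in links) / len(links)
    return score


source = ["psací", "stroj"]
target = ["a", "typewriter"]

# Assumed links: "a" to "psací"; "typewriter" to both "psací" and "stroj".
alignment = [(0, 0), (0, 1), (1, 1)]
w = {
    ("a", "psací"): 0.1,
    ("typewriter", "psací"): 0.3,
    ("typewriter", "stroj"): 0.2,
}

p_lex = lexical_weight(target, source, alignment, w)
```

With these inputs the result matches the value on the slide, (1/1 · 0.1) × (1/2 · (0.3 + 0.2)) = 0.025, up to floating-point rounding.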