
On the Integration of Linguistic Features into Statistical and Neural Machine Translation

Eva Odette Jef Vanmassenhove

B.A., M.A., M.Sc.

A dissertation submitted in fulfillment of the requirements for the award of

Doctor of Philosophy (Ph.D.)

to the

Dublin City University School of Computing

Supervisor: Prof. Andy Way

2019

I hereby certify that this material, which I now submit for assessment on the program of study leading to the award of Ph.D. is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copyright, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work.

Signed:

ID No.: 15211377

Date: 09/09/2019

To Dimitar, Mama and Papa.

Contents

Abstract xii

Acknowledgements xiii

1 Introduction 2

1.1 Motivation and Research Questions . . . . . . . . . . . . . . . . . . . 3

1.1.1 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2.1 Relation to Published work . . . . . . . . . . . . . . . . . . . 8

1.2.2 Invited talks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.3 Additional Publications . . . . . . . . . . . . . . . . . . . . . 12

2 Background and Related Work 15

2.1 Machine Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.1.1 Statistical Machine Translation . . . . . . . . . . . . . . . . . 16

2.1.2 Neural Machine Translation . . . . . . . . . . . . . . . . . . . 19

2.1.3 Automatic Evaluation Metrics . . . . . . . . . . . . . . . . . . 29

2.2 Linguistics in Machine Translation . . . . . . . . . . . . . . . . . . . 30

2.2.1 Statistical Machine Translation . . . . . . . . . . . . . . . . . 33

2.2.2 Neural Machine Translation . . . . . . . . . . . . . . . . . . . 35

2.3 Bias in Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . 39

2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


3 Subject-Verb Number Agreement in Statistical and Neural Machine Translation 44

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.2.1 Statistical Machine Translation . . . . . . . . . . . . . . . . . 47

3.2.2 Neural Machine Translation . . . . . . . . . . . . . . . . . . . 49

3.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.3.1 Modeling of the Source Language . . . . . . . . . . . . . . . . 53

3.3.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 55

3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.4.1 Automatic Evaluation . . . . . . . . . . . . . . . . . . . . . . 57

3.4.2 Manual Error Evaluation . . . . . . . . . . . . . . . . . . . . . 60

3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4 Aspect and Tense in Statistical and Neural Machine Translation 68

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.2.1 Statistical Machine Translation . . . . . . . . . . . . . . . . . 77

4.2.2 Neural Machine Translation . . . . . . . . . . . . . . . . . . . 78

4.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4.3.1 Statistical Machine Translation . . . . . . . . . . . . . . . . . 80

4.3.2 Neural Machine Translation . . . . . . . . . . . . . . . . . . . 91

4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

4.4.1 Logistic Regression Model . . . . . . . . . . . . . . . . . . . . 94

4.4.2 Aspect in NMT/PB-SMT Translations . . . . . . . . . . . . . 96

4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5 Integrating Semantic Supersenses and Syntactic Supertags into Neural Machine Translation Systems 103

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104


5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.2.1 Statistical Machine Translation . . . . . . . . . . . . . . . . . 107

5.2.2 Neural Machine Translation . . . . . . . . . . . . . . . . . . . 109

5.3 Semantics and Syntax in Neural Machine Translation . . . . . . . . . 111

5.3.1 Supersense Tags . . . . . . . . . . . . . . . . . . . . . . . . . . 112

5.3.2 Supertags and POS-tags . . . . . . . . . . . . . . . . . . . . . 115

5.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

5.4.1 Data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

5.4.2 Description of the Neural Machine Translation System . . . . 116

5.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.5.1 English–French . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.5.2 English–German . . . . . . . . . . . . . . . . . . . . . . . . . 121

5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

6 Gender Agreement in Neural Machine Translation 128

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

6.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

6.2.1 Linguistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

6.2.2 Gender Prediction . . . . . . . . . . . . . . . . . . . . . . . . 136

6.2.3 Statistical Machine Translation . . . . . . . . . . . . . . . . . 138

6.2.4 Neural Machine Translation . . . . . . . . . . . . . . . . . . . 140

6.3 Gender Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

6.4 Compilation of Datasets . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.4.1 Analysis of the EN–FR Annotated Dataset . . . . . . . . . . . 145

6.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6.5.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6.5.2 Description of the NMT Systems . . . . . . . . . . . . . . . . 148

6.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

6.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153


7 Loss and Decay of Linguistic Richness in Neural and Statistical Machine Translation 156

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

7.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

7.2.1 Linguistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

7.2.2 Statistical Machine Translation . . . . . . . . . . . . . . . . . 162

7.2.3 Neural Machine Translation . . . . . . . . . . . . . . . . . . . 162

7.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

7.3.1 Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

7.3.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 166

7.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

7.4.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

7.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

8 Conclusions and Future Work 186

8.1 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

8.1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

8.3 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

Bibliography 196


List of Figures

2.1 The noisy channel model of SMT (Jurafsky and Martin, 2014). . . . . 17

2.2 One-to-many relation between the French word `avant-hier' and its English translation that consists of multiple words `the day before yesterday'. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.3 An encoder–decoder architecture consisting of three parts: the encoder encoding the English input sequence X ("Live long and prosper!"), the fixed-length encoded vector v generated by the encoder and the decoder generating the Klingon output sequence Y ("qaStaHvIS yIn 'ej chep!") from v. . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.4 The encoder–decoder architecture with RNNs. The encoder is shown in green and the decoder in blue. . . . . . . . . . . . . . . . . . . . . 22

2.5 BPE operations on a toy dictionary {`low', `lowest', `newer', `wider'} (Sennrich et al., 2016c). . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.6 BPE subwords of `stormtroopers' (Vanmassenhove and Way, 2018b). 28

3.1 One-to-many relation between the English verb `work' and some of its possible translations in French . . . . . . . . . . . . . . . . . . . . 46

3.2 Many-to-one relation between some of the French translations of the English word `work', mapped to their lemma `TRAVAIL' . . . . . . . 48

5.1 Baseline (BPE) vs Combined (SST–CCG) NMT Systems for EN–FR, evaluated on the newstest2013. . . . . . . . . . . . . . . . . . . . . . 120
