Master Thesis
Using Machine Learning Methods for Evaluating the Quality of Technical Documents
Authors: Michael LUCKERT, Moritz SCHAEFER-KEHNERT
Supervisor: Prof. Dr. Welf LÖWE
Examiner: Prof. Dr. Andreas KERREN
Reader: Dr. Ola PETERSSON
Semester: HT2015
Subject: Computer Science
Course: 15HT - 5DV50E/4DV50E
Abstract
In an increasingly networked world, the availability of high-quality translations is critical for success amid growing international competition. Large international companies as well as medium-sized companies are required to provide well-translated, high-quality technical documentation for their customers, not only to be successful in the market but also to meet legal regulations and avoid lawsuits. Therefore, this thesis focuses on the evaluation of translation quality, specifically concerning technical documentation, and answers two central questions:
- How can the translation quality of technical documents be evaluated, given the original document is available?
- How can the translation quality of technical documents be evaluated, given the original document is not available?
These questions are answered using state-of-the-art machine learning algorithms and translation evaluation metrics in the context of a knowledge discovery process. The evaluations are done on a sentence level and recombined on a document level by classifying each sentence as either automated translation or professional translation. The research is based on a database containing 22,327 sentences and 32 translation evaluation attributes, which are used to optimize five different machine learning approaches. An optimization process consisting of 795,000 evaluations shows a prediction accuracy of up to 72.24% for the binary classification. Based on the developed sentence-based classification systems, documents are classified by recombining their associated sentences, and a framework for rating document quality is introduced. The approach taken thus successfully creates a classification and evaluation system.
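The recombination of sentence-level labels into a document-level label could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the recombination rule, so the share-based vote, the `threshold` parameter, and the function name `classify_documents` are hypothetical and not taken from the thesis.

```python
from collections import defaultdict


def classify_documents(sentence_predictions, threshold=0.5):
    """Aggregate per-sentence labels into one label per document.

    sentence_predictions: iterable of (document_id, label) pairs, where
    label is "professional" or "automated" (the two classes of the
    binary sentence classification).
    threshold: ASSUMED cut-off on the share of professional sentences;
    the actual recombination rule of the thesis may differ.
    """
    # document_id -> [number of professional sentences, total sentences]
    counts = defaultdict(lambda: [0, 0])
    for doc_id, label in sentence_predictions:
        counts[doc_id][0] += label == "professional"
        counts[doc_id][1] += 1

    result = {}
    for doc_id, (professional, total) in counts.items():
        share = professional / total
        label = "professional" if share >= threshold else "automated"
        result[doc_id] = (label, share)
    return result


if __name__ == "__main__":
    # Made-up sentence labels for two documents, for illustration only.
    predictions = [
        ("doc_a", "professional"),
        ("doc_a", "professional"),
        ("doc_a", "automated"),
        ("doc_b", "automated"),
        ("doc_b", "automated"),
    ]
    for doc_id, (label, share) in classify_documents(predictions).items():
        print(doc_id, label, round(share, 2))
```

The share of professional sentences is returned alongside the label so that it could also serve as a rough per-document score, for example as an input to a quality rating.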
Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 Related Work
1.3 Purpose and Research Question
1.4 Approach and Methodology
1.5 Scope and Limitation
1.6 Target group
1.7 Outline
2 Theoretical Background
2.1 Knowledge Discovery in Databases
2.2 Data Mining
2.3 Machine Learning
2.3.1 Decision Tree
2.3.2 Artificial Neural Networks
2.3.3 Bayesian Networks
2.3.4 Instance-Based Learning (kNN)
2.3.5 Support Vector Machines
2.3.6 Evaluation of Machine Learning
2.4 Machine Translation
2.4.1 Rule-Based Machine Translation
2.4.2 Example-Based Machine Translation
2.5 Evaluation of Machine Translation
2.5.1 Round-Trip Translation
2.5.2 Word Error Rate
2.5.3 Translation Error Rate
2.5.4 BLEU
2.5.5 NIST
2.5.6 METEOR
2.6 Technical Documentation
3 Method
3.1 Identification of the Data Mining Goal
3.2 Translation Data
3.3 Choice of Attributes
3.4 Specification of a Data Mining Approach
3.5 Interpretation
3.6 Further Use of the Results
3.6.1 Document-Based Analysis
3.6.2 Proposal of an Evaluation Framework
4 Results / Empirical data
4.1 Empirical Data
4.2 Results
4.2.1 Research Question 1
4.2.2 Research Question 2
4.2.3 Quality Ranking of Technical Documentation
4.2.4 Deliverables
5 Discussion
5.1 Results Interpretation
5.1.1 Research Question 1 Sentence-Based
5.1.2 Research Question 1 Document-Based
5.1.3 Research Question 2 Sentence-Based
5.1.4 Research Question 2 Document-Based
5.1.5 Comparison of the two Research Questions
5.1.6 Evaluation Framework
5.2 Method reflection
5.3 Reliability
6 Conclusion
6.1 Conclusions
6.2 Further research
Bibliography
A Optimization Ranges
B Additional Results
C Detailed Working Steps