Markus Dickinson & Marwa Ragheb June 9, 2013
Annotation for Learner English Guidelines, v. 0.1
Contents
Front matter
0.1 Notes for Researchers
0.2 Acknowledgements
1 Getting Started
1.1 Quick Intro
1.2 General Principles
1.2.1 Assumptions
1.2.2 Benefit of the doubt
1.2.3 Semantics
1.2.4 Evidence
1.2.5 Underspecification
1.2.6 Mismatches
1.2.7 Mismatches to define shortest distance
1.3 Label Inventories
1.3.1 Dependency Relations
1.3.2 POS Tags
2 Initial Annotation Layers
2.1 Segmentation & Tokenization
2.1.1 Sentence segmentation
2.1.2 Tokenization
2.2 Lemma
2.2.1 Irregulars
2.2.2 Misspellings
2.2.3 Spelling vs. Morphology
2.2.4 Spacing issues
2.2.5 Lowercase
2.2.6 Anonymization
2.2.7 Acronyms, unknowns, & foreign terms
2.3 POS
2.3.1 POS mismatches
2.3.2 Defining distributional POS
2.4 Lexical violations
2.4.1 Lexical violations vs. Lemma changes
2.4.2 Lexical violations vs. POS mismatches
3 Dependencies
3.1 Morphosyntactic Dependencies
3.1.1 Inventory
3.1.2 The syntactic in morphosyntactic dependencies
3.2 Where we differ from CHILDES
3.2.1 Possessives
3.2.2 NJCT
3.2.3 Particles (PRT)
3.2.4 Dependents of adjectives
3.2.5 Heads for verbal chains
3.2.6 Secondary Dependencies
3.2.7 Enumeration
3.2.8 Coordination
3.2.9 Ellipsis
3.3 Overview of annotation
3.3.1 Verbs & Verbal relations
3.3.2 Nouns and Noun relations
3.3.3 Coordination
3.3.4 Arguments vs. Adjuncts
3.3.5 Punctuation
3.4 Subcategorization frames
3.4.1 Grammaticality
3.4.2 Determining subcategorization requirements
3.4.3 Specific cases
4 A Variety of Dependency Constructions
4.1 Attachment decisions
4.2 INCROOT
4.3 Extraposition
4.4 wh-words
4.4.1 Displacement
4.4.2 wh-questions
4.4.3 Embedded clauses
4.4.4 Relative clauses
4.5 Prepositions vs. Complementizers
4.6 Comparative constructions
4.6.1 as X as
4.6.2 Xer than
4.6.3 Discontinuous comparatives
4.7 Purpose Clauses (cf. in order to)
4.8 Appositives
4.9 Parentheticals
4.10 Ellipsis
4.11 Linguistic mentions (quotations)
4.12 Multi-Word Expressions
5 Learner Innovations
5.1 Missing elements
5.1.1 Missing head
5.1.2 Missing argument
5.2 Extra elements
5.2.1 Extra head
5.2.2 Extra dependent
5.2.3 Extra word with unclear function
5.3 Word order
6 Extended Examples of Difficult Cases
6.1 Example 1: one, complement clause
6.2 Example 2: lemma
6.3 Example 3: syntax vs. meaning
6.4 Example 4: lemma, word order, run-on
6.5 Example 5: missing verb, double conjunction, unclear phrase
6.6 Example 6: syntax vs. discourse, multi-ambiguity
6.7 Example 7: syntax vs. meaning
6.8 Example 8: complement clause
6.9 Example 9: Lemma, POS ambiguity
6.10 Example 10: non-finite subordinate clause, problematic misspellings
A Practical matters
A.1 Brat annotation tool
A.1.1 Getting started
A.1.2 Basics of annotation
A.1.3 Annotating a word
A.1.4 Annotating dependencies
A.2 CoNLL file format
Front matter
0.1 Notes for Researchers
The annotation scheme here is based on thinking that has evolved over several years, as captured in various papers, listed here. It is also part of an ongoing dissertation project at Indiana University. You can find pdf versions of the papers, as well as other information about the project, at .
- Dickinson and Ragheb (2009): Dependency Annotation for Learner Corpora. Proceedings of the Eighth Workshop on Treebanks and Linguistic Theories (TLT8). Milan, Italy.
- Dickinson and Ragheb (2011): Dependency Annotation of Coordination for Learner Language. Proceedings of the International Conference on Dependency Linguistics. Barcelona, Spain.
- Ragheb and Dickinson (2011): Avoiding the Comparative Fallacy in the Annotation of Learner Corpora. Selected Proceedings of the 2010 Second Language Research Forum: Reconsidering SLA Research, Dimensions, and Directions. Cascadilla Proceedings Project: Somerville, MA. pp. 114–124.
- Ragheb and Dickinson (2012): Defining Syntax for Learner Language Annotation. Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Poster Session. Mumbai, India.
- Ragheb and Dickinson (2013): Inter-annotator Agreement for Dependency Annotation of Learner Language. Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. Atlanta, GA.
The goal is to annotate syntactic and, to some extent, morpho-syntactic information, without necessarily encoding errors. The papers give more justification for this, but you can also read section 1.2 for more on what the annotation does and does not encode.
Information for citing these guidelines is:
- Dickinson and Ragheb (2013): Annotation for Learner English Guidelines, v. 0.1. Technical report, Indiana University, Bloomington, IN. June 9, 2013.
And the BibTeX entry is:
@TechReport{salle:13,
  author      = {Markus Dickinson and Marwa Ragheb},
  title       = {Annotation for Learner {E}nglish Guidelines, v. 0.1},
  institution = {Indiana University},
  year        = {2013},
  address     = {Bloomington, IN},
  month       = {June},
  note        = {June 9, 2013},
}
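As a minimal sketch of how the entry might be used, the following LaTeX document cites the guidelines by the key salle:13. The file name references.bib and the plain bibliography style are assumptions made for illustration; any .bib file containing the entry and any style that supports technical reports should work the same way.

```latex
\documentclass{article}
\begin{document}

% salle:13 is the citation key from the BibTeX entry above.
We follow the annotation guidelines of \cite{salle:13}.

% Assumption: the entry is stored in references.bib (hypothetical file name).
\bibliographystyle{plain}
\bibliography{references}

\end{document}
```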
It is important to note that, while we have made hundreds of decisions, these are not the only decisions one could have made. We hope that these guidelines are useful, not just to understand what the annotation means, but as a starting point for other annotation and analysis efforts. That is, similar to what we stated in Ragheb and Dickinson (2012), one of the most important contributions of these guidelines may be "to outline the questions which need to be addressed for grammatical annotation of learner language."
As long as these guidelines are still in progress, we welcome feedback and discussion. Contact us at: mragheb@indiana.edu or md7@indiana.edu.
Our data will eventually be released, but as a) we are a small annotation effort, and b) we have had to take significant time determining what to annotate and what the annotation denotes--questions which form the core of a PhD thesis-in-progress--please bear with our slowness.
0.2 Acknowledgements
As can be gathered from the acknowledgements in the papers above, this work has been influenced by many researchers in different areas.
We would also like to acknowledge our student annotators, who had the challenging task of annotating while decisions were still being made: Eric Benzschawel, Frank Linville, Shannon Manley, Lauren Swanson, Zachary Wampler, and Samantha Zimny.
The process was a little unusual, in that, for the most part, students were receiving course credit for annotating, and part of the credit was based on discussing syntax,
................