Using parsed corpora to compare the evolution of word order in English and French
Anthony Kroch, University of Pennsylvania
March 2010
ling.upenn.edu/~kroch/handouts/gcoe.pdf
What is a morphosyntactically annotated corpus?
Wednesday March 3 2010
• morphological tagging: case, gender, number features on nouns; tense, mood, aspect features on verbs, etc.
• lemmatization: word sense disambiguation, spelling normalization
• part of speech tagging: elementary syntactic functions
• syntactic parsing: hierarchical structure of phrases/clauses; grammatical function of phrases/clauses
An example sentence
( (IP-MAT (NP-SBJ (PRO They))
          (HVP have)
          (NP-ACC (D a)
                  (ADJ native)
                  (N justice)
                  (, ,)
                  (CP-REL (WNP-1 (WPRO which))
                          (C 0)
                          (IP-SUB (NP-SBJ *T*-1)
                                  (VBP knows)
                                  (NP-ACC (Q no)
                                          (N fraud)))))
          (. ;))
  (ID BEHN-E3-P1,150.48))
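Because the annotation is plain labeled bracketing, the example sentence above can be read by a very small program. The following is a minimal sketch, not the tooling used to build or search the corpora: the helper names (`tokenize`, `parse`, `find`, `words`) are hypothetical, and the rules for skipping traces (`*T*-1`), empty complementizers (`0`), punctuation and the `ID` node are assumptions based only on this one example.

```python
import re

# The example sentence from the slide, in Penn-style labeled bracketing.
TREE = """
( (IP-MAT (NP-SBJ (PRO They))
          (HVP have)
          (NP-ACC (D a) (ADJ native) (N justice) (, ,)
                  (CP-REL (WNP-1 (WPRO which))
                          (C 0)
                          (IP-SUB (NP-SBJ *T*-1)
                                  (VBP knows)
                                  (NP-ACC (Q no) (N fraud)))))
          (. ;))
  (ID BEHN-E3-P1,150.48))
"""


def tokenize(text):
    """Split the bracketing into '(' , ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens):
    """Build nested lists of the form [label, child, ...]; leaves are strings."""
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        if tokens[pos] == "(":      # the outermost wrapper carries no label
            label = ""
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                    # consume the closing ')'
        return [label] + children

    tree = node()
    assert pos == len(tokens), "unbalanced or trailing material"
    return tree


def find(tree, label):
    """Yield every subtree with exactly this label, in top-down order."""
    if isinstance(tree, str):
        return
    if tree[0] == label:
        yield tree
    for child in tree[1:]:
        yield from find(child, label)


def words(tree):
    """Collect overt words, skipping traces, empty categories, punctuation, ID."""
    label, children = tree[0], tree[1:]
    if label in {"ID", ",", "."}:
        return []
    out = []
    for child in children:
        if isinstance(child, str):
            if child != "0" and not child.startswith("*"):
                out.append(child)
        else:
            out.extend(words(child))
    return out


tree = parse(tokenize(TREE))
print(" ".join(words(tree)))
# They have a native justice which knows no fraud
print([words(t) for t in find(tree, "NP-SBJ")])
# [['They'], []]   (the second subject is the trace *T*-1)
```

Matching on exact labels keeps the sketch short; a real query tool would also need to handle label extensions and coindexation, which are not modeled here.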
Available historical corpus resources for European languages
The annotation task
• Annotation is multilevel and complex, so doing the whole job by hand is impractical.
• At the same time, accuracy is crucial, and fully automated methods cannot yet attain it.
• Consequently, parsed corpora are built by interleaving automated analysis with human correction of the output.
English Parsed Corpora I
• Anthony Kroch and Ann Taylor. Penn-Helsinki Parsed Corpus of Middle English, second edition. University of Pennsylvania, 2000.
  1.3 million words
• Anthony Kroch, Beatrice Santorini, and Ariel Diertani. Penn-Helsinki Parsed Corpus of Early Modern English, first edition. University of Pennsylvania, 2004.
  1.8 million words
• Anthony Kroch, Beatrice Santorini, and Ariel Diertani. Penn Parsed Corpus of Modern British English, first edition. University of Pennsylvania, 2010.
  1.0 million words
English Parsed Corpora II
• Ann Taylor, Anthony Warner, Susan Pintzuk, and Frank Beths. York-Toronto-Helsinki Parsed Corpus of Old English Prose, first edition. Oxford Text Archive, 2003.
  1.5 million words
• Ann Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Parsed Corpus of Early English Correspondence, first edition. Oxford Text Archive, 2006.
  2.2 million words
Other languages
• Charlotte Galves et al. Tycho Brahe Corpus of Historical Portuguese, first edition. University of Campinas, São Paulo, Brazil, 2003.
  2 million words
• France Martineau et al. MCVF Corpus of Historical French, first edition. University of Ottawa, 2006.
  1 million words
Total Currently Available Parsed Historical Text
English       7.8 million
Portuguese    2 million
French        1 million
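The per-language totals follow directly from the corpus sizes listed on the preceding slides. As a quick arithmetic check, here is a minimal sketch; the dictionary layout and the choice to count in thousands of words (to avoid floating-point rounding) are illustrative assumptions, and the grand total is computed here rather than stated on the slide.

```python
# Corpus sizes in thousands of words, copied from the corpus slides.
CORPORA = {
    "English": {
        "Penn-Helsinki Parsed Corpus of Middle English": 1300,
        "Penn-Helsinki Parsed Corpus of Early Modern English": 1800,
        "Penn Parsed Corpus of Modern British English": 1000,
        "York-Toronto-Helsinki Parsed Corpus of Old English Prose": 1500,
        "Parsed Corpus of Early English Correspondence": 2200,
    },
    "Portuguese": {"Tycho Brahe Corpus of Historical Portuguese": 2000},
    "French": {"MCVF Corpus of Historical French": 1000},
}

# Sum each language's corpora, then everything together.
totals = {lang: sum(sizes.values()) for lang, sizes in CORPORA.items()}
grand_total = sum(totals.values())

for lang, thousands in totals.items():
    print(f"{lang:<11}{thousands / 1000:.1f} million words")
print(f"{'All':<11}{grand_total / 1000:.1f} million words")
```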
The loss of verb-second word order and the decline of topicalization in English
[Figure: Decline of direct object topicalization in English. y-axis: % topicalized (0 to 12); x-axis: date, in periods OE (Early), OE (Late), 1151-1250, 1251-1350, 1351-1420, 1421-1500, 1501-1569, 1570-1639, 1639-1710.]
Frequency of direct object topicalization in modern spoken Dutch (Bouma 2008)
Table 4.2: Summary of Vorfeld occupation of arguments.

Argument          Vorfeld: yes   Vorfeld: no   Prop est (%) pt
subject                 43 523        18 597              70.1
direct object            3 418        20 432              14.3
indirect object             38           815               4.5
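The percentage column in Bouma's table can be reproduced from the two count columns. The sketch below assumes that the proportion is simply yes / (yes + no); the source does not say how the estimate was obtained, so treat the agreement to one decimal place as a sanity check on the reconstructed table rather than as a description of Bouma's method. The function name `vorfeld_share` is hypothetical.

```python
# (in Vorfeld, not in Vorfeld) counts per argument type, from Table 4.2.
VORFELD_COUNTS = {
    "subject": (43523, 18597),
    "direct object": (3418, 20432),
    "indirect object": (38, 815),
}


def vorfeld_share(yes, no):
    """Percentage of an argument type's occurrences that sit in the Vorfeld."""
    return 100 * yes / (yes + no)


shares = {
    argument: round(vorfeld_share(yes, no), 1)
    for argument, (yes, no) in VORFELD_COUNTS.items()
}

for argument, pct in shares.items():
    print(f"{argument:<16}{pct:5.1f}%")
```

On these counts, direct objects occupy the Vorfeld roughly one time in seven, which is the baseline the English figures on the surrounding slides are being compared against.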
[Figure: Evolution of PP preposing in English. y-axis: % preposed (0 to 50); x-axis: date, in periods OE (Early), OE (Late), 1151-1250, 1251-1350, 1351-1420, 1421-1500, 1501-1569, 1570-1639, 1639-1710.]
The history of topicalization in English (Speyer 2008)
Decline of direct object topicalization by subject type
• Why does topicalization decline in Middle English but not disappear? If the change is parametric, it should go to completion. Otherwise, topicalization, a clear case of stylistic variation, might be expected to be stable in frequency over time.
• This question finds an answer in the specific interaction between parametric settings and stylistic variation in the history of English.
Correlation between frequencies of object topicalization and of V2 in Middle English texts (Wallenberg 2007)
[Figure: y-axis ticks 0 to 100; x-axis: % full DP topicalization (0 to 25); labeled data point: edvern.]
[Figure: two series, pronoun subjects and full DP subjects. y-axis ticks 0 to 15; x-axis: date, in periods OE (Early), OE (Late), 1151-1250, 1251-1350, 1351-1420, 1421-1500, 1501-1569, 1570-1639, 1639-1710.]
Distribution of subject types in a corpus of topicalized and non-topicalized sentences in natural speech

personal pronoun         140   46.4%
demonstrative pronoun     20    6.6%
full noun phrase         142   47.0%
Subject type in sentences with in situ objects

personal pronoun         181   90.5%
demonstrative pronoun      2    1%
full noun phrase          17    8.5%
Subject type in sentences with topicalized objects
Clash avoidance
• The type of topicalization that declines:
(1) The néwspaper Jóhn read; the nóvel Máry did.
(Compare: The néwspaper read Jóhn.)
• The type of topicalization that doesn't:
(2) The néwspaper I réad; the nóvel I dídn't.
Translating German topicalized arguments into English in three modern German novels [by Böll, Dürrenmatt and Grass]
Topicalized to topicalized:
G: Mahlkes Haupt bedeckte dieser Hut besonders peinlich.
E: On Mahlke's head this hat made a particularly painful impression.
Topicalized to non-topicalized:
G: Zu den sechs kamen noch drei weitere.
E: Three others joined these six in the afternoon.
Accent placement and topicalization frequencies in translating German topicalized arguments into English

                                     topicalization in the     no topicalization in the
                                     English translation       English
focus accent on the German subject   0                         25
accent elsewhere                     31                        100

Distribution of contrastive topicalization by focus accent placement in Middle English

                        distribution of cases
focus position          N (total = 207)   % of cases   % inversion
focus on subject        113               55           89
focus on tensed verb    29                14           14
focus elsewhere         65                31           71
[Figure: V2 loss in English sentences with topicalized objects and PPs. Series: topicalized objects, topicalized PPs. y-axis: % V2 (0 to 80); x-axis: date, in periods 1151-1250, 1251-1350, 1351-1420, 1421-1500, 1501-1569, 1570-1639, 1639-1710.]
V2 in Old and Middle French
The loss of verb-second word order in French
(1) l'estreu li tint sun uncle Guinemer
    the stirrup him held his uncle Guinemer
    Roland 27.329
(2) Espaigne vus durat il en fiet
    Spain you will-give he in fief
    Roland 36.446
(3) or est ele bien venue
    now is she welcome
    Yvain 43.1440
[Figure: Decline of direct object topicalization in French. y-axis ticks 0 to 0.2; x-axis: 50-year periods from 1101-1150 through 1551-1600, grouped as Old French, Middle French, Modern French.]
Evolution of adverb fronting and V2 word order in French
[Figure: Evolution of V2 word order in French. Two y-axes: frequency of adverb fronting (ticks 0 to 0.6) and frequency of V2 word order (ticks 0 to 1); x-axis: 50-year periods from 1101-1150 through 1551-1600, grouped as Old French, Middle French, Modern French.]
[Figure: Frequency of V2 in French. Series: topicalized objects, topicalized adverbs. y-axis: frequency V2 (0 to 1); x-axis: 50-year periods from 1101-1150 through 1551-1600, grouped as Old French, Middle French, Modern French.]
[Figure, repeated for comparison: V2 loss in English sentences with topicalized objects and PPs. Series: topicalized objects, topicalized PPs. y-axis: % V2 (0 to 80); x-axis: date, in periods 1151-1250, 1251-1350, 1351-1420, 1421-1500, 1501-1569, 1570-1639, 1639-1710.]
Why does French completely lose object topicalization?